{"id":293,"date":"2011-10-24T13:44:37","date_gmt":"2011-10-24T17:44:37","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/djcp\/?p=293"},"modified":"2011-10-27T11:11:13","modified_gmt":"2011-10-27T15:11:13","slug":"fulltext-wildcard-searching-with-rubyrails-and-sunspot","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/djcp\/2011\/10\/fulltext-wildcard-searching-with-rubyrails-and-sunspot\/","title":{"rendered":"fulltext wildcard searching with ruby\/rails and sunspot"},"content":{"rendered":"<p>I love <a href=\"http:\/\/outoftime.github.com\/sunspot\/\" title=\"Sunspot\">Sunspot<\/a> for full-text searching in Rails apps, but it took me a while to figure out how to do left-bound wildcard (prefix) searching in full-text indexed fields.<\/p>\n<p>So &#8211; say we want to find records whose full-text indexed fields contain &#8220;collis&#8221;. With the default Solr config supplied by Sunspot, you have to search for the entire word. To get &#8220;colli&#8221; or &#8220;coll&#8221; to return those records too, you just need to modify the Solr config (in <strong>$RAILS_ROOT\/solr\/conf\/schema.xml<\/strong>), changing:<\/p>\n<pre>\r\n&lt;fieldType name=\"text\" class=\"solr.TextField\" omitNorms=\"false\"&gt;\r\n  &lt;analyzer&gt;\r\n    &lt;tokenizer class=\"solr.StandardTokenizerFactory\"\/&gt;\r\n    &lt;filter class=\"solr.StandardFilterFactory\"\/&gt;\r\n    &lt;filter class=\"solr.LowerCaseFilterFactory\"\/&gt;\r\n  &lt;\/analyzer&gt;\r\n&lt;\/fieldType&gt;\r\n<\/pre>\n<p>to:<\/p>\n<pre>\r\n&lt;fieldType name=\"text\" class=\"solr.TextField\" omitNorms=\"false\"&gt;\r\n  &lt;analyzer type=\"index\"&gt;\r\n    &lt;tokenizer class=\"solr.WhitespaceTokenizerFactory\"\/&gt;\r\n    &lt;filter class=\"solr.LowerCaseFilterFactory\"\/&gt;\r\n    &lt;filter class=\"solr.EdgeNGramFilterFactory\" minGramSize=\"1\" maxGramSize=\"50\" side=\"front\"\/&gt;\r\n  &lt;\/analyzer&gt;\r\n  &lt;analyzer type=\"query\"&gt;\r\n    
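&lt;!-- query-time analysis: there is deliberately no EdgeNGram filter here, so a search for \"coll\" stays a single lowercased token and matches the \"coll\" gram that the index analyzer above stored for \"collis\" --&gt;\r\n    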
&lt;tokenizer class=\"solr.WhitespaceTokenizerFactory\"\/&gt;\r\n    &lt;filter class=\"solr.LowerCaseFilterFactory\"\/&gt;\r\n  &lt;\/analyzer&gt;\r\n&lt;\/fieldType&gt;\r\n<\/pre>\n<p>which essentially makes the full text tokenizer create left-bound <a href=\"http:\/\/en.wikipedia.org\/wiki\/N-gram\">n-grams<\/a> for indexed terms. This taught me:<\/p>\n<ol>\n<li>Solr\/lucene\/sunspot rock, and<\/li>\n<li>I have more to learn about solr config because the schema.xml looks like it exposes some very powerful search juju.<\/li>\n<\/ol>\n<p>Thanks to Arndt Lehmann&#8217;s tip on <a href=\"http:\/\/railscasts.com\/episodes\/278-search-with-sunspot?view=comments\">this<\/a> page.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I love Sunspot for full-text searching in Rails apps, but it took me a while to figure out how to do left-bound wildcard searching in full-text indexed fields. So &#8211; if we&#8217;re searching for &#8220;collis&#8221; in a set of fulltext &hellip; <a href=\"https:\/\/archive.blogs.harvard.edu\/djcp\/2011\/10\/fulltext-wildcard-searching-with-rubyrails-and-sunspot\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1984,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[17313,4166,615,17314,17317],"class_list":["post-293","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-lucene","tag-rails","tag-ruby","tag-solr","tag-testtag"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/djcp\/wp-json\/wp\/v2\/posts\/293","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/djcp\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/djcp\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/djcp\/wp-json\/wp\/v2\/users\/1984"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/djcp\/wp-json\/wp\/v2\/comments?post=293"}],"version-history":[{"count":12,"href":"https:\/\/archive.blogs.harvard.edu\/djcp\/wp-json\/wp\/v2\/posts\/293\/revisions"}],"predecessor-version":[{"id":306,"href":"https:\/\/archive.blogs.harvard.edu\/djcp\/wp-json\/wp\/v2\/posts\/293\/revisions\/306"}],"wp:attachment":[{"href":"https:\/\/archive.blogs.harvard.edu\/djcp\/wp-json\/wp\/v2\/media?parent=293"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/djcp\/wp-json\/wp\/v2\/categories?post=293"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/djcp\/wp-json\/wp\/v2\/tags?post=293"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}