lucene | Dan Collis-Puro

I love Sunspot for full-text searching in Rails apps, but it took me a while to figure out how to do left-bound wildcard (prefix) searching in full-text indexed fields.

So, if we’re searching for “collis” in a set of full-text indexed fields, the default Solr config supplied by Sunspot only matches the entire word. To get “colli” or “coll” to return records with “collis” in the full-text index, you just need to modify the Solr config (in $RAILS_ROOT/solr/conf/schema.xml), changing:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

to:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

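which essentially makes the index-time analyzer emit left-bound (front-edge) n-grams for every indexed term: “collis” is stored as “c”, “co”, “col”, “coll”, “colli” and “collis”, so a query for any of those prefixes matches. The query-time analyzer has no n-gram filter, so the search term is only lowercased and matched as typed. After changing the schema, restart Solr and reindex existing records, since the new analyzer only applies to documents indexed after the change. This taught me: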

Solr/Lucene/Sunspot rock, and
I have more to learn about Solr config, because the schema.xml looks like it exposes some very powerful search juju.
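
To make the n-gram behaviour concrete, here is a rough plain-Ruby sketch of what the two analyzer chains above do to text. It is only an illustration: Solr does this work internally, the method names (edge_ngrams, index_terms, query_terms, matches?) are made up for this example and are not part of Solr or Sunspot, and treating a match as "every query term is present" is a simplifying assumption about how the query gets evaluated.

```ruby
# Illustrative sketch only: mimics the index and query analyzer chains in plain Ruby.

# Front-edge n-grams of one token, like EdgeNGramFilterFactory with side="front".
def edge_ngrams(token, min_gram: 1, max_gram: 50)
  longest = [token.length, max_gram].min
  (min_gram..longest).map { |n| token[0, n] }
end

# Index-time chain: split on whitespace, lowercase, expand into edge n-grams.
def index_terms(text)
  text.split.map(&:downcase).flat_map { |token| edge_ngrams(token) }.uniq
end

# Query-time chain: split on whitespace and lowercase only (no n-grams).
def query_terms(text)
  text.split.map(&:downcase)
end

# Assumption: a document matches when every query term is an indexed term.
def matches?(document, query)
  (query_terms(query) - index_terms(document)).empty?
end

p edge_ngrams("collis")                 # => ["c", "co", "col", "coll", "colli", "collis"]
p matches?("Dan Collis-Puro", "coll")   # => true
p matches?("Dan Collis-Puro", "COLLI")  # => true
p matches?("Dan Collis-Puro", "ollis")  # => false (not a prefix)
p matches?("Dan Collis-Puro", "puro")   # => false ("collis-puro" is one whitespace token)
```

The last line shows a side effect of the new config: it swaps the standard tokenizer for the whitespace tokenizer, so a hyphenated word is indexed as a single token and only prefixes of the whole token match.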

Thanks to Arndt Lehmann’s tip on this page.

fulltext wildcard searching with ruby/rails and sunspot