feed-abstract gem updated to support twitter RSS and Atom

I updated my feed-abstract gem to support twitter RSS/Atom, in that it will automatically parse hashtags and turn them into RSS item subjects/categories. Huzzah! This is pretty fun, as it allows tweets to be aggregated into TagTeam seamlessly and they can be remixed, archived, and searched by tag.

You can get at twitter RSS/Atom via URLs like:

https://search.twitter.com/search.atom?q=url encoded hashtag

so:

https://search.twitter.com/search.atom?q=%23rails

I’m sure there are more search parameters available too. If you want RSS, just change the “.atom” to “.rss”.

fulltext wildcard searching with ruby/rails and sunspot

I love Sunspot for full-text searching in Rails apps, but it took me a while to figure out how to do left-bound wildcard searching in full-text indexed fields.

So – if we’re searching for “collis” in a set of fulltext indexed fields, in the default solr config supplied by sunspot you have to search for the entire word. To get “colli” or “coll” to return records with “collis” in the fulltext index, you just need to modify the solr config (in $RAILS_ROOT/solr/conf/schema.xml), changing:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

to:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

which essentially makes the full text tokenizer create left-bound n-grams for indexed terms. This taught me:

  1. Solr/lucene/sunspot rock, and
  2. I have more to learn about solr config because the schema.xml looks like it exposes some very powerful search juju.

Thanks to Arndt Lehmann’s tip on this page.