Solr real-time indexing on Websolr

Update: Unfortunately, we have since had to roll back our support for real-time commits in Solr 3, after wider public beta tests revealed some edge cases which break replication. We are currently waiting for the Solr 4.0 release with its official real-time implementation.


Real-time indexing: when would I want to use it, when would I not?

Real-time indexing is the ability for changes to your index to become searchable very quickly, typically within milliseconds rather than minutes.

In earlier versions of Solr, refreshing the index to make new updates available has been a fairly expensive operation. By default, updates on Websolr can take up to 60 seconds to become visible to your searches.

However, recent versions of Lucene have exposed real-time functionality with a computationally cheap "soft commit." This operation makes updates to your index available within milliseconds rather than seconds.
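As a rough illustration of what requesting a soft commit can look like from a client, here is a minimal Python sketch. It assumes the `softCommit=true` request parameter of Solr 4's update handler; a backport to Solr 3 may expose the feature differently. The index URL and the helper names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical index URL; substitute the URL of your own index.
SOLR_URL = "http://index.example.com/solr/my-index"


def update_url(base_url, soft_commit=True):
    """Build the update URL, optionally asking Solr for a soft commit."""
    params = {}
    if soft_commit:
        # Assumption: `softCommit` as accepted by Solr 4's update handler.
        params["softCommit"] = "true"
    query = urlencode(params)
    url = base_url.rstrip("/") + "/update"
    return url + "?" + query if query else url


def add_message(doc):
    """Serialize one document (a dict of field name to value) as an XML <add> message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def post_update(doc, base_url=SOLR_URL):
    """POST a single document and request a soft commit in the same call."""
    request = urllib.request.Request(
        update_url(base_url),
        data=add_message(doc).encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `post_update({"id": 1, "title": "Hello"})` against a server that honors the parameter would make the document searchable without waiting for the next full commit.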

We have backported this real-time indexing functionality and soft-launched it on our Solr 3 servers, where it is available today. We have gotten a lot of positive feedback from customers who are using real-time indexing. As we gather more production usage data, we will announce it more widely and make it the default for all of our customers.

When should you use it? When your application's requirements call for updates to be visible instantly rather than within one minute. You may also see benefits if your application sends a steady stream of both updates and searches to Solr, where you don't necessarily want to purge your caches every minute.

When should you not use it? When you can tolerate changes taking a minute or more to become available, particularly when you batch your updates rather than stream them in one at a time.
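For the batching case, a sketch along these lines may help. It groups documents into batches and builds one XML `<add>` message per batch, using Solr's `commitWithin` attribute (in milliseconds) to let the server schedule the commit. The batch size, the 60000 ms value (chosen to mirror the default latency mentioned above), and the helper names are illustrative assumptions, not Websolr settings.

```python
import xml.etree.ElementTree as ET


def chunks(docs, size):
    """Yield successive batches of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def batch_add_message(docs, commit_within_ms=60000):
    """Build one XML <add> message for a whole batch of documents.

    `commitWithin` asks Solr to commit within the given number of
    milliseconds, so the client does not need to commit after every batch.
    """
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Example: five documents sent as batches of two.
documents = [{"id": i} for i in range(5)]
messages = [batch_add_message(batch) for batch in chunks(documents, 2)]
```

Each message in `messages` would be posted to the index's update URL as a single request, which keeps the number of commits low compared with committing after every document.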

Finally, you should use real-time updates only when you are confident that you can rebuild your entire index quickly. Because we consider this a beta feature, we cannot yet rule out the possibility of index corruption or faulty replication.