Planning Migration to Solr

After a week of testing, I concluded that we should migrate to Solr, mainly for these reasons:

  • Solr is a standalone server built on Lucene (not just a wrapper) that indexes and queries documents over HTTP: for a web application, performance is better
  • Solr has a lot of interesting features, such as caching for faster query responses, faceting, statistics, “more like this”, auto-complete, integration with jQuery and JavaScript…
  • Type control on the indexed fields: you can define very precisely the type of each field and the kind of analyzer used to parse it. This requires designing a precise schema, derived from the queries we want to run against the index
  • No merging during indexing and no duplicate ARNs: indexing is simpler, because Solr scans all the files you give it as input and adds them to the main index (before optimizing, you can tell it to wait until all searches have finished). If the index already contains a document with a given ARN, that document is overwritten. (This is a problem with QL, which is sending us documents with duplicate ARNs, but it may not be our problem, since it is a conceptual error on QL's side.) Indexing is also faster.
  • No need to hand-code the interaction with the index: the current Java code is messy and performs poorly partly for this reason, because it has to talk to the low-level index directly. With Solr, all of this is managed by the server, a high-level abstraction over Lucene.
  • Support for more complex queries
  • Security control
  • Integration with Drupal (I haven't tested the module)
  • Support for multiple cores and multiple indexes
  • Other things…
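
To make the HTTP-based indexing and querying described above more concrete, here is a minimal sketch in Java of what the two kinds of requests could look like. It only builds the payloads and does not contact a server. The field names (arn, title, type) and the base URL http://localhost:8983/solr are assumptions for illustration, not our real schema or deployment. The overwrite-on-duplicate behaviour relies on the ARN field being declared as the uniqueKey in the Solr schema, which I am assuming here. The code needs Java 10 or later because of the URLEncoder overload it uses.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds Solr request payloads as strings, sends nothing.
class SolrRequestSketch {

    // Escape the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build an XML update message (<add><doc>...</doc></add>) for one document.
    // Posting it to the /update handler adds the document; if the schema's
    // uniqueKey field (assumed here to be "arn") matches an existing document,
    // that document is replaced.
    static String addDocumentXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
              .append(escapeXml(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // Build a /select URL that runs a query and asks for facet counts on one field.
    static String facetQueryUrl(String baseUrl, String query, String facetField) {
        return baseUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&facet=true&facet.field=" + URLEncoder.encode(facetField, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical document; LinkedHashMap keeps the field order stable.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("arn", "ARN-0001");
        doc.put("title", "Planning Migration to Solr");

        System.out.println(addDocumentXml(doc));
        System.out.println(facetQueryUrl("http://localhost:8983/solr", "title:solr", "type"));
    }
}
```

In a real application the XML would be sent with an HTTP POST to the update handler, followed by a commit, and the URL would be fetched with an HTTP GET. A client library would probably replace this hand-built code, but the sketch shows how little index-handling logic remains on our side.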

Conclusion: we should move to Solr, but we will need to rewrite the application almost entirely (only the GUI can stay as it is, and even there the JSP code has to be rewritten).