Solr Integration and Enhancements
Solr has an extensive feature set
Todd Hatcher

What is Solr?
- Solr offers advanced, optimized, scalable searching capabilities
- Communicate with Solr using XML, JSON and HTTP
- Includes an HTML admin interface
- Solr is built on top of Lucene
- Rich features of Lucene can be leveraged when using Solr
- Solr is very configurable

Integration with ColdFusion
- Very little direct integration with ColdFusion
- ColdFusion communicates with Solr using HTTP
- Solr runs in its own JVM and does not share one with ColdFusion
- With the ColdFusion installation, Solr runs in a Jetty servlet container on port 8983 (http://localhost:8983/solr)
- Solr is exposed in production by default, so restrict access to that port
- Important files are located in C:\ColdFusion9\solr\multicore
- Solr offers a lot more than what is available using cfindex, cfcollection and cfsearch

Solr
- What is a core? It is like a Verity collection (a searchable data group)
- Single core (one index) vs. multicore (multiple isolated configurations/schemas/indexes using the same Solr instance)
- C:\ColdFusion9\solr\multicore\solr.xml is the central file that points to the locations of the Solr cores' configuration and data (this is what the ColdFusion Administrator reads and writes when creating and using Solr collections)
- You can put your Solr cores under your project directory and keep them in source control

[core]/conf/solrconfig.xml
- Main configuration for a Solr core
- <queryResponseWriter name="json" /> determines the format of the results
- ColdFusion uses xslt by default
- You can return JSON, XML, Python, Ruby or PHP
- Multiple query response writers can be configured; one can be set as the default and the others can be selected by passing the parameter wt=[name] (e.g. wt=json)
- cfsearch-style methods will not work if the response writer is not the one ColdFusion expects

[core]/conf/schema.xml
- Field types map custom types to the Solr/Lucene type
- The type solr.TextField allows for analyzers
- Analyzers can be run at index time or query time
- They allow for manipulation of the data (typically filtering)
- Filters are processed in the order in which they are declared
- StopFilterFactory removes common words that do not help the search results
- WordDelimiterFilterFactory splits the original word into subwords, so WiFi can add WiFi, Wi and Fi

[core]/conf/schema.xml cont.
- EnglishPorterFilterFactory reduces word variations (such as -ing forms) to their root word and adds the root to the index
- SynonymFilterFactory treats configured synonyms as the same word
- DoubleMetaphoneFilterFactory provides phonetic matching (better than Soundex, which Verity uses)
- TextSpell/TextSpellPhrase provide "did you mean" feedback
- <copyField source="fieldName" dest="d"/>: the dest field type can run different analyzers on the source field and store the result
- wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
- Adobe adds quite a bit to the file to create field types that are compatible with what was in Verity

[core]/conf/schema.xml cont.
- Similar to creating a database table
- Maps field names to types using <field />
- Gives you the ability to store additional data
- A field can be indexed (searchable)
- A field can be stored (referenced and returned with results)
- A field can be required
- <uniqueKey>[field name]</uniqueKey>
- <solrQueryParser defaultOperator="OR" />

Indexing
- Data is sent through the API: an HTTP POST to Solr as XML/JSON/binary
- Commit is an intensive task; do bulk adds first, then call commit
- <cfindex /> calls commit after each index operation (unconfirmed)
- Committing after each add would noticeably increase index time
- Efficient process: add data (queue), commit, optimize

Search Syntax
- field:term (*:* returns everything)
- A score is generated at query time; the value itself has no absolute meaning, and scores are meaningful only relative to each other within a result set
- q is the query string
- fq filters the query results based on a supplied condition
- wt is the return type of the results (xml, json, etc.)
- qt is the request handler used to process the request (default is "standard")
- fl is the list of fields to return (each field must be stored)
- You can specify the start offset and the maximum number of rows (Solr's start and rows parameters)

DisMaxRequestHandler
- Declared in solrconfig.xml
- Allows simplified searching without strict syntax
- Can be configured with default weighted parameters (which can be overridden)
- Causes the q parameter to be parsed differently

Resources
- Lucene in Action
- http://wiki.apache.org/solr/
- http://cfadminsearcher.riaforge.org/
- http://cfsolrlib.riaforge.org/ (CF Solr Lib, written by Shannon Hicks: a wrapper for Solr functionality)
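
Example: building a search request
The parameters listed under Search Syntax are ordinary HTTP query parameters, so a search can be expressed as a URL. The following is a minimal sketch in Python, not ColdFusion code from the talk. The core name "mycore", the field names and the helper build_select_url are hypothetical, and the /select path assumes a standard multicore layout.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(core_url, q, fq=None, fl=None, wt="json", start=0, rows=10):
    """Assemble a Solr select URL from the common search parameters.

    core_url is the base URL of one core (hypothetical example below).
    """
    params = [("q", q)]                      # main query, e.g. field:term
    if fq:
        params.append(("fq", fq))            # filter query
    if fl:
        params.append(("fl", ",".join(fl)))  # stored fields to return
    params += [("wt", wt), ("start", str(start)), ("rows", str(rows))]
    return core_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical core and field names, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr/mycore",
    q="title:solr",
    fq="category:docs",
    fl=["id", "title", "score"],
)

# Decode the query string again to show what Solr would receive.
received = parse_qs(urlsplit(url).query)
print(url)
print(received)
```

The sketch only builds the URL; fetching it (for example with cfhttp from ColdFusion) requires a running Solr instance.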
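
Example: bulk add, then commit
The process described under Indexing (queue the adds, commit once, then optimize) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the helper names build_add_xml and index_in_bulk, the batch size and the document fields are hypothetical. The post argument stands in for whatever sends an HTTP POST to the core's update handler, so the example runs without a Solr server.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Render a list of dicts as a Solr XML <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def index_in_bulk(docs, post, batch_size=100):
    """Send documents in batches, then commit and optimize once.

    post is any callable that delivers one XML payload (hypothetical
    stand-in for an HTTP POST to the update handler).
    """
    for i in range(0, len(docs), batch_size):
        post(build_add_xml(docs[i:i + batch_size]))
    post("<commit/>")    # one commit after all adds
    post("<optimize/>")  # optimize after the commit


# Collect the payloads in a list instead of sending them anywhere.
sent = []
docs = [{"id": n, "title": f"Document {n}"} for n in range(5)]
index_in_bulk(docs, sent.append, batch_size=2)

for payload in sent:
    print(payload)
```

The point of the design is that the commit count stays at one regardless of how many documents are added, which is the cost the Indexing slide warns about.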