SPARK Search Engine

Who am I?
Martijn Harthoorn
Programmer at Furore
Implementer of the Search Engine of SPARK

http://spark.furore.com/fhir/patient?...
The work after the question mark.

The place of Search
(Diagram labels: Storage, Spark REST Service, MongoDB, Index & Search)

Search Paradigm
A FHIR client should be easy.
A FHIR server needs to solve the complex issues.
Search has some…

Search
First there was Storage
Then there was Search

Connectathon
To test a client, you must have a tested server
To test a server, you must have a tested client
“One fool can ask more questions than seven wise men can answer”

Connectathon
“But what if you are wrong?”

History
Version 1.
- A Generics-based implementation
- On top of the FHIR data model
- Programmed per search parameter
- No meta data available yet
- No indexing
- Slow

History
Version 2.
- Data model independent
- Meta data not available, manually added
- Lucene.NET as indexer (index in Lucene, database in Mongo)
- Fast
- Standardised all parameter specifics into standard “modifiers”
- All code based on search parameter types
- Joins are client side

History
Version 3.
- Modified to store the Lucene index in Mongo
- Index storage unreliable
- Never saw light of day

History
Version 4. CURRENT
- Index storage to a dedicated Mongo collection
- Build expression tree from parameters
- Chained parameters have full functionality (modifiers, operators)
- Joins are client side

Indexing
Why indexing?

Why indexing
http://spark.furore.com/fhir/patient?provider.name:partial=Health

Indexing. HOW-TO
1. Harvest the Resource
2. Determine data type
3. Groom your data
4. Store data in Index

You DO want data de-serialized to an object with all values strongly typed.
You DON’T want to spend time analyzing and interpreting JSON and/or XML.

Indexing. 1.
Harvesting
Resource: Patient
Search parameter: family
Searches for the family name and prefix of every HumanName that is registered with a Patient.
Usage: http://spark.furore.com/fhir/patient?family=White

Indexing. 1. Harvesting
Using the Visitor pattern
Resource: Patient
Search parameter: family
Path from Meta data:
"patient.Name.Prefix"
"patient.Name.Family"
(Diagram: Patient > List<Name> > Name (HumanName) > Given, Prefix, Family, Suffix)

Indexing. 2. Determine data type
> patient (Patient)
  > Name (HumanName)
    > LastName (string)
Data type: string
Search parameter type: string
Selected indexing method:
- Single value: as string
- More values: as string array

Indexing. 2. Determine data type
> patient (Patient)
  > Gender (Coding)
    > Coding (List<Coding>)
      > Code (CodeableConcept)
Data type: Code
Search parameter type: Token
Selected indexing method:
Store each codeable concept in an array
- System (uri)
- Code (string)
- Display (string)

Indexing. 3. Groom your data
- Remove dashes, dots, slashes from dates etc.
- If you implement a like search from the left side, you might want to split names at the dash into multiple hits.

Indexing. 4. Store in the index
Field      Value
Resource   "Patient"
Local ID   patient/1
Level      0
Family     ["LaVaughn", "Robinson", "Obama"]
Given      "Michelle"
Gender     [ { System: “…”, Code: “..”, Display: “..” } , … …
* Level: The patient is not a contained resource (level 0)
* Family: In Mongo you can store an array that can be searched like a normal string.

Future
Version 5. NEXT
- All parameters based on FHIR data types?
- Joins using Mongo Map-Reduce?

Complexity
So what is the issue?

Complexity
Include & Chained parameters
- Joining over references returns multiple resource types
- Client side (not in Mongo database) joins

Complexity
Transactions
- FHIR has bulk POST
- Split between Indexing and storage

Complexity
Multiple types
Some properties do not have a fixed type.
Example: observation.value
Can be a:
- CodeableConcept
- String
- Quantity (number + unit)
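
One way to deal with a property that has no fixed type is to choose the indexing method from the runtime type of the harvested value. The sketch below is illustrative only and is not taken from the Spark code base: the classes Coding, CodeableConcept and Quantity are simplified stand-ins for the FHIR data types, and the function name index_value and the kind labels "token", "quantity" and "string" are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Coding:
    # Simplified stand-in for a FHIR Coding (hypothetical, not the real model class).
    system: str
    code: str
    display: str = ""


@dataclass
class CodeableConcept:
    # Simplified stand-in: a concept is a list of Coding entries.
    codings: list


@dataclass
class Quantity:
    # Number plus unit, as on the slide.
    number: float
    unit: str


def index_value(value):
    """Return a (kind, indexed form) pair chosen by the runtime type of the value."""
    if isinstance(value, CodeableConcept):
        # Token-style entry: one system/code/display record per coding.
        return "token", [
            {"system": c.system, "code": c.code, "display": c.display}
            for c in value.codings
        ]
    if isinstance(value, Quantity):
        return "quantity", {"number": value.number, "unit": value.unit}
    if isinstance(value, str):
        return "string", value
    # Anything else is rejected rather than silently indexed the wrong way.
    raise TypeError(f"unsupported value type: {type(value).__name__}")


if __name__ == "__main__":
    print(index_value("negative"))
    print(index_value(Quantity(6.3, "mmol/l")))
```

Because the index entry records its kind, a search on observation.value can compare the parameter against entries of the matching kind only. Whether Spark stores the kind this way is not stated in the slides.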