SPARK - Implementing search

SPARK
Search Engine
Who am I?
Martijn Harthoorn
Programmer at Furore
Implementer of the Search Engine of SPARK
http://spark.furore.com/fhir/patient?...
The work after the question mark.
The place of Search
Storage
Spark
REST Service
MongoDB
Index & Search
Search
Paradigm
FHIR client should be easy.
FHIR server needs to solve the complex issues.
Search has some…
Search
First there was Storage
Then there was Search
Connectathon
To test a client – you must have a tested server
To test a server – you must have a tested client
“One fool can ask more questions than seven wise men can answer”
Connectathon
“But what if you are wrong?”
History
Version 1.
- A generics-based implementation
- On top of the FHIR data model
- Programmed per search parameter
- No metadata available yet
- No indexing
- Slow
History
Version 2.
- Data model independent
- Metadata not available; added manually
- Lucene.NET as indexer (index in Lucene, database in Mongo)
- Fast
- Standardised all parameter specifics into standard “modifiers”
- All code based on search parameter types
- Joins are client side
History
Version 3.
- Modified to store the Lucene index in Mongo
- Index storage unreliable.
- Never saw the light of day
History
Version 4. CURRENT
- Index stored in a dedicated Mongo collection
- Builds an expression tree from the parameters
- Chained parameters have full functionality (modifiers, operators)
- Joins are client side
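As an illustration of building an expression tree from parameters, here is a minimal Python sketch. It is not Spark's actual code: the Criterion class and parse_criterion function are hypothetical names, and the sketch only handles a modifier on the last link of a chain.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import parse_qsl, urlsplit


@dataclass
class Criterion:
    """Hypothetical tree node: one link of a (possibly chained) parameter."""
    parameter: str
    modifier: Optional[str] = None
    value: Optional[str] = None
    chained: Optional["Criterion"] = None


def parse_criterion(key: str, value: str) -> Criterion:
    # Assumption: at most one modifier, attached to the last parameter name.
    path, _, modifier = key.partition(":")
    names = path.split(".")
    node = Criterion(names[-1], modifier or None, value)
    # Wrap the innermost criterion in its parents, from right to left.
    for name in reversed(names[:-1]):
        node = Criterion(name, chained=node)
    return node


def parse_query(url: str) -> List[Criterion]:
    # Everything after the question mark becomes a list of criteria.
    return [parse_criterion(k, v) for k, v in parse_qsl(urlsplit(url).query)]
```

For the URL http://spark.furore.com/fhir/patient?provider.name:partial=Health this yields a "provider" node whose chained child is "name" with modifier "partial" and value "Health".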
Indexing
Why indexing?
Why indexing
http://spark.furore.com/fhir/patient?provider.name:partial=Health
Indexing. HOW-TO
1. Harvest the Resource
2. Determine data type
3. Groom your data
4. Store data in Index
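The four steps can be sketched as a small pipeline. This is a Python illustration under assumptions, not the Spark implementation: the reduced Patient class, the function names and the dictionary used as index are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Patient:
    """Hypothetical, heavily reduced stand-in for a typed FHIR Patient."""
    id: str
    family: List[str] = field(default_factory=list)


def harvest(patient: Patient) -> List[str]:
    # Step 1: collect the raw values for the search parameter.
    return list(patient.family)


def determine_type(values: List[str]) -> str:
    # Step 2: pick the indexing method from the data type.
    return "string" if all(isinstance(v, str) for v in values) else "unknown"


def groom(values: List[str]) -> List[str]:
    # Step 3: normalise the values (here: only trim whitespace).
    return [v.strip() for v in values]


def store(index: Dict[str, dict], patient: Patient, values: List[str]) -> None:
    # Step 4: write the groomed values to the index entry of the resource.
    index.setdefault(patient.id, {})["family"] = values


def index_patient(index: Dict[str, dict], patient: Patient) -> None:
    values = harvest(patient)
    if determine_type(values) == "string":
        store(index, patient, groom(values))
```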
You DO want
data de-serialized to an object
with all values strongly typed.
You DON’T want
to spend time analyzing and interpreting
JSON and/or XML.
Indexing. 1. Harvesting
Resource: Patient
Search parameter: family
Searches for the family name and prefix of every
HumanName that is registered with a Patient.
Usage:
http://spark.furore.com/fhir/patient?family=White
Indexing. 1. Harvesting
Using the Visitor pattern
Resource: Patient
Search parameter: family
Paths from metadata:
"patient.Name.Prefix"
"patient.Name.Family"
Diagram: a Patient holds a List<Name>; each Name (HumanName) has Prefix, Given, Family and Suffix.
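A path-driven walk over the typed object can be sketched as follows. This is a simplified Python stand-in for the Visitor pattern named on the slide, not Spark's code: the reduced HumanName and Patient classes and the visit and harvest functions are hypothetical, and the first path segment is assumed to name the resource itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class HumanName:
    Prefix: List[str] = field(default_factory=list)
    Given: List[str] = field(default_factory=list)
    Family: List[str] = field(default_factory=list)
    Suffix: List[str] = field(default_factory=list)


@dataclass
class Patient:
    Name: List[HumanName] = field(default_factory=list)


def visit(node: Any, path: List[str], action: Callable[[Any], None]) -> None:
    if isinstance(node, list):
        # Fan out over every element of a repeating property.
        for item in node:
            visit(item, path, action)
    elif not path:
        # End of the path: hand the value to the visitor action.
        action(node)
    else:
        visit(getattr(node, path[0]), path[1:], action)


def harvest(resource: Any, paths: List[str]) -> List[Any]:
    found: List[Any] = []
    for path in paths:
        segments = path.split(".")[1:]  # drop the leading resource name
        visit(resource, segments, found.append)
    return found
```

Calling harvest(patient, ["patient.Name.Prefix", "patient.Name.Family"]) collects the prefixes and family names of every HumanName of the patient.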
Indexing. 2. Determine data type
> patient (Patient)
> Name (HumanName)
> LastName (string)
Data type: string
Search parameter type: string
Selected indexing method:
- Single value – as string
- More values – as string array
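The choice between a single string and a string array could look like this in Python. The function name as_index_value is hypothetical, and treating an empty harvest as an empty array is an assumption.

```python
from typing import List, Union


def as_index_value(values: List[str]) -> Union[str, List[str]]:
    # Single value: store it as a plain string.
    if len(values) == 1:
        return values[0]
    # More values (or none, by assumption): store them as a string array.
    return list(values)
```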
Indexing. 2. Determine data type
> patient (Patient)
> Gender (Coding)
> Coding (List<Coding>)
> Code (CodeableConcept)
Data type: Code
Search parameter type: Token
Selected Indexing method:
Store each codeable concept in an array:
- System (uri)
- Code (string)
- Display (string)
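A token index entry of this shape can be sketched in Python as below. The reduced Coding and CodeableConcept classes, the index_token function and the lower-case field names are hypothetical; the system URI in the usage note is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Coding:
    system: Optional[str] = None
    code: Optional[str] = None
    display: Optional[str] = None


@dataclass
class CodeableConcept:
    coding: List[Coding] = field(default_factory=list)


def index_token(concept: CodeableConcept) -> List[Dict[str, Optional[str]]]:
    # One array element per coding, keeping system, code and display.
    return [
        {"system": c.system, "code": c.code, "display": c.display}
        for c in concept.coding
    ]
```

For example, index_token(CodeableConcept([Coding("http://example.org/gender", "F", "Female")])) returns a one-element array with those three fields.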
Indexing. 3. Groom your data
- Remove dashes, dots and slashes from dates, etc.
- If you implement a like search from the left side, you might want
to split names at the dash into multiple hits.
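Both grooming rules can be sketched in a few lines of Python. The function names are hypothetical, and the case-insensitive prefix match is an assumption about what a "like search from the left side" means here.

```python
import re
from typing import List


def groom_date(value: str) -> str:
    # Remove dashes, dots and slashes so only the digits remain.
    return re.sub(r"[-./]", "", value)


def groom_name(value: str) -> List[str]:
    # Split a double-barrelled name at the dash into separate hits.
    return [part for part in value.split("-") if part]


def matches_from_left(hits: List[str], query: str) -> bool:
    # Assumed semantics: case-insensitive match on the start of any hit.
    return any(hit.lower().startswith(query.lower()) for hit in hits)
```

Without the split, a search for "Jon" would miss "Smith-Jones"; with it, the second part becomes its own hit.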
Indexing. 4. Store in the index
Field      Value
Resource   "Patient"
Local ID   patient/1
Level      0
Family     ["LaVaughn", "Robinson", "Obama"]
Given      "Michelle"
Gender     [ { System: “…”, Code: “..”, Display: “..” } , …
…
* Level: the patient is not a contained resource (level 0).
* Family: in Mongo you can store an array that can be searched like a normal string.
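The index entry and the array behaviour can be illustrated without a database. The Python sketch below imitates MongoDB's equality match, where a query value matches a field that either equals it or is an array containing it. The lower-case field names and the matches helper are hypothetical, and the Gender field is left out because its values are elided on the slide.

```python
from typing import Any, Dict

# Index entry for the example patient (field names are assumptions).
index_entry: Dict[str, Any] = {
    "resource": "Patient",
    "localid": "patient/1",
    "level": 0,
    "family": ["LaVaughn", "Robinson", "Obama"],
    "given": "Michelle",
}


def matches(entry: Dict[str, Any], field: str, value: Any) -> bool:
    stored = entry.get(field)
    if isinstance(stored, list):
        # Array field: a hit on any element counts as a match.
        return value in stored
    # Scalar field: plain equality.
    return stored == value
```

So matches(index_entry, "family", "Robinson") and matches(index_entry, "given", "Michelle") both succeed with the same query shape.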
Future
Version 5. NEXT
- All parameters based on FHIR data types?
- Joins using Mongo Map-Reduce?
Complexity
So what is the issue?
Complexity
Include & Chained parameters
- Joining over references returns multiple resource types
- Client side (not in Mongo database) joins
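A client-side join for a chained parameter such as provider.name:partial=Health can be sketched as two passes over the index. This Python example is an assumption-laden illustration, not Spark's code: the index entries are plain dictionaries with hypothetical field names, and ":partial" is assumed to mean a case-insensitive substring match.

```python
from typing import Dict, List


def partial_match(stored: str, text: str) -> bool:
    # Assumed meaning of the ":partial" modifier.
    return text.lower() in stored.lower()


def patients_by_provider_name(
    patient_index: List[Dict[str, str]],
    organization_index: List[Dict[str, str]],
    text: str,
) -> List[str]:
    # Pass 1: resolve the inner criterion to a set of organization ids.
    provider_ids = {
        org["localid"]
        for org in organization_index
        if partial_match(org.get("name", ""), text)
    }
    # Pass 2: keep the patients whose provider reference is in that set.
    return [
        patient["localid"]
        for patient in patient_index
        if patient.get("provider") in provider_ids
    ]
```

The join happens in application code: the database only answers two simple lookups, and the set of ids is carried from the first to the second.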
Complexity
Transactions
- FHIR has bulk POST
- Split between Indexing and storage
Complexity
Multiple types
Some properties do not have a fixed type.
Example: observation.value
Can be a:
- CodeableConcept
- String
- Quantity (number + unit)
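One way to cope with a property that has no fixed type is to dispatch on the runtime type when indexing. The Python sketch below assumes reduced, hypothetical Quantity and CodeableConcept classes and a hypothetical index_value function; the output shape is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Quantity:
    value: float
    unit: str


@dataclass
class CodeableConcept:
    codes: List[str] = field(default_factory=list)


ObservationValue = Union[CodeableConcept, str, Quantity]


def index_value(value: ObservationValue) -> Dict[str, Any]:
    # Choose the indexing method from the actual type of the value.
    if isinstance(value, str):
        return {"type": "string", "value": value}
    if isinstance(value, Quantity):
        return {"type": "quantity", "value": value.value, "unit": value.unit}
    if isinstance(value, CodeableConcept):
        return {"type": "token", "value": list(value.codes)}
    raise TypeError(f"unsupported value type: {type(value).__name__}")
```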