Correct and Consumable Answers to
Complex Questions
Beverly Jamison, PhD, Sr. Director, IT Architecture
American Psychological Association
April 11, 2013
Agenda
Overview of the Search Product
How simple features cause complex queries
Search Architecture I (it seemed like a good idea at the time)
Search Architecture II: Making it Correct
Users 2.0: Making it Consumable
Looking Ahead: Making it Cool
Copyright © 2013 MarkLogic® Corporation. All rights reserved.
APA PsycNET Content Types
PsycINFO Database: Similar to MEDLINE
  • 3.5M records
  • 60M Cited References
500,000 Full Text items, including:
  • Journal Articles
  • Book Chapters
  • Psychology-based critiques of books and films
  • “Gray Literature”
13,000 Psychological Tests and Measures
400 Streaming video psychotherapy demonstrations
Thesaurus of Psychological Index Terms
PsycNET delivery platform, powered by MarkLogic
PsycNET Search Results
‘Autism’
Thesaurus Selection
Why MarkLogic Search Makes Us Smile
Takes a layer off the architecture for easier maintenance
Performance tunes like a dream
Access to full content, not just an index
Ability to provide aggregated information from range indexes
Smooth data delivery since ML is our content repository
Unification of technologies as we move other searches to MarkLogic
Allows us more options for how results are consumed:
Human Readable: Facets, charts, tables, content snippets
Machine to Machine: API, RDF, XML, Feeds
But Complications Can Still Happen
We expect complication in queries like this:
Keywords: media OR "computer gam*" OR films OR movie* OR internet OR
magazine* OR books OR multimedia OR music OR newspapers OR
"social network*" OR photograph* OR radio OR "role playing gam*" OR
"massively multiplayer" OR televis* OR TV OR websites
AND NOT Document Type: Dissertation AND Year: 2000 TO 2013
But even simple queries can hold surprises:
The thesaurus “shopping cart” yields queries such as:
IndexTerms: (Depression OR Abandonment)
Or relatively innocuous looking fields:
Keywords = IndexTerms + Keywords + Title
Anyfield = Searchable portion of the content
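As a rough illustration of why a field like Keywords is less innocuous than it looks, the sketch below expands one user-facing field name into one clause per underlying element. It is a toy model in Python, not the PsycNET or MarkLogic implementation; FIELD_MAP, expand_field and to_query are hypothetical names, and the mapping is copied from the "Keywords = IndexTerms + Keywords + Title" line above.

```python
from typing import Dict, List

# Hypothetical mapping, taken from "Keywords = IndexTerms + Keywords + Title".
FIELD_MAP: Dict[str, List[str]] = {
    "Keywords": ["IndexTerms", "Keywords", "Title"],
}


def expand_field(field: str, term: str) -> List[str]:
    """Return one element-level clause per element behind the field name."""
    # Fields with no mapping are assumed to correspond to a single element.
    elements = FIELD_MAP.get(field, [field])
    return [f"{element}:{term}" for element in elements]


def to_query(field: str, term: str) -> str:
    """Join the element-level clauses with OR: a hit in any element counts."""
    return " OR ".join(expand_field(field, term))


print(to_query("Keywords", "autism"))
# IndexTerms:autism OR Keywords:autism OR Title:autism
```

A one-word search against Keywords therefore already becomes a three-way OR before any user-supplied Boolean operators are considered.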
So What Could Go Wrong?
Constraint Plus Simple (non-nested) Boolean:
IndexTerms: (Depression OR Abandonment)
IndexTerms is looked up as a constraint
All the nodes from the parse tree (in this case
Depression and Abandonment) are concatenated in document
order and the constraint is applied to the result, so the
OR is dropped and we end up with
IndexTerms: (Depression AND Abandonment)
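The effect described above can be reproduced with a toy model. The Python sketch below is only an illustration under stated assumptions: the parse tree classes, the RECORDS sample data and both matching functions are hypothetical and do not reflect MarkLogic internals. It compares matching on the flattened text nodes (every term must be present) with evaluating the Boolean operators in the tree.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union


@dataclass
class Term:
    text: str


@dataclass
class BoolOp:
    op: str  # "AND" or "OR"
    children: List["Node"]


Node = Union[Term, BoolOp]


def text_nodes(node: Node) -> List[str]:
    """Collect the term text in document order, discarding the operators."""
    if isinstance(node, Term):
        return [node.text]
    collected: List[str] = []
    for child in node.children:
        collected.extend(text_nodes(child))
    return collected


def flattened_match(index_terms: Set[str], tree: Node) -> bool:
    """Failure mode: the concatenated terms must all be present (an AND)."""
    return all(term in index_terms for term in text_nodes(tree))


def boolean_match(index_terms: Set[str], node: Node) -> bool:
    """Intended behaviour: honour the AND/OR operators in the tree."""
    if isinstance(node, Term):
        return node.text in index_terms
    results = [boolean_match(index_terms, child) for child in node.children]
    return any(results) if node.op == "OR" else all(results)


# Hypothetical records, keyed by id, with their IndexTerms values.
RECORDS: Dict[str, Set[str]] = {
    "r1": {"Depression"},
    "r2": {"Abandonment"},
    "r3": {"Depression", "Abandonment"},
    "r4": {"Anxiety"},
}

tree = BoolOp("OR", [Term("Depression"), Term("Abandonment")])
intended = sorted(r for r, terms in RECORDS.items() if boolean_match(terms, tree))
flattened = sorted(r for r, terms in RECORDS.items() if flattened_match(terms, tree))
print(intended)   # ['r1', 'r2', 'r3']
print(flattened)  # ['r3']
```

In this model the flattened version silently returns a subset of the intended results, which is the kind of error that is easy to miss because the results still look plausible.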
MarkLogic Field Plus Range Index:
The Field mechanism is very helpful when we have one
external field name for multiple elements
The gotcha is that this mechanism is associated with a
Word Index, and we are increasingly attached to our
Range Index matching
Our “light-MVC” search architecture
The Basic Search Flow
The Solution Approach:
A Custom Joiner Handler
Ingredients
Application-defined constraints
Application-defined field-mappings
Query parse trees
Default Implementation:
MLQP parses, then calls impl:textonly to extract text nodes
Custom Implementation:
Call through to cc:apply-constraint for each node of the parse tree
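The custom approach can be sketched as a recursive walk that applies the constraint to each node of the parse tree instead of to the extracted text. The Python below is a minimal analogy, not the actual handler: Term, BoolOp and apply_constraint are hypothetical stand-ins, and the string output only illustrates the shape of the resulting query.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    text: str


@dataclass
class BoolOp:
    op: str  # "AND" or "OR"
    children: List["Node"]


Node = Union[Term, BoolOp]


def apply_constraint(constraint: str, node: Node) -> str:
    """Apply the constraint at every leaf and keep the operators in place."""
    if isinstance(node, Term):
        return f"{constraint}:{node.text}"
    parts = [apply_constraint(constraint, child) for child in node.children]
    return "(" + f" {node.op} ".join(parts) + ")"


tree = BoolOp("OR", [Term("Depression"), Term("Abandonment")])
print(apply_constraint("IndexTerms", tree))
# (IndexTerms:Depression OR IndexTerms:Abandonment)
```

Because the operators are never discarded, nested expressions keep their structure as well, at the cost of one constraint lookup per leaf rather than one per query.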
What We Are Excited About
• We wanted to keep search:search at the core
• We learned the hard way with Lucene to not mess up our
upgrade path
• We wanted to take advantage of all of the new things
MarkLogic would do
• We are excited about the increased access to the parse tree
from published interfaces
• These are convenient for us and they could be cool for new
ways to interact with advanced users
Any Questions?
For More Information
Beverly Jamison
bjamison@apa.org