Daniel Brown
HON111
• Watson must be fast to play Jeopardy successfully
– All computations have to be done in a few seconds
– Initial application speed: 1-2 hours processing time per question
• Unstructured Information Management Architecture (UIMA): framework for NLP applications; facilitates parallel processing
– UIMA-AS: Asynchronous Scaleout
• UIMA chosen at start for these reasons; other optimization work only began after 2 years (after QA accuracy/confidence improved)
• Type System
• Common Analysis Structure (CAS)
• Annotator
– CAS multiplier (CM): creates new “children” CASes
• Flow Controller
• CASes can be spread across multiple systems (processed in parallel) for efficiency
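A rough illustration of how the pieces above fit together, written in Python rather than against the real UIMA Java API. Everything here is a hypothetical stand-in: the dict-based CAS, the single "Token" type, and the function names are assumptions made only for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a CAS: a dict holding the text plus its annotations.
# This is NOT the UIMA API; all names are hypothetical.
def new_cas(text):
    return {"text": text, "annotations": []}

def token_annotator(cas):
    """Annotator: adds one ("Token", word) annotation per whitespace-separated word."""
    cas["annotations"].extend(("Token", tok) for tok in cas["text"].split())
    return cas

def sentence_multiplier(cas):
    """CAS multiplier: creates one child CAS per sentence of the parent."""
    return [new_cas(s.strip()) for s in cas["text"].split(".") if s.strip()]

def run_flow(parent_cas, workers=4):
    """Flow controller: multiply first, then annotate the children in parallel."""
    children = sentence_multiplier(parent_cas)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(token_annotator, children))

results = run_flow(new_cas("Watson plays Jeopardy. UIMA scales out."))
for cas in results:
    print(cas["text"], cas["annotations"])
```

Because each child CAS is independent, the annotator can run on the children concurrently; in this sketch a thread pool stands in for the separate machines.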
• Two systems:
– Development (+question processing)
• Meant to analyze many questions accurately
– Production (+speed)
• Meant to answer one question quickly
• (UIMA-AS: Asynchronous Scaleout)
– Manages the multithreading and inter-process communication needed for parallel processing
• Feasibility test: simulated production system with 110 processes, 110 8-core machines
– Goal: less than 3 seconds; actual: more than 3 seconds
– Two sources of latency: CAS serialization, network communication
– Optimizing CAS serialization resulted in runtime of <1s
• 400 processes, 72 machines
• How to find time bottlenecks in such a system?
– Monitoring tool
– Integrated timing measurements (in flow controller component)
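One way to picture integrated timing measurements is a controller that times every stage it dispatches and then reports the slowest one. This is a minimal sketch under assumptions: the `TimedFlow` class and the two sample stages are hypothetical and are not taken from Watson or UIMA.

```python
import time
from collections import defaultdict

class TimedFlow:
    """Runs named stages in order and accumulates wall-clock time per stage."""

    def __init__(self, stages):
        self.stages = stages               # list of (name, function) pairs
        self.elapsed = defaultdict(float)  # stage name -> total seconds

    def process(self, item):
        for name, stage in self.stages:
            start = time.perf_counter()
            item = stage(item)
            self.elapsed[name] += time.perf_counter() - start
        return item

    def bottleneck(self):
        """Name of the stage with the largest accumulated time."""
        return max(self.elapsed, key=self.elapsed.get)

def fast_stage(x):
    return x + 1

def slow_stage(x):
    time.sleep(0.05)  # simulated expensive step
    return x * 2

flow = TimedFlow([("fast", fast_stage), ("slow", slow_stage)])
outputs = [flow.process(i) for i in range(3)]
print(outputs, flow.bottleneck())
```

Putting the timers in the component that routes work means every stage is measured the same way, without editing each stage.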
• Wanted to avoid disk read/write time delays, so all (production system) data was put into RAM
• Some optimizations:
– Reference size reduction
– Java object size reduction
– Java object overhead
– String size
– Special hash tables
– Java garbage collection with large heap sizes
• Full GC between games
• Indri search: used to find most relevant 1-2 sentences from Watson database
• Using a single processor, primary search takes too long (i.e. 100 s)
– Supporting evidence search even longer
• Solution?
– Divide corpus (body of information to search) into chunks, then assign each search daemon a chunk
– (specifically: 50 GB corpus of 6.8 million documents; 79 chunks of roughly 100,000 documents each; 79 Indri search daemons with 8 CPU cores each; end result: 32 passage queries could be run at once)
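The divide-and-search idea can be sketched as follows. This does not use Indri: the term-overlap scoring, the four-document corpus, and all function names are assumptions chosen to keep the example small. Each chunk is searched independently, and the per-chunk top hits are merged into one ranking.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def make_chunks(docs, chunk_size):
    """Split the corpus into consecutive chunks of at most chunk_size documents."""
    return [docs[i:i + chunk_size] for i in range(0, len(docs), chunk_size)]

def search_chunk(chunk, query_terms, k):
    """Stand-in for one search daemon: score = number of query terms in the doc."""
    scored = []
    for doc_id, text in chunk:
        words = set(text.lower().split())
        score = sum(1 for term in query_terms if term in words)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)

def search(chunks, query, k=2):
    """Query every chunk in parallel, then merge the partial top-k lists."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partial = pool.map(lambda c: search_chunk(c, terms, k), chunks)
        merged = [hit for hits in partial for hit in hits]
    return heapq.nlargest(k, merged)

corpus = [
    (0, "Indri is a search engine"),
    (1, "Watson plays Jeopardy"),
    (2, "Hadoop runs batch jobs"),
    (3, "Watson uses Indri search"),
]
chunks = make_chunks(corpus, chunk_size=2)
top = search(chunks, "Watson Indri search")
print(top)  # (score, doc_id) pairs, best first
```

A check on the figures in the notes: 6.8 million documents spread over 79 chunks averages about 86,000 documents per chunk, so the per-chunk count is best read as approximate.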
• Watson must first analyze the passage texts before being able to use them
– Deep NLP analysis - semantic/structural parsing, etc.
• Since Watson had to be self-contained (fixed corpus, no Internet access during play), this analysis could be done before run time (preprocessed)
– Used Hadoop (open-source framework for distributed storage and batch processing)
– 50 machines, 16GB/8 cores each
• Retrieving the preprocessed data?
– Preprocessed data much larger than unprocessed corpus (~300GB total)
– Built custom content server – allocated data to 14 machines, ~20GB each
– Documents then were accessed from these servers
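The notes do not say how documents were assigned to the 14 content servers. The sketch below assumes a simple stable hash of the document ID; that rule, and every name in the code, is a hypothetical choice for illustration, not the method Watson used. As a sanity check on the numbers, ~300 GB over 14 machines is about 21 GB each, which matches the ~20 GB figure.

```python
import zlib

NUM_SERVERS = 14  # from the notes; the assignment rule below is an assumption

def server_for(doc_id):
    """Map a document ID to a server index using a stable CRC32 hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SERVERS

class ContentServers:
    """In-memory stand-in for a set of content servers."""

    def __init__(self):
        self.stores = [dict() for _ in range(NUM_SERVERS)]

    def put(self, doc_id, analysis):
        self.stores[server_for(doc_id)][doc_id] = analysis

    def get(self, doc_id):
        return self.stores[server_for(doc_id)].get(doc_id)

servers = ContentServers()
for i in range(100):
    servers.put(f"doc-{i}", {"parse": f"preprocessed analysis of doc-{i}"})

print(servers.get("doc-42"))
print(round(300 / NUM_SERVERS, 1), "GB per server on average")
```

Because the same ID always hashes to the same server, a lookup contacts exactly one machine and no central index is needed.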
• Parallel processing combined with a number of other performance optimizations resulted in a final average latency of less than 3 seconds.
– No one “silver bullet” solution