SCORE-overview

SCORE
Voquette Company Confidential
Presentation Overview
• Industry Requirements
• Capabilities
• System Architecture and Technologies
• Examples and Scenarios
• Measures (Quality, Performance, Scalability,
Robustness)
• Deployment Information
• Questions & Answers: What if
• Business Development Issues
• Milestones and Schedules
Intelligence Content
Management Challenges
1. The Problem: massive, disparate information
• Multiple isolated sources of intelligence information (FBI, CIA, etc.) that are not shared or integrated
• Large variety (format, media) of open source, partner, FAA and IC information
2. The Difficulty: inability to have timely actionable info
• Amount of data too overwhelming to use constructively
• Manual methods of aggregating data not scalable
=> Lack of a “complete picture” to make decisions
• Inability to make timely, accurate and actionable conclusions based on information at hand
3. The Solution: Voquette’s Semantic Technology
• Technology to analyze and integrate data from disparate sources to provide a near real-time, reliable, scalable and actionable solution for intelligence and security applications
New Technical Challenges in
Enterprise Content Management
1. Aggregation
• Feed handlers/Agents that understand content representation and media semantics
• Push-pull, Web-DB-Files, Structured-Semi-structured-Unstructured data of different types from proprietary, partner and open source
2. Homogenization and Enhancement
• Enterprise-wide common and customizable view (information organization)
• Domain model, taxonomy/classification, metadata standards
• Semantic Metadata, created automatically if possible
• Semantic associations/inferences (link analysis)
3. Semantic Applications (in near real-time)
• Search, personalization, alerts, knowledge browsing/inference for improved relevance, intelligent personalization, customization
Voquette’s Unique Capabilities
• Semantics (understanding of content and user
needs)
• Extreme relevance
• Knowledge inferencing (semantic associations)
• Near real-time
• Multiple applications/usage patterns (not just search)
• Automation
• Scalability in all aspects
Voquette Semantic Technology
System Architecture
[Figure: system architecture diagram. Component callouts:]
• Fast main-memory based query engine with APIs and XML output
• Toolkit to design and maintain the Knowledgebase
• Distributed agents that automatically extract/mine knowledge from trusted sources
• Distributed agents that automatically extract relevant semantic metadata from structured and unstructured content
• Knowledgebase represents the real-world instantiation (entities and relationships) of the WorldModel
• CACS provides automatic classification (w.r.t. WorldModel) from unstructured text and extracts contextually relevant metadata
• WorldModel specifies enterprise’s normalized view of information (ontology)
Workflow Process
• WorldModel™ (Domain Model),
Taxonomy/Classification, Knowledge base schema
• Classifiers
• Knowledge and Content Extraction Agents
• Automated or human-supervised run-time
(for classification and metadata enhancement,
knowledge base maintenance)
• Semantic Applications
All components support incremental extensions.
Technological Innovation
• Semantic approach (classification/taxonomy, domain model,
entities and relationships) [All components]
• Semantic associations/ knowledge inferences
• Classification committee (multiple technologies, rather than one
size fits all) [CACS]
• Scalability throughout with distributed architecture and
implementation (number of content and knowledge sources,
indexing, etc.)
• Main memory implementation, incremental check pointing [SSE]
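The "classification committee" idea above can be illustrated with a minimal sketch in which several independent classifiers vote and a label is accepted only when enough of them agree. The three member classifiers, their rules, and the `min_votes` threshold are hypothetical stand-ins for illustration, not CACS's actual techniques.

```python
from collections import Counter
from typing import Callable, List, Optional

Classifier = Callable[[str], Optional[str]]


def keyword_classifier(text: str) -> Optional[str]:
    """Toy knowledge-based member: looks for known terms."""
    lowered = text.lower()
    if "earnings" in lowered or "ticker" in lowered:
        return "Company News"
    if "tournament" in lowered or "golf" in lowered:
        return "Sports"
    return None


def length_classifier(text: str) -> Optional[str]:
    """Toy stand-in for a statistical member."""
    return "Company News" if len(text.split()) > 5 else None


def rule_classifier(text: str) -> Optional[str]:
    """Toy stand-in for a rule-based member."""
    return "Sports" if text.lower().startswith("golf") else None


MEMBERS: List[Classifier] = [keyword_classifier, length_classifier, rule_classifier]


def committee_classify(text: str, members: List[Classifier], min_votes: int = 2) -> Optional[str]:
    """Return the majority label, or None if no label gets min_votes votes."""
    labels = [member(text) for member in members]
    votes = Counter(label for label in labels if label is not None)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else None
```

Returning `None` when the committee disagrees leaves room for the human-supervised run-time mentioned earlier to handle uncertain documents.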
Example:
Domain: Intelligence
Sub-domain: People, Org, Places
(Other Sub-domains: Financing, Methods & Training, Materials)
Intelligence WorldModel™
What is it?
WorldModel™: Template infrastructure to organize and index content contextually
What does it consist of?
Domains (categories) and domain-specific attributes, with geo-spatial and temporal info
Setting up a Terrorist Intelligence WorldModel™
What are the information pieces of possible interest (that can be modeled as WorldModel™ attributes)?
• Groups: Nationalist, Terrorist, Political groups
• Person: Terrorist, Suicide Bomber, Hijacker, Personality
• Event: Flight hijacking, WTC Crash, Kidnapping, Terrorist training
• Bank: Swiss bank, Belgian bank (where groups have accts)
• Attack Material: Knives, Plastic Explosives, RDX, AK47 Gun
• Name Alias: Aliases of terrorists (Osama BL = Usama BL)
• Alias Email Addresses: Email addresses for alias names
• Location: Location related with event of interest
• Time: Date/time related to event of interest
[Figure: Terrorism Intelligence WorldModel™ (simplified), with attributes Group, Person, Event, Bank, Attack Material, Name Alias, Alias Email Address, Location, Time]
Intelligence Extractor Agents
What is it?
Extractor Agents: Intelligent software robots that work on structured content and automatically
extract metadata information that is relevant and meaningful to the domain/sub-domain at hand
How do they work?
• Intelligence extractor agents use the Intelligence WorldModel™ definition for meaningful
metadata extraction from trusted Intelligence content
• Extractor agents exploit the structure of Intelligence content and automatically “pick up”
meaningful Intelligence metadata information (as defined in the WorldModel™)
[Figure: Extractor Agent for CIA Confidential Content, driven by the Terrorism Intelligence WorldModel™ (Group, Person, Event, Bank, Attack Material, Name Alias, Alias Email Address, Location, Time). Metadata extracted:]
• Pick up syntax metadata
• Pick up group name
• Pick up person
• Pick up attack material
• Pick up bank name
• Pick up location/date/time
• Pick up name aliases
Intelligence Knowledge Base
What is it?
Knowledge Base: Network of Intelligence objects (significant pieces of information) and
a representation of the real-world relationships (associations) between them
• Group originated in Country (“Al Qaeda” originated in “Afghanistan”)
• Group accounts in Bank (“Al Qaeda” accounts in “Swiss bank”)
• Group works with Group (“Irish IRA” works with “Colombian Group”)
• Person works for Group (“Nabil Almarabh” works for “Al Qaeda”)
• Person has alias Alias (“Bin Laden” has alias “Mohammed”)
• Alias has email EmailAdd (“Mohammed” has email “mohd@un.com”)
• Person leads Group (“Bin Laden” leads “Al Qaeda”)
• Person involved in Event (“Bin Laden” involved in “WTC Crash”)
• Event occurred at Location (“WTC Crash” occurred at “New York, USA”)
• Event occurred at Time (“WTC Crash” occurred at “0903, 9/11/01”)
[Figure: Intelligence Knowledge Base Definition linking the entity types Group, Person, Alias, EmailAdd, Bank, Country, Event, Location and Time, derived from the Terrorism Intelligence WorldModel™ attributes (Group, Person, Event, Bank, Attack Material, Name Alias, Alias Email Address, Location, Time)]
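One way to picture such a knowledge base is as a set of typed entities plus relationships that are checked against a schema of allowed (subject type, relation, object type) templates. The sketch below is an illustrative assumption, not Voquette's implementation; the `KnowledgeBase` class and its method names are hypothetical, and the sample facts are the slide's own examples.

```python
from collections import defaultdict


class KnowledgeBase:
    """Entities with types, plus relationships validated against a schema."""

    def __init__(self, schema):
        self.schema = set(schema)            # allowed (subject_type, relation, object_type)
        self.types = {}                      # entity name -> entity type
        self.out_edges = defaultdict(list)   # subject -> [(relation, object), ...]

    def add_entity(self, name, entity_type):
        self.types[name] = entity_type

    def relate(self, subject, relation, obj):
        key = (self.types[subject], relation, self.types[obj])
        if key not in self.schema:
            raise ValueError(f"relationship not allowed by schema: {key}")
        self.out_edges[subject].append((relation, obj))

    def related(self, subject, relation):
        return [o for r, o in self.out_edges[subject] if r == relation]


SCHEMA = [
    ("Group", "originated in", "Country"),
    ("Person", "leads", "Group"),
    ("Person", "has alias", "Alias"),
]

kb = KnowledgeBase(SCHEMA)
kb.add_entity("Bin Laden", "Person")
kb.add_entity("Mohammed", "Alias")
kb.add_entity("Al Qaeda", "Group")
kb.add_entity("Afghanistan", "Country")
kb.relate("Bin Laden", "leads", "Al Qaeda")
kb.relate("Bin Laden", "has alias", "Mohammed")
kb.relate("Al Qaeda", "originated in", "Afghanistan")
```

Validating each relationship against the type templates is what lets instance-level links be created automatically without admitting nonsensical ones (for example, a Country leading a Group).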
Categorization and
Auto-Cataloging System (CACS)
What is it?
CACS: Module that categorizes content and automatically creates metadata of content
How does it work?
Uses a hybrid of statistical, machine learning and Intelligence knowledge-base techniques
Application in Intelligence
CACS could be trained to intelligently process Intelligence content to classify the content piece
as a terrorism-related event (WTC Crash, Flight hijacking, etc.)
[Figure: CACS takes structured or unstructured Intelligence content as input and exchanges information with the Intelligence Knowledge Base Definition (EmailAdd, Bank, Alias, Group, Person, Country, Event, Location, Time) for metadata creation.]
Example output
Event: Pentagon Attack
Metadata extracted:
Affiliation Country: Afghanistan
Terrorist Group: Al Qaeda
Allied Group: Saudi Misaal
Person: Bin Laden
Person Alias: Mohammed
Location: Washington, USA
Time: 0918 hrs
Intelligence Semantic Engine
What is it?
Semantic Engine: Fast main memory-based front end query engine that enables the end-user
to retrieve highly relevant and personalized content via custom APIs
Features and Functionality
• Minimal input from security agent – system intelligent enough to provide all possible relevant
content to security agent (type in “Bin Laden” and get all relevant information on him and
other items related to him)
• Applications: Search, personalization, alerts, notifications, directory
[Figure: a user query is submitted to the Semantic Engine, which draws on Content Enhancement Technology and Intelligent Inference and returns highly relevant content to the Confidential Agent through Search, Personalization, Directory, Alerts/Notifications, Analyst WorkBench and Custom Apps.]
Scenario 1:
Intelligent Analysis of Confidential Email
Scenario 1:
Intelligent Analysis of Email (Contd.)
• Information underlined in blue marks important metadata elements automatically picked up by the Intelligence extractor agents
• Information shown in red boxes marks names of terrorists (stored in our Knowledge Base) that are also automatically picked up by the Intelligence extractor agents
• CACS can determine by content analysis that this is “Terrorist Meeting” information
• Intelligent inferencing is possible due to semantic associations of the Knowledge Base
Picked up off explicit mention in email: “Mohamed Atta met with Abdulaziz Alomari”
Voquette Knowledge Associations (as drawn in the diagram):
• Mohamed Atta works for Al Qaeda; Al Qaeda originated in Afghanistan
• Abdulaziz Alomari works for Saudi Misaal; Saudi Misaal originated in Saudi Arabia
Inference: Al Qaeda and Saudi Misaal have possibly started working together as allied groups
Inference: Afghanistan and Saudi Arabia have groups that probably collaborate - look for other relationships
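The two-step inference in this scenario can be sketched as a pair of small functions: one lifts a person-to-person meeting to a possible group-to-group link, and one lifts that to the countries where the groups originated. The function names and relation labels are hypothetical, and the associations are the slide's illustrative example data, not a description of the actual inference engine.

```python
def infer_group_links(met_with, works_for):
    """If two people who met work for different groups, suggest a group link."""
    hypotheses = []
    for person_a, person_b in met_with:
        group_a = works_for.get(person_a)
        group_b = works_for.get(person_b)
        if group_a and group_b and group_a != group_b:
            hypotheses.append((group_a, "possibly works with", group_b))
    return hypotheses


def lift_to_countries(group_links, originated_in):
    """Carry each hypothesized group link up to the groups' countries of origin."""
    hypotheses = []
    for group_a, _, group_b in group_links:
        country_a = originated_in.get(group_a)
        country_b = originated_in.get(group_b)
        if country_a and country_b and country_a != country_b:
            hypotheses.append((country_a, "probably has collaborating groups with", country_b))
    return hypotheses


# Explicit mention picked up from the email
MET_WITH = [("Mohamed Atta", "Abdulaziz Alomari")]
# Associations already held in the knowledge base (slide's illustrative data)
WORKS_FOR = {"Mohamed Atta": "Al Qaeda", "Abdulaziz Alomari": "Saudi Misaal"}
ORIGINATED_IN = {"Al Qaeda": "Afghanistan", "Saudi Misaal": "Saudi Arabia"}

group_links = infer_group_links(MET_WITH, WORKS_FOR)
country_links = lift_to_countries(group_links, ORIGINATED_IN)
```

The outputs are hypotheses rather than facts, which matches the slide's hedged wording ("possibly", "probably") and the human-assisted review step.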
Scenario 2: Analyst Workbench
• Voquette’s Semantic Technology enables highly relevant and comprehensive
terrorist research
• Example: A security agent wishes to perform research on “Bin Laden” (as he is the prime suspect)
• News/Information directly about Bin Laden is retrieved (that mentions his name explicitly)
• News/Information on Al Qaeda is retrieved (Bin Laden ↔ Al Qaeda association in KB)
• News/Information on WTC Crash is retrieved (WTC Crash ↔ Bin Laden association in KB)
• News/Information on Mohammed is retrieved (Mohammed ↔ Bin Laden ‘alias assoc.’ in KB)
• News/Information (intelligence) on Afghanistan is retrieved (Al Qaeda ↔ Afghanistan in KB)
• News/Information (intelligence) on Swiss bank is retrieved (Al Qaeda ↔ Swiss bank in KB)
• Combined, this correlated information is extremely valuable in bringing together multiple actionable perspectives and points of view on one screen
• Result: Less time spent, faster and much better decision making, more security!
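The retrieval pattern in this scenario amounts to expanding the analyst's query term through knowledge-base associations before searching. A minimal sketch, assuming the associations are held as a simple adjacency map and expansion stops after two hops (both assumptions for illustration; `expand_query` is a hypothetical name):

```python
from collections import deque


def expand_query(term, associations, max_hops=2):
    """Breadth-first walk over KB associations, returning terms in discovery order."""
    seen = [term]
    queue = deque([(term, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in associations.get(node, []):
            if neighbor not in seen:
                seen.append(neighbor)
                queue.append((neighbor, hops + 1))
    return seen


# Associations taken from the scenario's bullet list
ASSOCIATIONS = {
    "Bin Laden": ["Al Qaeda", "WTC Crash", "Mohammed"],
    "Al Qaeda": ["Afghanistan", "Swiss bank"],
}

terms = expand_query("Bin Laden", ASSOCIATIONS)
```

Each expanded term would then be run against the content index, so that content never mentioning the original name can still be returned. Limiting the hop count keeps the result set from drifting too far from the analyst's intent.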
Knowledge Inferencing Workflow
[Figure: workflow from Syntax Metadata to Semantic Metadata, with knowledge-base associations (“same entity”, “led by”) feeding human-assisted inference]
Analyst Usage Scenarios/Interfaces
for Knowledge Inference
Analysts can possibly use:
• Search
• Knowledge Base Browser / Directory
• Personalization/Alerts
• APIs for custom applications
All options support Reference Pages, Semantic Associations,
Knowledge-based browsing
Intelligence Analyst Browsing
Scenario
Core Competencies of
Voquette’s Semantic Technology
Content Aggregation, Integration and Normalization
• Create a Customized WorldModel™ (domain model with customized domain attributes)
• Content Aggregation and integration from multiple sources, formats and media (text/audio/video)
• Support push or pull delivery/ingestion of content
• Patented extractor agent technology
• Metadata extraction from structured, semi-structured and unstructured text (fully automated)
• Automatically homogenize content feed tags (fully automated)
Categorization and Auto-Cataloging
• Automatically categorize structured and unstructured text
• Create contextually relevant semantic metadata from unstructured text (fully automated)
• Uniquely uses a hybrid of statistical, machine learning and knowledge-base techniques for classification
Core Competencies of
Voquette’s Semantic Technology
Content Enhancement using Knowledge Base
• Create and maintain a Customized Knowledge Base for any domain
• Automatically create content tags based on the text itself (fully automated)
• Automatically enhance content tags based on information outside of the text (fully automated) by exploiting the Knowledge Base
• Provide the end user not only the relevant content he asked for, but also relevant content that he did not explicitly ask for but needs to know
Semantic Engine
• Fast, main-memory based Semantic Engine
• Response time on the order of tens of milliseconds
• Performance: 1 million queries per hour per server
• Real-time indexing (stories indexed for search/personalization within a minute)
• Near real-time search/personalization of new content and breaking news
• Information retrieval based on quality and not quantity
• Semantic Applications: Search, Directory, Personalization, Alert, Notifications, Custom enterprise applications
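To make the "main-memory" and "real-time indexing" points concrete, here is a minimal sketch of an in-memory index keyed by (attribute, value) pairs, where each new asset is added incrementally without rebuilding anything. It is an illustrative assumption about how such an index could look, not the Semantic Engine's actual data structure; `MetadataIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


class MetadataIndex:
    """In-memory postings from (attribute, value) to the ids of matching assets."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._assets = {}

    def add(self, asset_id, metadata):
        """Incrementally index one asset; existing postings are left untouched."""
        self._assets[asset_id] = metadata
        for attribute, value in metadata.items():
            self._postings[(attribute, value)].add(asset_id)

    def query(self, **criteria):
        """Return ids of assets matching every attribute=value criterion."""
        sets = [self._postings.get((a, v), set()) for a, v in criteria.items()]
        if not sets:
            return set(self._assets)
        return set.intersection(*sets)


index = MetadataIndex()
index.add("a1", {"Company": "Cisco Systems, Inc.", "Topic": "Company News"})
index.add("a2", {"Company": "Convera", "Topic": "Company News"})
```

For example, `index.query(Topic="Company News", Company="Convera")` intersects two in-memory sets, which is the kind of attribute-value lookup that avoids keyword matching altogether.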
SCORE Implementation
Architecture
[Figure: SCORE implementation architecture diagram. Component callouts:]
• Fast main-memory based query engine with APIs and XML output
• Toolkit to design and maintain the Knowledgebase
• Distributed agents that automatically extract/mine knowledge from trusted sources
• Distributed agents that automatically extract relevant semantic metadata from structured and unstructured content
• Knowledgebase represents the real-world instantiation (entities and relationships) of the WorldModel
• CACS provides automatic classification (w.r.t. WorldModel) from unstructured text and extracts contextually relevant metadata
• WorldModel specifies enterprise’s normalized view of information (ontology)
Example
Domain: Financial Services
Sub-domain: Equity Market
(other potential sub-domains: Fixed Income, Mutual Funds, …)
Content Enhancement Workflow
Content Asset Index Evolution
[Figure: an XML feed passes through three stages, from Syntax Metadata to Semantic Metadata, each adding metadata to the content asset.]
1. Extractor Agent for Bloomberg: scans text for analysis; metadata extracted automatically; creates asset (index) out of extracted metadata
Syntax Metadata
Producer: BusinessWire
Source: Bloomberg
Date: Sept. 10 2001
Location: San Jose, CA
URL: http://bloomberg.com/1.htm
Media: Text
Semantic Metadata
Company: Cisco Systems, Inc.
2. Categorization & Auto-Cataloging System (CACS): scans text for analysis; classifies document into pre-defined category/topic; appends topic metadata to asset
Semantic Metadata added
Topic: Company News
3. Semantic Engine: leverages knowledge to enhance metatagging; enhanced content asset indexed
Semantic Metadata added
Ticker: CSCO
Exchange: NASDAQ
Industry: Telecomm.
Sector: Computer Hardware
Executive: John Chambers
Competition: Nortel Networks
Headquarters: San Jose, CA
Knowledge Base
[Figure: Knowledge Base entry for the company Cisco Systems, with these relationships:]
Ticker: CSCO
Exchange: NASDAQ
Industry: Telecomm.
Sector: Computer Hardware
Executives: John Chambers
Competition: Nortel Networks
Headquarters: San Jose
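The final enhancement step, where knowledge-base facts about a recognized company are appended to the asset, can be sketched as a dictionary merge. This is a simplified illustration under the assumption that the knowledge base can be looked up by company name; `enhance_asset` and `KNOWLEDGE_BASE` are hypothetical names, and the values are the Cisco example from the slide.

```python
KNOWLEDGE_BASE = {
    "Cisco Systems, Inc.": {
        "Ticker": "CSCO",
        "Exchange": "NASDAQ",
        "Industry": "Telecomm.",
        "Sector": "Computer Hardware",
        "Executive": "John Chambers",
        "Competition": "Nortel Networks",
        "Headquarters": "San Jose, CA",
    }
}


def enhance_asset(asset, knowledge_base):
    """Return a copy of the asset with KB facts added for its company.

    Metadata already present on the asset is never overwritten.
    """
    enhanced = dict(asset)
    facts = knowledge_base.get(asset.get("Company"), {})
    for attribute, value in facts.items():
        enhanced.setdefault(attribute, value)
    return enhanced


asset = {
    "Producer": "BusinessWire",
    "Source": "Bloomberg",
    "Company": "Cisco Systems, Inc.",
    "Topic": "Company News",
}
enhanced = enhance_asset(asset, KNOWLEDGE_BASE)
```

None of the added tags (ticker, exchange, competitor) need to appear in the story text, which is the point of enhancing from knowledge outside the document.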
Voquette WorldModel™
What is it?
WorldModel™: Template infrastructure to organize and index content contextually
What does it consist of?
Domains (categories) and domain-specific attributes
Examples
Equity WorldModel™
Definition
Domain: Equity
Equity-specific attributes: Company, Ticker, Industry, Sector, Executive, Headquarters
Sports WorldModel™
Definition
Domain: Sports
Sports-specific attributes: Sport Name, Location
Sub-Domain: Golf
Golf-specific attributes: Golfer, Tourney, Golf Course
Sub-Domain: Football
Football-specific attributes: Player, Team, League, Coach
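The domain and sub-domain definitions above map naturally onto a small tree of attribute lists. The sketch below assumes that a sub-domain inherits its parent domain's attributes; that inheritance rule, and the `Domain` class itself, are assumptions made for illustration rather than something the slides state.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Domain:
    name: str
    attributes: List[str] = field(default_factory=list)
    parent: Optional["Domain"] = None

    def all_attributes(self) -> List[str]:
        """Own attributes preceded by those inherited from parent domains (assumed rule)."""
        inherited = self.parent.all_attributes() if self.parent else []
        return inherited + self.attributes


equity = Domain("Equity", ["Company", "Ticker", "Industry", "Sector", "Executive", "Headquarters"])
sports = Domain("Sports", ["Sport Name", "Location"])
golf = Domain("Golf", ["Golfer", "Tourney", "Golf Course"], parent=sports)
football = Domain("Football", ["Player", "Team", "League", "Coach"], parent=sports)
```

An extractor agent or classifier could then ask a domain for `all_attributes()` to know which metadata fields it is expected to fill.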
Voquette Extractor Agents
What is it?
Extractor Agents: Intelligent software robots that work on structured content and automatically
extract metadata information that is relevant and meaningful to the domain/sub-domain at hand
How do they work?
• Extractor agents use the WorldModel™ definition for metadata extraction
• Extractor agents exploit the structure of content and automatically “pick up”
meaningful metadata information
• Write once, Extract permanently – schedulable according to needs
• Can work on Web content, feeds, XML, corporate databases, etc.
• Extractor agents specific to structure of content-at-hand
[Figure: Extractor Agent for CNNfN, driven by the Equity WorldModel™ (Company, Ticker, Industry, Sector, Executive, Headquarters). Metadata extracted:]
• Pick up syntax metadata
• Pick up company
• Pick up ticker
• Pick up industry
• Pick up sector
• Pick up executives
• Pick up headquarters
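A source-specific extractor agent can be pictured as a mapping from one source's own tag names to WorldModel attributes, applied to each structured document. The sketch below uses Python's standard `xml.etree.ElementTree`; the source tag names (`co_name`, `symbol`, and so on) and the sample document are invented for illustration and do not reflect any real feed format.

```python
import xml.etree.ElementTree as ET

EQUITY_ATTRIBUTES = ["Company", "Ticker", "Industry", "Sector", "Executive", "Headquarters"]

# Hypothetical mapping from one source's tag names to WorldModel attributes
SOURCE_TAG_MAP = {
    "co_name": "Company",
    "symbol": "Ticker",
    "industry": "Industry",
    "sector": "Sector",
    "ceo": "Executive",
    "hq": "Headquarters",
}


def extract_metadata(xml_text, tag_map=SOURCE_TAG_MAP):
    """Pick up the WorldModel attributes present in one structured document."""
    root = ET.fromstring(xml_text)
    metadata = {}
    for element in root.iter():
        attribute = tag_map.get(element.tag)
        if attribute in EQUITY_ATTRIBUTES and element.text:
            metadata[attribute] = element.text.strip()
    return metadata


SAMPLE = (
    "<story><co_name>Cisco Systems, Inc.</co_name>"
    "<symbol> CSCO </symbol><body>Story text.</body></story>"
)
metadata = extract_metadata(SAMPLE)
```

Because only the tag map is source-specific, supporting a new source means writing a new map once and scheduling the same extraction routine, in the spirit of "write once, extract permanently".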
Voquette Knowledge Base
What is it?
Knowledge Base: Network of entity objects (significant pieces of information) and
a representation of the real-world relationships (associations) between them
What does it consist of?
Entities (person, location, organization, etc.) and Entity-Relationships
How does it work?
• Structured closely to the structure of the WorldModel™
• Entity and relationship template definitions for the domain at hand
• Works with knowledge extractor agents to collect instances of entities from trusted sources
• Automatically creates relationships between instances using type definitions
[Figure: the Equity WorldModel™ (Company, Ticker, Industry, Sector, Executive, Headquarters) maps to the Equity Knowledge Base Definition (entity types Company, Ticker, Exchange, Industry, Sector, Executives, Headquarters, Competition), which is instantiated in the Knowledge Base. Example instance: Cisco Systems, with Ticker CSCO, Exchange NASDAQ, Industry Telecomm., Sector Computer Hardware, Executives John Chambers, Headquarters San Jose, Competition Nortel Networks]
Voquette Categorization and
Auto-Cataloging System (CACS)
What is it?
CACS: Module that categorizes content and automatically creates metadata of content
How does it work?
Uses a hybrid of statistical, machine learning and knowledge-base techniques
Features
• Core competency – Not only categorizes, but also catalogs (extracts metadata)
• Unique solution for semantic metadata extraction from unstructured content
• Flexibly adaptable for diverse domains
[Figure: CACS takes structured or unstructured content as input and exchanges information with the Equity Knowledge Base Definition (Headquarters, Executives, Company, Exchange, Sector, Industry, Ticker) for metadata creation.]
Example output
Topic: Company News
Metadata extracted:
Company: Convera
Ticker: CNVR
Exchange: NASDAQ
Industry: Content Management
Sector: Computer Software
Headquarters: Vienna, VA
Executives: Ronald Whittier
Voquette Semantic Engine
What is it?
Semantic Engine: Fast main memory-based front end query engine that enables the end-user
to retrieve highly relevant and personalized content via custom APIs
Features and Functionality
• Minimal input from user – system intelligent enough to provide only relevant content to user
• Deep levels of personalization
• Applications: Search, personalization, alerts, notifications, directory, routing, syndication
• Custom applications: Research Dashboard (demo)
[Figure: a user query is submitted to the Semantic Engine, which draws on Content Enhancement Technology and returns highly relevant content to End Users through Search, Personalization, Directory, Alerts/Notifications, Syndication, Dashboard and Custom Apps.]
Semantic Application Example – Research Dashboard
[Figure: Research Dashboard screenshot. Callouts:]
• Automatic 3rd party content integration
• Focused relevant content organized by topic (semantic categorization)
• Related relevant content not explicitly asked for (semantic associations)
• Automatic content aggregation from multiple content providers and feeds
• Competitive research inferred automatically
COMTEX Content Enhancement - Value-added metatagging
[Figure: COMTEX Tagging (limited tagging, mostly syntactic) passes through Content ‘Enhancement’ to become Value-added Voquette Semantic Tagging (rich semantic metatagging)]
Value-added relevant metatags added by Voquette to existing COMTEX tags:
• Private companies
• Type of company
• Industry affiliation
• Sector
• Exchange
• Company Execs
• Competitors
COMTEX Content Enhancement
- Tag Normalization
Source A Document: <company_name=Merrill Lynch, Inc.>
Source B Document: <company_name=Merrill Lynch Corp.>
Voquette Knowledge Base: Company name: Merrill Lynch & Co.
Source A Document with normalized tag: <company_name=Merrill Lynch & Co.>
Source B Document with normalized tag: <company_name=Merrill Lynch & Co.>
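Tag normalization of this kind reduces to an alias lookup against the knowledge base's canonical names. A minimal sketch, assuming the knowledge base stores known variants for each canonical company name (the `CANONICAL_NAMES` structure and function names are hypothetical):

```python
# Canonical name -> known variants, as the knowledge base might hold them
CANONICAL_NAMES = {
    "Merrill Lynch & Co.": {"Merrill Lynch, Inc.", "Merrill Lynch Corp."},
}


def build_alias_index(canonical_names):
    """Map every lower-cased variant (and the canonical form) to the canonical name."""
    index = {}
    for canonical, variants in canonical_names.items():
        index[canonical.lower()] = canonical
        for variant in variants:
            index[variant.lower()] = canonical
    return index


def normalize_company(name, alias_index):
    """Return the canonical company name, or the input unchanged if unknown."""
    return alias_index.get(name.strip().lower(), name)


ALIAS_INDEX = build_alias_index(CANONICAL_NAMES)
```

Leaving unknown names unchanged is a deliberate choice in this sketch: a tag is only rewritten when the knowledge base actually vouches for the mapping.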
Classification & Extraction
Technology Comparisons
Technology: Manual
Classification: Yes
Metadata: Yes
Features and Advantages: Intelligent, adaptable to changing business needs, high levels of accuracy, rapid integration and deployment, minimal upfront investment
Disadvantages and Limitations: Extremely slow, high cost of maintenance and ownership; may not be possible to scale with very high volume; difficult to have uniformity across humans

Technology: Information Retrieval/Document Indexing
Classification: No
Metadata: No
Features and Advantages: Keyword-based search
Disadvantages and Limitations: Typically poor relevance if used alone on a large data set

Technology: Clustering
Classification: May be
Metadata: N/A
Features and Advantages: User/Enterprise does not need to give taxonomy
Disadvantages and Limitations: Many clusters might be meaningless; broad commercial success not yet demonstrated

Technology: Lexical/Natural language (NLP)
Classification: N/A
Metadata: No
Features and Advantages: Often better than keyword based search; natural language querying/phrases; good for summarizing document
Disadvantages and Limitations: Does not help beyond search and summarization; generally cannot associate one document with other (no inferencing)

Technology: Rules-based
Classification: Yes
Metadata: No
Features and Advantages: Works well with complex taxonomies, high consistency
Disadvantages and Limitations: Intelligence bounded, high cost of maintenance, high computation cost and possible scalability limitations
Classification & Extraction
Technology Comparisons (Contd.)
Technology: Machine Learning/AI (Bayesian, HMM, Neural Network)
Classification: Yes
Metadata: No
Features and Advantages: User/Enterprise can define taxonomy; combined with indexing can lead to better keyword based search by limiting search to a node in taxonomy; broad variety of technology choices and good experience in applying the technology
Disadvantages and Limitations: User needs to provide training set; retraining needed if taxonomy is changed; success dependent on training; usually unstructured documents/data only, not structured or semi-structured content

Technology: Thesaurus, Reference data, (Ontology)
Classification: N/A
Metadata: Limited
Features and Advantages: Metadata limited to terms in reference data or ontology
Disadvantages and Limitations: How is reference data kept up to date? Context is limited and applications are limited to narrow areas; sometimes “one size fits all” good for Web search but not necessarily for Enterprise applications; power of relationship missing

Technology: Domain Model and Information Extractors
Classification: Yes
Metadata: Yes
Features and Advantages: For structured data and semi-structured data (Feeds, Web sites); Domain model allows user/enterprise to define contextually relevant metadata; allows more precise query formulation (attribute-value); homogenization/integration; semantic search
Disadvantages and Limitations: Need substantial toolkit support for writing extraction, mapping heterogeneous sources to uniform domain model

Technology: Knowledge Base (Entities/Classes plus Relationships)
Classification: Enhances
Metadata: Enhances
Features and Advantages: Extremely powerful, especially when combined with Domain Model; automatic metadata enhancement; very highly relevant search; beyond search (personalization, semantic associations)
Disadvantages and Limitations: Requires creation and maintenance of knowledge base and access to trusted sources for mining/synthesizing knowledge
ROI Comparative Effort Chart
Activity: Categorization of Web pages
Traditional Effort: 50 pages/day/editor
CET Effort: 1,000 pages/day (with human supervision) [at least an order of magnitude higher without supervision]
Comments: Much higher quality metadata generation, in addition to higher quantity

Activity: Metatagging of news feeds
Traditional Effort: 10-20 feeds (syntactic + semantic metadata); 100 feeds (syntactic metadata)
CET Effort: 5,000-10,000 feeds/day (fully automatic)
Comments: No human supervision needed

Activity: Metatagging of internal/enterprise research content
Traditional Effort: 50-100 assets/day/research editor
CET Effort: 500-1,000 assets (with human supervision)
Comments: Human supervision supports higher quality metadata

Activity: Metatagging of content from multiple internal or external sources
Traditional Effort: Content editors using internally developed tools typically manage 1 to 5 sources
CET Effort: Single person can supervise automatic tagging of content from 20-50 sources
Deployment System Architecture
[Figure: deployment architecture]
Toolkits (Workstation): Knowledge Base Toolkit, Extractor Toolkit
Enterprise S/W (Server): Categorization and Auto Cataloging System, Semantic Engine
WorldModel™ Knowledge Base
Platforms: Linux/Solaris, NT (any system supporting JVM)
Scaling dimensions: More Developers; More Sources; Higher Performance, Redundancy, More content
Measures
• Quality
– Categorization accuracy: Around 90 % (domain and training dependent)
– Metadata extraction: limited only by WorldModel™ and KB
(for which we have automated maintenance support)
– Relevance: near 100% (unlike IR techniques, typical
precision/recall limitations do not apply when we have
metadata)
• Scalability
– Millions of documents per server (for Semantic Engine)
– Unlimited number of documents due to distributed index seamlessly
spanning multiple servers
– Few to hundreds of content sources (distributed SW agents)
Measures (Continued)
• Performance
– Inclusion of new content source: 2 to 8 hrs
– Building WorldModel™ and Knowledge Base: 2 to 8 weeks per domain
for an effort leading to useful results (approx. 1 million entities and
relationships)
– Extraction – several documents per second (processing time)
– Near real-time search/personalization of new content and breaking
news (sub-minute, due to incremental indexing)
– 1 million queries per hour per server, or 1 to 10s of ms query
response/inference time due to main-memory indexing/data structures
• Robustness
– Semantic Engine has not needed a reboot for over 400 days!
– Many other engineering solutions (HW/SW redundancy) to meet any
SLA
Quantitative Measures
Voquette vs. The Rest: Reading and Classification
Pages Read and Classified (Voquette / Average Human)
Per Minute: 600 - 10,000 (batch mode) / 1
Per Hour: 36,000 - 600,000 / 60
Per Day: 864,000 - 14.4 Million / 480
Per Year: 315 Million - 5.2 Billion / 120,000

Voquette vs. The Rest: Reading, Classification, Metadata Extraction, Normalization, Enhancement
Pages Read, Classified, Metadata extracted, Normalized & Enhanced (Voquette / Average Human)
Per Minute: 30 / 1
Per Hour: 1,800 / 60
Per Day: 43,200 / 480
Per Year: 16 Million / 120,000
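The per-hour, per-day and per-year figures follow from scaling the per-minute rates. The sketch below assumes continuous operation for the system (24 hours, 365 days) and an 8-hour day over 250 working days for a human; those assumptions are inferred from the 480 and 120,000 human figures and are not stated in the slides.

```python
def scale_rate(per_minute, hours_per_day=24, days_per_year=365):
    """Scale a per-minute page rate to per-hour, per-day and per-year totals."""
    per_hour = per_minute * 60
    per_day = per_hour * hours_per_day
    per_year = per_day * days_per_year
    return per_hour, per_day, per_year


classify_low = scale_rate(600)        # lower batch-mode bound
classify_high = scale_rate(10_000)    # upper batch-mode bound
full_pipeline = scale_rate(30)        # read, classify, extract, normalize, enhance
human = scale_rate(1, hours_per_day=8, days_per_year=250)
```

Under these assumptions the upper batch-mode bound comes to 14.4 million pages per day and roughly 5.26 billion per year, and the full pipeline to about 15.8 million pages per year.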
Quantitative Comparison
(Continued)
Voquette Specifications: Semantic Engine & Knowledge Base
Queries per hour per server: 1 Million
Query Response Time (lightly loaded server): 1 to 10 ms
Query Response Time (heavily loaded server): 100 to 200 ms
Semantic associations created per hour: 10,000
Semantic associations per domain: Over 1 million