Yactraq Online Inc.

Contact: bizdev@yactraq.com www.yactraq.com

Yactraq Core Technology Whitepaper

Introduction

Data volumes are growing rapidly across the internet and many other channels, with audio, video, and text-based channels growing faster than ever. From a business intelligence perspective, applications that harness this data include Voice-of-the-Customer, Media Monitoring, Competitive Intelligence, and Brand Sentiment.

Yactraq provides an omni-channel Comprehensive Intelligence approach that involves indexing all relevant data, be it text, audio, or video. As a result, we can automatically consolidate omni-channel reporting into a single dashboard, revealing deeper signals and more powerful insights.

Accurate B2B speech systems need to understand product and brand names, as well as industry- and company-specific business terms; this requires the creation of custom speech vocabularies unique to each customer. At Yactraq we focus on automating the creation of custom speech vocabularies, with three distinct advantages over legacy solutions: speed, scale, and cost. We can create custom speech vocabularies consisting of hundreds of thousands of keywords in a matter of days, at a fraction of the cost of legacy solutions.

CoreTraq, Yactraq’s proprietary, patent-pending, speech-based semantic platform, is what enables Yactraq’s technical capabilities. CoreTraq offers API flexibility and supports various other applications, including video search and ad targeting.

Additionally, Yactraq is an IP-For-Defense partner of Intellectual Ventures, which uniquely positions Yactraq in the speech technology industry and gives Yactraq unparalleled freedom to operate and bring disruptive solutions to the market.

Present Status

Yactraq’s key technical approach to delivering business value to customers is LVCSR (large-vocabulary continuous speech recognition) technology combined with custom vocabularies, built quickly and inexpensively through increasing degrees of automation; we call this our CoreTraq platform.

We have built custom vocabularies focused on business intelligence, video search, and ad targeting applications. The taxonomies we employ include standards like Open Directory, IAB, and Internet Search Terms, as well as customer-specific taxonomies. Taxonomic data is first used to generate search terms; standard web crawling techniques and APIs are then used to collect web pages of related linguistic data. The taxonomic and linguistic data is then provided as system data to Yactraq’s NLU (Natural Language Understanding) module. Data from the NLU module is then exported and provided as input to a machine learning process. The output of that process feeds a set of tools that generate a compact statistical language model, with a lexicon limited to 65,000 words, for use by Yactraq’s embeddable speech engine.
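The first step of this pipeline, turning taxonomic data into search terms, can be illustrated with a minimal sketch. It is not Yactraq's implementation: the toy taxonomy, the function names `leaf_paths` and `search_term`, and the rule of combining a leaf topic with its parent name are all assumptions made for illustration.

```python
from typing import Iterator

# Hypothetical toy taxonomy. Real inputs would be Open Directory, IAB,
# or customer-specific taxonomies, as described above.
TAXONOMY = {
    "Sports": {
        "Soccer": {"World Cup": {}, "Premier League": {}},
        "Tennis": {"Wimbledon": {}},
    },
    "Business": {"Finance": {"Mortgages": {}}},
}


def leaf_paths(node: dict, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield the full path to every leaf topic (a node with no children)."""
    for name, children in node.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


def search_term(path: tuple, context_levels: int = 1) -> str:
    """Build a query from the leaf name plus a few parent names.

    Parent names are appended after the leaf to disambiguate it
    (an assumed heuristic, not a documented Yactraq rule).
    """
    parts = path[-(context_levels + 1):]
    return " ".join(reversed(parts))


terms = [search_term(p) for p in leaf_paths(TAXONOMY)]
```

Each resulting term would then be handed to a crawler or search API to collect the related web pages; that step is omitted here because the source does not name the services used.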

At run time, audio and video data are decoded and sent to a pre-processor, which eliminates music, noise, and silence from the data. Only speech segments are sent on to the speech recognizer. Because media-based operational data often contains substantial sections of music or noise, the result is throughput gains of 100X. These throughput gains allow Yactraq’s speech recognizer to deliver near real-time performance. The output text from the speech recognizer is then processed by Yactraq’s NLU module using Yactraq’s proprietary machine learning and natural language understanding techniques. This additional processing allows our platform to quantitatively determine the primary subject topics of each minute of audio or video data.
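The idea of forwarding only speech segments to the recognizer can be sketched with a simple frame-energy gate. This is an illustration under stated assumptions, not Yactraq's pre-processor: an energy threshold only separates silence and low-level noise from louder audio, whereas rejecting music would need a trained classifier. The function names, frame length, and threshold are hypothetical.

```python
import math


def frame_energy_db(frame):
    """Mean energy of a frame of samples in decibels (floor avoids log(0))."""
    mean_sq = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def speech_segments(samples, frame_len=160, threshold_db=-40.0):
    """Return (start, end) sample ranges whose frame energy exceeds the threshold.

    Samples are assumed to be floats in [-1.0, 1.0]; a trailing partial
    frame is ignored.
    """
    segments = []
    start = None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        active = frame_energy_db(samples[i:i + frame_len]) > threshold_db
        if active and start is None:
            start = i                      # a segment opens on this frame
        elif not active and start is not None:
            segments.append((start, i))    # the segment closed before this frame
            start = None
    if start is not None:
        # The audio ended while a segment was still open.
        segments.append((start, (len(samples) // frame_len) * frame_len))
    return segments


def kept_fraction(samples, segments):
    """Fraction of the input that would be forwarded to the recognizer."""
    return sum(end - begin for begin, end in segments) / len(samples)
```

The smaller the kept fraction, the less audio the recognizer has to decode, which is the source of the throughput gain described above.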

The current version of CoreTraq (release 1.5) has been proven across approximately four million minutes of web video, broadcast radio, and TV data, and delivers high levels of metadata accuracy. This validates that Yactraq’s patent-pending process, as described above, can generate custom vocabularies orders of magnitude faster and more cheaply than legacy approaches.

Yactraq’s Core Technology

Audio and video are not yet natively understood by computers, which work with character data. CoreTraq is Yactraq’s unique speech-based semantic platform that accurately converts audio and video into character data.

[Figure: CoreTraq platform overview]

Audio & Video Inputs

• Phone Calls

• Radio

• TV

• Web Videos

• Podcasts

• More

CoreTraq Platform

• Speech Recognizer, which produces text transcripts

• NLU (Natural Language Understanding) Topic Engine, which processes the text transcripts

• Machine Learning (offline), which provides ongoing improvements to LVCSR

Output Metadata

• Keywords

• Topics

• Time-tagging

• Sentiment

• Word confidence

Self-learning capability allows for perpetual and inexpensive enhancements to the platform.

As a specific example, Yactraq video search customers need automated semantic signatures for web videos that average two to eight minutes in duration. A set of 20,000 leaf-level topics, structured in a four-layer-deep Open Directory taxonomy, is used to collect around 600,000 web pages of related linguistic data with a lexicon of 300,000 words. This linguistic data is then compressed into a compact statistical language model with a lexicon limited to 65,000 words, for use in either cloud-based or embeddable speech applications via Yactraq’s versatile speech recognizer.
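One simple way to limit a lexicon to a fixed size is a frequency cutoff, sketched below. The source does not say how the 300,000-word lexicon is reduced to 65,000 words, so the frequency-based rule, the `<unk>` token, and the function names are assumptions; Yactraq's proprietary machine learning tools may work quite differently.

```python
from collections import Counter

UNK = "<unk>"  # conventional out-of-vocabulary token (an assumption here)


def build_lexicon(documents, max_words):
    """Keep the max_words most frequent words across the crawled documents."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    return {word for word, _ in counts.most_common(max_words)}


def map_to_lexicon(text, lexicon):
    """Replace every word outside the lexicon with the UNK token."""
    return [word if word in lexicon else UNK for word in text.lower().split()]
```

In the example above, `max_words` would be 65,000 and `documents` would be the text of the roughly 600,000 crawled pages; the language model would then be estimated on the mapped text.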

A key Yactraq video search client, which began using CoreTraq before its Series A, went on to attract Series A investment from a top-three US publishing house, among others. In December 2014, the same client was acquired by a top-five US digital publisher, demonstrating the commercial value of Yactraq’s CoreTraq platform and its custom vocabulary capability.

Patent Pending Fusion of NLU and Speech Recognition

Our pending patent covers a method for automating the generation of language models (LMs) in speech recognition systems. As indicated above, this process uses web crawling techniques to populate linguistic data within an NLU engine. The NLU module then triggers a set of machine learning steps that result in the automated generation of a speech language model. The key business benefit of this method is that it improves the speed and cost of generating custom vocabulary LMs by multiple orders of magnitude.

Roadmap - Highlights

Versatile Speech Recognition:

The information compression capability provided by Yactraq’s proprietary machine learning tools is a key asset in building embedded systems, as it allows large topic sets and lexicons to be compressed, with minimal loss of information, into smaller-footprint embeddable speech recognizers.

Yactraq has already processed 4 million minutes of customer data using Yactraq’s speech recognizer, which has proven its versatility across scalable cloud environments as well as mobile and embedded environments. Additionally, Yactraq speech scientists have world-class expertise spanning both cloud-based and embedded speech systems. Their past embedded systems work has found application in the Eurofighter cockpit, Samsung mobile phones, and even watches.

Deep Neural Networks:

Yactraq is studying the application of Deep Neural Network (DNN) technology. Acoustic modeling is an area where neural-network-based systems have shown great promise in the last few years, and DNNs are a key aspect of Yactraq’s approach to dealing with:

• Multiple languages

• Multiple accents

• Acoustically challenging data

Class Based Language Models:

Crawling web pages was sufficient to build a video search capability because, essentially, the problem that video search systems try to solve is determining the degree of similarity between data object A and data object B. Business intelligence applications, however, have deeper requirements driven by the need to answer more specific questions. For example, a call analytics Voice-Of-The-Customer application may want to know why the same issue can sometimes be resolved on a single customer service phone call, yet take more than five calls on another occasion. In such a case the system needs to determine negative sentiment associated with very specific business objects. Alternatively, a local advertiser monitoring system may want to know exactly who is advertising in a specific location and how frequently. In such cases only limited amounts of linguistic data may be available; consider the ad monitoring application just described. The language patterns required may not be available through crawling of web pages, and in such cases class-based language models represent a possible solution.
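To make the idea concrete, a class-based bigram model factors the probability of a word as P(word | previous word) = P(class | previous class) × P(word | class). The sketch below is a minimal, unsmoothed illustration and not Yactraq's model; the classes `BUSINESS` and `CITY`, the example words, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical word classes; every other word is treated as its own class.
WORD_CLASSES = {
    "acme": "BUSINESS", "globex": "BUSINESS",
    "boston": "CITY", "austin": "CITY",
}


def word_class(word):
    return WORD_CLASSES.get(word, word)


def train(sentences):
    """Count class bigrams and class membership from whitespace-tokenized text."""
    bigrams = defaultdict(Counter)   # previous class -> counts of next class
    members = defaultdict(Counter)   # class -> counts of member words
    for sentence in sentences:
        prev = "<s>"                 # sentence-start marker
        for word in sentence.lower().split():
            cls = word_class(word)
            bigrams[prev][cls] += 1
            members[cls][word] += 1
            prev = cls
    return bigrams, members


def add_class_member(members, cls, word):
    """Register a new word in a class without any new training sentences."""
    WORD_CLASSES[word] = cls
    members[cls][word] += 1


def prob(word, prev_word, bigrams, members):
    """Maximum-likelihood P(class | previous class) * P(word | class)."""
    cls = word_class(word)
    prev = "<s>" if prev_word is None else word_class(prev_word)
    total_prev = sum(bigrams[prev].values())
    total_cls = sum(members[cls].values())
    if total_prev == 0 or total_cls == 0:
        return 0.0
    return (bigrams[prev][cls] / total_prev) * (members[cls][word] / total_cls)
```

The benefit for limited-data cases is that a new advertiser name added with `add_class_member` immediately inherits every sentence pattern already observed for its class, with no additional crawled text.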

Complex Language Models:

In some cases, class-based language models may contain only very specific language patterns and are consequently incapable of serving the core language modeling needs of an LVCSR system. In such cases Yactraq uses a combination of class-based and web crawling techniques to develop custom vocabulary language models.
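The source does not say how the two kinds of model are combined. One standard technique, used here purely as an assumption, is linear interpolation: for the same word history, the combined probability is a weighted sum of the class-based estimate and the web-crawled estimate. The function name and example weight are hypothetical.

```python
def interpolate(class_dist, web_dist, weight):
    """Linearly interpolate two next-word distributions for the same history.

    class_dist and web_dist map words to probabilities; weight is the share
    given to the class-based model, and (1 - weight) goes to the web model.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    vocabulary = set(class_dist) | set(web_dist)
    return {
        word: weight * class_dist.get(word, 0.0)
        + (1.0 - weight) * web_dist.get(word, 0.0)
        for word in vocabulary
    }
```

If both inputs are proper distributions, the result also sums to one, so the combined model can cover general language from the crawled data while still favoring the narrow, class-specific patterns.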

Dynamic Configuration API:

Yactraq is building the components required for an API that would allow a customer to send Yactraq a continuous configuration feed of target topics and entities. The expected outcome is an API that allows CoreTraq’s vocabulary to be dynamically reconfigured on a fully automated basis.
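Because this API is still being built, its interface is not described in this paper. The sketch below only illustrates the general idea of applying one message from a configuration feed to a vocabulary; the JSON message format, the `add` and `remove` field names, and the function name are entirely hypothetical.

```python
import json


def apply_config_update(vocabulary, message):
    """Return a new vocabulary set after applying one feed message.

    The message is assumed (hypothetically) to be JSON with optional
    "add" and "remove" lists of topic or entity names.
    """
    update = json.loads(message)
    updated = set(vocabulary)              # leave the caller's set untouched
    updated |= set(update.get("add", []))
    updated -= set(update.get("remove", []))
    return updated
```

In a fully automated setup, each such update would trigger regeneration of the custom vocabulary language model rather than just a change to a set of names.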

Taxonomies and Linguistic Data:

Yactraq is also building other standard and vertically specific vocabularies. Examples of standard data sets include Wikipedia, Freebase, Yellow Pages, and IAB. Verticals of interest include Performance Marketing, Healthcare, Finance, and Government.
