Market Analysis - The SpeechDat Projects

SPEECON Deliverable D11
Project ref. no.
IST-1999-10003
Project acronym
SPEECON
Project full title
Speech Driven Interfaces for Consumer Applications
Security (distribution level)
Project internal
Contractual date of delivery
M04 = May 2000
Actual date of delivery
2000-September-19
Deliverable number
D11
Deliverable name
Market Analysis
Type
Technical Report
Status & version
Final v2.2
Number of pages
20
WP contributing to the
deliverable
WP1
WP / Task / Deliverable
responsible
WP1: Nokia / T1.1: Siemens / D11: Siemens
Other contributors
Nokia, IBM, Sony, Temic, Infineon, L&H
Author(s)
Höge, Harald (Siemens)
Burchard, Bernd (Infineon)
Comeyne, Robrecht (L&H)
Diehl, Frank (Temic)
Fischer, Volker (IBM)
Häkkinen, Juha (Nokia)
Marasek, Krzysztof (Sony)
EC Project Officer
Perrotta, Domenico
Project Coordinator
Name: Harald Höge
Company: Siemens AG, ZT IK 5
Address: Otto-Hahn-Ring 6, D-81739 München, Germany
Phone: +49-89-636-44195
Fax: +49-89-636-49802
E-mail: harald.h.hoege@mchp.siemens.de
Project web site: http://www.speecon.com
Keywords
Consumer devices, recognition, market analysis
Abstract
(for dissemination)
The following market segments for voice driven consumer devices are
regarded: mobile phones, information kiosks, audio/video devices, toys,
personal digital assistants
page 1 of 20
Document evolution:
Version   Date         Security           Notes
v1.0      23.Dec.99    Project internal
v1.1      25.Apr.00    Project internal
V1.2      12.May.00    Project internal
V1.3      11.July.00   Project internal
V1.4      20.July.00   Project internal   Presented on 3. SPEECON Workshop
V2.1      14.Sept.00   Project internal   Prefinal version
V2.2      19.Sept.00   Project internal   Final version
Table of Contents
1. Executive Summary
2. Aim of the Market Analysis
3. Market Segments
3.1 Definition of the Market Segments
3.2 Description of Products and Main Producers
4. Market Volume on Market Segments
5. Overview of Market Volume and Conclusions
Presented on 2. SPEECON
Workshop
1 Executive Summary
Within the SPEECON project, speech databases are produced that allow recognizers for voice driven consumer devices to be trained. The databases have to represent application specific characteristics (acoustic environment of the speakers, application specific words used). Because of the broad variety of applications, many application specific characteristics would ideally have to be covered. Due to financial constraints, not all of them can be represented in the SPEECON databases with the same emphasis.
The aim of the market analysis is to provide a decision basis for favoring those application specific characteristics that are more relevant with respect to market potential.
The market analysis starts with the definition of market segments, each defined by consumer devices with similar functionality. Six main segments have been identified:
• Mobile Phones
• Information Kiosks
• Audio/Video Devices
• Automotive Devices
• Toys
• PDAs
Due to the structure of the SPEECON consortium, the market segment 'home appliances' is not considered.
Within each segment the products and the main producers are described and the market volume is estimated. The market volume is expressed as the number of recognizer units sold in the years 2000, 2002 and 2003 for Europe and ROW (Rest of World). The following table shows the market volumes for the different segments in the years 2002 and 2003. These years were chosen because that is when the SPEECON databases will first be used to train recognizers for voice driven consumer devices.
Market Segment                Area     Year 2002    Year 2003
Mobile Phones                 Europe   4.300.000    9.900.000
                              ROW      8.400.000    20.000.000
Information Kiosks            Europe   11.000       22.000
                              ROW      11.000       22.000
Audio/Video Devices           Europe   240.000      710.000
                              ROW      600.000      1.100.000
Automotive Devices            Europe   960.000      1.900.000
                              ROW      2.200.000    4.500.000
Toys                          Europe   1.000.000    1.400.000
                              ROW      4.900.000    7.000.000
Personal Digital Assistants   Europe   111.000      236.000
                              ROW      257.000      536.000
Total                                  22.990.000   47.326.000
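The totals in the table can be cross-checked by summing the segment figures. The following is only a minimal sketch: the dictionary layout and the function name total_units are illustrative assumptions, while the numbers are copied from the table above.

```python
# Segment volumes in units, copied from the summary table: (Europe, ROW).
# The data layout and names are illustrative, not part of the deliverable.
VOLUMES = {
    2002: {
        "Mobile Phones": (4_300_000, 8_400_000),
        "Information Kiosks": (11_000, 11_000),
        "Audio/Video Devices": (240_000, 600_000),
        "Automotive Devices": (960_000, 2_200_000),
        "Toys": (1_000_000, 4_900_000),
        "Personal Digital Assistants": (111_000, 257_000),
    },
    2003: {
        "Mobile Phones": (9_900_000, 20_000_000),
        "Information Kiosks": (22_000, 22_000),
        "Audio/Video Devices": (710_000, 1_100_000),
        "Automotive Devices": (1_900_000, 4_500_000),
        "Toys": (1_400_000, 7_000_000),
        "Personal Digital Assistants": (236_000, 536_000),
    },
}


def total_units(year):
    """Sum Europe and ROW volumes over all segments for one year."""
    return sum(europe + row for europe, row in VOLUMES[year].values())


if __name__ == "__main__":
    for year in sorted(VOLUMES):
        print(year, total_units(year))
```

Summing the rows in this way reproduces the "Total" line of the table for both years.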
2 Aim of the Market Analysis
Within the SPEECON project, speech databases will be produced that allow recognizers for voice driven consumer devices to be trained. Given current speech recognition technology, these databases should be representative of the acoustic environments in which the recognizers are used. The best recognition performance is achieved when the acoustic environment of the recorded speech used to train the recognizers matches the acoustic environment in which the recognizers are used.
The acoustic environment is mainly determined by the background noise present while the recognizer is in use. Typical sources of noise are public places, cars, living rooms and offices. Within the Aurora project [1], an industrially guided project to standardize the front end of recognizers, the influence of different environmental noises on the performance of recognizers has been investigated. This ongoing project clearly shows the strong relationship between recognition rates and the characteristics of the different environmental noises.
Based on these findings, the aim of the market analysis is to identify those acoustic environments that have high market potential with respect to voice driven consumer devices.
Concerning the market potential, an important question is the time slot for which the market analysis should be done. Assuming that the SPEECON databases will be ready at the end of 2001, and taking into account the fast development of consumer devices and speech technology, the time slot is targeted at the period 2002-2003.
As the market for voice driven consumer devices is just emerging, only very tentative estimates of market volumes can be made.
To achieve the aim of the market analysis, the following 4 steps are performed:
• Definition of the market segments for future voice driven consumer devices
• Description of the products and producers within these segments
• Determination of the segment specific market volume of future voice driven consumer devices for the period 2002-2003
• Summary of the market for voice driven consumer devices for the period 2002-2003
[1] http://www.etsi.org
3 Market Segments
3.1 Definition of Market Segments
The market segments for consumer devices are each defined by a common functionality. Within the SPEECON consortium, six market segments relevant to the partners have been identified. Because the consortium covers a broad range of consumer products, the main market segments for consumer products are treated; only the market segment 'home appliances' is missing. The six market segments investigated are:
1 Mobile Phones
2 Information Kiosks
3 Audio/Video Devices
4 Automotive Devices
5 Toys
6 PDAs
3.2 Description of Products and Main Producers
In the following, existing or planned products are described and the main potential producers are listed.
3.2.1 Mobile Phones
Speech recognition is typically used for the following purposes in mobile phones currently on the market:
• Name dialing
• Digit dialing
• Command and control
• Voice activation
The principle of name dialing is that the user can call a person whose name is stored in the phonebook just by speaking the name. Naturally, a voice tag for the name must have been trained in advance by the user, or obtained in some other way. Name dialing is the basic feature that every mobile phone with speech recognition has.
Digit dialing refers to the recognition of connected digit sequences, i.e., phone numbers, for making the call. In addition to mobile handsets, some external car hands-free kits are currently sold with this feature. They are included in the automotive segment in the SPEECON project.
Limited command and control of the phone operations is supported by many manufacturers. Typical examples are answering a call by voice and creating voice macros for accessing frequently used features of the phone.
The phone may also wait for a "magic word" (a voice activation keyword) in standby mode. The keyword is used to switch the phone into name dialing, digit dialing, or some other mode. Voice dialing and voice control of phone operations or dialing functions have long been seen as an attractive method of phone operation in the car environment. Speaker-dependent (SD) name dialing, where the user is prompted for a sample utterance (or samples) of the name in a training phase, already emerged in mobile phones in the mid-1990s. In recent years, most phone manufacturers have introduced models with voice dialing features.
Many mobile phone models already support command and control of the phone features by voice. In current products, SD technology is mainly used since it is fairly robust, cheap to implement and language-independent. Moreover, extensive speech resources are not needed in the implementation. The main drawback of SD technology is that users have to train the system with their own voice, which is a burdensome and error-prone step.
Speaker independent (SI) technology, however, is ideally suited for voice control applications. A few products with SI name dialing already exist today, especially in Japan, and more can be expected in the high-end segment in the near future. The main reason why SI technology is not yet widespread is in fact the absence of appropriate speech databases in many languages. Consumers expect products to support their own language, which has been difficult to realize so far. Thus, SPEECON is paving the way for a broader dissemination of SI technology.
The following table presents a list of manufacturers that have either announced or are presently delivering mobile phones with speech recognition features [2]. In addition, Audiovox and Kyocera supposedly have models, but no details were available at the time of writing.
Manufacturer   Networks [3]
Acer           GSM, CDMA
Ericsson       GSM
Fujitsu        PDC
Motorola       GSM
NEC            PDC
Neopoint       CDMA
Nokia          GSM, PDC
Panasonic      PDC
Philips        GSM
Samsung        TDMA, CDMA
Siemens        GSM
Telit          GSM
It is evident from the table that most key manufacturers have at least some models with voice
dialing, although there are some notable absentees from the scene.
3.2.2 Information Kiosks
In contrast to touch-screen based solutions, kiosk systems characterized as multimedia kiosks are considered potential candidates for incorporating speech recognition and understanding solutions. Kiosks are rapidly changing from simple proxy information machines into intelligent self-service stations. Applications include automated teller machines and banking; e-mail, weather forecast and travel information are among the most important scenarios, but the most frequently listed feature of multimedia kiosks is (unrestricted) Internet access.
[2] The list is based on the information available to the consortium in June 2000. The main objective of the list is to illustrate the broad commitment of the industry to voice dialing. Although care has been taken to make the list as complete as possible, omissions and errors are possible.
[3] The networks column roughly illustrates for which standards mobile phones with speech recognition are provided. No distinction was made between the various versions of the standards that utilize different frequency bands.
The large number of producers offering turnkey system solutions indicates that the number of units with common features or functionality is far below that in other market segments, such as the PDA or mobile phone market. Although apparently not used by any of today's systems, speech technologies could offer a simple and natural user interface, which is expected to be most critical for a broader acceptance of kiosk systems.
Kiosks are typically installed in public places or offices, e.g. in train stations or main halls, and are not assigned to individual end users, unlike mobile phones or PDAs. Each device will therefore be used by a large population of end users, which makes speaker dependent (SD) speech recognition irrelevant.
Frost & Sullivan's Worldwide Interactive Kiosk Market Study, January 2000, makes only one reference to a speech enabled kiosk system: Philips, who plan to go forward with some capability. Other multimedia kiosk producers are:
• Intelligent Kiosk Company - Provides interactive kiosks for a wide range of applications.
• Ecotek - Provides Internet enabled multimedia kiosks and web sites for commercial and government applications.
• Official Kiosk Group, Inc. - Offers a stand alone internet machine featuring e-mail, stock quotes, weather forecast, and banking for public internet access.
• ARYA Systems Limited - Manufactures and distributes point of information and point of sale kiosks.
• Winstanley Associates - Offers multimedia kiosk design, integration and production for clients requiring custom or turnkey solutions.
• ADCO Corporation - Products manufactured include telephone enclosures and accessories, interactive multimedia kiosks, drive-up and walk-up ATM enclosures, airport casework and millwork, and architectural structures.
• First Impression A/S - A Danish info-kiosk in stainless steel, for public internet access and for exhibitions.
• IBM Kiosk Technology Services - Provides interactive multimedia touch-screen kiosks and interactive display systems, kiosk consulting services, standard and custom cabinet enclosure, and remote monitoring services for deployed kiosks.
• NCR - Internet-enabled, transactional, self-service terminal providing the high availability, openness and scalability demanded for today's kiosk environments.
• Quarta TouchSystems - Information kiosks complete with software and public browser for Internet kiosks.
• Siemens Information and Communication Networks - Produces kiosks for customers including software creation, hardware sourcing, production, installation, monitoring and service.
• WebRaiser - International kiosk system integrator, hardware and software solutions. Public internet access for hotels, cafes, hospitals, airports etc.
An independent alliance of kiosk companies providing news, information and current events
about public internet access terminals can be found on http://www.kiosks.org
3.2.3 Video/Audio
Current audio and video home products do not use a voice interface. However, some prototypes were presented at recent trade shows and exhibitions (e.g. at IFA'99 Thomson CSF presented isolated speaker-independent speech recognition (SR) for control of TV and VCR, and Philips recently announced a speech controlled TV set). It seems that embedded speech technology (owned or offered by OEM partners) will be deployed in home appliances in the next few years.
Main potential applications:
The future speech interface to audio and video devices will probably include not only command and control but also voice-driven content selection (DVB [4]-enabled and delivery on demand, e.g. video-on-demand).
Depending on the future role of the TV set at home, other applications are also possible, including almost all of those foreseen for the information kiosk and for home automation per se.
Potential producers:
All main consumer electronics producers, chip makers (e.g. already offered by Sensimetrics, Infineon, Philips) or third-party companies (e.g. Cocoonz, VoiceTrek).
3.2.4 Automotive
Within the automotive market three main products can be identified: telephone car kits, voice controlled navigation systems and central voice control devices. In the following, a short description of the functionality of these products is given, followed by an overview of products.
Telephone car kits serve exclusively for hands-free calling while driving. Speech recognition is usually limited to pure dialing features such as direct or name dialing. Even though these features are sometimes fully integrated in the mobile phone, mobile phone manufacturers as well as independent producers offer such equipment. With a telephone car kit, hands-free calling also becomes possible for phones not equipped with automatic speech recognition. An overview of manufacturers and their products is given below.
Manufacturer               Telephone Car Kit
Ericsson                   Advanced Car Handsfree HCA-10
Siemens                    Car Kit Professional Voice
Motorola                   VR Handsfree Car Kit
THB Bury                   THB VoiceDial
Votronic                   Article-No.: 10400
Funkwerk Dabendorf GmbH    Audio Voice
Navigation systems guide a driver to a destination. Usually a number of supplemental services are offered too, such as detour and automatic reroute capability, or calculation of the shortest route in time or distance. Guidance to so-called points of interest, such as a hotel or a gas station, is also offered.
Speech recognition simplifies the use of such equipment. Instead of a long-winded input procedure for, e.g., an address, a single instruction is given. Although the benefits of voice control are obvious, voice controlled navigation systems are still exceptions on the market. This is due to the large vocabulary that has to be handled for addresses. Today, a common workaround is entering the destination by spelling. The table below gives an overview of competitors on the market.
[4] digital video broadcasting
Manufacturer              Voice controlled Navigation Systems
Clarion                   Clarion AutoPC
Visteon                   Visteon Telematics Radio (VTR)
Magellan                  750NAV
Pronounced Technologies   AudioNav™
Pioneer                   AVIV 50 S
Central voice control devices are meant as the central instance for voice control in a car. They are designed as integral parts of the car's man machine interface. Such devices cannot be offered on the after sales market; they are pure 'OEM' products. Due to this general approach, central voice control devices are the most complex speech recognition equipment in a car. Their use ranges from telephone and climate control to operating the navigation system or internet access. Some products and their manufacturers are listed below.
Manufacturer   Central Voice Control Device
Temic          StarRec Car SDS
Delphi         Communiport® Speech Processing
Visteon        Visteon Voice Technology
Eclipse        Eclipse Commander
3.2.5 Toys
Within the toy market three different main product groups may be identified:
1. Emotion oriented products like dolls etc.
2. Speech controlled learning aids (function oriented)
3. Computer games (function oriented)
The first product group is characterized by "invisible" speech recognition. This means that the speech technology is perceived during the decision to buy but NOT during the usage of the product. The consumer buys a doll and not a low-cost robot. Typical products are "furby"-like game robots (http://www.furby.com). This year several projects have been launched by the dominant toy manufacturers in this segment: Bandai, Hasbro and Mattel.
In contrast to the first product group, the other two are more technically oriented. The consumer focuses on the technical aspects of the product. Typical products in this area are video games etc. Sample products: K2 Interactive & verbal commander speech recognition for PC games from K2 Interactive L.L.C or Bosconian from Namco.
The toy market is shown schematically in the picture. Three different groups of manufacturers can be identified. The so-called toy allrounders offer all kinds of toys but, unlike some toy specialists, are not specialized in electronic devices. The specialists concentrate on one product group of the portfolio; among them, Vtech could be pointed out as a target for speech recognition. The third group contains the electronics giants such as Nintendo or Sony. This group can also be regarded as a target for high-end speech recognition solutions.
[Figure: Selected Global Players in the Toy Market. Axes: Revenue and Product Diversity (Mono-Specialists, Multi-Specialists, Allrounder). Revenues shown: Sony* 6,3 Bil. US$; Mattel 4,7 Bil. US$; Nintendo 4,7 Bil. US$; Hasbro 3,3 Bil. US$; Sega 3 Bil. US$; Bandai 2,2 Bil. US$; Lego 1,1 Bil. US$; The Learning Company 900 Mio. US$; Märklin 163 Mio. US$; Vtech** 400 Mio. US$; Geobra Brandstätter 280 Mio. US$; Superjouet 109 Mio. US$; Steiff, Lehmann 58 Mio. US$, 33 Mio. US$; Ravensburger 314 Mio. US$; TA Spiel & Freizeit 282 Mio. US$; Blatz Group 62 Mio. US$.]
* turnover related to games
** turnover related to the business field Electronic Learning Products
(C) Infineon Technologies 1999
3.2.6 PDAs
The user of a personal digital assistant (PDA) often stores important personal information on his or her device. The main purpose is to have easy access to the information when out of the office. These personal digital assistants range from palm-size devices to pocket PCs.
The PDA has only one user, but that person uses the device frequently. The information this person manages is mainly the daily schedule and the address book with data about his or her contacts. Applications that manage this data should provide easy and efficient access to it.
To query their data with the classic user interface (UI), users must use the stylus, small keys or other pointing methods to walk through the menus of the various applications. The requested information is returned visually. This process keeps the user's hands and eyes busy for the duration of the request and response.
The alternative application will provide users with access to their important information by voice. This approach will include speech synthesis to provide feedback and responses to the user's commands. The user can request specific information using a spoken command; the device will then speak a response to the user's query.
Such a device will allow even more natural and efficient access to the user's data than a classic agenda. This will become a need for the mobile professional in the global village.
Typical functionality using speech input is:
• Retrieve, speak and/or display the next scheduled appointment.
• Retrieve, speak and/or display the current day's scheduled appointments and active tasks.
• Enter and change a new appointment time and date.
• Look up and dial a contact's phone number by spelling the contact name alphabetically.
• Preview emails and read the sender and subject of each e-mail message from the inbox.
• Create a reply message to the email that is currently being previewed.
• Take a VoiceNote
• Request time and date
• Launch an application
Below, a list of producers of personal digital assistants is given, together with the operating system used.
Producer
Product
OS
Casio
CASSIOPEIA
Freedio
Windows-CE
Compaq
Aero 1550
Aero 8000 H/PC Pro
Windows-CE
Hewlett Packard
Jornada 540
Jornada 820
Jornada 590
Jornada 580
620LX
360LX
Windows-CE
Itronix
Hitachi
T5200
Windows-CE
e-plate HPW-600ETM
e-plate HPW-600ET
Windows-CE
NEC
MobilePro 780
MobilePro 880
Windows-CE
Novatel
Sharp
CONTACT Wireless H/PC
Windows-CE
Windows-CE
Vadem Clio
Clio C-1050
Clio C-1000
Windows-CE
Palm III
Palm IIIc
Palm IIIe
Palm IIIe Special Edition
Palm IIIx
Palm IIIxe
Palm V
Palm VII
Palm Vx
PalmPilot Professional
Palm-OS
Series 5mx
Revo
Proprietary
Psion
Oregon
Brother
Franklin
Fuga Corporation
Handspring
REX
IBM
TRG
Vtech
Mobilon Pro PV-5000
Mobilon TriPad PV-6000
Mobilon HC-4600
Proprietary
Proprietary
Proprietary
Proprietary
Palm-OS
Proprietary
Proprietary
Proprietary
Proprietary
GeoBook NB-80c
REX Pro
4 Market Volume on Market Segments
In the next sections the size of the market for each of the six consumer device segments is investigated. The size of the market is defined by the number of consumer device units using SIR [5] (the SPEECON databases are designed for SIR). Each market segment is treated separately. The following approach was chosen to arrive at an estimate of the market size within each segment:
First, the total market size (in device units with and without speech recognition) within each segment is estimated. These estimates are based on market research studies where available. In a second step, the percentage of devices using a voice driven interface within each segment is estimated, leading to an estimate of the segment specific market size of voice driven consumer devices.
4.1 Mobile Phones
Apparently there exists no market analysis on the importance of the speech recognition feature for mobile phones, maybe because of the relative youth of the field. The mobile phone market has been characterized by competition in, at least, price, size, style, and features. Network operators often have a significant role in deciding what features the phones will have. These decisions may be based on slightly different considerations than those of an independent consumer. The expectation is that, as most mobile phones are already fairly small and inexpensive, the added value of usability enhancements such as voice control will have an even greater significance. Another issue is that ultra-small terminals, e.g., wrist-watch phones, which will have no keypad at all, will soon become available. In this case, voice control will be more or less mandatory.
The importance of speech recognition in present and future mobile phones is obvious. Most manufacturers already have the feature in their mobile phones today. Some manufacturers have had voice dialing in some form in virtually every product announced recently, although others are a little more conservative. The assumption is that, since both memory and computational resources are becoming cheaper all the time, SI technology will become more popular in the near future. The resources are in fact becoming available in many of today's products. The emergence of several applications, like real-time video coding in 3G phones, also implies that these resources will be present in most future terminals. It remains to be seen whether the manufacturers choose to implement SI speech recognition features widely as soon as it is technically feasible, or choose to wait until the end users' acceptance of the technology has been confirmed. In any case, even at a lower growth rate the mobile phone market segment will be very important for speech applications.
Estimate of SI speech recognition units sold:
The estimate is based on EMC Cellular terminal sales forecasts from 2000 to 2003. The assumption is that x% of the terminals sold have SI speech recognition in the respective year. The following table presents the estimated numbers in terms of units sold with or without speech recognition.
[5] speaker independent speech recognition
       Europe        ROW           Total
2000   151,200,000   268,100,000   419,300,000
2002   215,700,000   418,700,000   634,400,000
2003   247,900,000   499,200,000   747,100,000
Tab.1 EMC terminal sales estimate in Europe and the rest of the world [6].
The growth of the segment ‘mobile phones’ in terms of speaker independent speech
recognition technology is difficult to predict, but as more resources become available in
terminals, manufacturers are expected to utilize the benefits of SI technology more
extensively.
Tab. 2 gives a first estimate, assuming that x% of the units are equipped with SI speech recognition technology.
       x       Europe      ROW          Total
2000   0.50%   756,000     1,340,500    2,096,500
2002   2.00%   4,314,000   8,374,000    12,688,000
2003   4.00%   9,916,000   19,968,000   29,884,000
Tab. 2 Exemplary estimate of mobile terminals sold with SI ASR features
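The two-step estimate (total terminal sales multiplied by the assumed share x) can be reproduced with a few lines of code. This is a minimal sketch: the names TERMINAL_SALES, SI_SHARE_BASIS_POINTS and si_units are illustrative assumptions, and the share is stored in basis points (1 basis point = 0.01%) only to keep the arithmetic in integers. The input figures are those of Tab. 1 and the x column of Tab. 2.

```python
# Terminal sales forecast in units from Tab. 1: year -> (Europe, ROW).
TERMINAL_SALES = {
    2000: (151_200_000, 268_100_000),
    2002: (215_700_000, 418_700_000),
    2003: (247_900_000, 499_200_000),
}

# Assumed share x of terminals with SI speech recognition, from Tab. 2,
# expressed in basis points (50 = 0.50%, 200 = 2.00%, 400 = 4.00%).
SI_SHARE_BASIS_POINTS = {2000: 50, 2002: 200, 2003: 400}


def si_units(year):
    """Return (Europe, ROW, Total) units with SI speech recognition."""
    europe, row = TERMINAL_SALES[year]
    share = SI_SHARE_BASIS_POINTS[year]
    europe_si = europe * share // 10_000
    row_si = row * share // 10_000
    return europe_si, row_si, europe_si + row_si


if __name__ == "__main__":
    for year in sorted(TERMINAL_SALES):
        print(year, si_units(year))
```

Applied to the three forecast years, the sketch yields the Europe, ROW and Total columns of Tab. 2.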
4.2 Information Kiosks
Market Studies
Frost & Sullivan: Interactive Kiosk Markets (U.S.), Code #5386-74, 1997
Frost & Sullivan: World Interactive Kiosk Markets. The Self-Service Solution for the New Millennium. Code #7199-74, 2000, (excerpts by personal communication)
Market Size
Speech recognition is not yet considered a key feature in kiosk systems, and the available market research studies make no reference to speech recognition and understanding technologies. While dropping hardware prices are constantly contributing to growing sales volumes for kiosk vendors, ease of use is considered of critical importance for a higher demand for interactive kiosk systems. The end users will ultimately decide whether or not kiosks are feasible, and due to the widespread acceptance of ATMs many vendors believe that interactive kiosks will achieve comparable success.
The overall unit shipments for the US market are expected to be 211.800 units in 2000, 363.600 units in 2002, and 445.000 units in 2003. No particular market inhibitors are known for the European market, and the worldwide kiosk market is predicted to grow exponentially.
In the absence of a thorough market analysis on speech and interactive kiosks, the numbers given in Tab. 1 below should be read as an example of unit shipments if a small but slightly growing share (3 – 5 percent) of information kiosks start to incorporate speech recognition capabilities.
[6] EMC World Cellular Database (http://www.emc-database.com/Website.nsf/index/databaseintro), Cellular terminal sales forecasts, June, 2000. Europe covers West and East Europe as defined by EMC; ROW covers Africa, Americas, APAC, Middle East, USA/Canada.
         2000    2002     2003     Remarks
Europe   7.200   11.100   22.300   assuming US market size
ROW      7.200   11.100   22.300   US only
Tab1. Market size (in units) of voice driven Information Kiosks
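The share of kiosks assumed to be voice driven is not tabulated per year, but it can be derived from the figures above. The sketch below divides the voice driven units in the table by the US shipment forecasts quoted in the text; the names US_SHIPMENTS, VOICE_KIOSKS and implied_share_percent are illustrative assumptions.

```python
# US kiosk shipment forecast (units), as quoted in the Market Size text.
US_SHIPMENTS = {2000: 211_800, 2002: 363_600, 2003: 445_000}

# Voice driven kiosk units per region, from the table above.
VOICE_KIOSKS = {2000: 7_200, 2002: 11_100, 2003: 22_300}


def implied_share_percent(year):
    """Share of shipped kiosks assumed to be voice driven, in percent."""
    return 100.0 * VOICE_KIOSKS[year] / US_SHIPMENTS[year]


if __name__ == "__main__":
    for year in sorted(US_SHIPMENTS):
        print(f"{year}: {implied_share_percent(year):.1f}%")
```

The derived shares come out at roughly 3.4, 3.1 and 5.0 percent, in line with the 3 – 5 percent range stated in the text.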
4.3 Audio/Video devices
Source: Understanding & Solutions Limited, Digital Consumer Electronics And Home Entertainment Watch:
- Home Video Market Update Report, January 2000
- CTV Market Update Report, March 2000
- Consumer Digital Imaging Market Report Update, February 2000
- Home Audio Market Update Report, March 2000
Apparently there exists no market analysis on the importance of speech recognition for the audio/video devices segment. The market is characterized by competition on price and features. For the analysis we identified the following market segments: color TV, digital TV, digital imaging, home audio and home video devices. Each of the groups is again split into subclasses covering the most prominent classes of devices.
For some years the color TV markets in all major Western European countries have been in a mature stage of market development, but Europe has become a beacon for the rest of the world on the path to the development of digital television. Thanks to well established digital video broadcasting (DVB) standards, Europe is leading the way in developing Digital TV broadcasting, especially Digital Terrestrial Television (DTT).
Given the great similarities in programming between various digital broadcasters, interactive applications are set to rapidly become a major area of differentiation and competition between broadcasters and service providers.
The camcorder market is expected to continue to grow throughout the forecast period, especially for digital devices.
Following Understanding & Solutions, the home audio market is divided into two distinct sectors, the integrated systems market and the audio separates or components sector. The integrated systems market primarily serves the mass market, i.e. the non-specialist sector of the audio market. The component or separates market is primarily the preserve of the audio enthusiast, who is seeking an optimal audio solution or new devices (Super Audio CD, solid state devices, etc.). The audio market is characterized by the increasing demand for emerging products, the arrival of new audio technologies and the rather stagnating traditional audio market.
In the USA and Europe, VCR sales are expected to continue to be positive. Digital-VHS products are available now in both Europe and the USA; however, these products are expected to have little impact on the overall video recording market. This is due both to the commitment of the movie industry to DVD and to the development of other digital recording devices that offer higher levels of functionality. (cited from Understanding & Solutions Limited, Digital Consumer Electronics And Home Entertainment Watch)
Estimate of speech interface units sold in 2002 and 2003:
The estimate is based on forecasts. The assumption is that x% of the units sold have a speech interface in the respective year. Tab. 1 below presents the forecast numbers of units (in 1000) sold with and without speech recognition. The assumed percentage for each device class is summarized in Tab. 2.
Market Segment     Area           2000    2002    2003
Color Television   Europe        28291   28430   29020
                   USA + Japan   38700   39980   40750
Digital TV         Europe        11263   15775   17397
                   USA + Japan   10050   13620   23735
Digital Imaging    Europe         3180    3355    3750
                   USA + Japan    7092    8136    8776
Home Audio         Europe        46019   45289   43973
                   USA + Japan   73335   74540   74505
Home Video         Europe        18679   24947   27590
                   USA + Japan   39945   48220   50660
Tab.1 Units in 1000; audio/video devices with and without speech recognition
Market Segment     Devices
Color Television   4:3 TVs
                   16:9 TVs
                   TV+VCR combos
Digital TV         Integrated digital terrestrial, digital terrestrial set-top boxes,
                   digital terrestrial receivers, satellite set-top, cable set-top
Digital imaging    Camcorders
Home audio         Integrated home components
                   separate amps & rec
                   personal + portables
Home video         VCR + DVD
2002
0.0
0.1
0.0
0.5
2003
0.1
1.0
0.1
1.0
0.2
0.1
0.1
0.01
0.5
0.5
0.5
1
0.05
1
Tab.2 Percentage of units with speech interface estimated for 2002 and 2003.
Combining the percentages given in Tab. 2 with the market size in Tab. 1 leads to an
estimate of the market size for audio/video devices, as shown in Tab. 3.
         2000   2002      2003        Remarks
Europe   0      235.500   711.900
ROW      0      602.000   1.065.200   US and Japan
Tab.3 Market size (in units) of voice driven audio/video devices
4.4 Automotive devices
Market Size
Because no public information about the volume of speech driven automotive devices is
available, the following estimate is based on the Temic sales figures up to the end of 1999.
By the end of 1999 Temic had sold 100,000 speech recognition devices. Assuming that
Temic controls one quarter of the automotive speech recognition market, a total market of
400,000 units in 1999 is assumed. As this is a rapidly growing market, exponential growth, e.g.
doubling each year, is presumed. This leads to the following market figures.
         2000      2001        2002        2003
Europe   240,000   480,000     960,000     1,920,000
ROW      560,000   1,120,000   2,240,000   4,480,000
Total    800,000   1,600,000   3,200,000   6,400,000
[in units; see assumption above]
Europe is assumed to hold a 30% share of the market. This figure is derived from the estimated
global light vehicle production (motor car production) up to the year 2003.
         2000    2001    2002    2003
Europe   16,55   16,41   16,57   16,61
ROW      38,23   39,46   40,86   42,09
Total    54,78   55,87   57,43   58,70
[in million units]
Source: DRI 12/1999
As can be seen from both tables, no saturation effects have to be considered. Even in 2003 the
total estimated market volume covers only 11% of the global light vehicle market.
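The arithmetic behind the two tables can be retraced with a short sketch. It assumes only what is stated above: a 1999 base of 400,000 units (Temic's 100,000 units at an assumed one-quarter share), doubling each year, and a 30% share for Europe. The function and variable names are illustrative and not part of the original analysis.

```python
# Sketch of the automotive estimate; every input is taken from the text and tables above.
BASE_YEAR = 1999
BASE_MARKET_UNITS = 100_000 * 4  # Temic's 100,000 units, assumed to be one quarter of the market
EUROPE_SHARE = 0.30              # assumed European share of the market

# Global light vehicle production in million units (DRI 12/1999).
LIGHT_VEHICLES_MILLION = {2000: 54.78, 2001: 55.87, 2002: 57.43, 2003: 58.70}


def automotive_market(year):
    """Return the estimated units for Europe, ROW and total, doubling every year."""
    total = BASE_MARKET_UNITS * 2 ** (year - BASE_YEAR)
    europe = round(total * EUROPE_SHARE)
    return {"europe": europe, "row": total - europe, "total": total}


def penetration(year):
    """Share of the global light vehicle production covered by the estimate."""
    return automotive_market(year)["total"] / (LIGHT_VEHICLES_MILLION[year] * 1_000_000)


if __name__ == "__main__":
    for year in range(2000, 2004):
        m = automotive_market(year)
        print(f"{year}: Europe {m['europe']:,}  ROW {m['row']:,}  "
              f"Total {m['total']:,}  penetration {penetration(year):.1%}")
```

For 2003 the sketch gives 6,400,000 units against 58.70 million light vehicles, i.e. about 11%, which is the basis for the statement that no saturation effects need to be considered.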
4.5 Toys
Currently no public data is available for speech recognition in the toy market. Nevertheless, we
have observed an increasing number of speech controlled toys presented at various toy related
trade shows worldwide. Typically Europe follows the US and Asian markets. Therefore the
estimated revenue in Europe is only about 20% of that in the rest of the world.
This does not imply that European companies will not participate in this
worldwide success: there are significant opportunities for them as game manufacturers or as
technology providers. We assume that saturation will be reached in about 10 years. Not all
projects launched will be successful. Furthermore, we expect prices to go down and the
number of successful projects with all types of speech recognition to increase.
                                        2000        2002        2003
Sum of successful projects world wide   4           26          50
Europe                                  200.000     1.050.000   1.400.000
ROW                                     1.000.000   4.875.000   7.000.000
Total                                   1.200.000   5.925.000   8.400.000
Tab.1 Toys (in units) with speech recognition [7]
[7] Infineon internal market research
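The relation between the European and ROW toy figures can be checked with a small sketch. It uses only the unit numbers from the toy table above; the dictionary layout and function names are illustrative assumptions.

```python
# Toy units with speech recognition, as given in the toy table above.
TOY_UNITS = {
    2000: {"europe": 200_000, "row": 1_000_000, "total": 1_200_000},
    2002: {"europe": 1_050_000, "row": 4_875_000, "total": 5_925_000},
    2003: {"europe": 1_400_000, "row": 7_000_000, "total": 8_400_000},
}


def europe_to_row_ratio(year):
    """European units as a fraction of the rest-of-world units."""
    row = TOY_UNITS[year]
    return row["europe"] / row["row"]


def total_is_consistent(year):
    """True if the Total row equals Europe plus ROW."""
    row = TOY_UNITS[year]
    return row["europe"] + row["row"] == row["total"]


if __name__ == "__main__":
    for year in sorted(TOY_UNITS):
        print(f"{year}: Europe/ROW = {europe_to_row_ratio(year):.1%}, "
              f"total consistent: {total_is_consistent(year)}")
```

The ratio is 20% in 2000 and 2003 and roughly 21.5% in 2002, in line with the "about 20%" assumption stated in the text.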
4.6 PDAs
The estimated market data for these handheld devices by region can be found in the market
study from Dataquest of June 1999.
Handheld Computer Shipments By Platform And Region, 1998 To 2003 (Thousands of Units)

                  1998   1999   2000   2001   2002   2003   CAGR (%) 1998-2003
United States
  PalmOS          1206   1635   2037   2710   3521   4220    28.5
  Windows CE       495    764    986   1331   1720   2088    33.3
  EPOC              77     72     59     57     53     49    -8.7
  Others           353    217    153    142    146    137   -17.2
Western Europe
  PalmOS           256    477    703   1074   1515   1923    49.7
  Windows CE       181    442    734   1018   1357   1780    57.9
  EPOC             398    539    573    588    629    744    13.3
  Others           139    152    139    156    193    279    15
Japan
  PalmOS             8     37    104    210    370    614   138.3
  Windows CE        66    229    423    647    958   1324    82
  EPOC               -      5     13     21     30     36    NA
  Others           170    221    193    198    181    155    -1.8
Asia/Pacific
  PalmOS            32     66    108    167    244    336    60
  Windows CE        42    110    200    310    442    618    71
  EPOC              26     40     45     52     62     80    25.2
  Others            66     78     57     39     33     32   -13.6
ROW
  PalmOS           104    168    218    283    362    469    35.2
  Windows CE        19     65    109    177    255    368    80.5
  EPOC              23     41     57     75     95    126    40.7
  Others            66     77     84     83     93     68     0.4
Source: Dataquest (June 1999)

                       1998    1999    2000    2001     2002     2003   CAGR (%) 1998-2003
Total World
  Palm OS             1,606   2,383   3,170   4,444    6,012    7,562   36%
  Windows CE            803   1,610   2,452   3,483    4,732    6,178   50%
  EPOC                  524     697     747     793      869    1,035   15%
  Others                794     745     626     618      646      671   -3%
Grand Totals by Year  3,727   5,435   6,995   9,338   12,259   15,446   33%
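The CAGR column can be reproduced from the first and last year of each row. The sketch below uses the standard compound annual growth rate formula; the world-total values are those of the Dataquest table, while the function name and the choice of five annual steps (1998 to 2003) are assumptions made here because they reproduce the published percentages.

```python
def cagr(first, last, periods):
    """Compound annual growth rate between two values over a number of annual steps."""
    return (last / first) ** (1 / periods) - 1


# World shipments in thousands of units: (1998 value, 2003 value).
WORLD_SHIPMENTS = {
    "Palm OS": (1606, 7562),
    "Windows CE": (803, 6178),
    "EPOC": (524, 1035),
    "Others": (794, 671),
    "Grand total": (3727, 15446),
}

if __name__ == "__main__":
    for platform, (start, end) in WORLD_SHIPMENTS.items():
        # Five annual steps lie between 1998 and 2003.
        print(f"{platform}: {cagr(start, end, 5):.1%}")
```

With five steps the rounded results agree with the 36%, 50%, 15%, -3% and 33% given in the table; with four steps (1998 to 2002) they would not.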
Adding the units of PDAs for the different platforms leads to the market size shown in Tab. 1.
                 2000        2002        2003
Western Europe   2.149.000   3.694.000   4.726.000
ROW              4.846.000   8.565.000   10.720.000
Tab.1 Units of PDAs with and without speech recognition
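The summation can be illustrated as follows. The sketch assumes that the Western Europe rows of the Dataquest table read as listed in the code and that ROW is the world grand total minus Western Europe; both readings reproduce the figures of Tab. 1. All names are illustrative.

```python
# Western Europe shipments in thousands of units, by platform (Dataquest, June 1999).
WESTERN_EUROPE = {
    "PalmOS": {2000: 703, 2002: 1515, 2003: 1923},
    "Windows CE": {2000: 734, 2002: 1357, 2003: 1780},
    "EPOC": {2000: 573, 2002: 629, 2003: 744},
    "Others": {2000: 139, 2002: 193, 2003: 279},
}

# World grand totals in thousands of units.
WORLD_TOTAL = {2000: 6995, 2002: 12259, 2003: 15446}


def pda_market(year):
    """Return PDA units for Western Europe and for the rest of the world."""
    western_europe = sum(platform[year] for platform in WESTERN_EUROPE.values())
    rest_of_world = WORLD_TOTAL[year] - western_europe  # assumption: ROW = world minus W. Europe
    return {"western_europe": western_europe * 1000, "row": rest_of_world * 1000}


if __name__ == "__main__":
    for year in (2000, 2002, 2003):
        print(year, pda_market(year))
```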
[Chart: percentage of units sold featuring speech technology (vertical axis, 0% to 5%) for the years 2000 to 2003]
Tab.2 Estimated percentage of speech enabled PDAs over time.
Based on the estimated percentage given in Tab. 2 the estimated market volume of voice
driven PDAs is given in Tab. 3.
         2000   2002      2003      Remarks
Europe   0      110.820   236.300   West Europe
ROW      0      256.950   536.000
Tab.3 Market size (in units) of voice driven PDAs
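The step from total PDA units to voice driven PDA units is a simple multiplication. In the sketch below the shares of 0%, 3% and 5% for 2000, 2002 and 2003 are an assumption inferred from the ratio of the voice driven units to the total PDA units; they are not quoted from the text. The names are illustrative.

```python
# Total PDA units (with and without speech recognition).
PDA_UNITS = {
    2000: {"europe": 2_149_000, "row": 4_846_000},
    2002: {"europe": 3_694_000, "row": 8_565_000},
    2003: {"europe": 4_726_000, "row": 10_720_000},
}

# Assumed percentage of speech enabled PDAs per year (inferred, see lead-in).
SPEECH_SHARE_PERCENT = {2000: 0, 2002: 3, 2003: 5}


def voice_driven_pdas(year):
    """Apply the assumed speech share of a year to the PDA units of each region."""
    percent = SPEECH_SHARE_PERCENT[year]
    return {region: units * percent // 100 for region, units in PDA_UNITS[year].items()}


if __name__ == "__main__":
    for year in sorted(PDA_UNITS):
        print(year, voice_driven_pdas(year))
```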
5 Overview of Market Volume and Conclusions
Tab. 1 gives an overview of the different market segments. Because this market is only just
emerging, the figures are very rough estimates, as explained in chapter 4.
Nevertheless they give an impression of the market potential.
Market Segment           Area     Year 2000   Year 2002    Year 2003   Comments
1. Mobile phones         Europe     750.000   4.300.000    9.900.000
                         ROW      1.300.000   8.400.000   20.000.000
2. Information kiosks    Europe       7.000      11.000       22.000
                         ROW          7.000      11.000       22.000   US only
3. Audio/video devices   Europe           0     240.000      710.000
                         ROW              0     600.000    1.100.000   US & Japan
4. Automotive devices    Europe     240.000     960.000    1.900.000
                         ROW        560.000   2.200.000    4.500.000
5. Toys                  Europe     200.000   1.000.000    1.400.000
                         ROW      1.000.000   4.900.000    7.000.000
6. PDAs                  Europe           0     111.000      236.000   West Europe
                         ROW              0     257.000      536.000
Tab. 1: Market volume of voice driven consumer devices. The market volume is given in
units of devices.
Clearly the mobile phone segment dominates the market, followed by toys
and automotive devices. For the specification of the SPEECON databases this means that mobile
phones call for recordings in noisy public environments, home environments and rather quiet
office environments, toys call for recordings of children, and automotive devices call for
recordings in cars. These issues are addressed in the other deliverables of WP1 and WP2.
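The ranking of the segments can be reproduced from the 2003 column of the overview table. The sketch below simply adds Europe and ROW for each segment and sorts the result; the data structure and function name are illustrative assumptions.

```python
# Year 2003 market volume in units: (Europe, ROW), taken from the overview table.
MARKET_2003 = {
    "Mobile phones": (9_900_000, 20_000_000),
    "Information kiosks": (22_000, 22_000),
    "Audio/video devices": (710_000, 1_100_000),
    "Automotive devices": (1_900_000, 4_500_000),
    "Toys": (1_400_000, 7_000_000),
    "PDAs": (236_000, 536_000),
}


def rank_segments(market):
    """Return (segment, total units) pairs sorted by total units, largest first."""
    totals = {segment: europe + row for segment, (europe, row) in market.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for segment, units in rank_segments(MARKET_2003):
        print(f"{segment}: {units:,}")
```

On these figures mobile phones account for 29.9 million units in 2003, ahead of toys (8.4 million) and automotive devices (6.4 million).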