SPEECON Deliverable D11

Project ref. no.: IST-1999-10003
Project acronym: SPEECON
Project full title: Speech Driven Interfaces for Consumer Applications
Security (distribution level): Project internal
Contractual date of delivery: M04 = May 2000
Actual date of delivery: 2000-September-19
Deliverable number: D11
Deliverable name: Market Analysis
Type: Technical Report
Status & version: Final v2.2
Number of pages: 20
WP contributing to the deliverable: WP1
WP / Task / Deliverable responsible: WP1: Nokia / T1.1: Siemens / D11: Siemens
Other contributors: Nokia, IBM, Sony, Temic, Infineon, L&H
Author(s): Höge, Harald (Siemens); Burchard, Bernd (Infineon); Comeyne, Robrecht (L&H); Diehl, Frank (Temic); Fischer, Volker (IBM); Häkkinen, Juha (Nokia); Marasek, Krzysztof (Sony)
EC Project Officer: Perrotta, Domenico
Project Coordinator:
  Name: Harald Höge
  Company: Siemens AG, ZT IK 5
  Address: Otto-Hahn-Ring 6, D-81739 München, Germany
  Phone: +49-89-636-44195
  Fax: +49-89-636-49802
  E-mail: harald.h.hoege@mchp.siemens.de
  Project web site: http://www.speecon.com
Keywords: Consumer devices, recognition, market analysis
Abstract (for dissemination): The following market segments for voice driven consumer devices are regarded: mobile phones, information kiosks, audio/video devices, toys, personal digital assistants.

Document evolution:

Version   Date         Security           Notes
v1.0      23.Dec.99    Project internal
v1.1      25.Apr.00    Project internal
V1.2      12.May.00    Project internal
V1.3      11.July.00   Project internal
V1.4      20.July.00   Project internal   Presented on 3. SPEECON Workshop
V2.1      14.Sept.00   Project internal   Prefinal version
V2.2      19.Sept.00   Project internal   Final version

Further Notes entry (version not identifiable in the source): Presented on 2. SPEECON Workshop

Table of Contents

1. Executive Summary
2. Aim of the Market Analysis
3. Market segments
3.1 Definition of the Market Segments
3.2 Description of Products and Main Producer
4. Market Volume on Market Segments
5. Overview of Market Volume and Conclusions

1 Executive Summary

Within the SPEECON project, speech databases are produced that make it possible to train recognizers for voice driven consumer devices. The databases have to represent application-specific characteristics (the acoustic environment of the speakers and the application-specific words used). Because of the broad variety of applications, many application-specific characteristics would ideally have to be covered. Because of financial constraints, not all of them can be represented in the SPEECON databases with the same emphasis. The aim of the market analysis is to provide a decision basis for favoring those application-specific characteristics that are more relevant with respect to market potential.

The market analysis starts with the definition of market segments, each defined by consumer devices with similar functionality. Six main segments have been identified:

- Mobile Phones
- Information Kiosks
- Audio/Video Devices
- Automotive Devices
- Toys
- PDAs

Due to the structure of the SPEECON consortium, the market segment ‘home appliances’ is not regarded. Within each segment the products and the main producers are described and the market volume is estimated. The market volume is given in units of recognizers sold in the years 2000, 2002 and 2003 for Europe and ROW (Rest of World). The following table shows the market volumes for the different segments in the years 2002 and 2003. These years were chosen because that is when the SPEECON databases will first be used to train recognizers for voice driven consumer devices.
Market Segment                Area      Year 2002     Year 2003
Mobile Phones                 Europe    4.300.000     9.900.000
                              ROW       8.400.000    20.000.000
Information Kiosks            Europe       11.000        22.000
                              ROW          11.000        22.000
Audio/Video Devices           Europe      240.000       710.000
                              ROW         600.000     1.100.000
Automotive Devices            Europe      960.000     1.900.000
                              ROW       2.200.000     4.500.000
Toys                          Europe    1.000.000     1.400.000
                              ROW       4.900.000     7.000.000
Personal Digital Assistants   Europe      111.000       236.000
                              ROW         257.000       536.000
Total                                  22.990.000    47.326.000

2 Aim of the Market Analysis

Within the SPEECON project, speech databases will be produced that make it possible to train recognizers for voice driven consumer devices. Given the current state of speech recognition technology, these databases should be representative of the acoustic environments in which the recognizers are used. The best recognition performance is achieved when the acoustic environment of the recorded speech used to train the recognizers matches the acoustic environment in which the recognizers are used. The acoustic environment is mainly determined by the background noise present during use of the recognizer. Typical sources are the noise in public places, cars, living rooms and offices.

Within the project Aurora [1], an industrially guided project to standardize the front end of recognizers, the influence of different environmental noises on the performance of recognizers has been investigated. This ongoing project clearly shows the strong relationship between recognition rates and the characteristics of the different environmental noises. Because of these findings, the aim of the market analysis is to identify those acoustic environments which have high market potential with respect to voice driven consumer devices.

Concerning the market potential, an important question is the time slot for which the market analysis should be done.
Assuming that the SPEECON databases will be ready at the end of 2001, and taking into account the fast development of consumer devices and speech technology, the analysis targets the period 2002-2003. As the market for voice driven consumer devices is just emerging, only very tentative estimates of market volumes can be made.

To achieve the aim of the market analysis, the following four steps are performed:

- Definition of the market segments for future voice driven consumer devices
- Description of the products and producers within these segments
- Determination of the segment-specific market volume of future voice driven consumer devices for the period 2002-2003
- Summary of the market for voice driven consumer devices for the period 2002-2003

[1] http://www.etsi.org

3 Market Segments

3.1 Definition of Market Segments

The market segments for the consumer devices are each defined by a certain common functionality. Within the SPEECON consortium six market segments have been identified which are relevant for the partners. Because the consortium covers a broad range of consumer products, the main market segments for consumer products are treated. Only the market segment ‘home appliances’ is missing. The six market segments investigated are:

1 Mobile Phones
2 Information Kiosks
3 Audio/Video Devices
4 Automotive Devices
5 Toys
6 PDAs

3.2 Description of Products and Main Producers

In the following, existing or planned products are described and the main potential producers are enumerated.

3.2.1 Mobile Phones

In mobile phones currently on the market, speech recognition is typically used for the following purposes:

- Name dialing
- Digit dialing
- Command and control
- Voice activation

The principle of name dialing is that the user can call a person whose name is stored in the phonebook just by speaking the name. Naturally, a voice tag for the name must have been trained in advance by the user, or it must have been obtained in some other way.
This name dialing is the basic feature that every mobile phone with speech recognition has. Digit dialing refers to the recognition of connected digit sequences, i.e., phone numbers, for making the call. In addition to mobile handsets, there are currently also some external car hands-free kits sold with this feature. They are included in the automotive segment in the SPEECON project.

Limited command and control of the phone operations is supported by many manufacturers. Answering a call by voice and creating voice macros for accessing frequently used features of the phone are typical examples. The phone may also wait for a "magic word" (a voice activation keyword) in standby mode. The keyword is used to switch the phone into name dialing, digit dialing, or some other mode.

Voice dialing and voice control of phone operations or dialing functions have long been seen as an attractive method of phone operation in the car environment. Speaker-dependent (SD) name dialing, where the user is prompted for a sample utterance (or samples) of the name in a training phase, appeared in mobile phones as early as the mid-1990s. In recent years, most phone manufacturers have introduced models with voice dialing features. Many mobile phone models already support command and control of the phone features by voice.

In current products, SD technology is mainly used since it is fairly robust and cheap to implement, and it is also language-independent. Moreover, extensive speech resources are not needed in the implementation. The main drawback of SD technology is that users have to train the system with their own voice, which is a burdensome and error-prone step. Speaker-independent (SI) technology, in contrast, needs no user training and is therefore ideally suited for voice control applications. A few products with SI name dialing already exist today, especially in Japan, and more can be expected in the high-end segment in the near future.
In fact, the main reason for the lack of SI technology is the absence of appropriate speech databases in many languages. Consumers expect products to support their own language, which has been difficult to realize so far. Thus, SPEECON is paving the way to a broader dissemination of SI technology.

The following table presents a list of manufacturers that have either announced or are presently delivering mobile phones with speech recognition features [2]. In addition, Audiovox and Kyocera supposedly have models, but no details were available at the time of writing.

Manufacturer   Networks [3]
Acer           GSM, CDMA
Ericsson       GSM
Fujitsu        PDC
Motorola       GSM
NEC            PDC
Neopoint       CDMA
Nokia          GSM, PDC
Panasonic      PDC
Philips        GSM
Samsung        TDMA, CDMA
Siemens        GSM
Telit          GSM

It is evident from the table that most key manufacturers have at least some models with voice dialing, although there are some notable absentees.

[2] The list is based on the information available to the consortium in June 2000. The main objective of the list is to illustrate the broad commitment of the industry to voice dialing. Although care has been taken to make the list as complete as possible, omissions and errors are possible.
[3] The networks column roughly illustrates for which standards mobile phones with speech recognition are provided. No distinction was made between the various versions of the standards that utilize different frequency bands.

3.2.2 Information Kiosks

In contrast to touch-screen based solutions, kiosk systems characterized as multimedia kiosks are considered potential candidates for the incorporation of speech recognition and understanding solutions. Kiosks are rapidly changing from simple proxy information machines to intelligent self-service stations. Automated teller machines and banking, e-mail, weather forecasts, and travel information are among the most important application scenarios, but the most frequently listed feature of multimedia kiosks is (unrestricted) Internet access. The large number of producers offering turnkey system solutions indicates that the number of units sharing common features or functionality is far below that in other market segments, such as the PDA or mobile phone market.

Although apparently not used by any of today’s systems, speech technologies could offer a simple and natural user interface, which is expected to be most critical for a broader acceptance of kiosk systems. Kiosks are typically installed in public places or offices, e.g. in train stations or main halls, and are not assigned to individual end users as mobile phones or PDAs are. Each single device will therefore be used by a large population of end users, which makes speaker dependent (SD) speech recognition irrelevant.

Frost & Sullivan’s Worldwide Interactive Kiosk Market Study, January 2000, makes only one reference to a speech enabled kiosk system. It relates to Philips, who plan to go forward with some capability. Other multimedia kiosk producers are:

Intelligent Kiosk Company - Provides interactive kiosks for a wide range of applications.
Ecotek - Provides Internet enabled multimedia kiosks and web sites for commercial and government applications.
Official Kiosk Group, Inc. - Offers a stand-alone internet machine featuring e-mail, stock quotes, weather forecast, and banking for public internet access.
ARYA Systems Limited - Manufactures and distributes point of information and point of sale kiosks.
Winstanley Associates - Offers multimedia kiosk design, integration and production for clients requiring custom or turnkey solutions.
ADCO Corporation - Products manufactured include telephone enclosures and accessories, interactive multimedia kiosks, drive-up and walk-up ATM enclosures, airport casework and millwork, and architectural structures.
First Impression A/S - A Danish info-kiosk in stainless steel, for public internet access and for exhibitions.
IBM Kiosk Technology Services - Provides interactive multimedia touch-screen kiosks and interactive display systems, kiosk consulting services, standard and custom cabinet enclosures, and remote monitoring services for deployed kiosks.
NCR - Internet-enabled, transactional, self-service terminal providing the high availability, openness and scalability demanded for today's kiosk environments.
Quarta TouchSystems - Information kiosks complete with software and public browser for Internet kiosks.
Siemens Information and Communication Networks - Produces kiosks for customers, including software creation, hardware sourcing, production, installation, monitoring and service.
WebRaiser - International kiosk system integrator, hardware and software solutions. Public internet access for hotels, cafes, hospitals, airports etc.

An independent alliance of kiosk companies providing news, information and current events about public internet access terminals can be found at http://www.kiosks.org

3.2.3 Video/Audio

Current audio and video home products do not use a voice interface. However, some prototypes were presented at recent trade shows and exhibitions: at IFA’99 Thomson CSF presented isolated-word, speaker-independent speech recognition for control of TV and VCR, and more recently Philips announced a speech controlled TV set. It seems that embedded speech technology (owned or offered by OEM partners) will be deployed in home appliances in the next few years.
Main potential applications: The future speech interface to audio and video devices will probably include not only command and control, but also voice-driven content selection (DVB-enabled [4] services and delivery on demand, such as video-on-demand). Depending on the future role of the TV set in the home, other applications are also possible, including almost all of those foreseen for the information kiosk and for home automation per se.

Potential producers: All main consumer electronics producers, chip makers (e.g. already offered by Sensimetrics, Infineon, Philips) or third-party companies (e.g. Cocoonz, VoiceTrek).

3.2.4 Automotive

Within the automotive market three main products can be identified: telephone car kits, voice controlled navigation systems and central voice control devices. In the following, a short description of the functionality of these products is given, followed by an overview of products.

Telephone car kits serve exclusively for hands-free calling while driving. Speech recognition is usually limited to pure dialing features such as direct or name dialing. Even though these features are sometimes fully integrated in the mobile phone, mobile phone manufacturers as well as independent producers offer such equipment. With a telephone car kit, hands-free calling also becomes possible for phones not equipped with automatic speech recognition. An overview of manufacturers and their products is given below.

Manufacturer              Telephone Car Kit
Ericsson                  Advanced Car Handsfree HCA-10
Siemens                   Car Kit Professional Voice
Motorola                  VR Handsfree Car Kit
THB Bury                  THB VoiceDial
Votronic                  Article-No.: 10400
Funkwerk Dabendorf GmbH   Audio Voice

Navigation systems guide a driver to a destination. Usually a number of supplementary services are offered as well, such as detour and automatic rerouting or calculation of the shortest route in time or distance. Guidance to so-called points of interest is also offered. A point of interest may be a hotel or a gas station.
Speech recognition simplifies the use of such equipment. Instead of long-winded input procedures, e.g. for an address, just a spoken instruction is given. Although the benefits of voice control are obvious, voice controlled navigation systems are still the exception on the market. This is due to the large vocabulary which has to be handled for addresses. Today, a common workaround is entering the destination by spelling. The table below gives an overview of competitors on the market.

[4] digital video broadcasting

Manufacturer              Voice controlled Navigation System
Clarion                   Clarion AutoPC
Visteon                   Visteon Telematics Radio (VTR)
Magellan                  750NAV
Pronounced Technologies   AudioNav™
Pioneer                   AVIV 50 S

Central voice control devices are meant as the central instance for voice control in a car. They are designed as integral parts of the car’s man-machine interface. Such devices cannot be offered on the aftermarket; they are pure ‘OEM’ products. Because of this general approach, central voice control devices are the most complex speech recognition equipment in a car. Their use ranges from telephone and climate control to operating the navigation system or internet access. Some products and their manufacturers are listed below.

Manufacturer   Central Voice Control Device
Temic          StarRec Car SDS
Delphi         Communiport® Speech Processing
Visteon        Visteon Voice Technology
Eclipse        Eclipse Commander

3.2.5 Toys

Within the toy market three main product groups may be identified:

1. Emotion oriented products like dolls etc.
2. Speech controlled learning aids (function oriented)
3. Computer games (function oriented)

The first product group is characterized by “invisible” speech recognition. This means that speech technology is perceived during the decision to buy but NOT during the usage of the product. The consumer buys a doll and not a low-cost robot. Typical products are “furby” like game robots.
(http://www.furby.com)

This year several projects have been launched by the dominant toy manufacturers in this segment: Bandai, Hasbro and Mattel.

In contrast to the first product group, the other two product groups are more technically oriented. The consumer focuses on the technical aspects of the product. Typical products in this area are video games etc. Sample products: K2 Interactive & verbal commander speech recognition for PC games from K2 Interactive L.L.C, or Bosconian from Namco.

The toy market is shown schematically in the figure. Three different groups of manufacturers can be identified. The so-called toy allrounders offer all kinds of toys but are not specialized in electronic devices, unlike some toy specialists. The companies in the specialist group are specialized in one product group of the portfolio. Among them, Vtech could be pointed out as a target for speech recognition. The third group contains the electronic giants such as Nintendo or Sony. This group can also be regarded as a target for high-end speech recognition solutions.

[Figure: Selected Global Players in the Toy Market. Axes: Revenue against Product Diversity (Mono-Specialists, Multi-Specialists, Allrounder). Companies and revenues shown:]

Sony*                  6,3 Bil. US$
Mattel                 4,7 Bil. US$
Nintendo               4,7 Bil. US$
Hasbro                 3,3 Bil. US$
Sega                   3 Bil. US$
Bandai                 2,2 Bil. US$
Lego                   1,1 Bil. US$
The Learning Company   900 Mio. US$
Vtech**                400 Mio. US$
Ravensburger           314 Mio. US$
TA Spiel & Freizeit    282 Mio. US$
Geobra Brandstätter    280 Mio. US$
Märklin                163 Mio. US$
Superjouet             109 Mio. US$
Blatz Group            62 Mio. US$
Steiff                 58 Mio. US$
Lehmann                33 Mio. US$

* turnover related to games
** turnover related to the business field Electronic Learning Products
(C) Infineon Technologies 1999

3.2.6 PDAs

The user of a personal digital assistant (PDA) often stores important personal information on his or her device. The main purpose is to have easy access to this information when out of the office.
These personal digital assistants range from palm-size devices to pocket PCs. A PDA has only one user, but that person uses the device frequently. The information this person manages is mainly the daily schedule and the address book with data about his or her contacts. Applications that manage this data should provide easy and efficient access to it.

To query the data through the classic user interface (UI), the user must use the stylus, small keys or other pointing methods to walk through the menus of the various applications. The requested information is returned visually. This process keeps the user’s hands and eyes busy for the duration of the request and response.

The alternative is an application that gives users access to their important information by voice. This approach includes speech synthesis to provide feedback and responses to the user’s commands. The user requests specific information with a spoken command, and the device then speaks a response to the query. Such a device allows more natural and efficient access to the data than a classic agenda. This will become a need for the mobile professional in the global village.

Typical functionality using speech input is:

- Retrieve, speak and/or display the next scheduled appointment.
- Retrieve, speak and/or display the current day’s scheduled appointments and active tasks.
- Enter a new appointment and change its time and date.
- Look up and dial a contact’s phone number by spelling the contact name.
- Preview e-mails and read the sender and subject of each e-mail message from the inbox.
- Create a reply message to the e-mail that is currently being previewed.
- Take a VoiceNote
- Request time and date
- Launch an application

The following table lists producers of personal digital assistants together with the operating system used.
Producer           Product                                          OS
Casio              CASSIOPEIA, Freedio                              Windows-CE
Compaq             Aero 1550, Aero 8000 H/PC Pro                    Windows-CE
Hewlett Packard    Jornada 540, Jornada 820, Jornada 590,           Windows-CE
                   Jornada 580, 620LX, 360LX
Itronix            T5200                                            Windows-CE
Hitachi            e-plate HPW-600ETM, e-plate HPW-600ET            Windows-CE
NEC                MobilePro 780, MobilePro 880                     Windows-CE
Novatel            CONTACT Wireless H/PC                            Windows-CE
Sharp              Mobilon Pro PV-5000, Mobilon TriPad PV-6000,     Windows-CE
                   Mobilon HC-4600
Vadem              Clio, Clio C-1050, Clio C-1000                   Windows-CE
Palm               Palm III, Palm IIIc, Palm IIIe,                  Palm-OS
                   Palm IIIe Special Edition, Palm IIIx,
                   Palm IIIxe, Palm V, Palm VII, Palm Vx,
                   PalmPilot Professional
Psion              Series 5mx, Revo                                 Proprietary
Oregon                                                              Proprietary
Brother            GeoBook NB-80c                                   Proprietary
Franklin           REX Pro                                          Proprietary
Fuga Corporation                                                    Proprietary
Handspring                                                          Palm-OS
REX                                                                 Proprietary
IBM                                                                 Proprietary
TRG                                                                 Proprietary
Vtech                                                               Proprietary

4 Market Volume on Market Segments

In the next sections the size of the market is investigated for each of the six market segments of consumer devices. The size of a market is measured in units of consumer devices using SIR [5] (the SPEECON databases are designed for SIR). Each market segment is treated separately. The following approach was chosen to arrive at an estimate of the market size within each segment. First, the total market size of the segment (in device units, with and without speech recognition) is estimated. These estimates are based on market research studies where available. In a second step, the percentage of devices using a voice driven interface within the segment is estimated. Together these lead to an estimate of the segment-specific market size of voice driven consumer devices.

4.1 Mobile Phones

Apparently no market analysis exists on the importance of the speech recognition feature for mobile phones, perhaps because the field is still relatively young. The mobile phone market has been characterized by competition in, at least, price, size, style, and features.
Network operators often have a significant role in deciding what features the phones will have. Their decisions may be based on slightly different issues than those of an independent consumer. The expectation is that, as most mobile phones are already fairly small and inexpensive, the added value of usability enhancements such as voice control will have an even greater significance. Another issue is that ultra-small terminals, e.g. wrist-watch phones, which will have no keypad at all, will soon become available. In this case, voice control will be more or less mandatory.

The importance of speech recognition in present and future mobile phones is obvious. Most manufacturers already have the feature in their mobile phones today. Some manufacturers have had voice dialing in some form in virtually every product announced recently, although others are a little more conservative. Since both memory and computational resources are becoming cheaper all the time, the assumption is that SI technology will become more popular in the near future. The resources are in fact becoming available in many of today's products. The emergence of several applications, such as real-time video coding in 3G phones, also implies that these resources will be present in most future terminals.

It remains to be seen whether the manufacturers choose to implement SI speech recognition features widely as soon as this is technically feasible, or choose to wait until the end users' acceptance of the technology has been confirmed. In any case, even at a lower growth rate the mobile phone market segment will be very important for speech applications.

Estimate of SI speech recognition units sold: The estimate is based on EMC cellular terminal sales forecasts from 2000 to 2003. The assumption is that x% of the terminals sold have SI speech recognition in the respective year. The following table presents the forecast numbers of units sold, with or without speech recognition.
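The estimation rule just described is a single multiplication: units with SI recognition = x% of the terminals sold in that year. A minimal sketch in Python, using the EMC forecasts and the x values given in Tab. 1 and Tab. 2 of this section. The names `TERMINAL_SALES`, `SI_SHARE_BP` and `si_units` are illustrative and not part of the original analysis, and holding the share in basis points is a choice made here only to keep the arithmetic in integers.

```python
# EMC terminal sales forecasts in units, as given in Tab. 1 of this section.
TERMINAL_SALES = {
    2000: {"Europe": 151_200_000, "ROW": 268_100_000},
    2002: {"Europe": 215_700_000, "ROW": 418_700_000},
    2003: {"Europe": 247_900_000, "ROW": 499_200_000},
}

# Assumed share x of terminals with SI recognition, as listed in Tab. 2,
# expressed in basis points (50 bp = 0.50 %) to avoid float rounding.
SI_SHARE_BP = {2000: 50, 2002: 200, 2003: 400}


def si_units(year):
    """Return the estimated SI-equipped units per region and in total."""
    share_bp = SI_SHARE_BP[year]
    estimate = {
        region: units * share_bp // 10_000
        for region, units in TERMINAL_SALES[year].items()
    }
    estimate["Total"] = sum(estimate.values())
    return estimate


if __name__ == "__main__":
    for year in sorted(TERMINAL_SALES):
        print(year, si_units(year))
```

Changing a single entry of `SI_SHARE_BP` shows how sensitive the segment estimate is to the assumed penetration, which is the most uncertain input of the calculation.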
[5] speaker independent speech recognition

         2000          2002          2003
Europe   151,200,000   215,700,000   247,900,000
ROW      268,100,000   418,700,000   499,200,000
Total    419,300,000   634,400,000   747,100,000

Tab. 1 EMC terminal sales estimate in Europe and the rest of the world [6].

The growth of the segment ‘mobile phones’ in terms of speaker independent speech recognition technology is difficult to predict, but as more resources become available in terminals, manufacturers are expected to utilize the benefits of SI technology more extensively. Tab. 2 gives a first estimate, based on the assumption that x% of the units are equipped with SI speech recognition technology.

         2000        2002         2003
x        0.50%       2.00%        4.00%
Europe   756,000     4,314,000    9,916,000
ROW      1,340,500   8,374,000    19,968,000
Total    2,096,500   12,688,000   29,884,000

Tab. 2 Exemplary estimate of mobile terminals sold with SI ASR features

4.2 Information Kiosks

Market Studies

Frost & Sullivan: Interactive Kiosk Markets (U.S.), Code #5386-74, 1997
Frost & Sullivan: World Interactive Kiosk Markets. The Self-Service Solution for the New Millennium. Code #7199-74, 2000 (excerpts by personal communication)

Market Size

Speech recognition is not yet considered a key feature in kiosk systems, and the available market research studies make no reference to speech recognition and understanding technologies. While dropping hardware prices are constantly contributing to growing sales volumes for kiosk vendors, ease of use is considered critically important for a higher demand for interactive kiosk systems. The end users will ultimately decide whether or not kiosks are feasible, and because of the widespread acceptance of ATMs many vendors believe that interactive kiosks will achieve comparable success. Overall unit shipments for the US market are expected to be 211.800 in 2000, 363.600 in 2002, and 445.000 in 2003.
No particular market inhibitors are known for the European market, and the worldwide kiosk market is predicted to grow exponentially. In the absence of a thorough market analysis on speech and interactive kiosks, the numbers given in Tab. 1 should be read as an example of unit shipments if a small but slightly growing share (3 - 5 percent) of information kiosks starts to incorporate speech recognition capabilities.

[6] EMC World Cellular Database (http://www.emc-database.com/Website.nsf/index/databaseintro), Cellular terminal sales forecasts, June 2000. Europe covers West and East Europe as defined by EMC; ROW covers Africa, Americas, APAC, Middle East, USA/Canada.

         2000    2002     2003     Remarks
Europe   7.200   11.100   22.300   assuming US market size
ROW      7.200   11.100   22.300   US only

Tab. 1 Market size (in units) of voice driven Information Kiosks

4.3 Audio/Video devices

Source: Understanding & Solutions Limited, Digital Consumer Electronics And Home Entertainment Watch:
- Home Video Market Update Report, January 2000
- CTV Market Update Report, March 2000
- Consumer Digital Imaging Market Report Update, February 2000
- Home Audio Market Update Report, March 2000

Apparently no market analysis exists on the importance of speech recognition for the audio/video devices segment. The market is characterized by competition on price and features. For the analysis we identified the following market segments: color TV, digital TV, digital imaging, home audio and home video devices. Each of these groups is again split into subclasses covering the most prominent classes of devices.

For some years the color TV markets in all major Western European countries have been in a mature stage of market development, but Europe has become a beacon for the rest of the world on the path to the development of digital television.
Thanks to well established digital video broadcasting (DVB) standards, Europe is leading the way in developing digital TV broadcasting, especially Digital Terrestrial Television (DTT). Given the great similarities in programming between various digital broadcasters, interactive applications are set to rapidly become a major area of differentiation and competition between broadcasters and service providers.

The camcorder market is expected to continue to grow throughout the forecast period, especially for digital devices.

Following Understanding & Solutions, the home audio market is divided into two distinct sectors: the integrated systems market and the audio separates or components sector. The integrated systems market primarily serves the mass market, i.e. the non-specialist sector of the audio market. The components or separates market is primarily the preserve of the audio enthusiast, who is seeking an optimal audio solution or new devices (Super Audio CD, solid state devices, etc.). The audio market is characterized by increasing demand for emerging products, the arrival of new audio technologies and a rather stagnant traditional audio market.

In the USA and Europe, VCR sales are expected to continue to be positive. Digital-VHS products are now available in both Europe and the USA; however, these products are expected to have little impact on the overall video recording market. This is due both to the commitment of the movie industry to DVD and to the development of other digital recording devices that offer higher levels of functionality. (Cited from Understanding & Solutions Limited, Digital Consumer Electronics And Home Entertainment Watch.)

Estimate of speech interface units sold in 2002 and 2003: The estimate is based on forecasts. The assumption is that x% of the units sold have a speech interface in the respective year. The following Tab. 1 presents the forecast numbers of units sold (in thousands), with and without speech recognition.
The percentage of units with a speech interface for each device class is summarized in Tab. 2.

Market Segment     Area            2000    2002    2003
Color Television   Europe         28291   28430   29020
                   USA + Japan    38700   39980   40750
Digital TV         Europe         11263   15775   17397
                   USA + Japan    10050   13620   23735
Digital Imaging    Europe          3180    3355    3750
                   USA + Japan     7092    8136    8776
Home Audio         Europe         46019   45289   43973
                   USA + Japan    73335   74540   74505
Home Video         Europe         18679   24947   27590
                   USA + Japan    39945   48220   50660

Tab. 1 Units in 1000; audio/video devices with and without speech recognition

Market Segment     Devices
Color Television   4:3 TVs, 16:9 TVs, TV+VCR combos
Digital TV         Integrated digital terrestrial, digital terrestrial set-top boxes, digital terrestrial receivers, satellite set-top, cable set-top
Digital imaging    Camcorders
Home audio         Integrated home components separate amps & rec personal + portables
Home video         VCR + DVD

2002   0.0 0.1 0.0 0.5
2003   0.1 1.0 0.1 1.0 0.2 0.1 0.1 0.01 0.5 0.5 0.5 1 0.05 1

Tab. 2 Percentage of units with speech interface estimated for 2002 and 2003

Combining the percentages given in Tab. 2 with the market size in Tab. 1 leads to an estimate of the market size for audio/video devices as shown in Tab. 3.

          2000      2002        2003    Remarks
Europe       0   235,500     711,900
ROW          0   602,000   1,065,200    US and Japan

Tab. 3 Market size (in units) of voice-driven audio/video devices

4.4 Automotive devices

Market Size

Because no public information about the volume of speech-driven automotive devices is available, the following estimate is based on Temic sales figures up to the end of 1999. By the end of 1999 Temic had sold 100,000 speech recognition devices. Assuming that Temic holds one quarter of the automotive speech recognition market, a total market of 400,000 units in 1999 is assumed. As this is a rapidly growing market, exponential growth, e.g. doubling each year, is presumed. This leads to the following market figures.
             2000        2001        2002        2003
Europe    240,000     480,000     960,000   1,920,000
ROW       560,000   1,120,000   2,240,000   4,480,000
Total     800,000   1,600,000   3,200,000   6,400,000

[in units] See assumption above

Europe is assumed to account for 30% of the market. This share is derived from the estimated global light vehicle production (motor car production) up to the year 2003.

           2000    2001    2002    2003
Europe    16.55   16.41   16.57   16.61
ROW       38.23   39.46   40.86   42.09
Total     54.78   55.87   57.43   58.70

[in million units] Source: DRI 12/1999

As can be seen from both tables, no saturation effects have to be considered. Even in 2003 the total estimated market volume covers only 11% of the global light vehicle market.

4.5 Toys

Currently no public data are available for speech recognition in the toy market. Nevertheless, we have observed an increasing number of speech-controlled toys presented at various toy-related trade shows worldwide. Typically, Europe follows the US and Asian markets. Therefore the estimated revenue in Europe is only about 20% of that in the rest of the world. This does not imply that European companies will not participate in this worldwide success: there are significant opportunities for them as game manufacturers or as technology providers.

We assume that saturation will be reached in about 10 years. Not all projects launched will be successful. Furthermore, we expect the price to go down and the number of successful projects with all types of speech recognition to increase.

                                            2000        2002        2003
Sum of successful projects worldwide           4          26          50
Europe                                   200,000   1,050,000   1,400,000
ROW                                    1,000,000   4,875,000   7,000,000
Total                                  1,200,000   5,925,000   8,400,000

Tab. 1 Toys (in units) with speech recognition (see footnote 7)

7 Infineon internal market research

4.6 PDAs

The estimated market data for these handheld devices by region are taken from the Dataquest market study of June 1999.
Handheld Computer Shipments By Platform And Region, 1998 To 2003 (Thousands of Units)

Region / Platform      1998     1999     2000     2001     2002     2003   CAGR (%) 1998-2003
United States
  PalmOS              1,206    1,635    2,037    2,710    3,521    4,220    28.5
  Windows CE            495      764      986    1,331    1,720    2,088    33.3
  EPOC                   77       72       59       57       53       49    -8.7
  Others                353      217      153      142      146      137   -17.2
Western Europe
  PalmOS                256      477      703    1,074    1,515    1,923    49.7
  Windows CE            181      442      734    1,018    1,357    1,780    57.9
  EPOC                  398      539      573      588      629      744    13.3
  Others                139      152      139      156      193      279    15
Japan
  PalmOS                  8       37      104      210      370      614   138.3
  Windows CE             66      229      423      647      958    1,324    82
  EPOC                    -        5       13       21       30       36    NA
  Others                170      221      193      198      181      155    -1.8
Asia/Pacific
  PalmOS                 32       66      108      167      244      336    60
  Windows CE             42      110      200      310      442      618    71
  EPOC                   26       40       45       52       62       80    25.2
  Others                 66       78       57       39       33       32   -13.6
ROW
  PalmOS                104      168      218      283      362      469    35.2
  Windows CE             19       65      109      177      255      368    80.5
  EPOC                   23       41       57       75       95      126    40.7
  Others                 66       77       84       83       93       68     0.4
Total World
  Palm OS             1,606    2,383    3,170    4,444    6,012    7,562    36%
  Windows CE            803    1,610    2,452    3,483    4,732    6,178    50%
  EPOC                  524      697      747      793      869    1,035    15%
  Others                794      745      626      618      646      671    -3%
Grand Totals by Year  3,727    5,435    6,995    9,338   12,259   15,446    33%

Source: Dataquest (June 1999)

Adding the PDA units for the different platforms leads to the market size shown in Tab. 1.

                      2000        2002         2003
Western Europe   2,149,000   3,694,000    4,726,000
ROW              4,846,000   8,565,000   10,720,000

Tab. 1 Units of PDAs with and without speech recognition

[Chart: percentage of units sold featuring speech technology, 2000 to 2003; the values consistent with Tab. 1 and Tab. 3 are 0% in 2000, 3% in 2002 and 5% in 2003]

Tab. 2 Estimated percentage of speech-enabled PDAs over time

Based on the estimated percentages given in Tab. 2, the estimated market volume of voice-driven PDAs is given in Tab. 3.

          2000      2002      2003   Remarks
Europe       0   110,820   236,300   West Europe
ROW          0   256,950   536,000

Tab. 3 Market size (in units) of voice-driven PDAs

5 Overview of Market Volume and Conclusions

Tab. 1 gives an overview of the different market segments. Because this market is only just emerging, the figures are very rough estimates, as explained in chapter 4. Nevertheless, they give an impression of the market potential.

Market Segment           Area     Year 2000   Year 2002    Year 2003   Comments
1. Mobile phones         Europe     750,000   4,300,000    9,900,000
                         ROW      1,300,000   8,400,000   20,000,000
2. Information kiosks    Europe       7,000      11,000       22,000
                         ROW          7,000      11,000       22,000   US only
3. Audio/video devices   Europe           0     240,000      710,000
                         ROW              0     600,000    1,100,000   US & Japan
4. Automotive devices    Europe     240,000     960,000    1,900,000
                         ROW        560,000   2,200,000    4,500,000
5. Toys                  Europe     200,000   1,000,000    1,400,000
                         ROW      1,000,000   4,900,000    7,000,000
6. PDAs                  Europe           0     111,000      236,000   West Europe
                         ROW              0     257,000      536,000

Tab. 1: Market volume of voice-driven consumer devices. The market volume is given in units of devices.

Clearly the mobile phone segment dominates the market, followed by toys and automotive devices. With respect to the specification of the SPEECON databases, mobile phones require recordings in noisy public environments, home environments and rather quiet office environments; toys require recordings of children; and automotive devices require recordings in the car. These issues are addressed in the other deliverables of WP1 and WP2.
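
The automotive and PDA rows of the overview follow from simple arithmetic on the assumptions stated in chapter 4, and can be reproduced with a short calculation. The following Python sketch is illustrative only and not part of the original analysis; the function names are hypothetical. It assumes the 1999 automotive base of 400,000 units, doubling each year, and a 30% share for Europe, as stated in section 4.4. The PDA shares used in the example calls (3% in 2002, 5% in 2003) are inferred from the ratio of voice-driven PDA units to total PDA units in section 4.6.

```python
def automotive_units(base_1999=400_000, europe_share_pct=30,
                     years=(2000, 2001, 2002, 2003)):
    """Project automotive speech recognition units per year.

    Assumes the market doubles every year from the 1999 base and that
    Europe accounts for a fixed percentage of the total (section 4.4).
    """
    projection = {}
    for year in years:
        total = base_1999 * 2 ** (year - 1999)      # doubling each year
        europe = total * europe_share_pct // 100    # Europe's share
        projection[year] = {"Europe": europe,
                            "ROW": total - europe,
                            "Total": total}
    return projection


def voice_driven_units(total_units, share_pct):
    """Units with a speech interface, given total units and a share in percent."""
    return total_units * share_pct // 100


if __name__ == "__main__":
    for year, row in automotive_units().items():
        print(year, row)
    # PDA examples; the 3% and 5% shares are inferred, see the text above.
    print("PDAs Europe 2002:", voice_driven_units(3_694_000, 3))
    print("PDAs ROW 2003:", voice_driven_units(10_720_000, 5))
```

Integer arithmetic is used so that the results can be compared with the unrounded figures in chapter 4; the overview table rounds these values.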