Abstract

Acknowledgement

Table of Contents

Chapter 1 Introduction

1.1 Motivation

The invention of the mobile telephone around 34 years ago was a big step towards mobility. It gives users a great degree of freedom to move around while talking on the phone: they can communicate with other users at any time and from anywhere. Mobile phones, however, were not enough. The same degree of mobility was needed for Internet users, so that they would not have to stay at their stationary computers whenever they needed to be connected to the Internet. The Wireless Local Area Network (W-LAN) made it possible to achieve a reasonable degree of mobility in a building environment. This is achieved by installing a number of Access Points (AP) on an existing wired network and a wireless network card in the mobile device, such as a laptop computer or a PDA. A W-LAN already exists at the ITU, and the facility of positioning or locating devices is provided by the Ekahau Positioning Engine. These technologies made it possible to build various interesting location based applications at the ITU. Good examples of such applications are the Messaging Services, Find Friends and other applications developed by students at the ITU. A newer concept that is attracting more attention is the combination of speech with the existing location based applications running on wireless devices. A number of applications in this area have already been implemented, and these projects have shown good results both with the IBM ViaVoice speech engine (a commercial product) and with a speech recognizer developed in the Java programming language by students at the ITU. These projects build the foundation for, and attract attention to, further research in this area. They have proven the usability and importance of human-computer interaction both for system control and for speech dialogs. Having all these basic technologies available, and given my interest in this field, I found it a good opportunity to extend the existing applications and provide more freedom and mobility to the users. The Location Based Interactive Speech System (LBISS) is planned to be fully controlled by speech.

1.2 Problem Definition

The fundamental concept of this project is to combine speech technology with positioning technology and to build an application that is fully controlled by speech. The application will be divided into a client part and a server part. The client part will run on a mobile device such as a laptop computer, or in the future on a Personal Digital Assistant (PDA); for now a laptop is used because PDAs do not support speech input at the moment. The speech recognizer engine together with the speech synthesizer engine will be placed on the client side of the application. The synthesizer part of the speech engine will be responsible for speaking the synthesized speech to the client, providing the client with the information intended for them. This covers the larger part of the dialog between the client and the application. In order to increase the speed and functionality of the application and to prevent errors, the system will also, through synthesized speech, provide the key words (a sort of instructions and choices) relevant to the information the system can provide to the client. The recognizer will then listen for the user to say these words and recognize them. These words will be limited to the words in the grammar file used in the system, but will be sufficient to make the dialog meaningful and complete.
The number of words in the grammar file depends solely on the choices and the selection of services that the system provides to the client. These recognized words can vary from browsing commands and control commands to service request commands. The server side, on the other hand, will be running on a stationary server. When the user connects to the system, the server side of the application will first determine the location of the user. This is done through the application's connection to the positioning engine. As soon as the user signs in by providing a user name and password, he will be authenticated by checking the registry of registered users. A separate XML file is provided for registering the users. At this moment LBISS is defined to register only staff and students of the ITU, but in a real-life application the categories of registered users could be expanded. According to the location and the identity of the user, the application will then start the relevant dialog with the user. This means that the content of the dialog will be filtered according to the user's location and identity. In this research and development phase the application has only two services, which could, with almost no modification, be expanded to a variety of services. The first type of service delivers information, via a speech dialog, about the services provided in a specific location defined in LBISS (i.e. an office such as a reception, the students' administration, a computer lab and so on). At this moment, for test purposes, only the Reception and the Exam Office are defined in LBISS, so any user connected to LBISS will get relevant and pre-defined information about these locations. This information is also filtered by the user's identity, which means that a student registered in LBISS will get different information than a guest who is not registered in LBISS. The second type of service consists of Reminders and Event News, which do not solely depend on location. Any employee registered in LBISS can set their own time-based reminders. LBISS will check the time of the reminder continuously and speak the contents of the reminder to the user as synthesized speech. Event News is institution-specific information, in the scope of LBISS ITU-specific information, intended for all users excluding guests. This could replace newsletters and group mails; for example, news about the annual party to which all students and staff of the ITU are invited could be delivered by this service. Just after the log-in phase LBISS will check whether there is any event information, and if there is, LBISS will communicate it. As mentioned before, this service is location independent and will be communicated to the user as soon as he connects to the system, regardless of his location. All location dependent services and information fall into the first category of services. In this development and research phase, LBISS is defined only for two locations at the ITU, namely the Reception and the Exam Office, and only a couple of students and staff members are registered, but it could, with very little or almost no modification, be used in a variety of areas such as offices, museums, supermarkets and schools to provide users with localized audio information based on their current location.
The greatest advantage of this application is that the user does not have to monitor the screen of the mobile device or take any other action to extract the information. All the user has to do is communicate with the system by speech through the microphone, and he has the freedom to move around hands-free. The content of the information can very easily be changed and updated according to the area where LBISS is used.

1.3 Contribution

Since the installation of the W-LAN and the Ekahau positioning engine, students at the ITU have developed a good number of location based applications. These applications range from sending SMS to finding friends, audio streaming, multiplayer games, finding the shortest path and so on. Each application targets a specific group and explores a different aspect and usability of the W-LAN and the positioning engine. A course project, "Tracking the Position of Mobile Clients by Mobile Phones", developed by myself and Hilmi Olgun, explores the use of the W-LAN and the positioning engine together with Java 2 Micro Edition (J2ME) to view the location and movements of a targeted connected mobile user on a map of the area plotted on the screen of a mobile phone. Based on the scenario of the project, this application is of particular importance in a large targeted area where one would like to locate a friend or a co-worker. The application offers a great degree of mobility and a good degree of accuracy. The multiplayer game, audio streaming and other applications mentioned above are examples of good applications built on the W-LAN and the positioning engine, but I will leave it to the reader to explore them.1 Finding the shortest path is another good example of such applications, where the mobile device user provides the application with a destination name and the system calculates and finds the shortest path to that location.

1 Descriptions of location based applications developed by students at the ITU can be found at: http://positionserver.itu.dk:8000

In addition to the above applications there have also been a number of projects where students have explored, experimented with and developed applications combining the W-LAN, the positioning engine and speech technology. As examples I will mention "User Authentication and Information Exchange by Speech and Location Information in Mobile Systems" (UA), a Master's thesis by Emad El-Dean El-Akehal, and "Position Dependent Communication System" (PDCS), a Master's thesis by Thomas Lynge. In UA, Emad El-Dean El-Akehal has built a recognizer in the Java programming language which identifies and authenticates users based on their voice. This voice biometric is an excellent means of authentication that eliminates the chance of an opponent stealing one's identity. Furthermore, UA uses its speech recognizer for browsing commands and service requests. The biometric identification is again used to filter the services to the users. This means that, as a result of the identification, users of different categories (i.e. registered and unregistered, or authorized and unauthorized) will get different services according to their identity and location. UA provides two services at this stage, namely Messaging, where a user can leave or get messages in a specific location at a specific time, and Find Friend, where a user can track and find the location of a registered friend. PDCS, on the other hand, uses the IBM ViaVoice speech engine. This engine has both a recognizer and a synthesizer, so the users both listen to and speak with the system.
PDCS provides a shortest path service. In this application the system gets the current location of the user from the positioning server, and the user is required to say (by speech) the name of the destination. The system calculates the shortest path to the destination, guides the user (by synthesized speech) along the path and notifies the user when he arrives at the destination. LBISS, on the other hand, uses the same technologies to build a fully speech controlled system. All the browsing commands, control commands and service requests are accomplished by the user's speech. The only area where the user needs keyboard input is the log-in process, where the user has to enter a user name and password. LBISS provides a location based interactive information system where, when the user arrives at a predefined location, LBISS starts a speech dialog with the user. An example dialog could be the following:

LBISS: Welcome. This is reception. We provide information about Meetings and Admission. What do you require?
User: Meeting
LBISS: Who do you have a meeting with?
User: John

In this example dialog the system compares the user name, retrieved from the log-in text box, against the list of meetings for John and checks the date and time of the meeting. LBISS then replies to the user whether he has a meeting or not. In the same manner LBISS provides localized information to the user, in a fully spoken dialog, about any location that is defined in LBISS. The volume of dialog between the user and LBISS depends on the amount of information provided to the system that should be conveyed to the user. In the case of reminders, LBISS authenticates the user in the log-on phase. If the user is a registered user and has a reminder allocated, LBISS will first check the reminder time against the current time and notify the user at the exact time registered in the reminder. To the best of my knowledge while writing this paragraph, this is the first fully speech controlled application experimented with and developed at the ITU. In its experimental and development phase LBISS uses no security protocols to protect the user name and, in particular, the password. In future work, the voice biometrics of UA could be combined with LBISS, or an encryption method could be used to ensure secure exchange of the user name and password. On the other hand, if we recall the dialog above, one could guess that it would be very useful if the user were then guided to the room of the person he has a meeting with. In this particular situation, and in similar situations, PDCS could be combined with LBISS as a next step. As a result LBISS would authenticate the user, listen to the speaker's requests and finally, if the user needs to find a particular area (such as a particular room), the system would use the shortest path method of PDCS. It is a great advantage that both LBISS and PDCS use the same basic technologies and could be integrated with very little modification. As a closing remark for this section I would like to say that the combination of all three projects, LBISS, UA and PDCS, would result in an ideal application for any type of location based speech communication system. It is very easy to see that, by doing so, all the aspects of a useful and realistic location based speech application could be covered.
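To make the meeting check in the example dialog more concrete, the following is a purely illustrative sketch of the kind of lookup involved. It is not taken from the LBISS source code, and all class and field names (Meeting, MeetingRegistry and so on) are hypothetical.

import java.util.Date;
import java.util.List;

// Hypothetical sketch only, not the actual LBISS implementation.
class Meeting {
    String attendee;   // the registered user who has the meeting
    String host;       // the person the meeting is with, e.g. "John"
    Date   start;      // date and time of the meeting
}

class MeetingRegistry {
    private final List<Meeting> meetings;

    MeetingRegistry(List<Meeting> meetings) {
        this.meetings = meetings;
    }

    // Returns a sentence for the synthesizer to speak back to the user.
    String lookup(String userName, String host, Date now) {
        for (Meeting m : meetings) {
            if (m.attendee.equalsIgnoreCase(userName)
                    && m.host.equalsIgnoreCase(host)
                    && m.start.after(now)) {
                return "You have a meeting with " + host + " at " + m.start + ".";
            }
        }
        return "I could not find a meeting with " + host + " for you.";
    }
}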
1.4 Report Guide

This report consists mainly of two parts: Part 1 and Part 2.

Part 1: This part covers the introduction to the report and a short but general discussion of the basic technologies used in LBISS.

Part 2: This part is LBISS specific and covers all aspects of LBISS, including its programming, the specific technologies used, tests and results, and finally the conclusion.

The whole report is divided into 6 chapters.

Chapter 1: I assume that by now you have almost finished chapter 1 of this report. The main focus of this chapter is the motivation, the problem definition and finally the contribution of LBISS to the existing research and work in this area.

Chapter 2: This chapter walks you through the basics and fundamentals of the technologies used in LBISS. You will find general information about these technologies; the topics are not directly related to the specific points of the technologies as used in LBISS.

Chapter 3: This is where LBISS specifically begins. It goes through the design steps of LBISS and the specific points about the technologies used in LBISS, and to some extent argues why they were selected.

Chapter 4: This chapter goes specifically through the steps taken in the programming part and explains how each component of LBISS was designed and implemented in the programming language. Programmers, in particular those who would like to develop LBISS further, should find this section very interesting and useful.

Chapter 5: This chapter covers the different tests performed on LBISS, including source code tests, usability tests and so on. It also describes the results of the tests together with the areas of LBISS that need improvement.

Chapter 6: Finally, this chapter covers the conclusion about the work done in LBISS. It is followed by the reference list and the appendices.

Chapter 2 Basic Technologies

The main technologies for this project, as mentioned in the problem definition section, are positioning technologies and speech technologies. This chapter goes through the basics of these technologies together with XML, which is used here to store the data required for LBISS. But prior to that I would like to explain briefly which specific speech and positioning technologies are used in LBISS. LBISS is a fully speech controlled system and therefore needs both a speech recognizer and a speech synthesizer. It is possible, but a really demanding task, to build both a speech synthesizer and a speech recognizer with a large vocabulary that is also able to work efficiently in a noisy environment. Besides, that is not the scope of this project. Instead LBISS uses an existing speech engine, IBM ViaVoice, which has both a speech synthesizer and a speech recognizer. The recognizer part of the engine has a large vocabulary in both American and British English along with a couple of other languages. LBISS is configured to use American English, which fulfils all the demands of this project. To use the IBM ViaVoice engine, the Java Speech API (JSAPI) is required. The Java Speech API is a software interface developed by Sun Microsystems and supported by IBM ViaVoice, which makes it possible to use both the speech recognizer and the speech synthesizer of IBM ViaVoice. The speech synthesizer in LBISS is also configured to work with American English. To make the speech output of the synthesizer sound more natural and human, LBISS uses the Java Speech Markup Language (JSML) in its default messages.
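To illustrate, the following is a minimal sketch of how a synthesizer could be created through JSAPI and asked to speak a JSML-marked-up message. It assumes a JSAPI-compliant engine such as IBM ViaVoice is installed; the welcome text is only an example, not the actual LBISS default message, and the exact JSML element names and casing should follow the JSML version supported by the engine.

import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class SpeakWelcome {
    public static void main(String[] args) throws Exception {
        // Create and start an American English synthesizer.
        Synthesizer synth =
            Central.createSynthesizer(new SynthesizerModeDesc(Locale.US));
        synth.allocate();     // reserve the engine resources
        synth.resume();       // leave the paused state

        // JSML markup: emphasis on the greeting and a short pause.
        // (Illustrative message only.)
        String jsml = "<EMP>Welcome.</EMP> This is the reception. <BREAK/> "
                    + "We provide information about meetings and admission.";
        synth.speak(jsml, null);   // null = no SpeakableListener

        synth.waitEngineState(Synthesizer.QUEUE_EMPTY);
        synth.deallocate();
    }
}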
The second technology used in LBISS is positioning. LBISS takes advantage of the existing Ekahau Positioning Engine that is installed at the IT University, where LBISS will be tested. The Ekahau positioning engine can be accessed through a Java interface, which makes it possible to track and report the position of a mobile device connected to the system. LBISS uses XML for storing data relevant to the system. SAX and DOM parsers are used in order to access and read the data from the XML files. DOM is specified by the World Wide Web Consortium (W3C), while SAX is a de facto standard that originated in the XML developer community; both are discussed later in this chapter.

2.1 Positioning Technologies

Using wireless technology simply means greater mobility and freedom. Of course the degree of mobility and freedom to move around depends entirely on the application and the technology being used. The use of wireless technologies in turn gives rise to new concepts and new applications, particularly when positioning is the subject. Once the position of a mobile device is determined, a variety of location based applications can be built on top of the wireless technologies. In this case the mobile user will receive position dependent information. Thus, positioning plays an important role in today's and future wireless applications, as it is an excellent filter for providing localized information. This increases the value and relevance of information and services and therefore increases usage. This is of particular importance when the mobile device user (with a location based application) is moving in information-rich domains, for instance a supermarket. The position dependent application will then filter and select the flow of information according to the interests of the mobile user. Now let us ask ourselves a simple question: what is positioning? Positioning (location detection) is a functionality which detects the geographical location of a physical object; in this context we are interested in a mobile device, which could be anything from a cellular telephone to a PDA, a laptop computer or even a moving car. Among these, cellular technology, due to its wide acceptance and coverage, has attracted significant interest in the tracking industry. It could be said that for applications with infrequent position reporting rates that also require voice communication, a cellular system is ideal. But there is still the possibility of building attractive applications with the other devices named earlier. According to their functionality, positioning technologies can be generally classified as outdoor and indoor positioning [1].

Outdoor technologies: Wide-area positioning systems which may use an earthbound mobile network or a satellite navigation system. GPS is one of the largest, internationally used and recognized systems for outdoor positioning.

Indoor technologies: These systems mostly use the same principle, with sensors inside the building receiving the signal of the wireless device and determining its position by performing some algebraic calculations.

The systems performing indoor and outdoor positioning can be generally classified into the following groups:

- Cellular Network-Based Methods
- Space-Based Radio Navigation Systems
- WLAN and Short Range Connectivity Systems

2.1.1 Cellular Network-Based Methods

Cellular network-based positioning methods have gained high interest in today's technologies. They have been implemented and are running in a number of countries, for example Japan and the USA.
These applications provide a variety of services, such as those listed in the previous table. Within this positioning class there is a subset of methods, such as:

- Cell ID
- CI + Timing Advance
- Time of Arrival
- etc.

Each method provides a different degree of positioning accuracy, but it can be said that in this class of positioning the accuracy generally depends on the size of the cell to which the mobile station belongs at the time of the position estimation. As an example, the Cell ID method of positioning will be explained briefly here.

Cell ID

Cell ID positioning, a pure network-based method, is the oldest and simplest way to locate a mobile station. In this technology all operators know where their cells are. Each cell has an ID, and when a mobile device is connected to a particular cell, that cell ID represents the location of the mobile station. The estimated location of the mobile station is calculated as the mass center of the geographical area covered by the cell. The position estimation in this method is not accurate enough for many location based services, since the size of the cell can exceed many kilometers. In current GSM networks the cell size typically varies from 200 m to 35 km, which is the main factor in the positioning accuracy. To provide more accurate position estimation, other cellular positioning methods are used, though in the future CI may provide an accurate way to locate the MS in environments with very small cell sizes [1]. The following figure shows an example of CI + Timing Advance, which provides a much better position estimate than the pure CI method. Any individual cell in this figure can be seen as a pure CI method, from which it is clear that even if the mobile device is at the extreme edge of the cell, the CI method will estimate its position as the mass center of the cell [1].

2.1.2 Space-Based Radio Navigation Systems

Another widely used positioning technology is the Space-Based Radio Navigation System. This technology has global coverage and the positioning is generally done by the use of satellites. Since it involves satellites, it is considered very expensive to set up and keep running; therefore, not many systems of this kind currently exist. So far, there are only three Space-Based Radio Navigation Systems [1]:

Global Positioning System (GPS): Developed and maintained by the US Department of Defense and fully operational since April 1995. Commercial receivers are widely available on the market.

GALILEO: A European system that is still under construction and is planned to be operational in 2008.

Global Navigation Satellite System (GLONASS): A Russian system that has been operational since January 1996, but is not fully available today because of economic problems in Russia.

We will explain the Global Positioning System in detail and leave the other two systems to interested readers.

Global Positioning System (GPS)

Global Positioning System (GPS) navigation, an all-weather, worldwide precise positioning system, has become an integrated part of wireless data communication on land, in the air and at sea. Starting out as an exclusively US military navigation system in 1978, it has been fully operational since 1995, even for civil use. Small, inexpensive GPS navigators have been available for years now. Satellite navigation has two major advantages over its earthbound alternative: the positioning accuracy is very high, and it is publicly available to anyone since no telecom operator owns it.
Instead it is controlled by the Pentagon, which could possibly prove to be an even worse alternative if they decide to shut it down. To meet this possible risk, the EU is working on setting up a parallel global satellite navigation system, GALILEO. The fundamental concept in GPS applications is that one can determine fairly accurately the location of any device that has a GPS receiver mounted inside it and a clear view of the sky. This determination of location is facilitated by a series of satellites. GPS basically consists of 3 segments.

The Space segment: This part consists of the GPS satellites, launched into specific orbits with orbital periods of 12 hours. There are currently around 24 satellites orbiting the Earth.

The Control segment: This part consists of the master control station at Colorado Springs along with a few other stations positioned around the globe to monitor and maintain the condition and orbits of the GPS satellites.

The User segment: This consists of the military and commercial users who use GPS receivers to track signals from the GPS satellites. These devices use a triangulation technique to find their position on the surface of the Earth. A standalone GPS receiver is capable of giving an accuracy of about 100 m.

The satellites are synchronized to transmit encoded navigational information in bit streams which contain the satellite ID, GPS time and some other parameters. Any device equipped with a GPS receiver will intercept these transmissions and use a simple mathematical formula derived from triangulation (triangulation in the case of GPS means collecting signals from three or more satellites in carefully monitored orbits, from which the receiver computes its own spatial relationship to each satellite to determine its position). The fundamental task is to calculate the distance between the mobile station and the satellite. In mathematical terms this can be expressed as follows [4]:

Pj = sqrt( (xj - xu)^2 + (yj - yu)^2 + (zj - zu)^2 ) + c * tu

where
Pj is the measured range (pseudorange) to satellite j,
tu is the user device time offset from satellite time,
c is the speed of light,
(xj, yj, zj) are the coordinates of the satellite, and
(xu, yu, zu) are the coordinates of the user device (mobile station).

The receiver should detect signals from at least four satellites to determine a 3-D position estimate, though signals from three satellites are enough to obtain a 2-D estimate. As mentioned before, the mobile station should have a clear view of the sky. This means that GPS does not provide good coverage in urban areas and indoors, because the GPS signal strength is too low to penetrate a building [6]. In order to use GPS efficiently for indoor and urban positioning, some indoor GPS solutions are applied, and mobile phones with GPS devices are available to provide indoor coverage. Among the solutions is the use of Assisted GPS (A-GPS). In this case the positioning system consists of a targeted wireless device with a partial GPS receiver, an A-GPS server with a reference GPS receiver that can receive signals from the same satellites as the mobile device, and a wireless network infrastructure that connects the mobile device and the A-GPS server. The GPS receiver of the server has a clear view of the sky and can hence receive signals directly from the satellites.
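Before continuing with assisted GPS, a small numerical sketch of the pseudorange relation above may be helpful. The coordinates and clock offset below are made-up illustration values, not real ephemeris data.

public class PseudorangeDemo {
    static final double C = 299792458.0;   // speed of light in m/s

    // Pj = sqrt((xj-xu)^2 + (yj-yu)^2 + (zj-zu)^2) + c * tu
    static double pseudorange(double[] sat, double[] user, double tu) {
        double dx = sat[0] - user[0];
        double dy = sat[1] - user[1];
        double dz = sat[2] - user[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz) + C * tu;
    }

    public static void main(String[] args) {
        double[] satellite = {15600000.0, 7540000.0, 20140000.0}; // metres
        double[] receiver  = { 3770000.0,  600000.0,  5110000.0};
        double clockOffset = 1e-6;  // 1 microsecond receiver clock offset
        System.out.printf("Measured pseudorange: %.1f m%n",
                pseudorange(satellite, receiver, clockOffset));
    }
}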
The A-GPS server uses its reference GPS receiver to continuously gather information relevant to the mobile device, and then transmits that assistance information over the wireless network to the mobile device in order for the mobile device to find a more accurate position [5]. Moreover, future GPS satellites will broadcast three civil signals rather than just one. For the most part, civil use of GPS today is based on only one frequency, fL1 = 1575.42 MHz. In the future, two new signals at fL2 = 1227.60 MHz and fL5 = 1175 MHz will be available, beginning around 2003 or so. They will provide so-called frequency diversity, which helps to mitigate the effect of multi-path: multi-path may interfere with one signal, but is much less likely to degrade all three signals simultaneously. In addition, the new signal at fL5 has been designed to be more powerful and to give better performance in multi-path environments [7]. Finally, GALILEO will contribute to indoor positioning. The signals from Galileo are being designed with use in cities and indoors as prime objectives. Galileo and GPS will be two components of a worldwide Global Navigation Satellite System (GNSS). Taken together, they will virtually ensure that several satellite signals are available even to users in tough environments [7].

2.1.3 WLAN and Short Range Connectivity Systems

The Internet has become mainstream, with millions of households wired, though for many, accessing the Web means staying close to a PC. But times are changing, and in the very near future Internet access will look much different, transformed by the phenomenon of the wireless Web. Just like cell phones, the wireless Web promises to make Internet access available whenever and wherever we need it. In fact the mobile phone phenomenon has set the stage for widespread deployment of wireless Web services. Today's installed base of cell phones can become the most widely available platform for mobile Internet services, because enabling a mobile phone for Internet access is a low cost software upgrade. For indoor applications, Wireless Local Area Networks (WLAN), Short Range Connectivity Systems (e.g. Bluetooth) and other Radio Frequency Location Systems (RFLS) offer interesting solutions. WLAN access points can be set up and identified very easily in any office, mall, airport or other point of interest. This technology is not very expensive because positioning can be offered as an additional service built on existing networks with practically no extra cost. The only remaining item needed for positioning is the installation of a positioning server integrated with the W-LAN. The task of the positioning engine in this case is to collect and provide the location information, through the access points, to any application that provides location based services. As mentioned before, many technologies, both indoor and outdoor, exist to collect such data. GeoMode, for instance, is used outdoors to collect the location information of mobile devices. GeoMode, recently developed and successfully trialled by Digital Earth System researchers, demonstrates an accuracy of less than 20 meters and can be installed on a Serving Mobile Location Center, a Gateway Mobile Location Server or even a GeoMode ASP Server.
It retrieves the location data that is sent from the mobile station to the base station on a regular basis and turns it into a format useful for any location based service application.2 The Ekahau positioning engine, on the other hand, developed by the Complex Systems Computation Group, is used for indoor positioning. It is based on calibration, where each sampled point contains the received signal strength intensity (RSSI) and the related map coordinate, stored in an area-specific positioning model, and the engine returns location coordinates (x, y, floor). The greatest advantage of the Ekahau positioning engine is that it is a software-only positioning server that requires no hardware beyond a standard 802.11 network. It requires at least a Pentium II, 128 MB RAM and 200 MB of disk space and works with Windows 2000/XP; it also works with other platforms such as Linux. The new TCP based Yax protocol makes Ekahau support any platform and programming language, and hence makes Ekahau available and easy to integrate into any application.3 The only limitation that W-LAN based positioning systems are facing is the current shortage of IP addresses. This problem will remain unsolved until the IP version 6 protocol is deployed. In the meantime it is also planned that, in the future, subscribers may have the option to use multiple devices with a single IP address. For example, a subscriber may use four different devices in her home office with one base station (one "address" that goes out of the building). All the devices talk to the base station, the base station transmits to the rest of the world, and she receives one monthly bill.

2 From: http://www.geomode.net
3 From: http://www.ekahua.com

2.1.4 Location Application Services

The main goal is to deliver specific information to the mobile terminal according to the geographical location of the mobile device. This is normally based on knowing the position of a user with their mobile terminal. These services can be delivered through a wide range of devices, including wireless phones, PDAs, laptop computers, devices attached to other movable items such as people, packages and in-vehicle tracking devices, and other types of mobile terminals. There is a wide range of information that can be delivered to the end-user application. The following points give a short summary of the information according to the geographical location and the needs and requirements of the mobile user.4

Positions: Fixed locations, expressed in terms of coordinates, positions on a map, named places, and so on.
Events: Time-dependent incidents at one or more locations.
Distributions: The densities, frequencies, patterns and trends of people, objects or events within a given area or areas.
Service Points: Points of service delivery. May also differ according to the interests of the user.
Routes: Navigational information expressed in terms of coordinates, directions, named streets and distances, landmarks, and/or other navigation aids.
Context/Overview: Maps, charts, three-dimensional scenes or other means of representing the context and relationships between people, objects and events for a given area or areas.
Transactions: Transactions for the exchange of goods, services, securities, etc.; trading services, financial services.
Sites: Characteristics of a given site.

2.1.5 Accuracy

Depending on the application and the area of use, high attention should be paid to the accuracy of the location in order to offer a good and reliable service.
For most applications to perform well according to the needs of the users, a lot of research has been done, and more is under way, to find efficient ways to determine a more accurate and precise location of the mobile user. For instance, for an indoor positioning application in a museum a very high accuracy is required, since the relevant content can change within centimeters. On the other hand, for an outdoor application where the target is the location of a passenger bus on its way from Paris to Copenhagen, an accuracy in the hundreds of meters would be acceptable.

2.1.6 Privacy

Location dependent services rely on awareness of the end-user's location in the mobile network in order to provide location relevant content and services. For example, a map can be displayed on a multimedia enabled mobile terminal, showing not only the actual location of the user but also that of other mobile users. This is of great advantage in many cases where privacy is not an issue. Examples are:

- A university campus, where the positioning system shows the location of classes, meeting rooms, the library, administration offices and so on.
- A firm, where the mobile workers need to be aware of the location of other mobile workers or equipment in order to monitor all the processes.
- A museum, a very good application area but one requiring a very high accuracy, in the range of centimeters, where visitors carry mobile units and the application provides information according to their location (for example, when they stand in front of a painting the application will provide information about that painting based on their location). The same kind of application could be used to give tourist information in a city; note that an accuracy of tens of meters would be acceptable for this outdoor application, and the use of GPS could be a good option here.

4 This section is taken from: Location Services for Mobile Terminals, Samuli Niiranen, DMI/TUT, stn@cs.tut.fi

On the other hand, if the network shows both the location of the user and that of the other mobile users, this might not be fully acceptable to the local population. The reason is that the mobile users would never have any privacy and secrecy, because their location could always be seen by other users. A location has a context: it not only gives coordinates in the geographical world but also conveys some context information. As explained before, someone being in a particular location at a particular time could mean that he is performing a particular task or doing something obvious. A good application would therefore be one where the subscribers can decide for themselves whether they want to use location dependent services or let others know their whereabouts. Fortunately, this feature has been considered by providers in a number of countries. AT&T Wireless in the United States, for instance, has implemented location based applications in mobile phones. Andre Dahan, president of mobile multimedia services at AT&T Wireless, says: "We have a comprehensive privacy policy and our customers' information -- including their geographical location -- is theirs to share with whom they want." Perry said users of the service could prevent being located by turning their phones off or using an "invisible" setting. Customers can also change their lists or "revoke a friend".
2.2 Extensible Markup Language (XML)

XML is just simple text, so it can obviously be moved between different platforms; more importantly, XML conforms to a specification defined by the World Wide Web Consortium (W3C)5, which in fact means that XML is a standard [2]. One of the most important layers of any XML-aware application is the XML parser. An XML parser handles the important task of taking a raw XML document as input and making sense of the document. The parser makes sure that the document is well formed and valid according to the DTD or schema, if one is defined for the XML document. The parsed result will typically be a data structure that can be manipulated and handled by other XML tools or Java APIs. There are no set rules about which parser to use, but most often the speed of an XML parser becomes more important as the size and complexity of the XML document grows. It is also important that the parser conforms to the XML specification.

5 The W3C specification can be found at http://www.w3.org

There are many low-cost Word to XML converters available on the market, and a number of them support a two-stage conversion process: first from Word into XML, and secondly from XML into HTML, using the XSLT stylesheet language. Because XML information is structured, it is easily manipulated, so converting XML into accessible HTML is achievable. You do have to write the XSLT stylesheet yourself, but this is a once-off task, and it allows you to clean up the initial XML mark-up when converting to HTML. (From: Campbell, Eoin, "Maintaining accessible websites with Microsoft Word and XML", XML Europe Conference, England, May 2003.)

Well formed XML documents

Every computer language has its own format which must be followed for the system to be able to interpret and read it and to extract the desired data. An HTML page has its own tags; when the tags are not in the right order, with some exceptions, the application will not be able to display the contents in the intended format. This applies to XML documents too. Though XML gives programmers a great degree of freedom in choosing self-explanatory tags, the document should still be well formed, meaning that it should match the predefined XML document structure. The structure of an XML document can also be defined by an internal source such as a DTD (Document Type Definition). It is worth mentioning that the use of a DTD is not mandatory; instead it is used as a means of validation. As a rule of thumb it can be said that an XML document is valid if it matches all internal structure descriptions along with the predefined XML document structure.

XML as a database

A database can be thought of as a container for data, in the sense that it is programmatically easy to access and retrieve the data from it. In practice all the data in a database is placed in named tables, and each entry has an ID. These references make it easy to access the data through database query methods. As a matter of fact, all files contain data, and so does an XML document. An XML file is much like a database in the sense that it contains data placed in nodes and attributes, and each node and attribute has a programmer-defined, self-explanatory name.
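As an illustration, a small XML document of this kind could look as follows. The element and attribute names are invented for this sketch and do not necessarily match the files actually used in LBISS.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative structure only, not the actual LBISS data file -->
<ReceptionServices>
  <Location name="Reception" floor="1">
    <Service type="Meetings">
      <Meeting with="John" attendee="Abdul Azim Saleh"
               date="2004-05-17" time="10:30"/>
    </Service>
    <Service type="Admission">
      <Info audience="student">Hand in admission forms at the Exam Office.</Info>
      <Info audience="guest">Admission forms are available at the desk.</Info>
    </Service>
  </Location>
</ReceptionServices>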
As an added advantage, the data in an XML document can also be viewed in a tree or graph format. On the other hand, due to parsing and text conversion, access to data in an XML document is slower than access to a database. (From: XML and Databases, copyright 1999-2003 by Ronald Bourret, last updated July 2003, http://www.rpbourret.com/xml/XMLAndDatabases.htm#isxmladatabase)

To access data in a database we first need to make a connection to the database through the driver the user is using:

try {
    con = DriverManager.getConnection("database address");
}
...

Once this connection is made, queries can be sent to the database to request specific data. An example database query could be:

...
String query = "select * from Reception where Meeting_with = 'Abdul Azim Saleh'";
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery(query);
...

This query searches the table named Reception in the database and retrieves the desired rows. An XML document, on the other hand, is treated just as a simple file by Java. To read or access an XML file we only need to open a stream to the specified XML document at the right path:

...
File xmlFile = new File("ReceptionServices.xml");
...

Once the file is accessible, other methods are called to access and retrieve the desired data from the XML document. Another alternative for accessing the different parts (nodes and attributes) of an XML document is the use of XPath. (From the IBM developerWorks XML zone, article: Building XML Applications, Step 2: Generating XML from a Data Store.)

XPath

XPath, as can be seen from its name, is used to access and address nodes and attributes of an XML document in a hierarchical order, and it provides some facilities to manipulate strings, numbers and booleans. In addition, XPath provides facilities to check whether or not a specific node or attribute matches some predefined pattern. Just like DOM (Document Object Model) and the SAX tree view, XPath models an XML document as a tree of nodes. XPath can fully distinguish between different types of nodes (element nodes, attribute nodes and text nodes) and reflects this information in the XPath tree model. XPath also supports XML Namespaces and thus models both the namespace and the local part of a node name. (From http://www.w3.org/TR/xpath; see this page for more information.)

SAX

SAX, or the SAX parser, is a free standard which is defined by a Java implementation. It is one of the most used programming interfaces in XML related applications. SAX is basically a set of interfaces that, through well-known callback methods, makes it possible to process a document (a file or a hierarchical tree) and act according to the methods defined in the application when it reaches application specific start tags, attributes and definitions. In other words, a programmer can reach and retrieve data from any specific part of a document and then pass this data to any other part of the application, or write it to a file stream for further use. The freedom in defining the tags and attributes of an XML file is very useful in this context: a programmer can give nodes of the same structure different names, which makes it possible to access and retrieve the data from each node separately.
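To make the SAX description concrete, the following is a minimal sketch of a SAX handler that reads the hypothetical ReceptionServices.xml structure shown earlier and prints the meetings it finds. The element and attribute names are illustrative only, not necessarily those used in LBISS.

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MeetingHandler extends DefaultHandler {

    // Called by the parser for every start tag in the document.
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        if ("Meeting".equals(qName)) {
            System.out.println("Meeting with " + attrs.getValue("with")
                    + " for " + attrs.getValue("attendee")
                    + " at " + attrs.getValue("time"));
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("ReceptionServices.xml"), new MeetingHandler());
    }
}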
(The SAX description above is based on a white paper by Birdstep Technologies, Norway, September 13, 2002: http://www.birdstep.com/collaterals/rdmxwhitepaper.pdf)

XML is a meta language defined by the World Wide Web Consortium (W3C) that can be used to describe a broad range of hierarchical markup languages. It is a set of rules, guidelines and conventions for describing structured data in a plain text, editable file. Using a text format instead of a binary format allows the programmer, or even an end user, to look at or utilize the data without relying on the program that produced it. However, the primary producer and consumer of XML data is the computer program and not the end user. Like HTML, XML makes use of tags and attributes. Tags are words bracketed by the '<' and '>' characters, and attributes are strings of the form name="value" inside the tags. While HTML specifies what each tag and attribute means, as well as its presentation in a browser, XML uses tags only to delimit pieces of data and leaves the interpretation of the data to the application that uses it. In other words, XML defines only the structure of the document and does not define any of the presentation semantics of that document. Development of XML started in 1996, leading to a W3C Recommendation in February 1998. However, the technology is not entirely new. It is based on SGML (Standard Generalized Markup Language), which was developed in the early 1980s and became an ISO standard in 1986. SGML has been widely used for large documentation projects, and there is a large community that has experience working with SGML. The designers of XML took the best parts of SGML, used their experience as a guide and produced a technology that is just as powerful as SGML, but much simpler and easier to use. XML-based documents can be used in a wide variety of applications including vertical markets, e-commerce, business-to-business communication and enterprise application messaging. (From: Java API for XML Processing, Rajiv Mordani, James Duncan Davidson, Scott Boag (Lotus), Sun Microsystems, Inc., November 18, 2000.)

In order to present data (an Excel file, an Access database etc.) to the user in an attractive way in a browser, on a mobile phone or in PDF format, the original data must first be converted to the necessary XML format. XML documents and data can have almost any structure whatsoever; everyone can define his or her own XML document structure and describe it. However, a standard database table (an Excel spreadsheet, an Oracle table or view) is a flat, two-dimensional representation of the data content. When you convert such a data source table to XML format, you get a flat, column-row shaped XML output.

2.3 Speech Technologies

Keyboards remain the most popular input device for desktop computers. However, performing input efficiently on a small mobile device is more challenging, and this need continues to motivate innovators. Speech interaction on mobile devices has gained currency over recent years, to the point where a significant proportion of mobile devices include some form of speech recognition. The value proposition for speech interaction is clear: it is the most natural human modality, it can be used while mobile and it is hands-free.
Grammar files

A grammar defines or lists the set of words, phrases or sentences that the recognizer of a speech application expects the user to say. Moreover, the grammar defines the patterns in which these words, phrases and sentences can be said. The grammar is written in a special format called the Java Speech Grammar Format (JSGF). This file is provided to the application, which in turn reads and activates it. Each set of words, phrases or sentences is placed in a rule. A single rule contains the words that the user is expected to say; the recognizer part of the application will listen only for the words defined in the rules, and all other words and background noise will be ignored. A rule can be declared public, in which case it can also be accessed and used by other grammars. The names of the rules must be unique within a grammar file, meaning that no rule name may repeat. The number of words and phrases in a rule and the number of rules in a grammar file is limited only by the size of the dictionary in the engine, which can exceed 60,000 words. As a rule of thumb, the larger the grammar, the slower and more error prone the recognizer. This is clear from the fact that, for a large grammar file, the speech application has to compare the audio input with all the words in the grammar until it finds a match and recognizes the word. It is therefore strongly recommended to keep the grammar short and thin. A grammar file, just like an XML document, should be well formed and valid, though the system will not check the grammar file for validity the way it is done for an XML document. The format and structure of a grammar file is closely related to that of an XML document in that both are structured text processed by the engine, so a well defined structure is a must for a grammar file to ease the work of the recognizer. A grammar file has a header which declares the version of the grammar. The header is followed by import statements, in case external grammars are used, and then comes the body of the grammar where the rules are defined. The following is an example of a grammar file with a single public rule:

grammar javax.speech.demo;
public <sentence> = hello world | good morning | hello mighty computer;

Feedback and error correction in a speech recognizer

In everyday human interaction a lot of information is conveyed by body language, facial expressions or pauses. Computers, on the other hand, are blindfolded in this respect, and in fact pauses can be a cause of errors and misunderstanding. If the client says something and the computer is slow in the recognition process, the client may repeat the message, resulting in recognition errors. Errors can to a great extent be handled by mechanisms for reporting error messages. For instance, if the recognizer fails to recognize a word or a sentence, the system should not repeat the same error message; in other words, the system should provide progressive assistance. For example, for the first recognition error the system could say "What?", and if the error repeats, the error message could be "Please rephrase your words" or something else that makes the user repeat his message in different words. A much simpler mechanism is to use "yes and no" prompts and keep the grammar file short.
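Returning to the grammar example above, the following is a minimal sketch of how such a JSGF grammar might be loaded and listened to through the Java Speech API. It assumes a JSAPI-compliant recognizer such as IBM ViaVoice is installed; the grammar file name is hypothetical and error handling is omitted.

import java.io.FileReader;
import java.util.Locale;
import javax.speech.Central;
import javax.speech.recognition.*;

public class ListenDemo extends ResultAdapter {

    // Called when the recognizer accepts an utterance matching the grammar.
    public void resultAccepted(ResultEvent e) {
        Result r = (Result) e.getSource();
        ResultToken[] tokens = r.getBestTokens();
        StringBuffer said = new StringBuffer();
        for (int i = 0; i < tokens.length; i++) {
            said.append(tokens[i].getSpokenText()).append(' ');
        }
        System.out.println("Recognized: " + said);
    }

    public static void main(String[] args) throws Exception {
        Recognizer rec =
            Central.createRecognizer(new EngineModeDesc(Locale.US));
        rec.allocate();

        // Load the JSGF grammar (file name is illustrative) and enable it.
        RuleGrammar grammar = rec.loadJSGF(new FileReader("lbiss.gram"));
        grammar.setEnabled(true);
        rec.addResultListener(new ListenDemo());

        rec.commitChanges();      // apply the grammar changes
        rec.requestFocus();       // ask for the speech focus
        rec.resume();             // start listening
    }
}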
Other factors which contribute to error handling and elimination are a natural dialog study and a usability study. The natural dialog study should be carried out prior to designing the recognizer application. Doing so enables the programmer to get a good grasp of the type of conversation expected from the user and to produce a grammar containing words and phrases that the user will naturally say. The application will be much more error prone if the grammar contains unusual or irrelevant words. The usability test should, on the other hand, be carried out once part of the application has been produced, to check whether the words are easy for the speech engine to recognize. This also includes avoiding vocabulary that is too close in pronunciation, a good example being "to", "too" or "2", as well as vocabulary that is too difficult to pronounce, which causes the user to pause or pronounce words incorrectly.

IBM Speech Engine

Application users and application builders have long been fascinated by machines that speak and understand human speech. This has now become a reality with the advances in speech technology. Applications now use speech to enhance the user's experience and ease the use of the applications. In the Java Speech API the term "speech engine" refers to a system which deals with speech input and speech output. In fact, both the speech recognizer and the speech synthesizer are instances of the speech engine, and the same can be said about speaker verification and speaker identification systems. The Java Speech API provides a complete set of classes and interfaces used to access the speech engine functionality. An engine may be implemented fully in software or as a combination of software and hardware.

Part 2

Chapter 3 Application Design
Chapter 4 Implementation
Chapter 5 Tests and Results
Chapter 6 Conclusion

References

[1] Samuli Niiranen; "Location Services for Mobile Terminals"; Tampere University of Technology, Department of Information Technology, Finland.
[2] Brett McLaughlin; "Java & XML", Second Edition; August 2001.
[3] Michael Beigl, Tobias Zimmer and Christian Decker; "A location model for communicating and processing of context"; TecO, University of Karlsruhe, Vincenz-Priessnitz-Str., 76131 Karlsruhe, Germany.
[4] Ng Ping Chung; "Positioning of Mobile Devices"; Oresund Summer University, August 2003.
[5] Djuknic, G. M. and Richton, R. E.; "Geolocation and Assisted GPS"; IEEE Computer, 34, 2, pp. 123-125; 2001.
[6] Chen, G. and Kotz, D.; "A Survey of Context-Aware Mobile Computing Research"; Dartmouth Computer Science Technical Report TR2000-381; 2000.
[7] Andrew Chou, Wallace Mann, Anant Sahai, Jesse Stone, Ben Van Roy, Per Enge (Stanford University), Rod Fan and Anil Tiwari (@Road Inc., Enuvis Inc.); "Improving GPS Coverage and Continuity: Indoors and Downtown".

Appendices

The idea of machines that speak and understand human speech has long been a fascination of application users and application builders. With advances in speech technology, this concept has now become a reality. Research projects have evolved and refined speech technology, making it feasible to develop applications that use speech technology to enhance the user's experience. There are two main speech technology concepts: speech synthesis and speech recognition. The Java Speech API makes only one assumption about the implementation of a JSAPI engine: that it provides a true implementation of the Java classes and interfaces defined by the API.
In supporting those classes and interfaces, an engine may be completely software-based or may be a combination of software and hardware. The engine may be local to the client computer or may operate remotely on a server. The engine may be written entirely as Java software or may be a combination of Java software and native code. The basic process for using a speech engine in an application is as follows:

1. Identify the application's functional requirements for an engine (e.g., language or dictation capability).
2. Locate and create an engine that meets those functional requirements.
3. Allocate the resources for the engine.
4. Set up the engine.
5. Begin operation of the engine (technically, resume it).
6. Use the engine.
7. Deallocate the resources of the engine.

4.2 Properties of a Speech Engine

Applications are responsible for determining their functional requirements for a speech synthesizer and/or speech recognizer. For example, an application might determine that it needs a dictation recognizer for the local language or a speech synthesizer for Korean with a female voice. Applications are also responsible for determining behavior when there is no speech engine available with the required features. Based on specific functional requirements, a speech engine can be selected, created and started. This section explains how the features of a speech engine are used in engine selection, and how those features are handled in Java software. Functional requirements are handled in applications as engine selection properties. Each installed speech synthesizer and speech recognizer is defined by a set of properties. An installed engine may have one or many modes of operation, each defined by a unique set of properties and encapsulated in a mode descriptor object. The basic engine properties are defined in the EngineModeDesc class. Additional specific properties for speech recognizers and synthesizers are defined by the RecognizerModeDesc and SynthesizerModeDesc classes, which are contained in the javax.speech.recognition and javax.speech.synthesis packages respectively. (From the Java Speech API Programmer's Guide.)

Human centered interaction

- Computer-human interaction is currently focused on the computer (computer-centric). Currently computers know little about their environment: Where are we? Who is using me? Is the user still there?
- Evolving environment awareness: give computers senses via sensors, so they perceive the environment, user identity and presence.
- You wear your own personal user interface. The interface can be consistent across all appliances, not because each appliance supports the interface, but because the user's own interface provides consistency.
- Make the human the focus of the computer's interaction (human-centric).

References

XML and Databases, copyright 1999-2003 by Ronald Bourret, last updated July 2003, http://www.rpbourret.com/xml/XMLAndDatabases.htm#isxmladatabase

3.8 For More Information (References)

The following sources provide additional information on speech user interface design.

Fraser, N.M. and G.N. Gilbert, "Simulating Speech Systems," Computer Speech and Language, Vol. 5, Academic Press Limited, 1991.
Raman, T.V. Auditory User Interfaces: Towards the Speaking Computer. Kluwer Academic Publishers, Boston, MA, 1997.
Roe, D.B. and N.M. Wilpon, editors. Voice Communication Between Humans and Machines. National Academy Press, Washington D.C., 1994.
Schmandt, C. Voice Communication with Computers: Conversational Systems. Van Nostrand Reinhold, New York, 1994.
Yankelovich, N., G.A. Levow, and M. Marx, "Designing SpeechActs: Issues in Speech User Interfaces", CHI '95 Conference on Human Factors in Computing Systems, Denver, CO, May 7-11, 1995.

Using Speech Recognition with Microsoft English Query
Ed Hess

Speech recognition is a rapidly maturing technology. It's a natural complement to English Query, a package that lets you query a SQL Server database using natural language. My job is to help developers design GUIs from the point of view of the people who will use the software. I'm currently doing research on how speech recognition can enhance the job performance of users in a health care setting. Speech recognition offers certain users the best way to interact with a computer and promises to be the dominant form of human-computer interaction in the near future. The Gartner Group predicts that by 2002, speech recognition and visual browsing capabilities will be integrated into mainstream operating systems. According to a recent survey of more than a thousand chief executives in health care organizations by Deloitte & Touche Consulting, 40% planned to use speech recognition within two years.

Recent advances in software speech recognition engines and hardware performance are accelerating the development and acceptance of the technology. Microsoft® invested $45 million in Lernout and Hauspie (http://www.lhs.com) in 1997 to accelerate the growth of speech recognition in Microsoft products. Both IBM/Lotus and Corel are delivering to the market application suites that feature speech recognition. Most people are familiar with speech recognition applications based on dictation grammars, also known as continuous speech recognition. These applications require a large commitment from the user, who has to spend time training the computer and learning to speak in a consistent manner to assure a high degree of accuracy. This is too much of a commitment for the average user, who just wants to sit down and start using a product. Users of this technology tend to be those who must use it or are highly motivated to get it working for some other reason, like people with various physical disabilities. However, there are other forms of speech recognition based on different grammars. These grammars represent short-run solutions that can be used by more general audiences.

Grammars
A grammar defines the words or phrases that an application can recognize. Speech recognition is based on grammars. An application can perform speech recognition by using three different types of grammars: context-free, dictation, and limited-domain. Each type of grammar uses a different strategy for narrowing the set of sentences it will recognize. Context-free grammar uses rules that predict the next words that might possibly follow the word just spoken, reducing the number of candidates to evaluate in order to make recognition easier. Dictation grammar defines a context for the speaker by identifying the subject of the dictation, the expected language style, and the dictation that's already been performed. Limited-domain grammar does not provide strict syntax structures, but does provide a set of words to recognize. Limited-domain grammar is a hybrid between a context-free grammar and a full dictation grammar. Each grammar has its advantages and disadvantages. Context-free grammars offer a high degree of accuracy with little or no training required and mainstream PC requirements. Their drawback is that they cannot be used for data entry, except from a list of predefined phrases.
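As a side note relating this back to LBISS (this is not part of the quoted article): a small context-free grammar of the kind described above could be written in the Java Speech Grammar Format (JSGF), which JSAPI recognizers can load directly. The rule names and phrases below are purely illustrative and not the actual LBISS grammar.

#JSGF V1.0;
grammar lbiss.commands;
public <command> = <service> | <control>;
<service> = (tell me about | information about) (the reception | the exam office);
<control> = repeat | help | stop | log out;

Keeping the vocabulary this small is exactly what makes context-free grammars accurate without any user training.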
Context-free grammars do offer a way to begin offering speech capabilities in products without making large demands on users before they understand the benefits of speech recognition. They represent an ideal entry point to begin rolling this technology out to a general audience. You can achieve up to 97% recognition accuracy by implementing commands and very small grammars. Dictation grammars require a much larger investment in time and money for most people to be able to use in any practical way. They deliver speech recognition solutions to the marketplace for those who need them now. Lernout and Hauspie's Clinical Reporter lets physicians use speech recognition to enter clinical notes into a database, then calculates their level of federal compliance. Speech recognition is an excellent fit for clinicians, who are accustomed to dictating patient information and then having transcriptionists type that data into a computer. The feedback from early adopter audiences is helping to accelerate the development of usable speech recognition interfaces.

None of the current speech recognition vendors are achieving greater than 95% accuracy with general English dictation grammars. That translates to one mistake for every 20 words, which is probably not acceptable to most people. The problem is further magnified when a user verbally corrects something and their correction is not recognized. Most users will not tolerate this and will give up on the technology. If a more limited dictation grammar is used, levels of accuracy over 95% can be achieved with a motivated user willing to put in months of effort. Limited-domain grammars represent a way to increase speech recognition accuracy and flexibility in certain situations without placing large demands on users. An application might use a limited-domain grammar for the following purposes:
• Command and control that uses natural language processing to interpret the meaning of the commands
• Forms data entry in which the scope of the vocabulary is known ahead of time
• Text entry in which the scope of the vocabulary is known ahead of time
This type of grammar could be an interim step between context-free and dictation grammar-based applications.

English Query
I had been working with Microsoft Agent (http://www.microsoft.com/msagent) for a couple of months before I saw Adam Blum's presentation on Microsoft English Query at Web TechEd. His session inspired me to try hooking speech recognition up to English Query to find information in a SQL Server™ database. I'd been showing around a speech-based Microsoft Agent demo, and many people asked if I could somehow keep the speech recognition, but make the animated character interface optional. Because I wanted to research that while still being able to use different types of speech recognition grammar, I started looking into the Microsoft Speech API (SAPI) SDK version 4.0, which is available at http://www.research.microsoft.com/research/srg/. English Query has two components: the domain editor and the engine. The English Query domain editor (mseqdev.exe) creates an English Query application. An English Query application is a program that lets you retrieve information from a SQL Server database using plain English rather than a formal query language like SQL. For example, you can ask, "How many cars were sold in Pennsylvania last year?"
instead of using the following SQL statements:

SELECT sum(Orders.Quantity)
FROM Orders, Parts
WHERE Orders.State = 'PA'
  AND Datepart(Orders.Purchase_Date, 'Year') = '1998'
  AND Parts.PartName = 'cars'
  AND Orders.Part_ID = Parts.Part_ID

An English Query application accepts English commands, statements, and questions as input and determines their meaning. It then writes and executes a database query in SQL and formats the answer. You create an English Query application by defining domain knowledge and compiling it into a file that can be deployed to the user. More information about how to build English Query applications can be found in the article "Add Natural Language Search Capabilities to Your Site with English Query," by Adam Blum (MIND, April 1998). English Query was delivered with SQL Server Version 6.5 Enterprise Edition, and is also part of SQL Server Version 7.0. The English Query engine uses the application to translate English queries into SQL. The Microsoft English Query engine is a COM automation object with no user interface. However, four samples included with English Query provide a convenient UI for Internet, client-based, middle-tier, or server-based applications. You must install the domain editor to build an English Query application. However, to use an existing English Query application with a client user interface, you need only install the engine. The English Query engine generates SQL for Microsoft SQL Server 6.5 or later; these queries may generate errors on other databases, such as Microsoft Access.

With a patient orders database as a starting point, I went through the typical steps of creating an English Query application (see Figure 1). I'll skip the details of setting up the English Query application for my database. Since it's only a prototype, I just set up a couple of entities (patients and orders) and minimal relationships between entities ("patients have orders").

Figure 1: Query Steps

I started with the sample Visual Basic-based query application that comes with English Query and modified it to point to my application:

Global Const strDefaultConnectionFile = _
    "D:\Program Files\Microsoft English Query\patient.eqc"

I then modified the connection string in the InitializeDB function to point to my SQL Server database:

Set g_objrdocn = objEnv.OpenConnection("patientdata", , , "uid=sa;pwd=;dsn=Patient")

My English Query application then looked like what you see in Figure 2.

Figure 2: An English Query App

The next step was to add the Direct Speech Recognition ActiveX® control (xlisten.dll) to my Visual Basic Toolbox. The control comes with the SAPI 4.0 SDK, so you will need to download and install that first. After I added the control to my form, I set its Visible property to False and added the following code to Form_Load:

On Error GoTo ErrorMessage
    engine = DirectSR1.Find("MfgName=Microsoft;Grammars=1")
    DirectSR1.Select engine
    DirectSR1.GrammarFromFile App.Path + "\patient.txt"
    DirectSR1.Activate
    GoTo NoError
ErrorMessage:
    MsgBox "Unable to initialize speech recognition engine. Make sure an engine that supports speech recognition is installed."
    End
NoError:

The patient.txt file referenced in the DirectSR1.GrammarFromFile method contains my grammar, or list of recognized voice commands. I wanted to make my demo as bulletproof as possible, and I have found context-free grammars to be the most reliable.
Because a context-free grammar allows a speech recognition engine to reduce the number of recognized words to a predefined list, high levels of recognition can be achieved in a speaker-independent environment. Context-free grammars work great with no voice training, cheap microphones, and average CPUs. (This demo should work fine on a Pentium 150 MMX notebook with a built-in microphone.) The demo could be made even more powerful by using dictation grammars, voice training, more powerful CPUs, and better microphones, but I wanted to make as few demands as possible on the user and remain speaker-independent. My grammar file (patient.txt) looks like this:

[Grammar]
langid=1033
type=cfg

[<start>]
<start>=... orders for patient <Digits>
<start>=submit "submit"
<start>=show sequel "show SQL"
<start>=close "close"
<start>=exit "exit"
<start>=... patient has the most orders "most"

langid=1033 means the application's language is English; type=cfg means this uses a context-free grammar. The <start> tags define each of the recognized voice commands. The first command translates to any words (...) followed by the phrase "orders for patient" followed by a list of digits. <Digits> is a built-in item in the direct speech recognition control (DirectSR1), which recognizes a series of single digits. If I can command my app to "Show the orders for patient 1051762," they appear like magic (see Figure 3). In the commands after the orders command, the words before the quotes are the values for the phrase object and the words in quotes are values for the parsed object. The SAPI SDK comes with a tool called the Speech Recognition Grammar Compiler for compiling and testing your grammars with different speech recognition engines. The compiler lives under the Tools menu after you install the SDK.

After you speak a phrase and a defined time period has passed, the event shown in Figure 4 is fired. All of the Case options are based on the value of the parsed object that's captured after each voice command and corresponds to buttons on the form. The ctrlQuestion object is my rich text field. Normally, the user types their query here, but in this case it can be entered by voice. The ctrlSubmit_Click submits the ctrlQuestion.SelText to the English Query application and the results are immediately displayed by a DBGrid object.

Programming for the Future
I recently downloaded the Speech Control Panel from the Microsoft Agent Downloads Web site at http://msdn.microsoft.com/msagent/agentdl.asp. The Speech Control Panel enables you to list the compatible speech recognition and text-to-speech engines installed on your system, and to view and customize their settings. When you install the file, it adds a speech icon to your Control Panel. Note that this application will only install on Windows® 95, Windows 98, and Windows NT® 4.0-based systems. I've suggested to Microsoft that the program allow the user to pick a default speech recognition engine and TTS engine through this panel. If you could then programmatically pull a user's choice out of their registry with SAPI, you could code it once and never change it. This would give users more flexibility in their use of speech-enabled software. For example, a user might already be using a product from, say, Dragon for their speech recognition engine. If they wanted to continue using that engine and their training profiles, SAPI could allow that if it were defined as the default speech recognition engine in the registry.
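For comparison with the Visual Basic code above, and again as an aside that is not part of the quoted article, a rough JSAPI equivalent of loading a command grammar and reacting to recognized phrases might look like the following sketch; the grammar file name lbiss.gram and the class name are hypothetical.

import java.io.FileReader;
import java.util.Locale;
import javax.speech.Central;
import javax.speech.EngineModeDesc;
import javax.speech.recognition.*;

public class ListenSketch {
    public static void main(String[] args) throws Exception {
        // Create and allocate any recognizer that supports English.
        Recognizer rec = Central.createRecognizer(new EngineModeDesc(Locale.ENGLISH));
        rec.allocate();
        // Load the command grammar from a JSGF file and enable it.
        RuleGrammar grammar = rec.loadJSGF(new FileReader("lbiss.gram"));
        grammar.setEnabled(true);
        // Print whatever the recognizer accepts against the grammar.
        rec.addResultListener(new ResultAdapter() {
            public void resultAccepted(ResultEvent e) {
                Result result = (Result) e.getSource();
                ResultToken[] tokens = result.getBestTokens();
                StringBuffer heard = new StringBuffer();
                for (int i = 0; i < tokens.length; i++) {
                    heard.append(tokens[i].getSpokenText()).append(' ');
                }
                System.out.println("Recognized: " + heard.toString().trim());
            }
        });
        rec.commitChanges();   // apply the grammar change
        rec.requestFocus();    // take recognition focus
        rec.resume();          // start listening
        // A real application would now stay alive (for example wait on a lock)
        // while results arrive; this sketch ends here.
    }
}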
Summary
The combination of speech recognition and English Query represents a powerful way for a user to access information in a SQL Server database very quickly. For users who work in an environment where speed and ease of access are critical, it holds enormous promise for future applications. As hardware continues to become more powerful and cheaper, speech recognition should continue to become more accurate and useful to increasingly wider audiences. See the sidebar "English Query Semantic Modeling Format".

From the June 1999 issue of Microsoft Internet Developer. (c) 1999 Microsoft Corporation. All rights reserved.
From : http://www.microsoft.com/mind/0699/equery/equery.asp

IBM developerWorks, XML zone: "Building an XML Application, Step 2: Generating XML from a Data Store"

Digital Earth Systems researchers, working with statistical modeling experts, have successfully completed field trials of the GeoMode™ Positioning Engine with demonstrated accuracy of less than 20 meters (FCC 67% threshold specification). This technology is the most commercially feasible approach to accurate wireless location. GeoMode can be installed on a Serving Mobile Location Center (SMLC), a Gateway Mobile Location Center (GMLC) or a GeoMode ASP Server external to the wireless network. The GeoMode GMLC and the ASP Server both require a SIM STK application.

GeoMode Positioning Engine and How It Works
In GSM and similar digital systems, the network data between every Mobile Station (MS) and all available Base Stations (BTS) is measured at sub-second intervals (determined by the operator) and reported to the Base Station Controller (BSC) to facilitate handover. The GeoMode Positioning Engine retrieves this existing network data (i.e. LAC, BSIC, BCCH, RxLEV, RxLEV_Nbor [1-6], etc.) from the BSC Network Management Report (NMR). Depending on the wireless operator's network configuration, this data is accessed via the GeoMode SMLC in the network or via the SIM STK application. Additionally, this data can be collected directly from the 'A' interfaces using analyzing probes and collection software. There is no need for any costly network transmitters or infrastructure upgrades, and therefore GeoMode is less costly and less complex to implement and manage. The GeoMode Positioning Engine uses a unique location positioning process based on advanced statistical modeling techniques and patented algorithms applied to the subscriber MS data and network propagation data models. These data models are created from existing network data parameters and data output from network planning tools. The result is a consistent and accurate record of MS positions across the complete coverage area.

Implementation Options
GeoMode can be implemented either within the wireless operator's network or external to the network with an independent data center or application service provider (ASP). The network measurement results (NMR) data is the only data required by GeoMode. The NMR for all MS subscribers is available (up-link) at the BSC, from where it can be accessed by an SMLC, or this same data is available (down-link) from the BTS directly via a SIM STK application on the handset, from where it can be sent to a GMLC or ASP. Therefore, the configuration and location of the GeoMode server can vary depending on the network provider and the operator preferences.
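The statistical matching step can be illustrated with a deliberately simplified sketch. This is not GeoMode's patented algorithm, only the general fingerprinting principle under assumed data structures: observed signal levels for the serving and neighbouring cells are compared against modelled levels for a grid of candidate positions, and the best-matching grid point is taken as the position estimate.

import java.util.List;
import java.util.Map;

// Illustrative only: nearest-neighbour matching of observed RxLEV measurements
// against pre-computed propagation-model fingerprints for a grid of candidate positions.
public class FingerprintSketch {

    static class GridPoint {
        final double lat, lon;
        final Map<String, Integer> modelledRxLev;   // cell identifier -> modelled RxLEV
        GridPoint(double lat, double lon, Map<String, Integer> modelledRxLev) {
            this.lat = lat;
            this.lon = lon;
            this.modelledRxLev = modelledRxLev;
        }
    }

    // Return the grid point whose modelled measurements best match the observation.
    static GridPoint estimate(Map<String, Integer> observedRxLev, List<GridPoint> grid) {
        GridPoint best = null;
        double bestCost = Double.MAX_VALUE;
        for (GridPoint p : grid) {
            double cost = 0;
            for (Map.Entry<String, Integer> obs : observedRxLev.entrySet()) {
                Integer modelled = p.modelledRxLev.get(obs.getKey());
                // Penalise cells that the model does not expect to be audible at this point.
                int diff = (modelled == null) ? 30 : obs.getValue() - modelled;
                cost += diff * diff;
            }
            if (cost < bestCost) {
                bestCost = cost;
                best = p;
            }
        }
        return best;
    }
}

In the real engine the comparison is statistical rather than a plain squared-error match, and the data models come from the operator's network planning tools rather than a hand-built grid.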
Wireless Network APIs Available
• GSM MAP
• UMTS MAP
• ANSI 41
• CAMEL
• WIN
• JSTD 36
• BSSAP+
• BSSAP-LE
• BSSMAP
• BSSAP
• IS-634
• CDGIOS
These API products can be used to provide several interface options for wireless operators.

From : http://www.geomode.net/pages/3/

(Note: put this picture in the Assisted GPS section. The paragraph is already referenced and the picture is from the same article, so there is no need to reference it separately.)