Introduction to Ultra-Performance Speech Recognition Technology for Smartphones

Applying speech recognition to the desktop computer seemed like a good idea. For most people, however, speech recognition has not become a substitute for the keyboard and mouse. Now voice technology is being used in a whole new environment: mobile phones. Its use in mobile phones is pushing the technology in a new direction, one it never took on the desktop.


IBM commemorates its 100th anniversary this year. Speech recognition first appeared in the 1950s, largely as a curiosity. In the early 1960s, IBM created an experimental speech recognition system called "Shoebox." The device recognized 16 spoken words and could answer simple spoken arithmetic problems such as "3 + 4 = ?".

DragonDictate, launched by Dragon Systems for DOS computers in the early 1980s, was probably the first speech recognition application. It could recognize only one word at a time. Over time it evolved into a product called "Dragon NaturallySpeaking" (currently in its 11th version and owned by Nuance Communications), which can transcribe speech spoken at normal conversational speed.

Two constraints limit the use of speech recognition on desktop computers. First, to work with reasonable accuracy, the application must be trained to recognize the user's speech characteristics. Both the native speech-to-text technology in Windows Vista and Windows 7 and third-party products such as Dragon NaturallySpeaking still require a user training period.

The second constraint is the popularity of the keyboard. Most people are used to typing rather than speaking, so voice control faces the same adoption barrier as the Dvorak keyboard layout. Why learn to use a Dvorak keyboard when a plain old QWERTY keyboard is available and works well?

The Microsoft TellMe team is responsible for developing speech recognition technology for the multimedia environment. Abhi Rele, senior product manager on the TellMe team, points out that in a desktop environment users already have convenient ways to communicate with the machine, such as the keyboard and mouse. As a result, voice is used mainly by voice enthusiasts.

Wider adoption of voice-controlled computing requires two things: a better, more convenient application and a setting where voice is the primary mode of use. Mobile phones are such a setting, and one that has been growing for a long time.

Matt Revis, vice president of product management and marketing at Nuance, explains the difference between the desktop and mobile environments this way: a desktop computer is a fixed environment in which the user's attention is entirely on the computer. Desktop voice technology therefore mainly supports tasks such as office applications, web browsing, and communication. On the mobile side, voice is used more to support a variety of lifestyles: professionals on the move, outdoor leisure activities, hands-free calling, and more.

Gartner analyst Tuong Nguyen agrees that speech makes more sense in a mobile environment. He said that, from a usage standpoint, voice recognition is more valuable on handheld devices, where it adds a user-friendly, convenient form of input.

Nguyen added that the value of speech recognition becomes apparent when the alternative to speaking a simple statement is flipping through many menus or typing on a small on-screen keyboard. As touch screen devices (without physical keyboards) grow in use, speech recognition will be used to enhance data input and output. Speech recognition also helps meet hands-free needs, including legal requirements.

On mobile devices

Because mobile devices typically have only a fraction of the storage and processing power of a desktop computer, it took some time for voice processing to appear on phones, even in a basic form.

The Springer Handbook of Speech Processing describes the state of mobile phones in the early 2000s. Despite the limitations of the time, a phone could, once programmed, recognize digit-by-digit spoken dialing and, to some extent, people's names. The main problem was memory, so most phones could recognize only 10 numbers or names at a time. Another problem the authors point out is that the feature was rarely used, probably because mobile phone manufacturers marketed it poorly.

As the memory and processing power of mobile phones increased, so did the recognition capabilities of ordinary phones. Samsung's $99 SCH-p-207 phone, released in 2005, added voice-to-text dictation alongside voice dialing. With hundreds of megabytes of memory and gigabytes of storage, the current generation of smartphones faces few such limits.

Another key advance is network speed. The rising tide of faster wireless networks has lifted many boats, including the latest generation of voice processing technology. Faster networks make it possible to move voice processing tasks from the handset to remote servers.

Amir Mane, product manager for Google Voice Search, explains how faster networks help Google's voice applications. He said that because all the heavy processing is handled by Google's servers over the network, the computing power required of the handheld device is reduced.

Current applications

Mobile phone speech recognition today is no longer limited to voice dialing, although voice dialing is still part of it. It was the first speech recognition feature to appear on mobile phones, and even many low-end phones now have it, although it can struggle with some of the less common names in the phone book.

Gartner analyst Nguyen pointed out that the newer generation of voice features is more open-ended. Instead of being programmed with specific voice commands that perform fixed functions, the application recognizes the speech and performs the appropriate action. Higher-end, more powerful devices make these applications more viable. In other words, besides dialing a number with the phrase "call 888-555-1212", the user can also say "call mom" or "call my mom."
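The difference between a fixed command and a more open-ended one can be illustrated with a small sketch. It assumes the speech has already been turned into text, and it uses a made-up phone book; the function name, the contact entries, and the matching rules are all hypothetical and do not describe how any real handset works.

```python
import re

# Hypothetical phone book; names and numbers are invented for illustration.
CONTACTS = {"mom": "888-555-1212", "jenny": "888-555-0199"}

def resolve_call(transcript, contacts=CONTACTS):
    """Return the number to dial for a recognized 'call ...' phrase, or None."""
    match = re.fullmatch(r"\s*call\s+(.+?)\s*", transcript.lower())
    if match is None:
        return None  # not a call command at all
    target = match.group(1)
    # A spoken number is dialed directly.
    if re.fullmatch(r"[\d\-\s()+]+", target) and any(c.isdigit() for c in target):
        return target
    # Drop a leading "my" so that "my mom" and "mom" resolve alike.
    name = re.sub(r"^my\s+", "", target)
    return contacts.get(name)

if __name__ == "__main__":
    for phrase in ("call 888-555-1212", "call mom", "Call my mom"):
        print(phrase, "->", resolve_call(phrase))
```

Even this toy version shows why the open-ended form needs more from the device: it has to consult the phone book and tolerate variations in phrasing rather than match one fixed pattern.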

Google Voice Search has fewer restrictions than earlier voice recognition technologies because all the heavy lifting is done by servers on the network. This makes voice-driven applications such as Google Voice Search more feasible. For example, if you say "create war movie time", you will see a page listing showings in your area. The app not only recognizes the meaning of the phrase, but also combines information from your phone (your current location) with information from websites (show times).

The app also knows English well and distinguishes between similar-sounding vocabulary without training. If I say "Motley Crue", the app even uses the band's unique spelling in the search terms, although it misses the diacritics. Search for "Motley's Crew" and you get the comedy instead.

That said, the limits of Google's speech recognition become apparent the further you stray from mainstream English. Foreign names, for example, are not handled well. Another problem for speech recognition applications is ambient noise. Mobile users are often more affected by environmental noise than desktop users. According to Nuance's Revis, accuracy is a problem in noisy outdoor environments.

Dictation has made great progress since the launch of Samsung's phone in 2005. Dragon Dictation for the iPhone, powered by Dragon NaturallySpeaking, lets users dictate everything from memos and emails to Twitter updates. Dragon's email software provides similar functionality for BlackBerry devices.

For Android phones, Nuance offers FlexT9, which combines Dragon dictation with three kinds of touch screen input. There is also the Handcent SMS app, which integrates Android's native speech recognition to let you send text messages by voice.

Text-to-text translation has been available for many years (e.g., through the well-known Babel Fish website). Simultaneous spoken translation is not available yet, but software is getting close. For example, Jibbigo for the iPhone translates words, phrases, and reasonably simple sentences, letting two parties take turns speaking.

Future direction

Ask anyone involved in developing voice technology what the next big step is, and you will usually get the same answer: natural language processing.

Revis describes it as a system that understands what you mean, not just what you say. In a conversational interaction, users say what they have to say, with no restriction on how they say it. He offered examples of instructions and requests for information, such as "Where can I buy a Nikon camera under $100?", "Send a message to Jenny saying that I am 20 minutes late" or "A table for three at Morton's tonight."

Google's Mane said that natural language processing in spoken conversation is a double challenge. First you must recognize the words, and then you must understand them. The first part has become easier. The second remains hard: meaning depends on context, which is difficult to handle. Even humans do not always parse language successfully.

Microsoft's Rele believes that the extra services a mobile phone provides (such as a compass or GPS) can make natural language processing more useful. He said that you could arrange dinner and a movie for two by breaking the task down and drawing on data from different sources, such as calendars, restaurant ratings, movie reviews, and location.

In addition, the phone's services can provide context for what is spoken. Rele said that combining the user's voice input with information from other sensors and from the state of the user and the user's environment can produce richer, more relevant results. For example, if you have just used Foursquare to look at restaurants, a vague voice command would be weighted toward going out to eat, making a reservation, or getting a taxi.
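One way to picture this kind of context weighting is as a re-ranking step applied to the recognizer's candidate interpretations. The sketch below is only an illustration under assumptions: the intent names, the context label, and the boost value are all hypothetical, and nothing here reflects how Microsoft or any other vendor implements it.

```python
# Hypothetical mapping from recent activity to the intents it makes more likely.
CONTEXT_BOOSTS = {
    "viewed_restaurants": {"dining", "reservation", "taxi"},
}

def rank_intents(candidates, recent_activity, boost=0.5):
    """Order candidate intents by score, favoring those that fit recent activity.

    candidates: dict mapping intent name -> base score from the recognizer.
    recent_activity: iterable of context labels, e.g. {"viewed_restaurants"}.
    boost: assumed bonus added to intents that fit the context.
    """
    favored = set()
    for label in recent_activity:
        favored |= CONTEXT_BOOSTS.get(label, set())
    scored = {
        intent: score + (boost if intent in favored else 0.0)
        for intent, score in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

if __name__ == "__main__":
    # A vague command that the recognizer alone leans toward "movie".
    guesses = {"movie": 0.4, "dining": 0.3, "shopping": 0.2}
    print(rank_intents(guesses, set()))
    print(rank_intents(guesses, {"viewed_restaurants"}))
```

With no context the ranking follows the recognizer's own scores; once the restaurant-browsing label is present, the dining interpretation moves to the top.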

The multi-platform application Vlingo bills itself as a "virtual assistant" and already offers some of these features. It plugs into services such as OpenTable and Fandango to accomplish tasks such as making restaurant reservations and booking movie tickets.

Nguyen believes that another area where speech recognition will advance is gaming. He said that voice can add a different dimension to playing games. For example, you could issue Captain Kirk-style commands to a starship or interrogate a suspect in a mystery game.

Is it you?

Another capability already in use is automatically adapting speech recognition to an individual user. This is the hands-off version of the voice training that desktop speech recognition requires.

For example, the latest version of Google Voice Search has an opt-in feature that builds up a user's personal voice profile over time. Mane explained that when users opt in to personalized recognition, Google associates the user with his or her utterances, which allows it to build a preliminary, personalized speech recognition model.

However, personalized recognition is not a cure-all. It is a transitional step toward making speech recognition more seamless. Mane said that Google does not see personalized recognition as the sole solution, because a further series of technological innovations is still to come. He believes that future improvements in the technology will require more active participation from users.

Conclusion

Mobile phones have been an incubator and driver of many technologies, in both hardware and software. So far, adding voice to the mix has produced only incremental improvements, impressive as the features of Google's voice app are.

However, these improvements are gradually paving the way for more significant advances. Mobile technology provides a new arena in which these technologies can be brought together. The next step may not be a phone that understands everything you say, but a more useful phone that better understands what you mean.
