By now pretty much everyone has interacted with a speech-based customer service system over the phone, and the results have been mixed: some people like that they can simply state their issue and get a solution quickly, while others dislike it because its misrecognitions are annoying and its tone patronizing. The future of these systems is therefore up for debate: given their known problems, are voice interactions more efficient and a better experience than keyboard-based interactions? And furthermore, does speech recognition have any other use in customer service applications beyond the phone call?
In our new era of multi-function, application-driven, internet-centered smartphones, interactions with customer service do not have to be tedious. However, given the web's focus on keyboard entry and the phone's focus on speaking, the options for an optimal customer service experience on a smartphone are limited and mostly uncharted. Many popular smartphones have a soft keyboard that makes data entry a headache, requiring repeated corrections to get the right phrase across. Many users wish there were a better way, but they are not willing to sacrifice the big screen, the touch interface, or a phone that slips smoothly into a pocket.
So let's look at what's happening in the consumer world to find an emerging trend that can help us predict where the enterprise may be going as well. A free app called Siri is available from the Apple App Store today. It acts as a personal assistant, listening to what you want to accomplish and instantly acting on the request by combining it with data it already knows about you and your external environment. Using Siri, you can say "I need a taxi," "I'd like to see The Social Network around 5:30," or "Tell me where the nearest ATM is." The information regarding your request is displayed on the iPhone screen, and most of the time you don't even have to add anything more, because Siri understands the request, accesses location and personal data, and acts appropriately: booking you a cab at your location, listing nearby showtimes, or showing you a map of the nearest ATM locations. It works surprisingly well. Given Apple's recent purchase of Siri for more than $200 million, the notoriously quality-obsessed Steve Jobs seems to think so as well.
Siri, and other similar personal assistant apps like Tellme on Windows Phone, use speech-to-text transcription technology to convert an uttered phrase into its text representation. Using computational linguistics algorithms, the phrase is then analyzed to derive its meaning, typically a subject such as "movie," "taxi," or "ATM" and an action such as "want to see," "book," or "give me." The personal assistant then automatically adds information from the smartphone's sensors, such as location, and personal data such as name, past searches, and preferences to put the action in context. With that data, the app has what it needs to make an intelligent assessment of what the user wants to do, perform the action against backend internet services, and return the results. And because of the increasing internet speeds carriers provide, most of the computation, including the speech recognition itself, does not even have to take place on the handset; it can be performed in the backend, where faster machines and economies of scale guarantee efficient results.
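To make that flow concrete, here is a minimal sketch in Python of how such a pipeline might stitch the stages together: take a transcribed phrase, extract a subject and an action, and enrich the request with sensor and profile data before handing it to a backend service. The function names, keyword tables, and context fields are illustrative assumptions, not the actual Siri or Tellme APIs, and a real system would use statistical language understanding rather than simple keyword matching.

```python
# Illustrative sketch of a speech-driven personal-assistant pipeline.
# None of these functions correspond to real Siri or Tellme APIs; they
# are placeholders for the stages described above.

from dataclasses import dataclass

@dataclass
class Request:
    subject: str   # e.g. "taxi", "movie", "ATM"
    action: str    # e.g. "book", "find", "show"
    context: dict  # location, preferences, history

# Hypothetical keyword-based intent extraction.
SUBJECTS = {"taxi": "taxi", "movie": "movie", "atm": "ATM"}
ACTIONS = {"need": "book", "see": "find", "where": "show"}

def parse_utterance(text: str) -> tuple:
    """Derive a (subject, action) pair from the transcribed phrase."""
    words = text.lower().split()
    subject = next((SUBJECTS[w] for w in words if w in SUBJECTS), "unknown")
    action = next((ACTIONS[w] for w in words if w in ACTIONS), "unknown")
    return subject, action

def build_request(transcript: str, location: tuple, profile: dict) -> Request:
    """Enrich the bare intent with sensor and profile data before it is
    handed off to a backend internet service."""
    subject, action = parse_utterance(transcript)
    return Request(subject, action, {"location": location, "user": profile})

if __name__ == "__main__":
    req = build_request("I need a taxi", (37.77, -122.42), {"name": "Alex"})
    print(req)  # Request(subject='taxi', action='book', context={...})
```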
The entire system holds together remarkably well. Unlike previous attempts at speech-enabled IVR systems, today's speech recognition deployments reside primarily in the cloud and are accessed simultaneously by many different companies. The advantage of hosting speech in the cloud is that every utterance users submit is used not just for recognition but also to improve the underlying recognition engine, so the more people use the engine, the better it gets. This continuous improvement has a superb effect on quality, making the recognition system very usable and effective. Not only that, but thanks to the increasing 3G and now 4G speeds carriers provide, the entire mobile speech interaction can take less than 30 seconds. Compare that with the experience of a customer who spends an average of 3 minutes in the IVR or more than 5 minutes on the web.
So what is the enterprise opportunity here? The good news is that, so far, these types of applications have been built for consumer services, not by companies' customer service departments. Enterprises therefore have an opportunity to leap ahead by giving their customers an easy-to-use, efficient, speech- and screen-enabled customer service interface. This kind of interface applies readily to many industries. In healthcare, one can imagine a service where customers say "I need to file a claim" or "I need to speak to a doctor immediately about my cough." In financial services, customers can say "I need to transfer $1000 from my checking account to my savings account" or "I'd like to buy 30 shares of France Telecom stock," as sketched below. Personal information already stored on your smartphone, such as your name and your medical or financial history, can be passed through automatically and used to make smarter decisions without you having to act. The result is quick data entry for users and cost-effective decision making for enterprises, yielding a win/win value proposition.
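As a sketch of what the financial services example could look like behind the scenes, the Python snippet below maps a transcribed transfer request to a structured backend call. The handler name, field names, and regular expression are assumptions for illustration only, not any bank's real API; a production system would also authenticate the customer and confirm the transaction.

```python
# Hypothetical mapping of a spoken banking request to a structured
# backend call; all names here are illustrative assumptions.
import re

def handle_banking_utterance(transcript: str, customer_id: str) -> dict:
    """Turn a phrase like 'I need to transfer $1000 from my checking
    account to my savings account' into a structured transfer request."""
    m = re.search(r"transfer \$?(\d+) from my (\w+) .* to my (\w+)",
                  transcript.lower())
    if not m:
        # Fall back to a human agent when the intent is unclear.
        return {"status": "escalate_to_agent", "customer": customer_id}
    amount, source, target = m.groups()
    # A real deployment would call the bank's transfer service using the
    # customer's already-known identity and preferences.
    return {
        "status": "queued",
        "customer": customer_id,
        "action": "transfer",
        "amount_usd": int(amount),
        "from_account": source,
        "to_account": target,
    }

print(handle_banking_utterance(
    "I need to transfer $1000 from my checking account to my savings account",
    customer_id="cust-123"))
```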
Unlike VoiceXML applications, which are built on an established standard and abound, speech-enabled customer service applications for the smartphone face a few challenges on the horizon. Beyond the current lack of examples and industry know-how, the smartphone speech world does not yet have a simple standard like VoiceXML to spur development, nor many developers skilled in both smartphone UI design and speech recognition. So if you know any developers or companies skilled in both, find and secure them quickly!
In conclusion, smartphones have created a new interaction medium through which customer service can be accessed more efficiently and with better results. Unfortunately, the current user interface for reaching customer service on the phone is poor, frustrates users, and is not optimized for the capabilities smartphones offer today. Since companies won't give customers instant access to human agents any time soon because of cost considerations, optimizing the customer experience on the smartphone, using speech as input and the large screen as output, is your best bet. And when you achieve that balance, don't forget to give your developers a raise for the new skills they will soon acquire.
I am a Product Manager at Orange Silicon Valley and the creative force behind Sens.ly. I have a passion for creating game-changing products: I created one of the first enterprise voice portals, a speech-based outbound IVR platform, and a location-based social network. I also hold several provisional patents in the areas of healthcare, security, telecommunications, and augmented reality.