The idea of a personal robot assistant, able to effortlessly understand spoken (and unspoken) human intents and efficiently act on them while delivering a breezy quip, has been a staple of science fiction.
The 1940s had Roll-Oh to scare away officious mailmen and refresh bouquets, while the Jetsons had Rosie to deal with prickly bosses. HAL 9000, the most evil red light in filmdom, may not have been keen to “open the pod bay doors” but it could still belt out a mean rendition of Daisy Bell.
Last week at its Worldwide Developers Conference (WWDC), Apple announced a raft of new features for its software-based intelligent personal assistant, Siri – a real-life approximation of this once-imagined future.
Apple lauded Siri as the standout feature of its iPhone 4S last October, showcasing several uses, including setting reminders and appointments, searching the web and answering the age-old question: “Should I carry an umbrella today?”
The app stole a march on the stilted spoken-command interfaces of every mobile platform at the time, including Apple’s own iOS, with the seemingly effortless manner in which it understood natural spoken language.
A year before its star turn, Siri had actually debuted on the iOS platform as a standalone app that integrated with various web services and made it possible to locate restaurants and book tables with spoken language commands.
Siri’s functionality can be roughly broken down into three parts:
- speech recognition
- reasoning
- delegation
Speech recognition involves making sense of voice patterns and converting them into text. This means separating the user’s voice from background noise and accurately transcribing it into words in a given language.
Reasoning requires recognising not only the intent of the words but also the context in which they were spoken. A simple command such as “Give Mum a ring” requires the assistant to understand that the action required is to make a phone call to a contact called “Mum” and not to present an actual ring.
Delegation requires firing a specific handler – in this case, the app that actually makes the phone call – with the task of executing the action.
If things go wrong – for example, there is no contact called “Mum” – the assistant should be able to inform its human with a simple response and try to get more data to fulfil the request.
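The reasoning and delegation steps, along with the fallback when a contact is missing, can be illustrated with a toy sketch. This is not how Siri is built: the phrase patterns, the contact list, the phone number and every function name below are hypothetical, and a real assistant would rely on statistical models rather than hand-written rules. Speech recognition is assumed to have already turned the audio into text.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Intent:
    action: str  # what the user wants done, e.g. "call"
    target: str  # who or what the action applies to, e.g. "Mum"


# Hypothetical phrase patterns standing in for the "reasoning" step.
CALL_PATTERNS = [
    re.compile(r"^give (?P<target>.+?) a ring$", re.IGNORECASE),
    re.compile(r"^(?:call|phone|ring) (?P<target>.+)$", re.IGNORECASE),
]

# Made-up address book with a fictitious number.
CONTACTS = {"mum": "555-0100"}


def reason(text: str) -> Optional[Intent]:
    """Map recognised text to an intent, or None if nothing matches."""
    cleaned = text.strip().rstrip(".!?")
    for pattern in CALL_PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return Intent(action="call", target=match.group("target"))
    return None


def phone_handler(intent: Intent) -> str:
    """Stand-in for the phone app: look up the contact and 'dial'."""
    number = CONTACTS.get(intent.target.lower())
    if number is None:
        # Things went wrong: tell the user and ask for more data.
        return f"I couldn't find a contact called {intent.target}. Who should I call?"
    return f"Calling {intent.target} on {number}"


# Delegation table: each action is handed to a specific handler.
HANDLERS = {"call": phone_handler}


def assistant(text: str) -> str:
    intent = reason(text)
    if intent is None:
        return "Sorry, I didn't understand that."
    return HANDLERS[intent.action](intent)


print(assistant("Give Mum a ring"))
print(assistant("Give Dad a ring"))
```

Here “Give Mum a ring” is read as a request to call the contact “Mum” rather than to hand over jewellery, while the request for “Dad”, who is not in the made-up address book, produces a follow-up question instead of a silent failure.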
Speech recognition and reasoning require matching voice patterns against databases and running extensive statistical analysis – tasks that require computing power and memory in excess of that provided by the processor in the iPhone 4S.
Therefore, Siri requires an active wireless internet connection over which it transmits data to Apple’s servers where most of the processing is performed. Naturally, data usage is higher on a Siri-enabled iPhone.
Siri is able to carry on a conversation with its users and provide the semblance of a human personality. It’s also seemingly able to engage in witty repartee to answer questions about the meaning of life (“42”), suggest places to hide a body and report its owner to the Intelligent Agents’ Union for harassment.