How Siri Works
Introduction to How Siri Works
Our expectations were set up decades ago by the likes of HAL in 2001: A Space Odyssey and the computer of the U.S.S. Enterprise on "Star Trek." We've all been waiting for our computers to start talking and responding to us like real people. Real people with massive IQs and tremendous amounts of knowledge at their virtual fingertips, that is.
Even Apple had early visions of this voice-assisted future. In the late 1980s, Apple came up with a concept called Knowledge Navigator. The company produced a series of videos that showed people interacting with this fictitious system via a touch screen tablet and natural conversation. Its avatar looked and sounded perfectly human, and it could conduct a flawless conversation with you to help you plan your schedule, communicate with friends and colleagues, and access all sorts of networked information. The company hasn't quite reached the level of interaction and sophistication displayed in those videos, but with the release of the touchscreen iPhone and iPad, and now Siri, today's Apple has taken the first big step toward this futuristic vision.
Siri is a voice-activated assistant that mimics human intelligence and natural conversation. She interprets your voice instructions, and, when possible, carries them out. Siri can open apps, give you movie times and sports scores, make dinner reservations, call or send messages to people in your contact list and perform a host of other useful tasks. She doesn't just take your commands and silently perform them. She tells you what she's doing and prompts you with questions so that you have a chance to make choices and correct her if she misinterprets you. Siri is a far cry from the voice-recognition systems of yesteryear.
Speech recognition for computers has existed for quite some time, and Siri isn't even Apple's first real foray into the technology. In the 1980s, you could buy -- at a premium -- special software and hardware to make your computer respond to your voice. It required training the software to understand you and could only handle a small set of commands. This was great for hands-free computer control for people who needed it for accessibility reasons, but for most users it was impractical. In the early 1990s, Apple released Macintosh Quadra AV (audio-visual) computers that included built-in speech-recognition software and hardware. They couldn't handle dictation, but you could open applications and perform a limited set of tasks via voice. And more recently, Mac OS X (as well as Windows Vista and other operating systems) has included integrated speech recognition, but again, this is touted as an accessibility feature. Keyboard and mouse are still faster and more accurate input methods than voice for your home computer.
Other device manufacturers and software retailers have had speech-recognition applications on the market for a while, as well. Third-party apps like Nuance's Dragon Dictate and its newer Dragon NaturallySpeaking voice dictation applications have been available since the 1990s. We've been talking to automated phone systems for years (primitive ancestors of the more robust dictation and assistant programs of today). Voice-recognition technology is even being used for things like entry of patient information at hospitals. But most of these are still demonstrations of voice-recognition artificial intelligence (AI) in its infancy, able to only perform simple tasks like taking down what you say or responding to a limited number of preset commands.
Siri takes the technology quite a few steps further. It was originally a third-party app called Siri Assistant, released in 2010 and slated for release on other platforms, but Apple bought the company, removed it from the App Store and made it a proprietary feature fully integrated with the iPhone's operating system (iOS). The integrated app was in beta when it was first released in 2011 as part of iOS 5, but since its upgrades in iOS 6 (released in 2012), it is considered fully functional. Siri comes native on iPhone 4S and later, iPad with Retina Display, iPad Mini and the fifth-generation iPod Touch, but isn't available for prior models.
Siri isn't even the only game in town, but in some areas, she may be the best. Read on to find out more about this marvel of voice-recognition and artificial intelligence.
What is Siri?
Siri is kind of a virtual assistant who listens to your requests and performs actions accordingly. The more you work with it and make corrections when you're misinterpreted, the better it's supposed to get at understanding what you mean. Rather than doing most of its work on your phone's processor, Siri communicates with servers in the cloud to interpret your requests and retrieve the information you need. Since most of Siri's brain exists on remote servers accessed by a great many people, the more people using it, the more it's supposed to learn from everyone else, too.
AI assistants have been the dream of technologists for a long time, but they weren't very feasible until recently. They have finally been made more so by things like much faster wireless speeds, more powerful processors (especially those in our mobile devices), the availability of vast amounts of data for training the AI, the advent of cloud computing and improvements in speech recognition methodologies. Most voice-recognition systems in the wild are like the ones you've probably dealt with when calling a large company on the telephone -- they only understand a very limited vocabulary. Siri has more data and learning ability behind it, and it continues to learn and grow.
Siri was not entirely developed by Apple, but instead sprung out of a huge AI initiative started in 2003 that was funded by the U.S. Defense Department's Defense Advanced Research Projects Agency (DARPA) and run by SRI International, a research entity affiliated with Stanford University until the 1970s. The intent was to come up with something that could help military personnel with office work and making decisions. The result of this project was called Cognitive Assistant that Learns and Organizes (CALO), an artificially intelligent assistant that could learn from its users and the vast amounts of data available to it. Not only could it be used to do things like schedule meetings and organize all the necessary documents for meeting participants, but it could even make decisions. For instance, if someone backed out of a meeting, CALO could assess whether they were vital enough to warrant canceling and rescheduling. Another SRI International project called Vanguard created a prototype assistant for a smartphone, but one with nowhere near CALO's capabilities. Several SRI employees created a startup to marry the ideas from both projects. Alumni from companies such as NASA, Apple and Google also worked for the new company, and their work led to Siri Assistant for iPhone 3GS.
This version of Siri would take questions from users via voice or keystrokes, and would send the voice or text data to a remote server for transcription (in the former case) and interpretation. Rather than try to break down entire sentences and interpret their meaning as a whole, as other natural language research has generally attempted, Siri used models of real objects and concepts, as well as how they might work together, to decipher the requests. People can say the same thing in a number of different ways, making sentence interpretation a Herculean effort, so Siri looked for keywords and their context instead. This simpler paradigm, along with a host of programmed phrases and requests it was designed to recognize and carry out, allowed Siri to guess what the user was asking and respond appropriately without having to understand every single word -- with a fair amount of accuracy. It had access to a large amount of data via various Web sites, and could use these sites' application program interfaces (APIs) to tap into any services they offered.
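The keyword-and-context approach described above can be pictured with a small sketch. This is only an illustration in Python, not Siri's actual method: the intent names, the keyword sets and the guess_intent function are all assumptions made up for the example.

```python
import re

# Hypothetical intents and keywords, invented for illustration only.
INTENT_KEYWORDS = {
    "set_alarm": {"wake", "alarm"},
    "send_message": {"message", "text", "tell"},
    "get_weather": {"weather", "forecast", "rain"},
}


def guess_intent(utterance):
    """Return the intent whose keywords overlap most with the utterance, or None."""
    # Lowercase the request and split it into simple word tokens.
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_intent, best_score = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(words & keywords)  # number of keywords that were spoken
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


alarm = guess_intent("Wake me up at 5 a.m.")
weather = guess_intent("Will it rain tomorrow?")
unknown = guess_intent("Sing me a song")
```

The point of the sketch is that the request never has to be parsed as a full sentence. A few recognized words are enough to pick a likely action, and a request with no recognized words falls through so the assistant can ask for clarification.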
Apple tweaked this app into the Siri we know today. Siri actually lost some capabilities upon integration with iOS, as it used to have access to far more Web sites and services than Apple has paired with to date. It also lost some of its more biting humor and, apparently, a propensity for pottymouth responses. But it gained other skills, such as better integration with iPhone's built-in features, multilingual capabilities and an audible voice. And new features have been added with iOS's subsequent updates. For instance, it regained its ability to book dinner reservations and return movie times and reviews with the introduction of iOS 6. And the ability to purchase movie tickets was returned with the iOS 6.1 update in January 2013; however, it now books your seats through Fandango rather than Movietickets.com.
Unlike a search engine that returns long raw lists of links related to keywords you select, Siri is designed to interpret your request, home in on what it thinks you want, and perform actions to give you a more limited but more correct amount of data or services in return. Siri understands context. And she still goes to servers in the cloud to retrieve answers via third-party services, albeit a smaller set of them than before. Anything related to mathematical computation or scientific fact is likely to come from Wolfram|Alpha. Information related to businesses like restaurants or retail stores is likely to come from Yelp, although restaurant reservations are through OpenTable. Weather info comes from Apple's built-in Weather app, powered by Yahoo. And movie time listings, reviews and other movie information would likely come from Rotten Tomatoes. Any request Siri doesn't understand will cause her to ask you for more information to clarify, or to ask you if you want her to look it up on the Web. She uses your phone's GPS to retrieve and return information relevant to your current location.
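One way to picture this routing is as a simple lookup table. The following is a minimal sketch in Python under stated assumptions: the domain labels and the route function are hypothetical, and only the provider names come from the description above.

```python
# Domain labels are hypothetical; the provider names are the ones named in the article.
PROVIDERS = {
    "math": "Wolfram|Alpha",
    "science": "Wolfram|Alpha",
    "business": "Yelp",
    "reservation": "OpenTable",
    "weather": "Weather app (Yahoo)",
    "movies": "Rotten Tomatoes",
}


def route(domain):
    """Pick a data source for a request domain, falling back to a Web search offer."""
    return PROVIDERS.get(domain, "Web search")


math_source = route("math")
fallback = route("taxi")
```

The fallback mirrors the behavior described above: a request that does not fit any known domain ends with an offer to look it up on the Web.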
What can Siri do?
Siri can look up lots of information from its various online data sources, including movie times, sports scores and weather forecasts, as well as general information on just about any topic that springs to mind. She can take dictation, send text messages and e-mails, make phone calls, post to the social media sites Facebook and Twitter, launch apps, schedule appointments on your Calendar, set reminders and give you directions (now spoken step-by-step), all in response to your natural spoken commands. You can say, "Play classical music," and she'll look for matches in your Music app and start playing them. You can command, "Wake me up at 5 a.m.," and she'll set an alarm in iPhone's Clock app. You can even ask her, "What can you do?" and she will return a list of a bunch of possible commands that give you a good idea of how to use her.
Incidentally, "she" really isn't the most accurate way to refer to Siri, since it's actually a genderless computer application, but many people do so because its U.S. English voice is a female one. Siri also has male voices in some languages and dialects, like the U.K. English voice.
You activate Siri in one of two ways: Hold down the Home button on the iPhone, or, provided you have the "Raise to Speak" setting turned on, bring the phone to your ear while the screen is on. Siri beeps twice to let you know that you can speak, and the microphone icon on the screen stays lit while it's listening to you. Siri beeps two times again in a slightly higher-pitched tone when it stops listening and starts processing. While Siri is active, you can continue to touch the microphone icon to enter new voice commands.
Before using Siri, you will want to go to "Settings" > "General" > "Siri" and set a few things. This is also where you can turn the app on or off. One thing you can set here is the "Raise to Speak" function mentioned above. You should also set your language, which with iOS 6 includes 19 different choices: three dialects of Chinese, four English, three French, two German, two Italian, and three Spanish, as well as Japanese and Korean. The four English choices are Australia, Canada, United Kingdom and United States. Your selection affects how Siri interprets your accent and instructions.
You also need to give Siri your identity by setting "My Info" to your own contact information from your Phone app contacts list. You can actually set it to someone else besides the person listed as the owner of the phone. After this, you can say things like "Send myself message," and it'll know to send a message to you, or whatever contact you chose.
Another Setting choice is "Voice Feedback." If you set it to "Always," Siri will always speak its answers to you, as well as display them to the screen. If you select "Handsfree Only," Siri will display its answers to the screen and remain silent unless you are using a hands-free device like a Bluetooth headset.
Those are the only settings explicitly for Siri, but there are other things you can control, like how it interprets names from your contacts. You can set nicknames and phonetic first and last names in your Phone app's contacts (by going to "Phone" > "Contacts" > "All Contacts" > person's name > "Edit" > "add field" > "Nickname," for instance). This is great if Siri routinely fails to recognize names of people you are trying to contact, which can happen due to unusual spellings or pronunciation misinterpretations. You still might have to play around with it to make the nickname or phonetic name something that Siri understands.
You can also set relationships within your contacts, which can come in handy if you want to be able to say, "Send a message to Mom," rather than saying her name. There are two ways to do this. You can edit your own Phone contact with the instructions from above, and under "add field" choose "Related People" and it will give you a list of choices like "parent," "mother," "father," etc. Once you have added one relation, you no longer have to do "add field," but can go to the bottom of your list of relations under "Edit" and add new ones. An alternate (and more fun) way is to skip going to the Phone app and just tell Siri, "Send a message to Mom." The first time you do it for each relationship, Siri will ask who that person is and then ask you if you want it to remember. If you say yes, Siri will store the information in your contact and next time you ask her a similar question, she'll know who you want to contact.
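The ask-once-then-remember behavior can be sketched as follows. This is an illustrative Python example, not Apple's implementation: resolve_relationship, the stored dictionary and the stand-in prompt function are all hypothetical, and the sketch assumes the user always answers yes when asked whether to remember.

```python
def resolve_relationship(relation, relations, ask_user):
    """Look up a relationship such as "mom"; ask once and remember if it is unknown."""
    key = relation.strip().lower()
    if key not in relations:
        # First use: ask who this person is, then store the answer.
        relations[key] = ask_user(key)
    return relations[key]


# Usage with a stand-in for the spoken prompt.
stored = {}
prompts = []


def fake_prompt(relation):
    prompts.append(relation)
    return "Jane Appleseed"


first = resolve_relationship("Mom", stored, fake_prompt)
second = resolve_relationship("mom", stored, fake_prompt)
```

The second call returns the stored contact without prompting again, which is the behavior described for repeated requests.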
If you want to send a text message, and you have multiple contacts with similar names, Siri supplies a list of choices. After it reads off the list, it will double beep for you to pick the right person via speaking or tapping the screen. After you've chosen, it will say something like "I updated your message. Ready to send it?" and will await your permission. You can say or tap "Cancel" or "Send," or say things like "yes," "no" or "never mind." As long as Siri understands you, it'll react accordingly. All of this applies to Phone and Mail apps, as well. The commands and prompts might be slightly different, but are intuitive. Sometimes, it won't understand who you are talking about, even in the case of spelling variations that you might think would be obvious, in which case you are out of luck and have to open your Phone or SMS or Mail application and go manual.
Some car manufacturers are teaming with Apple to make their in-car voice systems work seamlessly with Siri. Mercedes has integrated Siri and other iPhone functions into the in-vehicle display of some of its models, allowing drivers to send e-mails and text messages, play their music and perform other Siri functions hands-free. On the less luxurious front, GM has also been able to get its MyLink "infotainment" systems working with Siri in its Chevrolet Spark and Sonic cars, allowing drivers to activate Siri with a steering wheel button and otherwise interact with the application hands- and eyes-free. And there's no telling what other third parties will decide to leverage Siri's features to improve their own offerings.
What are Siri's limitations?
As cool as Siri is, it has a few shortcomings. Since being removed as a standalone app, Siri is only available on iPhone models 4S and above and newer models of iPad and iPod Touch, so anyone with an older phone or tablet who wishes to try out a voice assistant either has to upgrade or find an alternative to Siri.
There are also some limitations to its functionality. Because Siri is interacting with servers in the cloud, not just your smartphone's processor, WiFi or cellular connectivity is necessary. Siri also won't work very well if you end up in an area with weak cellular service.
There are some accent and language issues, as well. Siri is more likely to understand you if you speak very precisely, and it has trouble understanding some accents. So you might have to modulate your voice for better results, which might mean some people can't have natural-sounding conversations with the digital assistant. One reviewer found that Siri understood him best when he spoke to it in an inflectionless robot-like voice [source: Poeter]. Siri also has trouble with background noise and low quality audio from certain headsets.
Siri can also be quite literal at times. If Siri doesn't understand what you are asking from the get go, it will most likely ask you if you want it to search the Web for the exact words you just uttered. And if you say something like, "Tell Bill he's going to have to leave without me," it will generate the message to your contact Bill along the lines of, "He's going to have to leave without me."
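The literal behavior in the "Tell Bill" example is what you would expect from a simple pattern that splits a command into a recipient and a message body without rewriting pronouns. The Python sketch below illustrates that idea only; the pattern and the build_message function are assumptions, not a description of how Siri actually parses requests.

```python
import re


def build_message(command):
    """Split 'Tell <name> <text>' into (recipient, message) without rephrasing the text."""
    match = re.match(r"tell (\w+) (.+)", command, re.IGNORECASE)
    if match is None:
        return None
    recipient, body = match.groups()
    # The body is passed along word for word; only the first letter is capitalized.
    return recipient, body[0].upper() + body[1:]


result = build_message("Tell Bill he's going to have to leave without me")
```

Because nothing in this approach turns "he's" into "you're," the message arrives exactly as spoken.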
Siri doesn't understand all spelling variations. It'll successfully match names like "Sarah" and "Sara" or "Jeff" and "Geoff," but less common names, or just variants that haven't been programmed into the app yet (even sometimes just one letter off), can cause headaches.
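A table of known variants would explain why some spellings match and others do not. The Python sketch below is a guess at the general idea, not Siri's actual logic: the variant groups contain only the two pairs mentioned above, and names_match is a hypothetical helper.

```python
# Only the pairs named in the article; a real table would be far larger.
VARIANT_GROUPS = [
    {"sarah", "sara"},
    {"jeff", "geoff"},
]


def names_match(heard, contact):
    """Return True if two names are identical or listed as known spelling variants."""
    a, b = heard.lower(), contact.lower()
    if a == b:
        return True
    return any(a in group and b in group for group in VARIANT_GROUPS)


common = names_match("Sara", "Sarah")
unlisted = names_match("Katharine", "Catherine")
```

With this kind of lookup, any pair that has not been added to the table fails to match, however obvious the connection looks to a person.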
Even if you speak clearly and in a noise-free area, Siri can still misinterpret you in creative ways, such as picking the wrong homophone or just mishearing you and speaking back phrases that only sound vaguely like what you said. If you are in the same session and it does something like mishear a city name, it may keep using the same incorrect location information even if you try to correct it. Siri mimics intelligence pretty well, but in some ways it's kind of dumb. It can make for some frustrating conversations, but hopefully things will improve with more use and the passing of time.
Because of variations in human language, there are multiple ways of giving the same commands. Siri's learning skills aside, as you use it, you will learn what variations yield the best results. If you want hands free directions, but ask for something that returns multiple results, you will have to go through the list selection process and will likely have to touch or look at the screen at some point. But you can change something like, "Find McDonalds," which would return multiple hits in just about any city, to, "Give me directions to the nearest McDonald's," and Siri will find the closest one and jump right into your Maps app's spoken directions.
Siri can't yet compound all its actions with most apps. For instance, Siri can search for something on the web, or open the App Store for you, but it can't search for something in the App Store. If you say, "Search for Widget in the App Store," it'll respond with something like, "Searching the Web for 'widget in app store,'" and bring up a Google search. It also can't close apps. Functionality with additional apps, besides just opening them, could also be on the horizon.
Even if you have "Raise to Speak" set in Siri's settings, it doesn't always activate when you raise your phone to your ear. You might still have to hold down the Home button, unless you discover the magic motion combination that always makes this feature work.
There is also one security issue with Siri, but it's easily fixable. Even if your phone is password protected, you can invoke Siri with the Home button. And in this state, it can be used to send messages and make calls to people in your contacts list without unlocking your phone. You can fix this by going to "Settings" > "General" > "Passcode lock" and setting Siri to "Off" under "Allow Access When Locked."
Unlike with many GPS devices, you can't pick and choose Siri's voice, so if you don't like the one that comes with your dialect, you're out of luck. If there happen to be multiple choices of dialect in your language, it can be fun to try another, but this can result in misunderstandings, as there are word definition and accent nuances between dialects. For instance, if you ask for football scores, the word "football" will mean different sports in the various English-speaking countries.
Because the Apple-integrated Siri of today doesn't pull data from as many sources as the original app, the information you get back may be more limited. It still can't order you a taxi. But Siri is young yet, and new features should be added with future iOS updates. And voice-recognition AIs are still not 100 percent accurate, so Siri is bound to disappoint at least some of the time. Frequent use is also likely to increase data usage, so users will have to keep an eye on their usage and limits.
How can people use Siri?
Surveys by Parks Associates in early 2012 found that most people were using Siri for basic tasks like texting, making calls and looking up information. Fewer were using it to set appointments and perform other more complicated duties [source: Vascellaro]. It can be a useful productivity tool, though, if you delve into its other functions. And even the simple stuff like making calls and taking dictation is usually in the realm of a human assistant, so those things can't be dismissed when gauging Siri's utility.
The app still can't replace a human assistant at this point, though, and is considered by some to be a novelty, at least in part because of its intermittent inaccuracy. But it should get better on the accuracy front over time.
Siri is fun to play with, in any case. You can ask it for all sorts of information, but you can also ask it questions that don't have concrete answers it can regurgitate from the Internet. These will sometimes yield one of the many witty "Easter Eggs" written into Siri.
Here are some of Siri's answers to the question, "What is the meaning of life?":
Here are a few responses to, "I love you.":
Other amusing questions and answers include:
This wasn't done to be frivolous, but to make Siri amusing and likable, qualities that make it less off-putting than a lot of other virtual assistants. Siri also tends to give different answers if you repeat the same question, to keep from being repetitive or annoying in the way that, for example, Microsoft's much-despised Clippy assistant was. You can't yet have a real human-level conversation with Siri, but you can at least have a pleasant interaction.
There's lots of hacking going on to make Siri capable of things well beyond her current scope. People have created things like Siri Proxy, a proxy server that allows you to build and run plug-ins to add custom features to Siri, to do things like manipulate devices remotely. Through such plug-ins, Siri has been used to start a car, control a TV, adjust a home thermostat and open a garage door via voice commands. Another plug-in called NowNow even lets you invoke a competitor of Siri, Google Voice Search, from anywhere in the operating system with the shortcut of your choice. But be forewarned, running third-party plug-ins requires jailbreaking your phone, which could render your phone inoperable.
Industrious users have also discovered useful hacks that don't require programming or jailbreaking. For instance, if you prefer the Google Maps app to Apple's built-in Maps, you can add "via transit" to your request for directions. Because Maps doesn't currently have transit directions, it will give you a list of all of your map applications and allow you to choose which you use. You can also create Phone contacts for locations you want to visit and add Google Maps URLs to them, then ask Siri to visit a contact's Web site.
One Siri user has managed to control lights in his house using Siri by plugging a lamp into a Belkin WeMo Switch (a WiFi enabled power outlet) and using an IFTTT.com account to set up a condition to turn it off whenever a text message is sent to a special phone number. He created a contact for the IFTTT number, and then could request that Siri send a text to toggle the lights via a simple command [source: Cipriani]. Other uses are bound to come to light as more people tinker with Siri.
Is Siri the start of something grand?
Though Siri isn't quite up to the task of replacing our administrative assistants, it's also a pretty recent development, and the first voice assistant to reach such a wide commercial audience. Like the iPad did for tablets, Siri may have whetted the public's appetite for the computer assistant of the future.
Siri's not the only game in town. Google Voice Search was released for iPhone in an update to the Google Search app in October 2012. It's touted as being faster than Siri, although it is more of an information search feature and doesn't have all of Siri's integrated capabilities. You can also get Vlingo, Nuance's Dragon Go! and True Knowledge's Evi voice assistant apps for iPhone and other devices. They can perform some, though not nearly all, of Siri's functions, but they certainly lack her personality. One disadvantage of third-party apps is that they aren't integrated with iOS, so you can only use them while the app is open, not from anywhere as you can with Siri. Other platforms besides Apple's also have voice recognition software, including Samsung S Voice, Microsoft's TellMe and Android's Speaktoit, so iPhone is not the only choice. However, all current voice-recognition systems fall prey to similar accuracy issues, so there is no perfect assistant out there yet.
Some people have found Siri so disappointing that they've filed false advertising lawsuits against Apple, saying that their claims for Siri's capabilities were overblown. But despite its limitations, Siri is a pretty big step beyond what we've had before. There's no telling what useful features will be added in the future, or how smart and capable the app will grow as it's given more and more data. And as with all technology, it will likely only improve over time, and spawn more advancements we didn't even know we needed until they existed.
Siri might be our first big step toward the fully interactive talking computers of yesterday's sci-fi. We might be in for a glorious future of having advanced artificial intelligences do most of our busy work for us. And heck, with the right hack, they might even be able to make us coffee.
Lots More Information
Author's Note
Ever since seeing the film "Wargames" when I was a kid, I've wanted a talking computer. And now, thanks to Siri and my iPhone, I have one in my pocket. The voice is even slightly electronic sounding and, thankfully, is unlikely to start a global thermonuclear war.
I actually told Siri, "Let's play global thermonuclear war," in my quest for silly requests, and it brought up a Wolfram|Alpha page that said, in print, "Input interpretation: How about Global Thermonuclear War? Result: Wouldn't you prefer a nice game of chess?" with a little info about the movie. I would have preferred to hear it spoken aloud, but it's really more important that Siri can give me the weather and send messages and set my alarm without me having to click through several steps in the Clock app.
There have been some misunderstandings between us, and I may never fully trust Siri to give me directions while I'm on the road after our recent, "No, Athens, not Evans," argument. But thankfully we are still on speaking terms. All in all, Siri is much more useful than not having speech-recognition. Using it may even improve my diction. And I have no doubt that it'll either get much better over time, or be usurped by the next great step in the technology. And hopefully, that won't be the predecessor to WOPR.