PART I INVENTION
Amazon, December 31, 2010
Annual net sales: $34.20 billion
Full- and part-time employees: 33,700
End-of-year market capitalization: $80.46 billion
Jeff Bezos end-of-year net worth: $15.86 billion
CHAPTER 1 The Über Product Manager
There was nothing particularly distinctive about the dozen or so low-rise buildings in Seattle's burgeoning South Lake Union district that Amazon moved into over the course of 2010. They were architecturally ordinary and, on the insistence of its CEO, bore no obvious signage indicating the presence of an iconic internet company with almost $35 billion in annual sales. Jeff Bezos had instructed colleagues that nothing good could come from that kind of obvious self-aggrandizement, noting that people who had business with the company would already know where it was located.
While the offices clustered around the intersection of Terry Avenue North and Harrison Street were largely anonymous, inside they bore all the distinguishing marks of a unique and idiosyncratic corporate culture. Employees wore color-coded badges around their necks signifying their seniority at the company (blue for those with up to five years of tenure, yellow for up to ten, red for up to fifteen), and the offices and elevators were decorated with posters delineating Bezos's fourteen sacrosanct leadership principles.
Within these walls ranged Bezos himself, forty-six years old at the time, carrying himself in such a way as to always exemplify Amazon's unique operating ideology. The CEO, for example, went to great lengths to illustrate Amazon's principle #10, "frugality": Accomplish more with less. Constraints breed resourcefulness, self-sufficiency, and invention. There are no extra points for growing headcount, budget size, or fixed expense. His wife, MacKenzie, drove him to work most days in their Honda minivan, and when he flew with colleagues on his private Dassault Falcon 900EX jet, he often mentioned that he personally, not Amazon, had paid for the flight.
If Bezos took one leadership principle most to heart, one that would also come to define the next half decade at Amazon, it was principle #8, "think big": Thinking small is a self-fulfilling prophecy. Leaders create and communicate a bold direction that inspires results. They think differently and look around corners for ways to serve customers. In 2010, Amazon was a successful online retailer, a nascent cloud provider, and a pioneer in digital reading. But Bezos envisioned it as much more. His shareholder letter that year was a paean to the esoteric computer science disciplines of artificial intelligence and machine learning that Amazon was just beginning to explore. It opened by citing a list of impossibly obscure terms such as "naïve Bayesian estimators," "gossip protocols," and "data sharding." Bezos wrote: "Invention is in our DNA and technology is the fundamental tool we wield to evolve and improve every aspect of the experience we provide our customers."
Bezos wasn't only imagining these technological possibilities. He was also attempting to position Amazon's next generation of products directly on its farthest frontier. Around this time, he started working intensively with the engineers at Lab126, Amazon's Silicon Valley R&D subsidiary, which had developed the company's first gadget, the Kindle. In a flurry of brainstorming sessions, he initiated several projects to complement the Kindle and the coming Kindle Fire tablets, which were known internally at the time as Project A.
Project B, which became Amazon's ill-fated Fire Phone, would use an assembly of front-facing cameras and infrared lights to conjure a seemingly three-dimensional smartphone display. Project C, or "Shimmer," was a desk-lamp-shaped device designed to project hologram-like displays onto a table or ceiling. It proved unfeasibly expensive and was never launched.
Bezos had peculiar ideas about how customers might interact with these devices. The engineers working on the third version of the Kindle discovered this when they tried to kill a microphone that was planned for the device, since no features were slated to actually use it. But the CEO insisted that the microphone remain. "The answer I got is that Jeff thinks in the future we'll talk to our devices," said Sam Bowen, then a Kindle hardware director. "It felt a bit more like Star Trek than reality."
Designers convinced Bezos to lose the microphone in subsequent versions of the Kindle, but he clung to his belief in the inevitability of conversational computing and the potential of artificial intelligence to make it practical. It was a trope in all his favorite science fiction, from TV's Star Trek ("computer, open a channel") to authors like Arthur C. Clarke, Isaac Asimov, and Robert A. Heinlein, whose books lined the library of hundreds of volumes in his lakefront Seattle-area home. While others read these classics and only dreamed of alternate realities, Bezos seemed to consider the books blueprints for an exciting future. It was a practice that would culminate in Amazon's defining product for a new decade: a cylindrical speaker that sparked a wave of imitators, challenged norms around privacy, and changed the way people thought about Amazon, not only as an e-commerce giant, but as an inventive technology company that was pushing the very boundaries of computer science.
The initiative was originally designated inside Lab126 as Project D. It would come to be known as the Amazon Echo, and by the name of its virtual assistant, Alexa.
As with several other projects at Amazon, the origins of Project D can be traced back to discussions between Bezos and his "technical advisor," or TA, the promising executive handpicked to shadow the CEO. Among the TA's duties were to take notes in meetings, write the first draft of the annual shareholder letter, and learn by interacting with the master closely for more than a year. In the role from 2009 to 2011 was Amazon executive Greg Hart, a veteran of the company's earliest retail categories, like books, music, DVDs, and video games. Originally from Seattle, Hart had attended Williams College in Western Massachusetts and, after a stint in the ad world, returned home at the twilight of the city's grunge era, sporting a goatee and a penchant for flannel shirts. By the time he was following Bezos around, the facial hair was gone and Hart was a rising corporate star. "You sort of feel like you're an assistant coach watching John Wooden, you know, perhaps the greatest basketball coach ever," Hart said of his time as the TA.
Hart remembered talking to Bezos about speech recognition one day in late 2010 at Seattle's Blue Moon Burgers. Over lunch, Hart demonstrated his enthusiasm for Google's voice search on his Android phone by saying, "pizza near me," and then showing Bezos the list of links to nearby pizza joints that popped up on-screen. "Jeff was a little skeptical about the use of it on phones, because he thought it might be socially awkward," Hart remembered. But they discussed how the technology was finally getting good at dictation and search.
At the time, Bezos was also excited about Amazon's growing cloud business, asking all of his executives, "What are you doing to help AWS?" Inspired by the conversations with Hart and others about voice computing, he emailed Hart, device vice president Ian Freed, and senior vice president Steve Kessel on January 4, 2011, linking the two topics: "We should build a $20 device with its brains in the cloud that's completely controlled by your voice." It was another idea from the boss who seemed to have a limitless wellspring of them.
Bezos and his employees riffed on the idea over email for a few days, but no further action was taken, and it could have ended there. Then a few weeks later, Hart met with Bezos in a sixth-floor conference room in Amazon's headquarters, Day 1 North, to discuss his career options. His tenure as TA was wrapping up, so they discussed several possible opportunities to lead new initiatives at the company, including positions in Amazon's video streaming and advertising groups. Bezos jotted their ideas down on a whiteboard, adding a few of his own, and then started to apply his usual criteria to assess their merit: If they work, will they grow to become big businesses? If the company didn't pursue them aggressively now, would it miss an opportunity? Eventually Bezos and Hart crossed off all the items on the list except one: pursuing Bezos's idea for a voice-activated cloud computer.
"Jeff, I don't have any experience in hardware, and the largest software team I've led is only about forty people," Hart recalled saying.
"You'll do fine," Bezos replied.
Hart thanked him for the vote of confidence and said, "Okay, well, remember that when we screw up along the way."
Before they parted, Bezos illustrated his idea for the screenless voice computer on the whiteboard. The first-ever depiction of an Alexa device showed the speaker, microphone, and a mute button. And it identified the act of configuring the device to a wireless network, since it wouldn't be able to listen to commands right out of the box, as a challenge requiring further thought. Hart snapped a photo of the drawing with his phone.
Bezos would remain intimately involved in the project, meeting with the team as frequently as every other day, making detailed product decisions, and authorizing the investment of hundreds of millions of dollars in the project before the first Echo was ever released. Using the German superlative, employees referred to him as the über product manager.
But it was Greg Hart who ran the team, just across the street from Bezos's office, in Fiona, the Kindle building. Over the next few months, Hart hired a small group from in and outside the company, sending out emails to prospective hires with the subject line "Join my mission" and asking interview questions like "How would you design a Kindle for the blind?" Then, just as obsessed with secrecy as his boss, he declined to specify what product candidates would be working on. One interviewee recalled guessing that it was Amazon's widely rumored smartphone and said that Hart replied, "There's another team building a phone. But this is way more interesting."
One early recruit was Amazon engineer Al Lindsay, who in a previous job had written some of the original code for telco US West's voice-activated directory assistance. Lindsay spent his first three weeks on the project on vacation at his cottage in Canada, writing a six-page narrative that envisioned how outside developers might program their own voice-enabled apps that could run on the device. Another internal recruit, Amazon manager John Thimsen, signed on as director of engineering and coined a formal code name for the initiative, Doppler, after the Project D designation. "At the start, I don't think anybody really expected it to succeed, to be honest with you," Thimsen told me. "But to Greg's credit, halfway through, we were all believers."
The initial Alexa crew worked with a feverish sense of urgency due to their impatient boss. Unrealistically, Bezos wanted to release the device in six to twelve months. He would have a good reason to hurry. On October 4, 2011, just as the Doppler team was coming together, Apple introduced the Siri virtual assistant in the iPhone 4S, the last passion project of cofounder Steve Jobs, who died of cancer the next day. That the resurgent Apple had the same idea of a voice-activated personal assistant was both validating for Hart and his employees and discouraging, since Siri was first to market and with initial mixed reviews. The Amazon team tried to reassure themselves that their product was unique, since it would be independent from smartphones. Perhaps a more significant differentiator though was that Siri unfortunately could no longer have Jobs's active support, while Alexa would have Bezos's sponsorship and almost maniacal attention inside Amazon.
To speed up development and meet Bezos's goals, Hart and his crew started looking for startups to acquire. It was a nontrivial challenge, since Nuance, the Boston-based speech giant whose technology Apple had licensed for Siri, had grown over the years by gobbling up the top American speech companies. Doppler execs tried to learn which of the remaining startups were promising by asking prospective targets to voice-enable the Kindle digital book catalog, then studying their methods and results. The search led to several rapid-fire acquisitions over the next two years, which would end up shaping Alexa's brain and even the timbre of its voice.
The first company Amazon bought, Yap, a twenty-person startup based in Charlotte, North Carolina, automatically translated human speech such as voicemails into text, without relying on a secret workforce of human transcribers in low-wage countries. Though much of Yap's technology would be discarded, its engineers would help develop the technology to convert what customers said to Doppler into a computer-readable format. During the prolonged courtship, Amazon execs tormented Yap execs by refusing to disclose what they'd be working on. Even a week after the deal closed, Al Lindsay was with Yap's engineers at an industry conference in Florence, Italy, where he insisted that they pretend they didn't know him, so that no one could catch on to Amazon's newfound interest in speech technology.
After the purchase was finalized for around $25 million, Amazon dismissed the company's founders but kept its speech science group in Cambridge, Massachusetts, making it the seed of a new R&D office in Kendall Square, near MIT. Yap engineers flew to Seattle, walking into a conference room on the first floor of Fiona with locked doors and closed window blinds. There Greg Hart finally described "this little device, about the size of a Coke can, that would sit on your table and you could ask it natural language questions and it would be a smart assistant," recalled Yap's VP of research, Jeff Adams, a two-decade veteran of the speech industry. "Half of my team were rolling their eyes, saying 'oh my word, what have we gotten ourselves into.'"
After the meeting, Adams delicately told Hart and Lindsay that their goals were unrealistic. Most experts believed that true "far-field speech recognition" (comprehending speech from up to thirty-two feet away, often amid crosstalk and background noise) was beyond the realm of established computer science, since sound bounces off surfaces like walls and ceilings, producing echoes that confuse computers. The Amazon executives responded by channeling Bezos's resolve. "They basically told me, 'We don't care. Hire more people. Take as long as it takes. Solve the problem,'" recalled Adams. "They were unflappable."
A few months after the Yap purchase, Greg Hart and his colleagues acquired another piece of the Doppler puzzle. It was the technological antonym of Yap, which converted speech into text. Instead, the Polish startup Ivona generated computer-synthesized speech that resembled a human voice.
Ivona was founded in 2001 by Lukasz Osowski, a computer science student at the Gdańsk University of Technology. Osowski had the notion that so-called "text to speech," or TTS, could read digital texts aloud in a natural voice and help the visually impaired in Poland appreciate the written word. With a younger classmate, Michal Kaszczuk, he took recordings of an actor's voice and selected fragments of words, called diphones, and then blended or "concatenated" them together in different combinations to approximate natural-sounding words and sentences that the actor might never have uttered.
The Ivona founders got an early glimpse of how powerful their technology could be. While students, they paid a popular Polish actor named Jacek Labijak to record hours of speech to create a database of sounds. The result was their first product, Spiker, which quickly became the top-selling computer voice in Poland. Over the next few years, it was used widely in subways, elevators, and for robocall campaigns. Labijak subsequently began to hear himself everywhere and regularly received phone calls in his own voice urging him, for example, to vote for a candidate in an upcoming election. Pranksters manipulated the software to have him say inappropriate things and posted the clips online, where his children discovered them. The Ivona founders then had to renegotiate the actor's contract after he angrily tried to withdraw his voice from the software. (Today "Jacek" remains one of the Polish voices offered by AWS's Amazon Polly computer voice service.)
In 2006, Ivona began to enter and repeatedly win the annual Blizzard Challenge, a competition for the most natural computer voice, organized by Carnegie Mellon University. By 2012, Ivona had expanded into twenty other languages and had over forty voices. After learning of the startup, Greg Hart and Al Lindsay diverted to Gdańsk on their trip through Europe looking for acquisition targets. "From the minute we walked into their offices, we knew it was a culture fit," Lindsay said, pointing to Ivona's progress in a field where researchers often got distracted by high-minded pursuits. "Their scrappiness allo...