We Are Data

Algorithms and The Making of Our Digital Selves

John Cheney-Lippold


About This Book

What identity means in an algorithmic age: how it works, how our lives are controlled by it, and how we can resist it.

Algorithms are everywhere, organizing the near-limitless data that exists in our world. Derived from our every search, like, click, and purchase, algorithms determine the news we get, the ads we see, the information accessible to us, and even who our friends are. These complex configurations not only form knowledge and social relationships in the digital and physical world but also determine who we are and who we can be, both on and offline. Algorithms create and recreate us, using our data to assign and reassign our gender, race, sexuality, and citizenship status. They can recognize us as celebrities or mark us as terrorists. In this era of ubiquitous surveillance, contemporary data collection entails more than gathering information about us. Entities like Google, Facebook, and the NSA also decide what that information means, constructing our worlds and the identities we inhabit in the process. We have little control over who we algorithmically are. Our identities are made useful not for us but for someone else. Through a series of entertaining and engaging examples, John Cheney-Lippold draws on the social constructions of identity to advance a new understanding of our algorithmic identities. We Are Data will educate and inspire readers who want to wrest back some freedom in our increasingly surveilled and algorithmically constructed world.


Information

Publisher: NYU Press
Year: 2017
ISBN: 9781479802449

1. Categorization

Making Data Useful

This, the ability to take real-world phenomena and make them something a microchip can understand, is, I think, the most important skill anyone can have these days. Like you use sentences to tell a story to a person, you use algorithms to tell a story to a computer.
—Christian Rudder, founder of OkCupid1
“We kill people based on metadata.”2
Metadata is data about data. It’s data about where you are, from where you send a text message, and to where that message is sent. It’s data that identifies what time and day you sent an email, the subject of that email, and even the type of device you used to send it. It’s data that flows openly through cell and fiber-optic networks, easily plucked from the ether and connected together. And it’s data about you that, when processed, is algorithmically spoken for in ways you probably wouldn’t want it to speak.
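To make this concrete, below is a minimal sketch of the kind of metadata record a single text message might generate. Every field name is hypothetical, invented for illustration rather than drawn from any real carrier or agency schema; the point is how much a record can say without containing a word of the message itself.

```python
# Hypothetical sketch of one text message's metadata -- no message
# content appears anywhere in the record.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MessageMetadata:
    sender_number: str         # who sent the message
    recipient_number: str      # who received it
    sender_cell_tower: str     # approximates where it was sent from
    recipient_cell_tower: str  # approximates where it was received
    sent_at: datetime          # what time and day it was sent
    device_model: str          # the type of device used to send it

record = MessageMetadata(
    sender_number="+1-555-0100",
    recipient_number="+1-555-0199",
    sender_cell_tower="tower-041",
    recipient_cell_tower="tower-112",
    sent_at=datetime(2015, 6, 1, 14, 32),
    device_model="Nokia 105",
)
```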
In the quotation that begins this chapter, former NSA chief Gen. Michael Hayden alerts us to how metadata can be spoken for “as if” it was produced by a ‘terrorist.’ That is, one’s metadata can be compared against a preexisting pattern, a “signature” in the parlance of the U.S. intelligence community. And if that metadata fits within this “signature” of a ‘terrorist’ template, one might find oneself at the receiving end of a Predator drone strike.
This data-based attack is a “signature strike,” a strike that requires no “target identification” but rather an identification of “groups of men who bear certain signatures, or defining characteristics associated with terrorist activity, but whose identities aren’t known.”3 With this in mind, we might choose to revise Hayden’s remarks to be a bit more specific: “we call people ‘terrorists’ based on metadata. The U.S.’s War on Terror does the rest.”
At the onset of the U.S.’s drone program in the early 2000s, strikes were “targeted.” Intelligence officials identified suspected individuals through their voice, their name, or on-the-ground reconnaissance. Then a drone operator would launch a missile at the location where that individual was believed to be. But in 2008, following Pentagon frustration with the constraints imposed by the Pakistani state’s military policy, the U.S. loosened its wartime drone guidelines. Now, a terrorist isn’t just who the U.S. claims is a terrorist but also who the U.S. considers a data-based ‘terrorist.’ While the U.S. doesn’t publicly differentiate between its “targeted” and “signature” strikes, one likely consequence of this shift was a spike in the frequency of drone attacks: there were 49 strikes during the five years between 2004 and 2008 and 372 during the seven years between 2009 and 2015.4
This loosening of legal restriction reindexed terrorist into ‘terrorist’: “a pre-identified ‘signature’ of behavior that the U.S. links to militant activity.”5 Since 2008, the U.S. government has launched what were billed as “precision” drone attacks against not just individual people but patterns in data—cell-phone and satellite data that looked “as if” it was a target that the U.S. wanted to kill, that is, a ‘terrorist.’6 Foreseeably, this “as if” mode of identification was not the same as “as.”
And hundreds of civilians have since died as a probable result. Journalist Tom Engelhardt proposes that “the obliterated wedding party may be the true signature strike of the post-9/11 era of American war-making, the strike that should, but never will, remind Americans that the war on terror was and remains, in distant lands, a war of terror.”7 The unintentional targeting of wedding parties, where individuals (and their cell phones) congregate outside city centers, producing data “as if” it was a terrorist meeting, reifies a level of permanent uncertainty in the geographic areas where these strikes happen. Even those who are not on a U.S. “kill list” live with the potential to be identified “as if” they were—a precariousness of life that is terrorizing in and of itself.8
This operationalizing of ‘terrorist’ as an algorithmically processed categorization of metadata reframes who we are in terms of data. In our internetworked world, our datafied selves are tethered together, pattern analyzed, and assigned identities like ‘terrorist’ without attention to our own, historical particularities. As media scholar Mark Andrejevic writes, “such logic, like the signature strike, isn’t interested in biographical profiles and backstories, it does not deal in desires or motivations: it is post-narratival in the sense conjured up by [Ian] Bogost as one of the virtues of Object Oriented Ontology: ‘the abandonment of anthropocentric narrative coherence in favor of worldly detail.’”9
Yet even without an anthropocentric narrative, we are still narrativized when our data is algorithmically spoken for. We are strategically fictionalized, as philosopher Hans Vaihinger writes in his 1911 book The Philosophy of “As If”: “the purpose of the world of ideas as a whole is not the portrayal of reality . . . but to provide us with an instrument for finding our way about more easily in this world.”10 Importantly, those who use our data to create these ideas have the power to tell our “as if” stories for us. They “find” not “our way” but their way.
In this “as if” story of discovery, it is data that drives the plot. As Hayden described, “our species was putting more of its knowledge out there in ones and zeroes than it ever had at any time in its existence. In other words, we were putting human knowledge out there in a form that was susceptible to signals intelligence.”11 In a world inundated with data, traditional analyses fail to capture the rich, “worldly detail” of an NSA wiretap.12 Indeed, through this big-data perspective, to make sense of such vast quantities of data functionally requires the move from “targeted” to “signature.” Paring down the datafied world into “as if” ‘terrorist’ patterns is perceived as the logical next step.
Of course, the now-commonplace acceptance that the world is increasingly “data driven” might miss out on the fact that a ‘terrorist’ still looks and sounds very similar to whom the U.S. government has historically declared to be a terrorist. Both are most likely located in the Middle East and its neighbors. Both most likely speak Arabic or Urdu. Both most likely are not white. And both most likely practice Islam.
The discursive construction of terrorism in the U.S. draws from what Arab and Muslim American studies scholar Evelyn Alsultany describes as its Orientalist “baggage.”13 And this baggage also encounters, in the words of queer theorist Jasbir Puar and new-media scholar Amit S. Rai, the intersection of the racial and sexual “uncanny” of the “terrorist-monster.”14 Subsequently, the rhetoric of a monstrous other, one that designates the terrorist subject as a subject that deserves violence, flows easily into President Barack Obama’s routine defense of his own drone program: “let’s kill the people who are trying to kill us.”15
This othered monstrosity both defines contemporary U.S. enemyship and expands the conditions for who can be considered a terrorist. Here, the truism of “one man’s terrorist is another man’s freedom fighter” is reinforced by the fact that this identification is always made on terms favorable to the classifier’s geopolitical needs.16 So when the allocation of “terrorist” passes through the figure of the terrorist-monster, that is, one whose death is a priori justified, the already-dehumanizing protocol regulating aerial, “targeted” assassinations can be further dehumanized. Presently, a terrorist needs only to be a data “signature,” not a human being.
As an anonymous U.S. official told the Wall Street Journal in 2012, “You don’t necessarily need to know the guy’s name. You don’t have to have a 10-sheet dossier on him. But you have to know the activities this person has been engaged in.”17 Absent a legal requirement to target a single, identifiable individual, the ontological status of “target” is technologically rerouted. Rather than being a more adept or accurate processing feature, the U.S.’s ‘terrorist’ is merely a datafied object of simple, strategic convenience. It’s a functionalist category appropriate to the growing data-based logic of the NSA.
Rephrased in these functionalist terms, the loaded question of “who is a terrorist?” is answered in the logical vernacular of engineering. As Phil Zimmermann, creator of encryption software PGP, described, “The problem is mathematicians, scientists, engineers—they’ll find ways to turn these problems into engineering problems, because if you turn them into engineering problems then you can solve them. . . . The NSA has an incredible capability to turn things into engineering problems.”18 Knowledge about who we are is constructed according to what ethicist Luciano Floridi refers to as the “small patterns” in data, or what political theorist Stephen Collier would call “patterns of correlation,” that extend the limits of conventional knowledge.19 The NSA’s ‘terrorist’ doesn’t replace the concept of terrorist but adds onto it yet another layer. The classificatory family of terrorist must also include its algorithmic cousins.

The ‘Needle’ in the ‘Haystack’

Philosopher Grégoire Chamayou has deftly outlined this shift to algorithmic knowledge in his investigation of NSA history, writing on the various ways a ‘terrorist’ signature can be defined—much like the different ways and reasons that someone is called a terrorist. For example, using a method called graph matching, a “pattern graph” is modeled in order to compare subsets of a larger graph, an “input graph.” If elements in an input graph find near equivalence with a pattern graph, those elements are classified accordingly.20 Or, following Chamayou and with attention to figure 1.1, “If Bill and Ted live at the same address, rent a truck, visit a sensitive location and buy some ammonium nitrate fertilizer (known to be used for growing potatoes but also for the production of home-made bombs), they would exhibit a behavioural pattern that corresponds to that of a terrorist signature, and the algorithm designed to process this data would then sound the alarm.”21
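As a toy illustration of this method, the sketch below reduces the pattern graph to a checklist of behavioral predicates and represents the observed activity as a set of (actor, relation, object) edges, following Chamayou’s Bill-and-Ted example. All identifiers and the matching threshold are invented; a real graph-matching system would perform inexact matching across vastly larger graphs.

```python
# Toy sketch of graph matching: observed activity is a set of
# (actor, relation, object) edges; the 'terrorist' signature is a
# checklist of behavioral patterns. All data here is hypothetical.

observed = {
    ("Bill", "lives_at", "addr_17"),
    ("Ted",  "lives_at", "addr_17"),
    ("Bill", "rents",    "truck"),
    ("Bill", "buys",     "ammonium_nitrate"),  # potatoes -- or bombs
    ("Bill", "visits",   "sensitive_location"),
}

def share_address(edges):
    """True if more actors report a 'lives_at' edge than there are
    distinct addresses, i.e., at least two actors share an address."""
    homes = [(a, o) for a, r, o in edges if r == "lives_at"]
    return len({a for a, _ in homes}) > len({o for _, o in homes})

signature = [
    share_address,
    lambda e: any(r == "rents" and o == "truck" for _, r, o in e),
    lambda e: any(r == "buys" and o == "ammonium_nitrate" for _, r, o in e),
    lambda e: any(r == "visits" and o == "sensitive_location" for _, r, o in e),
]

# "Near equivalence": the pattern need not match exactly to fire.
score = sum(check(observed) for check in signature) / len(signature)
if score >= 0.75:
    print(f"alarm: activity matches signature (score={score:.2f})")
```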
We might better understand this method in terms of one of the U.S. intelligence community’s favorite metaphors: the needle in a haystack. As former deputy attorney general James Cole argued, “if you’re looking for the needle in the haystack, you have to have the entire haystack to look through.”22 But there is no actual haystack. Rather, the haystack is the “observed activity” of the input graph, a technological construction according to the array of political decisions that determine what and whose activity is observed—and how that activity comes to be datafied.
Similarly, there is no such thing as a needle, either. While there may be a group of people who intend to commit an act of violence against U.S. soldiers or citizens, that intention cannot be “found” like a physical needle. Rather, the needle must be constructed. To do so, the NSA aggregates an “as if” set of datafied elements. Then, it uses that set to parse the constructed haystack (data set) in order to find something that statistically resembles its patterned equivalence. For the aforementioned hypothetical ‘terrorist,’ that needle looks like data about two people who reside in the same house, buy fertilizer, rent a truck, and observe the same factory. In this way, it’s not a needle that the U.S. government looks for; it’s a datafied representation of that ‘needle.’
Figure 1.1. A graph showing the “pattern” of a ‘terrorist’ and the “observed activity” of available data. Source: Seth Greenblatt, Thayne Coffman, and Sherry Marcus, “Behavioral Network Analysis for Terrorist Detection,” in Emergent Information Technologies and Enabling Policies for Counter Terrorism, ed. Robert L. Popp and John Yen (Hoboken, NJ: Wiley–IEEE, 2006), 334.
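Pushing the sketch one step further: the constructed ‘needle’ can be modeled as nothing more than a set of datafied features, and the search as a similarity scan over every record in the haystack. The records, features, and cutoff below are all invented for illustration; what matters is that a “match” is a statistical resemblance to a template, not the discovery of a waiting object.

```python
# Hypothetical sketch: the 'needle' as a feature set, the haystack
# as datafied records, and matching as a similarity score.

needle = {"shared_household", "rents_truck",
          "buys_fertilizer", "observes_factory"}

haystack = {
    "bill_and_ted":  {"shared_household", "rents_truck",
                      "buys_fertilizer", "observes_factory"},
    "farm_coop":     {"shared_household", "rents_truck",
                      "buys_fertilizer"},  # potato farmers
    "wedding_party": {"shared_household", "travels_rural",
                      "many_phones_colocated"},
}

def jaccard(a, b):
    """Overlap between two feature sets, from 0.0 to 1.0."""
    return len(a & b) / len(a | b)

for name, features in haystack.items():
    score = jaccard(needle, features)
    verdict = "MATCH" if score >= 0.8 else "no match"
    print(f"{name}: {score:.2f} -> {verdict}")
# The farm co-op scores 0.75 -- one arbitrary threshold away from
# being spoken for "as if" it were a 'terrorist.'
```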
‘Needle’ is a new technical construction that facilitates algorithmic analyses of the datafied world. Much like the social constructions of gender, race, sexuality, and terrorist, the datafied world is not lying in wait to be discovered. Rather, it’s epistemologically fabricated. And because these constructions—of who counts as a terrorist or what it means to be a man—are legitimated through institutions like the state, media, medicine, and culture at large, they are also politicized and thus, in the words of legal scholar C. Edwin Baker, “corrupt.”23 They are “inventions,” to use social theorist Nikolas Rose’s term, born in contemporary relations of power and logics of classification and thus not authentic versions of who we think we might be.24
We return to Chamayou: “The entire [U.S. government] project rested on the premise that ‘terrorist signatures’ actually existed. Yet this premise did not hold up. The conclusion was inevitable: ‘The one thing predictable about predictive data mining for terrorism is that it would be consistently wrong.’”25 Any single, universal model of ‘terrorist’ will unavoidably fail to account for the wide varieties of different terror attacks that happen around the world.
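A standard way to see why such prediction would be “consistently wrong” is the base-rate problem: actual terrorists are vanishingly rare, so even a highly accurate signature flags far more innocent people than genuine threats. The figures below are invented purely to show the arithmetic.

```python
# Back-of-the-envelope base-rate arithmetic -- every number invented.
population   = 300_000_000  # people whose data is swept up
true_threats = 1_000        # actual "needles" in the haystack
sensitivity  = 0.99         # share of true threats the signature flags
false_rate   = 0.001        # share of innocents wrongly flagged (0.1%)

flagged_true  = true_threats * sensitivity                # ~990
flagged_false = (population - true_threats) * false_rate  # ~300,000
precision = flagged_true / (flagged_true + flagged_false)

print(f"innocents flagged: {flagged_false:,.0f}")
print(f"chance a flagged person is a true threat: {precision:.2%}")
# ~0.33%: the signature is wrong more than 99% of the time it fires.
```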
In one reading, this regularity of error might suggest abandoning the use of signatures in the first place. But following the engineering logic of the NSA, it simply means the constructed signature needs better data. As the NSA mantra went, “sniff it all,” “collect it all,” “know it all,” “process it all,” and “exploit it all.”26
Who counts as a terrorist is certainly a construction, a classification of people or organizations that a certain state doesn’t like. Likewise, a ‘terrorist’ is also constructed, fabricated via patterns of data that seem “as if” they were made by a terrorist. This ‘terrorist,’ then, serves as a construction about a construction. But unlike the indefinite, relative category of terrorist, the category of a ‘terrorist’ empirically exists. It’s a datafied model, a material template that can be copied, changed, and ceaselessly compared.
“We are data” means we are made of these technical constructions, or what I describe as measurable types.
As noted by an array of different scholars from various disciplinary backgrounds, these datafied models are quickly becoming the primary mechanisms by which we’re interpreted by computer networks, governments, and even our friends.27 From the corporate data troves of Google to the governmental dragnets of the NSA, who we are is increasingly made through d...
