1
Tracing Life through Data
The guide explains how it's programmed to transmit biosensory information, like heart-rate, medical needs, sleep patterns. "It will be your guardian, protector. It will bring good things to you." . . .
In an earlier moment in her life she might have described him as a ghost, a spiritual manifestation of the past. But she knows better now; invisibility is a prison. "Haunting" is a quaint and faint manifestation of the tortured. . . . She knew there was just one way forward and she understood the cost: the facts of her interior, available for use in a public dataset, as part of some kind of game. Besides, she hadn't made a fuss when he underwent his own erasure.
"Yes, I am a pawn. Can we please go now?"
—Jena Osman, Motion Studies
In the first few months of 2020, as the coronavirus pandemic spread across the United States, the data analytics firm Palantir Technologies won a contract with the US Department of Health and Human Services (HHS) to provide a COVID-19 contact-tracing platform called HHS Protect. Since its founding in 2003 by Silicon Valley libertarian entrepreneurs, Palantir has burnished its infamous reputation by developing platforms for military, police, and antiterrorism applications, such as Project Maven, which utilizes artificial intelligence in military drones. Palantir holds multiple contracts with the likes of the Pentagon, the Department of Homeland Security, and the Federal Bureau of Investigation (FBI). The company describes its core business as building software to support data-driven decision-making and operations. Alex Karp, cofounder and CEO, has characterized its platforms as clandestine services, where company products are "used on occasion to kill people. . . . If you're looking for a terrorist in the world now you're probably using our government product and you're probably doing the operation that takes out the person in another product we build."1 The data-mining company is probably best known for its work for Immigration and Customs Enforcement (ICE) to develop the controversial Investigative Case Management system. Palantir's forty-one-million-dollar contract for its data surveillance technology has enabled ICE to intensify its detention raids on targeted communities, with serious consequences. The stepped-up raids have terrorized migrants and accelerated family separations; they increased undocumented immigrant deportations tenfold in 2018 alone (Mijente 2019).2 When the news broke amid the pandemic that the company had secured the HHS COVID-19 contact-tracing contract, many, especially those in the human rights and immigrants' rights communities, voiced concerns about the data privacy and security protections on the platform.3 What data would Palantir collect, who exactly would have access to them, could the data be shared, and with whom? Could ICE use HHS Protect to come after undocumented migrants, also among the groups most at risk of contracting the coronavirus?
In a public-relations bid to assuage these concerns for a general audience, Alex Karp agreed to be interviewed by Axios journalist Mike Allen for its cable news program.4 In the online interview, Karp, who was quarantining in his rural New Hampshire home, preempted Allen's questions concerning the public's fears about Palantir's work on a coronavirus contact-tracing tool. Karp focused on what he knew ordinary people were afraid of and what they would want to know:
Where do you get the data? How long do you keep it? Where is it kept? Can it be removed? Is it being used to monetize me? How can I [be] guaranteed this not being repurposed for people to understand my personal life, my political views, my work habits outside of my work environment?
Allen responded, somewhat ironically, that Karp had done his job as a journalist for him. Karp's rhetorical interview questions also touch on the concerns that recur throughout this book. Who is collecting data about us? How and why do they collect our data, and whom do they share it with? How are personal data used to make money for others? What stories do data tell about us? Why do we trust these data narratives, especially when these become the "truths" that increasingly define us? How and when did data become the facts of our lives?
TRACING THE INVISIBLE
This chapter opened with an epigraph from poet Jena Osman's Motion Studies, an essay poem composed of several concurrent narratives in conversation with one another that revolve around visibility, the desire to disappear, and the impossibility of escape once captured and categorized by the machines of scientific inquiry and the political economy built around digital data. The first narrative is a speculative story featuring two characters, a woman and a man, who have won, by lottery, the right to be forgotten, to jump off the digital grid and "disappear beyond the company's horizon" (2019, 19). But much as, in Roald Dahl's Charlie and the Chocolate Factory (1964), the golden ticket wins Charlie only the chance to prove his worthiness to Willy Wonka, the lottery ticket offers Osman's protagonists only the right to run a competitive race through the desert toward digital oblivion, against other "lucky" lottery winners. The race takes the contestants ever closer to the infinite horizon of anonymity, and in exchange, as they run, they're tracked: their movements mapped out, and their heart rates, breath, and other biometric data collected, analyzed, shared, and stored for eternity. Before they can cash in their lottery ticket, both must undergo a procedure that leaves the woman with a physical body, visible and opaque, and her partner, the man, with a body as solid as the woman's, yet transparent and invisible to human eyes (but legible to computer vision). The woman, in her opacity, realizes that even in their attempt to jump off the grid of visibility, her partner's transparent body "lives more as a trace, a clue, data," but that each body serves as a dialectical contrast to the other, making both legible to the corporation's gaze (2019, 17).
The poem's second entwined narrative concerns Étienne-Jules Marey, a nineteenth-century French inventor and early photography experimentalist who was obsessed with how to make visible the body's inner, invisible movements, like the heart pumping blood through the body's complex network of arteries and veins. Marey made the body's invisible movements graphically legible through a sphygmograph, a machine he invented that traces the pulse onto a piece of paper. All of Marey's inventions translated the unseen into graphic traces, images, and lines. These inventions also included cameras, early cinematic prototypes, and chronophotography, which sequentially captured the movement of air and fluids, the flight of birds, and the galloping of horses. Some of Marey's experiments that attempted to capture as visual information the life-sustaining movements and processes that occur inside bodies became the basis of technologies used today in seemingly disparate contexts of institutionalized power. One such technology is the sphygmomanometer, which measures blood pressure. Another device that owes a lot to Marey is the polygraph machine, or "lie detector test." Law enforcement and other security fields in the United States still administer this test to job applicants, although the test is no longer admissible as evidence in court, as its claim that a subject's change in heart rate or skin conductivity indicates a falsehood has been debunked.5 Yet as historians Lorraine Daston and Peter Galison noted, Marey argued that to use mechanically generated depictions of phenomena was to speak in the "language of the phenomena themselves": removing the human, or at least human intervention, lets the machine take over and allows nature to speak for itself (1992, 81).
WHAT COUNTS AS DATA, WHAT DATA BECOME FACT?
Wittgenstein observed that "the world is the totality of facts, not of things" (2010, 25). How is something that is unquantifiable made to count, and to count as "fact"? When asked what constitutes data, many data scientists respond that data are information that help to anchor facts or to get to "the truth" (Leonelli 2016a). Data are abstracted qualitative information objectified as quantifiable values, which are then used as evidence of a phenomenon or process. It is in this way that data are made into facts that help ground or reveal a truth about a phenomenon. But all of this is, of course, contingent, shaped by the sociopolitical contexts of where and how data are extracted, who builds the models and algorithms, and how the data will be put to use.
Mary Poovey, a cultural and economic historian, writes about how the modern fact rooted in numbers and measures, that is, data, was born in the early Renaissance. The modern fact's midwives were the European merchants and burgeoning capitalists tracking their wealth, profits gleaned from colonial empire-building (through expropriation and slavery) in the Americas and Asia, and keeping it separate from the Church (Poovey 1998). Poovey trains a meticulous eye on the rise of double-entry accounting as it developed in fifteenth-century Italy, adapted from the Indian and Jewish traders who pioneered the method. It seems to be no accident that this form of accounting coincided with the Western powers' early colonization of the Americas as well as with the development of early Enlightenment knowledge production. Within her analysis, she shows how bringing knowledge about one's possessions, outstanding loans and debts, and transactions together into a ledger created connections that were at once both narrative and numerical. The ledger book was an early rhetorical attempt to make epistemic and factual statements about the world, separate from the authority of God or the Church. The double-entry accounting system was a way to confer social authority on numbers, to make numbers both expose truth and bear facts, even if the numbers were invented: "For late sixteenth-century readers, the balance conjured up both the scales of justice and the symmetry of God's world" (1998, 54).
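To see why the balance itself could feel like proof, it helps to walk through the mechanics. The following is a minimal sketch in Python, my illustration rather than anything in Poovey's text, with invented account names and amounts: because every transaction is posted twice, once as a debit and once as a credit, total debits and total credits are equal by construction, and the books "balance" whether or not the underlying figures are true.

```python
# A minimal sketch of the double-entry principle (illustrative only;
# the account names and amounts are invented).
from collections import defaultdict

ledger = defaultdict(lambda: {"debit": 0, "credit": 0})

def post(debit_account: str, credit_account: str, amount: int) -> None:
    """Record one transaction twice: as a debit to one account
    and an equal credit to another."""
    ledger[debit_account]["debit"] += amount
    ledger[credit_account]["credit"] += amount

# Hypothetical entries in a merchant's book:
post("goods (pepper)", "cash", 120)  # bought pepper for cash
post("cash", "goods (pepper)", 150)  # sold the pepper at a profit

total_debits = sum(acct["debit"] for acct in ledger.values())
total_credits = sum(acct["credit"] for acct in ledger.values())
assert total_debits == total_credits  # balances by construction: 270 == 270
```

The symmetry the final assertion checks is guaranteed by the bookkeeping form itself, not by the accuracy of the entries, which is precisely the rhetorical power Poovey identifies.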
By rhetorically making the numbers of the ledger book resemble or refer to the balance of God's order, rather than to witchcraft or sorcery, these early capitalists made numbers into facts. According to Poovey, this moment was the necessary societal shift in the West that gave numbers legitimacy and bestowed on them the authority to say something about the nature of things, about the "real world." It is all the more significant that these early capitalists used fictitious numbers to prove that the new accounting system was valid. In Poovey's account, the social and political legitimacy of the double-entry ledger book coincided with the rise of knowledge-making through the documentation and measurement of observable phenomena, another way of making numbers into facts. In the five hundred years since Italian merchants adapted double-entry bookkeeping, we have seen scientific inquiry, revolutions and turmoil, slavery, and the expropriation of labor, land, and natural and human resources all contribute to the making of the "modern" world and the construction of data as facts. But the process of objectifying phenomena into "data facts" necessarily involves power: power over the conditions under which the raw materials are extracted, and power over the collection, processing, analysis, and packaging of what becomes data.
Data might be understood as something that can be measured, counted, or defined by numbers or statistics: the temperature of the air, the pH level of a soil sample, or the number of people of a certain age who live within a neighborhood. For those who work in the data-based economy, data can be almost anything that can be captured and transformed into digital information: the grainy surveillance camera image, the remaining charge on a phone's battery, or the blood glucose level measured over a three-month period. In other words, data can be any qualitative measure translatable into numbers. Once digitized, data can be transmitted, shared, and operationalized in many forms, among them spreadsheets, databases, and platforms. Surveillance cameras, credit card swipes, retail loyalty cards, and phone metadata all capture, in a variety of ways, what we presume to be our untraceable and fleeting actions (spending idle moments on a sidewalk, gazing into a storefront, walking into a pharmacy, or browsing the aisles of a store) and count them as data. Examining the commercial applications that track and capture consumer data, Shoshana Zuboff (2019) detailed the process of converting the intangible (behaviors and feelings) into finite, computer-readable data; these discrete inputs are fed into algorithmic models to either predict or drive future consumer behaviors.
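As a concrete, if invented, illustration of that conversion, here is a short Python sketch, entirely hypothetical in its field names and values, of how one fleeting action, lingering in front of a storefront, might be flattened into a machine-readable record ready for a spreadsheet, database, or model:

```python
# Hypothetical example of turning a fleeting action into "data":
# a qualitative moment reduced to numeric, machine-readable fields.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ObservedEvent:
    timestamp: float    # when (seconds since the epoch)
    latitude: float     # where
    longitude: float
    dwell_seconds: int  # how long the person lingered
    store_id: int       # which storefront they gazed into

event = ObservedEvent(
    timestamp=datetime(2020, 3, 1, 14, 30, tzinfo=timezone.utc).timestamp(),
    latitude=40.7128,
    longitude=-74.0060,
    dwell_seconds=45,
    store_id=1017,
)

row = asdict(event)  # one more row in someone's behavioral dataset
print(row)
```

Nothing about the moment itself survives the translation except what the schema was built to hold, which is part of Zuboff's point about what these systems choose to capture.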
The notion that data and the algorithmic processes used to analyze them are neutral, unmediated, and unbiased statements on reality (that data are fact) is persuasive and persistent, even in fields like medicine that rely on human interpretation. In the university population health informatics laboratory where I was based during fieldwork for this book, physicians would seek out Theresa, the head of the lab, to collaborate on research that utilizes artificial intelligence (AI) analytical techniques. In one case, an OB-GYN wanted to use the lab's expertise in artificial intelligence systems and deep-learning techniques to analyze focus group data she had collected from mothers who experienced trauma in childbirth. The practitioner expressed the belief that such methods, because they removed the "human," would have less "bias" and the data could speak for themselves. In another case, outside Theresa's population health informatics lab, emergency medicine and medical AI researchers at the University of California, Berkeley, turned to deep-learning algorithms to read and interpret knee x-rays in order to override a persistent medical bias in which doctors underestimate the pain experienced by patients from underserved populations, such as ethnic minorities, women, or poor people (Pierson et al. 2021). In this study, the algorithm measured underlying osteoarthritic damage to the knee joints to predict the pain severity a patient experienced, with an accuracy rate almost five times better than that of the radiologists who were interpreting the x-rays. One of the study's authors, Ziad Obermeyer, when asked in an interview about building AI models to support clinical decision-making, responded: "Do we train the algorithm to listen to the doctor, and potentially replicate decades of bias built into medical knowledge . . . or do we train it to listen to the patient and represent underserved patients' experiences of pain accurately and make their pain visible?"6
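Obermeyer's question is, at bottom, a question about the training target. The sketch below, my schematic illustration rather than the study's actual code or architecture, shows the design choice in miniature: a toy convolutional regressor fit to patient-reported pain scores instead of radiologists' severity grades, so that the model "listens to the patient."

```python
# Schematic sketch of the design choice Obermeyer describes (illustrative
# only; not the Pierson et al. model). Assumes PyTorch is installed.
import torch
import torch.nn as nn

class KneePainRegressor(nn.Module):
    """Toy convolutional regressor: x-ray image -> predicted pain score."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, 1)

    def forward(self, xray):
        return self.head(self.features(xray).flatten(1))

model = KneePainRegressor()
loss_fn = nn.MSELoss()

xrays = torch.randn(8, 1, 64, 64)         # stand-in x-ray images
patient_reported_pain = torch.rand(8, 1)  # label = the patient's own report,
# not the radiologist's severity grade; swapping the training target is
# the intervention, everything else is ordinary supervised learning.

loss = loss_fn(model(xrays), patient_reported_pain)
loss.backward()  # one gradient step toward "listening to the patient"
```

The toy architecture is beside the point; what matters is which ground truth the loss is computed against.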
Ethicists of artificial intelligence Alexander Campolo and Kate Crawford (2020) call this dynamic "enchanted determinism," where magical thinking helps to rationalize a faith that AI is completely free of human bias. Many believe that AI uncovers the truth that data hold, rather than that data merely anchor a truth. But as Crawford notes in Atlas of AI (2021), nothing in AI computing is artificial or intelligent; rather, AI and the subjects (models, analyses, and so forth) it produces materially embody the biopolitical. "AI systems are not autonomous, rational or able to discern anything without extensive, computationally intensive training" by humans (Crawford 2021, 8). Powerful political-corporate interests build these data- and capital-intensive AI systems, and as such, the analytical results that AI systems produce become a registry of that power. From the lithium extracted from conflict-encumbered countries for computer batteries, to the "dirty" (inaccurate and biased) data that are mined and fed into machine learning algorithms, AI is made from massive amounts of natural resources and human labor (and human misery), all of which remain invisible. Obermeyer and his coauthors recognize that it is not enough simply to acknowledge the medical system's biases; AI models must be built to account for that bias, and for the sociopolitical, at the level of the technical. Yet such accountability is far from the field's norm, and it is often actively resisted (Raji and Buolamwini 2019). After Timnit Gebru, former cohead of Google's Ethical AI team, published findings of the bias baked into many of the company's products, along with her public crit...