CHAPTER 1
The Taming of the Web and the Rise of Algorithmic News
It is still quite common to use the term "new media" in connection with the Internet. However, the Internet is almost forty years old. The web, as a component of the Internet, is more than twenty-five years old. This is a sufficient passage of time that any analysis of our contemporary media ecosystem needs to be grounded in the historical context of the evolution of online media.
As we look back on this history, it becomes clear that the initial structure of the web (Web 1.0, as it has been called)1 was fundamentally incompatible with the economics of media and, in many ways, inhospitable to established (and perhaps intractable) dynamics of media usage. Thus, many of the characteristics we associate with the social media stage of Internet evolution (once widely referred to as Web 2.0) represent efforts to tame the web, to counteract fundamental Web 1.0 characteristics that were problematic for many stakeholders. Whether we are thinking about publishers seeking to provide content online, advertisers wanting to reach audiences online, or even audiences hoping to produce, transmit, and receive news and information, we can see ways in which the 1.0 version of the web represented a media system that was, to some extent, unsustainable, or at least undesirable.2
When we think about the web in its early incarnation, many characteristics, such as the information search and retrieval demands it placed on users, can be seen as incredibly, even debilitatingly, inefficient for many categories of stakeholders.3 Embedded within the many undeniable advancements that we associate with the transition from the Web 1.0 to the Web 2.0 digital media ecosystem, however, are a number of less widely recognized regressions that can be interpreted as reestablishing elements of a more traditional, more manageable, more passive mass media framework for our digital media ecosystem.
Audience Disaggregation to Reaggregation
If there is one defining characteristic of the early web, it is the extent to which it represented a degree of fragmentation of both content options and audience attention that went well beyond any previous medium.4 This fragmentation was a function of the extent to which the web provided lower barriers to entry to producing and distributing content than any previous medium, as well as the lack of channel or space constraints that characterized previous media. This fragmentation was widely lauded as a central component of the "democratization" of the media that the Internet represented, in which the opportunity to speak and be heard could extend well beyond the privileged few who owned or operated the relatively few media outlets.5 This fragmentation represented, for many, the ideal in terms of providing an opportunity to serve the full gamut of audience tastes and preferences. No longer would a relatively limited number of gatekeepers (TV networks, cable systems, or local newspapers) wield so much control over the production and distribution of news, information, and cultural content. No longer would content catering to niche interests be unable to find distribution, and thus fail to reach those niche audiences.
However, there are a number of somewhat intractable problems inherent in a media ecosystem of unprecedented fragmentation, diminished gatekeepers, and exceptionally low barriers to entry. As the Internet grew, and evolved as a commercial medium, these fundamental problems became more pronounced.
AUDIENCE CHALLENGES
From an audience standpoint, effectively navigating the web was fairly labor intensive in comparison to other media. The number of choices available was astronomical. Fortunately, tools such as search engines and portal sites were available to assist in this process. Early search engines, however, were not nearly as comprehensive or efficient as Google, which did not arrive until the late 1990s and eventually came to dominate what was a fairly crowded search engine market.6 Throughout the late 1990s and early 2000s, more than half of the top ten online destinations in terms of audience traffic were search engines or portals.7 I recall a conversation during that period with a television industry professional, who made the point that this would be equivalent to six of the top ten television networks being various permutations of the TV Guide Channel, a TV channel that at the time featured a slowly scrolling list of all the programs currently airing on each available broadcast and cable network.8
In the early days of the web, there were even consumer magazines devoted exclusively to highlighting and reviewing websites of potential interest. There is of course something fascinatingly anachronistic about the idea of reading a print magazine to learn about which websites one might want to visit online.
Searching the web for content that interested you used to require a meaningful investment of time, along with the development of some basic search skills. The magnitude of these Web 1.0 search costs and the general unwillingness of online users to incur them were well reflected in the fairly limited channel repertoires that early web users developed. The term "channel repertoire" comes from television audience research; it refers to the extent to which television viewers tend to establish limited repertoires of channels that they consume regularly.9 Importantly, as the number of channels available to viewers increases, channel repertoires increase only slightly, not nearly in proportion with the increase in available channels. This pattern suggests a process of diminishing returns in terms of the relationship between the number of content options provided and the number of content options actually consumed.
When this analytical framework was applied to early web users, findings indicated that, despite the exponentially greater availability of content offerings, individuals' online channel repertoires (i.e., the number of websites they visited regularly) were not much bigger than their television channel repertoires.10 While one person's online channel repertoire might be completely different from another's, the reality was that individual users tended not to incur the search costs necessary to take advantage of the diversity of content offerings online in the way that we might have expected. In many ways, this was a recurrence of what happened when cable dramatically expanded our television viewing options, and is perhaps indicative of an embedded characteristic of audience behavior. We simply do not take full advantage of diverse content offerings when they are made available to us.
ADVERTISER CHALLENGES
The fragmentation of the Web 1.0 world produced a different set of challenges for advertisers. To understand these challenges, it is first important to understand how the process of buying and selling audience attention (the key product advertisers need) has traditionally worked. The audience commodity (as it has often been called) was produced through content providers (television programmers, newspaper publishers, websites, etc.) attracting audiences to their content offerings.11 The size and demographic characteristics of these audiences were determined by third-party audience measurement firms, who measured the media consumption behaviors of a very small sample of television viewers, radio listeners, or print readers. These samples by necessity tended to be small because the process of audience measurement has traditionally been quite expensive, given the costs associated with recruiting and training participants, as well as the costs associated with the measurement technologies. However, a small sample of media consumers was acceptable as long as it was representative of the population as a whole. Measurement firms could confidently make the claim that their ratings were generalizable to the population as a whole, and advertisers and content providers felt comfortable treating these ratings figures as the currency of exchange in the marketplace for audience attention.12
The web posed problems for this well-established dynamic. The first problem was that there were so many websites that needed to have their audiences measured. This meant that an enormous sample needed to be in place. Without an enormous sample, even for a website attracting a decent-sized audience, it can often be the case that none of the audience members in the sample are visiting this site. As a result, large swaths of audience attention can go unmeasured, and consequently cannot be bought and sold in the audience marketplace.13 More choices in terms of content options means larger audience samples are necessary.
Fortunately, there was a solution to this problem. Instead of measuring the behaviors of a sample of audiences, audience measurement firms could "audit" the traffic of individual websites through the sites' server logs, thereby providing an objective accounting of how many people were visiting the site. Thus, instead of deriving audience estimates from a small sample of the total online population, these estimates essentially came from a census of that population.
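The sampling argument above can be made concrete with a little arithmetic. The sketch below is illustrative only and rests on a simplifying assumption that is not drawn from the measurement industry's actual methods: panelists are treated as independent random draws from the population, so a site visited by a share `reach` of the population goes entirely unobserved by a panel of `panel_size` people with probability (1 - reach) raised to the power panel_size. The function names are hypothetical.

```python
import math


def prob_site_unobserved(reach: float, panel_size: int) -> float:
    """Chance that no panelist visits a site, assuming independent random sampling.

    reach: share of the population that visits the site (between 0 and 1).
    panel_size: number of people in the measurement panel.
    """
    if not 0.0 <= reach <= 1.0:
        raise ValueError("reach must be between 0 and 1")
    if panel_size < 0:
        raise ValueError("panel_size must be non-negative")
    return (1.0 - reach) ** panel_size


def panel_size_needed(reach: float, max_miss_prob: float) -> int:
    """Smallest panel whose chance of missing the site entirely is at most max_miss_prob."""
    if not 0.0 < reach < 1.0:
        raise ValueError("reach must be strictly between 0 and 1")
    if not 0.0 < max_miss_prob < 1.0:
        raise ValueError("max_miss_prob must be strictly between 0 and 1")
    # Solve (1 - reach) ** n <= max_miss_prob for the smallest whole n.
    return math.ceil(math.log(max_miss_prob) / math.log(1.0 - reach))


if __name__ == "__main__":
    # A site reaching one person in a thousand, measured with panels of growing size.
    for size in (100, 1_000, 10_000):
        print(size, round(prob_site_unobserved(0.001, size), 4))
    print(panel_size_needed(0.001, 0.05))
```

Under these assumptions, the required panel grows as the reach of the typical site shrinks, which is the sense in which more content options demand larger samples. Merely observing one visitor is also a far weaker requirement than estimating a site's audience size and demographics reliably, so the figures here understate the real burden.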
There was just one problem. This system could not really tell you who was visiting the site. That is, audience ratings figures derived from website server logs could not provide advertisers with the demographic information that they tended to rely upon in deciding how to allocate their advertising dollars. Unless a visitor to a site voluntarily provided his/her demographic information, server log systems could only tell you how many different devices were visiting a site.
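A minimal sketch may help show why a server log census counts devices rather than people. The record layout below (a site name paired with a device identifier) is a hypothetical simplification, not the format of any real server log, and the function name is likewise an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def unique_devices_per_site(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct device identifiers seen by each site.

    Each record is a (site, device_id) pair. Nothing in a record says who
    was using the device, so no demographic breakdown can be derived.
    """
    devices: Dict[str, Set[str]] = defaultdict(set)
    for site, device_id in records:
        devices[site].add(device_id)
    return {site: len(ids) for site, ids in devices.items()}


if __name__ == "__main__":
    log = [
        ("news.example", "device-a"),
        ("news.example", "device-a"),  # a repeat visit from the same device
        ("news.example", "device-b"),
        ("blog.example", "device-a"),
    ]
    print(unique_devices_per_site(log))
```

The count is complete for every site, however small, which is the census advantage. The same output also shows the limitation: two people sharing one device, or one person using two, cannot be told apart, and age, gender, or location never enter the data unless visitors supply them.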
Given the different strengths and weaknesses inherent in these two approaches to online audience measurement, it is not surprising that what has emerged as the industry standard are hybrid measurement systems that utilize server log analysis and then supplement this approach with demographic information culled from very large online samples.14 However, even these hybrid systems do not fully solve the problems described above.
So, another solution that emerged to address the problem of online audience fragmentation was ad networks. Ad networks started to come into being in the mid- to late 1990s as a way of pooling unsold ad inventory, in order to save advertisers the time and costs associated with dealing with individual websites. As digital intermediaries in the relationship between websites and advertisers, ad networks would distribute advertisers' ads across their network of member sites, in accordance with subject matter and/or audience criteria specified by the advertiser. For instance, advertisers might specify that their ads appear within/alongside content that meets certain keyword criteria (e.g., women's fashion, car insurance). Or the advertiser could specify that the ads be directed at individuals who have demonstrated certain online behaviors (e.g., searching for an apartment, researching a car purchase) or who meet certain age, gender, or geographic location criteria. Ad networks helped to reaggregate audiences, to some extent, by allowing advertisers to more efficiently compile large aggregations of audiences without having to engage in transactions with each individual site.15
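The matching logic described above can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not a description of how any actual ad network was built: the `Impression` and `Campaign` types, their fields, and the rule that an empty criterion means "no restriction" are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Impression:
    """One available ad slot on a member site, with whatever is known about it."""
    site: str
    keywords: FrozenSet[str] = frozenset()
    visitor_behaviors: FrozenSet[str] = frozenset()
    visitor_region: Optional[str] = None


@dataclass(frozen=True)
class Campaign:
    """An advertiser's criteria; an empty set means no restriction on that dimension."""
    keywords: FrozenSet[str] = frozenset()
    behaviors: FrozenSet[str] = frozenset()
    regions: FrozenSet[str] = frozenset()


def matches(campaign: Campaign, impression: Impression) -> bool:
    """True when the impression satisfies every criterion the campaign specifies."""
    if campaign.keywords and not (campaign.keywords & impression.keywords):
        return False
    if campaign.behaviors and not (campaign.behaviors & impression.visitor_behaviors):
        return False
    if campaign.regions and impression.visitor_region not in campaign.regions:
        return False
    return True


def reaggregate(campaign: Campaign, inventory: Iterable[Impression]) -> List[Impression]:
    """Pool matching impressions from many sites into one audience for the advertiser."""
    return [imp for imp in inventory if matches(campaign, imp)]


if __name__ == "__main__":
    inventory = [
        Impression("style.example", frozenset({"women's fashion"})),
        Impression("autos.example", frozenset({"car insurance"}),
                   frozenset({"researching a car purchase"}), "US"),
        Impression("rentals.example", frozenset({"housing"}),
                   frozenset({"searching for an apartment"}), "US"),
    ]
    campaign = Campaign(behaviors=frozenset({"researching a car purchase"}))
    print([imp.site for imp in reaggregate(campaign, inventory)])
```

In this sketch the advertiser never names a site. It states criteria and receives whatever inventory satisfies them, from however many sites that takes, without transacting with any of them individually.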
These networks had another important effect: they essentially decoupled advertisements from the media content in which they were displayed. Through ad networks, advertisers were purchasing the delivery of audiences without necessarily having to evaluate the content and/or content provider delivering those audiences, as the ad network, not the content provider, was now their point of contact. This tendency became more pronounced with the development of programmatic advertising, which essentially involves algorithms handling media-buying decisions that used to be handled by humans.16 Because intermediaries increasingly handled the process of placing advertisements within sites, based on particular audience criteria, the content through which these audiences were delivered essentially became secondary in, and somewhat marginalized from, advertisers' decision-making regarding how to reach audiences.
However, these ad networks still could not provide a completely effective means of targeting large aggregations of audiences according to detailed demographic criteria, given the limitations in the availability of audience demographic data for individual websites. If only there were some way in which audiences could be compiled into large aggregates online while at the same time voluntarily providing detailed demographic (and perhaps even behavioral and psychographic) data about themselves....
CONTENT PROVIDER CHALLENGES
Content providers found the Web 1.0 scenario challenging as well. For starters, as has just been discussed, for sites with relatively small audiences seeking to monetize those audiences through advertising revenue, the available systems of audience measurement made this process challenging to say the least. Many sites found themselves unable to document the existence of their audience to advertisers in the audience size/demographics vocabulary that most advertisers preferred.17
A related problem had to do with the basic challenge of attracting audiences to a site in light of the unprecedented number of competitors for audience attention and the challenges associated with getting audiences to visit and (ideally) to integrate the site into their repertoire. As described in the previous section, the Web 1.0 envir...