1 Big Data in Developing Countries: Current Status, Opportunities and Challenges
Abstract
This chapter reviews the current state, potential and applications of big data (BD) in developing countries. It defines and explains key terms used in the book and examines the characteristics of BD. Key areas of BD deployment in developing countries are described, along with the relationship between BD, mobility, the Internet of Things and cloud computing in the context of these countries. Some major determinants of the development of the BD industry and market are considered, and various forces that can help overcome adverse economic, political and cultural circumstances are explored. The chapter also evaluates the intricate relationship between agriculture, health and the environment. Finally, it argues that BD offers no panacea or magic pill for all ills.
1.1 Introduction
Big Data (hereinafter: BD) is emerging as a means for governments, international development agencies, non-government organizations (NGOs) and the private sector to improve economic, health, social and environmental conditions in developing economies. Consequently, the BD application areas in developing economies are also numerous and growing steadily. A large and growing number of firms, both local and foreign, are offering diverse BD solutions in these economies.
A key benefit of BD is that large and sometimes unrelated sources of data can help discover relationships that were previously undetected. To take an example, researchers from Sweden's Karolinska Institute analysed data related to people's movement patterns before and after the January 2010 earthquake in Haiti, which killed more than 200,000 people. The data were obtained from Digicel, Haiti's largest mobile carrier. The data consisted of the call data records (CDRs) of 2 million phones from 42 days before to 158 days after the earthquake. Note that CDRs provide information about the number of users in a phone tower's coverage and origin-destination matrices representing phone users that move between two towers' coverage areas (Weslowski et al., 2013).
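The two quantities that CDRs provide, users per tower and origin-destination counts, can be illustrated with a minimal sketch. The record layout, user IDs and tower names below are hypothetical and are not drawn from the Digicel data; the sketch only assumes that each record links a user, a timestamp and a tower.

```python
from collections import Counter, defaultdict

# Hypothetical CDR rows: (anonymized user id, ISO timestamp, tower id).
cdrs = [
    ("u1", "2010-01-11T09:00", "tower_A"),
    ("u1", "2010-01-13T18:30", "tower_B"),
    ("u2", "2010-01-11T10:15", "tower_A"),
    ("u2", "2010-01-12T08:00", "tower_A"),
    ("u3", "2010-01-11T07:45", "tower_B"),
    ("u3", "2010-01-14T12:00", "tower_C"),
]

def users_per_tower(records):
    """Count the distinct users observed in each tower's coverage area."""
    seen = defaultdict(set)
    for user, _, tower in records:
        seen[tower].add(user)
    return {tower: len(users) for tower, users in seen.items()}

def origin_destination(records):
    """Count moves between towers, based on each user's consecutive records."""
    by_user = defaultdict(list)
    for user, timestamp, tower in records:
        by_user[user].append((timestamp, tower))
    moves = Counter()
    for events in by_user.values():
        events.sort()  # ISO 8601 strings sort in chronological order
        for (_, origin), (_, destination) in zip(events, events[1:]):
            if origin != destination:
                moves[(origin, destination)] += 1
    return moves

tower_counts = users_per_tower(cdrs)
od_matrix = origin_destination(cdrs)
```

Here `od_matrix` is stored sparsely as a mapping from (origin, destination) pairs to counts, which suits the case where most pairs of towers see no movement at all.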
The analysis of CDRs indicated that 630,000 people who were in Port-au-Prince on the day of the earthquake, 12 January 2010, had left the city within 3 weeks. A comparison of the movement patterns before and after the earthquake indicated that individuals who fled the city went to the same places where they had been on Christmas and/or New Year's Day. The researchers at the Karolinska Institute also demonstrated the capability to analyse data on a near real-time basis. For instance, within 12 hours of receiving the data, the researchers were able to tell the number of people that had fled an area that was affected by a cholera outbreak. They were also able to figure out where people went (Talbot, 2013).
Another retrospective analysis of the 2010 cholera outbreak in Haiti showed that mining data from Twitter and online news reports could have given the country's health officials an accurate indication of the spread of the disease with a lead time of 2 weeks (Chunara et al., 2012). To take another example, a study of Serbian farmers by the Israeli company Agricultural Knowledge On-Line (AKOL) indicated a connection between drinking coffee and farm productivity. Farmers who did not drink coffee in the morning were less productive than those who did (Shamah, 2015).
In the past, decision makers needed to depend on data scientists, computer engineers and mathematicians to make sense of data (Fengler and Kharas, 2015). This is no longer the case, thanks to shared infrastructure such as cloud computing and the rapid diffusion of mobile phones. New programs and analytical solutions have put BD at the fingertips of any consumer with a smartphone. Another favourable trend is that personal computing devices such as smartphones are becoming cheaper. For instance, in 2014, a phone with GPS (global positioning system), Wi-Fi and a camera could be bought for US$30 (Caulderwood, 2014). Due to these recent developments, BD is becoming increasingly personal.
Perhaps the greatest advantage offered by BD in the context of development is that it helps us gain a better understanding of the extent and nature of poverty and devise appropriate policy measures. For instance, mobile data can make it possible to better understand the dynamics of slum residents. CDRs and other information can provide insights into the slum population, which would help forecast the need for toilets, clean drinking water and other infrastructural facilities (bigdata-startups.com, 2013). To take an example, in Nairobi, Kenya, geocoded mobile phone transaction data are used by the Engineering Social Systems project to model the growth of slums, which could help the government to optimize the allocation of resources for infrastructural development and other needs (Bays, 2014). Alternative data collection and analysis techniques such as surveys are of limited use for such purposes, as they may take months or even years to produce results, which are often out of date by then.
An encouraging trend is that the tools and expertise that are employed to make decisions and take actions related to behavioural advertising based on consumers' real-time profiling are being used in addressing developmental problems. For instance, data generated by social media such as Twitter are being analysed in order to detect early signs that can lead to a spike in the price of staple foods, increase in unemployment, and outbreak of diseases such as malaria. Robert Kirkpatrick of the UN Global Pulse team referred to such signs as "digital smoke signals of distress" and noted that they can be detected months before official statistics (Lohr, 2013). The importance of this technique is even more pronounced if we consider the fact that there are no reliable statistics in many developing countries.
BD deployment in the developing world is still in its infancy. According to International Data Corporation's Middle East Chief Information Officer Survey, in 2014 only 3% of the respondent organizations in the Gulf Cooperation Council countries had implemented BD (oilandgasbigdata.com, 2015). In some developing countries, the complete absence of a digital footprint renders BD irrelevant to a large proportion of the population. For instance, according to the International Telecommunication Union (ITU), as of 2014 Eritrea had a mobile phone penetration rate of 6.4% and an Internet penetration rate of 0.99% (see Chapter 2).
BD projects undertaken in the developing world vary widely in terms of capital- and resource-intensiveness, sophistication, complexity, performance and impact. To illustrate this point, we briefly compare BD deployments by China's Alibaba and by Agrilife, the cloud-mobile platform of the Kenya-based mobile payment solution and service provider MobiPay. In the context of this book, it is worth noting that MYbank, the Internet-only bank of Alibaba Group's financial affiliate, aspires to provide credit to farmers to buy agricultural machines and tools.
Of the firms based in the developing world, Alibaba arguably has some of the most advanced and sophisticated BD tools. In July 2014, Alibaba launched the Open Data Processing Service (ODPS), which allows users to remotely tap into Alibaba servers equipped with algorithms. According to Alibaba, the system had the capability to process 100 million high-definition movies' worth of data in 6 hours (Li, 2014). The program uses more than 100 computing models to process over 80 billion data entries every day. Alibaba mainly utilizes its huge online ecosystem that, as of early 2015, consisted of over 300 million registered users and 37 million small businesses on Alibaba Group marketplaces including Taobao and Tmall.com (alibabagroup.com, 2015).
Kenya's Agrilife, which connects farmers with value-chain partners such as dairy processors (who purchase milk), credit appraisers and local input/agrodealers, is technically less sophisticated than Alibaba's ODPS. Agrilife also helps farmers to assess market opportunities and get the information required to grow, manage and market their produce. A farmer can make credit requests via a mobile phone. The credit appraiser uses a range of data about the farmer, produce and status of farms to assess the creditworthiness. The input provider then makes a decision on credit. The platform facilitated credit lines to about 120,000 small farmers by 2013. As of 2014, Agrilife served farmers in Kenya, Uganda and Zimbabwe (fin4ag.org, 2014).
BD offerings of Alibaba and Agrilife exhibit different levels of resource intensiveness. Compared to Alibaba's ODPS, the Agrilife platform is simpler and cheaper. For instance, data volumes handled by Agrilife are not as big as those that Alibaba handles. Actions are taken on a near real-time basis rather than in a real-time manner. As of 2015, Alibaba had a market value of about US$233 billion, which made it the world's third-largest public Internet company, only behind Apple and Google (Schwarzmann, 2015). In 2014, Alibaba Group's online payment service, Alipay, handled payments worth US$800 billion (Kim, 2014). However, most organizations based in the developing world, such as MobiPay, tend to have limited access to the resources needed to set up BD-related businesses.
1.2 Definitions and Explanations of Key Terms
In this section, we clarify some of the key terms and concepts used in the book.
1.2.1 Algorithm
An algorithm is a procedure or formula for solving a problem. Algorithms are even more important than data, as they convert data into actions and outcomes that can improve the effectiveness and efficiency of development efforts and the overall quality of life of those living in the developing world.
1.2.2 Big Data
In order to define BD for the purpose of this book, we start with the technology research company Gartner's definition of BD, which is "high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making" (gartner.com, 2013). With regard to volume, Boyd and Crawford (2012, p. 663) note that big data is a "poor term" and argue that BD "is less about data that is big than it is about a capacity to search, aggregate, and cross-reference large data sets". In this book's context, we define BD as datasets that can provide insights into human well-being, which satisfy at least one of the following characteristics compared to datasets that have been traditionally used in developmental issues: (i) are of higher volume; (ii) are of wider variety; or (iii) enable us to make decisions and act faster. In this way, the term BD is used in the broadest possible sense in order to be inclusive and uncover any possible use of data and information to improve the welfare and livelihood of people living in the developing world.
1.2.3 Business model
A business model is a description of a company's intention to create and capture value by linking new technological environments to business strategies (Hawkins, 2003).
1.2.4 Cloud computing
Cloud computing involves hosting applications on servers and delivering software and services via the Internet. In the cloud computing model, companies can access computing power and resources on the cloud and pay for services based on their usage. The cloud industry is defined as the set of sellers/providers of cloud-related products and services. Cloud providers or vendors, which are suppliers of cloud services, deliver value to users through various offerings such as Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). SaaS is a software distribution model, in which applications are hosted by a vendor and made available to customers over a network. It is considered to be the most mature type of cloud computing. In PaaS, applications are developed and executed through platforms provided by cloud vendors. This model allows quick and cost-effective development and deployment of applications. Some well-known PaaS vendors include Google (Google App Engine), Salesforce.com (Force.com) and Microsoft (Windows Azure platform). Some facilities provided under the PaaS model include database management, security, workflow management and application serving. In IaaS, computing power and storage space are offered on demand. IaaS can provide server, operating system, disk storage and database infrastructure, among other things. Amazon.com is the biggest IaaS provider. Its Elastic Compute Cloud (EC2) allows subscribers to run cloud application programs. IBM, VMware and HP also offer IaaS.
1.2.5 Developing economies
By developing economies, we mean low-, lower middle- and upper middle-income countries in the World Bank categorization (The World Bank Group, 2014). For the 2016 fiscal year, economies with a gross national income (GNI) per capita of US$1045 or less in 2014 based on the so-called Atlas method were categorized as low-income economies. Some examples include Eritrea and Haiti.
Lower middle-income economies are those with a GNI per capita of more than US$1045 but less than or equal to US$4125. Some examples of economies in this category are Kenya and Vietnam. Upper middle-income economies have a GNI per capita of more than US$4125 but less than US$12,736 (worldbank.org, 2016). Some examples in this category are China and Colombia.
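The income thresholds above can be expressed as a simple classification rule. The following is a minimal sketch based only on the cut-offs stated in the text; the function names are illustrative, and treating every economy at or above US$12,736 as "high income" (and hence not developing) is an assumption implied by, but not spelled out in, the definition.

```python
def income_group(gni_per_capita_usd):
    """Classify an economy by 2014 GNI per capita (Atlas method, US$)."""
    if gni_per_capita_usd <= 1045:
        return "low income"
    if gni_per_capita_usd <= 4125:
        return "lower middle income"
    if gni_per_capita_usd < 12736:
        return "upper middle income"
    # Assumption: everything above the upper middle-income band is high income.
    return "high income"

def is_developing(gni_per_capita_usd):
    """Developing economies, as defined here, are all groups below high income."""
    return income_group(gni_per_capita_usd) != "high income"
```

For example, `income_group(1045)` falls in the low-income band, while `income_group(1046)` falls in the lower middle-income band, which reflects the "or less" and "more than" wording of the thresholds.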
1.2.6 Drip irrigation
Drip irrigation, which is also referred to as micro-irrigation or trickle irrigation, is a watering system that involves a network of pipes, tubing, valves and emitters to deliver water directly to the soil at a gradual rate. Sensors track moisture in and around the root zone of each tree and water is delivered to the base. Water is thus used more efficiently. When a zone is saturated, the water supply is cut off.
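The sensor-driven cut-off described above can be sketched as a simple threshold rule for one zone. This is an illustrative sketch rather than a description of any particular product: the function name, the moisture scale (a fraction between 0 and 1) and both threshold values are hypothetical assumptions.

```python
def next_valve_state(moisture, valve_open, dry_below=0.20, saturated_at=0.35):
    """Decide whether a zone's valve should be open after a new sensor reading.

    moisture: soil moisture as a fraction (hypothetical scale).
    dry_below, saturated_at: hypothetical thresholds chosen for illustration.
    """
    if moisture >= saturated_at:
        return False       # zone is saturated: cut off the water supply
    if moisture < dry_below:
        return True        # root zone is dry: start delivering water
    return valve_open      # in between: keep the current state

# Hypothetical sequence of sensor readings for a single zone.
readings = [0.15, 0.25, 0.36, 0.30, 0.18]
valve_open = False
states = []
for moisture in readings:
    valve_open = next_valve_state(moisture, valve_open)
    states.append(valve_open)
```

Using two thresholds instead of one keeps the valve from switching on and off rapidly when the reading hovers around a single cut-off value.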
1.2.7 Environmental monitoring
Environmental monitoring is defined as "measurements of physical, chemical, and/or biological variables, designed to answer questions about environmental change" (Lovett et al., 2007).
1.2.8 Institutionalization
Institutionalization is defined as the process by which a practice acquires legitimacy and achieves a taken-for-granted status (Kshetri, 2009). This book uses the term in the context of BD utilization, data privacy and cybersecurity.
1.2.9 Least developed countries (LDCs)
The UN has recognized LDCs as a category of states, which are "highly disadvantaged in their development process". Compared to other countries, LDCs face a higher risk of deeper poverty and remaining in a state of underde...