1
Open Data
Saif Aldeen Saleh AlRyalat
Data overview
In 2013, the world as a whole spent $1.48 trillion on research, with the United States and China alone spending around $1 trillion (ISSC, IDS, and UNESCO, 2016). By contrast, research expenditure in the world’s low- and lower-middle-income countries is exceedingly low, apparently because other priorities compete for the money (World Bank, 2018). This lack of sufficient research funds has led to an incorrect belief among developing-world researchers and academics that it is difficult to do research in these regions (Horton, 2000). Moreover, funding bodies usually require researchers requesting funds to have previous experience, and even researchers who do obtain funding struggle to find educated staff to support their work. Thus, a vicious circle begins: researchers cannot do studies because they cannot get funding, and they cannot get funding because they lack research experience. An example of this situation occurs in Africa, where most universities do not place sufficient emphasis on research or fund it adequately, even for academic purposes (Horton, 2000), which results in their staff lacking research experience (Vernon et al., 2018).
Collecting high-quality data is the step of any study that requires the most money, experience, and time (Dicks et al., 2014). These resources are not readily available in developing countries, which hinders research progress there and feeds the aforementioned vicious circle. To reduce the huge costs of research in developing countries, researchers and universities should rely on data collected by other, more experienced researchers who have sufficient funds to support their work. This model of research, based on using others’ data, is known as open-access data research (open data research).
Open data research has emerged recently as a new model of research, one that wasn’t possible in the past due to a lack of resources, systems technology to access data, bioinformatics expertise, and legal infrastructure to facilitate sharing (Bertagnolli et al., 2017). However, the open data approach is expected to need a good deal of support to reach its potential. The following case shows why promotion of and support for open data matter. In 2003, the University of Rochester, one of the most highly ranked universities in New York, launched a digital archive (i.e., a repository) designed to share dissertations, preprints, working papers, photographs, music scores, and any other kind of digital data that the university’s investigators could produce. Six months of research had convinced the university and its investigators that a publicly accessible online archive would be well received. When the repository was launched, the university librarians worried that the flood of uploaded data might overwhelm the available storage space. Six years later, the $200,000 repository was mostly empty (Nelson, 2009).
To promote open data research, an annual award was created to recognize data sharers whose data have been reused in impactful ways: the Research Symbiont Award (www.researchsymbionts.com). There was also a call for journals to promote open data research by publishing research done using open data, especially those journals that had already published the primary publication (Byrd, 2017). Moreover, a study that compared citation rates between papers whose data were made openly available and papers whose data were not found that making data openly available increased a paper’s citation rate, which further motivates open data research (Piwowar et al., 2007).
Open data are data that may be freely used, reused, and redistributed by anyone (Dietrich et al., 2009). Open data have three main characteristics:
•Accessible: The data must be available as a whole, and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. The data must also be available in a convenient and modifiable form, preferably a machine-readable one (e.g., a Microsoft Excel spreadsheet).
•Reuse and redistribution: The data must be provided under terms that permit their reuse and redistribution, including intermixing with other data sets (e.g., combining two data sets on the same topic to create a larger one).
•Universal participation: Everyone must be able to use, reuse, and redistribute the data.
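The intermixing mentioned under reuse and redistribution can be illustrated with a minimal sketch. It assumes two small data sets on the same topic that share the same column names and are stored in a machine-readable format (CSV here). The site names, column names, and values are hypothetical and invented purely for illustration; they do not come from any real study.

```python
import csv
import io

# Hypothetical data sets: two small CSV tables on the same topic.
# All identifiers and values below are invented for illustration only.
SITE_A = """patient_id,age,systolic_bp
A1,64,142
A2,71,135
"""

SITE_B = """patient_id,age,systolic_bp
B1,58,128
B2,69,150
B3,75,139
"""


def read_rows(csv_text, source):
    """Parse CSV text into a list of dicts, tagging each row with its source."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["source"] = source  # keep track of where each record came from
        rows.append(row)
    return rows


def combine(*datasets):
    """Concatenate data sets, refusing to mix tables with different columns."""
    combined = []
    for rows in datasets:
        combined.extend(rows)
    column_sets = {tuple(sorted(row)) for row in combined}
    if len(column_sets) > 1:
        raise ValueError("data sets do not share the same columns")
    return combined


combined = combine(read_rows(SITE_A, "site_a"), read_rows(SITE_B, "site_b"))

# A simple analysis that only becomes possible on the larger, pooled data set.
mean_bp = sum(int(row["systolic_bp"]) for row in combined) / len(combined)
print(f"{len(combined)} records, mean systolic BP = {mean_bp:.1f}")
```

Recording the source of each row is a design choice rather than a requirement: it keeps the origin of every record traceable after pooling, which helps when the original data sets were collected under different conditions.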
The concept of open data is mostly related to the field of economics, where governments and companies can use open data to improve their functionality and products. An example would be manufacturers using data obtained from their products already on the market to improve the development of their next products and to create innovative after-sales offerings (Manyika et al., 2011). The same concept can be used by scientists who use data on healthcare consumers collected by pharmaceutical companies, healthcare institution records, insurance companies, and other local and national authorities to perform research. The following example will further clarify how researchers can benefit from open data.
In 2010, the protocol of a large study called the Systolic Blood Pressure Intervention Trial (SPRINT) was finalized, stating that it would include 9361 patients from 102 centers in the United States, and they would be followed up for eight years (Ambrosius et al., 2014). The total cost of this single stu...