1
Big Data in Medical Image Processing
1.1 An Introduction to Big Data
Big data technologies are increasingly used in biomedical and healthcare informatics research. Large amounts of biological and clinical data have been generated and collected at exceptional speed and scale. Recent years have witnessed an escalating volume of medical image data and observations being gathered and accumulated. New technologies have made it possible to acquire hundreds of terabytes, even petabytes, of data, which are being made available to the medical and scientific community. For example, the new generation of sequencing technologies enables the processing of billions of DNA sequence reads per day, and the adoption of electronic health records (EHRs) is documenting large amounts of patient data. Handling and processing these large datasets is a challenging task. Together with the new medical opportunities arising, new image and data processing algorithms are required for working with, and learning from, large-scale medical datasets. This book aims to examine recent progress in the medical imaging field, together with the new opportunities stemming from increased medical data availability, as well as the specific challenges involved in big data. "Big Data" is a keyword in the medical and healthcare sector for patient care. NASA researchers coined the term big data in 1997 to describe the huge amount of information being generated by supercomputers. It has since evolved to include data streaming from various sources: cell phones, mobile devices, satellites, Google, Amazon, Twitter, and so on. The impact of big data is profound, and it will have in-depth implications for medical imaging as healthcare tracks, handles, exploits, and documents relevant patient information.
Medical data collection can necessitate an incredible amount of time and effort; however, once collected, the information can be utilized in several ways:
⢠To improve early detection, diagnosis, and treatment
⢠To predict patient diagnosis; aggregated data are used to speck early warning symptoms and mobilize resources to proactively address care
⢠To increase interoperability and interconnectivity of healthcare (i.e., health information exchanges)
⢠To enhance patient care via mobile health, telemedicine, and selftracking or home devices
Storing and managing patient health information is a challenging task, yet big data in the medical field is crucial. Ensuring patient data privacy and security is also a significant challenge for any healthcare organization seeking to comply with the new HIPAA omnibus rule. Any individual or organization that uses protected health information (PHI) must conform, and this includes employees, physicians, vendors or other business associates, and other covered entities.
HIPAA compliance for data (small or big) must cover the following systems, processes, and policies:
⢠Registration systems
⢠Patient portals
⢠Patient financial systems
⢠Electronic medical records
⢠E-prescribing
⢠Business associate and vendor contracts
⢠Audits
⢠Notice of privacy practice
1.2 Big Data in Biomedical Domain
In the biomedical informatics domain, big data is a new paradigm and an ecosystem that transforms case-based studies into large-scale, data-driven research. The healthcare sector has historically generated huge amounts of data, driven by record keeping, compliance and regulatory requirements, and patient care. While most data has been stored in hard copy form, the current trend is toward rapid digitization of these large amounts of data. Driven by mandatory requirements and the potential to improve the quality of healthcare delivery while reducing costs, these massive quantities of data (called "big data") hold the promise of supporting a wide range of medical and healthcare functions, including, among others, clinical decision support systems, disease outbreak surveillance, and population health management. A disease may occur in greater numbers than expected in a community or region or during a season, while an outbreak may occur in one community or even extend to several countries. On 10 July 2017, the World Health Organization (WHO) warned communities that measles had killed 35 people in Europe as the disease spread through unvaccinated children. An epidemic occurs when an infectious disease spreads rapidly through a population. For example, in 2003, the severe acute respiratory syndrome (SARS) epidemic took the lives of nearly 800 people worldwide. Zika virus, which drew renewed attention in April 2017, is transmitted to people through the bite of an infected mosquito of the Aedes genus, the same mosquito that transmits dengue, chikungunya, and yellow fever. A pandemic is a global disease outbreak; HIV/AIDS, for example, is one of the most destructive global pandemics in history.
Reports say data from the U.S. healthcare system alone reached 150 exabytes in 2011. At this rate of growth, big data for U.S. healthcare will soon reach the zettabyte (10²¹ bytes) scale and, not long after, the yottabyte (10²⁴ bytes). Kaiser Permanente, the California-based health network, which has more than 9 million members, is believed to have between 26.5 and 44 petabytes of potentially rich data from EHRs, including images and annotations.
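The storage scales quoted above can be made concrete with a quick unit conversion. The following sketch uses the figures from the estimates above; it is illustrative arithmetic only, not measured data:

```python
# Decimal (SI) storage units: 1 EB = 10**18 bytes, 1 ZB = 10**21, 1 YB = 10**24.
EB, ZB, YB = 10**18, 10**21, 10**24

us_healthcare_2011 = 150 * EB   # ~150 exabytes (2011 estimate quoted above)
kaiser_upper = 44 * 10**15      # ~44 petabytes (upper Kaiser Permanente estimate)

# How many 2011-sized U.S. healthcare archives fit in one zettabyte?
print(ZB // us_healthcare_2011)   # 6

# A yottabyte is a thousand zettabytes.
print(YB // ZB)                   # 1000
```

In other words, the zettabyte scale is only about six times the 2011 U.S. healthcare data volume, which is why the text expects it to be reached soon.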
On 15 May 2017, the Ministry of Health and Family Welfare, Government of India (MoHFW) reported three laboratory-confirmed cases of Zika virus disease in the Bapunagar area, Ahmedabad District, Gujarat. National Guidelines and an Action Plan on Zika virus disease have been shared with the States to prevent an outbreak and to contain its spread in case one occurs. All international airports and ports have displayed information for travellers on Zika virus disease. The National Centre for Disease Control and the National Vector Borne Disease Control Programme are monitoring appropriate vector control measures in airport premises. The Integrated Disease Surveillance Programme (IDSP) is tracking clustering of acute febrile illness in the community. The Indian Council of Medical Research (ICMR) has tested 34,233 human samples and 12,647 mosquito samples for the presence of Zika virus. Among those, close to 500 mosquito samples were collected from the Bapunagar area, Ahmedabad District, Gujarat, and were found negative for Zika.
With Zika virus spreading so rapidly, it is necessary to control the problem on a broader scale. Thus, to put analytics-derived insights to use in this field, there is a need for platforms that can analyze multiple data sources and create analysis reports. This well-filtered, detailed, real-time information will help vaccine and drug developers make more powerful vaccines and drugs to fight the disease. The main challenge is how data mining algorithms can track diseases like the Zika virus and help create a different type of response to global disease outbreaks. Based on the outcome of the analysis, neighborhood environment and community health sectors, clinical centers, and hospitals can alert patients by presenting a list of diseases, including Zika virus fever and dengue, or information on cases of malaria. The information can also be used to spread more awareness among the public about various diseases.
To fight against Zika, big data and analytics can be major role players, as they were in dealing with epidemics such as Ebola, flu, and dengue fever. Big data has already done wonders in dealing with certain complicated global issues and holds broader potential to continue doing so. From a technological perspective, big data technology can be smartly leveraged to gain insights into how to develop remedial vaccines for Zika virus by isolating and identifying every single aspect of the virus's characteristics. Although statistical modeling and massive data sets are being used across the healthcare community to respond to the emergency, several big data analytics capabilities are still needed to predict these types of contagious diseases. Moreover, the use of technology must be encouraged among the people as well as among healthcare systems and groups to spread more awareness of the threats, consequences, and possible solutions.
1.3 Importance of 4Vs in Medical Image Processing
The potential of big data in healthcare lies in combining traditional data with new forms of data, both individually and on a population level. We are already seeing that data sets from a multitude of sources support faster and more reliable research and discovery. If, for example, pharmaceutical developers could integrate population clinical data sets with genomics data, this development could help those developers gain approvals for more and better drug therapies more quickly than in the past and, more importantly, expedite distribution to the right patients. The prospects for all areas of healthcare are vast. The characteristics of big data are defined by four major Vs: Volume, Variety, Velocity, and Veracity.
1.3.1 Volume
Big data implies enormous volumes of data. First and most significantly, the volume of data is growing exponentially in the biomedical informatics fields. For example, ProteomicsDB covers 92% (18,097 of 19,629) of known human genes that are annotated in the Swiss-Prot database, and has a data volume of 5.17 TB. Data used to be created mostly by human interaction; now that data is also generated by machines, networks, and human interaction on systems like social media, the volume of data to be analyzed is massive. Several acquisition devices are available to capture medical image modalities. They vary in size and cost, and depending upon the machine, they can capture a huge volume of medical data from human beings. The structured data in EMRs and EHRs include familiar input record fields such as patient name, date of birth, address, physician's name, hospital name and address, treatment reimbursement codes, and other information easily coded into and handled by automated databases. The need to field-code data at the point of care for electronic handling is a major barrier to acceptance of EMRs by physicians and nurses, who lose the natural-language ease of entry and understanding that handwritten notes provide. On the other hand, most providers agree that an easy way to reduce prescription errors is to use digital entries rather than handwritten scripts.
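The structured fields described above can be sketched as a simple record type. This is a hypothetical illustration only; the field names are assumptions for clarity, not an actual EHR standard such as HL7 FHIR:

```python
from dataclasses import dataclass

# Hypothetical structured EMR record. Field names mirror the examples in
# the text (patient name, date of birth, physician, reimbursement code);
# they are illustrative, not a real EHR schema.
@dataclass
class EmrRecord:
    patient_name: str
    date_of_birth: str       # ISO 8601 date string, e.g. "1980-05-17"
    address: str
    physician_name: str
    hospital_name: str
    reimbursement_code: str  # treatment reimbursement code

record = EmrRecord(
    patient_name="Jane Doe",
    date_of_birth="1980-05-17",
    address="12 Example St",
    physician_name="Dr. Smith",
    hospital_name="City Hospital",
    reimbursement_code="A123",
)
print(record.reimbursement_code)  # A123
```

Because every field is typed and named, records like this are what the text means by data "easily coded into and handled by automated databases", in contrast to free-text clinical notes.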
Data quality issues are of acute concern in healthcare for two reasons: life-or-death decisions depend on having accurate information, and the quality of healthcare data, especially unstructured data, is highly variable and all too often incorrect. (Inaccurate "translations" of poor handwriting on prescriptions are perhaps the most infamous example.)
In the clinical realm, the promotion of the HITECH Act nearly tripled the adoption rate of electronic health records (EHRs) in hospitals, to 44%, from 2009 to 2012. Data from millions of patients have already been collected and stored in an electronic format, and this accumulated data could potentially enhance healthcare services and increase research opportunities. In addition, medical imaging (e.g., MRI, CT scans) produces vast amounts of data with even more complex features and broader dimensions. One such example is the Visible Human Project, which has archived 39 GB of female datasets. These and other datasets will provide future opportunities for large aggregate collection and analysis.
1.3.2 Variety
The second predominant feature of big data is the variety of data types and structures. The ecosystem of biomedical big data comprises many different levels of data sources, creating a rich array of data for researchers. Much of the data is unstructured (e.g., notes from EHRs, clinical trial results, medical images, and medical sensors) and provides many opportunities, as well as a unique challenge, for formulating new investigations. Variety refers to the many sources and different types of data, both structured and unstructured. In clinical informatics, there are many varieties of data, such as pharmacy data, clinical data, ECG data, scan images, anthropometric data, and imaging data. This unstructured data is typically stored in a clinical repository supported by NoSQL databases. Medical data now comes in the form of emails, photos, videos, monitoring devices, PDFs, audio, and more. This variety of unstructured data creates problems for storing, mining, and analyzing data.
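The appeal of a NoSQL document store for such mixed records can be sketched with plain dictionaries standing in for a document collection. This is a simplified illustration: a real deployment would use a system such as MongoDB, and the record fields shown are assumptions, not a clinical standard:

```python
import json

# Simulated document collection: each clinical record keeps only the
# fields it actually has, with no fixed relational schema.
collection = {}

def insert(doc_id, document):
    """Store a schemaless clinical document under its id."""
    collection[doc_id] = document

# Heterogeneous records: ECG metadata, an imaging study, a free-text note.
insert("p001", {"type": "ecg", "sample_rate_hz": 500, "leads": 12})
insert("p002", {"type": "scan", "modality": "MRI", "file": "scan_p002.dcm"})
insert("p003", {"type": "note", "text": "Patient reports mild fever."})

# Query across documents without requiring a shared schema: documents
# lacking a "modality" field are simply skipped, not errors.
imaging = [d for d in collection.values() if d.get("modality") == "MRI"]
print(json.dumps(imaging))
```

The point is that ECG metadata, imaging studies, and free-text notes coexist in one collection, which is exactly the flexibility a rigid relational schema lacks for this kind of variety.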
1.3.3 Velocity
The third important characteristic of big data, velocity, refers to the speed at which data is produced and processed. Big data velocity deals with the pace at which data flows in from sources like medical acquisition devices and human interaction with things like social media sites, mobile devices, etc. The speed of the data generated by each radiology centre is deemed to be high, and the flow of data is massive and continuous. This real-time data can help researchers and businesses make valuable decisions that provide strategic competitive advantages and ROI, if one is able to handle the velocity. The new generation of sequencing technologies enables the production of billions of DNA sequence reads each day at a relatively low cost. Because faster speeds are required for gene sequencing, big data technologies will be tailored to match the speed of producing data, ...