Design of Intelligent Applications using Machine Learning and Deep Learning Techniques
  1. 448 pages
  2. English

About this book

Machine learning (ML) and deep learning (DL) algorithms are invaluable resources for Industry 4.0 and allied areas and are considered the future of computing. Neural networks, a subfield that recognizes and interprets patterns in data, help a machine carry out tasks in a manner similar to humans. The intelligent models developed using ML and DL are carefully designed and thoroughly investigated, bringing practical applications to many fields such as health care, agriculture and security. These algorithms can be applied successfully only in the context of data computing and analysis. Today, ML and DL have created conditions for potential developments in detection and prediction.

Apart from these domains, ML and DL are also useful in analysing the social behaviour of humans. As the amount and variety of available data grew, it became necessary to build a means of processing that data, and this is where deep neural networks prove their importance. These networks are capable of handling large amounts of data in fields such as finance and image processing. This book also explores key applications in Industry 4.0, including:

· Fundamental models, issues and challenges in ML and DL.

· Comprehensive analyses and probabilistic approaches for ML and DL.

· Various applications in healthcare predictions such as mental health, cancer, thyroid disease, lifestyle disease and cardiac arrhythmia.

· Industry 4.0 applications such as facial recognition, feather classification, water stress prediction, deforestation control, tourism and social networking.

· Security aspects of Industry 4.0 applications, including remedial actions against possible attacks and prediction of associated risks.

Information is presented in an accessible way for students, researchers and scientists, business innovators and entrepreneurs, and sustainable assessment and management professionals.

This book equips readers with knowledge of data analytics, ML and DL techniques for applications defined under the umbrella of Industry 4.0. It offers comprehensive coverage, promising ideas and outstanding research contributions, supporting further development of ML and DL approaches by applying intelligence in various applications.

1

Data Acquisition and Preparation for Artificial Intelligence and Machine Learning Applications
Kallol Bosu Roy Choudhuri
Cognizant Technology Solutions
Ramchandra S. Mangrulkar
University of Mumbai

Contents

1.1 Introduction
1.2 Reference Architecture
1.2.1 Data Sources
1.2.2 Data Storage
1.2.3 Batch Processing
1.2.4 Real-Time Message Ingestion
1.2.5 Stream Processing
1.2.6 Machine Learning
1.2.7 Analytical Data Store
1.2.8 Analytics and Reports
1.2.9 Orchestration
1.3 Data Acquisition Layer
1.3.1 File Systems
1.3.2 Databases
1.3.3 Applications
1.3.4 Devices
1.3.5 Enterprise Data Gateway
1.3.6 Field Gateway
1.3.7 Data Integration Services
1.3.8 Data Ingestion Services
1.4 Data Ingestion Layer
1.4.1 Data Storage Layer
1.4.2 Landing Layer
1.4.3 Cleansed Layer
1.4.4 Processed Layer
1.4.5 Data Processing Layer
1.4.6 Data Processing Engine
1.4.7 Data Processing Programs
1.4.8 Scheduling Engine
1.4.9 Scheduling Scripts
1.5 Data Quality and Cleansing Layer
1.5.1 Master Data Management (MDM) System
1.5.2 Master Data Management (MDM) Referencing Programs
1.5.3 Data Quality Check Programs
1.5.4 Rejected/Quarantined Layer
Bibliography

1.1 Introduction

This chapter introduces the essential concepts of data acquisition, ingestion, data quality, cleansing and preparation. Data forms the basis of all decision-making processes. Because AI and ML depend heavily on accurate and reliable data, these stages are important prerequisites for building any AI or ML application. Before data can be effectively fed into an ML model or an AI algorithm, it goes through the following stages:
  • Data acquisition
  • Data ingestion
  • Data quality and cleansing
In the following sections of this chapter, we will examine what each of the above stages in data processing means. We will also study the various components and concepts involved in successfully executing each phase of data processing.

1.2 Reference Architecture

In the next few sections, we will look at each of the above stages in more detail. Before proceeding further, let us look at the diagram below, which depicts the general reference architecture for data collection, ingestion, cleansing and preparation for AI and ML applications (Figure 1.1).
FIGURE 1.1 AI/ML data processing reference architecture.
Figure 1.1 shows the different stages. The first stage is to establish data ingestion components from the source, followed by cleansing the raw data and then preparing it for training an ML model or feeding it as input to AI applications. The various components of the reference architecture are listed below:
  • Data sources
  • Data storage
  • Batch processing
  • Real-time message ingestion
  • Stream processing
  • ML
  • Analytical data store
  • Analytics and reports
  • Orchestration
In the sections below, we will look at each of these data processing components.

1.2.1 Data Sources

For AI/ML applications, data can come from many kinds of sources. The data is first collected (acquired), then stored and processed further. Usually, data comes from the following types of source systems:
  • Application data stores such as relational databases (SQL Server, Oracle, MySQL, PostgreSQL and so on) or NoSQL databases (Cassandra, MongoDB and so on)
  • Log files and flat files generated by various types of business applications, monitoring and logging software (Splunk, ELK and so on)
  • IoT devices that produce data
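The three kinds of source systems listed above can be sketched as small reader functions that return records in one common shape. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for a relational application store, and the `orders` table, the log line layout and the `temp-01` sensor payload are hypothetical names invented for the example.

```python
import json
import sqlite3


def from_database(conn):
    """Read rows from a relational source (SQLite stands in for a production database)."""
    rows = conn.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
    return [{"source": "database", "id": r[0], "amount": r[1]} for r in rows]


def from_log_lines(lines):
    """Parse 'LEVEL message' lines, as a flat log file might contain (assumed layout)."""
    records = []
    for line in lines:
        level, _, message = line.strip().partition(" ")
        records.append({"source": "log", "level": level, "message": message})
    return records


def from_device(payload):
    """Decode one JSON message emitted by a hypothetical IoT sensor."""
    reading = json.loads(payload)
    return [{"source": "device", **reading}]


# Hypothetical sample data for each source type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

acquired = (
    from_database(conn)
    + from_log_lines(["ERROR disk full", "INFO job done"])
    + from_device('{"sensor": "temp-01", "value": 21.4}')
)
```

Tagging each record with its `source` keeps the origin visible once data from different systems is collected in one place.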

1.2.2 Data Storage

Data acquired from various sources arrives in many file formats, such as XML, JSON, CSV, ORC, Avro and Parquet, to name a few popular ones. In AI/ML applications, the data volume is usually so large that the data is stored as files on affordable commodity storage media governed by powerful file system management software. Data at this scale is known as big data, and the underlying storage hardware coupled with the file system management software is known as a data lake. Hadoop is an example of a big data lake system.
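As a rough illustration of handling several file formats, the sketch below parses CSV, JSON and XML text into the same record structure using only the Python standard library. The sample contents and the `id`/`city` fields are assumptions made for the example; columnar formats such as ORC, Avro and Parquet need third-party libraries and are not shown.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical file contents, inlined so the example is self-contained.
CSV_TEXT = "id,city\n1,Mumbai\n2,Pune\n"
JSON_TEXT = '[{"id": 3, "city": "Delhi"}]'
XML_TEXT = "<rows><row id='4' city='Chennai'/></rows>"


def parse_csv(text):
    """Read CSV rows by header name and convert the id to an integer."""
    return [{"id": int(r["id"]), "city": r["city"]} for r in csv.DictReader(io.StringIO(text))]


def parse_json(text):
    """Read a JSON array of objects."""
    return [{"id": int(r["id"]), "city": r["city"]} for r in json.loads(text)]


def parse_xml(text):
    """Read <row> elements whose fields are stored as attributes."""
    root = ET.fromstring(text)
    return [{"id": int(e.get("id")), "city": e.get("city")} for e in root.iter("row")]


records = parse_csv(CSV_TEXT) + parse_json(JSON_TEXT) + parse_xml(XML_TEXT)
```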

1.2.3 Batch Processing

The acquired data keeps accumulating in the data lake. At regular intervals, the accumulated data is processed by scheduled workflows and processes. These scheduled workflows and processes are known as batch processes, while the scheduled processing of the accumulated data at regular intervals is known as batch processing. Owing to the massive amount of data to be processed, parallel computing is often employed to achieve efficiency and speed. Apache Spark is an example of a big data parallel computing engine.
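The idea of splitting accumulated data into partitions and processing them in parallel can be sketched as follows. This is not the Spark API: a standard-library thread pool stands in for a cluster of workers, and the function names, partition size and worker count are assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, size):
    """Split the accumulated records into fixed-size partitions."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def process_partition(part):
    """Compute a partial aggregate (count and sum) for one partition."""
    return len(part), sum(part)


def run_batch(records, partition_size=4, workers=3):
    """Process all partitions concurrently, then combine the partial results."""
    parts = partition(records, partition_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_partition, parts))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return {"count": count, "sum": total, "mean": total / count if count else None}


accumulated = list(range(1, 11))  # stands in for data accumulated since the last run
result = run_batch(accumulated)
```

Each partition yields a partial result that is cheap to combine, which is what allows the work to be spread over many workers.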

1.2.4 Real-Time Message Ingestion

At times, usually in the case of IoT device data, there is an immediate need to process the data. This type of data is processed immediately in order to extract vital time-sensitive information, for example in remote monitoring of manufacturing equipment. In such cases, it is not enough to have scheduled workflows waiting to process the incoming data at regular intervals. Instead, processes are triggered as soon as new data arrives. This immediate (near-real-time) data processing is known as real-time processing.
Sometimes, a trade-off between batch processing and real-time processing may be required for certain systems. In such cases, the concept of micro-batching is employed. Micro-batching is a technique where the scheduled workflows are executed at shorter intervals of, say, every 5 minutes (instead of hourly or daily).
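Micro-batching can be illustrated by grouping timestamped events into short fixed windows. In this minimal sketch, events are assumed to be `(timestamp_in_seconds, value)` pairs, and the 300-second default mirrors the 5-minute interval mentioned above; the function name is hypothetical.

```python
from collections import defaultdict


def micro_batches(events, interval_seconds=300):
    """Group (timestamp, value) events into consecutive fixed-length windows."""
    batches = defaultdict(list)
    for ts, value in events:
        # Integer division maps every timestamp to the start of its window.
        window_start = (ts // interval_seconds) * interval_seconds
        batches[window_start].append(value)
    return dict(sorted(batches.items()))


events = [(0, "a"), (120, "b"), (299, "c"), (300, "d"), (910, "e")]
batches = micro_batches(events)
# batches == {0: ["a", "b", "c"], 300: ["d"], 900: ["e"]}
```

A scheduled workflow would then process one window at a time, so latency is bounded by the window length rather than by an hourly or daily schedule.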

1.2.5 Stream Processing

Once the real-time messages have been ingested from IoT device sources or applications, there may be requirements to operate directly on the real-time data. Processing data in real time enables the discovery of critical time-sensitive information, such as predicting a health emergency from a heart patient’s Fitbit data. Apache Storm and Spark Streaming are examples of stream data processing components.
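A stream processor handles each reading as it arrives instead of waiting for a batch. The sketch below keeps a sliding window over a sequence of heart-rate readings and emits an alert when the window average exceeds a threshold. It uses a plain Python generator rather than Storm or Spark Streaming, and the window size and threshold are arbitrary illustrative values.

```python
from collections import deque


def heart_rate_alerts(readings, window=3, threshold=120.0):
    """Yield (index, average) whenever the sliding-window average exceeds the threshold."""
    recent = deque(maxlen=window)  # the oldest reading is dropped automatically
    for index, bpm in enumerate(readings):
        recent.append(bpm)
        if len(recent) == window:
            average = sum(recent) / window
            if average > threshold:
                yield index, average


stream = [80, 85, 118, 130, 135, 90]  # hypothetical readings in beats per minute
alerts = list(heart_rate_alerts(stream))
```

Because the function is a generator, an alert is available as soon as the reading that triggers it has been consumed; with the sample data above, only the window ending at index 4 crosses the threshold.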

1.2.6 Machine Learning

The processed data is fed into ML components for the purpose of model training and tuning. Often, it is not possible to feed the raw data into an ML model. There may be many inaccuracies, outliers, string values, special characters and so on that must be addressed before the data can be used by an ML process. The ML components use the cleansed and processed data for various learning purposes.
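The kind of cleansing described here can be sketched for a single numeric column: strip special characters, discard values that are not numbers, and drop outliers. The outlier rule below (median absolute deviation with a factor of 3) is one common choice picked for the illustration, not a method prescribed by the chapter, and the sample column is hypothetical.

```python
import re
import statistics


def to_number(raw):
    """Strip everything except digits, sign and decimal point; return None if not numeric."""
    cleaned = re.sub(r"[^0-9.\-]", "", str(raw))
    try:
        return float(cleaned)
    except ValueError:
        return None


def drop_outliers(values, k=3.0):
    """Keep values within k median absolute deviations of the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)
    return [v for v in values if abs(v - med) <= k * mad]


raw_column = ["42", " 40.5 ", "$41", "n/a", "39#", "4000"]
numeric = [n for n in (to_number(r) for r in raw_column) if n is not None]
prepared = drop_outliers(numeric)
# prepared == [42.0, 40.5, 41.0, 39.0]
```

In this sample, `"n/a"` is rejected as non-numeric and `4000` is removed as an outlier, leaving values an ML process can consume directly.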

1.2.7 Analytical Data Store

After the data has been cleansed, it is stored in a central data store for consumption by various downstream processes. Analytical data stores are often referred to as data warehouses, where processed data is properly organized and generally stored in the form of dimensional models made up of fact and dimension tables. Data from the tables in the analytical data stores is used in various downstream reports. Examples of popular modern analytical data stores are Azure SQL Data Warehouse, Amazon Redshift and Google BigQuery.
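A dimensional model can be illustrated with one fact table and one dimension table. The sketch below uses an in-memory SQLite database as a stand-in for a data warehouse; the `fact_sales` and `dim_product` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: descriptive attributes of each product.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
-- Fact table: one row per sale, pointing at the dimension by key.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'toys');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
""")

# A downstream report: total sales per product category.
report = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
# report == [("books", 25.0), ("toys", 7.5)]
```

Reports join the facts to whichever dimensions they need, which keeps measurements in one table and descriptive attributes in another.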

1.2.8 Analytics and ...

Table of contents

  Cover
  Half Title
  Title Page
  Copyright Page
  Table of Contents
  Preface
  Editors
  Contributors
  1. Data Acquisition and Preparation for Artificial Intelligence and Machine Learning Applications
  2. Fundamental Models in Machine Learning and Deep Learning
  3. Research Aspects of Machine Learning: Issues, Challenges, and Future Scope
  4. Comprehensive Analysis of Dimensionality Reduction Techniques for Machine Learning Applications
  5. Application of Deep Learning in Counting WBCs, RBCs, and Blood Platelets Using Faster Region-Based Convolutional Neural Network
  6. Application of Neural Network and Machine Learning in Mental Health Diagnosis
  7. Application of Machine Learning in Cardiac Arrhythmia
  8. Advances in Machine Learning and Deep Learning Approaches for Mammographic Breast Density Measurement for Breast Cancer Risk Prediction: An Overview
  9. Applications of Machine Learning in Psychology and the Lifestyle Disease Diabetes Mellitus
  10. Application of Machine Learning and Deep Learning in Thyroid Disease Prediction
  11. Application of Machine Learning in Fake News Detection
  12. Authentication of Broadcast News on Social Media Using Machine Learning
  13. Application of Deep Learning in Facial Recognition
  14. Application of Deep Learning in Deforestation Control and Prediction of Forest Fire Calamities
  15. Application of Convolutional Neural Network in Feather Classifications
  16. Application of Deep Learning Coupled with Thermal Imaging in Detecting Water Stress in Plants
  17. Machine Learning Techniques to Classify Breast Cancer
  18. Application of Deep Learning in Cartography Using UNet and Generative Adversarial Network
  19. Evaluation of Intrusion Detection System with Rule-Based Technique to Detect Malicious Web Spiders Using Machine Learning
  20. Application of Machine Learning to Improve Tourism Industry
  21. Training Agents to Play 2D Games Using Reinforcement Learning
  22. Analysis of the Effectiveness of the Non-Vaccine Countermeasures Taken by the Indian Government against COVID-19 and Forecasting Using Machine Learning and Deep Learning
  23. Application of Deep Learning in Video Question Answering System
  24. Implementation and Analysis of Machine Learning and Deep Learning Algorithms
  25. Comprehensive Study of Failed Machine Learning Applications Using a Novel 3C Approach
  Index

Design of Intelligent Applications using Machine Learning and Deep Learning Techniques, by Ramchandra Sharad Mangrulkar, Antonis Michalas, Narendra Shekokar, Meera Narvekar and Pallavi Vijay Chavan.