Essentials of Data Science and Analytics
eBook - ePub

Statistical Tools, Machine Learning, and R-Statistical Software Overview

Amar Sahay

  1. 150 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

This text provides a comprehensive overview of Data Science.

With the continued advancement in storage and computing technologies, data science has emerged as one of the most desired fields in driving business decisions. Data science employs techniques and methods from many other fields, such as statistics, mathematics, computer science, and information science. Besides the methods and theories drawn from these fields, data science uses visualization techniques with specially designed big data software and statistical programming languages such as R and Python. Data Science has wide applications in the areas of Machine Learning (ML) and Artificial Intelligence (AI).

The book is divided into four parts, each made up of several chapters that together explain the core of Data Science. Part I introduces the field of Data Science, the disciplines it comprises, and its scope, future outlook, and career prospects. This part also explains analytics, business analytics, and business intelligence, along with their similarities to and differences from Data Science. Since data is at the core of Data Science, Part II is devoted to explaining data, big data, and other features of data. One full chapter is devoted to data analysis, creating visuals, pivot tables, and other applications using Excel with Office 365. Part III explains the statistics behind Data Science. Its chapters cover statistics and its importance, numerical and data visualization tools and methods, probability, and applications of probability distributions in Data Science. Other chapters in Part III cover sampling, estimation, and hypothesis testing, all of which are integral parts of Data Science applications. Part IV of the book provides the basics of Machine Learning (ML) and R-statistical software.

R-statistical software is widely used by data science professionals. The book also outlines a brief history of the field, its body of knowledge, and the skills and education requirements for data scientists and data science professionals, and it summarizes some statistics on job growth and prospects. Glassdoor ranked a career in data science the third-best job in America for 2020, after ranking it the number one job from 2016 to 2019.

PART I
Data Science, Analytics, and Business Analytics
CHAPTER 1
Data Science and Its Scope
Chapter Highlights
• Introduction
• Objective and Overview of Chapters
• What Is Data Science?
• Another Look at Data Science
• Data Science and Statistics
• Role of Statistics in Data Science
• Data Science: A Brief History
• Difference between Data Science and Data Analytics
• Knowledge and Skills for Data Science Professionals
• Some Technologies Used in Data Science
• Career Path for Data Science Professional and Data Scientist
• Future Outlook
• Summary
Introduction
Data science is about extracting knowledge and insights from data, and its tools and techniques are used to drive business and process decisions. It can be seen as a data-driven approach to decision making. Data science is a multidisciplinary field that involves the ability to understand, process, and visualize data in the initial stages, followed by the application of statistics, modeling, mathematics, and technology to address and solve analytically complex problems using structured and unstructured data. At the core of data science is data, and the field is about using this data in creative and effective ways to help businesses make data-driven decisions.
Knowledge of statistics is as important in data science as the applications of computer science. Companies now collect massive amounts of data, ranging from exabytes to zettabytes, both structured and unstructured. Advances in technology and computing capabilities, together with smarter storage, have made it possible to store, process, and analyze such huge volumes of data.
Data science is applied to extract information from both structured and unstructured data.1,2
Unstructured data is not organized in a predefined structure. It is usually text heavy and may contain qualitative or categorical elements, such as dates and categories, as well as numbers and other forms of measurement. Compared to structured data, unstructured data contain irregularities, and these ambiguities make it difficult to apply the traditional tools of statistics and data analysis. Structured data, by contrast, are usually stored in clearly defined fields in databases, and software applications and programs are designed to process such data. In recent years, a number of newly developed tools and software programs capable of analyzing big and unstructured data have emerged. One of the earliest applications involving unstructured data is the analysis of text data using text mining and other methods.
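To make the contrast concrete, the sketch below (in Python, chosen here only for illustration) places a few structured records, whose fields can be summed directly, next to a few free-text reviews, which must first be split into words before anything can be counted. A word-frequency count is only the simplest form of text mining; the records, the reviews, and the stop-word list are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical structured records: every row has the same clearly defined fields.
structured = [
    {"customer_id": 101, "date": "2020-03-01", "amount": 250.0},
    {"customer_id": 102, "date": "2020-03-02", "amount": 99.5},
]

# Hypothetical unstructured data: free-text reviews with no predefined fields.
unstructured = [
    "Delivery was late, but the product quality is great.",
    "Great price! Delivery took two weeks though.",
]

# Hypothetical stop-word list: common words that carry little meaning on their own.
STOP_WORDS = {"the", "but", "is", "was", "took", "though"}


def term_frequencies(documents):
    """Minimal text mining: lowercase, split into words, drop stop words, count terms."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z]+", doc.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Structured data can be aggregated field by field with no preprocessing.
total_amount = sum(row["amount"] for row in structured)

# Unstructured text must be tokenized before it yields any numbers.
freqs = term_frequencies(unstructured)

print(total_amount)           # 349.5
print(freqs.most_common(2))   # [('delivery', 2), ('great', 2)]
```

Real text-mining work would add steps such as stemming and richer stop-word lists; the point here is only that unstructured text needs processing before it can be analyzed with the same tools as structured fields.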
Unstructured data has become increasingly prevalent. In 1998, Merrill Lynch said, “unstructured data comprises the vast majority of data found in an organization, some estimates run as high as 80%.”1 Other predictions point the same way. In 2012, IDC (International Data Corporation)3 and Dell EMC4 projected that data would grow to 40 zettabytes by 2020, a 50-fold increase from the beginning of 2010.4 More recently, IDC and Seagate predicted that the global datasphere will grow to 163 zettabytes by 2025,5 the majority of which will be unstructured. Computerworld magazine7 states that unstructured information might account for more than 70 to 80 percent of all data in organizations. (https://en.wikipedia.org/wiki/Unstructured_data)8
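The 50-fold figure can be unpacked with simple arithmetic. The worked example below assumes, for illustration only, that the growth spans exactly ten years and compounds at a constant annual rate r; neither assumption is part of the cited projection.

```latex
\[
V_{2010} = \frac{V_{2020}}{50} = \frac{40\ \text{ZB}}{50} = 0.8\ \text{ZB}
\]
\[
(1 + r)^{10} = 50 \quad\Longrightarrow\quad r = 50^{1/10} - 1 \approx 0.479
\]
```

Under these assumptions, the projection implies roughly 0.8 zettabytes of data at the start of 2010 and growth of about 48 percent per year.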
Objective and Overview of Chapters
The objective of this book is to provide an introductory overview of data science: what data science is and why it is such an important field. We will also explore and outline the role of data scientists and data science professionals and what they do.
The initial chapters of the book introduce data science and closely related areas. The terms data science, data analytics, business analytics, and business intelligence are often used interchangeably, even by professionals in these fields. Therefore, Chapter 1, which provides an overview of data science, is followed by two chapters that explain the relationship between data science, analytics, and business intelligence. Analytics is itself a wide area, and companies use different forms of analytics, including descriptive, predictive, and prescriptive analytics, to drive major business decisions. Chapters 2 and 3 outline the differences and similarities between data science, analytics, and business intelligence. Chapter 2 also outlines the tools of descriptive, predictive, and prescriptive analytics, along with the most recent and emerging technologies of machine learning and artificial intelligence. Since data science is about data, a chapter is devoted to data and data types: Chapter 4 provides definitions of data, the different forms of data, and their types, followed by some tools and techniques for working with data. One of the major objectives of data science is to make sense of the massive amounts of data companies collect, and one way of doing so is to apply the data visualization or graphical techniques used in data analysis. Understanding other tools and techniques for working with data is also important. A chapter is devoted to data visualization.
Data science is a vast area. Besides visualization techniques and statistical analysis, it uses statistical programming languages such as R, and it requires knowledge of databases and query languages (SQL, MySQL) or other database management systems.
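A minimal sketch of what working with structured data in a database looks like, using Python's built-in sqlite3 module as a lightweight stand-in for a server database management system such as MySQL. The sales table and its values are hypothetical; the SQL statements themselves (CREATE TABLE, INSERT, SELECT with GROUP BY) are standard and would look much the same elsewhere, although placeholder syntax differs between database drivers.

```python
import sqlite3

# In-memory SQLite database, standing in for a production DBMS.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Structured data lives in clearly defined fields (columns).
cur.execute("CREATE TABLE sales (region TEXT, month TEXT, revenue REAL)")
cur.executemany(
    "INSERT INTO sales (region, month, revenue) VALUES (?, ?, ?)",
    [
        ("East", "2020-01", 120.0),
        ("East", "2020-02", 135.0),
        ("West", "2020-01", 90.0),
        ("West", "2020-02", 110.0),
    ],
)
conn.commit()

# A typical analytical query: total and average revenue per region.
rows = cur.execute(
    """
    SELECT region, SUM(revenue) AS total, AVG(revenue) AS average
    FROM sales
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()

for region, total, average in rows:
    print(region, total, average)   # East 255.0 127.5, then West 200.0 100.0

conn.close()
```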
One major application of data science is in the area of Machine Learning (ML) and Artificial Intelligence (AI). The book provides a detailed overview of data science by defining and outlining its tools and techniques. As mentioned earlier, the book also explains the differences and similarities between data science and data analytics. Other concepts related to data science, including analytics, business analytics, and business intelligence (BI), are discussed in detail. The field of data science is about processing, cleaning, and analyzing data; these concepts and topics are important for understanding the field and are discussed in this book. Data science is an emerging field in data analysis and decision making.
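Cleaning is often the least visible of these steps, so a small sketch may help. The records below are hypothetical, and the rules applied (trimming and title-casing names, dropping rows with a missing measurement, removing duplicates that appear after normalization) are just one reasonable set of choices; in practice a missing value might instead be filled in (imputed) rather than dropped.

```python
# Hypothetical raw records with typical quality problems: stray whitespace,
# inconsistent capitalization, a duplicate, and a missing measurement.
raw = [
    {"city": " Boston ", "temp_f": "68"},
    {"city": "boston", "temp_f": "68"},
    {"city": "Denver", "temp_f": ""},
    {"city": "Austin", "temp_f": "91"},
]


def clean(records):
    """Normalize names, convert types, drop incomplete rows and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        city = rec["city"].strip().title()   # " Boston " and "boston" both become "Boston"
        temp_text = rec["temp_f"].strip()
        if not temp_text:
            continue                         # drop rows with a missing measurement
        temp = float(temp_text)              # convert text to a number
        key = (city, temp)
        if key in seen:
            continue                         # drop duplicates revealed by normalization
        seen.add(key)
        cleaned.append({"city": city, "temp_f": temp})
    return cleaned


cleaned = clean(raw)
average_temp = sum(r["temp_f"] for r in cleaned) / len(cleaned)
print(cleaned)        # [{'city': 'Boston', 'temp_f': 68.0}, {'city': 'Austin', 'temp_f': 91.0}]
print(average_temp)   # 79.5
```

Only after cleaning does the simple average become meaningful; computed on the raw rows, the duplicate Boston entry would have been counted twice.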
What Is Data Science?
Data science may be thought of as a data-driven decision-making approach that draws on several different areas, methods, algorithms, models, and disciplines with the purpose of extracting insights and knowledge from structured and unstructured data. These insights help in applying algorithms and models to make decisions. The models in data science are used in predictive analytics to predict future outcomes.
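As a minimal illustration of a predictive model, the sketch below fits a straight-line trend to a few periods of past performance by ordinary least squares and uses it to forecast the next period. The sales figures are hypothetical, and a linear trend is used simply because it is the easiest model to show; it is not presented as the book's method.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical past performance: period number and sales (in thousands).
periods = [1, 2, 3, 4, 5]
sales = [10, 12, 15, 16, 19]

intercept, slope = fit_line(periods, sales)
forecast_next = intercept + slope * 6        # predict period 6 from the fitted trend

print(f"sales ~ {intercept:.2f} + {slope:.2f} * period")   # sales ~ 7.80 + 2.20 * period
print(f"forecast for period 6: {forecast_next:.1f}")       # forecast for period 6: 21.0
```

With this data the least-squares slope is 22/10 = 2.2 and the intercept is 14.4 − 2.2 × 3 = 7.8, so the model projects the upward trend one period forward to about 21.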
Data science, as a field, has a much broader scope than analytics, business analytics, or business intelligence. It brings together and combines several disciplines and areas, including statistics, data analysis,9 statistical modeling, data mining,10,11,12,13,14 big data,15 machine learning,16 artificial intelligence (AI), management science, optimization techniques, and related methods, in order to “understand and analyze actual phenomena” from data.17
Data science employs techniques and methods from many other fields, such as mathematics, statistics, computer science, and information science. Besides the methods and theories drawn from these fields, data science also uses data visualization techniques with specially designed software, such as Tableau and other big data software. Relational databases and SQL, R-statistical software, and the Python programming language are all used in different applications to analyze data, extract information, and draw conclusions from it. These are the tools of data science. Together, these tools, techniques, and programming languages provide a unifying approach to exploring, analyzing, drawing conclusions from, and making decisions based on the massive amounts of data companies collect.
Data science employs the tools of information technology and management science (mathematical modeling and simulation), along with data mining and fact-based data, to measure past performance and to guide an organization in planning and predicting future outcomes for effective decision making.
Turing Award18 winner Jim Gray viewed data science as a “fourth paradigm” of science (empirical, theoretical, computational, and now data-driven) and asserted that “everything about science is changing because of the impact of information technology” and the data deluge. In 2015, the American Statistical Association identified database management, statistics and machine learning, and distributed and parallel systems as the three emerging foundational professional communities.
Another Look at Data Science
Data science can be viewed as a multidisciplinary field focused on finding actionable insights from large sets of raw, structured...
