Essentials of Data Science and Analytics
eBook - ePub

Statistical Tools, Machine Learning, and R-Statistical Software Overview

Amar Sahay

  1. 150 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

This text provides a comprehensive overview of Data Science.

With continued advances in storage and computing technologies, data science has emerged as one of the most sought-after fields for driving business decisions. Data science employs techniques and methods from many other fields, such as statistics, mathematics, computer science, and information science. Besides the methods and theories drawn from these fields, data science uses visualization techniques built on specially designed big data software and statistical programming languages such as R and Python. Data Science has wide applications in the areas of Machine Learning (ML) and Artificial Intelligence (AI).

The book is divided into four parts, each made up of several chapters that explain the core of Data Science. Part I introduces the field of Data Science, the disciplines it comprises, and its scope, along with the future outlook and career prospects. This part also explains analytics, business analytics, and business intelligence, and their similarities to and differences from Data Science. Since data is at the core of Data Science, Part II is devoted to explaining data, big data, and other features of data. One full chapter is devoted to data analysis, creating visuals, pivot tables, and other applications using Excel with Office 365. Part III explains the statistics behind Data Science. It uses several chapters to explain statistics and its importance, numerical and data visualization tools and methods, probability, and probability distribution applications in Data Science. Other chapters in Part III cover sampling, estimation, and hypothesis testing. All of these are integral parts of Data Science applications. Part IV provides the basics of Machine Learning (ML) and R-statistical software.

Data Science has wide applications in the areas of Machine Learning (ML) and Artificial Intelligence (AI), and R-statistical software is widely used by data science professionals. The book also outlines a brief history of the field, its body of knowledge, and the skills and education requirements for data scientists and data science professionals. Some statistics on job growth and prospects are also summarized. Glassdoor ranked a career in data science as the third best job in America for 2020, and as the number one job from 2016 to 2019.

PART I
Data Science, Analytics, and Business Analytics
CHAPTER 1
Data Science and Its Scope
Chapter Highlights
• Introduction
• Objective and Overview of Chapters
• What Is Data Science?
• Another Look at Data Science
• Data Science and Statistics
• Role of Statistics in Data Science
• Data Science: A Brief History
• Difference between Data Science and Data Analytics
• Knowledge and Skills for Data Science Professionals
• Some Technologies Used in Data Science
• Career Path for Data Science Professional and Data Scientist
• Future Outlook
• Summary
Introduction
Data science is about extracting knowledge and insights from data. The tools and techniques of data science are used to drive business and process decisions. It can be seen as a major data-driven approach to decision making. Data science is a multidisciplinary field that involves the ability to understand, process, and visualize data in the initial stages, followed by applications of statistics, modeling, mathematics, and technology to address and solve analytically complex problems using structured and unstructured data. At the core of data science is data. It is about using this data in creative and effective ways to help businesses make data-driven decisions.
The knowledge of statistics in data science is as important as the applications of computer science. Companies now collect massive amounts of data, from exabytes to zettabytes, both structured and unstructured. Advances in technology and computing capabilities have made it possible to store, process, and analyze these huge volumes of data with smarter storage.
Data science is applied to extract information from both structured and unstructured data.[1,2]
Unstructured data are usually not organized in a predefined manner. They are typically text heavy and may contain qualitative or categorical elements, such as dates and categories, as well as numbers and other forms of measurement. Compared to structured data, unstructured data contain irregularities and ambiguities that make it difficult to apply the traditional tools of statistics and data analysis. Structured data, by contrast, are usually stored in clearly defined fields in databases, and software applications and programs are designed to process such data. In recent years, a number of newly developed tools and software programs have emerged that are capable of analyzing big and unstructured data. One of the earliest applications of unstructured data is the analysis of text using text mining and other methods.
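The contrast between the two kinds of data can be sketched in a few lines of Python. This is only an illustrative sketch: the order records, the review text, and the helper names total_by_region and word_counts are hypothetical and not taken from the book. The structured records can be aggregated directly by field, while the free text first needs a basic text-mining step (tokenizing and counting words) before anything can be measured.

```python
import re
from collections import Counter

# Structured data: every record has the same clearly defined fields.
# (Hypothetical records, invented for illustration.)
structured_orders = [
    {"order_id": 1, "region": "West", "amount": 120.50},
    {"order_id": 2, "region": "East", "amount": 75.00},
    {"order_id": 3, "region": "West", "amount": 42.25},
]

# Unstructured data: free text with no predefined fields.
# (Hypothetical review, invented for illustration.)
unstructured_review = (
    "Ordered on 3 March. Delivery was late, but the support team was helpful. "
    "Late delivery aside, the product works well."
)


def total_by_region(orders):
    """Sum the 'amount' field per region; simple because the fields are defined."""
    totals = {}
    for order in orders:
        region = order["region"]
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals


def word_counts(text):
    """A very basic text-mining step: lowercase the text, tokenize, count words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


region_totals = total_by_region(structured_orders)
counts = word_counts(unstructured_review)

print(region_totals)
print(counts.most_common(4))
```

Real text mining goes well beyond word counts (stop-word removal, stemming, topic models), but even this small step shows why unstructured data need extra processing before statistical tools can be applied.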
Unstructured data is becoming increasingly prevalent. In 1998, Merrill Lynch said, “unstructured data comprises the vast majority of data found in an organization, some estimates run as high as 80%.”[1] Here are some other predictions: In 2012, IDC (International Data Corporation)[3] and Dell EMC[4] projected that data would grow to 40 zettabytes by 2020, a 50-fold growth from the beginning of 2010.[4] More recently, IDC and Seagate predicted that the global datasphere will grow to 163 zettabytes by 2025[5] and that the majority of that will be unstructured. Computer World magazine[7] states that unstructured information might account for more than 70 to 80 percent of all data in organizations. (https://en.wikipedia.org/wiki/Unstructured_data)[8]
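A quick back-of-the-envelope check helps put these projections in perspective. The sketch below uses only the figures quoted above; treating 2010 to 2020 as exactly ten years of compounding is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Figures quoted in the text, in zettabytes (ZB).
projected_2020_zb = 40.0    # projection for 2020
growth_factor = 50.0        # "50-fold growth from the beginning of 2010"
projected_2025_zb = 163.0   # projection for 2025

# Data volume at the beginning of 2010 implied by the 50-fold figure.
implied_2010_zb = projected_2020_zb / growth_factor

# Assumption: treat 2010 to 2020 as 10 years of compound growth.
years = 10
implied_annual_growth = growth_factor ** (1 / years) - 1

# Growth factor implied between the 2020 and 2025 projections.
factor_2020_to_2025 = projected_2025_zb / projected_2020_zb

print(f"Implied 2010 volume: {implied_2010_zb:.1f} ZB")
print(f"Implied annual growth, 2010 to 2020: {implied_annual_growth:.1%}")
print(f"Implied growth factor, 2020 to 2025: {factor_2020_to_2025:.2f}x")
```

Under these assumptions, a 50-fold increase over a decade corresponds to data volumes growing by a little under 50 percent each year.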
Objective and Overview of Chapters
The objective of this book is to provide an introductory overview of data science, explaining what data science is and why it is such an important field. We will also explore and outline the role of data scientists and data science professionals and what they do.
The initial chapters of the book introduce data science and closely related areas. The terms data science, data analytics, business analytics, and business intelligence are often used interchangeably, even by professionals in the field. Therefore, Chapter 1, which provides an overview of data science, is followed by two chapters that explain the relationship between data science, analytics, and business intelligence. Analytics itself is a wide area, and different forms of analytics, including descriptive, predictive, and prescriptive analytics, are used by companies to drive major business decisions. Chapters 2 and 3 outline the differences and similarities between data science, analytics, and business intelligence. Chapter 2 also outlines the tools of descriptive, predictive, and prescriptive analytics along with the most recent and emerging technologies of machine learning and artificial intelligence. Since the field of data science is about data, a chapter is devoted to data and data types. Chapter 4 provides definitions of data, different forms of data, and their types, followed by some tools and techniques for working with data. One of the major objectives of data science is to make sense of the massive amounts of data companies collect. One way of doing so is to apply the data visualization or graphical techniques used in data analysis, and a chapter is devoted to data visualization. Understanding other tools and techniques for working with data is also important.
Data science is a vast area. Besides visualization techniques and statistical analysis, it uses statistical programming languages such as R and requires a knowledge of databases (SQL or MySQL) or other database management systems.
One major application of data science is in the area of Machine Learning (ML) and Artificial Intelligence. The book provides a detailed overview of data science by defining and outlining the tools and techniques. As mentioned earlier, the book also explains the differences and similarities between data science and data analytics. The other concepts related to data science including analytics, business analytics, and business intelligence (BI) are discussed in detail. The field of data science is about processing, cleaning, and analyzing data. These concepts and topics are important to understand the field of data science and are discussed in this book. Data science is an emerging field in data analysis and decision making.
What Is Data Science?
Data science may be thought of as a data-driven decision-making approach that uses several different areas, methods, algorithms, models, and disciplines with the purpose of extracting insights and knowledge from structured and unstructured data. These insights are helpful in applying algorithms and models to make decisions. The models in data science are used in predictive analytics to predict future outcomes.
Data science, as a field, has a much broader scope than analytics, business analytics, or business intelligence. It brings together and combines several disciplines and areas, including statistics, data analysis,[9] statistical modeling, data mining,[10,11,12,13,14] big data,[15] machine learning,[16] artificial intelligence (AI), management science, optimization techniques, and related methods in order to “understand and analyze actual phenomena” from data.[17]
Data science employs techniques and methods from many other fields, such as mathematics, statistics, computer science, and information science. Besides the methods and theories drawn from these fields, data science also uses data visualization techniques built on specially designed software, such as Tableau and other big data software. The concepts of relational databases (and the SQL language used to query them), R-statistical software, and the programming language Python are all used in different applications to analyze data, extract information, and draw conclusions. These are the tools of data science. These tools, techniques, and programming languages provide a unifying approach to exploring, analyzing, drawing conclusions, and making decisions from the massive amounts of data companies collect.
Data science employs the tools of information technology and management science (mathematical modeling and simulation), along with data mining and fact-based data, to measure past performance and to guide an organization in planning and predicting future outcomes for effective decision making.
Turing Award[18] winner Jim Gray viewed data science as a “fourth paradigm” of science (empirical, theoretical, computational, and now data-driven) and asserted that “everything about science is changing because of the impact of information technology” and the data deluge. In 2015, the American Statistical Association identified database management, statistics and machine learning, and distributed and parallel systems as the three emerging foundational professional communities.
Another Look at Data Science
Data science can be viewed as a multidisciplinary field focused on finding actionable insights from large sets of raw, structured...
