Essentials of Data Science and Analytics

Statistical Tools, Machine Learning, and R-Statistical Software Overview

Amar Sahay

Book Information

This text provides a comprehensive overview of Data Science.

With continued advances in storage and computing technologies, data science has emerged as one of the most sought-after fields for driving business decisions. Data science employs techniques and methods from many other fields, such as statistics, mathematics, computer science, and information science. Besides the methods and theories drawn from these fields, data science uses visualization techniques built on specially designed big data software, along with programming languages such as R and Python. Data Science has wide applications in the areas of Machine Learning (ML) and Artificial Intelligence (AI).

The book is divided into four parts, each made up of several chapters that explain the core of Data Science. Part I introduces the field of Data Science, the disciplines it comprises, and its scope, future outlook, and career prospects. This part also explains analytics, business analytics, and business intelligence, and their similarities to and differences from Data Science. Since data is at the core of Data Science, Part II is devoted to explaining data, big data, and other features of data. One full chapter is devoted to data analysis, creating visuals, pivot tables, and other applications using Excel with Office 365. Part III explains the statistics behind Data Science. It uses several chapters to explain statistics and its importance, numerical and data visualization tools and methods, probability, and applications of probability distributions in Data Science. Other chapters in Part III cover sampling, estimation, and hypothesis testing, all of which are integral parts of Data Science applications. Part IV of the book provides the basics of Machine Learning (ML) and R-statistical software.

R-statistical software is widely used by data science professionals. The book also outlines a brief history, the body of knowledge, and the skills and education requirements for data scientists and data science professionals. Some statistics on job growth and prospects are also summarized. A career in data science was ranked the third best job in America for 2020 by Glassdoor, after being ranked the number one best job from 2016 to 2019.

PART I
Data Science, Analytics, and Business Analytics
CHAPTER 1
Data Science and Its Scope
Chapter Highlights
Introduction
Objective and Overview of Chapters
What Is Data Science?
Another Look at Data Science
Data Science and Statistics
Role of Statistics in Data Science
Data Science: A Brief History
Difference between Data Science and Data Analytics
Knowledge and Skills for Data Science Professionals
Some Technologies Used in Data Science
Career Path for Data Science Professionals and Data Scientists
Future Outlook
Summary
Introduction
Data science is about extracting knowledge and insights from data. The tools and techniques of data science are used to drive business and process decisions, so data science can be seen as a data-driven approach to decision making. Data science is a multidisciplinary field that involves the ability to understand, process, and visualize data in the initial stages, followed by applications of statistics, modeling, mathematics, and technology to address and solve analytically complex problems using structured and unstructured data. At the core of data science is data: it is about using data in creative and effective ways to help businesses make data-driven decisions.
Knowledge of statistics is as important in data science as the applications of computer science. Companies now collect massive amounts of data, ranging from exabytes to zettabytes, both structured and unstructured. Advances in technology and computing capabilities, including smarter storage, have made it possible to store, process, and analyze such huge volumes of data.
Data science is applied to extract information from both structured and unstructured data.1,2
Unstructured data are not organized in a predefined structure; they are usually text heavy and may contain qualitative or categorical elements, such as dates and categories, as well as numbers and other forms of measurement. Compared with structured data, unstructured data contain irregularities and ambiguities that make it difficult to apply traditional tools of statistics and data analysis. Structured data, by contrast, are usually stored in clearly defined fields in databases, and software applications and programs are designed to process such data. In recent years, a number of newly developed tools and software programs have emerged that are capable of analyzing big and unstructured data. One of the earliest applications involving unstructured data is the analysis of text data using text mining and related methods.
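To make the contrast concrete, the following minimal Python sketch places a structured record (values in named fields) next to a piece of unstructured text, and applies one of the simplest text-mining steps: counting term frequencies after removing common stop words. The record, the review text, the stop-word list, and the function name `term_frequencies` are hypothetical illustrations, not taken from the book; real text-mining work typically uses much larger stop-word lists and richer tokenization.

```python
import re
from collections import Counter

# Structured data: every value sits in a clearly defined, named field
# (hypothetical record, like one row of a database table).
structured_record = {"order_id": 1001, "date": "2020-03-15", "category": "Books", "amount": 24.99}

# Unstructured data: free text with no predefined fields (hypothetical review).
review = (
    "Great book! The delivery was late, but the book itself is great. "
    "Would buy again; delivery could be faster."
)

# A tiny, hypothetical stop-word list; real pipelines use far larger ones.
STOP_WORDS = {"the", "was", "but", "is", "itself", "would", "could", "be", "again"}

def term_frequencies(text):
    """Lowercase the text, split it into word tokens, drop stop words, and count terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

freqs = term_frequencies(review)

# Structured data allow direct field access; the text needs processing first.
print(structured_record["amount"])
print(freqs.most_common(3))
```

Even this crude count surfaces the recurring themes of the review ("great", "book", "delivery"), which is the kind of signal text-mining methods try to extract at scale.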
Unstructured data are becoming increasingly prevalent. In 1998, Merrill Lynch said, “unstructured data comprises the vast majority of data found in an organization, some estimates run as high as 80%.”1 Other predictions point the same way. As of 2012, IDC (International Data Corporation)3 and Dell EMC4 projected that data would grow to 40 zettabytes by 2020, a 50-fold growth from the beginning of 2010.4 More recently, IDC and Seagate predicted that the global datasphere will grow to 163 zettabytes by 2025,5 the majority of it unstructured. Computerworld magazine7 states that unstructured information might account for more than 70 to 80 percent of all data in organizations. (https://en.wikipedia.org/wiki/Unstructured_data)8
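The growth figures above imply some simple arithmetic. Assuming, for illustration only, compound growth at a constant annual rate, a 50-fold increase over the ten years from 2010 to 2020 corresponds to an annual rate of 50^(1/10) - 1, and the 40-zettabyte projection implies a 2010 starting volume of 40 / 50 = 0.8 zettabytes. In decimal SI units, 1 zettabyte is 1,000 exabytes (10^21 bytes). The sketch below works through these numbers; the function name `implied_cagr` is hypothetical.

```python
EB_PER_ZB = 1000  # decimal SI units: 1 ZB = 1,000 EB = 10**21 bytes

def implied_cagr(fold_growth, years):
    """Constant annual growth rate that compounds to `fold_growth` over `years` years."""
    return fold_growth ** (1 / years) - 1

# Figures quoted in the text: 40 ZB projected for 2020, a 50-fold increase since 2010.
projected_2020_zb = 40
fold_growth = 50
years = 2020 - 2010

baseline_2010_zb = projected_2020_zb / fold_growth  # implied 2010 volume
rate = implied_cagr(fold_growth, years)

print(f"Implied 2010 volume: {baseline_2010_zb:.1f} ZB ({baseline_2010_zb * EB_PER_ZB:.0f} EB)")
print(f"Implied annual growth rate: {rate:.1%}")
```

Under the constant-rate assumption, the projection amounts to data volumes growing by roughly 48% per year, from about 800 exabytes in 2010.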
Objective and Overview of Chapters
The objective of this book is to provide an introductory overview of data science: what data science is and why it is such an important field. We will also explore and outline the role of data scientists and data science professionals and what they do.
The initial chapters of the book introduce data science and closely related areas. The terms data science, data analytics, business analytics, and business intelligence are often used interchangeably, even by professionals in the field. Therefore, Chapter 1, which provides an overview of data science, is followed by two chapters that explain the relationship between data science, analytics, and business intelligence. Analytics itself is a wide area, and companies use different forms of analytics, including descriptive, predictive, and prescriptive analytics, to drive major business decisions. Chapters 2 and 3 outline the differences and similarities between data science, analytics, and business intelligence. Chapter 2 also outlines the tools of descriptive, predictive, and prescriptive analytics, along with the most recent and emerging technologies of machine learning and artificial intelligence. Since data science is about data, a chapter is devoted to data and data types. Chapter 4 provides definitions of data, different forms of data, and their types, followed by some tools and techniques for working with data. One of the major objectives of data science is to make sense of the massive amounts of data companies collect. One way of making sense of data is to apply data visualization, that is, the graphical techniques used in data analysis. Understanding other tools and techniques for working with data is also important. A chapter is devoted to data visualization.
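Data visualization is one of the simplest ways to make sense of raw numbers. As a minimal sketch, the Python code below draws a text-based histogram (frequency bar chart) of a small, hypothetical data set of items per customer order, so that the shape of the distribution can be seen at a glance. Plotting libraries would normally be used for this; a plain-text chart is used here only so the example runs with the standard library. The data and the function name `text_histogram` are hypothetical.

```python
from collections import Counter

# Hypothetical data: number of items in each of 20 customer orders.
items_per_order = [1, 2, 2, 3, 1, 4, 2, 2, 3, 5, 1, 2, 3, 3, 2, 1, 4, 2, 3, 2]

def text_histogram(values):
    """Return one line per distinct value: the value, a bar of '#' marks, and its count."""
    counts = Counter(values)
    lines = []
    for value in sorted(counts):
        lines.append(f"{value:>2} | {'#' * counts[value]} ({counts[value]})")
    return lines

for line in text_histogram(items_per_order):
    print(line)
```

The bars show immediately that two-item orders are the most common and that orders of four or more items are rare, a pattern that is hard to spot in the raw list.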
Data science is a vast area. Besides visualization techniques and statistical analysis, it uses statistical programming languages such as R and requires a knowledge of databases (SQL or MySQL) or other database management systems.
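As a small illustration of the database skills mentioned above, the sketch below creates a relational table and answers a typical business question with an SQL aggregate query. It uses SQLite through Python's built-in `sqlite3` module as a stand-in for a database management system such as MySQL; the `sales` table, its columns, and its values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a relational DBMS such as MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
    [
        ("East", "Laptop", 1200.0),
        ("East", "Phone", 800.0),
        ("West", "Laptop", 1100.0),
        ("West", "Phone", 650.0),
        ("West", "Phone", 700.0),
    ],
)

# SQL aggregation: total and average sales per region.
rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total, AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()
conn.close()

for region, total, average in rows:
    print(f"{region}: total={total:.2f}, average={average:.2f}")
```

The same `GROUP BY` query would run essentially unchanged on MySQL or other SQL databases; only the connection code differs.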
One major application of data science is in the area of Machine Learning (ML) and Artificial Intelligence (AI). The book provides a detailed overview of data science by defining and outlining its tools and techniques. As mentioned earlier, the book also explains the differences and similarities between data science and data analytics. Other concepts related to data science, including analytics, business analytics, and business intelligence (BI), are discussed in detail. The field of data science is about processing, cleaning, and analyzing data. These concepts and topics are important for understanding the field of data science and are discussed in this book. Data science is an emerging field in data analysis and decision making.
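Processing and cleaning usually come before any analysis. The minimal sketch below shows three common cleaning steps on hypothetical raw records: normalizing inconsistent text, converting numeric strings to numbers, and dropping records with missing values and duplicates that only become visible after normalization. Dropping missing values is just one possible strategy (imputing them is a common alternative); the records and the function name `clean` are hypothetical.

```python
from statistics import mean

# Hypothetical raw records: inconsistent spacing and capitalization,
# a missing temperature, and a duplicate hidden by formatting differences.
raw = [
    {"city": " Boston", "temp_f": "68"},
    {"city": "boston", "temp_f": "68"},
    {"city": "Denver", "temp_f": ""},
    {"city": "Denver ", "temp_f": "59"},
    {"city": "Austin", "temp_f": "85"},
]

def clean(records):
    """Normalize text, convert numbers, and drop missing values and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        city = rec["city"].strip().title()  # " Boston" and "boston" both become "Boston"
        temp = rec["temp_f"].strip()
        if not temp:                         # skip records with a missing temperature
            continue
        key = (city, float(temp))
        if key in seen:                      # skip duplicates revealed by normalization
            continue
        seen.add(key)
        cleaned.append({"city": city, "temp_f": float(temp)})
    return cleaned

data = clean(raw)
print(data)
print("Mean temperature:", mean(r["temp_f"] for r in data))
```

Computing the mean on the raw records would fail (the values are strings, one is empty) or double-count Boston; cleaning first makes the analysis both possible and correct.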
What Is Data Science?
Data science may be thought of as a data-driven decision-making approach that uses several different areas, methods, algorithms, models, and disciplines with the purpose of extracting insights and knowledge from structured and unstructured data. These insights are helpful in applying algorithms and models to make decisions. In predictive analytics, data science models are used to predict future outcomes.
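To show what "using a model to predict future outcomes" can mean in the simplest case, the sketch below fits a straight-line trend to hypothetical monthly sales by ordinary least squares and uses it to forecast the next month. The slope is b = Sxy / Sxx and the intercept is a = ȳ - b·x̄, where Sxy and Sxx are sums of products of deviations from the means. This is one elementary predictive model chosen for illustration, not a method prescribed by the text; the data and the function name `fit_line` are hypothetical. (Python 3.10 and later also offer `statistics.linear_regression` for the same fit.)

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical monthly sales (in thousands) for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0]

a, b = fit_line(months, sales)
forecast_month_7 = a + b * 7   # predict the next, not yet observed, month

print(f"sales = {a:.2f} + {b:.2f} * month")
print(f"Forecast for month 7: {forecast_month_7:.2f}")
```

Here the fitted trend is about 8.60 + 1.54 × month, giving a month-7 forecast of about 19.4 thousand; real predictive analytics would also check how well the model fits and how uncertain the forecast is.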
Data science, as a field, has a much broader scope than analytics, business analytics, or business intelligence. It brings together several disciplines and areas, including statistics, data analysis,9 statistical modeling, data mining,10,11,12,13,14 big data,15 machine learning,16 artificial intelligence (AI), management science, optimization techniques, and related methods, in order to “understand and analyze actual phenomena” from data.17
Data science employs techniques and methods from many other fields, such as mathematics, statistics, computer science, and information science. Besides the methods and theories drawn from these fields, data science also uses data visualization techniques built on specially designed software, such as Tableau and other big data software. Relational databases and SQL, R-statistical software, and the Python programming language are all used in different applications to analyze data, extract information, and draw conclusions. These are the tools of data science. Together, these tools, techniques, and programming languages provide a unifying approach to exploring, analyzing, drawing conclusions from, and making decisions with the massive amounts of data companies collect.
Data science employs the tools of information technology and management science (mathematical modeling and simulation), along with data mining and fact-based data, to measure past performance and to guide an organization in planning and predicting future outcomes for effective decision making.
Turing Award18 winner Jim Gray viewed data science as a “fourth paradigm” of science (empirical, theoretical, computational, and now data-driven) and asserted that “everything about science is changing because of the impact of information technology” and the data deluge. In 2015, the American Statistical Association identified database management, statistics and machine learning, and distributed and parallel systems as the three emerging foundational professional communities.
Another Look at Data Science
Data science can be viewed as a multidisciplinary field focused on finding actionable insights from large sets of raw, structured...
