Data Science Programming All-in-One For Dummies
eBook - ePub

John Paul Mueller, Luca Massaron

About This Book

Your logical, linear guide to the fundamentals of data science programming

Data science is exploding (in a good way), with a forecast of 1.7 megabytes of new information created every second for each human being on the planet by 2020 and 11.5 million job openings by 2026. It clearly pays dividends to be in the know. This friendly guide charts a path through the fundamentals of data science and then delves into the actual work: linear regression, logistic regression, machine learning, neural networks, recommender engines, and cross-validation of models.

Data Science Programming All-In-One For Dummies is a compilation of the key data science, machine learning, and deep learning programming languages: Python and R. It helps you decide which programming languages are best for specific data science needs. It also gives you the guidelines to build your own projects to solve problems in real time.

  • Get grounded: the ideal start for new data professionals
  • What lies ahead: learn about specific areas that data is transforming
  • Be meaningful: find out how to tell your data story
  • See clearly: pick up the art of visualization

Whether you're a beginning student or already mid-career, get your copy now and add even more meaning to your life—and everyone else's!

Information

Publisher: For Dummies
Year: 2019
ISBN: 9781119626145
Edition: 1
Subtopic: Data mining

Book 1

Defining Data Science

Contents at a Glance

  1. Chapter 1: Considering the History and Uses of Data Science
    1. Considering the Elements of Data Science
    2. Defining the Role of Data in the World
    3. Creating the Data Science Pipeline
    4. Comparing Different Languages Used for Data Science
    5. Learning to Perform Data Science Tasks Fast
  2. Chapter 2: Placing Data Science within the Realm of AI
    1. Seeing the Data to Data Science Relationship
    2. Defining the Levels of AI
    3. Creating a Pipeline from Data to AI
  3. Chapter 3: Creating a Data Science Lab of Your Own
    1. Considering the Analysis Platform Options
    2. Choosing a Development Language
    3. Obtaining and Using Python
    4. Obtaining and Using R
    5. Presenting Frameworks
    6. Accessing the Downloadable Code
  4. Chapter 4: Considering Additional Packages and Libraries You Might Want
    1. Considering the Uses for Third-Party Code
    2. Obtaining Useful Python Packages
    3. Locating Useful R Libraries
  5. Chapter 5: Leveraging a Deep Learning Framework
    1. Understanding Deep Learning Framework Usage
    2. Working with Low-End Frameworks
    3. Understanding TensorFlow
Chapter 1

Considering the History and Uses of Data Science

IN THIS CHAPTER
  • Understanding data science history and uses
  • Considering the flow of data in data science
  • Working with various languages in data science
  • Performing data science tasks quickly

The burgeoning uses for data in the world today, along with the explosion of data sources, create a demand for people who have special skills to obtain, manage, and analyze information for the benefit of everyone. The data scientist develops and hones these special skills to perform such tasks on multiple levels, as described in the first two sections of this chapter.
Data needs to be funneled into acceptable forms that allow data scientists to perform their tasks. Even though the precise data flow varies, you can generalize it to a degree. The third section of the chapter gives you an overview of how data flow occurs.
As with anyone engaged in computer work today, a data scientist employs various programming languages to express the manipulation of data in a repeatable manner. The languages that a data scientist uses, however, focus on outputs expected from given inputs, rather than on low-level control or a precise procedure, as a computer scientist would use. Because a data scientist may lack a formal programming education, the languages tend to focus on declarative strategies, with the data scientist expressing a desired outcome rather than devising a specific procedure. The fourth section of the chapter discusses various languages used by data scientists, with an emphasis on Python and R.
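The difference between a procedural and a declarative approach is easier to see in code. The following minimal sketch is not from the book's downloadable source; the readings list is made-up sample data used only for illustration. It computes the same average twice: once by spelling out every step and once by stating the desired result and letting Python's standard statistics module do the work.

```python
from statistics import mean

# Hypothetical sample data (an assumption for this sketch):
# four temperature readings in degrees Celsius.
readings = [21.5, 23.0, 19.5, 22.0]

# Procedural style: describe each step of the computation yourself.
total = 0.0
for value in readings:
    total += value
procedural_mean = total / len(readings)

# Declarative style: state the result you want and let the library
# decide how to compute it.
declarative_mean = mean(readings)

print(procedural_mean, declarative_mean)
```

Both approaches give the same answer. The declarative version is shorter and leaves less room for mistakes, which is part of the appeal of languages such as Python and R for data science work.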
The final section of the chapter provides a brief overview of getting tasks done quickly. Optimization without loss of precision is an incredibly difficult task, and you see it covered a number of times in this book, but this introduction is enough to get you started. The overall goal of this first chapter is to describe data science and explain how a data scientist uses algorithms, statistics, data extraction, data manipulation, and a slew of other technologies to employ it as part of an analysis.
Remember
You don’t have to type the source code for this chapter manually (or, actually, at all, given that you use it only to obtain an understanding of the data flow process). In fact, using the downloadable source is a lot easier. The source code for this chapter appears in the DSPD_0101_Quick_Overview.ipynb source code file for Python. See the Introduction for details on how to find these source files.

Considering the Elements of Data Science

At one point, the world viewed anyone working with statistics as a sort of accountant or perhaps a mad scientist. Many people consider statistics and the analysis of data boring. However, data science is one of those occupations in which the more you learn, the more you want to learn. Answering one question often spawns more questions that are even more interesting than the one you just answered. What makes data science so sexy, though, is that you see it everywhere, used in an almost infinite number of ways. The following sections give you more details on why data science is such an amazing field of study.

Considering the emergence of data science

Data science is a relatively new term. William S. Cleveland coined the term in 2001 as part of a paper entitled “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” It wasn't until a year later that the International Council for Science actually recognized data science and created a committee for it. Columbia University got into the act in 2003 by beginning publication of the Journal of Data Science.
Remember
However, the mathematical basis behind data science is centuries old because data science is essentially a method of viewing and analyzing statistics and probability. The first essential use of statistics as a term comes in 1749, but statistics are certainly much older than that. People have used statistics to recognize patterns for thousands of years. For example, the historian Thucydides (in his History of the Peloponnesian War) describes how the Athenians calculated the height of the wall of Plataea in the fifth century BC by counting bricks in an unplastered section of the wall. Because the count needed to be accurate, the Athenians averaged the counts made by several soldiers.
The process of quantifying and understanding statistics is relatively new, but the science itself is quite old. An early attempt to begin documenting the importance of statistics appears in the ninth century, when Al-Kindi wrote Manuscript on Deciphering Cryptographic Messages. In this paper, Al-Kindi describes how to use a combination of statistics and frequency analysis to decipher encrypted messages. Even in the beginning, statistics saw use in the practical application of science for tasks that seemed virtually impossible to complete. Data science continues this process, and to some people it might actually seem like magic.
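To give a feel for how frequency analysis works, here is a minimal Python sketch. It is not Al-Kindi's actual procedure and does not come from the book's source files. It makes two simplifying assumptions: the message is encrypted with a simple shift (Caesar) cipher, and the most frequent letter in the ciphertext stands for "e", the most common letter in English. The function names and the sample message are hypothetical.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase

def shift_text(text, shift):
    """Shift each lowercase letter by `shift` places; leave other characters alone."""
    result = []
    for ch in text:
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)

def guess_shift(ciphertext, most_common_plain="e"):
    """Guess the shift by assuming the most frequent cipher letter stands for 'e'."""
    counts = Counter(ch for ch in ciphertext if ch in ALPHABET)
    top_cipher_letter = counts.most_common(1)[0][0]
    return (ALPHABET.index(top_cipher_letter) - ALPHABET.index(most_common_plain)) % 26

# Hypothetical message, chosen so that "e" clearly dominates the letter counts.
plaintext = "meet me here where the eleven seekers sleep"
secret = shift_text(plaintext, 3)

shift = guess_shift(secret)
recovered = shift_text(secret, -shift)
print(secret)
print(recovered)
```

The guess works here only because the sample text is rich in the letter "e". On short or unusual messages, the most frequent letter may be something else, so a real analysis would compare the whole frequency distribution rather than a single letter.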

Outlining the core competencies of a data scientist

As is true of anyone practicing a complex trade today, the data scientist requires a broad range of skills to perform the required tasks. In fact, so many different skills are required that data scientists often work in teams. Someone who is good at gathering data might team up with an analyst and someone gifted in presenting information. Finding a single person who possesses all the required skills would be hard. With this in mind, the following list describes areas in which a data scientist can excel (with more competencies being better):
  • Data capture: It doesn’t matter what sort of math skills you have if you can’t obtain data to analyze in the first place. The act of capturing data begins by managing a data source using database-management skills. However, raw data isn’t particularly useful in many situations; you must also understand the data domain so that you can look at the data and begin formulating the sorts of questions to ask. Finally, you must have data-modeling skills so that you understand how the data is connected and whether the data is structured.
  • Analysis: After you have data to work with and understand the complexities of that data, you can begin to perform an analysis on it. You perform some analysis using basic statistical tools, much like those that just about everyone learns in college. However, the use of specialized math tricks and algorithms can make patterns in the data more obvious or help you draw conclusions that you can’t draw by reviewing the data alone.
  • Presentation: Most people don’t understand numbers well. They can’t see the patterns that the data scientist sees. Providing a graphical presentation of these patterns is important to help others visualize what the numbers mean and how to apply them in a meaningful way. More important, the presentation must tell a specific story so that the impact of the data isn’t lost.
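
The three competencies in the preceding list can be sketched as a tiny end-to-end example. This is only an illustration under stated assumptions, not code from the book: the sales figures are invented, an in-memory CSV string stands in for a real data source such as a database, and a text bar chart stands in for a proper graphical presentation. The bar_chart function is a hypothetical helper.

```python
import csv
import io
from statistics import mean

# Data capture: read records from a data source. An in-memory CSV with
# invented figures stands in for a real database (an assumption).
RAW = """region,sales
north,120
south,80
north,100
south,60
"""
rows = list(csv.DictReader(io.StringIO(RAW)))

# Analysis: group the sales by region and compute the average for each.
by_region = {}
for row in rows:
    by_region.setdefault(row["region"], []).append(int(row["sales"]))
averages = {region: mean(values) for region, values in by_region.items()}

# Presentation: turn the numbers into a picture (here, a text bar chart).
def bar_chart(data, scale=10):
    """Return one line per label, with one '#' for every `scale` units."""
    lines = []
    for label, value in sorted(data.items()):
        lines.append(f"{label:<6}{'#' * int(value // scale)} {value}")
    return "\n".join(lines)

print(bar_chart(averages))
```

Even in a toy example like this one, each stage depends on the one before it: the chart is only as good as the averages, and the averages are only as good as the captured data.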

Linking data science, big data, and AI

Interestingly enough, the act of moving data around so that someone can perform analysis ...
