eBook - ePub

The Essentials of Data Science: Knowledge Discovery Using R

Name: The Essentials of Data Science: Knowledge Discovery Using R
ISBN: 9781351647496

Graham J. Williams

322 pages
English

About this book

The Essentials of Data Science: Knowledge Discovery Using R presents the concepts of data science through a hands-on approach using free and open source software. It systematically drives an accessible journey through data analysis and machine learning to discover and share knowledge from data.

Building on over thirty years' experience in teaching and practising data science, the author encourages a programming-by-example approach to ensure students and practitioners attune to the practice of data science while building their data skills. Proven frameworks are provided as reusable templates. Real-world case studies then provide insight for the data scientist to swiftly adapt the templates to new tasks and datasets.

The book begins by introducing data science. It then reviews R's capabilities for analysing data by writing computer programs. These programs are developed and explained step by step. From analysing and visualising data, the framework moves on to tried and tested machine learning techniques for predictive modelling and knowledge discovery. Literate programming and a consistent style are a focus throughout the book.

Publisher: Chapman and Hall/CRC
Year: 2017
Print ISBN: 9780367488376, 9781498740005
eBook ISBN: 9781351647496
Topic: Computer Science
Subtopic: Data Mining

Table of contents

Preface

List of Figures

List of Tables

1 Data Science

1.1 Exercises

2 Introducing R

2.1 Tooling For R Programming

2.2 Packages and Libraries

2.3 Functions, Commands and Operators

2.4 Pipes

2.5 Getting Help

2.6 Exercises

3 Data Wrangling

3.1 Data Ingestion

3.2 Data Review

3.3 Data Cleaning

3.4 Variable Roles

3.5 Feature Selection

3.6 Missing Data

3.7 Feature Creation

3.8 Preparing the Metadata

3.9 Preparing for Model Building

3.10 Save the Dataset

3.11 A Template for Data Preparation

3.12 Exercises

4 Visualising Data

4.1 Preparing the Dataset

4.2 Scatter Plot

4.3 Bar Chart

4.4 Saving Plots to File

4.5 Adding Spice to the Bar Chart

4.6 Alternative Bar Charts

4.7 Box Plots

4.8 Exercises

5 Case Study: Australian Ports

5.1 Data Ingestion

5.2 Bar Chart: Value/Weight of Sea Trade

5.3 Scatter Plot: Throughput versus Annual Growth

5.4 Combined Plots: Port Calls

5.5 Further Plots

5.6 Exercises

6 Case Study: Web Analytics

6.1 Sourcing Data from CKAN

6.2 Browser Data

6.3 Entry Pages

6.4 Exercises

7 A Pattern for Predictive Modelling

7.1 Loading the Dataset

7.2 Building a Decision Tree Model

7.3 Model Performance

7.4 Evaluating Model Generality

7.6 Comparison of Performance Measures

7.7 Save the Model to File

7.8 A Template for Predictive Modelling

7.9 Exercises

8 Ensemble of Predictive Models

8.1 Loading the Dataset

8.2 Random Forest

8.3 Extreme Gradient Boosting

8.4 Exercises

9 Writing Functions in R

9.1 Model Evaluation

9.2 Creating a Function

9.3 Function for ROC Curves

9.4 Exercises

10 Literate Data Science

10.1 Basic LaTeX Template

10.2 A Template for our Narrative

10.3 Including R Commands

10.4 Inline R Code

10.5 Formatting Tables Using Kable

10.6 Formatting Tables Using XTable

10.7 Including Figures

10.8 Add a Caption and Label

10.9 Knitr Options

10.10 Exercises

11 R with Style

11.1 Why We Should Care

11.2 Naming

11.3 Comments

11.4 Layout

11.5 Functions

11.6 Assignment

11.7 Miscellaneous

11.8 Exercises

Bibliography

Index

Preface

From data we derive information and by combining different bits of information we build knowledge. It is then with wisdom that we deploy knowledge into enterprises, governments, and society. Data is core to every organisation as we continue to digitally capture volumes and a variety of data at an unprecedented velocity. The demand for data science continues to grow substantially, with a shortfall of data scientists worldwide.

Professional data scientists combine a good grounding in computer science and statistics with an ability to explore through the space of data to make sense of the world. Data science relies on their aptitude and art for observation, mathematics, and logical reasoning.

This book introduces the essentials of data analysis and machine learning as the foundations for data science. It uses the free and open source software R (R Core Team, 2017), which is freely available to anyone. All are permitted, and indeed encouraged, to read the source code to learn, understand, verify, and extend it. Because R is open source, we also have the assurance that the software will always be available. R is supported by a worldwide network of some of the world’s leading statisticians and professional data scientists.

Features

A key feature of this book, differentiating it from other textbooks on data science, is the focus on the hands-on end-to-end process. It covers data analysis including loading data into R, wrangling the data to improve its quality and utility, visualising the data to gain understanding and insight, and, importantly, using machine learning to discover knowledge from the data.

This book brings together the essentials of doing data science based on over 30 years of the practice and teaching of data science. It presents a programming-by-example approach that allows students to quickly achieve outcomes whilst building a skill set and knowledge base, without getting sidetracked into the details of programming.

The book systematically develops an end-to-end process flow for ...
