The Essentials of Data Science: Knowledge Discovery Using R
eBook - ePub

  1. 322 pages
  2. English

About this book

The Essentials of Data Science: Knowledge Discovery Using R presents the concepts of data science through a hands-on approach using free and open source software. It systematically drives an accessible journey through data analysis and machine learning to discover and share knowledge from data.

Building on over thirty years' experience in teaching and practising data science, the author encourages a programming-by-example approach to ensure students and practitioners attune to the practice of data science while building their data skills. Proven frameworks are provided as reusable templates. Real-world case studies then provide insight for the data scientist to swiftly adapt the templates to new tasks and datasets.

The book begins by introducing data science. It then reviews R's capabilities for analysing data by writing computer programs. These programs are developed and explained step by step. From analysing and visualising data, the framework moves on to tried and tested machine learning techniques for predictive modelling and knowledge discovery. Literate programming and a consistent style are a focus throughout the book.

The Essentials of Data Science: Knowledge Discovery Using R is by Graham J. Williams and is listed under Economics & Data Mining.

Contents


Preface
List of Figures
List of Tables
1 Data Science
1.1 Exercises
2 Introducing R
2.1 Tooling For R Programming
2.2 Packages and Libraries
2.3 Functions, Commands and Operators
2.4 Pipes
2.5 Getting Help
2.6 Exercises
3 Data Wrangling
3.1 Data Ingestion
3.2 Data Review
3.3 Data Cleaning
3.4 Variable Roles
3.5 Feature Selection
3.6 Missing Data
3.7 Feature Creation
3.8 Preparing the Metadata
3.9 Preparing for Model Building
3.10 Save the Dataset
3.11 A Template for Data Preparation
3.12 Exercises
4 Visualising Data
4.1 Preparing the Dataset
4.2 Scatter Plot
4.3 Bar Chart
4.4 Saving Plots to File
4.5 Adding Spice to the Bar Chart
4.6 Alternative Bar Charts
4.7 Box Plots
4.8 Exercises
5 Case Study: Australian Ports
5.1 Data Ingestion
5.2 Bar Chart: Value/Weight of Sea Trade
5.3 Scatter Plot: Throughput versus Annual Growth
5.4 Combined Plots: Port Calls
5.5 Further Plots
5.6 Exercises
6 Case Study: Web Analytics
6.1 Sourcing Data from CKAN
6.2 Browser Data
6.3 Entry Pages
6.4 Exercises
7 A Pattern for Predictive Modelling
7.1 Loading the Dataset
7.2 Building a Decision Tree Model
7.3 Model Performance
7.4 Evaluating Model Generality
7.6 Comparison of Performance Measures
7.7 Save the Model to File
7.8 A Template for Predictive Modelling
7.9 Exercises
8 Ensemble of Predictive Models
8.1 Loading the Dataset
8.2 Random Forest
8.3 Extreme Gradient Boosting
8.4 Exercises
9 Writing Functions in R
9.1 Model Evaluation
9.2 Creating a Function
9.3 Function for ROC Curves
9.4 Exercises
10 Literate Data Science
10.1 Basic LaTeX Template
10.2 A Template for our Narrative
10.3 Including R Commands
10.4 Inline R Code
10.5 Formatting Tables Using Kable
10.6 Formatting Tables Using XTable
10.7 Including Figures
10.8 Add a Caption and Label
10.9 Knitr Options
10.10 Exercises
11 R with Style
11.1 Why We Should Care
11.2 Naming
11.3 Comments
11.4 Layout
11.5 Functions
11.6 Assignment
11.7 Miscellaneous
11.8 Exercises
Bibliography
Index

Preface


From data we derive information and by combining different bits of information we build knowledge. It is then with wisdom that we deploy knowledge into enterprises, governments, and society. Data is core to every organisation as we continue to digitally capture volumes and a variety of data at an unprecedented velocity. The demand for data science continues to grow substantially, with a shortfall of data scientists worldwide.
Professional data scientists combine a good grounding in computer science and statistics with an ability to explore through the space of data to make sense of the world. Data science relies on their aptitude and art for observation, mathematics, and logical reasoning.
This book introduces the essentials of data analysis and machine learning as the foundations for data science. It uses the free and open source software R (R Core Team, 2017), which is freely available to anyone. All are permitted, and indeed encouraged, to read the source code to learn, understand, verify, and extend it. Because R is open source, we also have the assurance that the software will always be available. R is supported by a worldwide network of some of the world’s leading statisticians and professional data scientists.

Features

A key feature of this book, differentiating it from other textbooks on data science, is the focus on the hands-on end-to-end process. It covers data analysis including loading data into R, wrangling the data to improve its quality and utility, visualising the data to gain understanding and insight, and, importantly, using machine learning to discover knowledge from the data.
This book brings together the essentials of doing data science based on over 30 years of the practice and teaching of data science. It presents a programming-by-example approach that allows students to quickly achieve outcomes whilst building a skill set and knowledge base, without getting sidetracked into the details of programming.
The book systematically develops an end-to-end process flow for ...
