eBook - ePub

Just Enough R!

Name: Just Enough R!
ISBN: 9781000073560

An Interactive Approach to Machine Learning and Analytics

Richard J. Roiger

346 pages
English

About this book

Just Enough R! An Interactive Approach to Machine Learning and Analytics presents just enough of the R language, machine learning algorithms, statistical methodology, and analytics for the reader to learn how to find interesting structure in data. The approach might be called "seeing then doing" as it first gives step-by-step explanations using simple, understandable examples of how the various machine learning algorithms work independent of any programming language. This is followed by detailed scripts written in R that apply the algorithms to solve nontrivial problems with real data. The script code is provided, allowing the reader to execute the scripts as they study the explanations given in the text.

Features

Gets you quickly using R as a problem-solving tool

Uses RStudio's integrated development environment

Shows how to interface R with SQLite

Includes examples using R's Rattle graphical user interface

Requires no prior knowledge of R, machine learning, or computer programming

Offers over 50 scripts written in R, including several problem-solving templates that, with slight modification, can be used again and again

Covers the most popular machine learning techniques, including ensemble-based methods and logistic regression

Includes end-of-chapter exercises, many of which can be solved by modifying existing scripts

Includes datasets from several areas, including business, health and medicine, and science

About the Author

Richard J. Roiger is a professor emeritus at Minnesota State University, Mankato, where he taught and performed research in the Computer and Information Science Department for over 30 years.

Publisher: Chapman and Hall/CRC
Year: 2020
Print ISBN: 9780367439149
eBook ISBN: 9781000073560
Topic: Computer Science
Subtopic: Statistics for Business & Economics
Index: Computer Science

CHAPTER 1 Introduction to Machine Learning

In This Chapter

Definitions and Terminology
Machine Learning Strategies
Evaluation Techniques
Ethical Issues

The R language continues to maintain its status as one of the top-rated problem-solving tools within the areas of machine learning, data science, data analytics, data mining, and statistical analysis. It’s easy to see why: R is free, contains thousands of packages, is supported by a growing community of users, and is easy to use when interfaced with RStudio’s integrated development environment!

R’s popularity has resulted in the development of thousands of tutorials on machine learning. The information is all there! Unfortunately, it’s easy to get lost in a maze of too much information. Valuable time is spent trying to find exactly what is needed to solve problems. The end result is frustration and difficulty understanding what’s important.

We believe our approach of presenting and clearly explaining script-based problem-solving techniques provides the tools you need for machine learning with R. The book’s title reflects its purpose. Just Enough R! gives you just enough of the R language and machine learning methods to minimize stumbling blocks and cut through the maze. Our goal is to give you what you need to become productive with R as quickly as possible.

In this chapter, we offer a brief introduction to machine learning and conclude with a short summary, key term definitions, and a set of exercises. In Chapter 2, we move right into the nuts and bolts of the R language and the problem-solving techniques it offers. Let’s get started!

1.1 Machine Learning, Statistical Analysis, and Data Science

It’s almost impossible to surf the Web, open a newspaper, or turn on the TV without being exposed to terms such as machine learning, statistical analysis, data science, data analytics, and data mining. Most people have some idea about what these terms mean, but if you ask for a precise definition of any of them, you get a variety of answers. Here are a few distinctions:

Building models to find structure in data has its roots in the fields of mathematics and statistics. Statistical methods are differentiated from other techniques in that they make certain assumptions about the nature of the data. Technically, if these assumptions are violated, the models built with these techniques may be inaccurate.
Machine learning can be differentiated from statistical modeling in that assumptions about data distributions and variable independence are not a concern. Machine learning is generally considered an area of specialization within the broader field of artificial intelligence. However, most textbooks make little or no distinction between machine learning and statistical methods.
Data science or data analytics is often defined as the process of extracting meaningful knowledge from data. Its methods come from several disciplines including computer science, mathematics, statistics, data warehousing, and distributed processing to name a few. Although machine learning is often seen in data science applications, it is not required.
Data mining first became popular in the academic community around 1995 and can be defined as the process of using one or several machine learning algorithms to find structure in data. The structure may take many forms, including a set of rules, a graph or network, a tree, one or several equations, and more. The structure can be part of a complex visual dashboard or as simple as a list of political candidates, each with an associated number representing voter sentiment based on Twitter feeds.
The phrase knowledge discovery in databases (KDD) was coined in 1989 to emphasize that knowledge can be derived from data-driven discovery; it is frequently used interchangeably with data mining. In addition to performing data mining, a typical KDD process model includes a methodology for extracting and preparing data as well as for making decisions about actions to be taken once data mining has taken place. As much of today’s data is not found in a traditional data warehouse, KDD is most often associated with knowledge discovery in data.

Although these general distinctions might be made, the most important point is that all of these terms define techniques designed to solve problems by finding interesting structure in data. We prefer to use the term machine learning as our focus is both on how to apply the algorithms and on understanding how the algorithms work. However, we often interchange the terms machine learning and data mining.
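
The first distinction above, that statistical methods make assumptions about the data, can be illustrated with a minimal sketch in base R. This example is not taken from the book's scripts: the choice of R's built-in mtcars dataset, the variables mpg and wt, and the use of a Shapiro-Wilk test on the residuals are all assumptions made here for illustration. It fits a linear regression, whose structure is a single equation, and then checks one of the assumptions the technique relies on.

```r
# Illustrative sketch only; dataset and variable choices are assumptions,
# not the book's own example.
data(mtcars)

# Fit a simple linear regression: the "structure" found is one equation,
# mpg = intercept + slope * wt.
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Linear regression assumes normally distributed residuals.
# The Shapiro-Wilk test gives a rough check of that assumption:
# a very small p-value suggests the assumption may be violated.
sw <- shapiro.test(residuals(fit))
print(sw$p.value)
```

A check like this is one way to judge whether a statistical model's output can be trusted for a given dataset; it says nothing about whether a different technique would perform better.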

1.2 Machine Learning: A First Example

Supervised learning is probably the best and most widely used technique for machine learning. The purpose of supervised learning is twofold. First, we use supervised learning to build classification models from sets of data containing examples and nonexamples of the concepts to be learned. Each example or nonexample is known as an instance of data. Second, once a classification model has been constructed, the model is used to determine the classificat...
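
The two steps of supervised learning described above can be sketched in a few lines of base R. This is a hedged illustration rather than the book's first example: the built-in iris dataset, the single predictor Petal.Width, the every-fifth-row holdout, and the 0.5 probability cutoff are all assumptions chosen here for brevity. Logistic regression stands in for whichever classification technique is preferred.

```r
# Illustrative sketch only; data, split, and cutoff are assumptions.
data(iris)

# Keep two species so the concept to learn is "virginica":
# virginica rows are examples, versicolor rows are nonexamples.
two <- droplevels(subset(iris, Species != "setosa"))
two$is_virginica <- as.integer(two$Species == "virginica")

# Hold out every fifth instance; the rest are used to build the model.
test_idx <- seq(5, nrow(two), by = 5)
train <- two[-test_idx, ]
test  <- two[test_idx, ]

# Step 1: build a classification model from the labeled instances.
model <- glm(is_virginica ~ Petal.Width, data = train, family = binomial)

# Step 2: use the model to classify instances it has not seen.
prob <- predict(model, newdata = test, type = "response")
predicted <- ifelse(prob > 0.5, "virginica", "versicolor")

# Compare predicted and actual classes for the held-out instances.
confusion <- table(actual = as.character(test$Species), predicted = predicted)
print(confusion)
```

The table printed at the end counts how many held-out instances of each actual class received each predicted class, which is the raw material for the evaluation techniques listed in the chapter outline.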

Cover
Half Title
Title Page
Copyright Page
Table of Contents
Preface
Acknowledgment
Author
Chapter 1 ◾ Introduction to Machine Learning
Chapter 2 ◾ Introduction to R
Chapter 3 ◾ Data Structures and Manipulation
Chapter 4 ◾ Preparing the Data
Chapter 5 ◾ Supervised Statistical Techniques
Chapter 6 ◾ Tree-Based Methods
Chapter 7 ◾ Rule-Based Techniques
Chapter 8 ◾ Neural Networks
Chapter 9 ◾ Formal Evaluation Techniques
Chapter 10 ◾ Support Vector Machines
Chapter 11 ◾ Unsupervised Clustering Techniques
Chapter 12 ◾ A Case Study in Predicting Treatment Outcome
Bibliography
Appendix A: Supplementary Materials and More Datasets
Appendix B: Statistics for Performance Evaluation
Subject Index
Index of R Functions
Script Index
