Just Enough R!
An Interactive Approach to Machine Learning and Analytics
Richard J. Roiger
- 346 pages
- English
About This Book
Just Enough R! An Interactive Approach to Machine Learning and Analytics presents just enough of the R language, machine learning algorithms, statistical methodology, and analytics for the reader to learn how to find interesting structure in data. The approach might be called "seeing then doing" as it first gives step-by-step explanations using simple, understandable examples of how the various machine learning algorithms work independent of any programming language. This is followed by detailed scripts written in R that apply the algorithms to solve nontrivial problems with real data. The script code is provided, allowing the reader to execute the scripts as they study the explanations given in the text.
Features
- Gets you quickly using R as a problem-solving tool
- Uses RStudio's integrated development environment
- Shows how to interface R with SQLite
- Includes examples using R's Rattle graphical user interface
- Requires no prior knowledge of R, machine learning, or computer programming
- Offers over 50 scripts written in R, including several problem-solving templates that, with slight modification, can be used again and again
- Covers the most popular machine learning techniques, including ensemble-based methods and logistic regression
- Includes end-of-chapter exercises, many of which can be solved by modifying existing scripts
- Includes datasets from several areas, including business, health and medicine, and science
About the Author
Richard J. Roiger is a professor emeritus at Minnesota State University, Mankato, where he taught and performed research in the Computer and Information Science Department for over 30 years.
CHAPTER 1
Introduction to Machine Learning
- Definitions and Terminology
- Machine Learning Strategies
- Evaluation Techniques
- Ethical Issues
1.1 Machine Learning, Statistical Analysis, and Data Science
- Building models to find structure in data has its roots in the fields of mathematics and statistics. Statistical methods are differentiated from other techniques in that they make certain assumptions about the nature of the data. Technically, if these assumptions are violated, the models built with these techniques may be inaccurate.
- Machine learning can be differentiated from statistical modeling in that assumptions about data distributions and variable independence are not a concern. Machine learning is generally considered an area of specialization within the broader field of artificial intelligence. However, most textbooks make little or no distinction between machine learning and statistical methods.
- Data science, or data analytics, is often defined as the process of extracting meaningful knowledge from data. Its methods come from several disciplines, including computer science, mathematics, statistics, data warehousing, and distributed processing, to name a few. Although machine learning is often seen in data science applications, it is not required.
- Data mining first became popular in the academic community around 1995 and can be defined as the process of using one or more machine learning algorithms to find structure in data. The structure may take many forms, including a set of rules, a graph or network, a tree, one or several equations, and more. It can be part of a complex visual dashboard or as simple as a list of political candidates, each paired with a number representing voter sentiment based on Twitter feeds.
- The phrase knowledge discovery in databases (KDD) was coined in 1989 to emphasize that knowledge can be derived from data-driven discovery. The phrase is frequently used interchangeably with data mining. In addition to performing data mining, a typical KDD process model includes a methodology for extracting and preparing data as well as for making decisions about actions to be taken once data mining has taken place. As much of today's data is not found in a traditional data warehouse, KDD is now most often associated with knowledge discovery in data.
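The earlier point that a statistical model may be inaccurate when its assumptions are violated can be illustrated with a small sketch. The example below is not taken from the book, whose scripts are written in R; it uses plain Python, made-up data, and hypothetical helper names (`fit_line`, `r_squared`). It fits a straight line by ordinary least squares to two tiny datasets: one that satisfies the linearity assumption and one that does not.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(xs, ys):
    """Fraction of the variance in ys explained by the fitted line."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Illustrative data only (an assumption, not a dataset from the book).
xs = [-3, -2, -1, 0, 1, 2, 3]
linear_ys = [2 * x + 1 for x in xs]   # linearity assumption holds
curved_ys = [x * x for x in xs]       # linearity assumption is violated

r2_linear = r_squared(xs, linear_ys)
r2_curved = r_squared(xs, curved_ys)

print(f"R^2 when the relationship is linear:    {r2_linear:.2f}")
print(f"R^2 when the relationship is quadratic: {r2_curved:.2f}")
```

In the second dataset, y is completely determined by x, yet the fitted line explains none of the variation because the relationship is not linear. This is the sense in which a model built on a violated assumption can mislead.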