Feature Engineering Made Easy
eBook - ePub

  1. 289 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

A practical guide to boosting the predictive power of machine learning algorithms

Key Features

  • Design, discover, and create dynamic, efficient features for your machine learning application
  • Understand your data in depth and derive useful insights with the help of this guide
  • Grasp powerful feature-engineering techniques and build machine learning systems

Book Description

Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective.

You will start with understanding your data. Often the success of your ML models depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features. You will learn to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning on your machines, automatically learning amazing features for your data.

By the end of the book, you will become proficient in Feature Selection, Feature Learning, and Feature Optimization.

What you will learn

  • Identify and leverage different feature types
  • Clean features in data to improve predictive power
  • Understand why and how to perform feature selection, and model error analysis
  • Leverage domain knowledge to construct new features
  • Deliver features based on mathematical insights
  • Use machine-learning algorithms to construct features
  • Master feature engineering and optimization
  • Harness feature engineering for real world applications through a structured case study

Who this book is for

If you are a data science professional or a machine learning engineer looking to strengthen your predictive analytics model, then this book is a perfect guide for you. A basic understanding of machine learning concepts and Python scripting is enough to get started with this book.

Information

Year: 2018
Print ISBN: 9781787287600
Edition: 1
eBook ISBN: 9781787286474

Feature Construction

In the previous chapter, we worked with the Pima Indian Diabetes Prediction dataset to get a better understanding of which given features in our dataset are most valuable. Working with the features that were available to us, we identified missing values within our columns and employed techniques of dropping missing values, imputing, and normalizing/standardizing our data to improve the accuracy of our machine learning model.
It is important to note that, up to this point, we have only worked with features that are quantitative. We will now shift into dealing with categorical data, in addition to the quantitative data that has missing values. Our main focus will be to work with our given features to construct entirely new features for our models to learn from.
There are various methods we can use to construct our features. The most basic uses the pandas library in Python to scale an existing feature by a multiple. We will also dive into some more mathematically intensive methods, employing various packages available through the scikit-learn library, and we will create our own custom classes. We will go over these classes in detail as we get into the code.
We will be covering the following topics in our discussions:
  • Examining our dataset
  • Imputing categorical features
  • Encoding categorical variables
  • Extending numerical features
  • Text-specific feature construction
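
As a minimal sketch of the most basic construction mentioned above, scaling an existing feature by a multiple, the snippet below derives a new column from an existing one in pandas. The DataFrame `df` and the column names `height_m` and `height_cm` are hypothetical and chosen only for illustration; they do not come from the book's dataset.

```python
import pandas as pd

# Hypothetical data: heights in meters (not from the book's dataset)
df = pd.DataFrame({'height_m': [1.5, 1.75, 2.0]})

# Construct a new feature by scaling the existing one by a constant multiple
df['height_cm'] = df['height_m'] * 100

print(df)
```

The original column is left in place, so both the source feature and the constructed feature remain available for later selection.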

Examining our dataset

For demonstrative purposes, in this chapter, we will utilize a dataset that we have created, so that we can showcase a variety of data levels and types. Let's set up our DataFrame and dive into our data.
We will use pandas to create our DataFrame, the primary data structure in pandas. The advantage of a pandas DataFrame is that it exposes many attributes and methods for working with our data. These let us manipulate the data logically and develop a thorough understanding of what we are working with and how best to structure our machine learning models:
  1. First, let's import pandas:
# import pandas
import pandas as pd
  2. Now, we can set up our DataFrame X. To do this, we will use the DataFrame constructor in pandas, which creates a tabular data structure (a table with rows and columns). The constructor accepts several types of data (NumPy arrays or dictionaries, to name a couple). Here, we will pass in a dictionary with keys as column headers and values as lists, with each list representing a column:
X = pd.DataFrame({'city':['tokyo', None, 'london', 'seattle', 'san francisco', 'tokyo'], 'boolean':['yes', 'no', None, 'no', 'no', 'yes'], 'ordinal_column':['somewhat like'...
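
The listing above is truncated in this excerpt, so it cannot be run as printed. The following is a minimal sketch that uses only the two columns shown in full (`city` and `boolean`); the remaining columns are deliberately left out rather than guessed. The `missing_per_column` variable is an illustrative name, not one from the book.

```python
import pandas as pd

# Only the two columns that appear in full in the excerpt;
# the truncated columns are omitted rather than reconstructed.
X = pd.DataFrame({
    'city': ['tokyo', None, 'london', 'seattle', 'san francisco', 'tokyo'],
    'boolean': ['yes', 'no', None, 'no', 'no', 'yes'],
})

# Count the missing values in each column (illustrative variable name)
missing_per_column = X.isnull().sum()

print(X)
print(missing_per_column)
```

Each dictionary key becomes a column header and each list becomes that column's values, so all of the lists must have the same length.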

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. Packt Upsell
  4. Contributors
  5. Preface
  6. Introduction to Feature Engineering
  7. Feature Understanding – What's in My Dataset?
  8. Feature Improvement - Cleaning Datasets
  9. Feature Construction
  10. Feature Selection
  11. Feature Transformations
  12. Feature Learning
  13. Case Studies
  14. Other Books You May Enjoy

Feature Engineering Made Easy by Sinan Ozdemir, Divya Susarla, and Michael Smith is available in PDF and/or ePUB format and is listed under Computer Science & Artificial Intelligence (AI) & Semantics.