
Python Data Mining Quick Start Guide
A beginner's guide to extracting valuable insights from your data
- 188 pages
- English
About this book
Explore the different data mining techniques using the libraries and packages offered by Python
Key Features
- Grasp the basics of data loading, cleaning, analysis, and visualization
- Use the popular Python libraries such as NumPy, pandas, matplotlib, and scikit-learn for data mining
- Your one-stop guide to building efficient data mining pipelines without going into too much theory
Book Description
Data mining is a necessary and predictable response to the dawn of the information age. It is typically defined as the pattern and/or trend discovery phase in the data mining pipeline, and Python is a popular choice for these tasks because it offers a wide variety of data mining tools.
This book will serve as a quick introduction to the concept of data mining and putting it to practical use with the help of popular Python packages and libraries. You will get a hands-on demonstration of working with different real-world datasets and extracting useful insights from them using popular Python libraries such as NumPy, pandas, scikit-learn, and matplotlib. You will then learn the different stages of data mining such as data loading, cleaning, analysis, and visualization. You will also get a full conceptual description of popular data transformation, clustering, and classification techniques.
By the end of this book, you will be able to build an efficient data mining pipeline using Python without any hassle.
What you will learn
- Explore the methods for summarizing datasets and visualizing/plotting data
- Collect and format data for analytical work
- Assign data points into groups and visualize clustering patterns
- Learn how to predict continuous and categorical outputs for data
- Clean, filter noise from, and reduce the dimensions of data
- Serialize a data processing model using scikit-learn's pipeline feature
- Deploy the data processing model using Python's pickle module
Who this book is for
Python developers interested in getting started with data mining will love this book. Budding data scientists and data analysts looking to quickly get to grips with practical data mining with Python will also find this book to be useful. Knowledge of Python programming is all you need to get started.
Prediction with Regression and Classification
- Mathematical machinery, including loss functions and gradient descent
- Linear regression and penalties
- Logistic regression
- Tree-based classification, including random forests
- Support vector machines
- Tuning methodologies including cross-validation and hyperparameter selection
Scikit-learn Estimator API
- Import the module
- Instantiate the estimator object (regression or classification model in the following diagram)
- Fit the model to map the input training data (X_train in the following diagram) to the ground truth y_train labels
- Predict y_pred on the new test data (X_test in the following diagram)
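
The four steps above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the book's own example: the synthetic dataset from make_classification, the choice of LogisticRegression as the estimator, and the 75/25 split are all assumptions made here so the snippet runs on its own. Only the names X_train, y_train, X_test, and y_pred come from the text.

```python
# Step 1: import the module (plus helpers for synthetic data and splitting).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: a small synthetic binary classification set stands in for real data.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 2: instantiate the estimator object (a classifier in this sketch).
model = LogisticRegression()

# Step 3: fit the model to map X_train to the ground truth y_train labels.
model.fit(X_train, y_train)

# Step 4: predict y_pred on the new test data X_test.
y_pred = model.predict(X_test)

# One predicted label per test row.
print(y_pred.shape)
```

Other scikit-learn estimators follow the same instantiate, fit, predict pattern, so a regression model could be swapped in for the classifier without changing the shape of the code.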

Introducing prediction concepts
- (Assumption) There is a relationship between X and y, namely that X are independent variables and y is dependent on X
- (Assumption) Future data will have the same distribution as the training set
- What behavior is important to our problem statement
- A strategy for optimizing that behavior
Prediction nomenclature
Table of contents
- Title Page
- Copyright and Credits
- Dedication
- About Packt
- Contributors
- Preface
- Data Mining and Getting Started with Python Tools
- Basic Terminology and Our End-to-End Example
- Collecting, Exploring, and Visualizing Data
- Cleaning and Readying Data for Analysis
- Grouping and Clustering Data
- Prediction with Regression and Classification
- Advanced Topics - Building a Data Processing Pipeline and Deploying It
- Other Books You May Enjoy