
eBook - ePub
Python Data Cleaning and Preparation Best Practices
A practical guide to organizing and handling data from various sources and formats using Python
- English
- ePUB (mobile friendly)
- Available on iOS & Android
About this book
Take your data preparation skills to the next level by converting any type of data asset into a structured, formatted, and readily usable dataset
Key Features
- Maximize the value of your data through effective data cleaning methods
- Enhance your data skills using strategies for handling structured and unstructured data
- Elevate the quality of your data products by testing and validating your data pipelines
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description
Professionals face several challenges in effectively leveraging data in today's data-driven world. One of the main challenges is the low quality of data products, often caused by inaccurate, incomplete, or inconsistent data. Another significant challenge is the lack of skills among data professionals to analyze unstructured data, leading to valuable insights being missed that are difficult or impossible to obtain from structured data alone. To help you tackle these challenges, this book will take you on a journey through the upstream data pipeline, which includes the ingestion of data from various sources, the validation and profiling of data for high-quality end tables, and writing data to different sinks. You'll focus on structured data by performing essential tasks, such as cleaning and encoding datasets and handling missing values and outliers, before learning how to manipulate unstructured data with simple techniques. You'll also be introduced to a variety of natural language processing techniques, from tokenization to vector models, as well as techniques to structure images, videos, and audio. By the end of this book, you'll be proficient in data cleaning and preparation techniques for both structured and unstructured data.
What you will learn
- Ingest data from different sources and write it to the required sinks
- Profile and validate data pipelines for better quality control
- Get up to speed with grouping, merging, and joining structured data
- Handle missing values and outliers in structured datasets
- Implement techniques to manipulate and transform time series data
- Apply structure to text, image, voice, and other unstructured data
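As a rough illustration of one item in the list above (handling missing values and outliers), here is a minimal sketch. It is not taken from the book: the function names `impute_missing`, `iqr_bounds`, and `clip_outliers` and the sample `readings` are hypothetical, and the choice of median imputation plus the interquartile-range rule is an assumption made for the example. It uses only the Python standard library.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences from the interquartile-range rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Clip every value to the IQR fences instead of dropping rows."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


# Hypothetical sensor readings with two gaps and one suspicious spike.
readings = [10.0, 12.0, None, 11.0, 13.0, 95.0, None, 12.0]
filled = impute_missing(readings)
cleaned = clip_outliers(filled)
print(filled)
print(cleaned)
```

Clipping keeps the row count unchanged, which can matter when the column has to stay aligned with others; dropping or flagging outliers are equally reasonable choices depending on the dataset.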
Who this book is for
Whether you're a data analyst, data engineer, data scientist, or a data professional responsible for data preparation and cleaning, this book is for you. Working knowledge of Python programming is needed to get the most out of this book.
Table of contents
- Python Data Cleaning and Preparation Best Practices
- Contributors
- About the reviewers
- Preface
- Part 1: Upstream Data Ingestion and Cleaning
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- Part 2: Downstream Data Cleaning – Consuming Structured Data
- 8
- 9
- 10
- 11
- Part 3: Downstream Data Cleaning – Consuming Unstructured Data
- 12
- 13
- Index
- Other Books You May Enjoy
Python Data Cleaning and Preparation Best Practices by Maria Zervou is available in PDF and/or ePUB format and is listed under Computer Science, Data Modelling & Design.