Getting Started with Beautiful Soup
eBook - ePub

Getting Started with Beautiful Soup

Vineeth G. Nair

  1. 130 pages
  2. English
  3. ePUB (mobile-friendly)
  4. Available on iOS and Android

About this book

In Detail

Beautiful Soup is a Python library designed for quick-turnaround projects such as screen scraping. It provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need with very little application code.
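As a rough illustration of the navigating, searching, and modifying described above, here is a minimal sketch. It assumes the `bs4` package is installed and uses Python's built-in `html.parser`. The HTML snippet and its element names are invented for this example and are not taken from the book.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this illustration.
html = """
<html><body>
  <h1 id="title">Food chain</h1>
  <ul id="producers">
    <li class="producer">Plants</li>
    <li class="producer">Algae</li>
  </ul>
</body></html>
"""

# Build the parse tree with the parser bundled with Python.
soup = BeautifulSoup(html, "html.parser")

# Searching: find() returns the first match, find_all() returns every match.
first_producer = soup.find("li", class_="producer")
all_producers = [li.get_text() for li in soup.find_all("li", class_="producer")]

# Navigating: move from a tag up to its parent and read an attribute.
parent_id = first_producer.parent["id"]

# Modifying: replace the text inside the <h1> tag.
soup.h1.string = "Producers"
heading_text = soup.h1.get_text()
```

The same `soup` object serves all three tasks, which is why short scripts are often enough for simple scraping jobs.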

Getting Started with Beautiful Soup is a practical guide to Beautiful Soup using Python. The book starts with installation and then walks you through each feature of Beautiful Soup using simple examples, with sample Python code and, where they aid understanding, diagrams and screenshots. It also addresses the problem of how to get data out of a website and works through a solution using a real website and sample code.

Getting Started with Beautiful Soup covers the different ways to install Beautiful Soup on both Linux and Windows systems. You will then learn about searching, navigating, content modification, encoding support, and output formatting, each illustrated with sample Python code that you can try out yourself. This book is a practical guide to scraping information from websites. If you want to learn how to scrape web pages efficiently, this book is for you.
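The encoding support and output formatting mentioned above can be sketched roughly as follows. This is an illustrative example rather than code from the book. It assumes `bs4` is installed, and the Latin-1 byte string is a made-up input.

```python
from bs4 import BeautifulSoup

# Hypothetical input: a small document stored as ISO-8859-1 (Latin-1) bytes.
markup = "<p>Caf\u00e9 &amp; cr\u00e8me</p>".encode("latin-1")

# from_encoding tells Beautiful Soup which encoding to use for the bytes.
soup = BeautifulSoup(markup, "html.parser", from_encoding="latin-1")

# The encoding Beautiful Soup ended up using for the input.
detected = soup.original_encoding

# get_text() returns only the text, with entities already resolved.
text = soup.get_text()

# prettify() returns an indented, human-readable rendering of the tree.
pretty = soup.prettify()

# Output encoding: serialize a tag to bytes in a chosen encoding.
utf8_bytes = soup.p.encode("utf-8")

# Output formatters: "minimal" escapes only what is needed for valid markup,
# while "html" also converts characters to named HTML entities where it can.
minimal = soup.p.decode(formatter="minimal")
html_fmt = soup.p.decode(formatter="html")
```

Comparing `minimal` and `html_fmt` on the same tag is a quick way to see what each formatter changes before choosing one for your output.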

Approach

This book is a practical, hands-on guide that takes you through the techniques of web scraping using Beautiful Soup.

Who this book is for

Getting Started with Beautiful Soup is great for anybody who is interested in website scraping and extracting information. However, a basic knowledge of Python, HTML tags, and CSS is required for better understanding.


Information

Year
2014
ISBN
9781783289554
Edition
1


Table of Contents

Getting Started with Beautiful Soup
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Installing Beautiful Soup
Installing Beautiful Soup
Installing Beautiful Soup in Linux
Installing Beautiful Soup using package manager
Installing Beautiful Soup using pip or easy_install
Installing Beautiful Soup using pip
Installing Beautiful Soup using easy_install
Installing Beautiful Soup in Windows
Verifying Python path in Windows
Installing Beautiful Soup using setup.py
Using Beautiful Soup without installation
Verifying the installation
Quick reference
Summary
2. Creating a BeautifulSoup Object
Creating a BeautifulSoup object
Creating a BeautifulSoup object from a string
Creating a BeautifulSoup object from a file-like object
Creating a BeautifulSoup object for XML parsing
Understanding the features argument
Tag
Accessing the Tag object from BeautifulSoup
Name of the Tag object
Attributes of a Tag object
The NavigableString object
Quick reference
Summary
3. Search Using Beautiful Soup
Searching in Beautiful Soup
Searching with find()
Finding the first producer
Explaining find()
Searching for tags
Searching for text
Searching based on regular expressions
Searching based on attribute values of a tag
Finding the first primary consumer
Searching based on custom attributes
Searching based on the CSS class
Searching using functions defined
Applying searching methods in combination
Searching with find_all()
Finding all tertiary consumers
Understanding parameters used with find_all()
Searching for Tags in relation
Searching for the parent tags
Searching for siblings
Searching for next
Searching for previous
Using search methods to scrape information from a web page
Quick reference
Summary
4. Navigation Using Beautiful Soup
Navigation using Beautiful Soup
Navigating down
Using the name of the child tag
Using predefined attributes
The .contents attribute
The .children attribute
The .descendants attribute
Special attributes for navigating down
The .string attribute
The .strings attribute
Navigating up
The .parent attribute
The .parents attribute
Navigating sideways to the siblings
The .next_sibling attribute
The .previous_sibling attribute
Navigating to the previous and next objects parsed
Quick reference
Summary
5. Modifying Content Using Beautiful Soup
Modifying Tag using Beautiful Soup
Modifying the name property of Tag
Modifying the attribute values of Tag
Updating the existing attribute value of Tag
Adding new attribute values to Tag
Deleting the tag attributes
Adding a new tag
Adding a new producer using new_tag() and append()
Creating a new tag using new_tag()
Adding a new tag using append()
Adding a new div tag to the li tag using insert()
Modifying string contents
Using .string to modify the string content
Adding strings using .append(), insert(), and new_string()
Deleting tags from the HTML document
Deleting the producer using decompose()
Deleting the producer using extract()
Deleting the contents of a tag using Beautiful Soup
Special functions to modify content
Quick reference
Summary
6. Encoding Support in Beautiful Soup
Encoding in Beautiful Soup
Understanding the original encoding of the HTML document
Specifying the encoding of the HTML document
Output encoding
Quick reference
Summary
7. Output in Beautiful Soup
Formatted printing
Unformatted printing
Output formatters in Beautiful Soup
The minimal formatter
The html formatter
The None formatter
The function formatter
Using get_text()
Quick reference
Summary
8. Creating a Web Scraper
Getting book details from PacktPub.com
Finding pages with a list of books
Finding book details
Getting selling prices from Amazon
Getting the selling price from Barnes and Noble
Summary
Index

Getting Started with Beautiful Soup

Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: January 2014
Production Reference: 1170114
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-955-4
www.packtpub.com
Cover Image by Mohamed Raoof

Credits

Author
Vineeth G. Nair
Reviewers ...
