Getting Started with Beautiful Soup
eBook - ePub

Vineeth G. Nair

  1. 130 pages
  2. English
  3. ePUB (available on the app)
  4. Available on iOS and Android

About the book

In Detail

Beautiful Soup is a Python library designed for quick-turnaround projects such as screen scraping. It provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need without writing much code.
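As a rough illustration of the three operations named above (searching, navigating, and modifying a parse tree), here is a minimal sketch. It assumes the `bs4` package is installed and uses Python's built-in `html.parser`; the HTML snippet and the class names in it are invented for this example and are not taken from the book.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, made up for this sketch.
html = """
<html><body>
<ul id="producers">
  <li class="producer">plants</li>
  <li class="producer">algae</li>
</ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching: find() returns the first match, find_all() returns every match.
first = soup.find("li", class_="producer")
names = [li.get_text() for li in soup.find_all("li", class_="producer")]

# Navigating: move from a tag up to its parent and read an attribute.
parent_id = first.parent["id"]

# Modifying: replace the tag's text and change one of its attributes.
first.string = "phytoplankton"
first["class"] = "primary-producer"

print(names)      # text of both list items, collected before the modification
print(parent_id)  # id attribute of the enclosing <ul>
print(first)      # the modified <li> tag
```

The whole round trip takes a handful of calls, which is the sense in which the library keeps scraping code short.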

Getting Started with Beautiful Soup is a practical guide to Beautiful Soup using Python. The book starts with installation and then walks you through each feature of Beautiful Soup using simple examples that include sample Python code, with diagrams and screenshots where they aid understanding. It also addresses the problem of how to get data out of a website, working through a solution with a real website and sample code.

Getting Started with Beautiful Soup goes over the different methods of installing Beautiful Soup on both Linux and Windows systems. You will then learn about searching, navigating, content modification, encoding support, and output formatting, with sample Python code for each example that you can try out yourself. This book is a practical guide to scraping information from websites. If you want to learn how to efficiently scrape pages from websites, then this book is for you.
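To give a flavor of the encoding-support and output-formatting topics listed above, the following is a small sketch rather than an excerpt from the book. It assumes `bs4` is installed; the byte string and its Latin-1 encoding are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical input: a byte string encoded as Latin-1, invented for this sketch.
raw = "<p>caf\u00e9 &amp; bar</p>".encode("latin-1")

# from_encoding tells Beautiful Soup how to decode the incoming bytes.
soup = BeautifulSoup(raw, "html.parser", from_encoding="latin-1")

# The encoding Beautiful Soup used to decode the document.
detected = soup.original_encoding

# get_text() returns only the text content, with entities decoded.
text = soup.get_text()

# prettify() returns an indented string; encode() returns bytes in the chosen encoding.
pretty = soup.prettify()
encoded = soup.encode("utf-8")

print(detected)
print(text)
print(pretty)
print(encoded)
```

Internally the document is held as Unicode, so the output encoding can differ from the input encoding, as the switch from Latin-1 to UTF-8 here shows.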

Approach

This book is a practical, hands-on guide that takes you through the techniques of web scraping using Beautiful Soup.

Who this book is for

Getting Started with Beautiful Soup is for anybody interested in website scraping and information extraction. A basic knowledge of Python, HTML tags, and CSS is assumed.

Information

Year
2014
ISBN
9781783289554
Edition
1
Topic
Computer Science


Table of Contents

Getting Started with Beautiful Soup
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Installing Beautiful Soup
Installing Beautiful Soup
Installing Beautiful Soup in Linux
Installing Beautiful Soup using package manager
Installing Beautiful Soup using pip or easy_install
Installing Beautiful Soup using pip
Installing Beautiful Soup using easy_install
Installing Beautiful Soup in Windows
Verifying Python path in Windows
Installing Beautiful Soup using setup.py
Using Beautiful Soup without installation
Verifying the installation
Quick reference
Summary
2. Creating a BeautifulSoup Object
Creating a BeautifulSoup object
Creating a BeautifulSoup object from a string
Creating a BeautifulSoup object from a file-like object
Creating a BeautifulSoup object for XML parsing
Understanding the features argument
Tag
Accessing the Tag object from BeautifulSoup
Name of the Tag object
Attributes of a Tag object
The NavigableString object
Quick reference
Summary
3. Search Using Beautiful Soup
Searching in Beautiful Soup
Searching with find()
Finding the first producer
Explaining find()
Searching for tags
Searching for text
Searching based on regular expressions
Searching based on attribute values of a tag
Finding the first primary consumer
Searching based on custom attributes
Searching based on the CSS class
Searching using functions defined
Applying searching methods in combination
Searching with find_all()
Finding all tertiary consumers
Understanding parameters used with find_all()
Searching for Tags in relation
Searching for the parent tags
Searching for siblings
Searching for next
Searching for previous
Using search methods to scrape information from a web page
Quick reference
Summary
4. Navigation Using Beautiful Soup
Navigation using Beautiful Soup
Navigating down
Using the name of the child tag
Using predefined attributes
The .contents attribute
The .children attribute
The .descendants attribute
Special attributes for navigating down
The .string attribute
The .strings attribute
Navigating up
The .parent attribute
The .parents attribute
Navigating sideways to the siblings
The .next_sibling attribute
The .previous_sibling attribute
Navigating to the previous and next objects parsed
Quick reference
Summary
5. Modifying Content Using Beautiful Soup
Modifying Tag using Beautiful Soup
Modifying the name property of Tag
Modifying the attribute values of Tag
Updating the existing attribute value of Tag
Adding new attribute values to Tag
Deleting the tag attributes
Adding a new tag
Adding a new producer using new_tag() and append()
Creating a new tag using new_tag()
Adding a new tag using append()
Adding a new div tag to the li tag using insert()
Modifying string contents
Using .string to modify the string content
Adding strings using .append(), insert(), and new_string()
Deleting tags from the HTML document
Deleting the producer using decompose()
Deleting the producer using extract()
Deleting the contents of a tag using Beautiful Soup
Special functions to modify content
Quick reference
Summary
6. Encoding Support in Beautiful Soup
Encoding in Beautiful Soup
Understanding the original encoding of the HTML document
Specifying the encoding of the HTML document
Output encoding
Quick reference
Summary
7. Output in Beautiful Soup
Formatted printing
Unformatted printing
Output formatters in Beautiful Soup
The minimal formatter
The html formatter
The None formatter
The function formatter
Using get_text()
Quick reference
Summary
8. Creating a Web Scraper
Getting book details from PacktPub.com
Finding pages with a list of books
Finding book details
Getting selling prices from Amazon
Getting the selling price from Barnes and Noble
Summary
Index

Getting Started with Beautiful Soup

Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: January 2014
Production Reference: 1170114
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-955-4
www.packtpub.com
Cover Image by Mohamed Raoof

Credits

Author
Vineeth G. Nair
Reviewers ...
