
- English
- ePUB (mobile friendly)
- Available on iOS & Android
Google BigQuery Analytics
About this book
How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets
Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute engine, AppEngine datastore integration, and using GViz with Tableau to generate charts of query results. In addition to the mechanics of BigQuery, the book also covers the architecture of the underlying Dremel query engine, providing a thorough understanding that leads to better query results.
- Features a companion website that includes all code and data sets from the book
- Uses real-world examples to explain everything analysts need to know to effectively use BigQuery
- Includes web application examples coded in Python
Frequently asked questions
- Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
- Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Information
Part I
BigQuery Fundamentals
In This Part
- Chapter 1: The Story of Big Data at Google
- Chapter 2: BigQuery Fundamentals
- Chapter 3: Getting Started with BigQuery
- Chapter 4: Understanding the BigQuery Object Model
Chapter 1
The Story of Big Data at Google
- Scaled-up machines are expensive. If you need one that has twice the processing power, it might cost you five times as much.
- Scaled-up machines are single points of failure. You might need to get more than one expensive server in case of a catastrophic problem, and each one usually ends up being built with so many backup and redundant pieces that you're paying for a lot more hardware than you actually need.
- Scale up has limits. At some point, you lose the ability to add more processors or RAM; you've bought the most expensive and fastest machine that is made (or that you can afford), and it still might not be fast enough.
- Scale up doesn't protect you against software failures. If you have a Big Iron server that has a kernel bug, that machine will crash just as easily (and as hard) as your Windows laptop.
Big Data Stack 1.0
- Anything can fail, at any time, so write your software expecting unreliable hardware. At most companies, when a database server crashes, it is a serious event. If a network switch dies, it will probably cause downtime. By running in an environment in which individual components fail often, you paradoxically end up with a much more stable system because your software is designed to handle those failures. You can quantify your risk beyond blindly quoting statistics, such as mean time between failures (MTBFs) or service-level agreements (SLAs).
- Use only commodity, off-the-shelf components. This has a number of advantages: You don't get locked into a particular vendor's feature set; you can always find replacements; and you don't experience big price discontinuities when you upgrade to the ābiggerā version.
- The cost for twice the amount of capacity should not be considerably more than the cost for twice the amount of hardware. This means the software must be built to scale out, rather than up. However, this also imposes limits on the types of operations that you can do. For instance, if you scale out your database, it may be difficult to do a
JOINoperation, since you'd need to join data together that lives on different machines. - āA foolish consistency is the hobgoblin of little minds.ā If you abandon the āCā (consistency) in ACID database operations, it becomes much easier to parallelize operations. This has a cost, however; loss of consistency means that programmers have to handle cases in which reading data they just wrote might return a stale (inconsistent) copy. This means you need smart programmers.
- Google File System (GFS): A distributed, cluster-based filesystem. GFS assumes that any disk can fail, so data is stored in multiple locations, which means that data is still available even when a disk that it was stored on crashes.
- MapReduce: A computing paradigm that divides problems into easily parallelizable pieces and orchestrates running them across a cluster of machines.
- Bigtable: A forerunner of the NoSQL database, Bigtable enables structured storage to scale out to multiple servers. Bigtable is also replicated, so failure of any particular tablet server doesn't cause data loss.
Big Data Stack 2.0 (and Beyond)
- MapReduce is hard. It can be difficult to set up and difficult to decompose your problem into Map and Reduce phases. If you need multiple MapReduce rounds (which is common for many real-world problems), you face the issue of how to deal with state in between phases and how to deal with partial failures without having to restart the whole thing.
- MapReduce can be slow. If you want to ask questions of your data, you have to wait minutes or hours to get the answers. Moreover, you have to write custom C++ or Java code each time you want to change the question that you're asking.
- GFS, while improving durability of the data (since it is replicated multiple times) can suffer from reduced availability, since the metadata server is...
Table of contents
- Cover
- Part I: BigQuery Fundamentals
- Part II: Basic BigQuery
- Part III: Advanced BigQuery
- Part IV: BigQuery Applications
- Introduction
- End User License Agreement