Summary
The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, covering topics including real-time computation, lazy (delayed) evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.
Foreword by Rob Thomas.
About the technology
Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this varied volume like a champ, running some workloads up to 100 times faster than Hadoop MapReduce. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem.
About the book
Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.
What's inside
Writing Spark applications in Java
Spark application architecture
Ingestion through files, databases, streaming, and Elasticsearch
Querying distributed datasets with Spark SQL
About the reader
This book does not assume previous experience with Spark, Scala, or Hadoop.
About the author
Jean-Georges Perrin is an experienced data and software architect. He is France’s first IBM Champion and has been honored for 12 consecutive years.
Table of Contents
PART 1 - THE THEORY CRIPPLED BY AWESOME EXAMPLES
1 So, what is Spark, anyway?
2 Architecture and flow
3 The majestic role of the dataframe
4 Fundamentally lazy
5 Building a simple app for deployment
6 Deploying your simple app
PART 2 - INGESTION
7 Ingestion from files
8 Ingestion from databases
9 Advanced ingestion: finding data sources and building your own
10 Ingestion through structured streaming
PART 3 - TRANSFORMING YOUR DATA
11 Working with SQL
12 Transforming your data
13 Transforming entire documents
14 Extending transformations with user-defined functions
15 Aggregating your data
PART 4 - GOING FURTHER
16 Cache and checkpoint: Enhancing Spark’s performances
17 Exporting data and building full data pipelines
18 Exploring deployment constraints: Understanding the ecosystem

eBook - ePub
Spark in Action
Covers Apache Spark 3 with Examples in Java, Python, and Scala
- 576 pages
- English
Table of contents
- Copyright
- Brief contents
- Contents
- Front matter
- Part 1. The theory crippled by awesome examples
- 1. So, what is Spark, anyway?
- 2. Architecture and flow
- 3. The majestic role of the dataframe
- 4. Fundamentally lazy
- 5. Building a simple app for deployment
- 6. Deploying your simple app
- Part 2. Ingestion
- 7. Ingestion from files
- 8. Ingestion from databases
- 9. Advanced ingestion: finding data sources and building your own
- 10. Ingestion through structured streaming
- Part 3. Transforming your data
- 11. Working with SQL
- 12. Transforming your data
- 13. Transforming entire documents
- 14. Extending transformations with user-defined functions
- 15. Aggregating your data
- Part 4. Going further
- 16. Cache and checkpoint: Enhancing Spark’s performances
- 17. Exporting data and building full data pipelines
- 18. Exploring deployment constraints: Understanding the ecosystem
- Appendixes
- Appendix A. Installing Eclipse
- Appendix B. Installing Maven
- Appendix C. Installing Git
- Appendix D. Downloading the code and getting started with Eclipse
- Appendix E. A history of enterprise data
- Appendix F. Getting help with relational databases
- Appendix G. Static functions ease your transformations
- Appendix H. Maven quick cheat sheet
- Appendix I. Reference for transformations and actions
- Appendix J. Enough Scala
- Appendix K. Installing Spark in production and a few tips
- Appendix L. Reference for ingestion
- Appendix M. Reference for joins
- Appendix N. Installing Elasticsearch and sample data
- Appendix O. Generating streaming data
- Appendix P. Reference for streaming
- Appendix Q. Reference for exporting data
- Appendix R. Finding help when you’re stuck
- Index