Spark in Action
eBook - ePub

Spark in Action

Covers Apache Spark 3 with Examples in Java, Python, and Scala

  1. 576 pages
  2. English
  3. ePUB (mobile friendly)
About this book

Summary
The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you'll learn to take advantage of Spark's core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark's powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.

Foreword by Rob Thomas.

About the technology
Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this varied volume like a champ, delivering speeds 100 times faster than Hadoop systems. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem.

About the book
Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you'll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you'll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.

What's inside
  • Writing Spark applications in Java
  • Spark application architecture
  • Ingestion through files, databases, streaming, and Elasticsearch
  • Querying distributed datasets with Spark SQL

About the reader
This book does not assume previous experience with Spark, Scala, or Hadoop.

About the author
Jean-Georges Perrin is an experienced data and software architect. He is France's first IBM Champion and has been honored for 12 consecutive years.

Table of Contents

PART 1 - THE THEORY CRIPPLED BY AWESOME EXAMPLES
1 So, what is Spark, anyway?
2 Architecture and flow
3 The majestic role of the dataframe
4 Fundamentally lazy
5 Building a simple app for deployment
6 Deploying your simple app

PART 2 - INGESTION
7 Ingestion from files
8 Ingestion from databases
9 Advanced ingestion: finding data sources and building your own
10 Ingestion through structured streaming

PART 3 - TRANSFORMING YOUR DATA
11 Working with SQL
12 Transforming your data
13 Transforming entire documents
14 Extending transformations with user-defined functions
15 Aggregating your data

PART 4 - GOING FURTHER
16 Cache and checkpoint: Enhancing Spark's performances
17 Exporting data and building full data pipelines
18 Exploring deployment

Table of contents

  1. Copyright
  2. brief contents
  3. contents
  4. front matter
  5. Part 1. The theory crippled by awesome examples
  6. 1. So, what is Spark, anyway?
  7. 2. Architecture and flow
  8. 3. The majestic role of the dataframe
  9. 4. Fundamentally lazy
  10. 5. Building a simple app for deployment
  11. 6. Deploying your simple app
  12. Part 2. Ingestion
  13. 7. Ingestion from files
  14. 8. Ingestion from databases
  15. 9. Advanced ingestion: finding data sources and building your own
  16. 10. Ingestion through structured streaming
  17. Part 3. Transforming your data
  18. 11. Working with SQL
  19. 12. Transforming your data
  20. 13. Transforming entire documents
  21. 14. Extending transformations with user-defined functions
  22. 15. Aggregating your data
  23. Part 4. Going further
  24. 16. Cache and checkpoint: Enhancing Spark’s performances
  25. 17. Exporting data and building full data pipelines
  26. 18. Exploring deployment constraints: Understanding the ecosystem
  27. Appendixes.
  28. Appendix A. Installing Eclipse
  29. Appendix B. Installing Maven
  30. Appendix C. Installing Git
  31. Appendix D. Downloading the code and getting started with Eclipse
  32. Appendix E. A history of enterprise data
  33. Appendix F. Getting help with relational databases
  34. Appendix G. Static functions ease your transformations
  35. Appendix H. Maven quick cheat sheet
  36. Appendix I. Reference for transformations and actions
  37. Appendix J. Enough Scala
  38. Appendix K. Installing Spark in production and a few tips
  39. Appendix L. Reference for ingestion
  40. Appendix M. Reference for joins
  41. Appendix N. Installing Elasticsearch and sample data
  42. Appendix O. Generating streaming data
  43. Appendix P. Reference for streaming
  44. Appendix Q. Reference for exporting data
  45. Appendix R. Finding help when you’re stuck
  46. index