Instant Pentaho Data Integration Kitchen
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: July 2013
Production Reference: 1240713
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-84969-690-6
www.packtpub.com
Author
Sergio Ramazzina
Reviewer
Joel Latino
Acquisition Editor
Erol Staveley
Commissioning Editor
Shreerang Deshpande
Technical Editor
Sampreshita Maheshwari
Copy Editor
Insiya Morbiwala
Project Coordinator
Suraj Bist
Proofreader
Paul Hindle
Production Coordinator
Zahid Shaikh
Cover Work
Prachali Bhiwandkar
Cover Image
Aditi Gajjar
Sergio Ramazzina is a software architect/trainer with over 20 years of experience working on a large number of projects for banks and major Italian companies as well as designing complex enterprise solutions in Java/JavaEE and Ruby. He started using Pentaho products from the very beginning (late 2003), gaining vast experience by deploying Pentaho as an open source, standalone BI solution. He also deeply integrated Pentaho as the analytics engine of choice in other applications he designed. Starting from 2009, based on his experience in the Java/JavaEE world and because of his appreciation for the open source world and its principles, he began participating actively as a contributor to some Pentaho projects, such as JPivot, Saiku, CDF, and CDA, and he has achieved the title of Pentaho Active Contributor.
In late 2010, he founded Serasoft, a young Italian consulting company specializing in the design and delivery of open source business intelligence solutions, and he started participating as a BI architect and Pentaho expert on a wide range of projects where open source BI and Pentaho played the leading role. He is also the CTO of Athilab (Athirat Innovation Lab), sharing his experience in the design and delivery of high-value innovative enterprise solutions. He is always looking for innovative solutions that can help users make their work more efficient. He is also passionate about skiing, tennis, and photography.
Joel Latino was born in Ponte de Lima, Portugal, in 1989. He has been working in the IT industry since 2010, mostly as a software developer and BI developer.
He started his career at Xpand-IT (a Portuguese company specialized in strategic planning, consulting, implementation, and the maintenance of enterprise software that is fully adapted to the customer's needs) and earned his graduate degree in Informatics Engineering at the School of Technology and Management of the Viana do Castelo Polytechnic Institute.
Joel mainly focuses on open source web technology, databases, and business intelligence, and is also fascinated by mobile technologies. He is responsible for developing some plugins for Pentaho Data Integration, such as Android and Apple push notification steps.
Pentaho Data Integration (PDI) is an ETL tool that was born 10 years ago. Its creator, Matt Casters, celebrated the 10th anniversary of the product, originally named Kettle, on March 8th, 2013 (you can read the celebratory post on Matt's blog at http://www.ibridge.be/?p=211). The name K.E.T.T.L.E. is an acronym that stands for Kettle Extraction Transformation Transport Load Environment. When Pentaho acquired Kettle, its name was changed to Pentaho Data Integration, but many developers continue to call it by the old name, Kettle.
The history of Kettle began in 2001 when Matt Casters, Pentaho Data Integration's chief architect and creator of Kettle, was working as a BI consultant. He had the idea of writing his own ETL tool as a better and cheaper way to transfer data from one place to another. He was looking for a different solution, something better than building ugly data warehouse solutions in PL/SQL, VB, or shell scripts. He spent two years doing a thorough analysis of the problem. Because he was busy all the time with his work as a consultant, he worked on this project either during the weekends or at night. A...