Data Science and Analytics with Python

Jesus Rogel-Salazar

  1. 400 pages
  2. English

About the Book

Data Science and Analytics with Python is designed for practitioners in data science and data analytics in both academic and business environments. The aim is to present the reader with the main concepts used in data science using tools developed in Python, such as scikit-learn, pandas, NumPy, and others. The use of Python is of particular interest, given its recent popularity in the data science community. The book can be used by seasoned programmers and newcomers alike.

The book is organized so that individual chapters are sufficiently independent of each other for the reader to use them comfortably as a reference. The book discusses what data science and analytics are, from the point of view of the process and the results obtained. Important features of Python are also covered, including a Python primer. The basic elements of machine learning, pattern recognition, and artificial intelligence that underpin the algorithms and implementations used in the rest of the book also appear in the first part of the book.

Regression analysis using Python, clustering techniques, and classification algorithms are covered in the second part of the book. Hierarchical clustering, decision trees, and ensemble techniques are also explored, along with dimensionality reduction techniques and recommendation systems. The support vector machine algorithm and the kernel trick are discussed in the last part of the book.

About the Author

Dr. Jesús Rogel-Salazar is a lead data scientist with experience in the field working for companies such as AKQA, IBM Data Science Studio, Dow Jones and others. He is a visiting researcher at the Department of Physics at Imperial College London, UK, and a member of the School of Physics, Astronomy and Mathematics at the University of Hertfordshire, UK. He obtained his doctorate in physics at Imperial College London for work on quantum atom optics and ultra-cold matter. He has held a position as senior lecturer in mathematics as well as a consultant in the financial industry since 2006. He is the author of the book Essential Matlab and Octave, also published by CRC Press. His interests include mathematical modelling, data science, and optimization in a wide range of applications including optics, quantum mechanics, data journalism, and finance.


Information

Year: 2018
ISBN: 9781351647717
Edition: 1
Category: Computer Science
1 Trials and Tribulations of a Data Scientist
The ever-increasing availability of data requires the use of tools that enable businesses and researchers to draw conclusions and make decisions based on the evidence provided by the data itself. From performing a regression analysis to determining the relationship between data features, or improving on recommendation systems used in e-commerce, data science and analytics are used every day by all of us. This book is intended to give those interested in data science and analytics a perspective on the subject matter using Python, a popular programming language available for various platforms and widely used both in business and academia.
Data science and analytics are used every day by all of us.
Python will be used throughout the book; get well acquainted with it!
In this chapter we will cover what data science is and how it is related to various disciplines from mathematics to business intelligence and from programming to design. We will discuss the characteristics that make a good data scientist and the composition of a data science team. We will also provide an overview of the typical workflow in a data science and analytics project and shall see the trials and tribulations in the work cycle of a data scientist.
1.1 Data? Science? Data Science!
The use of data as evidence in support of decision making is nothing new. You only have to take a look at the original meaning of the word statistics: the analysis and interpretation of information relating to states, such as economic and demographic data. Nowadays, the word statistics is understood either as a branch of mathematics that deals with the collection, analysis, interpretation and presentation of data, or more colloquially as a fact or figure obtained from a study based on large quantities of data. Simply take a look at the news on any given day and you will surely get to hear about statistics, proportions and percentages, all in support (or not) of a new initiative, plan or recommendation. The power of data is all around us and we use it all the time.
Statistics was originally understood as the analysis and interpretation of information about states.
Now, what about the word science? Well, you may remember from your school days that science is a system that enables the organisation of knowledge, based on testable evidence and predictions. Notice that key word evidence mentioned there again.
Science is organised knowledge.
No surprises here so far, right? From a very simplified point of view, the scientific method makes use of data and their analysis to acquire, correct and integrate knowledge. Nonetheless, data science is not simply the direct use of statistics, or the systematisation of data. How shall we understand that much-loved combination of the words data and science?
However, Data Science ≠ Data + Science
1.1.1 So, What Is Data Science?
Data science and analytics are rapidly gaining prominence as some of the more sought-after disciplines in academic and professional circles. In a nutshell, data science can be understood as the extraction of knowledge and insight from various sources of data, and the skills required to achieve this range from programming to design, and from mathematics to storytelling.
Data science skills range from programming to design, and from mathematics to storytelling.
There is no doubt that the term data science is a true neologism of our time. The term has started being used and, to a certain extent, even abused. As we have mentioned before, data science is rather more than the sum of data on the one hand and science on the other, although it is inevitably related to both concepts.
In the case of defining data science, the whole is indeed greater than the parts.
Currently, data science can be considered a budding field with applications in a wide range of areas and industries, as well as in academic research. It is fair to say that this emerging field is elusive to define, and throughout this book we shall consider data science and analytics as a portmanteau for a number of overlapping tasks related to data (from collection, provision and preparation, through analysis and visualisation, to curation and storage) that exploit tools from empirical sciences, mathematics, business intelligence, machine learning and artificial intelligence. The aim of these tasks is to enable effective, pragmatic and, most importantly, actionable decisions.
In this book we will use a practical definition for data science as a combination of overlapping tasks related to data with the aim to derive actionable decisions.
The motivation for data science and analytics in deriving valuable insights from data is great, and widely welcomed by businesses. However, this is a very challenging task. Companies such as Google, Netflix and Amazon have demonstrated that careful storage and analysis of data deliver a very competitive edge. These days there are easier and cheaper ways to collect large amounts of data than ever before, and mobile devices are becoming a ubiquitous presence. This has allowed companies, particularly start-ups, to develop in-house data science capabilities.
Careful storage and analysis of data deliver a very competitive edge.
Typical examples of data science products are better explained by the questions they aim to answer; these questions are the drivers for the acquisition and selection of the appropriate data to be interrogated in order to provide insight into an area of interest. I am sure you can come up with a few examples relevant to you, but here are some that come to mind:
Some examples of data science outputs make it easier to clarify what the discipline does. This list is by no means exhaustive.
• What product will sell better in conjunction with another popular product?
Market basket analysis
• Who will be declared Prime Minister (or President, or winner, depending on the flavour of the government system of interest) in the next general election?
Predictive analytics
• How can customers be encouraged to spend a longer time in an online portal?
E-commerce
• Are there any discernible patterns that allow us to characterise different groups of sales agents, customers or businesses?
Clustering analysis and market segmentation
• What advertisement should be placed on what site?
Advertising and marketing
• Given the interests of a customer, what other products can be recommended to them?
Recommendation systems
• What are the latest developments and breakout reports in newspapers and social media that may affect the industry of interest?
Social media analysis
• Given someone’s interests and hobbies, who may be suitable potential partners?
Online services
• How can we keep potentially sensitive information protected and react proactively to information we store?
Cybersecurity
• How can we distinguish valid, relevant documents such as emails (ham), from invalid, irrelevant ones (spam)?
Classification analysis
• How can we determine whether a retail transaction is valid or not?
Fraud prevention
• What is the demand for a particular service at a particular time or place?
Demand forecasting
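To make the first question in the list more concrete, here is a minimal sketch of the idea behind market basket analysis: counting how often pairs of products are bought together. The baskets, and the helper names `pair_counts` and `support`, are hypothetical and made up purely for illustration; this is not the method used later in the book, and real projects would typically work with far larger datasets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a single canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def support(pair, baskets):
    """Fraction of baskets in which both products of the pair appear."""
    return pair_counts(baskets)[tuple(sorted(pair))] / len(baskets)


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(f"Most frequent pair: {top_pair}, bought together {top_count} times")
print(f"Support: {support(top_pair, baskets):.2f}")
```

The pair with the highest support is a candidate answer to "what sells well in conjunction with a popular product?", although a count on its own says nothing about why the products are bought together.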
These are not questions that decision-makers, businesses and industries, large and small, have recently started formulating. So, why the resurgence in seeking answers to them? The main answer is the availability of potentially useful data, big or small, together with the impact of technology, computer science and statistics in everyday life. Out of the ingredients mentioned above, accessible datasets may be the most important one since without them the insight provided by technology alone is rather limited. After all, the plural of anecdote is not data. Having said that, it is important to note that this does not mean that every single data science case to be tackled falls into the category of so-called big data, particularly when we take into account that the adjective big can be used in a relative manner. We shall expand on this point later on in Section 1.3.1.
The availability of large volumes of data has enabled data science to flourish.
The plural of anecdote is not data.
One important thing to bear in mind about the outputs of data science and analytics is that in the vast majority of cases they do not uncover hidden patterns or relationships as if by magic, and in the case of predictive analytics they do not tell us exactly what will happen in the future. Instead, they enable us to forecast what may come. In other words, once we have carried out some modelling there is still a lot of work to do to make sense out of the results obtained, taking into account the constraints and assumptions in the model, as well as considering what an acceptable level of reliability is in each scenario.
Note that this book is about data science, not necessarily about big data.
Predictive analytics do not tell us the future; instead they allow us to forecast.
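The difference between a forecast and a statement about the future can be sketched with a very small example. The code below fits a straight line to made-up monthly demand figures and reports a point forecast together with a crude band around it. The data, the helper name `fit_line`, and the choice of a band of two residual standard deviations are all assumptions for illustration; the band is a rough indication of spread, not a formal prediction interval.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical monthly demand figures, invented for illustration only.
months = [1, 2, 3, 4, 5, 6]
demand = [102, 110, 118, 131, 139, 150]

intercept, slope = fit_line(months, demand)

# Residuals show how far the observed values fall from the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(months, demand)]
spread = stdev(residuals)

next_month = 7
point_forecast = intercept + slope * next_month
# Crude band (an assumption): two residual standard deviations either side.
low, high = point_forecast - 2 * spread, point_forecast + 2 * spread

print(f"Forecast for month {next_month}: {point_forecast:.1f} "
      f"(roughly between {low:.1f} and {high:.1f})")
```

The output is a range of plausible values under the assumption that the linear trend continues, which is exactly the kind of constraint that has to be kept in mind when interpreting the result.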
Similarly, there is the tacit prerequisite of having accurate, timely data that can be readily utilised to make sense of the modelling results and reflect the state of the art in an application. It is therefore imperative that decision makers as well as IT and business stakeholders take time to understand the information that will be needed, and that they are prepared to realise that certain data may not be fit for their purpose. It is indeed disheartening to come to terms with the fact that some data may not have the necessary features to be used in building a prediction, for example. Nonetheless, it is better to realise that this is the case at an early stage, rather than relying on unsuitable results to make important decisions that impact the business.
For data to be useful it should be available and it has to be timely.
Realising that data may not be fit for answering the questions at hand is a difficult but important thing to bear in mind.
Even if data science may not yet be considered a well-defined subject, the number of academic and training programmes being offered by universities and at various workplaces has seen a healthy increase. This is a natural result of the need that exists for well-informed, capable experts that we get to call data scientists. So what do data scientists do and what do they look like? All shall be uncovered.
The need for capable data scientists in industry has seen a healthy increase in recent times.
1.2 The Data Scientist: A Modern Jackalope
The new term used to describe the person that deals with the seemingly disparate array of tasks described above may seem to be yet another, more fashionable way to describe a statistician or a business analyst. However, we can certainly agree that there is a gap between the latter two, and that the skills required by a dat...
