Data Science with Julia
eBook - ePub

Data Science with Julia

Paul D. McNicholas, Peter Tait

Partager le livre
  1. 220 pages
  2. English
  3. ePUB (adapté aux mobiles)
  4. Disponible sur iOS et Android
eBook - ePub

Data Science with Julia

Paul D. McNicholas, Peter Tait

DĂ©tails du livre
Aperçu du livre
Table des matiĂšres
Citations

À propos de ce livre

"This book is a great way to both start learning data science through the promising Julia language and to become an efficient data scientist."- Professor Charles Bouveyron, INRIA Chair in Data Science, Université CÎte d'Azur, Nice, France

Julia, an open-source programming language, was created to be as easy to use as languages such as R and Python while also as fast as C and Fortran. An accessible, intuitive, and highly efficient base language with speed that exceeds R and Python, makes Julia a formidable language for data science. Using well known data science methods that will motivate the reader, Data Science with Julia will get readers up to speed on key features of the Julia language and illustrate its facilities for data science and machine learning work.

Features:



  • Covers the core components of Julia as well as packages relevant to the input, manipulation and representation of data.


  • Discusses several important topics in data science including supervised and unsupervised learning.


  • Reviews data visualization using the Gadfly package, which was designed to emulate the very popular ggplot2 package in R. Readers will learn how to make many common plots and how to visualize model results.


  • Presents how to optimize Julia code for performance.


  • Will be an ideal source for people who already know R and want to learn how to use Julia (though no previous knowledge of R or any other programming language is required).

The advantages of Julia for data science cannot be understated. Besides speed and ease of use, there are already over 1, 900 packages available and Julia can interface (either directly or through packages) with libraries written in R, Python, Matlab, C, C++ or Fortran. The book is for senior undergraduates, beginning graduate students, or practicing data scientists who want to learn how to use Julia for data science.

"This book is a great way to both start learning data science through the promising Julia language and to become an efficient data scientist."

Professor Charles Bouveyron
INRIA Chair in Data Science
Université CÎte d'Azur, Nice, France

Foire aux questions

Comment puis-je résilier mon abonnement ?
Il vous suffit de vous rendre dans la section compte dans paramĂštres et de cliquer sur « RĂ©silier l’abonnement ». C’est aussi simple que cela ! Une fois que vous aurez rĂ©siliĂ© votre abonnement, il restera actif pour le reste de la pĂ©riode pour laquelle vous avez payĂ©. DĂ©couvrez-en plus ici.
Puis-je / comment puis-je télécharger des livres ?
Pour le moment, tous nos livres en format ePub adaptĂ©s aux mobiles peuvent ĂȘtre tĂ©lĂ©chargĂ©s via l’application. La plupart de nos PDF sont Ă©galement disponibles en tĂ©lĂ©chargement et les autres seront tĂ©lĂ©chargeables trĂšs prochainement. DĂ©couvrez-en plus ici.
Quelle est la différence entre les formules tarifaires ?
Les deux abonnements vous donnent un accĂšs complet Ă  la bibliothĂšque et Ă  toutes les fonctionnalitĂ©s de Perlego. Les seules diffĂ©rences sont les tarifs ainsi que la pĂ©riode d’abonnement : avec l’abonnement annuel, vous Ă©conomiserez environ 30 % par rapport Ă  12 mois d’abonnement mensuel.
Qu’est-ce que Perlego ?
Nous sommes un service d’abonnement Ă  des ouvrages universitaires en ligne, oĂč vous pouvez accĂ©der Ă  toute une bibliothĂšque pour un prix infĂ©rieur Ă  celui d’un seul livre par mois. Avec plus d’un million de livres sur plus de 1 000 sujets, nous avons ce qu’il vous faut ! DĂ©couvrez-en plus ici.
Prenez-vous en charge la synthÚse vocale ?
Recherchez le symbole Écouter sur votre prochain livre pour voir si vous pouvez l’écouter. L’outil Écouter lit le texte Ă  haute voix pour vous, en surlignant le passage qui est en cours de lecture. Vous pouvez le mettre sur pause, l’accĂ©lĂ©rer ou le ralentir. DĂ©couvrez-en plus ici.
Est-ce que Data Science with Julia est un PDF/ePUB en ligne ?
Oui, vous pouvez accĂ©der Ă  Data Science with Julia par Paul D. McNicholas, Peter Tait en format PDF et/ou ePUB ainsi qu’à d’autres livres populaires dans Economics et Statistics for Business & Economics. Nous disposons de plus d’un million d’ouvrages Ă  dĂ©couvrir dans notre catalogue.

Informations

Année
2019
ISBN
9781351013659
CHAPTER 1
Introduction
DATA SCIENCE is discussed and some important connections, and contrasts, are drawn between statistics and data science. A brief discussion of big data is provided, the Julia language is briefly introduced, and all Julia packages used in this monograph are listed together with their respective version numbers. The same is done for the, albeit smaller number of, R packages used herein. Providing such details about the packages used helps ensure that the analyses illustrated herein can be reproduced. The datasets used in this monograph are also listed, along with some descriptive characteristics and their respective sources. Finally, the contents of this monograph are outlined.
1.1 DATA SCIENCE
What is data science? It is an interesting question and one without a widely accepted answer. Herein, we take a broad view that data science encompasses all work related to data. While this includes data analysis, it also takes in a host of other topics such as data cleaning, data curation, data ethics, research data management, etc. This monograph discusses some of those aspects of data science that are commonly handled in Julia, and similar software; hence, its title.
The place of statistics within the pantheon of data science is a topic on which much has been written. While statistics is certainly a very important part of data science, statistics should not be taken as synonymous with data science. Much has been written about the relationship between data science and statistics. On the one extreme, some might view data science — and data analysis, in particular — as a retrogression of statistics; yet, on the other extreme, some may argue that data science is a manifestation of what statistics was always meant to be. In reality, it is probably an error to try to compare statistics and data science as if they were alternatives. Herein, we consider that statistics plays a crucial role in data analysis, or data analytics, which in turn is a crucial part of the data science mosaic.
Contrasting data analysis and mathematical statistics, Hayashi (1998) writes:

 mathematical statistics have been prone to be removed from reality. On the other hand, the method of data analysis has developed in the fields disregarded by mathematical statistics and has given useful results to solve complicated problems based on mathematico-statistical methods (which are not always based on statistical inference but rather are descriptive).
The views expressed by Hayashi (1998) are not altogether different from more recent observations that, insofar as analysis is concerned, data science tends to focus on prediction, while statistics has focused on modelling and inference. That is not to say that prediction is not a part of inference but rather that prediction is a part, and not the goal, of inference. We shall return to this theme, i.e., inference versus prediction, several times within this monograph.
Breiman (2001b) writes incisively about two cultures in statistical modelling, and this work is wonderfully summarized in the first few lines of its abstract:
There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems.
The viewpoint articulated here leans towards a view of data analysis as, at least partly, arising out of one culture in statistical modelling.
In a very interesting contribution, Cleveland (2001) outlines a blueprint for a university department, with knock-on implications for curricula. Interestingly, he casts data science as an “altered field” — based on statistics being the base, i.e., unaltered, field. One fundamental alteration concerns the role of computing:
One outcome of the plan is that computer science joins mathematics as an area of competency for the field of data science. This enlarges the intellectual foundations. It implies partnerships with computer scientists just as there are now partnerships with mathematicians.
Writing now, as we are 17 years later, it is certainly true that computing has become far more important to the field of statistics and is central to data science. Cleveland (2001) also presents two contrasting views of data science:
A very limited view of data science is that it is practiced by statisticians. The wide view is that data science is practiced by statisticians and subject matter analysts alike, blurring exactly who is and who is not a statistician.
Certainly, the wider view is much closer to what has been observed in the intervening years. However, there are those who can claim to be data scientists but may consider themselves neither statisticians nor subject matter experts, e.g., computer scientists or librarians and other data curators. It is noteworthy that there is a growing body of work on how to introduce data science into curricula in statistics and other disciplines (see, e.g., Hardin et al., 2015).
One fascinating feature of data science is the extent to which work in the area has penetrated into the popular conscience, and media, in a way that statistics has not. For example, Press (2013) gives a brief history of data science, running from Tukey (1962) to Davenport and Patil (2012) — the title of the latter declares data scientist the “sexiest job of the 21st century”! At the start of this timeline is the prescient paper by Tukey (1962) who, amongst many other points, outlines how his view of his own work moved away from that of a statistician:
For a long time I have thought I was a statistician, interested in inferences from the particular to the general. 
 All in all, I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.
The wide range of views on data science, data...

Table des matiĂšres