eBook - ePub

Linear Models with Python

Name: Linear Models with Python
Author: Julian J. Faraway

298 pages
English

Book information

Praise for Linear Models with R:

This book is a must-have tool for anyone interested in understanding and applying linear models. The logical ordering of the chapters is well thought out and portrays Faraway's wealth of experience in teaching and using linear models. … It lays down the material in a logical and intricate manner and makes linear modeling appealing to researchers from virtually all fields of study. -Biometrical Journal

Throughout, it gives plenty of insight … with comments that even the seasoned practitioner will appreciate. Interspersed with R code and the output that it produces one can find many little gems of what I think is sound statistical advice, well epitomized with the examples chosen…I read it with delight and think that the same will be true with anyone who is engaged in the use or teaching of linear models. -Journal of the Royal Statistical Society

Like its widely praised, best-selling companion version, Linear Models with R, this book replaces R with Python to seamlessly give a coherent exposition of the practice of linear modeling. Linear Models with Python offers up-to-date insight on essential data analysis topics, from estimation, inference and prediction to missing data, factorial models and block designs. Numerous examples illustrate how to apply the different methods using Python.

Features:

Python is a powerful, open source programming language increasingly being used in data science, machine learning and computer science. Python and R are similar, but R was designed for statistics, while Python is multi-talented.
This version replaces R with Python to make it accessible to a greater number of users outside of statistics, including those from Machine Learning.
A reader coming to this book from an ML background will learn new statistical perspectives on learning from data.
Topics include Model Selection, Shrinkage, Experiments with Blocks and Missing Data.
Includes an Appendix on Python for beginners.

Linear Models with Python explains how to use linear models in physical science, engineering, social science and business applications. It is ideal as a textbook for linear models or linear regression courses.

Information

Publisher: Chapman and Hall/CRC
Year: 2021
ISBN: 9781351053396
Categories: Economics; Statistics for business and economics

Chapter 1

Introduction

1.1 Before You Start

Statistics starts with a problem, proceeds with the collection of data, continues with the data analysis and finishes with conclusions. It is a common mistake of inexperienced statisticians to plunge into a complex analysis without paying attention to the objectives or even whether the data are appropriate for the proposed analysis. As Einstein said, the formulation of a problem is often more essential than its solution which may be merely a matter of mathematical or experimental skill.

To formulate the problem correctly, you must:

Understand the physical background. Statisticians often work in collaboration with others and need to understand something about the subject area. Regard this as an opportunity to learn something new rather than a chore.
Understand the objective. Again, often you will be working with a collaborator who may not be clear about what the objectives are. Beware of “fishing expeditions” — if you look hard enough, you will almost always find something, but that something may just be a coincidence.
Make sure you know what the client wants. You can often do quite different analyses on the same dataset. Sometimes statisticians perform an analysis far more complicated than the client really needed. You may find that simple descriptive statistics are all that are needed.
Put the problem into statistical terms. This is a challenging step and where irreparable errors are sometimes made. Once the problem is translated into the language of statistics, the solution is often routine. This is where human intelligence is decidedly superior to artificial intelligence. Defining the problem is hard to program. That a statistical method can read in and process the data is not enough. The results of an inapt analysis may be meaningless.

It is important to understand how the data were collected.

Are the data observational or experimental? Are the data a sample of convenience or were they obtained via a designed sample survey? How the data were collected has a crucial impact on what conclusions can be made.
Is there nonresponse? The data you do not see may be just as important as the data you do see.
Are there missing values? This is a common problem that is troublesome and time consuming to handle.
How are the data coded? In particular, how are the categorical variables represented?
What are the units of measurement?
Beware of data entry errors and other corruption of the data. This problem is all too common — almost a certainty in any real dataset of at least moderate size. Perform some data sanity checks.
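A few of the checks in this list can be sketched with pandas. This is only an illustration under stated assumptions: the small data frame, its column names and the plausible-range limits for blood pressure are all hypothetical and do not come from any dataset in this book.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from any study): one missing value,
# one impossible blood pressure of zero and one duplicated row.
df = pd.DataFrame({
    "age": [34, 51, 51, 27, 45],
    "bp": [72.0, 80.0, 80.0, 0.0, np.nan],
    "group": ["a", "b", "b", "a", "c"],
})

# How is each variable stored? Categorical codes often arrive as strings or integers.
print(df.dtypes)

# Number of missing values in each column.
n_missing = df.isna().sum()
print(n_missing)

# Values outside a plausible range; the limits are illustrative assumptions.
bad_bp = df[(df["bp"] <= 0) | (df["bp"] > 250)]
print(bad_bp)

# Exact duplicate rows can signal a data entry error.
n_dup = int(df.duplicated().sum())
print(n_dup)

# Levels of a categorical variable and how often each occurs.
print(df["group"].value_counts())
```

None of these checks replaces knowing how the data were collected, but they are cheap to run and often expose problems before any modeling begins.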

1.2 Initial Data Analysis

This is a critical step that should always be performed. It is simple but it is vital. You should make numerical summaries such as means, standard deviations (SDs), maximum and minimum, correlations and whatever else is appropriate to the specific dataset. Equally important are graphical summaries. There is a wide variety of techniques to choose from. For one variable at a time, you can make boxplots, histograms, density plots and more. For two variables, scatterplots are standard while for even more variables, there are numerous good ideas for display including interactive and dynamic graphics. In the plots, look for outliers, data-entry errors, skewed or unusual distributions and structure. Check whether the data are distributed according to prior expectations.
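As a rough sketch of the numerical summaries mentioned above, the following uses simulated data, so the variable names `x` and `y`, the sample size and the three-SD screening rule are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Simulated data standing in for a real dataset (an assumption for illustration).
rng = np.random.default_rng(0)
n = 50
x = rng.normal(loc=50, scale=10, size=n)
y = 2 * x + rng.normal(scale=5, size=n)
df = pd.DataFrame({"x": x, "y": y})

# Count, mean, SD, minimum, quartiles and maximum for each column.
summary = df.describe()
print(summary.round(2))

# Pairwise Pearson correlations.
corr = df.corr()
print(corr.round(2))

# A crude outlier screen: rows with any value more than 3 SDs from its column mean.
z = (df - df.mean()) / df.std()
outliers = df[(z.abs() > 3).any(axis=1)]
print(len(outliers))
```

Graphical counterparts, such as `df.hist()` for one variable at a time or `df.plot.scatter(x="x", y="y")` for two, would normally be viewed alongside these numbers.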

Getting data into a form suitable for analysis by cleaning out mistakes and aberrations is often time consuming. It often takes more time than the data analysis itself. One might consider this the core work of data science. In this book, all the data will be ready to analyze, but you should realize that in practice this is rarely the case.

Let’s look at an example. The National Institute of Diabetes and Digestive and Kidney Diseases conducted a study on 768 adult female Pima Indians living near Phoenix. The following variables were recorded: number of times pregnant, plasma glucose concentration at 2 hours in an oral glucose tolerance test, diastolic blood pressure (mmHg), triceps skin fold thickness (mm), 2-hour serum insulin (mu U/ml), body mass index (weight in kg/(height in m²)), diabetes pedigree function, age (years) and a test whether the patient showed signs of diabetes (coded zero if negative, one if positive). The data may be obtained from UCI Repository of machine learning databases at archive.ics.uci.edu/ml.

Base Python has only limited functionality for numerical work. You will surely need to import some packages before you can accomplish anything. It is common to load all the packages you will need in a session at the beginning. We start with:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import scipy as sp
    import seaborn as sns
    import statsmodels.formula.api as smf

You can wait until you need them, but listing them all at the beginning is helpful when you share or return to your work later, since everyone can then see which packages are required. The as pd means we can refer to functions in the pandas package with the abbreviation pd.
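A minimal illustration of the aliasing, using a made-up column of four numbers:

```python
import numpy as np
import pandas as pd

# pd.DataFrame refers to pandas.DataFrame; the values are made up.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

# np.mean refers to numpy.mean.
mean_x = np.mean(df["x"])
print(mean_x)
```

The abbreviations pd and np are conventions rather than requirements, but using them makes your code easier for others to read.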

Before doing anything else, one should find out the purpose of the study and more about how the data were collected. However, let’s skip ahead to a look at the data:

    import faraway.datasets.pima
    pima = faraway.datasets.pima.load()
    pima.head()

       pregnant  glucose  diastolic  triceps  insulin   bmi  diabetes  age  test
    0         6      148         72       35        0  33.6     0.627   50     1
    1         1       85         66       29        0  26.6     0.351   31     0
    2         8      183         64        0        0  23.3     0.672   32     1
    3         1       89         66       23       94  28.1     0.167   21     0
    4         0      137         40       35      168  43.1     2.288   33     1

Many of the datasets used in this book are supplied in the faraway package. See the appendix for how to install this package. Any time you want to use one of these datasets, you will need to import the package containing the data you require and then load it.
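If the faraway package is not yet installed, the five rows shown in the pima.head() output can be typed in by hand to follow along. This is only a sketch: the name `pima_head` is hypothetical, and the result contains just those five rows, not the full set of 768 cases.

```python
import pandas as pd

# The first five rows exactly as shown in the pima.head() output.
pima_head = pd.DataFrame({
    "pregnant": [6, 1, 8, 1, 0],
    "glucose": [148, 85, 183, 89, 137],
    "diastolic": [72, 66, 64, 66, 40],
    "triceps": [35, 29, 0, 23, 35],
    "insulin": [0, 0, 0, 94, 168],
    "bmi": [33.6, 26.6, 23.3, 28.1, 43.1],
    "diabetes": [0.627, 0.351, 0.672, 0.167, 2.288],
    "age": [50, 31, 32, 21, 33],
    "test": [1, 0, 1, 0, 1],
})

# Number of rows and columns, then the storage type of each variable.
print(pima_head.shape)
print(pima_head.dtypes)
```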

The command pima.head() prints out the first five lines of the data frame. This is a good way to see what variables we have and what sort of values they take. You can type pima to see the whole data frame but...