Beyond Spreadsheets with R

Jonathan Carroll

Book information

Beyond Spreadsheets with R shows you how to take raw data and transform it for use in computations, tables, graphs, and more. You'll build on simple programming techniques like loops and conditionals to create your own custom functions. You'll come away with a toolkit of strategies for analyzing and visualizing data of all sorts using R and RStudio.

Information

Year
2018
ISBN
9781617294594

1 Introducing data and the R language

This chapter covers
  • Why data analysis is important
  • How to make your analysis robust
  • How and why R works with data
  • RStudio: Your interface to R
You have your data, and you want to start doing something awesome with it, right? Brilliant! I promise you, we’ll get to that as soon as we can. But first, let’s take a step back. Telling you to dive right in now would be like handing you a pile of different timbers, pointing you toward the workshop, and telling you to make some furniture. It’s a good idea to first understand both the materials and the tools you’re about to use.
We’ll go through what data means in general, both to you and to those who may inherit your data, because if you don’t fully comprehend what you already have, then building on it won’t be useful (and at worst will be flat-out wrong). Preparing data poorly merely delays dealing with it properly and grows your technical debt: you save time now, but pay it back later when you struggle to work with poorly formed data.
We’ll discuss how to set yourself up for a rigorous analysis (one that can be repeated) and then begin working with one of the best data analysis tools available: the R programming language. For now, let’s go through what it means to “have some data.”

1.1 Data: What, where, how?

I said you have some data that you want to do something with, which wasn’t a very precise statement. That was intentional. I guarantee you have some data even if you don’t realize it. You may be thinking that data is exclusively whatever is stored in your Excel file, but data is much more than that. We all have data, because it’s everywhere. Before you go analyzing your own data, it’s important to recognize its structure (both as you understand it, and as R will) so that you begin with a solid foundation of what it means to have some data.

1.1.1 What is data?

Data exists in many forms, not just as numbers and letters in a spreadsheet. It may also be stored in a different file type, such as comma-separated values (CSV), as words in a book, or as values in a table on a web page.
Note: It’s common to store comma-separated values in a .csv file. This format is particularly useful because it’s plain text: values separated by commas. We’ll return to why that’s useful in section 1.1.6.
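To make the "plain text" point concrete, here is a minimal sketch in R. The column names and values are invented for illustration; they don't come from the book. It uses base R's read.csv() with its text argument, so no file on disk is needed.

```r
# A CSV "file" is nothing more than lines of text with commas between values.
# The names and heights below are made-up example data.
csv_text <- "name,height_cm
Alice,165
Bob,180"

# read.csv() can parse the text directly through its `text` argument;
# the first line is treated as the column headers.
people <- read.csv(text = csv_text)

print(people)

# Once parsed, the height column is numeric and can be used in calculations.
average_height <- mean(people$height_cm)
print(average_height)
```

Because the data is ordinary text, the same content could be opened in any text editor, which is part of what makes the format easy to share.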
Data may not be stored at all — streaming data comes as a flow of information, such as the signal your TV picks up and processes, your Twitter feed, or the output from a measuring device. We can store this data if we want to, but often we want to understand the flow as it’s happening.
Data isn’t always pretty (in fact, most of the time it’s dirty, mundane, and seemingly uninteresting), and it isn’t always in the format we want. Having some tools on hand to manage data is a powerful advantage and is critical to achieving a reliable result, but that’s only useful if you know what your data represents before you do anything further with it. “Garbage in, garbage out” warns that you can’t perform an analysis on terrible data and expect to get a meaningful result. You may very well have tried to evaluate a calculation in Excel only to have the result show up as #VALUE! because you tried to divide a number by some text, even though that “text” looked like numbers. The types of your values (text, numbers, images, and so on) are themselves pieces of data with possible meanings behind them, and you’ll learn how to best make use of them.
So what is “good data”? What do the values you have represent?
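The Excel #VALUE! situation described above has a close analogue in R. The following is a small sketch, with variable names chosen purely for illustration: a value that looks like a number but is stored as text can't be used in arithmetic until it's converted.

```r
# This value looks like a number, but the quotes make it text.
looks_like_number <- "2"

# class() reports the type R has assigned to the value.
value_type <- class(looks_like_number)
print(value_type)

# Dividing a number by text is an error in R. tryCatch() captures the
# error message so the script keeps running instead of stopping here.
attempt <- tryCatch(
  10 / looks_like_number,
  error = function(e) conditionMessage(e)
)
print(attempt)  # an error message, not a numeric result

# Converting the text to a number first makes the division work.
converted <- 10 / as.numeric(looks_like_number)
print(converted)
```

Checking the type of a value before computing with it is one simple way to catch this kind of problem early.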

1.1.2 Seeing the world as data sources

We experience the world through our senses — touching, seeing, hearing, tasting, smelling, and generally absorbing life around us. Each of those input channels handles available data, and our brains process them, mixing the signals together to form our picture of the world in a brilliantly complex way that we constantly take for granted.
Every time you use any of your senses, you’re taking a measurement of the world. How bright is the sun today? Is a car approaching? Is something burning? Is there enough coffee left in the pot for another cup? We construct measuring tools to make life easier for us and handle some of the data consistently — thermometers to measure temperatures, scales to measure weights, rulers to measure lengths.
We go a step further and create more tools to summarize that data: car instrument panels to simplify the internal measurements of the engine; weather stations to summarize temperature, wind, and pressure. With the digital age, we now have an overload of data sources at our disposal. The internet provides data on virtually every aspect of the world we might be interested in, and we create more tools to manage it all: weather, finance, social media, the number of astronauts currently in space (www.howmanypeopleareinspacerightnow.com), lists of episodes of The Simpsons. The world is truly made up of data.
That’s not to say the data is in any way finite. We constantly add to the available sources of data, and by asking new questions we can identify new data we want to obtain. Data itself also generates more data. Metadata is the additional data that describes some other data — the number of subjects in a trial, the units of a measurement, the time at which a sample was taken, the website from which the data was collected. All these are data too and need to be stored, maintained, and updated as they change.
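As one possible illustration of metadata travelling with data, base R lets you attach attributes to a value with attr(). The measurements, units, and source below are hypothetical; this is only a sketch of the idea, not a recommended storage scheme.

```r
# Some made-up temperature measurements.
temperatures <- c(21.4, 22.1, 19.8)

# Attach metadata describing the measurements. Both the attribute names
# and their values are assumptions chosen for this example.
attr(temperatures, "units") <- "degrees Celsius"
attr(temperatures, "source") <- "hypothetical weather station"

# The metadata can be read back whenever it's needed.
recorded_units <- attr(temperatures, "units")
print(recorded_units)

# The values themselves still behave as ordinary numbers.
average_temperature <- mean(temperatures)
print(average_temperature)
```

Keeping the units next to the values means anyone who inherits the data doesn't have to guess what the numbers mean.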
You interact with data in various ways all the time. One of the greatest achievements of the World Wide Web has been to gather, collate, and summarize our data for us in more easily digestible forms. Think about how you would have requested a taxi 20 years ago, before the rise of smartphones and the app ecosystem. You’d look up the phone number of a taxi company, phone them, tell the dispatcher where you were or would be, where you wanted to go, and what time you wanted to be picked up. The dispatcher would send out ...
