Beyond Spreadsheets with R
eBook - ePub

Beyond Spreadsheets with R

Jonathan Carroll

Share book
  1. English
  2. ePUB (mobile friendly)
  3. Available on iOS & Android
eBook - ePub

Beyond Spreadsheets with R

Jonathan Carroll

Book details
Book preview
Table of contents
Citations

About This Book

Beyond Spreadsheets with R shows you how to take raw data and transform it for use in computations, tables, graphs, and more. You'll build on simple programming techniques like loops and conditionals to create your own custom functions. You'll come away with a toolkit of strategies for analyzing and visualizing data of all sorts using R and RStudio.

Frequently asked questions

How do I cancel my subscription?
Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.
Can/how do I download books?
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
What is the difference between the pricing plans?
Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.
What is Perlego?
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Do you support text-to-speech?
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Is Beyond Spreadsheets with R an online PDF/ePUB?
Yes, you can access Beyond Spreadsheets with R by Jonathan Carroll in PDF and/or ePUB format, as well as other popular books in Informatik & Datenverarbeitung. We have over one million books available in our catalogue for you to explore.

Information

Year
2018
ISBN
9781617294594

1
Introducing data and the R language

This chapter covers
  • Why data analysis is important
  • How to make your analysis robust
  • How and why R works with data
  • RStudio: Your interface to R
You have your data, and you want to start doing something awesome with it, right? Brilliant! I promise you, we’ll get to that as soon as we can. But first, let’s take a step back. Telling you to dive right in now would be like handing you a pile of different timbers, pointing you toward the workshop, and telling you to make some furniture. It’s a good idea to first understand both the materials and the tools you’re about to use.
We’ll go through what data means in general — to you and to those who may potentially inherit your data — because if you don’t fully comprehend what you already have, then building on that won’t be useful (and at worst will be flat out wrong). Poorly preparing data merely delays dealing with it properly and grows your technical debt (making things easier now, but later making it necessary to pay back that time when you have difficulties working with poorly formed data).
We’ll discuss how to set yourself up for a rigorous analysis (one that can be repeated) and then begin working with one of the best data analysis tools available: the R programming language. For now, let’s go through what it means to “have some data.”

1.1 Data: What, where, how?

I said you have some data that you want to do something with, which wasn’t a very precise statement. That was intentional. I guarantee you have some data even if you don’t realize it. You may be thinking that data is exclusively whatever is stored in your Excel file, but data is much more than that. We all have data, because it’s everywhere. Before you go analyzing your own data, it’s important to recognize its structure (both as you understand it, and as R will) so that you begin with a solid foundation of what it means to have some data.

1.1.1 What is data?

Data exists in many forms, not just as numbers and letters in a spreadsheet. It may also be stored in a different file type, such as comma-separated values (CSV), as words in a book, or as values in a table on a web page.
Note It’s common to store comma-separated values in a .csv file. This format is particularly useful because it’s plain text — values separated by commas. We’ll return to why that’s useful in section 1.1.6.
Data may not be stored at all — streaming data comes as a flow of information, such as the signal your TV picks up and processes, your Twitter feed, or the output from a measuring device. We can store this data if we want to, but often we want to understand the flow as it’s happening.
Data isn’t always pretty (in fact, most times it’s dirty, mundane, and seemingly uninteresting), and it isn’t always in the format we want. Having some tools on hand to manage data is a powerful advantage and is critical to achieving a reliable goal, but that’s only useful if you know what your data represents before you do anything further with it. “Garbage in, garbage out” warns that you can’t perform an analysis on terrible data and expect to get a meaningful result. You may very well have tried to evaluate a calculation in Excel only to have the result show up as #VALUE! because you tried to divide a number by some text, even though that “text” looked like numbers. The types of your values (text, numbers, images, and so on) are themselves pieces of data with possible meanings behind them, and you’ll learn how to best make use of them.
So what is “good data”? What do the values you have represent?

1.1.2 Seeing the world as data sources

We experience the world through our senses — touching, seeing, hearing, tasting, smelling, and generally absorbing life around us. Each of those input channels handles available data, and our brains process them, mixing the signals together to form our picture of the world in a brilliantly complex way that we constantly take for granted.
Every time you use any of your senses, you’re taking a measurement of the world. How bright is the sun today? Is a car approaching? Is something burning? Is there enough coffee left in the pot for another cup? We construct measuring tools to make life easier for us and handle some of the data consistently — thermometers to measure temperatures, scales to measure weights, rulers to measure lengths.
We go a step further and create more tools to summarize that data — car instrument panels to simplify the internal measurements of the engine; weather stations to summarize temperature, wind, and pressure. With the digital age, we now have an overload of data sources at our disposal. The internet provides data on virtually any and all aspects of the world we might be interested in, and we create more tools to manage these — weather, finance, social media, the number of astronauts currently in space (www.howmanypeopleareinspacerightnow.com), lists of episodes of The Simpsons, all available at our disposal. The world is truly made up of data.
That’s not to say the data is in any way finite. We constantly add to the available sources of data, and by asking new questions we can identify new data we want to obtain. Data itself also generates more data. Metadata is the additional data that describes some other data — the number of subjects in a trial, the units of a measurement, the time at which a sample was taken, the website from which the data was collected. All these are data too and need to be stored, maintained, and updated as they change.
You interact with data in various ways all the time. One of the greatest achievements of the World Wide Web has been to gather, collate, and summarize our data for us in more easily digestible forms. Think about how you would have requested a taxi 20 years ago, before the rise of smartphones and the app ecosystem. You’d look up the phone number of a taxi company, phone them, tell the dispatcher where you were or would be, where you wanted to go, and what time you wanted to be picked up. The dispatcher would send out ...

Table of contents