eBook - ePub
Beyond Spreadsheets with R
Jonathan Carroll
About This Book
Beyond Spreadsheets with R shows you how to take raw data and transform it for use in computations, tables, graphs, and more. You'll build on simple programming techniques like loops and conditionals to create your own custom functions. You'll come away with a toolkit of strategies for analyzing and visualizing data of all sorts using R and RStudio.
1 Introducing data and the R language
This chapter covers
- Why data analysis is important
- How to make your analysis robust
- How and why R works with data
- RStudio: Your interface to R
You have your data, and you want to start doing something awesome with it, right? Brilliant! I promise you, we'll get to that as soon as we can. But first, let's take a step back. Telling you to dive right in now would be like handing you a pile of different timbers, pointing you toward the workshop, and telling you to make some furniture. It's a good idea to first understand both the materials and the tools you're about to use.
We'll go through what data means in general, to you and to those who may potentially inherit your data, because if you don't fully comprehend what you already have, then building on that won't be useful (and at worst will be flat-out wrong). Poorly preparing data merely delays dealing with it properly and grows your technical debt (making things easier now, but later making it necessary to pay back that time when you have difficulties working with poorly formed data).
We'll discuss how to set yourself up for a rigorous analysis (one that can be repeated) and then begin working with one of the best data analysis tools available: the R programming language. For now, let's go through what it means to "have some data."
1.1 Data: What, where, how?
I said you have some data that you want to do something with, which wasn't a very precise statement. That was intentional. I guarantee you have some data even if you don't realize it. You may be thinking that data is exclusively whatever is stored in your Excel file, but data is much more than that. We all have data, because it's everywhere. Before you go analyzing your own data, it's important to recognize its structure (both as you understand it, and as R will) so that you begin with a solid foundation of what it means to have some data.
1.1.1 What is data?
Data exists in many forms, not just as numbers and letters in a spreadsheet. It may also be stored in a different file type, such as comma-separated values (CSV), as words in a book, or as values in a table on a web page.
Note: It's common to store comma-separated values in a .csv file. This format is particularly useful because it's plain text: values separated by commas. We'll return to why that's useful in section 1.1.6.
Data may not be stored at all: streaming data comes as a flow of information, such as the signal your TV picks up and processes, your Twitter feed, or the output from a measuring device. We can store this data if we want to, but often we want to understand the flow as it's happening.
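To make the plain-text point concrete, here is a minimal R sketch. The names and heights are invented for illustration and do not come from the book. It shows that CSV content is just characters separated by commas, and that base R's read.csv() can turn that text into a table.

```r
# Hypothetical data, made up for this illustration only.
csv_text <- "name,height_cm
Alice,165
Bob,180"

# The raw form is ordinary text: values separated by commas.
writeLines(csv_text)

# read.csv() (base R) parses that text into a data.frame.
# The `text` argument lets us read from a string instead of a file.
people <- read.csv(text = csv_text)
print(people)

# Once parsed, the height column holds numbers we can compute with.
mean(people$height_cm)  # 172.5
```

In practice you would usually pass a file path to read.csv() rather than a string; the string is used here only so the example is self-contained.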
Data isn't always pretty (in fact, most times it's dirty, mundane, and seemingly uninteresting), and it isn't always in the format we want. Having some tools on hand to manage data is a powerful advantage and is critical to achieving a reliable goal, but that's only useful if you know what your data represents before you do anything further with it. "Garbage in, garbage out" warns that you can't perform an analysis on terrible data and expect to get a meaningful result. You may very well have tried to evaluate a calculation in Excel only to have the result show up as
#VALUE!
because you tried to divide a number by some text, even though that "text" looked like numbers. The types of your values (text, numbers, images, and so on) are themselves pieces of data with possible meanings behind them, and you'll learn how to best make use of them.
So what is "good data"? What do the values you have represent?
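The number-versus-text problem described above shows up in R as well. The following is a small sketch with made-up values, not an example from the book: a value that looks like a number but is stored as text cannot be used in arithmetic until it is converted.

```r
# Illustrative values only.
looks_numeric <- "10"  # text (character), even though it looks like a number
is_numeric    <- 10    # a genuine number

class(looks_numeric)   # "character"
class(is_numeric)      # "numeric"

# Dividing a number by text is an error in R. tryCatch() captures the
# error message instead of stopping the script.
result <- tryCatch(
  is_numeric / looks_numeric,
  error = function(e) conditionMessage(e)
)
print(result)

# Converting the text to a number first makes the calculation valid.
is_numeric / as.numeric(looks_numeric)  # 1
```

Checking class() before calculating is one simple way to catch this kind of mismatch early.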
1.1.2 Seeing the world as data sources
We experience the world through our senses: touching, seeing, hearing, tasting, smelling, and generally absorbing life around us. Each of those input channels handles available data, and our brains process them, mixing the signals together to form our picture of the world in a brilliantly complex way that we constantly take for granted.
Every time you use any of your senses, you're taking a measurement of the world. How bright is the sun today? Is a car approaching? Is something burning? Is there enough coffee left in the pot for another cup? We construct measuring tools to make life easier for us and handle some of the data consistently: thermometers to measure temperatures, scales to measure weights, rulers to measure lengths.
We go a step further and create more tools to summarize that data: car instrument panels to simplify the internal measurements of the engine; weather stations to summarize temperature, wind, and pressure. With the digital age, we now have an overload of data sources at our disposal. The internet provides data on virtually any and all aspects of the world we might be interested in, and we create more tools to manage these: weather, finance, social media, the number of astronauts currently in space (www.howmanypeopleareinspacerightnow.com), lists of episodes of The Simpsons, all available at our disposal. The world is truly made up of data.
That's not to say the data is in any way finite. We constantly add to the available sources of data, and by asking new questions we can identify new data we want to obtain. Data itself also generates more data. Metadata is the additional data that describes some other data: the number of subjects in a trial, the units of a measurement, the time at which a sample was taken, the website from which the data was collected. All these are data too and need to be stored, maintained, and updated as they change.
You interact with data in various ways all the time. One of the greatest achievements of the World Wide Web has been to gather, collate, and summarize our data for us in more easily digestible forms. Think about how you would have requested a taxi 20 years ago, before the rise of smartphones and the app ecosystem. You'd look up the phone number of a taxi company, phone them, tell the dispatcher where you were or would be, where you wanted to go, and what time you wanted to be picked up. The dispatcher would send out ...