R for Political Data Science
eBook - ePub

A Practical Guide

  1. 436 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
About this book

R for Political Data Science: A Practical Guide is a handbook for political scientists new to R who want to learn the most useful and common ways to interpret and analyze political data. It was written by political scientists with the many real-world problems of their work in mind. The book has 16 chapters and is organized in three sections. The first, on the use of R, is for users who are learning R or migrating from other software. The second section, on econometric models, covers OLS, binary and survival models, panel data, and causal inference. The third section is a data science toolbox of some of the most useful tools in the discipline: data imputation, fuzzy merge of large datasets, web mining, quantitative text analysis, network analysis, mapping, spatial cluster analysis, and principal component analysis.

Key features:



  • Each chapter has the most up-to-date and simple option available for each task, assuming minimal prerequisites and no previous experience in R
  • Makes extensive use of the Tidyverse, the group of packages that has revolutionized the use of R
  • Provides a step-by-step guide that you can replicate using your own data
  • Includes exercises in every chapter for course use or self-study
  • Focuses on practical approaches to statistical inference rather than mathematical formulae
  • Supplemented by an R package, including all data

As the title suggests, this book is highly applied in nature and is designed as a toolbox for the reader. It can be used in methods and data science courses at both the undergraduate and graduate levels. It will be equally useful for a university student pursuing a PhD, a political consultant, or a public official, all of whom need to transform their datasets into substantive and easily interpretable conclusions.

R for Political Data Science, by Francisco Urdinez and Andres Cruz, is available in PDF and/or ePUB format and is listed under Politics & International Relations and Probability & Statistics.
Part II
Models
5 Linear Models
Inés Fynn and Lihuen Nocetto
Suggested readings
Angrist, J. D. and Pischke, J. S. (2008). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press, Princeton, NJ.
Dunning, T. (2012). Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge University Press, Cambridge.
Lewis-Beck, C. and Lewis-Beck, M. (2016). Applied Regression: An Introduction. SAGE, Thousand Oaks, CA.
Wooldridge, J. M. (2016). Introductory Econometrics: A Modern Approach. Cengage Learning, Boston, MA, 6th edition.
Packages you need to install
tidyverse (Wickham, 2019), politicalds (Urdinez and Cruz, 2020), skimr (Waring et al., 2020), car (Fox et al., 2020), ggcorrplot (Kassambara, 2019), texreg (Leifeld, 2020), prediction (Leeper, 2019), lmtest (Hothorn et al., 2019), sandwich (Zeileis and Lumley, 2019), miceadds (Robitzsch et al., 2020).
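A minimal sketch of how one might check which of these packages are still missing before installing anything. It uses only base R. Two assumptions are made here: that the nine packages in `cran_pkgs` are available from CRAN, and that the book's companion package lives in a GitHub repository named `arcruz0/politicalds`. Both should be checked against the book's own installation instructions, which is why the installation calls are left commented out.

```r
# Packages from the list above that are assumed to be distributed on CRAN
cran_pkgs <- c("tidyverse", "skimr", "car", "ggcorrplot", "texreg",
               "prediction", "lmtest", "sandwich", "miceadds")

# Which of them are not yet installed in the current library paths?
missing_pkgs <- setdiff(cran_pkgs, rownames(installed.packages()))
print(missing_pkgs)

# Uncomment to install whatever is missing
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

# politicalds is the book's companion package; the repository name below
# is an assumption, so verify it before running
# remotes::install_github("arcruz0/politicalds")
```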
5.0.1 Introduction
In this chapter, we will learn how to run linear regressions. In the simplest case, with a single explanatory variable, the fitted function is a straight line described by two parameters: the intercept and the slope. In a multivariate analysis, with several explanatory variables, the estimation gets more complex. We will cover how to interpret the different coefficients, how to create regression tables, and how to visualize predicted values. We will also go further into evaluating the Ordinary Least Squares (OLS) assumptions, so that you can evaluate how well your models fit.
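The two-parameter case can be illustrated with a short sketch on simulated data (the numbers are made up for illustration and have nothing to do with the chapter's dataset). It fits a line with `lm()` and then recomputes the slope and intercept by hand from the usual OLS formulas, slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x).

```r
# Simulated data: y depends linearly on x plus random noise
set.seed(123)
n <- 200
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.1)

# Fit the bivariate linear model by OLS
fit <- lm(y ~ x)
print(coef(fit))  # first element: intercept, second element: slope

# The same two parameters computed by hand
slope_manual <- cov(x, y) / var(x)
intercept_manual <- mean(y) - slope_manual * mean(x)
print(c(intercept_manual, slope_manual))
```

Both printed vectors should agree up to floating-point precision, since `lm()` solves the same least-squares problem.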
5.1 OLS in R
In this chapter, the dataset we will work with is a merge of two datasets constructed by Evelyne Huber and John D. Stephens. These datasets are:
  • Latin America Welfare Dataset, 1960-2014 (Evelyne Huber and John D. Stephens, Latin American Welfare Dataset, 1960-2014, University of North Carolina at Chapel Hill, 2014): it contains variables on Welfare States in all Latin American and Caribbean countries between 1960 and 2014.
  • Latin America and Caribbean Political Data Set, 1945-2012 (Evelyne Huber and John D. Stephens, Latin America and Caribbean Political Dataset, 1945-2012, University of North Carolina at Chapel Hill, 2012): it contains political variables for all Latin American and Caribbean countries between 1945 and 2012.
The resulting dataset contains 1074 observations for 25 countries between 1970 and 2012 (data from the 1960s was excluded since it contained many missing values).
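The text does not describe how the two sources were combined, so the following is only a sketch of the general idea: two country-year tables are joined on their shared keys and the years before 1970 are dropped. The data frames `welfare_toy` and `political_toy` and all their values are hypothetical, and base R's `merge()` is used to keep the example free of dependencies.

```r
# Hypothetical welfare table, one row per country-year
welfare_toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1969, 1970, 1969, 1970),
  gini    = c(50.1, 49.8, 55.3, 54.9)
)

# Hypothetical political table with the same keys
political_toy <- data.frame(
  country     = c("A", "A", "B", "B"),
  year        = c(1969, 1970, 1969, 1970),
  regime_type = c(1, 1, 2, 2)
)

# Join on country and year, keeping rows present in both tables
merged_toy <- merge(welfare_toy, political_toy, by = c("country", "year"))

# Drop observations from before 1970
merged_toy <- merged_toy[merged_toy$year >= 1970, ]

print(merged_toy)
print(nrow(merged_toy))                     # number of observations
print(length(unique(merged_toy$country)))   # number of countries
```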
First, we load the tidyverse package.
 library(tidyverse) 
We will import the dataset from the book’s package:
 library(politicalds)
 data("welfare")
Now, the dataset has been loaded into our R session:
 ls()
 ## [1] "welfare"
In this chapter, we will use the article by Huber et al. (2006) as our example for analysis. In it, the authors estimate the determinants of inequality in Latin America and the Caribbean. Working from this article allows us to estimate a model with multiple control variables that have already been identified as relevant for explaining the variation of inequality in the region. Thus, the dependent variable we are interested in explaining is income inequality in Latin American and Caribbean countries, operationalized with the Gini Index (gini). The control variables that we will incorporate into the model are the following:
  • Sectoral dualism (the coexistence of a traditional low-productivity sector and a modern high-productivity sector) - sector_dualism
  • GDP - gdp
  • Foreign Direct Investment (net income as % of GDP) - foreign_inv
  • Ethnic diversity (dummy variable coded as 1 when at least 20% but no more than 80% of the population is ethnically diverse) - ethnic_diversity
  • Democracy (type of regime) - regime_type
  • Education expenditure (as percentage of GDP) - education_budget
  • Health expenditure (as percentage of GDP) - health_budget
  • Social security expenditure (as percentage of GDP) - socialsec_budget
  • Legislative balance - legislative_bal
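As an illustration of how a dummy variable like ethnic_diversity can be coded, here is a minimal sketch. The vector `diverse_share` and its values are hypothetical, and the reading of the rule (a population share between 20% and 80%, both inclusive) is an assumption based on the description above, not the dataset's documented coding procedure.

```r
# Hypothetical shares of the population counted as ethnically diverse
diverse_share <- c(0.05, 0.20, 0.50, 0.80, 0.95)

# 1 when the share is at least 20% and no more than 80%, 0 otherwise
ethnic_diversity <- as.integer(diverse_share >= 0.20 & diverse_share <= 0.80)

print(data.frame(diverse_share, ethnic_diversity))
```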
Throughout this chapter, we will try to estimate the effect of education expenditure on the level of inequality in Latin American and Caribbean countries. Thus, our independent variable of interest will be education_budget.
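To make the setup concrete, the sketch below shows what a model of this kind looks like in R: gini as the dependent variable, education_budget as the independent variable of interest, and a couple of the controls. The data are simulated with an invented data-generating process, so the coefficients say nothing about the real welfare dataset or about the results of Huber et al. (2006); only the variable names are borrowed from the list above.

```r
# Simulated stand-in for the real data (all values are made up)
set.seed(2020)
n <- 300
sim <- data.frame(
  education_budget = runif(n, min = 1, max = 8),      # % of GDP
  gdp              = rnorm(n, mean = 5000, sd = 1500),
  foreign_inv      = rnorm(n, mean = 2, sd = 1)
)

# Invented data-generating process: more education spending, lower Gini
sim$gini <- 55 - 1.2 * sim$education_budget + 0.0005 * sim$gdp +
  rnorm(n, sd = 2)

# Multiple linear regression estimated by OLS
model <- lm(gini ~ education_budget + gdp + foreign_inv, data = sim)

# Estimates, standard errors, t statistics and p-values
print(summary(model)$coefficients)
```

With the real data, the same formula interface would be used with `data = welfare` and the full set of controls.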
5.1.1 Descriptive Statistics
Before estimating a linear model with Ordinary Least Squares (OLS) it is recommended you first identify the distribution of the variables you are interested in: the dependent variable y (also called response variable) and the independent variable of interest x (also called explanatory variable or regressor). In general, our models will have, besides the independent variable of interest, other independent (or explanatory) variables that we will call “controls”, since t...

Table of contents

  1. Cover
  2. Half Title
  3. Series Page
  4. Title Page
  5. Copyright Page
  6. Contents
  7. Preface
  8. Contributors
  9. I. Introduction to R
  10. II. Models
  11. III. Applications
  12. IV. Bibliography and Index
  13. Bibliography
  14. Index