Data Analysis in R

This course is intended for applied data analysts, including academics and postgraduate students, policy specialists and others. It will examine questions dealt with in public policy, the social sciences and industry, using real data, including survey, economic and public health data. The unit will help build participants’ ability to undertake rigorous statistical analysis, including means, confidence intervals and linear regression in R, and to create publication-standard graphs of the results. The result will be more professional, easier-to-understand research. It provides the foundational skills needed for the ACSPRI course Using R for Advanced Statistical Analysis.

Level 2 - runs over 5 days
Instructor: 

Dr Shaun Ratcliff is a quantitative political scientist working at the United States Studies Centre at the University of Sydney. His research focuses on using traditional and novel data sources and methods to study public opinion and party behaviour in the US, Australia and comparative democracies. He is particularly interested in the policy preferences and behaviour of political actors, the role of parties as interest aggregators, and how these influence public policy outcomes.

He teaches voter behaviour and public opinion, and the use of quantitative research methods to solve problems in the social sciences.

Prior to working at the University of Sydney, Shaun taught politics, political psychology and methodology in the social sciences at Monash University and the University of Melbourne.

He is an advocate for the use of quantitative research methods to better understand politics and society, and is a member of the executive committee of the Australian Society for Quantitative Political Science.

Shaun received a PhD in political science from Monash University. He has a background in politics and government relations, and has consulted for political campaigns.

About this course: 

We are in the middle of a data revolution. A new laptop computer can run processes that would have been impossible for a supercomputer a few generations ago. The internet makes data collection and distribution easier and cheaper than ever, with terabytes of information on consumer behaviour, public transport use, crime statistics and election results, sourced from across the world, now available almost anywhere within seconds or minutes.

These advances in modern computing allow us to begin answering important questions about the world, including what drives regional health issues, why voters made certain choices during elections, and whether individuals convicted of serious crimes are likely to reoffend.

Taught by a quantitative political scientist from the University of Sydney, this is a problem-based course for subject matter experts who want to use R to take their quantitative analysis to the next level. By the end of the week you will be better able to conduct descriptive analysis and regression in R, and will be able to create impressive-looking data visualisations.

R is open source and free. It is flexible, powerful and intuitive, and it is excellent for data visualisation. Because it is open source, R has thousands of developers in leading universities, corporate research labs and other institutions across the world. This means its capabilities tend to exceed those of competing software, with new packages added or updated daily. This is particularly the case for data visualisation, in which R tends to lead the pack. As there is no licence fee, you can take it with you wherever you go: you don't have to change software packages when you change employers. Consequently, R has become increasingly popular for academic research, economic analysis and public policy development, and this trend is likely to continue.

Being skilled in R will help build your personal capabilities and employment opportunities by making you a more flexible worker, capable of undertaking analysis that many other researchers and analysts cannot.

No prior experience with R or sophisticated quantitative methods is required for this course. Participants should be computer literate, use data in their occupations (or studies, if they are students) and understand some of the basics of statistics. Some basic knowledge of regression is helpful, as is the ability to do simple coding and programming.

If you are unsure whether this is for you, please contact Shaun for more information. He can talk you through the course and the kinds of things you will cover.

Course syllabus: 

Day 1

Operating in the R environment

The first day of the course will explore how to operate in the R environment. We will load and recode data in R, and calculate descriptive statistics. We will visualise data so you can better understand its structure. The first day will also cover how to plot the trends of multiple indicators (for instance, the stock market) and public opinion data in a way that looks professional and sophisticated, with just a few lines of code.

Day 2

Understanding your data

On the second day of the course, we will conduct more complex descriptive analyses, using survey data to understand smoking behaviour amongst Australian adolescents and earnings in the United States. We will also look at creating more complex visualisations of our results, plotting confidence intervals, and manipulating our data.

We will finish the day by working in small groups to examine what individual characteristics explain quality of life.

Day 3

Spatial Data

Much of human behaviour can be understood (at least in part) as a function of geography. This includes election outcomes (often decided by discrete geographic contests), crime and public health. In this lecture, we will discuss the importance of geographic data for understanding social phenomena. For instance, we will compare different ways of visualising data (graphs versus maps) to show how presenting and studying geographic patterns can sometimes improve our understanding of different phenomena and behaviours.

On the third day of this course we will walk you through using spatial data to understand important social phenomena, including why some parts of the United States suffer higher mortality rates from drugs and alcohol than others. We will then use these data to make interactive maps displaying variation in the poverty rate, opiate prescriptions and drug and alcohol mortality rates across US counties.

Then, you will undertake your own study in small groups to understand predictors of geographical variation in the 2016 US presidential election results.

Day 4

More on regression

On day four of this course we will begin by covering the effects and significance of confounding factors when studying human behaviour, the importance of randomisation, and how we can use linear regression to answer some of the questions in which we are interested. We will look at why controlling for potentially confounding variables such as education, income, gender and birthplace is important when answering social science questions. We will then look at how confounding factors can also give us a greater substantive understanding of the research questions we are trying to answer.

Next, we will revise the concepts of linear regression and walk through fitting linear regressions in R. At the end of the day you will work in small groups to select a problem from a few options, then work on it together by quickly undertaking a descriptive analysis of the data and using regression to develop an answer. You will finish by visualising and presenting the results.

Day 5

Bringing it all together

To help you master the material covered this week, we will finish the course by revising the methods learned so far and looking at fitting more complex linear regression models. These allow our models to better approximate reality. We will also look at graphing the regression lines from models with interactions, and the model coefficients, to make our results clearly understandable.

Course format: 

This course may run in a computer lab, or you may be advised to bring your own laptop with R installed. ACSPRI staff and the course instructor will be able to help with this in the weeks leading up to the course.

We will let you know in advance.

Data and course notes will be provided.

Recommended Background: 

This is a course for subject matter experts who want to use more quantitative analysis in their work. By the end of the week you will be better able to conduct basic descriptive analysis and regression in R, and will be able to create impressive-looking visualisations.

Participant feedback: 

[Shaun] made what could have been very dull, dry subject matter very accessible (Summer 2018)

It was a great introduction to R. I now feel confident starting to use R at work. The course was excellent at differentiating for the different levels. I felt challenged but not out of my depth. (Winter 2017)

Comprehensive coverage, good application potential (graphing, transforming data, programming etc) (Winter 2017)

Program in which the course is next likely to be offered: 
Summer Program 2019
Notes: 

Instructor's bound course notes will be provided.