Data Analysis in R

This course is intended for applied data analysts, including academics (and postgraduate research students), policy specialists and others. It will examine questions in public policy, the social sciences (especially politics) and industry using real data, including voter surveys, economic data and imprisonment rates in different Australian states. The unit will help build participants’ ability to undertake rigorous statistical analysis in R, including means, confidence intervals and linear regression, and to create publication-standard graphs of the results. The result will be more professional, easier-to-understand research.

Level 2 - runs over 5 days
Instructor: 

Shaun Ratcliff is a lecturer in political science at the University of Sydney, working from the United States Studies Centre. His research interests are public opinion, political behaviour and parties in both the United States and Australia.

He has completed a PhD at Monash University, researching voter and candidate issue preferences and public policy outcomes. He teaches politics and quantitative research methods, and is a member of the executive committee of the Australian Society of Quantitative Political Science.

Shaun has a background working in politics and government relations, and has consulted for federal election campaigns.

Course dates: Monday 3 July 2017 - Friday 7 July 2017
Week: 
Week 2
About this course: 

R is open source and free. It is flexible, powerful and intuitive, and it is excellent for data visualisation. Because it is open source, R has thousands of developers in leading universities, corporate research labs and other institutions across the world. As a result, its capabilities tend to exceed those of competing software, with new packages added or updated daily. This is particularly true for data visualisation, where R tends to lead the pack. Because there are no licence fees, you can take it with you wherever you go: you don't have to change software packages when you change employers. Consequently, R has become increasingly popular for academic research, economic analysis and public policy development, and this trend is likely to continue. Becoming skilled in R will build your capabilities and employment opportunities by making you a more flexible worker, able to undertake analysis that many other researchers and analysts cannot.

No prior experience with R or sophisticated quantitative methods is required for this course. Participants should be computer literate, use data in their occupations (or study, if they are students) and understand some basic statistics, such as the mean, the median and the standard deviation. Some basic knowledge of regression is helpful, as is the ability to do simple coding.

This is a course for subject matter experts who want to use more quantitative analysis in their work. By the end of the week you will be better able to conduct basic descriptive analysis and regression in R, and to create impressive-looking graphs.

If you are unsure whether this is for you, please contact Shaun for more information. He can talk you through the course and the kinds of things you will cover.

Course syllabus: 

Day 1

Getting started – loading and cleaning your data and making professional graphs.

R is excellent for conducting simple yet effective analyses of data. In particular, it is useful for graphing descriptive data such as trends in unemployment and public opinion. We will look at plotting data to give you methods you can use in your work or research.

The course starts with instructions on how to access and recode data in R and calculate descriptive statistics. We will then cover graphing means and variance so you can better understand the structure of your data. The first day will also cover how to plot trends in multiple indicators (for instance, unemployment) and in public opinion data so that they look professional and sophisticated, with just a few lines of code.

Day 2

Understanding your data

We then build on the work of the first day by running more complex descriptive analyses of public opinion on immigration and other policy issues. This includes examining the probabilities of voters holding certain preferences on policy issues, as well as trends in these attitudes. We also learn how to break public opinion down by population group, such as younger and older voters.

For each of these steps we look at graphing these results, including overlaying trend lines and confidence intervals over the original data.

Day 3

Getting started with linear regression

Sometimes you need to do more than look at the descriptive data. For instance, there may be confounding factors, such as economic, political and demographic influences on policy outcomes. Or certain demographic characteristics of voters may be related to their preferences for particular policies. Linear regression lets us control for these factors and learn far more from our data.

We follow up our descriptive examinations from the first two days by learning how to fit linear regression models to these data, and plotting the regression line over the original data to examine model fit. This allows us to examine how different variables influence outcomes we might be interested in.

Day 4

More on regression

On day four we will fit more complex linear regression models, including interactions, using a variety of datasets. We will graph the regression lines for these interactions, as well as the model coefficients, to make our results clearly understandable, and plot the residuals from the regressions to check model fit.

However, many social science questions involve probabilities or other non-linear outcomes rather than linear relationships. On the afternoon of the fourth day, we will look at some alternative ways to examine these kinds of data.

In particular, we will use logistic regression fit to survey data to predict the probabilities of vote choices in Australian federal elections, and plot the predicted probabilities from these models over the original data. This will help us establish what kinds of voters support different parties, and why.

Day 5

Bringing it all together
On the final day we will explore some slightly more complex regression models and look at graphing estimates and predictions from the model outputs.

You will also have the option to bring your own data, and we can look at the best ways to analyse it and plot the results in R.
One-on-one consultations will be held in the afternoon to go over any parts of the course on which participants want further advice.

Course format: 

This course will be held in a classroom. Participants will need a laptop with R installed. ACSPRI staff and the course instructor will be able to help with this in the weeks leading up to the course.

Data and course notes will be provided (although there are options to use your own data on day 5).

Course Fees:
All courses in a given program have the same fee structure, but fees vary depending on whether your organisation is an ACSPRI member and whether Early Bird discounts are available at the time. The prices for this program are available on the program page.

Participant feedback: 

I already had some knowledge of the topic, but there was plenty of content that was new and useful. (Winter 2016)

I can see its capability for more sophisticated data analyses. (Winter 2016)

Program: 
Winter Program 2017
Notes: 

Instructor's bound course notes will be provided.