Big Data Analysis for Social Scientists

Instructors:

A/Prof Robert Ackland (Australian National University)
Mr Timothy Graham (University of Queensland)

 

This course introduces you to the collection and analysis of socially-generated 'big data' using the R statistical software and Gephi network visualisation software. The focus is on programmatic approaches for collecting and analysing big data from social media and the WWW. The course will also provide an opportunity for you to learn how these data and techniques are already being used in social science research.

 
Level 3 - runs over 5 days
Instructor: 

Prof. Robert Ackland is based in the School of Sociology at the Australian National University (ANU). He was awarded his PhD in economics from the ANU in 2001, and he has been researching online social and organisational networks since 2002. He leads the Virtual Observatory for the Study of Online Networks Lab (http://vosonlab.net), which was established in 2005 and is advancing the social science of the Internet by conducting research, developing research tools, and providing research training. Robert has been teaching master's courses in online research methods and the social science of the Internet since 2008 (undergraduate versions of the courses started in 2017). His book Web Social Science: Concepts, Data and Tools for Social Scientists in the Digital Age (SAGE) was published in July 2013. He created the VOSON software for hyperlink network construction and analysis, which was publicly released in 2006. The VOSON R packages for collecting and analysing social media network and text data were released in 2015 (Bryan Gertzel is the lead developer), and to date the packages have been downloaded over 80K times, with current downloads of 1K per month.

Course dates: Monday 4 July 2016 - Friday 8 July 2016
Course status: Course completed (no new applicants)
Week: 
Week 2
About this course: 

Big data involves data on:

(1) people (social web) e.g. online social networks (e.g. Facebook), microblogs (e.g. Twitter);

(2) information (WWW) e.g. web pages, clickstreams;

(3) things (sensor web) e.g. phones, temperature sensors; and

(4) places (geospatial web) e.g. geology, land use maps.

This course is focused on collecting and analysing data from the social web and the WWW.

 

In this course, you will learn how to:

  • Collect data from Twitter, Facebook, YouTube and Web 1.0 websites. Who are the actors, and what actor attributes are available for them?
  • Construct, analyse and visualise networks of people and organisations (social networks) and terms (semantic networks). How can we find connections between actors, and how can we use social network analysis to understand the social scientific meaning of such connections?
  • Extract and analyse text data. What text can be attributed to these actors, and what does analysis of this text tell us about the actors and society as a whole?
  • Conduct temporal analysis. How can we study behaviour over time, identifying significant events or trends?
  • Identify and engage with advanced techniques for dealing with very large datasets, including software optimisation and sampling techniques.
  • Utilise social theory to engage with and reason about the challenges and opportunities of big data, including the interpretation of findings and methodological considerations.

The main software used in the course is R, but we also cover the use of Gephi for advanced visualisation. Data collection will mainly be via an R package for collecting and processing social media data (created at the VOSON Lab specifically for use in this course) and the VOSON software (for collecting WWW hyperlinks and text content). We also provide R scripts covering other important packages for data analysis such as: igraph (network analysis and visualisation), tm (text mining), RTextTools (supervised machine learning for text classification), wordcloud (text word clouds and term frequency visualisation), plyr and stringr (text sentiment analysis), and topicmodels (topic modelling of textual data).

 

The target audience for this course is people with a fairly strong technical background. The course will be particularly appealing to social scientists who want to become more computationally literate, and those from technical disciplines (e.g. computer science, engineering) who want to become more socially literate.

Course syllabus: 

The following is an indicative list of topics covered during the course. Prior to the course running we will ascertain student interest in particular topics and focus the course accordingly.

 

Day 1

  • R and RStudio refresher
  • Introduction to SocialMediaLab; Collecting YouTube video comment data with SocialMediaLab
  • Social network analysis in R – 1 (graph visualisation, core node- and network-level metrics)

 

Day 2

  • Collecting Facebook data with SocialMediaLab
  • Text analysis in R – 1 (building a corpus, descriptive analysis, wordclouds)
  • Social network analysis in R – 2 (clustering, bimodal networks)
  • Collecting WWW hyperlink and website text content with vosonR

 

Day 3

  • Collecting Twitter data with SocialMediaLab
  • Introduction to Gephi
  • Text analysis in R – 2 (supervised machine learning [e.g. support vector machines], unsupervised machine learning [topic modelling], sentiment analysis, gender analysis)

 

Day 4

  • Dynamic network analysis in R (analysing networks over time, identifying significant changes in behaviour of individual nodes, clusters or entire networks)
  • Dynamic network analysis in Gephi
  • R Markdown and other useful tools for ‘working smarter’ in R

 

Day 5

  • Optimising R to handle big datasets
  • Publishing interactive networks on the web (Shiny)
  • Advanced topics
Course format: 

You will be advised in advance whether this course will be run in a computer lab or whether you will have to bring your own laptop.

 

If the course is run with participants using their own laptops, you will need to have installed:

     (1) R statistical software plus a set of packages which will be specified before the course starts,

     (2) Gephi. Both R and Gephi run on Mac/Windows/Linux.

Recommended Background: 

You are advised to have taken at least one of the following ACSPRI courses or have had some equivalent exposure to social network analysis and quantitative text analysis:

  • Social Media Analysis
  • Introduction to Social Network Research and Network Analysis
  • Advanced Network Analysis for Social Research

 

You must also have some experience with the R programming language. You don't need to be an R expert, but you must have some familiarity with programming in R (or a similar language), for example via the following ACSPRI courses:

  • Learning R: Open Source (Free) Stats Package
  • Using R for Practical Research and Data Visualisation
  • Data Analysis, Graphics and Visualisation Using R
Recommended Texts: 
Course fees
Member: 
$1,950
Non Member: 
$3,700
Full time student Member: 
$1,930
FAQ: 

Q: Do I have to have done an ACSPRI R course before attempting this course?

A: Not necessarily, as long as you have some experience with the R programming language.

Participant feedback: 

I was interested to see where this area of work was headed, in this the course was very useful. (Winter 2015)

 

First time it has been run - so things will improve as far as organisation. So naturally more structure will come that said Rob & Tim were extremely adaptable & open to co-creating content with their students – EXCELLENT!! (Winter 2015)

Program: 
Winter Program 2016
Notes: 

The instructors' bound, book-length course notes will serve as the course text.