Big Data Analysis for Social Scientists

A/Prof Robert Ackland (Australian National University)
Mr Timothy Graham (University of Queensland)

 

This course introduces students to the collection and analysis of socially-generated 'big data' using the R statistical software and Gephi network visualisation software.

Big data involves data on: (1) people (the social web), e.g. online social networks (e.g. Facebook) and microblogs (e.g. Twitter); (2) information (the WWW), e.g. web pages and clickstreams; (3) things (the sensor web), e.g. phones and temperature sensors; and (4) places (the geospatial web), e.g. geology and land-use maps.

The focus of this course is on data from the social web and the WWW.  Students will learn how to: (1) collect data from web pages, Twitter and Facebook; (2) construct, analyse and visualise networks of people and organisations (social networks) and terms (semantic networks); (3) extract and analyse text data; (4) conduct temporal analysis; (5) filter and sample from large datasets; (6) identify and engage with advanced techniques for dealing with very large datasets.

We will focus on three sources of network and text data: Twitter, Facebook and the WWW.  We will look at:

  • Who are the actors, and what actor attributes are available for them?
  • How can we find connections between actors, and how can we use social network analysis to understand the social scientific meaning of such connections?
  • What text can be attributed to these actors, and what does analysis of this text tell us about the actors and society as a whole?
  • How can we study behaviour over time, identifying significant events or trends?
  • What are some of the key methodological issues for using social media and WWW data to study individual and collective behaviour, and how can we address such limitations?

The course will also provide an opportunity for students to learn about examples of 'best practice' social science big data research, and thus see how these data and techniques are already being used in social science.

 

Topics covered in the course

The following is an indicative list of topics covered in the course.  The topics are listed as either 'core' or 'advanced'.  Core topics will be covered in detail, while the coverage of advanced topics will depend on student interest.  Prior to the course running, we will ascertain student interest in particular topics and focus the course accordingly.

1. Data collection (core)

Collection of data from Twitter, Facebook and the WWW. We will also provide datasets (e.g. a Twitter #auspol dataset) that will be used in the course.
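To give a flavour of the code involved, the following is a minimal sketch of collecting recent tweets with the twitteR package (the API credentials are placeholders, and Twitter's API access arrangements may differ by the time the course runs):

    # Sketch only: replace the placeholder credentials with your own API keys
    library(twitteR)
    setup_twitter_oauth("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
    # Collect up to 500 recent tweets containing the #auspol hashtag
    tweets <- searchTwitter("#auspol", n = 500)
    tweets_df <- twListToDF(tweets)   # convert to a data frame for later analysis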

2. Creation of networks (core)

Creation of different types of networks, including:

  • unimodal networks (one type of node): (1) social networks (where network nodes are Twitter users, FB users or organisational websites), (2) semantic networks (where network nodes are terms extracted from tweets, FB posts or web pages)
  • bimodal networks (two types of nodes) – for example, Twitter users and hashtags extracted from tweets, or FB users and the posts they have commented on or liked.
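As an indication of what this looks like in practice, the following is a minimal sketch of constructing a unimodal social network with igraph from a small, invented edge list (in the course the edge list would come from collected Twitter, Facebook or hyperlink data):

    library(igraph)
    # Invented edge list: each row is one tie (e.g. a retweet from one user to another)
    edges <- data.frame(from = c("alice", "bob",   "carol"),
                        to   = c("bob",   "carol", "alice"))
    g <- graph_from_data_frame(edges, directed = TRUE)
    summary(g)   # prints the number of nodes and edges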

3. Network analysis (core)

Introduction to social network analysis, covering the main network-level and node-level metrics, and clustering (‘community detection’).
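Continuing the sketch above (and assuming the igraph object g constructed under topic 2), these metrics might be computed along the following lines:

    library(igraph)
    degree(g)                      # node-level: number of ties per node
    betweenness(g)                 # node-level: brokerage positions
    edge_density(g)                # network-level: proportion of possible ties present
    # Community detection (here using the walktrap algorithm on the undirected graph)
    comms <- cluster_walktrap(as.undirected(g))
    membership(comms)              # cluster assignment for each node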

4. Visualisation techniques (core)

How to create high-quality network visualisations, publish interactive networks on the web, and generate ‘word clouds’ and dendrograms from network data.
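For instance, a basic network plot and a word cloud can be produced in R along these lines (the network g is the object from the sketch under topic 2, and the term-frequency vector is invented for illustration):

    library(igraph)
    library(wordcloud)
    # Force-directed layout of the network sketch from topic 2
    plot(g, layout = layout_with_fr(g), vertex.size = 10, edge.arrow.size = 0.3)
    # Word cloud from an invented named vector of term frequencies
    freqs <- c(election = 40, budget = 25, climate = 18, refugees = 12)
    wordcloud(words = names(freqs), freq = freqs, min.freq = 1)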

5. Text analysis (core)

Supervised machine learning (e.g. support vector machines), unsupervised machine learning (topic modelling), sentiment analysis, hierarchical clustering, and descriptive analytics.
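Several of these techniques start from a pre-processed document-term matrix; a minimal sketch of that pre-processing step with the tm package (using two invented documents) is:

    library(tm)
    docs <- c("The budget debate continues", "Climate policy dominates the debate")
    corpus <- Corpus(VectorSource(docs))
    corpus <- tm_map(corpus, content_transformer(tolower))       # lower-case
    corpus <- tm_map(corpus, removePunctuation)                  # strip punctuation
    corpus <- tm_map(corpus, removeWords, stopwords("english"))  # drop stopwords
    dtm <- DocumentTermMatrix(corpus)   # input for clustering, topic models etc.
    inspect(dtm)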

6. Temporal analysis (advanced)

Analysing networks and text over time, identifying significant changes in behaviour of individual nodes, clusters or entire networks.
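As a simple illustration (assuming the tweets_df data frame from the data collection sketch under topic 1, which includes a 'created' timestamp column), daily tweet volumes can be plotted to look for spikes in activity:

    # Count tweets per day and plot the time series of activity
    tweets_df$day <- as.Date(tweets_df$created)
    daily_counts <- table(tweets_df$day)
    plot(daily_counts, type = "h", xlab = "Day", ylab = "Number of tweets")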

7. Filtering and sampling (advanced)

Techniques for targeting data collection and analysis on a particular set of actors.  Reducing the scale of datasets via sampling.
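For example (again assuming the hypothetical tweets_df data frame from earlier sketches), filtering to a set of seed actors and drawing a random sample might look like:

    # Keep only tweets from a particular set of actors (screen names are invented)
    seeds <- c("user1", "user2", "user3")
    filtered <- tweets_df[tweets_df$screenName %in% seeds, ]
    # Draw a 10% random sample of rows to reduce the scale of the dataset
    set.seed(42)
    sampled <- tweets_df[sample(nrow(tweets_df), size = floor(0.1 * nrow(tweets_df))), ]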

8. Scaling up to very large datasets (advanced)

The datasets used in the course can be analysed on a desktop or laptop computer.  But what if your dataset is too large for your machine, or you have a project in mind that involves massive amounts of data?

 

Tools used in the course

R and Gephi will be used.  We will be using a selection of existing R libraries, but will aim to make use of 'wrapper functions' or possibly a custom R library so as to reduce the amount of coding that students need to undertake during the week (thereby maximising the amount of content we can cover).  All R source code will be available to students.  The following is an indicative list of the R libraries used in the course:

  • igraph (network analysis and visualisation)
  • twitteR (for collecting Twitter data)
  • tm (text mining)
  • RTextTools (machine learning package for automatic text classification)
  • RCurl (collecting WWW data)
  • XML (reading and creating XML documents)
  • R.utils (programming utilities)
  • wordcloud (text word clouds)
  • ape and dendextend (dendrograms, hierarchical clustering)
  • FactoMineR and homals (multiple correspondence analysis)
  • plyr and stringr (text sentiment analysis)

This course is expected to take place in a computer lab. Course participants are welcome to bring a laptop with R installed if they prefer.
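For participants setting up their own machines, the packages listed above can be installed from CRAN in the usual way (availability of individual packages may change over time):

    install.packages(c("igraph", "twitteR", "tm", "RTextTools", "RCurl", "XML",
                       "R.utils", "wordcloud", "ape", "dendextend",
                       "FactoMineR", "homals", "plyr", "stringr"))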

 
Level 3 - runs over 5 days
Instructor: 

Prof. Robert Ackland is based in the School of Sociology at the Australian National University (ANU). He was awarded his PhD in economics from the ANU in 2001, and he has been researching online social and organisational networks since 2002. He leads the Virtual Observatory for the Study of Online Networks Lab (http://vosonlab.net) which was established in 2005 and is advancing the social science of the Internet by conducting research, developing research tools, and providing research training. Robert has been teaching masters courses in online research methods and the social science of the internet since 2008 (undergraduate versions of the courses started in 2017) and in 2019 he began teaching a course on economic analysis of the digital economy. His book Web Social Science: Concepts, Data and Tools for Social Scientists in the Digital Age (SAGE) was published in July 2013. He created the VOSON software for hyperlink network construction and analysis, which has been publicly available since 2006 and has been used by around 3000 researchers worldwide, and he is a co-creator of the vosonSML and VOSONDash R packages for collecting and analysing social media network and text data.

Course dates: Monday 29 June 2015 - Friday 3 July 2015
Course status: Course completed (no new applicants)
Week: Week 1
Recommended Background: 

Participants are advised to have taken at least one of the following ACSPRI courses or have had some equivalent exposure to social network analysis and quantitative text analysis:

  • Social Media Analysis
  • Introduction to Social Network Research and Network Analysis
  • Advanced Network Analysis for Social Research

Participants must also have some experience with the R programming language.  You do not need to be an R expert, but you must have some familiarity with programming in R (or a similar language), for example via the following ACSPRI courses:

  • Learning R: Open Source (Free) Stats Package
  • Using R for Practical Research and Data Visualisation
  • Data Analysis, Graphics and Visualisation Using R

 

If the course is run with participants using their own laptops, participants will need to have installed: (1) the R statistical software plus a set of libraries that will be specified before the course starts, and (2) Gephi.  Both R and Gephi run on Mac/Windows/Linux.

Recommended Texts: 

R code and course notes will be provided.

Course fees
Member: $1,870
Non Member: $3,485
Full time student Member: $1,870
Program: Winter Program 2015