Introduction to Large Language Models (LLMs) in Social Sciences: Online (3 days)

An introduction to Large Language Models (LLMs) in the context of the social sciences, offering hands-on, practical experience. No coding expertise or advanced maths skills required.

Instructor

Maria Prokofieva

Dr. Maria Prokofieva is an Associate Professor in AI/ML at Victoria University and a data lead at the Mitchell Institute, Vic. She works on projects that use AI/ML approaches and Large Language Models to research applications in business and healthcare. As a machine learning engineer with a deep passion for the responsible application of AI, Maria's work deciphers complex online behaviours to inform consumer and business strategies. She also chairs the CPA Australia Business Analytics Group and is a member of the CPA Australia Technology Innovation Committee, driving innovation in data analysis tools. Maria's contributions to both research and practical applications are shaping the integration of AI in business and policy on a global scale.

Course Level

Masterclass

Are you working with text and looking for new ways to tackle language-related tasks? Large Language Models (LLMs) might be just the tool you need!

You'll explore the basics of Natural Language Processing (NLP) and LLMs, diving into real-world applications like text classification, topic modeling, and text generation. We’ll be using Python and Google Colab, but don’t worry if you have no prior programming experience—we’ve got you covered every step of the way.

Our focus is on equipping you with practical skills that you can apply directly to your research. Need to analyze a large and diverse body of literature, high volumes of social media discussion, interview responses, or news articles? Want to detect emerging trends, automate qualitative coding of interviews, or generate synthetic survey responses to test hypotheses? LLMs are revolutionizing how we understand and process text-based data in the social sciences, and this is just the beginning. Don’t miss out!

In today's data-driven world, making sense of text data is more important than ever. This course is your gateway to tapping into the incredible potential of Large Language Models (LLMs): from open models available through Hugging Face, such as Meta's Llama family, to models from OpenAI, DeepSeek and Google (Gemini) accessed through their commercial APIs. Which would you prefer: hours of tedious manual work, or using LLMs to analyse huge amounts of text quickly, uncover hidden patterns, and turn data into actionable insights?

We’ll kick things off with the basics of Python and LLMs, but the real magic happens when we dive into how these tools can supercharge your research and business. From sentiment analysis to summarisation and classification, you'll learn practical skills that will help you tackle complex questions and industry challenges.

As the course unfolds, you will learn how to benchmark your results and improve performance by fine-tuning LLMs to your domain, crafting prompts that guide the models towards better results, and integrating your own data for more accurate outputs. Plus, we'll cover tips for saving time and resources when working with LLMs, dig into the ethical side of using AI, and make sure your work is responsible and transparent for publication and public use.

By the end of the course, you won’t just know about LLMs: you’ll know how to use them to solve real problems (first and foremost, your own research problems), manage data efficiently, and generate results that matter. Whether you're a researcher looking to enhance your studies or a professional aiming to leverage AI for industry insights, this course gives you the hands-on experience and tools to stay ahead in the fast-changing world of AI-powered text analysis.

This masterclass is part of the ACSPRI suite of courses in social data science.

This course will be run over 3 days using the following timetable:

Day 1

9.30 am - 10.00 am - Introductions and setup check
10.00 am - 11.30 am - Session 1
12.30 pm - 2.00 pm - Session 2
3.00 pm - 5.00 pm - Session 3 + exercises

Days 2 and 3

9.00 am - 10.30 am - Session 1
11.30 am - 1.00 pm - Session 2
2.00 pm - 4.00 pm - Session 3 + exercises and consultation

Day 1: Foundations of Python and Introduction to LLMs

Introduction to Python coding and Google Colab
Introduction to foundation models on Hugging Face and commercial API models (e.g. OpenAI, DeepSeek and Gemini)
LLM workflow: from data collection to data preparation, modelling, evaluation and improvement
Case Demonstration:
- Analysing a simple dataset with text data: from loading and preprocessing to core language tasks, such as sentiment analysis, classification (including zero-shot classification), summarisation and question answering
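Day 1 covers zero-shot classification, and the workflow ends with evaluation. One common way to check model output in social science coding (an assumption here, not necessarily the approach used in the course) is to compare the labels an LLM assigns with a human coder's labels, using percent agreement and Cohen's kappa. The sketch below uses only the Python standard library; the function names and the labels are hypothetical illustration data, not course results.

```python
from collections import Counter


def percent_agreement(human, model):
    """Share of items where the model's label matches the human code."""
    if not human or len(human) != len(model):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohens_kappa(human, model):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(human)
    p_observed = percent_agreement(human, model)
    human_counts, model_counts = Counter(human), Counter(model)
    # Chance agreement: probability that both coders pick the same label
    # independently, given how often each coder used each label.
    labels = set(human_counts) | set(model_counts)
    p_expected = sum(human_counts[lab] * model_counts[lab] for lab in labels) / (n * n)
    if p_expected == 1:
        # Both coders used one identical label for every item; kappa is
        # undefined here, so this sketch treats it as perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical sentiment codes for six interview excerpts.
human_codes = ["pos", "neg", "neg", "pos", "neu", "neg"]
model_codes = ["pos", "neg", "pos", "pos", "neu", "neg"]
print(f"agreement={percent_agreement(human_codes, model_codes):.2f}, "
      f"kappa={cohens_kappa(human_codes, model_codes):.2f}")
# → agreement=0.83, kappa=0.74
```

If scikit-learn is already part of the workflow, its `cohen_kappa_score` function computes the same statistic.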

Day 2: Large Language Models (LLMs) in Social Sciences

LLMs: how they work, how to use them in social science research, how to evaluate results
Word embeddings and sentence transformers: how to use them in social science research
Limitations of pre-trained models and how to get better results for your data
Ethics of data, model training and model use: what to be aware of
Case Demonstration:
- End-to-end research project
  - Data loading and preparation
  - Tokenisation, working with word/sentence embeddings
  - Storing and managing data: intro to vector databases
  - Inference and ways to cut costs
  - Evaluation of results and benchmarking
  - Putting all this together: from results to actionable insights
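Day 2 moves from raw text to sentence embeddings and vector databases. At its core, a vector search ranks stored embeddings by their similarity to a query embedding, commonly measured with cosine similarity. The sketch below shows that idea as a brute-force search in plain Python. The three-dimensional vectors are made up for illustration (real sentence-transformer embeddings have hundreds of dimensions), and the function names are hypothetical rather than any vector database's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Brute-force nearest-neighbour search over an in-memory index.

    `index` maps a document id to its embedding vector; every stored
    vector is scored against the query and the k best are returned.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings for three interview snippets.
index = {
    "housing_costs": [0.9, 0.1, 0.0],
    "rent_stress": [0.8, 0.2, 0.1],
    "sports_club": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # hypothetical embedding of a question about housing
for doc_id, score in top_k(query, index, k=2):
    print(doc_id, round(score, 3))
```

Vector databases apply the same ranking idea at scale, typically with approximate nearest-neighbour indexes rather than scoring every stored vector.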

Day 3: Getting better results for your data with LLMs

Prompt Engineering: Crafting effective prompts to guide LLM responses
Fine-tuning and parameter-efficient fine-tuning (PEFT): Adapting a pretrained LLM to specific tasks or domains by training it further on task-specific data.
Retrieval-Augmented Generation (RAG): Enhancing LLM responses by integrating external information retrieval mechanisms.
Reinforcement Learning from Human Feedback (RLHF): Incorporating human feedback into the training process.
Case Demonstration:
- Tailoring LLMs to understand and categorise domain-specific content.
- Comparing different approaches to get better results for your data.
- Ethical implications and potential biases in LLM use, sharing results responsibly and transparently.
Final Activity: Workshop Wrap-up and Next Steps
- Participants share their project ideas and receive feedback
- Resources for further learning and exploration in Python, LLMs, and social science research.
- Discuss potential collaborations and future research projects.
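The Retrieval-Augmented Generation topic on Day 3 can be sketched in two steps: retrieve the passages most relevant to a question, then place them in the prompt so the model answers from that context. This sketch swaps in simple word-overlap scoring for the embedding search a real RAG system would normally use, and it stops at building the prompt rather than calling any particular model API. The function names, prompt wording and passages are all hypothetical.

```python
import re


def tokenize(text):
    """Lower-case word tokens; a crude stand-in for a real tokeniser."""
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve(question, passages, k=2):
    """Rank passages by how many distinct words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q_words & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_passages):
    """Assemble a grounded prompt: instructions, numbered context, then the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(context_passages, start=1))
    return (
        "Answer the question using only the context below. "
        "Cite passage numbers, and say 'not in context' if the answer is missing.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical interview summaries standing in for a research corpus.
passages = [
    "Respondents in regional towns reported long waits for GP appointments.",
    "Several participants described rising rent as their main financial stress.",
    "Younger respondents mostly used social media for news.",
]
question = "What did participants say about rent and financial stress?"
print(build_prompt(question, retrieve(question, passages, k=1)))
```

The instruction to answer only from the numbered context, and to cite passage numbers, is one simple way to make outputs easier to check, which ties back to the course's emphasis on transparent reporting.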

The course requires a basic understanding of statistical concepts and text analysis tasks. Exposure to machine learning foundations is also beneficial, for example through the course Machine Learning for Data Science: Supervised Learning Techniques.

The course assumes no prior knowledge of Python, though some programming experience (e.g. using R) is beneficial.

Hugging Face official Getting Started Guide

https://huggingface.co/learn/

Tunstall, L., von Werra, L., & Wolf, T. (2022). Natural language processing with transformers. O'Reilly Media.

https://learning.oreilly.com/library/view/natural-language-processing/9781098136789/

Maria was very good at adapting the training based on the inputs/requests from course participants. This greatly enhanced the relevance of the course materials.

As it was a small group, it was a good interactive class, and being online did not disrupt the flow of the workshop.
