In today's data-driven world, making sense of text data is more important than ever. This course is your gateway to the potential of Large Language Models (LLMs), from free models from Hugging Face and Meta (Llama models) to proprietary models from OpenAI, DeepSeek, and Google (Gemini models). Which would you prefer: hours of tedious manual work, or using LLMs to analyse huge amounts of text quickly, uncover hidden patterns, and turn data into actionable insights?
This masterclass introduces LLMs in the context of social sciences, offering hands-on, practical experience. No coding expertise or advanced maths skills are required.
This course is held online over 3 days via Zoom and runs on Australian Eastern Standard Time (UTC+10; Canberra, Sydney, Melbourne, Brisbane time).
Are you working with text and looking for new ways to tackle language-related tasks?
Large Language Models (LLMs) might be just the tool you need!
You'll explore the basics of Natural Language Processing (NLP) and LLMs, diving into real-world applications like text classification, topic modeling, and text generation. We’ll be using Python and Google Colab, but don’t worry if you have no prior programming experience—we’ve got you covered every step of the way.
Our focus is on equipping you with practical skills that you can directly apply to your research. Need to analyze a large body of diverse literature, bulk social media discussions, interview responses, or news articles? Want to detect emerging trends, automate qualitative coding in interviews, or generate synthetic survey responses to test hypotheses? LLMs are revolutionizing how we understand and process text-based data in the social sciences—and this is just the beginning. Don’t miss out!
This masterclass is part of the ACSPRI suite of courses in social data science.
This course will be run over 3 days using the following timetable:
Day 1
- 9.30 am - 10.00 am - Introductions and setup check
- 10.00 am - 11.30 am - Session 1
- 12.30 pm - 2.00 pm - Session 2
- 3.00 pm - 5.00 pm - Session 3 + exercises
Days 2 and 3
- 9.00 am - 10.30 am - Session 1
- 11.30 am - 1.00 pm - Session 2
- 2.00 pm - 4.00 pm - Session 3, exercises and consultation
Dr. Maria Prokofieva is an Associate Professor in AI/ML at Victoria University and a data lead at the Mitchell Institute, Vic. She works on projects that use AI/ML approaches and Large Language Models to research applications in business and healthcare. As a machine learning engineer with a deep passion for the responsible application of AI, Maria deciphers complex online behaviours in her work to inform consumer and business strategies. She also chairs the CPA Australia Business Analytics Group and is a member of the CPA Australia Technology Innovation Committee, driving innovation in data analysis tools. Maria’s contributions to both research and practical applications are shaping the integration of AI in business and policy on a global scale.
We’ll kick things off with the basics of Python and LLMs, but the real magic happens when we dive into how these tools can supercharge your research and business. From sentiment analysis to summarization and classification, you'll learn practical skills that will help you tackle complex questions and industry challenges.
As the course unfolds, you will learn how to benchmark your results and improve performance by fine-tuning LLMs to your domain, crafting prompts that guide the models towards better results, and integrating your own data for more accurate outputs. We'll also cover tips for saving time and resources with LLMs, dig into the ethical side of using AI, and show how to make sure your work is responsible and transparent for publication and public use.
By the end of the course, you won’t just know about LLMs. You’ll know how to use them to solve real problems (your own research problems first and foremost), manage data efficiently, and generate results that matter. Whether you're a researcher looking to enhance your studies or a professional aiming to leverage AI for industry insights, this course gives you the hands-on experience and tools to stay ahead in the fast-changing world of AI-powered text analysis.
Day 1: Foundations of Python and Introduction to LLMs
- Introduction to Python coding and Google Colab
- Introduction to foundation models on Hugging Face and proprietary models (e.g. OpenAI, DeepSeek and Gemini)
- LLM workflow: from data collection through data prep and modelling to evaluation and improvement
- Case Demonstration:
  - Analysing a simple dataset with text data: from loading and data preprocessing to the main language tasks, such as sentiment analysis, classification and zero-shot classification, summarisation, and question answering
Day 2: Large Language Models (LLMs) in Social Sciences
- LLMs: how they work, how to use them in social sciences research, how to evaluate results
- Word embeddings and Sentence transformers: how to use for social science research
- Limitations of pre-trained models and how to get better results for your data
- Ethics, data and model training / use: what to be aware of
- Case Demonstration: end-to-end research project
  - Data loading and prep
  - Tokenisation, working with word/sentence embeddings
  - Storing and managing data: intro to vector databases
  - Inference and ways to cut costs
  - Evaluation of results and benchmarking
  - Putting all this together: from results to actionable insights
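To give a feel for the embedding and vector database steps listed above, here is a minimal sketch using only the Python standard library. It is an illustration, not course material: the `embed` function is a toy stand-in (plain word counts) for a real embedding model, and `ToyVectorStore`, the example sentences, and the query are all hypothetical. A real project would typically swap in a sentence-embedding model and a dedicated vector database.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store of (text, vector) pairs with brute-force search."""

    def __init__(self):
        self.items = []

    def add(self, text):
        # Store the text together with its vector so it is embedded only once.
        self.items.append((text, embed(text)))

    def search(self, query, k=1):
        # Rank every stored text by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("Housing costs are rising faster than wages")
store.add("The new bus route makes my commute shorter")
store.add("Local schools need more funding for teachers")

print(store.search("rent and housing costs keep rising", k=1))
```

The query shares words only with the first stored sentence, so that sentence is ranked first. Word counts only capture exact word overlap; learned embeddings are used in practice because they can also match texts that express similar meaning in different words.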
Day 3: Getting better results for your data with LLMs
- Prompt engineering: crafting effective prompts to guide an LLM's responses.
- Fine-tuning and parameter-efficient fine-tuning: adapting a pretrained LLM to specific tasks or domains by training it further on task-specific data.
- Retrieval-Augmented Generation (RAG): enhancing an LLM's responses by integrating external information retrieval mechanisms.
- Reinforcement Learning from Human Feedback (RLHF): incorporating human feedback into the training process.
- Case Demonstration:
  - Tailoring LLMs to understand and categorise domain-specific content.
  - Comparing different approaches to get better results for data.
  - Ethical implications and potential biases in LLM use, sharing results responsibly and transparently.
- Final Activity: Workshop Wrap-up and Next Steps
  - Participants share their project ideas and receive feedback.
  - Resources for further learning and exploration in Python, LLMs, and social science research.
  - Discuss potential collaborations and future research projects.
This workshop will take place online.
Bring your own laptop with Zoom installed. Both PC and Mac are fine.
The course uses Google Colab and requires a Google account (please make sure you have one, or register for one, before the session).
All course materials will be provided
The course requires a basic understanding of statistical concepts and text analysis tasks. Exposure to machine learning foundations is also beneficial, for example from Machine Learning for Data Science: Supervised Learning Techniques.
The course assumes no prior knowledge of Python, though some programming experience (e.g. using R) is beneficial.
HuggingFace official Getting Started Guide
Tunstall, L., Von Werra, L., & Wolf, T. (2022). Natural Language Processing with Transformers. O'Reilly Media.
https://learning.oreilly.com/library/view/natural-language-processing/9781098136789/
Q. How much mathematics do I need to start working with LLMs in this course?
A. You do not need an in-depth understanding of advanced mathematics. The course is designed to introduce you to deep learning applications in an accessible manner, focusing more on implementation and practical use rather than the statistical underpinnings. A basic understanding of algebra and some familiarity with concepts of arrays and matrices will be enough to get you started.
Q. Do I need to install anything before the session? What is Google Colab?
A. No, you do not need to install anything. We will work with Google Colab which is a free cloud service hosted by Google. It allows you to write and execute Python code through your browser. Just make sure you have a Google account! You can sign up here:
https://accounts.google.com/signin
Q. I have used R before, but not Python. Will I struggle?
A. Coming from an R background, you'll find that Python has some differences in syntax and data structures, but many of the underlying concepts are similar – you will be fine!
Q. Where can I see resources for the course?
A. All resources will be available after the course in open access, including Jupyter notebooks with practical examples covered throughout the course and additional cases.
1. BOOKING - ACSPRI does not accept ‘expressions of interest’ for course places, i.e. all bookings are considered firm, and a cancellation fee is charged if you cancel your booking after the early-bird date.
2. DISCOUNT RATE - The discounted rate for ACSPRI members is available to all staff and students of member organisations. To be eligible for this rate:
- The course fee must be paid either by the member organisation or by you. Where fees are paid by a non-member organisation, the non-member rate applies; and
- You must either have a valid email address issued by the member organisation, or you must hold, or have a right to hold, a current staff or student identity card from the member organisation.
In addition, to be eligible for a full-time student discount the participant must:
- Hold, or have a right to hold, a current student identity card from the member organisation;
- Be enrolled as a full-time student;
- Make payment in full with the application, arrange electronic funds transfer (EFT), or contact ACSPRI to advise credit card details for payment, by the early-bird closing date;
- Provide ACSPRI with contact details of their supervisor, so we can ask them to confirm eligibility for the full-time student rate.
The early-bird rate applies to all bookings paid in full by the early-bird closing date; otherwise you will be charged the standard rate.
3. REFUNDS & CANCELLATIONS - Course fees are not refundable unless:
- we cancel the course in which you have enrolled; or
- you cancel your enrolment before the early-bird closing date.
A cancellation fee of $250 will be charged if you cancel between the early-bird closing date and one week prior to the commencement of the course. The full course fee will be charged if you cancel within one week of the beginning of your course.
4. PRE-REQUISITES - Course descriptions specify course pre-requisites. You must undertake to meet the pre-requisites of the course(s) in which you enrol. If in any doubt, you should contact ACSPRI prior to enrolling.
Delivery of this course is online via Zoom.
Please ensure you have the following:
- Reliable internet connection with at least 5 GB of data available per day (e.g. a 5-day course will use about 25 GB of data on the Zoom application alone)
- A computer/laptop with the Zoom application installed (free)
- A webcam (built in to most laptops)
- A headset with a microphone (not required but ideal)
- A second monitor/screen if possible
Please also check the course page for specific software requirements (if any).