Machine Learning Techniques using Stata - (2 days)

This course is designed as an applied introduction to the use of the Stata software for Machine Learning (ML) techniques.

Master Class - runs over 2 days

Dr Joanna Dipnall is a biostatistician with the School of Public Health and Preventive Medicine (SPHPM) at Monash University and an Honorary Research Fellow with the School of Medicine at Deakin University. She holds a B.Ec (Honours) from Monash University and a PhD from the School of Medicine at Deakin University. She also lectures and tutors with the Department of Statistics, Data Science and Epidemiology at Swinburne University. Joanna has developed a novel Risk Index for Depression (RID), utilising SEM and machine learning techniques, that brought together five key determinants of depression. She has been a teacher of Stata software for over 15 years, training across Australia and overseas, and was a member of the Scientific Committee for the Oceania Stata Users Group Meeting in 2017.

About this course: 

Machine Learning techniques are becoming increasingly popular across areas of research, from computer science to various disciplines of medicine. This branch of artificial intelligence relates to algorithms that learn from data based on specific tasks and performance measures. This course is an introductory applied course, using Stata software to run various ML algorithms. It will use some Stata commands that are built into the base system and others that are user-written commands developed in response to the increasing use of ML. Classification, prediction and model selection issues will be discussed. Detailed notes with worked examples and references will be provided as a basis for both the lecture and hands-on computing aspects of the course.

Please note that this course will use Stata V16.

Course syllabus: 

This course primarily focusses on the application of specific ML techniques rather than the complex mathematics behind the ML algorithms, and is broken up into six parts:

Part I: Fundamentals of Machine Learning

Part II: Machine Learning Techniques and Work Flow

Part III: Decision Trees & Random Forests

Part IV: Boosted regression

Part V: Support Vector Machines

Part VI: Lasso regression


At the end of each day, participants will be given time to do some ML exercises on their own to practise what they have learned.

Course format: 

This workshop will take place in a classroom. You will need to bring your own laptop with Stata. If you don't have a copy of Stata, please let us know in advance and we will organise a trial version for the course.

Recommended Background: 

This course assumes that participants have: (1) familiarity with the Stata command language; (2) sufficient understanding of statistics to be able to comprehend the material covered in the course outline, such as a basic grounding in multiple regression (e.g. linear, logistic, Poisson) and clustering techniques (e.g. principal components analysis, k-means clustering); (3) access to Stata V16; (4) some experience in using Microsoft Word and Excel or their equivalent; and (5) experience using a text editor such as Notepad.

Recommended Texts: 

Course notes will be supplied.


No specific references are suggested, but a number will be supplied with the course notes.

Supported by: 

Stata is distributed in Australia and New Zealand by Survey Design and Analysis Services.