This masterclass offers a step-by-step, interactive introduction to Python for participants with no prior experience of the language. Part of the ACSPRI suite of courses in social data science, it is designed for those who want to learn how to use Python for data manipulation and statistical analysis.
This course will be run over 2 days using the following timetable:
Day 1
- 9.30 am - 10.00 am - Introductions and setup check
- 10.00 am - 11.30 am - Instructional Zoom Session
- 12.30 pm - 2.00 pm - Instructional Zoom Session
- 3.00 pm - 5.00 pm - Instructional Zoom Session and Exercises
Day 2
- 10.00 am - 11.30 am - Instructional Zoom Session
- 12.30 pm - 2.00 pm - Instructional Zoom Session
- 3.00 pm - 5.00 pm - Instructional Zoom Session and Exercises
Dr Joanna Dipnall is an applied statistician with interests in advanced statistical methods, including machine learning and deep learning techniques. She completed her Honours in Econometrics at Monash University and her PhD with IMPACT SRC, School of Medicine, Deakin University. Joanna works extensively with registry and linked medical data, and collaborates closely with the Faculty of IT at Monash to supervise Masters and PhD students integrating artificial intelligence into health research. Joanna teaches within the Monash Biostatistics Unit and is the Unit Co-coordinator for the Monash Masters of Health Data Analytics course. She has taught advanced statistical methods for many years at universities and for ACSPRI.
One of the key skills in data science is making effective use of Python to manipulate data and generate results. Python is a well-established programming language in the world of data science. In this course, you will be introduced to basic data wrangling, descriptive statistics, visualisation and reporting of results. You will also learn how to use Anaconda, set up a Python environment and run the workshop examples in Jupyter Notebook.
Upon completion of this masterclass, you will have the skills required to load different types of data files into Python, manage and manipulate your data, and build some basic visualisations. The workshop is relevant to researchers and data analysts in any field who want to use Python in their research work. It aims to introduce the foundations of Python and build confidence in its use.
Day 1:
- Introduction to Python
- Installing and loading libraries
- Python environment
- Coding statements, comments and functions
- Native Python functions
- Data structures in Python
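As a rough taste of the Day 1 topics, the sketch below shows a comment, a library import, the main built-in data structures, a few native functions and a small user-defined function. It is only an illustration and not taken from the course materials; the variable names, the values and the `age_range` helper are all invented for this example.

```python
# A comment: Python ignores everything after the hash sign.
import statistics  # load a library that ships with Python

# Built-in data structures (all values are made up for illustration)
ages = [23, 35, 41, 29, 35]                           # list: ordered, changeable
respondent = {"id": 101, "age": 35, "state": "VIC"}   # dictionary: key-value pairs
states = ("VIC", "NSW", "QLD")                        # tuple: ordered, fixed
unique_ages = set(ages)                               # set: duplicates removed

# Native (built-in) functions
n = len(ages)       # number of items
oldest = max(ages)  # largest value
total = sum(ages)   # sum of all values


def age_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)


# A function from the imported library
mean_age = statistics.mean(ages)

print(n, oldest, total, age_range(ages), mean_age)
print(respondent["state"], states[0], sorted(unique_ages))
```

Running this in a Jupyter Notebook cell or as a script prints the summary values, which is a quick way to check that a Python environment is working.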
Day 2:
- Data manipulation
- Basic descriptive statistics
- Tabulations
- Filtering data frames (rows and/or columns)
- Basic graphs
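The Day 2 topics can be sketched in a similar way. The outline refers to data frames, which usually means a library such as pandas, but the course page does not name one, so that is an assumption. To stay runnable without extra installs, this sketch swaps in the standard library and treats a list of dictionaries as a stand-in for a data frame. The survey records are invented for illustration, and graphs are left out because they need a plotting library.

```python
import statistics
from collections import Counter

# Hypothetical survey records (invented for illustration only)
survey = [
    {"id": 1, "state": "VIC", "age": 23, "income": 52000},
    {"id": 2, "state": "NSW", "age": 35, "income": 61000},
    {"id": 3, "state": "VIC", "age": 41, "income": 75000},
    {"id": 4, "state": "QLD", "age": 29, "income": 48000},
    {"id": 5, "state": "NSW", "age": 35, "income": 67000},
    {"id": 6, "state": "VIC", "age": 52, "income": 83000},
]

# Data manipulation: derive a new variable (income in thousands)
for row in survey:
    row["income_k"] = row["income"] / 1000

# Basic descriptive statistics: mean, median and interquartile range
incomes = [row["income_k"] for row in survey]
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
q1, _, q3 = statistics.quantiles(incomes, n=4)  # quartile cut points
iqr = q3 - q1

# Tabulation: how many respondents per state
state_counts = Counter(row["state"] for row in survey)

# Filtering: keep selected rows (Victorians over 30) and selected columns
vic_over_30 = [
    {"id": row["id"], "age": row["age"]}
    for row in survey
    if row["state"] == "VIC" and row["age"] > 30
]

print(round(mean_income, 1), median_income, round(iqr, 1))
print(dict(state_counts))
print(vic_over_30)
```

With a data-frame library, each of these steps typically becomes a single method call, which is the main reason such libraries are preferred for larger data sets.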
Exercises will be included interactively throughout the workshop.
This course will be run online over 2 days.
Participants will need their own computers with Python and Anaconda already installed. They will also need internet access to download Python libraries.
This course will be taught in a PC environment, but Mac users are welcome.
Please note that due to the short 2-day structure, no time will be set aside for analysing participants’ own data.
This course assumes that participants have:
- A basic understanding of statistical concepts including descriptive statistics (mean, median and interquartile range),
- Some familiarity with a PC/Mac environment including keyboard skills,
- An understanding of folder and file structures in the PC/Mac environment, and
- Some experience in using Microsoft Word and Excel or their equivalent.
Murach's Python Programming, 2nd Edition by Joel Murach and Michael Urban
Python for Data Science: A Hands-On Introduction by Yuli Vasiliev
Learning Python: Powerful Object-Oriented Programming, 5th Edition by Mark Lutz