BYO laptops are required for this course.
This course is intended for current users of data analysis software (e.g. SPSS, SAS or Stata) who wish to learn the R system. R is a free software environment for scientific and statistical computing and graphics that runs on all common computing platforms. An active and highly skilled developer community works on its development and improvement. It has become an environment of choice for the implementation of new methodology. At the same time, it is attracting wide attention from specialists in statistical application areas. The powerful and innovative graphics abilities available in R include the provision of well-designed publication-quality plots that can include mathematical symbols and formulae.
The course shows how to use R for the methods covered in the ACSPRI course 'Fundamentals of Multiple Regression', and assumes that participants have an understanding of regression techniques at least to the level provided by that course.
It will emphasise:
1) careful thinking about the aims of the analysis (including the intended outcome), and whether the data has the information needed to achieve those aims;
2) the data exploration that should precede and accompany the use of regression methods (this is an area where R has particular strengths);
3) the different challenges involved in using regression for prediction, as opposed to assessing the contribution of specific explanatory variables;
4) the diagnostic checks that should follow regression calculations;
5) the role and limitations of the particular methods and possible alternatives;
6) establishing the existence of non-linear relationships and identifying the necessary transformation procedures;
7) how to use R's model formulae to incorporate and interpret interaction effects;
8) the uses of simulation and other computer intensive techniques to supplement regression methods;
9) other issues for users of regression methods - missing values, multicollinearity, errors in explanatory variables, etc.
Much of the time on the first two days of the course will be taken with an introduction to R, emphasising its use for exploratory data analysis and giving a broad introduction to the course content for the remaining days. It will demonstrate the use of model formulae to account for categorical (“nominal level”) variables, often handled using dummy variables. The final day will show how the methodologies for continuous outcome variables can be extended to handle many types of binary (0/1) and count data.
Intending participants are encouraged to work through the introductory notes on the R system that are noted below.
There will be some limited use of the graphical user interface provided by the R Commander package for R. Most use of R will however be from the command line. Further details can be found on the web page
Data will be provided. Participants who submit their own data in advance of the course will, if the data are suitable for the methods covered in the course, have the opportunity to analyse that data and discuss the output.
Following a first in Mathematics at Auckland University and a variety of teaching and lecturing positions, John Maindonald settled down to working with other researchers as a quantitative problem solver. Until his move from New Zealand to Australia in 1996, much of his work was in plant, fruit, insect and other pest research, with industrial consulting as a sideline. He took up a position at The Australian National University (ANU) in 1998. At ANU he has relished the stimulus of working with biologists (including molecular biologists), ecologists, epidemiologists, public health researchers, demographers, computer scientists, numerical analysts, machine learners, an economic historian, forensic linguists, and a lively group of statisticians. He is the author of a book on Statistical Computation. He is the senior author of "Data Analysis and Graphics Using R". This example-based exposition of practical approaches to data analysis, now into its third edition, has sold more than 10,000 copies. Now in semi-retirement, he does occasional consulting, and fronts workshops on the use of the open source R system for scientific and statistical applications and for graphics.
Knowledge of the principles of multiple regression at a level comparable to that provided by the ‘Fundamentals of Multiple Regression’ course. Previous experience of data analysis using SPSS or SAS or Stata or R, or another system with comparable abilities. Participants must be comfortable with typing commands at the command line.
The following, with relevant sections substantially supplemented by course notes, will be used as a text. Copies will be available at a discounted price from the Cambridge University Press Melbourne office. Contact the tutor for details.
Maindonald, J.H. and Braun, W.J. (2010). Data Analysis and Graphics Using R. An Example-Based Approach. 3rd edn, Cambridge University Press.