Moving Towards Statistical Machines in Health Sciences

Berkeley Public Health Brown Bag Talk, November 10, 2020

Alan Hubbard: Moving Towards Statistical Machines in Health Sciences

Recent years have seen a rapid rise in the use of data-adaptive machine learning (ML) algorithms in the health sciences. Their use has had some success, but there is also reason for caution about how they can be misused. Some of this has to do with the idiosyncratic manner in which ML is deployed, guided less by theory and more by practical metrics (e.g., the prediction performance of the algorithm in a test set). Recently developed methodology and theory have suggested the potential of true statistical machines: algorithms where the required information is input, including the desired statistical summaries, a button is pressed, and estimation and inference are optimized automatically with little input and “experimentation” by the analyst. The methods for doing so represent a break with traditional statistical analysis and its teaching, which requires a rethink of how we apply statistical methodology in public health and of how to change the curriculum to better train students to understand these developments. The talk will present background on the development of statistical machines, possible future developments, and how such machines can help improve the reliability of public health science, always with an eye on public health goals.

Alan Hubbard, Professor of Biostatistics at UC Berkeley, focuses his research on the application of statistics to population studies, with emphasis on semiparametric models in causal inference, as well as applications in high-dimensional biology. His applied work ranges from the molecular biology of aging and wildlife biology to epidemiology and infectious disease modeling, but most of his work has focused on semiparametric estimation and inference with high-dimensional data. He is particularly interested in harnessing machine learning algorithms and advances in semiparametric causal inference toward machines that optimize the estimation of parameters related to causal inference and variable importance.