12 Apr module 10: introduction to machine learning
16/09/2020 - 18/09/2020
1:45 pm - 3:15 pm
Wed 16/09: 1:45 pm - 5:00 pm | online
Thu 17/09: 9:30 am - 5:00 pm | online
Fri 18/09: 9:30 am - 3:15 pm | online
Week 2
Many modern digital applications rely on machine learning to derive predictive strength from high-dimensional data sets. Compared to traditional statistics, the absence of a focus on scientific hypotheses and the need to easily leverage detailed signals in the data require a different set of models, tools, and analytical reflexes. This course aims to bring participants to the level where they can independently start the analytical part of data mining projects.
To that end, the most common types of projects will be addressed: regression with continuous outcomes, classification with categorical outcomes, and clustering. For each of these, the practical use of a set of standard methods will be shown, such as Random Forests, Gradient Boosting Machines, Support Vector Machines, k-Nearest-Neighbors, and k-means. Throughout the course, concepts that are of concern in every statistical learning application will be highlighted, such as the curse of dimensionality, model capacity, overfitting, and regularization. Practical strategies will be offered to deal with them, introducing techniques such as the Lasso and ridge regression, cross-validation, bagging, and boosting.
Instruction will also be given on a selection of specific techniques that are often of interest, such as model calibration and the explanation of black-box models. Finally, we will introduce deep learning as a powerful tool for data analysis, discussing when and how to use it in practice, and when to shy away from it. The course will offer practical materials for carrying out the analyses in Python; a selection of these practicals will also be covered in class.
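To give a flavour of the kind of exercise such practicals might involve, here is a minimal sketch that is not taken from the course materials. It uses cross-validation to choose the number of neighbours for a k-Nearest-Neighbors classifier and contrasts that with the overly optimistic training accuracy of k = 1, a simple instance of overfitting. The data are synthetic, the function names (`make_data`, `knn_predict`, `cv_accuracy`) are hypothetical, and the code uses only the Python standard library; this does not imply anything about the libraries used in class.

```python
import random
from collections import Counter


def make_data(n, seed=0):
    """Synthetic two-class data: two overlapping Gaussian blobs in 2D.

    Made up purely for illustration; not a course data set.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.0
        x = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cv_accuracy(data, k, n_folds=5):
    """Mean held-out accuracy over n_folds cross-validation folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / n_folds


data = make_data(200)

# Odd values of k avoid tied votes in a two-class problem.
results = {k: cv_accuracy(data, k) for k in (1, 5, 15, 45)}
best_k = max(results, key=results.get)

# With k = 1 every training point is its own nearest neighbour, so the
# accuracy measured on the training data itself says nothing about new data.
train_acc_k1 = sum(knn_predict(data, x, 1) == y for x, y in data) / len(data)

for k, acc in results.items():
    print(f"k={k:2d}  cross-validated accuracy={acc:.3f}")
print("selected k:", best_k)
print(f"training accuracy for k=1: {train_acc_k1:.3f}")
```

Comparing the cross-validated accuracy for k = 1 with its training accuracy shows why held-out evaluation, rather than fit on the training data, is the sensible basis for choosing model capacity.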
Participants receive copies of the slides and the Python code notebooks.
This course targets professionals and investigators from all areas who are involved in predictive modelling based on large and/or high-dimensional databases. Participants are expected to be familiar with the basics of regression (as taught, for instance, in this Summer School) and to have some first experience programming in Python.
As a Senior Data Scientist at the KBC Group Big Data center, Dr. Bart Van Rompaye heads a group of data scientists applying modern data analytical approaches to investment-related problems. He obtained his PhD at Ghent University on issues in survival analysis, and held postdoctoral positions at Ghent University and Umeå University, Sweden. In the past, he has taught numerous courses for the Master in Statistical Data Analysis, the Institute for Continuing Education in Science, and FLAMES, the Flanders Training Network for Methodology and Statistics.