21 Sep (Statistical) Data Science for the Business User – ONLINE
16/11/2020 - 17/11/2020
10:00 am - 4:00 pm
Statistics course | Level: beginner
For questions related to this event, contact email@example.com.
Affiliation: University of Antwerp
This course teaches the basic concepts of statistical data science, with a focus on business applications. It starts by zooming out to review various applications and the data science process model. We then discuss each step of that process in more detail. Data preprocessing is covered extensively, given its impact on subsequent analytical model development. We elaborate on descriptive and predictive analytics and discuss various techniques, together with how to measure their performance from a business perspective. Throughout the course, examples and small case studies are used to clarify the concepts introduced. All illustrations are given in the programming language R.
Upon finishing this course, participants will know the key concepts, business applications, potential, and challenges of data science.
There are no prerequisites, but prior knowledge of basic statistical principles (such as maximum likelihood, hypothesis testing, and linear regression) is recommended.
We will use the free software R and RStudio, which can be downloaded for different operating systems from https://cran.r-project.org and https://rstudio.com/products/rstudio/download/, respectively.
Additional references will be given in the slides.
Normal rates apply.
This course is held online. Details will be announced.
Introduction to Data Science
- What is data science/machine learning/artificial intelligence?
- Machine learning examples and opportunities
- Data science process model
Data preprocessing
- Types of data and variables
- Summary statistics and visual data exploration
- Dealing with missing values
- Outlier or anomaly detection
- Principal component analysis
- Dealing with imbalanced data
- Feature engineering
Descriptive analytics
- Hierarchical clustering
- K-means clustering
Predictive analytics
- Target definition
- Linear regression
- Logistic regression
- Decision trees
- Ensemble methods
- Neural networks and link to deep learning
Measuring the performance of predictive analytics models
- Split sample method and cross-validation
- Confusion matrix
- Performance measures for classification
- ROC curve and AUC
- Performance measures for regression