An Introduction to Big Data Processing – ONLINE

  • First Session
    08/12/2021
    1:00 pm - 5:00 pm
  • Second Session
    09/12/2021
    1:00 pm - 5:00 pm
  • Third Session
    15/12/2021
    1:00 pm - 5:00 pm

Course details

Data science course | Level: intermediate
For questions related to this event, contact uhasselt@flames-statistics.com
Affiliation: Hasselt University


Abstract

This course will be offered ONLINE.

The term "Big Data" refers to data whose characteristics (in terms of Volume, Velocity, Veracity, Variety, ...) make it difficult to analyze using conventional processing methods (e.g., R, Python, or SQL on a single machine).

In response, novel programming models and frameworks for dealing with Big Data characteristics have been developed. The objective of this course is to introduce researchers to these programming models, focusing exclusively on the Volume and Velocity characteristics of Big Data.

Concretely, one way of dealing with the Volume characteristic is to analyze such data in parallel, using many machines organized in a distributed compute cluster. We introduce the architecture of distributed compute clusters (a.k.a. "Warehouse Scale Machines"); how such clusters enable data-parallelism; and the importance of fault-tolerance for such clusters.

We then study how to program on such clusters using the Map/Reduce and Apache Spark programming frameworks.
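To give a flavour of the Map/Reduce model, here is a minimal sketch in plain Python that counts words on a single machine. It is an illustration only, not course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run the map and reduce steps in parallel on many machines and handle the grouping step itself.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Each line could be mapped on a different machine; here we loop locally.
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Each key could be reduced on a different machine as well.
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data is big", "data at scale"])
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'at': 1, 'scale': 1}
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread over a cluster without changing the result.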

Finally, we turn to the Velocity characteristic of Big Data. We discuss the challenges posed by this characteristic, and ways of organizing an analysis pipeline to address them. We also show how this organization can be implemented using Spark Streaming.
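One common way to organize such a pipeline is micro-batching: incoming events are cut into small batches by arrival time, and a running result is updated after each batch. The sketch below illustrates the idea in plain Python. It is not Spark Streaming code; the `micro_batches` helper and the sample events are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import Counter

def micro_batches(events, interval):
    """Group (timestamp, value) events into consecutive batches of `interval` seconds.

    Assumes events are sorted by timestamp. Intervals without events
    yield an empty batch.
    """
    batch, batch_end = [], None
    for timestamp, value in events:
        if batch_end is None:
            # Align the first batch to a multiple of the interval.
            batch_end = (timestamp // interval + 1) * interval
        while timestamp >= batch_end:
            yield batch
            batch, batch_end = [], batch_end + interval
        batch.append(value)
    if batch:
        yield batch

events = [(0, "a"), (1, "b"), (2, "a"), (5, "a"), (6, "c")]

running = Counter()   # state carried across batches
snapshots = []        # running counts after each batch
for batch in micro_batches(events, interval=2):
    running.update(batch)
    snapshots.append(dict(running))

print(snapshots[-1])  # {'a': 3, 'b': 1, 'c': 1}
```

The batch interval trades latency against overhead: shorter intervals give fresher results, while longer ones process more events per batch.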

The course does not cover statistics, data mining, machine learning, or predictive modelling, nor is it an introduction to programming. It aims to give researchers the means to analyze data effectively at scale, using modern distributed analytics engines to exploit parallelism.

The course is organized over three days. Each day starts with a theoretical part that covers the necessary background, followed by a hands-on exercise session in which participants use the corresponding programming models.


Prerequisites

The exercises interact with Spark and Spark Streaming through Python, so hands-on knowledge of Python is required. Since the exercises take the form of Jupyter notebooks, familiarity with Jupyter notebooks is also required.


Background readings


Fee

PhDs and postdocs of a Flemish university: 0 €
Other academics: 120 €
Non-profit/Social sector: 200 €
Private sector: 400 €


Venue

The course is offered ONLINE.


Instructor

prof. dr. Stijn Vansummeren

