Scraping tables from PDFs with pdftools package (in R) – ONLINE

  • Access to recorded web lecture
    10/02/2021 - 17/02/2021
    10:00 am
  • Online Q&A session
    17/02/2021
    10:00 am - 11:00 am

Course details

statistics seminar | level: advanced | register now
for questions related to this event, contact kuleuven@flames-statistics.com
affiliation: KU Leuven


Abstract

The aim of this two-hour seminar is to provide a guide to extracting "irregularly" formatted tables from PDFs.
We’ll use ROpenSci’s pdftools package along with several tidyverse packages: stringr (text manipulation), dplyr (data wrangling) and tidyr (data cleaning). Two examples of scraping different tables out of PDF documents will be discussed.
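
As a rough illustration of the kind of workflow described above (not taken from the seminar material), the sketch below parses one page of text into a data frame. It assumes the page text has the shape returned by pdftools::pdf_text(), that columns are separated by two or more spaces, and that every data row contains a digit. The function name parse_table_page, the column names, and the sample text are all hypothetical.

```r
library(stringr)
library(dplyr)
library(tidyr)

# Hypothetical helper: turn one page of text (one element of the character
# vector returned by pdftools::pdf_text()) into a tidy data frame.
# Assumptions: columns are separated by 2+ spaces, and only data rows
# contain digits (so the header line and blank lines are dropped).
parse_table_page <- function(page_text) {
  tibble(line = str_split(page_text, "\n")[[1]]) %>%
    mutate(line = str_trim(line)) %>%              # strip leading/trailing spaces
    filter(str_detect(line, "\\d")) %>%            # keep rows that contain a digit
    separate(line,
             into = c("country", "year", "value"), # hypothetical column names
             sep = "\\s{2,}") %>%                  # split on runs of 2+ spaces
    mutate(year = as.integer(year),
           value = as.numeric(value))
}

# Made-up sample text standing in for one page of a PDF.
sample_page <- paste(
  "Country        Year    Value",
  "Belgium        2019    11.5",
  "New Zealand    2020    4.2",
  "",
  sep = "\n"
)

result <- parse_table_page(sample_page)
print(result)

# With a real document (hypothetical file name), the same helper could be
# applied page by page:
# pages  <- pdftools::pdf_text("report.pdf")
# tables <- lapply(pages, parse_table_page)
```

Splitting on runs of two or more spaces keeps multi-word cells such as "New Zealand" intact. Real PDFs often break this assumption, which is what makes such tables "irregular".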

Dates & Times for this seminar:

You will have access to the previously recorded web lecture from 10 February 2021. Watch it at your own pace.

A one-hour interactive online Q&A session is scheduled on 17 February 2021 at 10 am. Ask your questions there!


Prerequisites

IMPORTANT!!

Good knowledge of R and the tidyverse is required. Previous experience of coding in R is fundamental. If you have just started coding in R, this seminar is not for you.
If you want more information, don't hesitate to contact the organizer at: kuleuven@flames-statistics.com


Background readings

https://www.brodrigues.co/blog/2018-06-10-scraping_pdfs/


Fee

No fee


Venue

ONLINE


Instructor

Cristina Cametti


Sign up (ticket price depends on affiliation)

You will receive payment details later from the organiser by email.