Searching text patterns using regular expressions [ON CAMPUS]

  • 08/12/2021
    10:00 am - 4:00 pm

Course details

methodology seminar | level: beginner
for questions related to this event, contact jarl.kampen@uantwerp.be
affiliation: University of Antwerp


Abstract

The analysis of large and complex datasets often starts with retrieving the records of interest, using a query that is based on a text pattern. Starting from a large database of student records, you may have to retrieve the students whose student ID starts with s2016. Or you may need to retrieve files whose names start with “results”, followed by a date, and end with the extension “.txt”. Then, for these retrieved files, you may have to change the extension “.txt” into “.csv”.
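The three tasks above could be sketched in R roughly as follows. The student IDs, the file names, and the assumed file-name layout (`results_` followed by a `YYYY-MM-DD` date) are invented for illustration and are not taken from the course material.

```r
# Hypothetical student IDs (made up for this sketch).
ids <- c("s2016001", "s2015042", "s2016107", "x2016003")

# "^" anchors the pattern at the start of the string.
ids_2016 <- grep("^s2016", ids, value = TRUE)

# Hypothetical file names; the date format YYYY-MM-DD is an assumption.
files <- c("results_2021-12-08.txt", "results_final.txt",
           "summary_2021-12-08.txt", "results_2021-11-30.txt")

# "results_", then a date, then a literal ".txt" at the end of the name.
pattern <- "^results_[0-9]{4}-[0-9]{2}-[0-9]{2}\\.txt$"
result_files <- grep(pattern, files, value = TRUE)

# Replace the trailing ".txt" with ".csv" in the retrieved names.
csv_files <- sub("\\.txt$", ".csv", result_files)
```

Note that `sub()` only changes the character strings; renaming files on disk would additionally need something like `file.rename()`.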

This workshop illustrates functions in R that allow you to work with text data, including grep(), sub(), gsub() and strsplit(). These functions are not unique to R: most of them have equivalents in other programming languages, including Perl, awk and Python.
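As a rough orientation, the four functions named above behave as in the following minimal sketch. The input strings are hypothetical and chosen only to make the differences visible.

```r
# Hypothetical inputs (made up for this sketch).
file_names <- c("results.txt", "notes.txt", "results.csv")
date_string <- "2021-12-08"

# grep() returns the positions of the elements that match the pattern.
hits <- grep("results", file_names)

# sub() replaces only the first match in each string ...
first_only <- sub("-", "/", date_string)

# ... whereas gsub() replaces every match.
all_replaced <- gsub("-", "/", date_string)

# strsplit() splits each string at the pattern and returns a list,
# so [[1]] extracts the pieces of the first (and only) string.
parts <- strsplit(date_string, "-")[[1]]
```

Here `hits` is `1 3`, `first_only` is `"2021/12-08"`, `all_replaced` is `"2021/12/08"`, and `parts` is `"2021" "12" "08"`.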

An important notion in working with text is the concept of regular expressions. A regular expression is a textual syntax for describing a pattern in character values. Such patterns can then be used to extract parts of a dataset or to modify the character values themselves.
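To give a flavour of the syntax, the sketch below combines anchors, a character class, a quantifier and a capture group. It assumes, purely for illustration, that a valid student ID is the letter s (in either case) followed by exactly seven digits, the first four being a year; the IDs are invented.

```r
# Hypothetical IDs (made up for this sketch).
ids <- c("s2016001", "S2016002", "s201600", "as2016003")

# ^ and $ anchor the pattern to the start and end of the string,
# [sS] matches one letter s in either case,
# [0-9]{7} matches exactly seven digits.
valid <- grepl("^[sS][0-9]{7}$", ids)

# Parentheses capture the four digits after the letter;
# "\\1" in the replacement refers back to that captured group.
year <- sub("^[sS]([0-9]{4})[0-9]{3}$", "\\1", ids[valid])
```

With these inputs, `valid` is `TRUE TRUE FALSE FALSE` and `year` is `"2016" "2016"`.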


Prerequisites

No knowledge of statistics is needed. Participants need to have the latest version of R and RStudio installed on their laptop, and to have some familiarity with R. If you have no idea what the following commands mean, the course is too advanced for you:

mydata <- read.table("c:/temp/rawdata.txt", header = TRUE, dec = ",")
sub.males <- mydata[mydata$sex == "male", ]
sub.females <- mydata[mydata$sex == "female", ]
mydata$pass <- as.factor(ifelse(mydata$examresult < 10, 0, 1))
levels(mydata$pass) <- c("fail", "pass")
table(mydata$pass)


Background readings


Fee

Normal fees apply.


Venue

The course is taught on campus at the University of Antwerp, Groenenborger Campus, Room V.008.


Instructor

Prof. dr. Erik Fransen


Details | Price
Free Ticket (researchers at FLAMES participating universities) | €0.00 (EUR)
Academic/Non-profit/Private Sector | €0.00 (EUR)