Quick Overview

Outline

Given the rapid rate at which text data are being digitally gathered in many domains of science, there is a growing need for automated tools that can analyse, classify, and interpret these data. Text mining techniques create a structured representation of text, making its content more accessible to researchers. Applications of text mining are everywhere: social media, web search, advertising, email, customer service, healthcare, and marketing, among others.

This course offers an extensive exploration of text mining with Python. It has a strongly practical, hands-on focus: students will gain experience applying text mining to real data from, for example, the social sciences and healthcare, and interpreting the results. Through lectures and practicals, students will learn the skills needed to design, implement, and understand their own text mining pipeline. Topics include text preprocessing, text classification, topic modeling, word embeddings, deep learning models, and responsible text mining.

In this course, participants will learn to:

  • Review the fundamental approaches to text mining;
  • Understand and apply current methods for analysing texts;
  • Define a text mining pipeline given a practical data science problem;
  • Implement all steps in a text mining pipeline: feature extraction, feature selection, model learning, model evaluation;
  • Understand and apply state-of-the-art methods in text mining;
  • Implement word embedding and advanced deep learning techniques;
  • Review the advanced approaches to text mining: transformers, LLMs, prompting, responsible text mining.
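
The four pipeline steps named above (feature extraction, feature selection, model learning, model evaluation) can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library, not the course's own code: the toy sentences, the labels, the `min_df` threshold, and all function names are assumptions made up for this example, and the practicals may well use dedicated libraries instead.

```python
import math
import re
from collections import Counter

# Toy labelled data, invented for illustration only.
TRAIN = [
    ("the treatment improved patient recovery", "health"),
    ("patients reported fewer symptoms after treatment", "health"),
    ("the hospital admitted new patients", "health"),
    ("voters discussed the election on social media", "social"),
    ("the survey measured trust in social institutions", "social"),
    ("election turnout rose among young voters", "social"),
]
TEST = [
    ("new treatment for hospital patients", "health"),
    ("young voters and the election survey", "social"),
]


def tokenize(text):
    """Feature extraction: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_features(docs, min_df=2):
    """Feature selection: keep words that occur in at least min_df documents."""
    doc_freq = Counter(word for doc in docs for word in set(tokenize(doc)))
    return {word for word, count in doc_freq.items() if count >= min_df}


def train_naive_bayes(data, vocab):
    """Model learning: multinomial naive Bayes with add-one smoothing."""
    label_counts = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in data:
        word_counts[label].update(w for w in tokenize(text) if w in vocab)
    model = {}
    for label, n_docs in label_counts.items():
        total = sum(word_counts[label].values()) + len(vocab)
        log_prior = math.log(n_docs / len(data))
        log_like = {w: math.log((word_counts[label][w] + 1) / total) for w in vocab}
        model[label] = (log_prior, log_like)
    return model


def predict(model, vocab, text):
    """Return the label with the highest log-probability score for the text."""
    words = [w for w in tokenize(text) if w in vocab]
    scores = {
        label: log_prior + sum(log_like[w] for w in words)
        for label, (log_prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)


def accuracy(model, vocab, data):
    """Model evaluation: fraction of held-out texts that are labelled correctly."""
    correct = sum(predict(model, vocab, text) == label for text, label in data)
    return correct / len(data)


vocab = select_features([text for text, _ in TRAIN])
model = train_naive_bayes(TRAIN, vocab)
print("Selected features:", sorted(vocab))
print("Test accuracy:", accuracy(model, vocab, TEST))
```

Keeping each step in its own function mirrors the pipeline structure: any single step (for example, the feature selection rule or the classifier) can be swapped out without touching the others.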

The course begins with a review of the basic concepts of text mining, before moving on to implement advanced concepts in natural language processing. By the end of the week, participants will have mastered the advanced skills required for text mining and NLP using Python.

Prerequisites

Participants should have a basic knowledge of data science and programming, as well as an interest in scripting and programming in Python.

Requirements

Participants are requested to bring their own laptop for the lab meetings.

Certificate

Participants will receive a certificate at the end of the course.

Additional references

1. Jurafsky, D., & Martin, J. H. (2025). Speech and Language Processing (3rd ed.). Chapters available online.

2. Eisenstein, J. (2018). Natural Language Processing. Chapters available online.

Daily schedule

Start time   End time   Type
09:00        10:30      Lecture
                        Break
10:50        12:00      Practical
12:00        12:30      Discussion
                        Lunch at Vening Meinesz building (A)
14:00        15:20      Lecture
                        Break
15:30        16:30      Practical
16:30        17:00      Discussion

How to prepare

Preparing your machine for the course

Dear all,

This summer you will participate in the course S42: Applied Text Mining, from Foundations to Advanced, in Utrecht, the Netherlands. To help you get up to speed quickly, we will use the Python programming language in Google Colab. The steps below guide you through using Python and working on the practicals in this course.

We look forward to seeing you in Utrecht!

The teaching team

System requirements

Bring a laptop to the course and make sure you have an Internet connection so that you can use Python in Google Colab. If you are using PyCharm, Jupyter Notebook, VS Code, or any other local Python environment, also check that you have full write access and administrator rights on the machine. We will write and run Python code in this course. Some corporate laptops have limited user access; in that case, we advise you to bring your own laptop if you have one.
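
As a quick self-check before the course, a short script along the following lines can report the Python version, whether the code appears to be running in Google Colab, and whether files can be written in the current working directory. It is only a sketch: the function names are made up for this example, and testing for `google.colab` among the loaded modules is a common heuristic rather than an official check. Administrator rights are not tested here.

```python
import os
import platform
import sys
import tempfile


def can_write_files(directory=None):
    """Return True if a temporary file can be created and written in directory."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(mode="w", dir=directory) as handle:
            handle.write("test")
        return True
    except OSError:
        # Covers missing directories and permission errors.
        return False


def environment_summary():
    """Collect a few facts about the current Python environment."""
    return {
        "python_version": platform.python_version(),
        # Heuristic: Colab kernels have the google.colab module loaded.
        "running_in_colab": "google.colab" in sys.modules,
        "can_write_files": can_write_files(os.getcwd()),
    }


for name, value in environment_summary().items():
    print(f"{name}: {value}")
```

If `can_write_files` is reported as `False` on a local installation, that may point to the limited user access mentioned above.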

Python in Google Colab

Python is a general-purpose interpreted, interactive, object-oriented, and high-level programming language. It is a powerful environment for scientific computing.

We expect that many of you will have some experience with Python; for the rest of you, this section will serve as a quick crash course both on the Python programming language and on the use of Python in Google Colab:

Follow the tutorial on Python in Google Colab for the Applied Text Mining course here.

This tutorial is largely based on the CS231n Python Tutorial With Google Colab.

Alternative approach

I prefer to use Jupyter Notebook or PyCharm when I am not using Google Colab.

  • You can find an extensive tutorial by JetBrains on how to install and use PyCharm here.

  • A beginner’s tutorial on how to use Jupyter Notebook: link

Monday

Materials

We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, we advise all course participants to access the materials online. Lectures are provided in HTML and PDF formats. Practical files contain the exercises in two versions: with and without solutions.

Here you will find the materials for Monday:

Additional references

  • Ref 1: Chapters 2, 3, 4, 5, 6
  • Ref 2: Chapters 1, 2, 3, 4

Tuesday

Materials

Here you will find the materials for Tuesday:

Additional references

  • Ref 2: Chapter 5

Wednesday

Materials

Here you will find the materials for Wednesday:

Additional references

  • Ref 1: Chapters 6, 7, 9
  • Ref 2: Chapters 6, 14

Thursday

Materials

Here you will find the materials for Thursday:

Additional references

  • Ref 1: Chapters 6, 7, 9
  • Ref 2: Chapters 6, 14

Friday

Materials

Here you will find the materials for Friday:

Additional references

  • Ref 1: Chapters 6, 7, 9
  • Ref 2: Chapters 6, 14

Archive

All materials in a ZIP file

On the last day of the course, all materials will be available for download as a single ZIP file:

Download the Materials

We wish all the participants success with their Text Mining / NLP projects!

The teaching team