Given the rapid rate at which text data are being digitally gathered in many domains of science, there is a growing need for automated tools that can analyse, classify, and interpret this kind of data. Text mining techniques can be applied to create a structured representation of text, making its content more accessible for researchers. Applications of text mining are everywhere: social media, web search, advertising, emails, customer service, healthcare, marketing, etc.
This course offers an extensive exploration of text mining with Python. It has a strongly practical, hands-on focus: students will gain experience in applying text mining to real data from, for example, the social sciences and healthcare, and in interpreting the results. Through lectures and practicals, students will learn the skills needed to design, implement, and understand their own text mining pipeline. The topics in this course include preprocessing text, text classification, topic modeling, word embedding, deep learning models, and responsible text mining.
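To give a flavour of what a "structured representation of text" can look like, here is a minimal sketch of a bag-of-words representation using only the Python standard library. It is an illustration rather than the course's own code: the stop-word list, the function names `preprocess` and `bag_of_words`, and the two example sentences are all made up for this example.

```python
import re
from collections import Counter

# Hypothetical, very small stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "was", "with"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def bag_of_words(documents):
    """Return (vocabulary, matrix) with one row of term counts per document."""
    counts = [Counter(preprocess(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


# Made-up example sentences, not course data.
docs = [
    "The patient was happy with the treatment.",
    "The treatment was slow and the patient was unhappy.",
]
vocabulary, matrix = bag_of_words(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Each document becomes a row of word counts over a shared vocabulary, which is the kind of table that a classifier or topic model can take as input.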
The course deals with the following topics:
The course begins with a review of the basic concepts of text mining, before moving on to implement advanced concepts in natural language processing. By the end of the week, participants will have mastered the advanced skills required for text mining and NLP using Python.
Participants should have a basic knowledge of data science and programming, as well as an interest in scripting and programming in Python.
Participants are requested to bring their own laptop for the lab meetings.
Participants will receive a certificate at the end of the course.
Start time | End time | Type |
---|---|---|
09:00 | 10:30 | Lecture |
10:30 | 10:50 | Break |
10:50 | 12:00 | Practical |
12:00 | 12:30 | Discussion |
12:30 | 14:00 | Lunch at Vening Meinesz building (A) |
14:00 | 15:20 | Lecture |
15:20 | 15:30 | Break |
15:30 | 16:30 | Practical |
16:30 | 17:00 | Discussion |
Dear all,
This summer you will participate in the course S42: Applied Text Mining, from Foundations to Advanced in Utrecht, the Netherlands. To get you up to speed quickly, we will work with the Python programming language in Google Colab. The steps below guide you through using Python and working on the practicals in this course.
We look forward to seeing you in Utrecht!
The teaching team
Bring a laptop to the course and make sure that you have an Internet connection so that you can use Python in Google Colab. If you are using PyCharm, Jupyter Notebook, VS Code, or any other Python environment, also check that you have full write access and administrator rights on the machine. We will write and run Python code in this course. Some corporate laptops have limited user access; in that case, we advise you to bring your own laptop, if you have one.
Python is a general-purpose interpreted, interactive, object-oriented, and high-level programming language. It is a powerful environment for scientific computing.
We expect that many of you will have some experience with Python; for the rest of you, this section will serve as a quick crash course both on the Python programming language and on the use of Python in Google Colab:
Follow the tutorial on Python in Google Colab for the Applied Text Mining course here.
This tutorial is adapted mainly from the CS231n Python Tutorial With Google Colab.
We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, we advise all course participants to access the materials online. Lectures are provided in HTML and PDF formats. Practical files contain the exercises in two versions: with and without solutions.
Here you will find the materials for Monday:
Here you will find the materials for Tuesday:
Here you will find the materials for Wednesday:
Here you will find the materials for Thursday:
Here you will find the materials for Friday:
On the last day of the course, all the materials will be available in a compact file for download:
We wish all the participants success with their Text Mining / NLP projects!
The teaching team