Given the rapid rate at which text data are being digitally gathered in many domains of science, there is a growing need for automated tools that can analyse, classify, and interpret this kind of data. Text mining techniques can be applied to create a structured representation of text, making its content more accessible for researchers. Applications of text mining are everywhere: social media, web search, advertising, emails, customer service, healthcare, marketing, and more. This course offers an extensive exploration of text mining with Python. The course has a strongly practical, hands-on focus: students will gain experience applying text mining to real data from, for example, the social sciences and healthcare, and interpreting the results. Through lectures and practicals, students will learn the skills needed to design, implement, and understand their own text mining pipeline. The topics in this course include preprocessing text, text classification, topic modeling, word embeddings, deep learning models, and responsible text mining.
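As a minimal illustration of what a "structured representation of text" can look like, the sketch below turns a few raw documents into a bag-of-words document-term table using only the Python standard library. The example documents, the helper names `tokenize` and `document_term_matrix`, and the simple lowercase-and-keep-letters tokenizer are all hypothetical choices for illustration; the course practicals may use other libraries and preprocessing steps.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters (a deliberately simple, hypothetical tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one row of term counts per document."""
    # Count the tokens of each document separately.
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is every term that occurs in at least one document.
    vocabulary = sorted(set().union(*counts))
    # Each row holds how often each vocabulary term occurs in that document.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical example documents.
docs = [
    "The patient reports mild pain.",
    "The patient reports no pain and no fever.",
]
vocab, dtm = document_term_matrix(docs)
print(vocab)
for row in dtm:
    print(row)
```

Each row is one document and each column one vocabulary term. For these two documents the vocabulary is `and, fever, mild, no, pain, patient, reports, the`, and the second row counts `no` twice. Libraries such as scikit-learn (for example its `CountVectorizer`) build a similar table with many more preprocessing options.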
The course deals with the following topics:
The course starts by reviewing basic concepts of text mining and then moves on to implementing advanced concepts in natural language processing. By the end of the week, participants will have mastered advanced text mining skills in Python.
Participants should have a basic knowledge of data science and programming, and be motivated to script and program in Python.
Participants are requested to bring their own laptop for the lab meetings.
Participants will receive a certificate at the end of the course.
Start time | End time | Type |
---|---|---|
09:00 | 10:30 | Lecture |
10:30 | 10:50 | Break |
10:50 | 12:00 | Practical |
12:00 | 12:30 | Discussion |
12:30 | 14:00 | Lunch at Vening Meinesz building (A) |
14:00 | 15:20 | Lecture |
15:20 | 15:30 | Break |
15:30 | 16:30 | Practical |
16:30 | 17:00 | Discussion |
Dear all,
This summer you will participate in the S42: Data Science: Applied Text Mining course in Utrecht, the Netherlands. To speed up your learning, we will work with the Python programming language in Google Colab. The steps below guide you through using Python and working on the practicals in this course.
If you follow this course online, please have a look at this instructional page on MS Teams.
We look forward to seeing you all in Utrecht and online.
The Applied Text Mining team
Bring a laptop to the course and make sure that you have an Internet connection so that you can use Python in Google Colab. If you are using PyCharm or Jupyter Notebook instead, also check that you have full write access and administrator rights on the machine. We will explore programming and compiling in this course. Some corporate laptops give their users only limited access, so we advise you to bring a personal laptop if you have one.
Python is a general-purpose interpreted, interactive, object-oriented, and high-level programming language. It is a powerful environment for scientific computing.
We expect that many of you will have some experience with Python; for the rest of you, this section will serve as a quick crash course both on the Python programming language and on the use of Python in Google Colab:
Follow the tutorial on Python in Google Colab for the Applied Text Mining course here.
This tutorial is largely adapted from the CS231n Python Tutorial With Google Colab.
We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, we advise all course participants to access the materials online. Lectures are provided in HTML and PDF formats. Practical files contain the exercises in two versions: with and without solutions.
Here you will find the materials for Monday:
Here you will find the materials for Tuesday:
Here you will find the materials for Wednesday:
Here you will find the materials for Thursday:
Here you will find the materials for Friday:
On the last day of the course, all the materials will be available for download as a single compact file:
We wish all the participants success with their text mining projects!
The Applied Text Mining team