Course Presentation & Logistics
Natural Language Processing
Master in Information Health Engineering
Jerónimo Arenas-García
January 31, 2024
Department of Signal Theory and Communications
Universidad Carlos III de Madrid
Part I
NLP Introduction
Motivation I
[Link] never- sleeps
Motivation II
Large amounts of unstructured text data are generated every second
Conventional approaches can no longer keep up with understanding this
much text, and this is where NLP comes in
What is NLP?
The discipline of AI concerned with giving computers the ability to understand
written and spoken human language in the same way humans do
Modified from
A walk-through of recent developments in NLP
Natural language processing: state of the art, current trends and challenges
Some NLP applications
Source: IconicTranslation, modified from
Some NLP applications
• Spam Filtering
• Sentiment Analysis
• Alert detection
• Automating customer walkthroughs in support systems
• ...
Source: Singapore Institute of Manufacturing Technology (SIMTech). Free demo
Some NLP applications
Information extraction techniques:
• Regular Expressions
• POS tagging
• Rule-Based Matching
• NER
• Topic Modeling
Free demo
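As an illustration of the simplest technique in this group, regular expressions, here is a minimal sketch using Python's standard `re` module. The sample note, the two patterns and the `extract_entities` helper are hypothetical and chosen only for illustration; real clinical text would need far more robust patterns.

```python
import re

# Hypothetical clinical-style note, invented for illustration only.
note = "Patient started on ibuprofen 400 mg on 2024-01-31; follow-up on 2024-02-14."

# ISO-style dates (YYYY-MM-DD).
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# A number followed by a dose unit, e.g. "400 mg".
DOSE_PATTERN = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|g|ml)\b", re.IGNORECASE)


def extract_entities(text):
    """Return the dates and (amount, unit) dose pairs found in `text`."""
    dates = DATE_PATTERN.findall(text)
    doses = [(float(amount), unit.lower())
             for amount, unit in DOSE_PATTERN.findall(text)]
    return {"dates": dates, "doses": doses}


result = extract_entities(note)
print(result)
```

Patterns like these are cheap to write but brittle, which is one reason the other techniques listed here (POS tagging, rule-based matching, NER) are often used instead of, or alongside, plain regular expressions.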
Some NLP applications
Free demo
Some NLP applications
Natural language chatbots: “Eno” and “BlenderBot”
NLP is booming in the healthcare industry
(a) Multiomics Topic Modeling for Breast Cancer Classification
(b) Winterlight Labs
(c) Using Natural Language Processing and Network Analysis to Develop a Conceptual Framework for Medication Therapy Management Research
(d) Amazon Comprehend Medical
(e) Woebot Health
Part II
Course logistics
What is this course about?
[Figure: course topic map centred on "NLP"; block labels in the figure: I., II., III., III.]
• NLP pre-processing: Corpus Acquisition, Parsing techniques
• Text Vectorization: Basic Document Vectorization, Word embeddings, Contextualized embeddings
• Topic modeling: LDA, Evaluation metrics & Visualization
• Transformers: Transformers Architecture, Transfer Learning, Hugging Face
• Semantic Analysis: Information Retrieval, Semantic Graphs, Graph visualization & Analysis
Technologies
Instructor
• Jerónimo Arenas García
Email: jarenas@[Link]
Office: 4.2.C06 (Leganés)
Tutoring hours: Wednesday 19:30 – 20:30 (request by email)
Special thanks to Lorena Calvo Bartolomé and Jesús Cid Sueiro, who
have both contributed to the teaching materials of this course.
Methodology
• Mixed theory + practical sessions
• Personal work
• Intensive use of Python Notebooks
• Lab exercises will be proposed during each block
• Written quiz at the end of blocks 2 and 4
Assessment (Continuous evaluation)
1. Lab exercises (20% total)
• Notebooks should be handed in before the following Tuesday at 23:59.
• Proposed exercises will be reviewed based on methodological correctness
and code quality.
• 5 assignments will be proposed; the best 4 grades will be kept.
2. Written tests (15% each, 30% total)
• March 13: Blocks I – III
• May 8: Block IV
3. Kaggle (10%)
• Apr 3: Proposal
• Apr 17: Submission deadline
4. Final project (40%)
• Feb 21: Proposal
• May 13: Deadline (report + code)
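The weights above can be combined as in the following minimal sketch. It assumes that every item is graded on a 0-10 scale and that the lab grade is the mean of the best four notebook grades, with any missing notebook counted as zero. Neither assumption is stated in the course rules, and the function name and sample grades are hypothetical.

```python
def continuous_grade(labs, tests, kaggle, project):
    """Weighted continuous-evaluation grade, assuming 0-10 scales throughout."""
    if len(tests) != 2:
        raise ValueError("expected exactly two written test grades")
    # Keep the best 4 lab grades (assumption: they are averaged,
    # and notebooks not handed in contribute zero).
    best_four = sorted(labs, reverse=True)[:4]
    lab_grade = sum(best_four) / 4
    return (0.20 * lab_grade      # lab exercises, 20% total
            + 0.15 * tests[0]     # written test, Blocks I-III
            + 0.15 * tests[1]     # written test, Block IV
            + 0.10 * kaggle       # Kaggle
            + 0.40 * project)     # final project


# Hypothetical grades, for illustration only.
grade = continuous_grade(labs=[8, 6, 9, 7, 4], tests=[7, 8], kaggle=6, project=9)
print(round(grade, 2))
```

With these sample grades the lowest lab grade (4) is dropped, so the lab component is the mean of 9, 8, 7 and 6.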
Assessment. June Call
Blocks 1, 2 and 3 grades will be discarded, and a new grade (60%) will be
based on the performance in a practical exam to be carried out in the lab,
and solved in Google Colab. The exam may include some questions about
theoretical concepts.
In addition to the previous practical exam, students will be able to hand in a
new final project (40%).
They can opt for just taking the exam, carrying out a new project, or both.
Any existing grade will be replaced by that achieved in the new
assessment items taken during the June call.
Final Project
Goals:
• To work on a real application scenario involving NLP
• To design experiments/assess results to validate or reject hypotheses
• To use visualization tools
• To prepare a report introducing the models/algorithms used, presenting
the results achieved, analyzing them and drawing conclusions
Rules:
• Students will work in teams and hand in their code and a short report
Assessment (tentative):
• Methodology: 2 points
• Report Quality + Visualization: 1.5 points
• Code: 0.5 points
In any case, specific conditions will be published together with the final
project statement.
Information Resources
• Notebooks provided for the lessons
• Other resources as published in the course description
• Plenty of high-quality material available on the web