Text and Speech Analysis Syllabus

This document outlines the objectives and units of study for a course on text and speech analysis. The course covers natural language processing basics, text classification algorithms, question answering systems, speech recognition and speech synthesis. It includes suggested activities, evaluation methods and expected learning outcomes.

CSE/IT/CCE/AIDS/CSBS- DATA SCIENCE

CCS369 TEXT AND SPEECH ANALYSIS          L T P C
                                         2 0 2 3
COURSE OBJECTIVES:
• Understand natural language processing basics
• Apply classification algorithms to text documents
• Build question-answering and dialogue systems
• Develop a speech recognition system
• Develop a speech synthesizer
UNIT I NATURAL LANGUAGE BASICS 6
Foundations of natural language processing – Language Syntax and Structure – Text Preprocessing and
Wrangling – Text tokenization – Stemming – Lemmatization – Removing stop-words – Feature Engineering
for Text representation – Bag of Words model – Bag of N-Grams model – TF-IDF model
Suggested Activities
• Flipped classroom on NLP
• Implementation of Text Preprocessing using NLTK
• Implementation of TF-IDF models
Suggested Evaluation Methods
• Quiz on NLP Basics
• Demonstration of Programs
UNIT II TEXT CLASSIFICATION 6
Vector Semantics and Embeddings – Word Embeddings – Word2Vec model – GloVe model – FastText model
– Overview of Deep Learning models – RNN – Transformers – Overview of Text summarization and Topic
Models
Suggested Activities
• Flipped classroom on Feature extraction of documents
• Implementation of SVM models for text classification
• External learning: Text summarization and Topic models
Suggested Evaluation Methods
• Assignment on above topics
• Quiz on RNN, Transformers
• Implementing NLP with RNN and Transformers
UNIT III QUESTION ANSWERING AND DIALOGUE SYSTEMS 9
Information retrieval – IR-based question answering – knowledge-based question answering – language
models for QA – classic QA models – chatbots – Design of dialogue systems – evaluating dialogue
systems
Suggested Activities:
• Flipped classroom on language models for QA
• Developing a knowledge-based question-answering system
• Classic QA model development
Suggested Evaluation Methods
• Assignment on the above topics
• Quiz on knowledge-based question answering system
• Development of simple chatbots
UNIT IV TEXT-TO-SPEECH SYNTHESIS 6
Overview. Text normalization. Letter-to-sound. Prosody, Evaluation. Signal processing - Concatenative
and parametric approaches, WaveNet and other deep learning-based TTS systems
Suggested Activities:
• Flipped classroom on Speech signal processing
• Exploring Text normalization
• Data collection
• Implementation of TTS systems
Suggested Evaluation Methods
• Assignment on the above topics
• Quiz on WaveNet, deep learning-based TTS systems
• Finding accuracy with different TTS systems
UNIT V AUTOMATIC SPEECH RECOGNITION 6
Speech recognition: Acoustic modelling – Feature Extraction - HMM, HMM-DNN systems
Suggested Activities:
• Flipped classroom on Speech recognition.
• Exploring Feature extraction
Suggested Evaluation Methods
• Assignment on the above topics
• Quiz on acoustic modelling
30 PERIODS
PRACTICAL EXERCISES 30 PERIODS
1. Create Regular expressions in Python for detecting word patterns and tokenizing text
2. Getting started with Python and NLTK - Searching Text, Counting Vocabulary, Frequency
Distribution, Collocations, Bigrams
3. Accessing Text Corpora using NLTK in Python
4. Write a function that finds the 50 most frequently occurring words of a text that are not stop words.
5. Implement the Word2Vec model
6. Use a transformer for implementing classification
7. Design a chatbot with a simple dialog system
8. Convert text to speech and find accuracy
9. Design a speech recognition system and find the error rate
TOTAL: 60 PERIODS
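
Exercise 4 above could be approached along the following lines. This is only a minimal sketch: the stop-word set is a small hand-written placeholder (an assumption made to keep the example self-contained), and the function name is hypothetical. In a lab setting the NLTK stop-word corpus would normally be used instead.

```python
import re
from collections import Counter

# Placeholder stop-word list (assumption); a real exercise would use a fuller list.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}

def most_common_content_words(text, n=50, stop_words=STOP_WORDS):
    """Return the n most frequent words in text that are not stop words."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

sample = "The cat and the dog and the bird. The cat sat."
print(most_common_content_words(sample, n=3))
```
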
COURSE OUTCOMES:
On completion of the course, the students will be able to
CO1: Explain existing and emerging deep learning architectures for text and speech processing
CO2: Apply deep learning techniques for NLP tasks, language modelling and machine translation
CO3: Explain coreference and coherence for text processing
CO4: Build question-answering systems, chatbots and dialogue systems
CO5: Apply deep learning models for building speech recognition and text-to-speech systems

TEXTBOOK
1. Daniel Jurafsky and James H. Martin, “Speech and Language Processing: An Introduction to Natural
Language Processing, Computational Linguistics, and Speech Recognition”, Third Edition, 2022.
REFERENCES:
1. Dipanjan Sarkar, “Text Analytics with Python: A Practical Real-World Approach to Gaining
Actionable Insights from Your Data”, Apress, 2018.
2. Tanveer Siddiqui, Tiwary U S, “Natural Language Processing and Information Retrieval”, Oxford
University Press, 2008.
3. Lawrence Rabiner, Biing-Hwang Juang, B. Yegnanarayana, “Fundamentals of Speech Recognition”,
1st Edition, Pearson, 2009.
4. Steven Bird, Ewan Klein, and Edward Loper, “Natural Language Processing with Python”, O’Reilly.
CO’s-PO’s & PSO’s MAPPING
CO     PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2  PSO3
1      3    2    3    1    3    -    -    -    1    2     1     2     1     1     1
2      3    1    2    1    3    -    -    -    2    2     1     3     3     2     1
3      2    2    1    3    1    -    -    -    3    3     1     2     3     3     1
4      2    1    1    1    2    -    -    -    2    1     2     2     3     1     1
5      1    3    2    2    1    -    -    -    3    2     1     1     2     3     1
Avg.   2.2  1.8  1.8  1.6  2    -    -    -    2.2  2     1.2   2     2.4   2     1
1 - low, 2 - medium, 3 - high, ‘-’ - no correlation

Common questions

Text normalization converts text to a standard form and must handle challenges such as homograph disambiguation and acronym expansion. Deep learning models address these by learning patterns directly from large datasets, which reduces the reliance on hand-crafted rules. Such models can generalize across more linguistic variation, which tends to improve normalization accuracy across contexts and languages.
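
For contrast with learned models, a hand-crafted normalizer can be sketched in a few lines. The abbreviation table, the digit-by-digit reading of numbers, and the function name are all illustrative assumptions, not taken from the syllabus or from any particular TTS system.

```python
import re

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "no.": "number"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def normalize(text):
    """Lowercase, expand known abbreviations, and read numbers digit by digit."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]+", token):
            # Spell out each digit separately ("42" becomes "four two").
            words.extend(DIGITS[int(d)] for d in token)
        else:
            # Drop punctuation; keep letters, digits, underscores and apostrophes.
            cleaned = re.sub(r"[^\w']", "", token)
            if cleaned:
                words.append(cleaned)
    return " ".join(words)

print(normalize("Dr. Smith lives at No. 42 Elm St."))
```

The table always reads "St." as "street", so a phrase like "St. Mary" would come out wrong. Fixed rules cannot resolve that kind of ambiguity without context.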

Prosody, encompassing pitch, rhythm, and stress patterns, plays a crucial role in making synthesized speech sound natural and intelligible. Effective management of prosody allows a TTS system to convey emotions and emphases similar to human speech, which enhances comprehension and listener engagement. Improved prosody helps listeners distinguish statements from questions and pick up tonal nuances, and it reduces cognitive load during listening.

The Bag of Words model represents a document by counting how often each word occurs, ignoring word order and meaning. Each vocabulary word is treated as an independent feature, so the resulting vectors are high-dimensional and sparse. The TF-IDF (Term Frequency-Inverse Document Frequency) model refines this by weighting each term's frequency within a document by the inverse of the number of documents that contain it, which gives more weight to distinctive terms than to words that are common across the corpus.
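
The difference between raw counts and TF-IDF weights can be sketched as follows. This assumes one common variant, tf = count / document length and idf = log(N / df). Libraries often apply smoothing or other scaling, so their numbers will differ. The function names and the toy corpus are illustrative.

```python
import math
from collections import Counter

def bag_of_words(tokens):
    """Raw term counts for one tokenized document."""
    return Counter(tokens)

def tf_idf(docs):
    """TF-IDF weights per document, with tf = count / len(doc), idf = log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
print(bag_of_words(docs[0]))
print(tf_idf(docs)[0])
```

In this toy corpus "the" has the highest raw count but a TF-IDF weight of zero, because it appears in every document, while "mat" (found in one document only) receives the largest weight in the first document.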

The Word2Vec model learns dense word vectors with the continuous bag-of-words (CBOW) or skip-gram objective, treating each word as an atomic unit without subword information. The GloVe model instead fits vectors to a global word co-occurrence matrix, so it uses corpus-wide statistics rather than only local context windows. FastText extends Word2Vec by representing each word as a bag of character n-grams, which helps with morphologically rich languages and with rare words.
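
The subword idea behind FastText can be illustrated by extracting character n-grams from a word. This is a sketch of the n-gram extraction step only, not of training. The boundary markers `<` and `>` and the 3 to 6 length range follow the commonly described FastText setup, but treat them as assumptions here. The function name is hypothetical.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in boundary markers < and >."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

print(char_ngrams("where", 3, 3))

# Related word forms share many n-grams, so they can share vector components.
shared = set(char_ngrams("running")) & set(char_ngrams("runner"))
print(sorted(shared))
```

Because a word's vector is built from its n-gram vectors, a rare or unseen form can still receive a reasonable representation through the n-grams it shares with known words.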

WaveNet is a deep neural network that generates the raw audio waveform one sample at a time, conditioned on linguistic or acoustic features derived from the text. Concatenative TTS stitches pre-recorded speech units together, which can produce audible discontinuities at the joins, and parametric TTS generates speech through a vocoder, which often sounds muffled or buzzy. Because WaveNet models the waveform directly, it can produce more natural and expressive speech, at the cost of heavier computation.
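
WaveNet's core building block is the dilated causal convolution, in which each output sample depends only on current and past inputs. The sketch below shows a single unbatched layer in plain Python together with the receptive-field arithmetic for a stack of such layers. The function names are hypothetical, and the gated activations, residual connections and conditioning inputs are omitted.

```python
def causal_dilated_conv(signal, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], treating x[t < 0] as zero."""
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # causal: never look at future samples
                total += w * signal[idx]
        out.append(total)
    return out

def receptive_field(dilations, kernel_size=2):
    """Number of input samples that can influence one output of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

print(causal_dilated_conv([1, 2, 3, 4], [1, 1], dilation=2))
print(receptive_field([2 ** i for i in range(10)]))  # dilations 1, 2, 4, ..., 512
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, which is how the model covers long stretches of audio with relatively few layers.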

Transformers use attention to capture long-range dependencies across a large context, which lets chatbots go beyond scripted, structured interactions. They learn diverse language patterns from data and can generate context-aware responses with a more natural conversational flow. Rule-based systems, in contrast, need a manually written rule for every scenario they should handle, which makes them rigid and costly to extend.
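
The rigidity of rule-based chatbots is easy to see in a minimal sketch. The rules, canned responses and function name below are invented for illustration only.

```python
import re

# Hypothetical pattern/response rules, checked in order.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I help you?"),
    (re.compile(r"\bhours?\b", re.IGNORECASE), "We are open 9am to 5pm."),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "Goodbye!"),
]
FALLBACK = "Sorry, I did not understand that."

def reply(message):
    """Return the response of the first rule whose pattern matches."""
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    return FALLBACK

print(reply("Hello there"))
print(reply("What are your hours?"))
print(reply("When do you close?"))
```

The last message asks about opening times but contains no keyword from the rules, so it falls through to the fallback. Covering it would require yet another hand-written rule.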

RNNs (Recurrent Neural Networks) suit text summarization because they process text one token at a time and carry context forward in a hidden state. In practice they struggle to retain information over long spans. Transformers address this with attention mechanisms, which let every position attend to the whole text at once, so long-range dependencies are captured without step-by-step sequential processing.
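
The attention mechanism mentioned above can be sketched as scaled dot-product attention in plain Python. This covers a single head with no learned projections, masking or batching, and the function names are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(q . k / sqrt(d_k)) weighted sum of values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
print(attention([[10.0, 0.0]], keys, values))  # strongly favours the first value
print(attention([[0.0, 0.0]], keys, values))   # attends equally to both values
```

Every query scores every key directly, so the distance between two positions in the text does not affect how easily one can attend to the other.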

Regular expressions are a powerful tool for text tokenization because they give flexible, fine-grained control over pattern matching. However, hand-written patterns become hard to maintain as edge cases accumulate. Libraries such as NLTK provide ready-made tokenizers that already handle common edge cases like punctuation and contractions, which usually makes them the more reliable choice for general-purpose preprocessing.
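
A small regular-expression tokenizer shows both the control and the maintenance burden. The pattern and function name below are one possible choice, not a recommended standard.

```python
import re

# A word (optionally joined by internal hyphens or apostrophes),
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Don't panic: it's a state-of-the-art model."))
print(tokenize("It costs $3.50"))
```

The first call keeps contractions and hyphenated words intact, but the second splits "$3.50" into four tokens. Each such case needs another alternative in the pattern, which is where maintenance costs come from.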

IR-based question answering systems retrieve candidate passages from a large document collection using keyword matching and ranking, then extract the answer from them; they are limited by what has been indexed. Knowledge-based systems instead query structured databases or ontologies, which supports reasoning and inference and yields precise answers for facts the knowledge base covers.
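
The two approaches can be contrasted in a toy sketch. The passages, the knowledge base and the function names are invented for illustration. The retrieval step uses plain word overlap, whereas real systems use stronger ranking functions and an answer-extraction stage.

```python
import re

# Hypothetical mini-corpus and knowledge base.
PASSAGES = [
    "Paris is the capital of France.",
    "Tokyo is the capital of Japan.",
    "The Nile is a river in Africa.",
]
KNOWLEDGE_BASE = {
    ("France", "capital"): "Paris",
    ("Japan", "capital"): "Tokyo",
}

def words(text):
    """Lowercased set of word tokens in text."""
    return set(re.findall(r"\w+", text.lower()))

def ir_answer(question, passages=PASSAGES):
    """IR-style: return the passage sharing the most words with the question."""
    q = words(question)
    return max(passages, key=lambda p: len(q & words(p)))

def kb_answer(entity, relation, kb=KNOWLEDGE_BASE):
    """Knowledge-based: look up an exact (entity, relation) fact, or None."""
    return kb.get((entity, relation))

print(ir_answer("What is the capital of Japan?"))
print(kb_answer("France", "capital"))
print(kb_answer("Africa", "capital"))
```

The retrieval function always returns some passage, relevant or not, while the knowledge-base lookup returns an exact answer or nothing at all.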

In ASR, feature extraction transforms the raw audio signal into a compact sequence of frame-level features, such as MFCCs, that summarize its spectral content over time. In an HMM-DNN system, the Hidden Markov Model (HMM) models the temporal structure of speech, and the Deep Neural Network (DNN) estimates how likely each HMM state is given the acoustic features of a frame. Combining the two improves recognition accuracy, particularly under varying speakers and recording conditions.
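
Recognition accuracy for an ASR system is commonly reported as word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch using word-level edit distance. The function name is illustrative, and any text normalization before scoring is left out.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Here one of six reference words is missing from the hypothesis, giving a WER of 1/6. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.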
