Applied Text Analysis Overview

This document provides an overview of text analytics:
• It combines techniques from NLP, machine learning, and information retrieval to perform tasks like document categorization, text clustering, information extraction, and association discovery.
• These tasks are used in applications across many domains to discover patterns and insights in large collections of text-based documents.
• Key challenges include linguistic ambiguity, high dimensionality of text data, and lack of explicit structure in unstructured text data. Text analytics aims to address these challenges to enable automated analysis of natural language.

Uploaded by

Таня Брода

Text analysis. Text mining. Text analytics.

Applied text analysis 1

Structured and non-structured data
• Structured data: data that has been organized in repositories such as
a database, so that its elements can be accessed by effective analysis
and processing methods (e.g., an SQL table).
• Non-structured data: data that has no predefined structure or model,
or that is not organized in a predefined way, which makes it hard to
understand using traditional computational methods (e.g., news and
customer complaints).
Search and discovery

A search task is goal-oriented, which means that you must provide a clear criterion to receive the results that you need (i.e., a
condition that must be met by the data attributes).
A discovery task is opportunistic by nature: you don’t know in advance what you are looking for, so hypotheses about the data are
explored automatically to discover new opportunities in the form of hidden (or latent) patterns, which can be interesting
and novel.
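
The contrast between the two tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus, the stopword set, and the helper names `search` and `discover_pairs` are invented here and do not come from the slides. Search filters documents by an explicit criterion, while discovery counts word pairs that co-occur without any query being given.

```python
from collections import Counter
from itertools import combinations

# Toy corpus; the documents are invented for illustration.
docs = [
    "late delivery and damaged package",
    "refund requested after late delivery",
    "damaged package and refund requested",
    "friendly support and fast refund",
]

def search(documents, keyword):
    """Search: return the documents that satisfy an explicit criterion."""
    return [d for d in documents if keyword in d.split()]

def discover_pairs(documents, min_count=2):
    """Discovery: count word pairs that co-occur in a document, with no query."""
    stopwords = {"and", "after"}  # assumed stopword list, kept tiny on purpose
    counts = Counter()
    for d in documents:
        words = sorted(set(d.split()) - stopwords)
        counts.update(combinations(words, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

hits = search(docs, "refund")      # goal-oriented: the criterion is given
patterns = discover_pairs(docs)    # opportunistic: frequent pairs emerge from the data
```

In this sketch the caller of `search` must already know that "refund" matters, whereas `discover_pairs` surfaces recurring combinations that nobody asked for.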
Text mining and text analytics
• Text mining and text analytics are highly interchangeable terms.
• Text mining is the automated process of examining large collections of documents or corpora
to discover patterns or insights that may be interesting and useful (Ignatow & Mihalcea, 2017;
Struhl, 2015; Zhai & Massung, 2016). For this, text mining identifies facts, relationships, and
patterns that would otherwise be buried in textual data (Atkinson & Pérez, 2013). This
information can be converted to a structured form that can be later analyzed and integrated
with other types of systems (e.g., business intelligence, databases, and data warehouses).
• Text analytics, on the other hand, synthesizes the results of text mining so that they can be
quantified and visualized in a way that supports decision-making, producing actionable
insights. In this sense, text analytics encompasses broader aspects than text mining.
• The applications of text analytics in industrial and business areas are many, including
• document clustering,
• text categorization,
• information extraction to populate databases,
• text generation,
• association discovery, etc.
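
As a rough illustration of how information extraction can turn free text into a structured form that a database could store, the sketch below pulls a date and an order identifier out of short complaints with regular expressions. The complaint texts, the date and order-ID formats, and the function name `extract` are assumptions made for this example only. Real extraction systems usually need far more robust methods.

```python
import re

# Invented complaint texts; the formats are assumptions for this sketch.
complaints = [
    "On 2024-03-05 order A1042 arrived damaged.",
    "Order B7 was never delivered (reported 2024-03-09).",
]

DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")                # ISO-style date
ORDER = re.compile(r"\border\s+([A-Z]\d+)\b", re.IGNORECASE)  # e.g. "order A1042"

def extract(text):
    """Map one unstructured complaint to a structured record (a dict)."""
    date = DATE.search(text)
    order = ORDER.search(text)
    return {
        "date": date.group(1) if date else None,
        "order_id": order.group(1) if order else None,
    }

# Each record now has named fields and could be inserted into a table.
records = [extract(c) for c in complaints]
```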
Linguistic problems
• Since the goal is to automatically analyze textual information sources
that are written in natural language by humans, computational
methods (Jurafsky et al., 2014) must be able to address three key
linguistic problems:
• Ambiguity: lexical (i.e., a single word with more than one meaning), syntactic
(i.e., a single sentence that has several possible grammatical structures),
semantic (i.e., a sentence with several possible interpretations), and
pragmatic (i.e., a sentence with several possible contexts to determine its
intention).
• Dimensionality
• Linguistic Knowledge
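
The dimensionality problem, and its interaction with lexical ambiguity, can be made concrete with a bag-of-words representation. The sketch below is a minimal illustration on an invented three-sentence corpus, and the names `vocabulary`, `to_vector`, and `zero_share` are hypothetical. Every distinct word becomes one dimension, so most entries of each vector are zero, and the two senses of "bank" end up in the same dimension.

```python
# Invented corpus; "bank" is used in two different senses.
docs = [
    "the bank approved the loan",
    "we sat on the river bank",
    "the loan was repaid early",
]

# One dimension per distinct word in the corpus.
vocabulary = sorted({w for d in docs for w in d.split()})

def to_vector(text):
    """Bag-of-words vector: how often each vocabulary term occurs in the text."""
    words = text.split()
    return [words.count(term) for term in vocabulary]

vectors = [to_vector(d) for d in docs]
dimensions = len(vocabulary)

# Share of zero entries across all vectors (a simple measure of sparsity).
zero_share = sum(v.count(0) for v in vectors) / (len(vectors) * dimensions)

# Both senses of "bank" are counted in the same column.
bank_column = [v[vocabulary.index("bank")] for v in vectors]
```

Even with three short sentences the vectors already have eleven dimensions. On a real corpus the vocabulary, and therefore the number of dimensions, grows to many thousands of terms.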
Text mining areas
• Natural-Language Processing (NLP) provides theories, models, and
methods so that a computer can understand natural language (written
or spoken) at different linguistic levels (i.e., phonetic, morphological,
lexical, syntactic, semantic, discursive, and pragmatic). In practice,
NLP techniques focus on creating systems that process textual
information in order to make it accessible to other computer
applications.
• Information retrieval (IR) obtains information from specific
unstructured data sources (e.g., texts, images, and videos). Items are
analyzed according to some measure of relevance (i.e., importance)
with respect to a given input query, so that they can be made
available to other tasks and applications. In IR methods and models,
NLP plays a fundamental role in characterizing and “understanding”
some elements of the retrieved information (e.g., documents).
• Machine learning (ML) is an area of AI that provides computational
techniques allowing a computer to learn how to perform a task based
on experience (Wilmott, 2020; Mohri et al., 2018). An ML system thus
improves its performance with experience, without the need to write
explicit rules or models. Models created automatically by ML are
capable of generalizing to unseen cases, which improves performance
on certain tasks.
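
To illustrate what a "measure of relevance with respect to an input query" can look like in IR, here is a minimal sketch that ranks documents by a TF-IDF style score. The corpus, the document IDs, and the helper names `tokenize`, `idf`, `score`, and `rank` are assumptions for this example. The smoothed IDF formula is just one common variant, not something the slides prescribe.

```python
import math
from collections import Counter

# Invented mini-corpus keyed by hypothetical document IDs.
docs = {
    "d1": "text mining discovers patterns in documents",
    "d2": "information retrieval ranks documents for a query",
    "d3": "machine learning models learn from experience",
}

def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stopword removal)."""
    return text.lower().split()

tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}

def idf(term):
    """Smoothed inverse document frequency: rarer terms weigh more."""
    df = sum(1 for words in tokens.values() if term in words)
    return math.log((1 + len(tokens)) / (1 + df)) + 1.0

def score(query, doc_id):
    """Relevance of one document: sum of term frequency times IDF."""
    counts = Counter(tokens[doc_id])
    return sum(counts[t] * idf(t) for t in tokenize(query))

def rank(query):
    """Return document IDs ordered from most to least relevant."""
    return sorted(tokens, key=lambda d: score(query, d), reverse=True)

ranking = rank("retrieval of documents")
```

A document that contains the rare term "retrieval" outranks one that only shares the more common term "documents", which is the intuition behind weighting by IDF.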
Tasks and applications
• Text Clustering
• Information extraction
• Text Categorization
• Relationship Inference
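
As one possible illustration of text categorization learned from examples rather than hand-written rules, the sketch below trains a small multinomial Naive Bayes classifier with add-one smoothing. Naive Bayes is chosen here only because it is compact. The slides do not name a specific algorithm, and the training texts, labels, and function names `fit` and `predict` are invented for this example.

```python
import math
from collections import Counter, defaultdict

# Invented labeled examples (the "experience" the model learns from).
train = [
    ("great service and friendly staff", "positive"),
    ("fast delivery great product", "positive"),
    ("terrible service and rude staff", "negative"),
    ("late delivery broken product", "negative"),
]

def fit(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for w in text.split():
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit(train)
label_a = predict(model, "friendly staff and great product")
label_b = predict(model, "rude staff late delivery")
```

No rule such as "rude means negative" is written anywhere. The association comes from the counts in the training examples, and more examples would normally improve the predictions.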
The text analytics process
SUMMARY
• Text analytics is the science of examining large collections of
documents (corpora) written in natural language to discover
interesting and, ideally, actionable patterns.
• To make this possible, text analytics combines techniques and models
from NLP, ML, and IR.
• This combination allows performing tasks such as document
categorization, text clustering, specific information extraction from
documents, association discovery, topic detection, etc.
• These tasks are the basis for cross-cutting applications in all domains,
both in the private and public spheres, from scientific applications to
industrial and business applications.
QUESTIONS
• 1. What are the main differences between a text analytics application and an NLP application?
• 2. What difficulties does linguistic ambiguity cause in a text analytics or text mining task?
• 3. List two differences between analyzing informal texts on social media and analyzing formal news
texts.
• 4. How does the dimensionality of documents affect the performance of a text analytics task?
• 5. Describe two problems that can arise when analyzing documents using only lexical analysis, that is,
at the word level.
• 6. Consider two applications that handle textual information: one allows hotel reservations to be
made through natural language, and the other detects the names of personalities in news texts.
Which of them requires NLP tools, and which requires a textual analysis method?
• 7. How can an ML method help a text analytics task?
• 8. You must automatically group all the news that reaches your email and then store it in specific folders.
What analysis methodology would you use, a clustering method or a categorization method?
• 9. What’s the fundamental difference between an IR task and an information extraction task?
• 10. What type of features could be selected as input to a text mining or text analytics task?
• 11. In the text mining process, the evaluation of patterns discovered by some analysis task is essential. In what ways could
these patterns be evaluated in order to generate insights?
• 12. Describe two types of patterns that can be discovered in text analytic tasks.
• 13. In a textual analysis task, such as sentiment classification on social networks, you could simply use keywords as
features, so that an automatic classifier can determine whether an opinion expresses positive or negative sentiment.
What problem will we encounter if such an application uses only this type of feature?
• 14. State which of the following applications use NLP models and which use text analytics approaches:
• Simple document search engine
• Keyword-based sentiment classifier
• Rules-based spam filter
• Virtual tutor that helps a child understand math
• Assess quality of texts written by job applicants
• Classify medical diagnoses
• 15. You know there is an important link between an organization X and a person Y in a set of news articles. What text
analytics approach would you take to determine the specific link that exists?
• 16. A service company receives many written complaints (no more than 2–3 paragraphs each) through its web portal. What kind
of analysis method would you use to generate statistics about the clients’ complaints for a given date?
• 17. You have a large database of invention patent descriptions and want to determine whether a new patent “application”
that comes to you is similar to one that you already have in your database. What text analytics and/or NLP approach would
you use to address this problem?
References
• Atkinson-Abutridy, J. (2022). Text Analytics: An Introduction to the
Science and Applications of Unstructured Information Analysis.
Chapman & Hall.
• Text Mining and Analytics. https://www.youtube.com/watch?v=Uqs0GewlMkQ&list=PLLssT5z_DsK8Xwnh_0bjN4KNT81bekvtt
