Computer-Assisted Text Analysis: An Overview and Guide
Laura K. Nelson
Kellogg School of Management
Content Analysis PWD
Academy of Management Annual Conference
August 7, 2015
Vancouver, BC
Goals
Describe the wide range of
Overview:
Types of Automated Text Analysis
Unsupervised exploration (hypothesis forming/inductive)
Topic modeling
Lexical selection
inductive/hypothesis testing)
Part-of-Speech Tagging
Named Entity Recognition
Concordances
Sentiment analysis
Question 1:
Do you want to inductively explore the text?
Unsupervised Exploration: The Goal
Informative groups of words
Set-Up: Document-Term Matrix*
              ambit poverti  people    full
Document1         4       2       0       0
Document2         1       3       7       0
Document3         2       0       0       1
Document4         9       1       2       4
*Cells can be word frequencies or weighted word scores.
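A document-term matrix like the one above can be built with nothing more than a tokenizer and word counts. The sketch below assumes a simple lowercase, letters-only tokenizer and uses tf-idf as one example of a "weighted word score"; the slide does not say which weighting was used. The column labels on the slide (ambit, poverti) look like stemmed forms, and this sketch does not stem.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs, weighting="count"):
    """Return (vocab, rows): one row per document, one column per vocabulary word.

    weighting="count" gives raw frequencies; weighting="tfidf" multiplies each
    count by log(N / document frequency), one common weighting scheme.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Number of documents each word appears in.
    df = {w: sum(1 for c in counts if w in c) for w in vocab}
    rows = []
    for c in counts:
        if weighting == "tfidf":
            rows.append([c[w] * math.log(n_docs / df[w]) for w in vocab])
        else:
            rows.append([c[w] for w in vocab])
    return vocab, rows


vocab, rows = document_term_matrix(["Poverty people poverty", "ambition people"])
```

With tf-idf, a word that appears in every document (here, "people") gets a weight of zero, which is why weighted scores often surface more distinctive vocabulary than raw counts.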
important?
Yes? Structural Topic Modeling (STM)
Are the topics correlated?
Yes? Correlated Topic Modeling (CTM)
Order is relatively arbitrary, topics may not be related?
Latent Dirichlet Allocation (LDA)
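To make LDA less of a black box, here is a minimal sketch of collapsed Gibbs sampling, one common way to fit it. Every token is repeatedly reassigned to a topic with probability proportional to how much its document already uses that topic times how strongly that topic already favors the word. The hyperparameter values are illustrative assumptions. In practice one would use an established implementation (for example the R packages topicmodels or stm), not this toy version.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, top_n=5, seed=0):
    """Fit LDA to a list of token lists with collapsed Gibbs sampling.

    Returns (theta, top_words): per-document topic proportions and the
    top_n highest-count words for each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(n_topics)]  # times word v is assigned to topic k
    nk = [0] * n_topics                       # total tokens assigned to topic k
    z = []                                    # current topic of every token

    # Start from random topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(ndk, docs)
    ]
    top_words = [
        [vocab[v] for v in sorted(range(V), key=lambda v: -nkw[k][v])[:top_n]]
        for k in topics
    ]
    return theta, top_words
```

Calling `lda_gibbs(list_of_token_lists, n_topics=2)` returns one topic-proportion vector per document and a short word list per topic; the analyst still has to read and label those word lists.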
If style
Difference of Proportions: Chicago
Word        DoP
chicago     5.31
children    4.59
center      4.34
union       3.61
school      3.48
abort       3.19
nixon       2.93
day         2.86
vietnam     2.57
people      2.50
city        2.44
hospital    2.38
cwlu        2.37
(Figure axis labels: Abstract, Concrete)
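Difference of proportions ranks words by how much more often they occur, as a share of all tokens, in one set of documents than in a comparison set. The sketch below assumes the score is (target proportion minus comparison proportion) times a scale factor; the scaling behind the DoP values above is not stated on the slide, so `scale=100` (percentage points) is an assumption.

```python
from collections import Counter


def difference_of_proportions(target_tokens, comparison_tokens, top_n=10, scale=100):
    """Rank words by scale * (share of target tokens - share of comparison tokens)."""
    t = Counter(target_tokens)
    c = Counter(comparison_tokens)
    n_t, n_c = sum(t.values()), sum(c.values())
    vocab = set(t) | set(c)
    scores = {w: scale * (t[w] / n_t - c[w] / n_c) for w in vocab}
    # Highest positive scores are the words most over-represented in the target set;
    # ties are broken alphabetically so the output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


top = difference_of_proportions(["a", "a", "b"], ["b", "c"])
```

Because the score uses proportions, corpora of different sizes can be compared directly; very rare words can still produce noisy rankings, so the top of the list is worth checking by reading documents.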
Question 1: Do you want to test a hypothesis?
If yes:
Question 2: Themes or styles?
If themes: Which algorithm?
You want individual documents coded: Document Classification (e.g., SVM, Naive Bayes)
You want the proportion of documents in each category: ReadMe (R package)
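For coding individual documents, a multinomial Naive Bayes classifier is about the simplest supervised option: it learns word frequencies per category from hand-coded documents and assigns a new document to the category under which its words are most probable. This is a minimal sketch with add-one (Laplace) smoothing; the training examples and labels are hypothetical, and real projects would use a tested library and held-out validation.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """docs: list of token lists; labels: one hand-coded category per document."""
    class_docs = Counter(labels)          # documents per category (for priors)
    word_counts = defaultdict(Counter)    # word frequencies per category
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)
    vocab = {w for tokens in docs for w in tokens}
    return class_docs, word_counts, vocab, alpha


def predict_nb(model, tokens):
    """Return the category with the highest log posterior for this document."""
    class_docs, word_counts, vocab, alpha = model
    n_docs = sum(class_docs.values())
    best, best_lp = None, -math.inf
    for y, n_y in class_docs.items():
        total = sum(word_counts[y].values())
        lp = math.log(n_y / n_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                lp += math.log((word_counts[y][w] + alpha) / (total + alpha * len(vocab)))
        if lp > best_lp:
            best, best_lp = y, lp
    return best


model = train_nb(
    [["protest", "march", "rally"], ["donate", "vote", "elect"],
     ["rally", "chant"], ["vote", "campaign"]],
    ["protest", "political", "protest", "political"],
)
```

Here `predict_nb(model, ["chant", "rally"])` favors "protest" because both words were seen only in protest-coded documents.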
Dictionary Methods
Standardized Dictionaries
LIWC (can be used for sentiment analysis)
Custom Dictionary
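A custom dictionary method can be sketched as computing, for each category, the percentage of a document's tokens that appear in that category's word list. The category names and word lists below are hypothetical, and LIWC's own dictionaries and matching rules are not reproduced here.

```python
import re

# Hypothetical custom dictionary: category -> set of words.
EXAMPLE_DICTIONARY = {
    "positive": {"good", "happy", "win"},
    "negative": {"bad", "sad", "lose"},
}


def dictionary_rates(text, dictionary):
    """Percentage of tokens in `text` that match each category's word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    return {
        category: (100 * sum(t in words for t in tokens) / n if n else 0.0)
        for category, words in dictionary.items()
    }


rates = dictionary_rates("good good bad day", EXAMPLE_DICTIONARY)
```

Reporting rates rather than raw counts keeps long and short documents comparable; the main design risk is the word lists themselves, which should be checked against hand-coded examples.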
If styles
of words, relationships between words, grammatical structures, etc.
NLP: Concordances
ong the former , one was of a most monstrous
ON OF THE PSALMS . " Touching that monstrous
ll over with a heathenish array of monstrous
d as you gazed , and wondered what monstrous
that has survived the flood ; most monstrous
they might scout at Moby Dick as a monstrous
th of Radney .'" CHAPTER 55 Of the monstrous
ing Scenes . In connexion with the monstrous
ere to enter upon those still more monstrous
ght have been rummaged out of this monstrous
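A concordance (keyword-in-context view) lists every occurrence of a word with a fixed-width slice of the text on either side, which is what produces the aligned column above. NLTK provides this as `nltk.text.Text.concordance`; the standalone sketch below shows the idea, with the `width` and `window` parameters chosen for illustration.

```python
def concordance(tokens, keyword, width=30, window=10):
    """Return keyword-in-context lines for every case-insensitive match.

    width: characters of context kept on each side.
    window: tokens joined on each side before trimming to `width`.
    """
    kw = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == kw:
            left = " ".join(tokens[max(0, i - window):i])[-width:]
            right = " ".join(tokens[i + 1:i + 1 + window])[:width]
            # Right-align the left context so the keyword forms a column.
            lines.append(f"{left:>{width}} {tok} {right}")
    return lines


lines = concordance("the whale was a monstrous beast of the sea".split(), "monstrous", width=10)
```

Trimming by characters rather than tokens is what cuts words mid-way ("ong the former"), as in the Moby Dick lines above; it keeps every keyword in the same column.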
year)
Used the R package stm (Roberts, Stewart, and Tingley)
Grouped the 40 topics further into 7 topic categories
Used Python's NLTK to extract verbs and verb phrases
Hand-identified tactics and created dictionaries of tactical categories
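Verb extraction of this kind works on part-of-speech-tagged text. NLTK's `nltk.pos_tag` returns (word, tag) pairs using Penn Treebank tags, where verb tags begin with "VB" and particles are tagged "RP". The sketch below assumes that tagged input (so it runs without downloading tagger models) and treats a verb plus a following particle ("set up") as a verb phrase; that phrase rule is a simplifying assumption, not necessarily the rule used in the study.

```python
def extract_verbs(tagged, include_particles=True):
    """Pull verbs (and verb + particle phrases) from (word, Penn Treebank tag) pairs."""
    phrases = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag.startswith("VB"):  # VB, VBD, VBG, VBN, VBP, VBZ
            phrase = [word.lower()]
            # Attach an immediately following particle, e.g. "set" + "up".
            if include_particles and i + 1 < len(tagged) and tagged[i + 1][1] == "RP":
                phrase.append(tagged[i + 1][0].lower())
                i += 1
            phrases.append(" ".join(phrase))
        i += 1
    return phrases


tagged = [("Activists", "NNS"), ("blocked", "VBD"), ("the", "DT"), ("road", "NN"),
          ("and", "CC"), ("set", "VBD"), ("up", "RP"), ("camp", "NN")]
verbs = extract_verbs(tagged)
```

The output keeps inflected forms ("blocked"), so a lemmatizing step would typically come before matching verbs against tactic dictionaries.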
Tactics by Year
Conclusion:
Research design is key! Good data is critical!
Match your method to your question and data. Be purposeful, not trendy.
Use multiple methods, including qualitative ones, to verify the analysis.
Learn a programming language. Off-the-shelf tools box you in (see point 2).
Happy text analyzing!
Laura K. Nelson
laura.nelson@kellogg.northwestern.edu
Tactical Categories*
Direct Environmental Protection: build, improve, protect, recycle
Non-Disruptive Protest: chant, demonstrate, organize, petition, protest
Disruptive Protest: blockade, chain, prevent, damage, sabotage
Political: campaign, donate, elect, endorse, regulate
Juridical: audit, enforce, inspect, represent, testify
Verbal Statements: advocate, comment, criticize, explain, refute
Business: boycott, buy, invest, purchase, sponsor
Education/Raising Awareness: editorial, outreach, publish, report, tweet
Organization/Movement Building: fund-raise, initiate, launch, participate
Negotiations: deal, discuss, engage, listen, persuade
*Categories are not mutually exclusive
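Once verbs are extracted, counts like "tactics by year" can be produced by matching each verb against every category's word list. Because the categories are not mutually exclusive, a verb is counted in every category that contains it. The sketch below uses only a subset of the word lists above and assumes the verbs have already been lemmatized (e.g., "blocked" reduced to "block"); the records in the usage line are hypothetical.

```python
from collections import Counter, defaultdict

# Subset of the tactical categories listed above (not the full dictionaries).
TACTICS = {
    "Non-Disruptive Protest": {"chant", "demonstrate", "organize", "petition", "protest"},
    "Disruptive Protest": {"blockade", "chain", "prevent", "damage", "sabotage"},
    "Political": {"campaign", "donate", "elect", "endorse", "regulate"},
    "Business": {"boycott", "buy", "invest", "purchase", "sponsor"},
}


def tactics_by_year(records, tactics=TACTICS):
    """records: iterable of (year, list of lemmatized verbs).

    Returns {year: Counter(category -> count)}. A verb that appears in several
    categories is counted once in each, since categories may overlap.
    """
    table = defaultdict(Counter)
    for year, verbs in records:
        for verb in verbs:
            for category, words in tactics.items():
                if verb in words:
                    table[year][category] += 1
    return table


table = tactics_by_year([(1990, ["protest", "boycott", "chant"]), (1991, ["donate", "sabotage"])])
```

Counts per year are usually normalized (for example by the number of documents per year) before plotting, so that changes in coverage are not mistaken for changes in tactics.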