TextRank: Bringing Order into Texts
Rada Mihalcea and Paul Tarau
University of North Texas
{rada,tarau}@cs.unt.edu
These algorithms provide a Web page ranking mechanism that relies on the collective knowledge of Web architects rather than individual content analysis of Web pages. In short, a graph-based ranking algorithm is a way of deciding the importance of a vertex within a graph, by taking into account global information recursively computed from the entire graph, rather than relying only on local vertex-specific information.

Applying a similar line of thinking to lexical or semantic graphs extracted from natural language documents results in a graph-based ranking model that can be applied to a variety of natural language processing applications, where knowledge drawn from an entire text is used in making local ranking/selection decisions. Such text-oriented ranking methods can be applied to tasks ranging from automated extraction of keyphrases, to extractive summarization and word sense disambiguation (Mihalcea et al., 2004).

In this paper, we introduce the TextRank graph-based ranking model for graphs extracted from natural language texts. We investigate and evaluate the application of TextRank to two language processing tasks consisting of unsupervised keyword and sentence extraction.

2 The TextRank Model

The basic idea implemented by a graph-based ranking model is that of voting: when one vertex links to another, it casts a vote for that vertex, and the score associated with a vertex is determined based on the votes that are cast for it, along with the scores of the vertices casting these votes.

Formally, let G = (V, E) be a directed graph with the set of vertices V and set of edges E. For a given vertex V_i, let In(V_i) be the set of vertices that point to it (predecessors), and let Out(V_i) be the set of vertices that vertex V_i points to (successors). The score of a vertex V_i is defined as follows (Brin and Page, 1998):

    S(V_i) = (1 - d) + d * \sum_{V_j \in In(V_i)} \frac{1}{|Out(V_j)|} S(V_j)

where d is a damping factor that can be set between 0 and 1, and has the role of integrating into the model the probability of jumping from a given vertex to another random vertex in the graph. In the context of Web surfing, this graph-based ranking algorithm implements the "random surfer model", where a user clicks on links at random with probability d, and jumps to a completely new page with probability 1 - d. The factor d is usually set to 0.85 (Brin and Page, 1998), and this is the value we are also using in our implementation.
Starting from arbitrary values assigned to each node in the graph, the computation iterates until convergence below a given threshold is achieved. After running the algorithm, a score is associated with each vertex, which represents the "importance" of the vertex within the graph. Notice that the final values obtained after TextRank runs to completion are not affected by the choice of initial values; only the number of iterations needed to reach convergence may differ.

[Figure: Convergence curves (250 vertices, 250 edges) for undirected/unweighted, undirected/weighted, directed/unweighted, and directed/weighted graphs; the y-axis shows the error rate.]
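To make the iteration concrete, the computation can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the dictionary-based graph representation, the 10^-4 threshold, and the function name are our own choices; only the scoring formula and the damping factor of 0.85 come from the text above.

def rank_vertices(successors, d=0.85, tol=1.0e-4, max_iter=100):
    # Collect every vertex, including "sinks" that never link out.
    vertices = set(successors)
    for outs in successors.values():
        vertices.update(outs)
    # Invert the successor lists to obtain In(V_i) for each vertex.
    predecessors = {v: [] for v in vertices}
    for v, outs in successors.items():
        for w in outs:
            predecessors[w].append(v)
    scores = {v: 1.0 for v in vertices}  # arbitrary starting values
    for _ in range(max_iter):
        # S(V_i) = (1 - d) + d * sum over V_j in In(V_i) of S(V_j) / |Out(V_j)|
        new = {v: (1.0 - d) + d * sum(scores[u] / len(successors[u])
                                      for u in predecessors[v])
               for v in vertices}
        # Stop once the largest per-vertex change drops below the threshold.
        if max(abs(new[v] - scores[v]) for v in vertices) < tol:
            return new
        scores = new
    return scores

# Tiny example: A -> B, A -> C, B -> C, C -> A.
print(rank_vertices({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))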
[Table 1: Results for automatic keyword extraction using TextRank or supervised learning (Hulth, 2003); the table body was not preserved in this extract.]
Since our approach is completely unsupervised, we are only using the test documents for evaluation purposes.

The results are evaluated using precision, recall, and F-measure. Notice that the maximum recall that can be achieved on this collection is less than 100%, since the indexers were not limited to keyword extraction – as our system is – but were also allowed to perform keyword generation, which eventually results in keywords that do not explicitly appear in the text.
For comparison purposes, we are using the results of the state-of-the-art keyword extraction system reported in (Hulth, 2003). Briefly, her system consists of a supervised learning scheme that attempts to learn how to best extract keywords from a document, by looking at a set of four features that are determined for each "candidate" keyword: (1) within-document frequency, (2) collection frequency, (3) relative position of the first occurrence, and (4) sequence of part-of-speech tags. These features are extracted from both training and test data for all "candidate" keywords, where a candidate keyword can be: Ngrams (unigrams, bigrams, or trigrams extracted from the abstracts), NP-chunks (noun phrases), or patterns (a set of part-of-speech patterns detected from the keywords attached to the training abstracts). The learning system is a rule induction system with bagging.
Our system consists of the TextRank approach described in Section 3.1, with a co-occurrence window size set to two, three, five, or ten words. Table 1 lists the results obtained with TextRank, and the best results reported in (Hulth, 2003). For each method, the table lists the total number of keywords assigned, the mean number of keywords per abstract, the total number of correct keywords, as evaluated against the set of keywords assigned by professional indexers, and the mean number of correct keywords. The table also lists precision, recall, and F-measure.
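For illustration, the graph construction step can be sketched as follows. The sketch is ours, not the original implementation: tokenization and the syntactic filtering (e.g. keeping nouns and adjectives only) are assumed to have been done beforehand, and the names are hypothetical. Words that pass the filter become vertices, and an undirected edge links two words that co-occur within the chosen window.

def build_keyword_graph(tokens, candidates, window=2):
    # One vertex per candidate word (a word that passed the
    # syntactic filter, e.g. nouns and adjectives only).
    graph = {w: set() for w in candidates}
    for i, w in enumerate(tokens):
        if w not in graph:
            continue
        # Look ahead at most window - 1 tokens; window=2 links neighbors.
        for v in tokens[i + 1 : i + window]:
            if v in graph and v != w:
                graph[w].add(v)   # undirected edge, stored in
                graph[v].add(w)   # both adjacency sets
    return graph

The resulting adjacency sets can be fed directly to the rank_vertices sketch above (using the same sets for predecessors and successors, since the graph is undirected), after which the top-scoring vertices are collected as keywords.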
Discussion. TextRank achieves the highest precision and F-measure across all systems, although the recall is not as high as in supervised methods – possibly due to the limitation imposed by our approach on the number of keywords selected, a limitation that the supervised system does not share.[3] A larger window does not seem to help – on the contrary, the larger the window, the lower the precision, probably explained by the fact that a relation between words that are further apart is not strong enough to define a connection in the text graph.

[3] The fact that the supervised system does not have the capability to set a cutoff threshold on the number of keywords, but only makes a binary decision on each candidate word, has the downside of not allowing for a precision-recall curve, which prohibits a comparison of such curves for the two methods.

Experiments were performed with various syntactic filters, including: all open class words, nouns and adjectives, and nouns only; the best performance was achieved with the filter that selects nouns and adjectives only. We have also experimented with a setting where no part-of-speech information was added to the text, and all words – except a predefined list of stopwords – were added to the graph. The results with this setting were significantly lower than those obtained with part-of-speech filtering, which corroborates previous observations that linguistic information helps the process of keyword extraction (Hulth, 2003).

Experiments were also performed with directed graphs, where a direction was set following the natural flow of the text: one candidate keyword "recommends" (and therefore has a directed arc to) the candidate keyword that follows it in the text, keeping the restraint imposed by the co-occurrence relation. We have also tried the reversed direction, where a lexical unit points to a previous token in the text. Table 1 includes the results obtained with directed graphs for a co-occurrence window of 2. Regardless of the direction chosen for the arcs, results obtained with directed graphs are worse than results obtained with undirected graphs, which suggests that despite a natural flow in running text, there is no natural "direction" that can be established between co-occurring words.
Overall, our TextRank system leads to an F-measure higher than any of the previously proposed systems. Notice that TextRank is completely unsupervised and, unlike supervised systems, relies exclusively on information drawn from the text itself, which makes it easily portable to other text collections, domains, and languages.

4 Sentence Extraction

The other TextRank application that we investigate consists of sentence extraction for automatic summarization. In a way, the problem of sentence extraction can be regarded as similar to keyword extraction, since both applications aim at identifying sequences that are more "representative" for the given text. In keyword extraction, the candidate text units consist of words or phrases, whereas in sentence extraction we deal with entire sentences. TextRank turns out to be well suited for this type of application, since it allows for a ranking over text units that is recursively computed based on information drawn from the entire text.

4.1 TextRank for Sentence Extraction

To apply TextRank, we first need to build a graph associated with the text, where the graph vertices are representative of the units to be ranked. For the task of sentence extraction, the goal is to rank entire sentences, and therefore a vertex is added to the graph for each sentence in the text.

The co-occurrence relation used for keyword extraction cannot be applied here, since the text units under consideration are significantly larger than one or a few words, and "co-occurrence" is not a meaningful relation for such large contexts. Instead, we define a different relation, which establishes a connection between two sentences if there is a "similarity" relation between them, where "similarity" is measured as a function of their content overlap. Such a relation between two sentences can be seen as a process of "recommendation": a sentence that addresses certain concepts in a text gives the reader a "recommendation" to refer to other sentences in the text that address the same concepts, and therefore a link can be drawn between any two such sentences that share common content.

The overlap of two sentences can be determined simply as the number of common tokens between the lexical representations of the two sentences, normalized by sentence length so that long sentences are not artificially promoted.
Formally, given two sentences S_i and S_j, with a sentence being represented by the set of N_i words that appear in it, S_i = {w_1^i, w_2^i, ..., w_{N_i}^i}, the similarity of S_i and S_j is defined as:

    Similarity(S_i, S_j) = \frac{|\{ w_k \mid w_k \in S_i \wedge w_k \in S_j \}|}{\log(|S_i|) + \log(|S_j|)}
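This measure translates directly into code. The sketch below is ours: the representation of sentences as token lists and the guard against degenerate one-word sentences are our own choices.

import math

def sentence_similarity(s1, s2):
    # Count the word types shared by the two sentences...
    overlap = len(set(s1) & set(s2))
    # ...and normalize by log(|S_i|) + log(|S_j|) so that long
    # sentences are not automatically favored.
    denom = math.log(len(s1)) + math.log(len(s2))
    return overlap / denom if denom > 0 else 0.0

print(sentence_similarity("the storm hit the coast".split(),
                          "the storm was approaching the coast".split()))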
Other sentence similarity measures, such as string kernels, cosine similarity, longest common subsequence, etc. are also possible, and we are currently evaluating their impact on the summarization performance.

The resulting graph is highly connected, with a weight associated with each edge, indicating the strength of the connections established between various sentence pairs in the text. The text is therefore represented as a weighted graph, and consequently we are using the weighted graph-based ranking formula introduced in Section 2.2.
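Section 2.2 itself is not reproduced in this extract; in essence, the weighted variant replaces the uniform 1/|Out(V_j)| share of a vertex's vote with a share proportional to the edge weight. A sketch under that assumption, with the similarity values above used as weights:

def rank_weighted(neighbors, d=0.85, tol=1.0e-4, max_iter=100):
    # neighbors maps a sentence index to {other_index: similarity weight};
    # the graph is undirected, so In and Out coincide.
    totals = {v: sum(ws.values()) for v, ws in neighbors.items()}
    scores = {v: 1.0 for v in neighbors}
    for _ in range(max_iter):
        new = {}
        for v, ws in neighbors.items():
            # Each neighbor u passes on a share of its score proportional
            # to the weight of the edge (u, v).
            incoming = sum(scores[u] * w / totals[u]
                           for u, w in ws.items() if totals[u] > 0)
            new[v] = (1.0 - d) + d * incoming
        if max(abs(new[v] - scores[v]) for v in new) < tol:
            return new
        scores = new
    return scores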
After the ranking algorithm is run on the graph, sentences are sorted in reverse order of their score, and the top-ranked sentences are selected for inclusion in the summary.
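The selection step then reduces to a sort. In this minimal sketch, restoring the selected sentences to their original text order is our own choice, mirroring the Figure 3 example, where the extracted sentences appear in document order:

def select_summary(sentences, scores, k=4):
    # Take the k best-scoring sentence indices (scores as returned
    # by rank_weighted above)...
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    # ...and restore their original text order for readability.
    return [sentences[i] for i in sorted(top)]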
Figure 3 shows a text sample and the associated weighted graph constructed for this text. The figure also shows sample weights attached to the edges connected to vertex 9,[4] and the final TextRank score computed for each sentence. The sentences with the highest rank are selected for inclusion in the abstract. For this sample article, the sentences with ids 9, 15, 16, and 18 are extracted, resulting in a summary of about 100 words which, according to automatic evaluation measures, is ranked second among summaries produced by 15 other systems (see Section 4.2 for the evaluation methodology).

[4] Weights are listed to the right of or above the edge they correspond to. Similar weights are computed for each edge in the graph, but are not displayed due to space restrictions.

[Figure 3: A sample text – a 24-sentence news story about Hurricane Gilbert – together with the weighted sentence graph built for it (edge weights and per-sentence TextRank scores), the TextRank extractive summary, and two manually produced reference abstracts.]
4.2 Evaluation

We evaluate the TextRank sentence extraction algorithm on a single-document summarization task, using 567 news articles provided during the Document Understanding Evaluations 2002 (DUC, 2002). For each article, TextRank generates a 100-word summary – the task undertaken by the other systems participating in this single-document summarization task.

For evaluation, we are using the ROUGE evaluation toolkit, which is a method based on Ngram statistics, found to be highly correlated with human evaluations (Lin and Hovy, 2003). Two manually produced reference summaries are provided, and used in the evaluation process.[5]

[5] ROUGE is available at http://www.isi.edu/~cyl/ROUGE/. The evaluation is done using the Ngram(1,1) setting of ROUGE, which was found to have the highest correlation with human judgments, at a confidence level of 95%. Only the first 100 words in each summary are considered.
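For intuition only – this is a simplified stand-in, not the ROUGE toolkit – the Ngram(1,1) statistic essentially measures unigram overlap with a reference summary, along these lines:

from collections import Counter

def unigram_overlap_recall(candidate, reference):
    # Fraction of reference unigrams (counted with multiplicity) that
    # also occur in the candidate summary.
    cand, ref = Counter(candidate), Counter(reference)
    matched = sum(min(cand[w], n) for w, n in ref.items())
    return matched / max(sum(ref.values()), 1)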
Fifteen systems participated in the task, and we compare the performance of TextRank with the top five performing systems, as well as with the baseline proposed by the DUC evaluators – consisting of a 100-word summary constructed by taking the first sentences in each article. Table 2 shows the results obtained on this data set of 567 news articles, including the results for TextRank, the baseline, and the results of the top five performing systems in the DUC 2002 single-document summarization task (DUC, 2002).

                ROUGE score – Ngram(1,1)
    System      basic (a)   stemmed (b)   stemmed, no-stopwords (c)
    S27         0.4814      0.5011        0.4405
    S31         0.4715      0.4914        0.4160
    TextRank    0.4708      0.4904        0.4229
    S28         0.4703      0.4890        0.4346
    S21         0.4683      0.4869        0.4222
    Baseline    0.4599      0.4779        0.4162
    S29         0.4502      0.4681        0.4019

Table 2: Results for single-document summarization: TextRank, the top 5 (out of 15) DUC 2002 systems, and the baseline. Evaluation takes into account (a) all words; (b) stemmed words; (c) stemmed words, with no stopwords.

Discussion. TextRank succeeds in identifying the most important sentences in a text based on information exclusively drawn from the text itself. Unlike supervised systems, which attempt to learn what makes a good summary by training on collections of summaries built for other articles, TextRank is fully unsupervised and relies only on the given text to derive an extractive summary, which represents a summarization model closer to what humans do when producing an abstract for a given document.

Notice that TextRank goes beyond simple sentence "connectivity" in a text. For instance, sentence 15 in the example provided in Figure 3 would not be identified as "important" based on the number of connections it has with other vertices in the graph, but it is identified as "important" by TextRank (and by humans – see the reference summaries displayed in the same figure).

Another important aspect of TextRank is that it gives a ranking over all sentences in a text – which means that it can be easily adapted to extracting very short summaries (headlines consisting of one
sentence), or longer, more explicative summaries consisting of more than 100 words. We are also investigating combinations of keyphrase and sentence extraction techniques as a method for building short/long summaries.

Finally, another advantage of TextRank over previously proposed methods for building extractive summaries is the fact that it does not require training corpora, which makes it easily adaptable to other languages or domains.
5 Why TextRank Works

Intuitively, TextRank works well because it does not rely only on the local context of a text unit (vertex), but rather takes into account information recursively drawn from the entire text (graph).

Through the graphs it builds on texts, TextRank identifies connections between various entities in a text, and implements the concept of recommendation. A text unit recommends other related text units, and the strength of the recommendation is recursively computed based on the importance of the units making the recommendation. For instance, in the keyphrase extraction application, co-occurring words recommend each other as important, and it is the common context that enables the identification of connections between words in the text. In the process of identifying important sentences in a text, a sentence recommends another sentence that addresses similar concepts as being useful for the overall understanding of the text. Sentences that are highly recommended by other sentences in the text are likely to be more informative for the given text, and will therefore be given a higher score.
An analogy can also be drawn with PageRank's "random surfer model", where a user surfs the Web by following links from any given Web page. In the context of text modeling, TextRank implements what we refer to as "text surfing", which relates to the concept of text cohesion (Halliday and Hasan, 1976): from a certain concept in a text, we are likely to "follow" links to connected concepts – that is, concepts that have a relation with the current concept (be it a lexical or semantic relation). This also relates to the "knitting" phenomenon (Hobbs, 1974): facts associated with words are shared in different parts of the discourse, and such relationships serve to "knit the discourse together".
Through its iterative mechanism, TextRank goes beyond simple graph connectivity, and is able to score text units based also on the "importance" of the other text units they link to. The text units selected by TextRank for a given application are the ones most recommended by related text units in the text, with preference given to the recommendations made by the most influential ones, i.e. the ones that are in turn highly recommended by other related units. The underlying hypothesis is that in a cohesive text fragment, related text units tend to form a "Web" of connections that approximates the model humans build about a given context in the process of discourse understanding.

6 Conclusions

In this paper, we introduced TextRank – a graph-based ranking model for text processing – and showed how it can be successfully used for natural language applications. In particular, we proposed and evaluated two innovative unsupervised approaches for keyword and sentence extraction, and showed that the accuracy achieved by TextRank in these applications is competitive with that of previously proposed state-of-the-art algorithms. An important aspect of TextRank is that it does not require deep linguistic knowledge, nor domain- or language-specific annotated corpora, which makes it highly portable to other domains, genres, or languages.

References

S. Brin and L. Page. 1998. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7).

DUC. 2002. Document Understanding Conference 2002. http://www-nlpir.nist.gov/projects/duc/.

E. Frank, G. W. Paynter, I. H. Witten, C. Gutwin, and C. G. Nevill-Manning. 1999. Domain-specific keyphrase extraction. In Proceedings of the 16th International Joint Conference on Artificial Intelligence.

M. Halliday and R. Hasan. 1976. Cohesion in English. Longman.

P.J. Herings, G. van der Laan, and D. Talman. 2001. Measuring the power of nodes in digraphs. Technical report, Tinbergen Institute.

J. Hobbs. 1974. A model for natural language semantics. Part I: The model. Technical report, Yale University.

A. Hulth. 2003. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, Japan, August.

J.M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632.

C.Y. Lin and E.H. Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of the Human Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada, May.

R. Mihalcea, P. Tarau, and E. Figa. 2004. PageRank on semantic networks, with application to word sense disambiguation. In Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland.

R. Mihalcea. 2004. Graph-based ranking algorithms for sentence extraction, applied to text summarization. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004) (companion volume), Barcelona, Spain.

P. Turney. 1999. Learning to extract keyphrases from text. Technical report, National Research Council, Institute for Information Technology.