Find it on GitHub: https://github.com/davidadamojr/TextRank
TextRank is a graph-based algorithm for automatic keyword and sentence extraction (summarization), proposed by Rada Mihalcea and Paul Tarau in the paper "TextRank: Bringing Order into Texts". This post presents a Python implementation of TextRank. Unlike the approach taken in the paper, this implementation uses Levenshtein distance as the relation between text units.
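To make the Levenshtein relation concrete, here is a minimal pure-Python sketch of how the distance between two text units could be computed and turned into weighted graph edges. The function names `levenshtein` and `weighted_edges` are illustrative assumptions, not names taken from the repository, and the linked code may compute or apply the distance differently.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def weighted_edges(units):
    """Hypothetical helper: (u, v, distance) for every unordered pair of text units."""
    return [(u, v, levenshtein(u, v)) for u, v in combinations(units, 2)]


edges = weighted_edges(["cat", "cut", "dog"])
print(edges)
```

Triples in this shape can be loaded into a NetworkX graph with `Graph.add_weighted_edges_from` and ranked with `networkx.pagerank(graph, weight="weight")`. Whether the repository wires things up exactly this way is an assumption.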
This implementation performs automatic keyword and sentence extraction on 10 articles retrieved from http://theonion.com.
It achieves the following:
- Generates a 100-word summary for each article
- Extracts a number of keywords proportional to the size of the text (a third of the number of nodes in the graph)
- Concatenates adjacent keywords in the text into keyphrases
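The three steps in this list can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: `scores` is assumed to be a word-to-rank mapping such as the one a PageRank call returns, the summary is assumed to be built by truncating the ranked sentences at the word limit, and all function names are hypothetical rather than taken from the repository.

```python
def select_keywords(scores):
    """Keep the top third of scored words (a third of the graph's nodes)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:len(ranked) // 3])


def merge_adjacent(tokens, keywords):
    """Join runs of consecutive keyword tokens into keyphrases."""
    phrases, run = [], []
    for token in list(tokens) + [None]:  # the None sentinel flushes the last run
        if token in keywords:
            run.append(token)
        elif run:
            phrase = " ".join(run)
            if phrase not in phrases:  # skip repeated phrases
                phrases.append(phrase)
            run = []
    return phrases


def build_summary(ranked_sentences, word_limit=100):
    """Assumed approach: join the best-ranked sentences, then cut at word_limit words."""
    words = " ".join(ranked_sentences).split()
    return " ".join(words[:word_limit])


# Made-up scores, purely for illustration.
scores = {"neural": 0.9, "network": 0.8, "training": 0.3,
          "fast": 0.2, "data": 0.1, "set": 0.05}
keywords = select_keywords(scores)
phrases = merge_adjacent("the neural network needs data".split(), keywords)
print(keywords, phrases)
```

With six scored words, a third of the nodes is two keywords, and because those two appear next to each other in the token stream they are merged into a single keyphrase.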
This makes the algorithm useful for automatically extracting relevant keywords from a body of text and for summarizing it.
The implementation has the following dependencies:
- NetworkX – http://networkx.github.io/download.html
- NLTK 3.0 – http://nltk.org/install.html
- NumPy – http://sourceforge.net/projects/numpy/files/