Find it on GitHub: https://github.com/davidadamojr/TextRank
TextRank is an algorithm for automatic keyword and sentence extraction (summarization), introduced by Rada Mihalcea and Paul Tarau in their paper "TextRank: Bringing Order into Texts". This post presents a Python implementation of TextRank. Unlike the approach taken in the paper, this implementation uses Levenshtein distance as the relation between text units.
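To make the idea of using Levenshtein distance as the relation between text units concrete, here is a minimal sketch in pure Python. It is not taken from the repository: the function names `levenshtein` and `build_edges` are hypothetical, and the choice to connect every pair of units is an assumption made for illustration.

```python
from itertools import combinations


def levenshtein(a, b):
    """Return the edit distance between two sequences (hypothetical helper)."""
    # Keep the shorter sequence in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def build_edges(units):
    """Map every unordered pair of text units to its edit distance.

    Assumption: the graph is fully connected, one weighted edge per pair.
    """
    return {(u, v): levenshtein(u, v) for u, v in combinations(units, 2)}
```

The resulting dictionary could be fed as edge weights into any graph library that provides a PageRank-style ranking, with words as the units for keyword extraction and sentences as the units for summarization.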
This implementation does the following:
- Generates a 100-word summary from a given text input.
- Extracts a number of keywords relative to the size of the text (a third of the number of nodes in the graph).
- Concatenates adjacent keywords in the input text to form keyphrases.
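The two keyword steps in the list above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: `select_keywords` and `merge_adjacent` are hypothetical names, `scores` is assumed to be a dictionary mapping each graph node (word) to its rank, and the input text is assumed to be already tokenized.

```python
def select_keywords(scores):
    """Keep the top third of the scored words (hypothetical helper).

    Assumption: `scores` maps each graph node to its rank.
    """
    count = max(1, len(scores) // 3)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:count])


def merge_adjacent(tokens, keywords):
    """Join runs of consecutive keywords in the token stream into keyphrases."""
    phrases = []
    run = []
    for token in tokens:
        if token in keywords:
            run.append(token)
        elif run:
            # A non-keyword ends the current run of adjacent keywords.
            phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    # Drop repeated phrases while preserving first-seen order.
    return list(dict.fromkeys(phrases))


# Usage with made-up scores, for illustration only.
scores = {"linear": 0.9, "constraints": 0.8, "systems": 0.3,
          "set": 0.2, "natural": 0.1, "numbers": 0.05}
keywords = select_keywords(scores)
tokens = "systems of linear constraints over the set".split()
phrases = merge_adjacent(tokens, keywords)
```

With six nodes in the graph, a third of them (two words) are kept as keywords, and the two that sit next to each other in the text are joined into a single keyphrase.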