
Python Implementation of TextRank (Github Repo)

Find it on Github: https://github.com/davidadamojr/TextRank

TextRank is an algorithm for automatic keyword and sentence extraction (summarization) proposed by Rada Mihalcea and Paul Tarau in their 2004 paper “TextRank: Bringing Order into Texts”. This post presents a Python implementation of TextRank. Unlike the approach taken in the paper, this implementation uses Levenshtein distance as the relation between text units, that is, as the weight on the edges of the graph.
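The graph-and-rank step can be sketched in plain Python. This is a minimal sketch, not the code from the repository. It assumes the text units are already tokenized strings, links every pair of distinct units (a complete graph), uses the raw Levenshtein distance as the edge weight as described above, and scores nodes with weighted PageRank by power iteration with a damping factor of 0.85 (the value used in the TextRank paper). The helper names `levenshtein`, `build_graph` and `weighted_pagerank` are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def build_graph(units):
    """Complete undirected graph: every pair of distinct text units is
    linked, weighted by their Levenshtein distance (assumption: raw distance)."""
    nodes = list(dict.fromkeys(units))  # drop duplicates, keep first-seen order
    weights = {n: {} for n in nodes}
    for u, v in combinations(nodes, 2):
        w = levenshtein(u, v)
        weights[u][v] = w
        weights[v][u] = w
    return weights


def weighted_pagerank(weights, damping=0.85, iterations=100, tol=1e-8):
    """Weighted PageRank by power iteration on an undirected graph."""
    nodes = list(weights)
    n = len(nodes)
    if n == 0:
        return {}
    score = {v: 1.0 / n for v in nodes}
    # Total edge weight leaving each node, used to normalize its vote.
    out_total = {v: sum(weights[v].values()) for v in nodes}
    for _ in range(iterations):
        new = {}
        for v in nodes:
            incoming = sum(
                score[u] * w / out_total[u]
                for u, w in weights[v].items()
                if out_total[u] > 0
            )
            new[v] = (1 - damping) / n + damping * incoming
        converged = sum(abs(new[v] - score[v]) for v in nodes) < tol
        score = new
        if converged:
            break
    return score


if __name__ == "__main__":
    ranks = weighted_pagerank(build_graph(["the", "cat", "sat", "on", "mat"]))
    for word, s in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{word}: {s:.3f}")
```

In practice a graph library such as networkx, whose `pagerank(G, alpha=0.85, weight="weight")` accepts edge weights, could replace the hand-written power iteration.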

This implementation performs automatic keyword and sentence extraction on 10 articles retrieved from http://theonion.com.

It does the following:

  • Generates a 100-word summary for each article
  • Extracts a number of keywords proportional to the size of the text (one third of the number of nodes in the graph)
  • Concatenates adjacent keywords in the text into keyphrases
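The three post-processing steps listed above might look like the following. This is a hedged sketch, not the repository's code. It assumes node scores come from a ranking step such as weighted PageRank, that tokens are words in their original order, that runs of any length of adjacent keywords are merged (the original may merge only pairs), and that the 100-word summary is produced by joining sentences in score order and truncating. The `max(1, ...)` floor on the keyword count and the names `top_keywords`, `keyphrases` and `summarize` are illustrative assumptions.

```python
def top_keywords(scores):
    """Keep the highest-scoring third of the graph's nodes
    (assumption: at least one keyword when the graph is non-empty)."""
    keep = max(1, len(scores) // 3) if scores else 0
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])


def keyphrases(tokens, keywords):
    """Merge runs of adjacent keywords (in original token order) into
    multi-word keyphrases; isolated keywords stay single words."""
    phrases, run = [], []
    for tok in list(tokens) + [None]:  # None sentinel flushes the final run
        if tok is not None and tok in keywords:
            run.append(tok)
        else:
            if run:
                phrase = " ".join(run)
                if phrase not in phrases:  # keep each phrase once
                    phrases.append(phrase)
            run = []
    return phrases


def summarize(sentence_scores, max_words=100):
    """Join sentences from highest to lowest score, then truncate the
    result to max_words words."""
    ranked = sorted(sentence_scores, key=sentence_scores.get, reverse=True)
    words = " ".join(ranked).split()
    return " ".join(words[:max_words])
```

Truncating at a fixed word count keeps every summary the same length regardless of article size, at the cost of possibly cutting the last sentence mid-way.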

The algorithm is useful for automatic keyword extraction and for automatic summarization of a given body of text.

The implementation has the following dependencies:

