
Python Implementation of TextRank (Github Repo)

Find it on Github: https://github.com/davidadamojr/TextRank

A few months ago, I wrote an implementation of “TextRank” in Python. TextRank is a graph-based algorithm for automatic keyword and sentence extraction (summarization), proposed by Rada Mihalcea and Paul Tarau in this paper. Unlike the paper, which links words by co-occurrence and sentences by content overlap, this implementation uses Levenshtein distance as the relation between text units.
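As a rough illustration of the idea (this is not the code from the repository), the sketch below computes Levenshtein distance in plain Python, uses it as the edge weight between every pair of text units, and ranks the units with a weighted PageRank. The names `levenshtein`, `build_graph`, and `pagerank`, the fully connected graph, and the damping factor of 0.85 are assumptions made for this example.

```python
from itertools import combinations


def levenshtein(a, b):
    """Minimum number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def build_graph(units):
    """Hypothetical helper: connect every pair of distinct text units,
    weighting each edge by the Levenshtein distance between them."""
    unique = sorted(set(units))
    return {(u, v): levenshtein(u, v) for u, v in combinations(unique, 2)}


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on an undirected graph
    given as {(u, v): weight}."""
    neighbours = {}
    for (u, v), w in weights.items():
        neighbours.setdefault(u, {})[v] = w
        neighbours.setdefault(v, {})[u] = w
    n = len(neighbours)
    scores = {node: 1.0 / n for node in neighbours}
    for _ in range(iterations):
        updated = {}
        for node in neighbours:
            incoming = 0.0
            for other, w in neighbours[node].items():
                total = sum(neighbours[other].values())
                if total > 0:
                    incoming += scores[other] * w / total
            updated[node] = (1 - damping) / n + damping * incoming
        scores = updated
    return scores


graph = build_graph(["rank", "text", "graph", "rank"])
scores = pagerank(graph)
```

Because the edge weight here is a distance, units that differ more from each other are joined by heavier edges. That is a consequence of using the raw distance in this sketch, and the repository may normalise or combine the weights differently.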

This implementation carries out automatic keyword and sentence extraction on 10 articles taken from http://theonion.com.

It does the following:

  • Generates a 100-word summary for each article
  • Extracts a number of keywords proportional to the size of the text (a third of the number of nodes in the graph)
  • Concatenates adjacent keywords in the text into keyphrases
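The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the repository's code: the function names are hypothetical, the "third of the nodes" is taken with integer division, and the summary is simply cut off at the word limit.

```python
def top_keywords(scores):
    """Keep the highest-scoring third of the graph's nodes
    (integer division is an assumption of this sketch)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[: len(ranked) // 3])


def merge_adjacent(tokens, keywords):
    """Join runs of consecutive keyword tokens into keyphrases."""
    phrases, run = [], []
    for token in tokens:
        if token in keywords:
            run.append(token)
        else:
            if run:
                phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    # Drop repeated phrases while keeping first-seen order.
    return list(dict.fromkeys(phrases))


def summarize(ranked_sentences, word_limit=100):
    """Concatenate sentences (best first) and cut the result at word_limit words."""
    words = " ".join(ranked_sentences).split()
    return " ".join(words[:word_limit])


# Made-up scores and text, for illustration only.
example_scores = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.05, "e": 0.03, "f": 0.02}
tokens = "natural language processing is a field of language research".split()
phrases = merge_adjacent(tokens, {"natural", "language", "processing"})
```

With the example tokens, the three consecutive keywords form the single keyphrase "natural language processing", while the later isolated "language" stays a keyword on its own.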

The algorithm is useful for extracting relevant keywords from a body of text and for summarizing it automatically.

The implementation has the following dependencies:

Find it on Github: https://github.com/davidadamojr/TextRank
