A Comparative Study of Text Summarization using Gensim, NLTK, Spacy, and Sumy Libraries

Abhilasha Sharma, Raghav Aggarwal & Raghav Alawadhi


Abstract

The exponential growth of textual information on the internet has greatly expanded the amount of digital content available. The sheer volume of this content makes it difficult to extract valuable insights. Text summarization has become an essential tool for addressing this problem by providing a condensed version of the selected content. This research paper introduces an Auto Text Summarizer Application developed in Python. The application accepts a web page URL or textual input as its source and processes it to generate a summary using the Extractive Text Summarization technique. The application uses four distinct Python libraries, namely the Natural Language Toolkit (NLTK), Spacy, Gensim, and Sumy, and the Flask framework is employed to present the summarized content on the front end. The back end of the application uses the Beautiful Soup library to scrape web page content or read the provided text data. The results obtained from each of these libraries are compared based on the reading time required for the summarized content, along with the computed ROUGE score, F1 score, and precision. The Text Summarizer Application is a valuable addition to the Natural Language Processing domain, as it provides a means of summarizing large volumes of textual data efficiently and effectively. Furthermore, the use of Python libraries and frameworks makes the application scalable and easy to use, while also providing accurate and reliable results.

Keywords: Machine Learning, Text Summarizer, NLTK, Spacy, Gensim, Sumy.
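
As a rough illustration of the pipeline the abstract describes (extractive summarization, followed by comparison on reading time, precision, and F1), the following minimal sketch uses only the Python standard library. It is not the authors' implementation and does not call NLTK, Spacy, Gensim, or Sumy. The word-frequency scoring rule, the small stop-word list, the 200 words-per-minute reading speed, and all function names are assumptions made for illustration, and the ROUGE variant shown is a simple unigram-overlap (ROUGE-1 style) calculation.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real libraries ship larger ones).
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "it",
              "that", "this", "for", "on", "with", "as", "are", "by"}


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def summarize(text, max_sentences=2):
    """Extractive summary: keep the sentences whose words are most frequent overall."""
    sentences = split_sentences(text)
    freqs = Counter(w for w in tokenize(text) if w not in STOP_WORDS)

    def score(sentence):
        return sum(freqs[w] for w in tokenize(sentence) if w not in STOP_WORDS)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


def reading_time_minutes(text, words_per_minute=200):
    """Estimated reading time; the 200 wpm default is an assumed average."""
    return len(tokenize(text)) / words_per_minute


def rouge1(candidate, reference):
    """Unigram-overlap precision, recall, and F1 between a candidate and a reference."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Under these assumptions, each library's summary could be passed to `reading_time_minutes` and, together with a reference summary, to `rouge1`, so that the four outputs are compared on the same measures.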

 
