Improving Arabic Text Summarization Using Hybrid AI Methods
Bashayer Mohammed Alrubaie, Hesham A. Hefny & Mostafa Ezzat
Abstract
Arabic text summarization presents a significant challenge in natural language processing (NLP), largely due to the language's morphological richness and syntactic diversity. While advances in deep learning have propelled summarization capabilities in high-resource languages, Arabic continues to face obstacles, including a scarcity of high-quality annotated data and a lack of standardized summarization benchmarks. This paper introduces a hybrid artificial intelligence architecture that combines extractive summarization based on AraBERT embeddings with abstractive generation using AraT5. By integrating the syntactic insights gained from the extractive stage with the generative capabilities of transformer models, our approach aims to produce summaries that are both content-rich and linguistically coherent. Evaluations on the AraSum and Arabic Gigaword datasets show that the proposed model outperforms standalone extractive and abstractive methods, as measured by ROUGE and semantic similarity metrics: the hybrid model achieves 86% ROUGE-1, 84% ROUGE-2, 86% ROUGE-L, and 49% BLEU. These results demonstrate the accuracy and effectiveness of our approach for summarizing Arabic texts. The study also provides theoretical and technical insights into model fusion and language-specific adaptation, and outlines potential avenues for future enhancements in Arabic summarization.
Keywords—Arabic Text Summarization; Extractive Summarization; Abstractive Summarization; Transformers; BERT; AraBERT; AraT5
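
The sketch below illustrates the general shape of the extract-then-abstract pipeline the abstract describes: AraBERT sentence embeddings drive an extractive selection step, and the selected sentences are then rewritten by AraT5. It is a minimal illustration, not the authors' implementation; the checkpoint names, the centroid-based sentence scorer, the naive sentence splitter, and all hyperparameters are assumptions made for this example.

```python
# Illustrative extractive-then-abstractive pipeline (assumed components, not
# the paper's exact setup). Requires: transformers, torch, sentencepiece.
import re
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel, AutoModelForSeq2SeqLM

ARABERT = "aubmindlab/bert-base-arabertv2"   # assumed AraBERT checkpoint
ARAT5 = "UBC-NLP/AraT5v2-base-1024"          # assumed AraT5 checkpoint


def split_sentences(text):
    """Naive splitter on Latin/Arabic sentence-final punctuation."""
    return [s.strip() for s in re.split(r"[.!?؟]+", text) if s.strip()]


def embed_sentences(sentences, tokenizer, model):
    """Mean-pooled AraBERT embeddings, one vector per sentence."""
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state          # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)          # masked mean pooling


def extract_top_k(text, k=3):
    """Extractive stage: keep the k sentences closest to the document centroid."""
    tok = AutoTokenizer.from_pretrained(ARABERT)
    bert = AutoModel.from_pretrained(ARABERT)
    sents = split_sentences(text)
    if len(sents) <= k:
        return sents
    emb = embed_sentences(sents, tok, bert)
    centroid = emb.mean(dim=0, keepdim=True)
    scores = F.cosine_similarity(emb, centroid)
    idx = sorted(scores.topk(k).indices.tolist())        # preserve original order
    return [sents[i] for i in idx]


def abstractive_summary(sentences, max_length=128):
    """Abstractive stage: rewrite the extracted sentences with AraT5."""
    tok = AutoTokenizer.from_pretrained(ARAT5)
    t5 = AutoModelForSeq2SeqLM.from_pretrained(ARAT5)
    inputs = tok(" ".join(sentences), truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = t5.generate(**inputs, max_length=max_length, num_beams=4)
    return tok.decode(out[0], skip_special_tokens=True)


if __name__ == "__main__":
    document = "..."  # an Arabic source article
    print(abstractive_summary(extract_top_k(document, k=3)))
```

In this arrangement the extractive stage acts as a content filter that keeps the abstractive model's input within its length budget, while AraT5 is responsible for fluency and compression; the actual system evaluated in the paper may differ in sentence scoring, truncation strategy, and fine-tuning of the generator.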
