Hate Speech Identification in Texts Through Phraseological Analysis and TF-IDF Representation of N-Grams (#624)
Date of Conference
July 16-18, 2025
Published In
"Engineering, Artificial Intelligence, and Sustainable Technologies in service of society"
Location of Conference
Mexico
Authors
Espin-Riofrio, César
Yanza-Montalván, Ángela
Carchi-Encalada, Rocío
Arias Candelario, Mayra Magdalena
Cruz-Chóez, Angélica
Montesdeoca-Rodríguez, Juan
Bailón-Guaranda, Marcos
Abstract
Hate speech, widespread on digital platforms, poses particular challenges in Spanish because of the language's linguistic richness and cultural diversity, characteristics that complicate the automatic identification of such content. The problem is compounded by the ease with which Spanish can disguise hateful messages through sarcasm, irony, or specific cultural references. This research extracts phraseological features and TF-IDF representations of n-grams and uses them with traditional statistical classifiers and neural networks, together with ensemble methods that combine these models to improve overall classification performance. We used the OffendEs dataset, which is labeled specifically for hate speech tasks in Spanish. The results show that ensemble models achieve higher accuracy while maintaining a good balance between classes, demonstrating their ability to handle the linguistic complexity of Spanish. In particular, the Voting Classifier achieved a macro F1 score of 0.742261. We compared our results with predictions from pre-trained models built specifically for hate speech detection, such as Piuba and Pysentimiento, and our approach outperformed them. These findings highlight the effectiveness of our methodology and its contribution to developing more accurate tools for the automatic detection of hate speech in Spanish.
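To make the named techniques concrete, here is a minimal, standard-library-only sketch of three of them: TF-IDF weighting of word n-grams, hard (majority) voting across classifier outputs, and the macro F1 metric used to report results. This is not the authors' implementation. The function names are hypothetical, phraseological features and the actual classifiers are omitted, and the smoothed-IDF formula with L2 normalization is an assumption (it matches scikit-learn's `TfidfVectorizer` defaults; the abstract does not state which variant was used).

```python
import math
from collections import Counter


def ngrams(text, n_min=1, n_max=2):
    """Hypothetical helper: lowercase whitespace tokens, then all word n-grams in [n_min, n_max]."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_min=1, n_max=2):
    """Return one {ngram: weight} dict per document.

    Assumed weighting: raw term count * smoothed idf, where
    idf = ln((1 + N) / (1 + df)) + 1, then L2-normalized per document.
    """
    grams_per_doc = [ngrams(d, n_min, n_max) for d in docs]
    n_docs = len(docs)
    # Document frequency: count each n-gram at most once per document.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
    vectors = []
    for grams in grams_per_doc:
        vec = {g: count * idf[g] for g, count in Counter(grams).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0  # avoid division by zero
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def hard_vote(predictions):
    """Majority vote; predictions[k][i] is classifier k's label for sample i.

    Ties go to the tied label predicted by the earliest classifier.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count as much as majority ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, `hard_vote([[1, 0, 0], [1, 1, 0], [0, 1, 1]])` combines three classifiers' labels for three samples into `[1, 1, 0]`. Macro averaging is a natural fit for hate speech data, where the hateful class is usually the minority: a model that always predicts the majority class scores well on accuracy but poorly on macro F1.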