Model Proposal for the Detection of False Information About COVID-19 Using Machine Learning and Natural Language Processing Techniques

Ticona, Wilfredo M.; Salinas-Bolaños, Yair A.

Model Proposal for the Detection of False Information About COVID-19 Using Machine Learning and Natural Language Processing Techniques (#1191)

Date of Conference

July 19-21, 2023

Published In

"Leadership in Education and Innovation in Engineering in the Framework of Global Transformations: Integration and Alliances for Integral Development"

Location of Conference

Buenos Aires

Authors

Ticona, Wilfredo M.

Salinas-Bolaños, Yair A.

Abstract

One of the main problems that arose during the COVID-19 health emergency was the circulation of false information about the disease. This study therefore aimed to identify the best classifier of false information on COVID-19 in the Peruvian context. To that end, 2022 information records related to COVID-19 were collected by web scraping websites, Facebook and Twitter; the records were manually labeled as True or False and then validated. Natural Language Processing techniques (Bag of Words, TF-IDF, Word2Vec and FastText) were used for feature extraction. Finally, different Machine Learning models were developed using KNN, Decision Tree, Naive Bayes, SVM, Logistic Regression and MLP. The results were evaluated with the Accuracy, Precision, Recall and F1-score metrics. The best model combined the SVM algorithm (C = 0.5, gamma = 1, RBF kernel) with 300-dimensional TF-IDF features built from n-grams of length 1 to 2; it outperformed the other models with 87.41% Accuracy, 88.63% Precision, 87.39% Recall and 88% F1-score.
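As an illustration only, the feature extraction and evaluation steps named in the abstract can be sketched in plain Python. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the L2 normalisation, the helper names (`ngrams`, `fit_vocabulary`, `tfidf_vector`, `binary_metrics`) and the toy sentences are all assumptions made for the example. Only the cap of 300 features, the 1-to-2 n-gram range, the True/False labels and the four metrics come from the abstract. The classifier itself (the SVM) is left out.

```python
import math
from collections import Counter


def ngrams(text, n_min=1, n_max=2):
    """Return the word n-grams of a text (unigrams and bigrams by default)."""
    tokens = text.lower().split()  # assumption: naive whitespace tokenizer
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def fit_vocabulary(docs, max_features=300):
    """Keep the most frequent n-grams and compute a smoothed IDF for each."""
    term_counts, doc_freq = Counter(), Counter()
    for doc in docs:
        grams = ngrams(doc)
        term_counts.update(grams)      # total occurrences across the corpus
        doc_freq.update(set(grams))    # number of documents containing the term
    vocab = [term for term, _ in term_counts.most_common(max_features)]
    n_docs = len(docs)
    # Assumption: a common smoothed IDF variant, ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """Map one document to an L2-normalised TF-IDF vector over the vocabulary."""
    counts = Counter(ngrams(doc))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def binary_metrics(y_true, y_pred, positive="False"):
    """Accuracy, precision, recall and F1 with respect to one positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy records, invented for the example (not taken from the study's dataset).
docs = ["the vaccine is safe", "the vaccine alters dna"]
vocab, idf = fit_vocabulary(docs, max_features=300)
features = [tfidf_vector(d, vocab, idf) for d in docs]

# Hypothetical predictions, scored against manually assigned labels.
scores = binary_metrics(["False", "False", "True", "True"],
                        ["False", "True", "True", "True"])
```

In a full pipeline, vectors such as `features` would be passed to the classifier, and `binary_metrics` would be applied to its predictions on held-out records.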
