Performance Evaluation of Recurrent Neural Network on Large-Scale Translated Dataset for Question Generation in NLP for Educational Purposes

Published in: Industry, Innovation, and Infrastructure for Sustainable Cities and Communities: Proceedings of the 17th LACCEI International Multi-Conference for Engineering, Education and Technology
Date of Conference: July 24-26, 2019
Location of Conference: Montego Bay, Jamaica
Authors: Fidel Mamani Maquera (CiTeSoft UNSA, PE)
Alfredo Paz Valderrama (CiTeSoft UNSA, PE)
Eveling Castro Gutierrez (CiTeSoft UNSA, PE)
Full Paper: #178

Abstract:

In recent years, neural networks have been widely used to solve NLP tasks that involve large-scale datasets. Question Generation (QG) has recently attracted considerable attention because it is a subtask of Question Answering (QA) with many applications, mainly for educational purposes. Its importance is reflected in the many large-scale datasets recently released exclusively for this task. Most of the data used in NLP is available in English, but this is not the case for other languages such as Spanish, which is the third most used language in the world. This research analyzes the performance of a current state-of-the-art neural network model for QG using a large-scale Spanish dataset translated from English. To assess the accuracy of the Spanish data translated from English, the state-of-the-art OpenNMT machine translator and the Google Translation API were used, and the results were analyzed with the corresponding automatic metrics (BLEU, METEOR, ROUGE) and with human evaluation criteria such as fluency and adequacy. A state-of-the-art QG neural network model was then trained on the translated Spanish data to generate questions automatically. Surprisingly, the results outperform the original results on the English dataset by an average of 37% across all automatic evaluation metrics. To the best of our knowledge, this work is the first to use large-scale translated Spanish data for the QG task using recurrent neural networks for educational purposes.
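
To make the automatic evaluation step more concrete, the following is a minimal sketch of a sentence-level BLEU score, one of the metrics named in the abstract. It is not the evaluation script used by the authors: the function names (`ngrams`, `modified_precision`, `sentence_bleu`) and the Spanish example sentences are hypothetical, tokenization is assumed to be simple whitespace splitting, and no smoothing is applied. Published evaluations usually rely on corpus-level scores from an established toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in the reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU: geometric mean of the 1..max_n
    clipped precisions multiplied by the brevity penalty."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, one empty n-gram order zeroes the whole score.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical example: a generated question compared with a reference question.
reference = "¿ cuál es la capital de perú ?".split()
generated = "¿ cuál es la capital del perú ?".split()
score = sentence_bleu(generated, reference)
```

A score of 1.0 means the generated question matches the reference exactly, and lower values indicate fewer shared n-grams. METEOR and ROUGE follow the same idea of comparing system output with references but weight recall and matching differently.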