Automation of the process of transforming publication formats in scientific journals through a Python script (#987)

Date of Conference

July 19-21, 2023

Published In

"Leadership in Education and Innovation in Engineering in the Framework of Global Transformations: Integration and Alliances for Integral Development"

Location of Conference

Buenos Aires


Murillo-Gonzalez, Danny

López, Sucel


Scientific disclosure and diffusion is the way to make society and other scientists aware of research results and the generation of new knowledge. Over the last few years, scientific journals in digital format have become the most widely used medium for presenting these results. Whether we intend to publish in a journal or to consult one, it is necessary to analyze several of its aspects: its presentation, its form of distribution, the quality of its content and its impact. Although all these elements are of interest, the form of distribution is particularly relevant because it is linked to the visibility of the journal: if a journal is not found, it is neither read nor cited. Likewise, if publication formats are not diverse, we will not be able to improve the digital reach among those who use this content. According to data from the Scholastica report (Scholastica is a paid web platform that includes more than 900 publishers of academic journals), the most used formats are pdf and html. Studies carried out in Central America, specifically in Costa Rica and Panama, found that the formats used by scientific journals are pdf, html, ePub, xml-jats, audio and Flipbook. Of the 185 journals evaluated, only 50% use two formats and barely 15% use more than three, the most common being html and pdf. The problem is not only the limitations of software such as MS Word for transforming pdf to html; according to the editors, they do not use other formats because they are unaware of the software used for this process. In the case of Panamanian journals, of the 30 evaluated, 100% used pdf, only six used html, and only four used more than three formats. There is therefore a deficiency in the number of formats offered, and probably in the time that the transformation process takes publishers.
The objective of this work is to develop a Python script that automates the transformation of scientific articles in docx format into other formats (pdf, html, ePub, txt and audio), minimizing the number of software tools required and reducing the processing time of these documents. In the tests carried out with the script, it was necessary to define character styles in the documents to achieve good results. With these in place, the script transformed 24 articles from two Panamanian journals into the five formats in 15 minutes, compared with the 15 hours this transformation took publishers.
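
As a rough illustration of the kind of batch conversion described above, the following sketch plans the output files for each docx article and builds one conversion command per format. It is not the authors' script: the abstract does not name the libraries or tools used. The sketch assumes the pandoc command-line tool for the pdf, html, ePub and txt outputs (pdf output additionally requires a PDF engine installed alongside pandoc), and all function names and the `.mp3` extension for audio are hypothetical. Audio is only planned, not converted, because it would need a separate text-to-speech step.

```python
import shutil
import subprocess
from pathlib import Path

# Target formats named in the abstract, mapped to file extensions.
# The ".mp3" extension for audio is an assumption made for this sketch.
TARGET_EXTENSIONS = {
    "pdf": ".pdf",
    "html": ".html",
    "epub": ".epub",
    "txt": ".txt",
    "audio": ".mp3",
}

# Formats this sketch hands to pandoc (an assumed tool, not the authors').
PANDOC_FORMATS = ("pdf", "html", "epub", "txt")


def plan_outputs(docx_path, out_dir):
    """Return {format: output path} for one .docx article."""
    stem = Path(docx_path).stem
    return {
        fmt: Path(out_dir) / f"{stem}{ext}"
        for fmt, ext in TARGET_EXTENSIONS.items()
    }


def pandoc_command(docx_path, out_path, fmt):
    """Build the pandoc argument list for one conversion."""
    cmd = ["pandoc", str(docx_path), "-o", str(out_path)]
    if fmt == "html":
        cmd.append("--standalone")  # emit a complete HTML page
    elif fmt == "txt":
        cmd += ["--to", "plain"]  # plain-text writer
    return cmd


def convert_folder(in_dir, out_dir, dry_run=True):
    """Plan, and optionally run, conversions for every .docx in in_dir."""
    commands = []
    for docx in sorted(Path(in_dir).glob("*.docx")):
        outputs = plan_outputs(docx, out_dir)
        for fmt in PANDOC_FORMATS:
            commands.append(pandoc_command(docx, outputs[fmt], fmt))
    if not dry_run:
        if shutil.which("pandoc") is None:
            raise RuntimeError("pandoc not found on PATH")
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_folder("articles", "out")` with the default `dry_run=True` only returns the list of commands, which makes it possible to inspect the plan before anything is executed; the folder names here are placeholders.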
