Heuristic method for automatic image annotation in HTML documents

Published in: Proceedings of the 13th Latin American and Caribbean Conference for Engineering and Technology: Engineering Education Facing the Grand Challenges, What Are We Doing?
Date of Conference: July 29 - 31, 2015
Location of Conference: Santo Domingo, Dominican Republic
Authors: Jorge Luis Betancourt González, Adisleydis Rodríguez Alvarez
Refereed Paper: #57


This paper presents a heuristic method for automatically annotating images embedded in HTML documents. The method exploits the tree structure of an HTML document to identify the nodes that contain relevant information about an embedded image, and then uses the text in the nearest of these nodes to expand the information collected about the image, thereby increasing the recall of a Web search engine. The heuristic was evaluated using the Agreement Index: for each image, the text contained in the identified nodes was assessed and assigned a category indicating how well it related to (i.e., described) the image. In our test cases the calculated Agreement Index was over 85%, which supports the proposed method.

Keywords-- image annotation, HTML, information retrieval, search engine, web
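
The abstract describes the heuristic only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes that "nearest nodes" means the closest ancestor of an `img` element whose subtree contains any text, and that the image's own `alt` and `title` attributes are also collected. The names `Node`, `TreeBuilder`, `annotate_images`, and the `max_levels` cutoff are hypothetical, introduced only for illustration; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """Hypothetical minimal tree node: children are Node objects or text strings."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []

    def text(self):
        # Concatenate all text in this subtree and normalise whitespace.
        parts = [c if isinstance(c, str) else c.text() for c in self.children]
        return " ".join(" ".join(parts).split())


class TreeBuilder(HTMLParser):
    """Builds a simple tree from HTML and remembers every <img> node."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root
        self.images = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag == "img":
            self.images.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def annotate_images(html, max_levels=3):
    """Map each image src to its alt/title text plus the nearest ancestor's text.

    Assumption: "nearest" is the first ancestor, at most max_levels up,
    whose subtree contains non-empty text.
    """
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    annotations = {}
    for img in builder.images:
        parts = [img.attrs.get("alt") or "", img.attrs.get("title") or ""]
        node, level = img.parent, 0
        while node is not None and level < max_levels:
            text = node.text()
            if text:
                parts.append(text)
                break
            node, level = node.parent, level + 1
        annotations[img.attrs.get("src") or ""] = " ".join(p for p in parts if p)
    return annotations
```

For a fragment such as `<div><a href="x"><img src="cat.jpg" alt="cat"></a><p>A cat sleeping on a sofa.</p></div>`, the enclosing link contains no text, so the sketch climbs one more level and annotates `cat.jpg` with the `alt` text followed by the sibling paragraph. A real system would likely also skip `script` and `style` content and weight candidate nodes by distance, which this sketch does not attempt.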