Dry Bean Classification. Analysis from Data Mining

Castrillón, Omar Danilo; Giraldo, Jaime Alberto; Arango, Jaime Antero

Dry Bean Classification. Analysis from Data Mining (#108)

Date of Conference

July 19-21, 2023

Published In

"Leadership in Education and Innovation in Engineering in the Framework of Global Transformations: Integration and Alliances for Integral Development"

Location of Conference

Buenos Aires

Authors

Castrillón, Omar Danilo

Giraldo, Jaime Alberto

Arango, Jaime Antero

Abstract

Seed classification is a fundamental factor in all agricultural processes, especially when the goal is to maximize the profitability of the harvested products and to avoid cumbersome manual sorting. This research applies data mining to identify the independent variables that most influence the classification of a seed, in this case dry beans. The independent variables analyzed are: area, perimeter, major axis length, minor axis length, aspect ratio, eccentricity, convex area, equivalent diameter, extent, solidity, roundness, chromaticity, form factor 1, form factor 2, form factor 3, and form factor 4. The dependent variable, class, takes one of seven values: Barbunya, Bombay, Cali, Dermason, Horoz, Seker, and Sira. The J48 algorithm of the Weka machine learning and data mining platform is used to identify the class to which a seed belongs and the independent variables that most influence that decision. The results show that, with an effectiveness greater than 93%, the most influential variables are perimeter, major axis length, and minor axis length.
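J48 is Weka's implementation of the C4.5 decision tree, which chooses splits by gain ratio, so a variable that separates the classes cleanly tends to appear near the root of the tree. The following is a minimal Python sketch of that criterion only, not the authors' Weka workflow: it omits pruning and the other refinements of C4.5, the function names are hypothetical, and the toy measurements are invented for illustration and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` on a numeric variable."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    # Information gain: entropy before the split minus weighted entropy after it.
    info_gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    # Split information penalizes very unbalanced partitions.
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


def best_split(values, labels):
    """Return (gain_ratio, threshold) for the best midpoint between distinct values."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max((gain_ratio(values, labels, t), t) for t in candidates)


# Hypothetical toy measurements, NOT taken from the paper's dataset.
toy = {
    "perimeter": [520, 540, 560, 700, 720, 740],
    "solidity": [0.98, 0.99, 0.98, 0.99, 0.98, 0.99],
}
classes = ["Dermason", "Dermason", "Dermason", "Cali", "Cali", "Cali"]

# Rank variables by the gain ratio of their best single split.
ranking = sorted(toy, key=lambda name: best_split(toy[name], classes)[0], reverse=True)
print(ranking)
```

In this toy example, perimeter separates the two classes perfectly with a single threshold, so it ranks ahead of solidity. This is the sense in which a variable can be called more influential in a tree-based analysis.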
