Abstract: Large amounts of data are generated constantly, making parallel and distributed systems necessary. These systems, however, require careful resource management, a need that motivated the MapReduce programming model: its schedulers automatically manage the resources of a distributed system. For example, if a cluster node stops working, MapReduce automatically re-executes the interrupted task on another machine. A similar mechanism is speculative scheduling, which detects tasks whose execution time is abnormally prolonged (stragglers) and launches copies of them on machines other than the one running the original task, in the hope that a backup task finishes before the original and thereby reduces the execution time of the whole application. Current speculative schedulers, however, have limitations in correctly estimating task progress.
Estimating task progress is fundamental to determining when a task is a straggler, and an incorrect estimate can waste system resources and prolong the execution time of the entire application. For this reason, this work presents a comparison of the accuracy with which several speculative schedulers estimate task progress, and finally points out some guidelines that could steer future proposals toward more efficient progress estimation.
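To make the mechanism concrete, the following is a minimal sketch of progress-rate-based straggler detection in the style of LATE-like schedulers. It is a simplified illustration, not the method of any specific scheduler evaluated here: the function names, the `slow_factor` threshold, and the assumption that progress is reported as a fraction in [0, 1] are all hypothetical.

```python
# Hypothetical, simplified straggler detection based on progress rate.
# A task is flagged when its progress rate (fraction of work completed
# per second) falls well below the mean rate of its peers.

def progress_rate(progress, elapsed):
    """Progress in [0, 1] divided by elapsed time in seconds."""
    return progress / elapsed if elapsed > 0 else 0.0

def find_stragglers(tasks, slow_factor=0.5):
    """tasks: list of (task_id, progress, elapsed_seconds) tuples.
    Returns ids of tasks whose rate is below slow_factor * mean rate;
    a speculative scheduler would launch backup copies of these."""
    rates = {tid: progress_rate(p, t) for tid, p, t in tasks}
    mean_rate = sum(rates.values()) / len(rates)
    return [tid for tid, r in rates.items() if r < slow_factor * mean_rate]

# Three tasks running for the same wall-clock time; t3 has made
# far less progress, so it is flagged as a straggler.
tasks = [("t1", 0.9, 100), ("t2", 0.8, 100), ("t3", 0.2, 100)]
print(find_stragglers(tasks))  # prints ['t3']
```

Note that the accuracy of `progress` itself is the crux of the problem studied in this work: if the reported progress is wrong, healthy tasks may be duplicated (wasting resources) or real stragglers may go undetected.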