Computational protocol: Estimation Accuracy on Execution Time of Run-Time Tasks in a Heterogeneous Distributed Environment

Protocol publication

[…] In Hadoop MapReduce, speculation is implemented across several classes. The relationship between these classes is shown in the corresponding figure of the original publication. TaskRuntimeEstimator is an interface. StartEndTimesBase stores the lifetime of each completed task, which can be used to estimate the running time of a new task; its function estimatedNewAttemptRuntime calculates the finishing time of a backup task. The running time of a task that is still executing is estimated in the class LegacyTaskRuntimeEstimator, whose function estimatedRuntime calculates the finishing time of the current task. The updateAttempt function in both classes updates the task status every time the progress is updated. When a heartbeat arrives, DefaultSpeculator starts the estimation process to decide whether a backup task needs to be created. Its function addSpeculativeAttempt adds a backup task to the task pool, and its function statusUpdate updates the task status and calls the functions that calculate the finishing time.

In Algorithm 1, a data structure called HisPro, containing the pair (progress, timestamp), is used to store real-time information. When a task is about to complete, the collected dataset is generated and written to HDFS. α in Algorithm 1 is the progress threshold that determines whether the dataset should be written to HDFS. It is an empirical value, set to 0.95 in this paper to ensure that the dataset is stored before the task finishes. The time complexity is the same as that of the original procedure, because the data collecting method does not change the original logic. The space complexity is O(m * n), where m is the number of tasks and n is the amount of data stored in each list. The corresponding figures of the original publication show examples of the data collected when WordCount and Sort are executed, respectively. They show that progress and elapsed time follow the same trend during the execution of both WordCount and Sort.
Algorithm 1: Data Collecting Method (GetDataSet)

Input:
  TA: The task attempt
  TN: The name of the task attempt
  P: The progress of a running task
  TT: The type of the running task
  DM: The data map storing all the data lists of currently running tasks, whose key is TN and value is DL
  DL: The data list storing HisPro generated by a running task

Steps:
For each TA in the task pool
  If Current TA is Running
    Get the current HisPro
    Get the DL from DM according to TN
    If DL does not contain HisPro
      Add HisPro to the DL
    Else
      Update the DL using HisPro
    EndIf
    Update the DL in DM
  EndIf
  If P > α
    savetoHDFS (TA, DL, TT)
  EndIf
EndFor

[…]
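
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Hadoop implementation: the TaskAttempt class, the get_data_set function and the save_to_hdfs callback are hypothetical names introduced here, and no real HDFS call is made. The sketch also assumes that "DL contains HisPro" means the last entry of the list has the same progress value, in which case only its timestamp is refreshed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# HisPro: one (progress, timestamp) sample, as described in the text.
HisPro = Tuple[float, float]

ALPHA = 0.95  # empirical threshold from the text


@dataclass
class TaskAttempt:
    """Hypothetical stand-in for a Hadoop task attempt (TA)."""
    name: str         # TN
    task_type: str    # TT
    progress: float   # P, in [0, 1]
    running: bool
    timestamp: float  # time of the latest progress report


def get_data_set(
    task_pool: List[TaskAttempt],
    data_map: Dict[str, List[HisPro]],  # DM: TN -> DL
    save_to_hdfs: Callable[[TaskAttempt, List[HisPro], str], None],
    alpha: float = ALPHA,
) -> None:
    """One pass over the task pool, following the steps of Algorithm 1."""
    for ta in task_pool:
        if ta.running:
            his_pro = (ta.progress, ta.timestamp)
            dl = data_map.get(ta.name, [])
            # Assumption: an entry with the same progress counts as
            # "contained", so it is updated instead of duplicated.
            if dl and dl[-1][0] == his_pro[0]:
                dl[-1] = his_pro
            else:
                dl.append(his_pro)
            data_map[ta.name] = dl
        if ta.progress > alpha:
            # Placeholder for the write to HDFS (hypothetical callback).
            save_to_hdfs(ta, data_map.get(ta.name, []), ta.task_type)
```

Passing the storage step in as a callback keeps the sketch runnable without a cluster. As in the pseudocode, the save step fires on every pass in which P exceeds α, so a real implementation would presumably need to make the write idempotent or mark the attempt as already saved.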

Pipeline specifications

Software tools: MapReduce, EMBOSS
Application: Miscellaneous