Rashmi Sharma,Nitin Nitin and Deepak Dahiya
1University of Petroleum&Energy Studies(UPES),School of Computer Science,Energy Acres Building,Bidholi,Dehradun,248007,Uttarakhand,India
2Department of Electrical Engineering and Computer Science,College of Engineering and Applied Science,University of Cincinnati,2600 Clifton Ave,Cincinnati,OH 45221,United States
3College of Computer and Information Sciences(CCIS),Majmaah University,Majmaah,11952,Kingdom of Saudi Arabia
Abstract: Failure is a systemic error that affects overall system performance and may eventually crash across the entire configuration.In Real-Time Systems (RTS), deadline is the key to successful completion of the program.If tasks effectively meet the deadline,it means the system is working in pristine order.However,missing the deadline means a systemic fault due to which the system can crash (hard RTS)or degrade inclusive performance (soft RTS).To fine-tune the RTS, tolerance is the critical issue and must be handled with extreme care.This article explains the context of fault tolerance with improvised Joint EDF-RMS algorithm in RTS.The backup method has been derived to prevent the system from being recursively migrating the same task.If any task migrates three times, this migrated task will get shifted to the backup queue.This backup queue assigns the task to a backup processor and is destined for final execution.For performance evaluation purposes,a relative graph between fault and failure rates, failure and total processor utilization along with other averages have been evaluated.Furthermore,these archived results are compared with fault-tolerant Earliest Deadline First(EDF)and Rate Monotonic Scheduling (RMS)algorithms independently in relatively similar conditions.These comparisons show better performance against overloading conditions.
Keywords:Fault tolerance;joint edf-rms algorithm;real-time systems(RTS);distributed systems;migration
RTS is a system where task execution is not dependent only on correct logic,but on-time execution(deadline)is equally important[1-3].Based on deadline RTS is of two types:Hard and Soft[1-3].In soft RTS,missing deadlines degrade the performance of the system,but hard RTS give catastrophic result along.Video game is a good example of soft RTS where the target timing of the game is scheduled.If a player is unable to achieve the mentioned time then only the player performance affect.Whereas,in hard RTS,if an aircraft is unable to execute some operations on time,then catastrophic results like loss of life can happen.The above-mentioned catastrophe or performance deprivation may occur due to the occurrence of a fault in the system and the mechanism used to handle such problems called fault tolerance[4].
In RTS, priority-based scheduling algorithms employee to schedule the tasks, with criteria to decide the priority of tasks.If a task is unable to match the criteria,then that task is non-schedulable.In RTS,meeting the deadline means successful task execution and missing a deadline means failure of a task.The occurrence of a fault in scheduling algorithms is one of the reasons for missing a deadline.Many researchers have discussed the fault tolerance techniques in EDF and RMS [5-7].Here, fault tolerance technique is embed in joint EDF-RMS real-time scheduling algorithm [8] to reduce the permanent fault along with intermittent.The organization of remaining paper is as follows: Next section will explain the fault tolerance mechanisms in Real-Time and Distributed Systems (both),along with the scheduling algorithms of RTS.Further Section 3 explains the fault tolerance in the Joint EDF-RMS algorithm.Simulation studies and results explain in Section 4 and 5.Conclusion and future work will depict at the end.
In today’s era,features of RTS have been merged with many technologies i.e.,cloud computing,distributed systems, mixed-criticality systems, big data and many more.In all stated systems, fault tolerance is the main concern of all algorithms.In [9], hybrid of traditional fault tolerance methods discuss for mobile distributed systems.Dependently fast library uses in[10]to implement the interface to prevent unexpected fail-slow tolerant distributed systems.A scheduling algorithm for cloud computing is proposed in[11]to minimize the response time of tasks present in backup after multiple failures of tasks.The stopwatch automata network model use to make real-time computer system(RTCS)fault tolerant[12].Additionally,to provide reliability against transient faults high power-based backup processor use to execute selective backup tasks[12].In[13]author has proposed two scheduling algorithms FEED-O and FEED-OD to handle the overloading of backup tasks in overlapping time intervals.In[14],Fault-tolerant multi-core mixed criticality systems use to cope with high power and temperature for dependent dual-criticality tasks.In[15],with server CAN(one of the types of storage),a fault tolerance mechanism has been used.In[16]also the fault-tolerance concept implements in big data.Based on aforementioned work,it is clear that fault tolerance is not important in RTS only,but in every domain[9-17].
The focus of this paper is on fault tolerance scheduling algorithms in Real-Time Distributed systems(RTDS).RTDS is a distributed system with temporal and dynamic behaviour,as it changes with time.Temporal activity/behaviour includes arrival time,waiting time,execution and completion time of a task.The completion time of any task affects the performance of dependent and enqueuer tasks.Therefore,timely execution of tasks is the uppermost priority of any RTS scheduling algorithms.Presence of fault in a system may affect its performance by continuous missing of tasks deadline.The system failure can happen due to many improperly handled faults,like preemption of lower priority tasks or lack of resources.This missing deadline is bearable in soft RTS,but not in hard[18,19].The further section will explain the detailed information about failure, faults and their relationship with each other.
When we talk about fault tolerance, the scenario comes to our mind that there is a situation where an irregular, unusual, erroneous function is being played.It degrades the performance or sometimes fails the entire system.Now tolerance channels our system in such a way that these faults are bypassed,thus,maintaining the near normal functioning of the system.Sometimes multiple modules in a redundant fashion,mirror images of the existing system and some kinds of remedial measures are proposed inline.Tolerance methods are necessary for the successful running of any system because critical faults strike without warning.Following (Tab.1)are few examples where tolerance methods for faults present:
Table 1: Examples of fault tolerance
Real-Time tasks schedule on distributed systems where faults of distributed systems affect respective real-time tasks or vice versa.Following Tab.2 summaries, some common types of faults in both systems[20,21]and Fig.1 displaying the visualizations of respective fault types:
Table 2: Types of faults
Figure 1:Visualization of(a)Transient fault(b)Intermittent fault and(c)Permanent fault
The fault is an unintended behaviour of a system that either reduces the performance or fails the system.To protect the system, fault tolerance strategies are required.However, fault-masking and system reconfiguration are a few techniques for fault tolerance but commonly, redundancy in hardware, software, information or time is used [22-24].In following Tab.3 detailed discussions on redundancy techniques have explained:
Table 3: Types of redundancy techniques
Further section explains fault tolerance scheduling algorithms.
While designing any scheduling algorithms for RTS,following criteria should be considered:
? Each task should execute once in a system i.e.,parallelism between the same task should not be there
? All tasks should meet the deadline
? Number of processors failure should be tolerated i.e., failure of one processor should not affect the progress of the entire system(hence,always schedule a backup of tasks on separate processors)
? Preemption of tasks helps to improve the efficiency of the system.
Following are scheduling algorithms employed for the fault tolerance and avoidance as well[2,25-28]:
? First come first serve(FCFS)
? Shortest Job First(SJF)
? Preemptive
? Non-preemptive
? Round Robin Techniques
For all scheduling algorithms following terminologies are common:
Input:Arrival of Periodic taskstiwith arrival timeati,deadlinedti,periodpti,computation timecti,task utilizationuti,CPU utilizationUPi,a priority of respective taskprtiand time quantumTQ.
Output:Task meetingmeettiand missingmisstiof respective tasks
Table 4: Summary of real-time scheduling algorithms for fault tolerance techniques
Table 4:Continued
The above-mentioned Tab.4 is the summary of a few scheduling algorithms to tolerate the occurrence of faults.Most algorithms use for intermittent faults because missing deadlines are difficult to predict Hence,to endure such faults,many time-based scheduling algorithms exist that come under software and time redundancy technology.Out of the above-mentioned categories, this paper deals with preemptive scheduling algorithm class where EDF and RMS both algorithms are used for fault tolerance purposes.The next section will describe the working of the joint EDF-RMS algorithm in a faulty environment.
In RTS,failure of single task may affect the system performance and destroy it as well(catastrophe result).Hence, fault management is required in RTS.Many researchers actively participate and implement/simulate the fault-tolerant system with the help of some scheduling algorithms.Therefore,the performance of Joint EDF-RMS algorithm improvises by embedding the feature of fault tolerance in it.Following Fig.2 is the pictorial representation of mentioned algorithm with fault tolerance management.
There are four blocks A,B,C and D.Block‘A’represents a global queue where tasks will assign to randomly selected processors.According to RMS,the task having highest priority will assign first to a chosen processor of block‘B’and similar trend will follow for remaining tasks.To avoid overloading in a queue, RMS criteria has been used i.e.,wherenrepresents the total number of tasks.Further,a selected processor of the‘B’block use EDF algorithm to schedule the signed tasks (0.81 is the upper bound of CPU utilization [8,29]).To handle overloaded (victim)tasks,blocks‘C’and‘D’are available.Victim task is a task that migrates minimum of three times and remains unexecuted(may become responsible for missing the deadline of dependent tasks).For such unexecuted tasks,a backup queue is there that schedule victim tasks to the selected processor.Here,RMS algorithm use for the scheduling of victim tasks on assigned processors.
Figure 2:Working of joint EDF-RMS algorithm with fault tolerance technique
Following is the algorithm for the proposed work:
Fault-Tolerant Joint EDF-RMS Algorithm
Algorithm_3.3:Joint EDF-RMS Algorithm with Fault-Tolerant Technique Input:Arrival of tasks with τarrival,τwcet,τperiod,τdline Output:Number of tasks successfully meet and unable to achieve the deadline with other parameters BEGIN GlobalScheduler()//Global task Queue 1.Arrival of tasks ti // Periodic arrival of tasks with arrival time, wcet, assigned deadline and period 2. tasku=wcet/Period;3. UB=n*(Math.pow(2,1.0/n)-1);4.IF tasku <=UB 5.Generated task is schedulable 6. pselection(task)7.Else 8.The task is non-schedulable pselection(task)//Random Selection of Processors 1.Random Selection of Processor 2. PQueue(task);PQueue(task)//Processors local queue 1.Assign priorities to tasks on the basis of deadline 2.mig=0//migration counter initially 0 3. taskpriority ∝1 taskdeadline 4. TaskExecution(task,mig)TaskExecution(task,mig)//Task Execution by using EDF Scheduler 1.If tasku <=1 2. U =U+tasku//Cumulative accumulation of task utilization 3.If(U <=.810)//Processor utilization upper bound 4.The task is ready for execution 5.Else 6. Task Migration(task,tasku,mig)//now mig=1 Taskmigration(task,tasku,mig)//Task Migration on the basis of a processor utilization factor 1.Sorting of all processor utilization 2.Give task to less utilized processor 3. PQueue(task,mig);//mig take care number of times given task migrate 4.If mig==3 5.Backup-Global scheduler(task,tasku,mig)6.Else 7.Taskmigration(task,tasku,mig)//now mig=2(Continued)
Algorithm_3.3:Continued Backup-Global scheduler(task,tasku,mig)//Global scheduler for tasks 1.Random Selection of Back-upProcessor(3 processors)2. BkupPQueue(task);BkupPQueue(task)//Back-up Processors local queue 1.RMS algorithm is used here.2. taskpriority ∝1 taskperiod 3. BkupTaskExecution(task)BkupTaskExecution(task)//RMS is in action 1.If tasku <=n(21n +1)2. U =U+tasku//cumulative totaling of task utilization 3.If(U <=n(21n +1))&&(U <=1)//Acceptance test for victim task execution 4.The victim task is acceptable for execution 5.Else 6. Task Migration(task,tasku)END
? Experimental Set-up
For simulation purposes,the Eclipse Oxygen.3a version use for the java programming language.The concept of thread uses here to create,wait and execution of tasks.Maximum 2100 and minimum 300 tasks are used in distributed systems of six (3 for backup)processors.The implementation of all three algorithms EDF, RMS and joint EDF-RMS done on same tasks in synchrony to provide the same simulation parametrization.Following parameters use to evaluate the performance of the above-mentioned algorithms:
1.Fault rate:In RTS,if the task has less chance to meet the deadline as per utilization calculations andif(task.utilization <cpu.utilization)||(∑.utilization >CPU.utilization)then the scheduling algorithm allow it for execution,and that task can be a reason for the failure of many remaining tasks.Based on schedulability test,victim tasks identify here.Following equation is used to calculate the fault rate:
2.Failure rate:Tasks fail to meet the deadline due to fault done by scheduling algorithms by allowing tasks to execute.Following equation calculate failure rate:
3.Processor utilization threshold:The reason behind task migration is utilization of CPU.If any task crosses the threshold limit of CPU utilization then,that task will migrate.Hence,migration point in RMS isn(21/n-1),in EDF 1or100%and in joint EDF-RMS it is 0.81or81%.
Before the discussion on the results, the following lemma explains the dependency of task execution on the aforementioned parameters:
?Lemma 5.1Failure of task is dependent on victim tasks,migration and CPU utilization.
Proof
Considering the arrival ofNtasks in the systemSwhereτ1,τ2,τ3,..........,τk∈Nand their CPU utilizations are:respectively.Every time a new task arrives and the system computes its utilization whether it is less than or greater than CPU utilization i.e.,1(100%)or not.
Table 5: Demonstration without migration
Now,here the arrival of tasksτ4and τ5,exceeds the CPU utilization,as it becomes more than 1.Here, (Tab.5)τ4and τ5tasks are victim tasks that can create a fault in the system by overloading it.Overloading means resources of CPU engage (critical)in the execution of previous tasks(τ1,τ2,τ3)such that the system may start missing deadlines.Hence,victim tasks have either to wait for their turn or migrate on other processors.Hence,the following two conditions arise in the system:
I.The system is occupied in the execution of tasksτ1,τ2,τ3.Due to which remaining tasks have to wait for their execution,this increases by waiting time(Δt)and thus affect approaching of deadline.Asa1,a2,a3,a4,a5,a6,a7are the arrival time of tasksτ1,τ2,τ,τ4,τ5,τ6,τ7respectively with deadlines?1,?2,?3,?4,?5,?6,?7and execution timew1,w2,w3,w4,w5,w6,w7.Δtis the amount of waiting time that affects the assigned deadline.Following will be the scenario of scheduled victim tasks or tasks arrive after victim tasks:
Initially tasks fulfill the conditiona4< ?4, allowed to enter in semaphore, due to resource unavailability taskτ4has to wait for Δtamount of time i.e.,a4+Δt <?4.This increment of waiting time increases the tasks in a waiting list i.e.,τ5,τ6.........τk.a5+(α*Δt)<?5,a6+(2α*Δt) ≤?6,a7+(mα*Δt)=?7and it goes on till any of the tasks meet the deadline/migrate from one processor to another in a system.In this way,waiting time affects deadlines and tasks start missing deadlines.
II.In the above case, Tab.5 migration was not in the picture.Here, Tab.6 if more than one processor is there in the systemS, then the victim task can migrate towards another processor and can get the required resources of execution.
Suppose three processors are there in a systemSand each processor is engaged in the execution of tasks.Following are the tasks involved in each processor:
Table 6: Demonstration with migration
Hence,we can say that CPU utilization,victim tasks and migration of tasks affect the successful execution of upcoming tasks.
EDF, RMS and Joint EDF-RMS algorithms implement and test up to 2100 tasks.These tasks divide into the transaction of 300,600,900,1200,1500,1800 and 2100 tasks.As fault tolerance is the main point of concern here,the relation of fault with task failure,CPU utilization and migration of victim tasks are under consideration.
-Fault rate vs.Failure rate
Fig.3 demonstrates the number of victim tasks out of total tasks in a given transaction.In EDF,tasks exceed the CPU utilization of more than 1,in RMSn(21/n-1)and Joint EDF-RMS 0.81.Fig.4,explains the failure rate(tasks unable to execute on time)of tasks in given algorithms.Fig.5,represents the average fault rate and failure rate.In EDF on an average 22 tasks fails due to 3 victim tasks,similarly, 22 tasks fail due to 2 victim tasks in RMS but 18 tasks miss the deadline due to 3 victim tasks.Hence,the Joint EDF-RMS algorithm performance is better as compared to the remaining two algorithms.
Figure 3:Arrived tasks vs.fault rate calculations based on EDF,RMS and Joint EDF-RMS algorithm
Figure 4:Arrived tasks vs.failure rate of tasks calculations based on EDF,RMS and Joint EDF-RMS algorithm
-Processor utilization vs.failure rate
Up to three processors use in this simulation.The Fig.6, demonstrates the effect of total CPU utilization on the failure of tasks.As the participation of processors in the system increases, the total CPU utilization escalates accordingly.The behaviour of system while using the EDF algorithm illustrates in the Fig.6.When it increases the limit by more than 3(U≤1),tasks miss the deadline due to victim tasks.
Figure 5:Average Impact of fault rate on failure rate in EDF,RMS and Joint EDF-RMS algorithm
Figure 6:CPU utilization vs.failure rate based on EDF scheduling algorithm
Fig.7 explains the total CPU utilization effect on failure task when the RMS algorithm is in use.In Joint EDF-RMS,RMS algorithm uses to handle arrival of tasks in global queue and EDF algorithm use with threshold limit 0.81 in every processor.Hence,in Fig.8 the failure ratio is less as compared to others due to the threshold limit of EDF algorithms and usage of RMS for priority assignment.
Figure 7:CPU utilization vs.failure rate based on RMS algorithm
CPU utilization is the main parameter due to which victim tasks can recognize and migrate to other processors, that helps in failure tasks reduction.Fig.9 are showing the complete result of all three scheduling algorithms.On average for every 1200 tasks the failure percentage of Joint EDFRMS is less i.e.,17%as compared to the remaining two scheduling algorithms.Similarly,due to the application of migration technique on 0.81%threshold,the victim tasks percentage is also very less i.e.,only 2%in Joint EDF-RMS algorithm,whereas in EDF and RMS victim tasks percentage is greater or equal i.e.,4%and 2%,respectively.
Figure 8:CPU utilization vs.failure rate based on Joint EDF-RMS algorithm
Figure 9:Task’s status in all three scheduling algorithms based on average number of tasks
The fault tolerance mechanisms in RTS with the help of migration techniques in distributed systems successfully simulated.Basic scheduling algorithms (EDF and RMS)select to evaluate the performance of hybrid scheduling algorithm(Joint EDF-RMS)with fault tolerance mechanism.The vital role of fault tolerance algorithms in various applications discuss in previous sections.Here,fault tolerance implant in new hybrid algorithm(Joint EDF-RMS)and simulate in Real Time Distributed Systems.How overloaded(victim)task affect the execution of upcoming or enqueuer tasks have been explained with the help of lemma(with or without migration)and simulation.Overall,the behaviour of EDF, RMS and Joint EDF-RMS algorithm evaluates in faulty environment of RTDS.The Joint EDF-RMS algorithm handle faults efficiently due to the presence of both EDF and RMS algorithms that overcome limitations of each other.Additionally,usage of RMS algorithm in Backup processors improve the performance because this algorithm is good to handle overloading conditions due to its schedulability criteria.
This fault tolerance concept will implement in information-theoretic entropy based work[30-32].The simulation of information theoretic entropy based EDF scheduling algorithm is already done and in future,its behaviour in faulty environment will check by using fault tolerance mechanism use in this paper.
Funding Statement:Deepak Dahiya would like to thank Deanship of Scientific Research at Majmaah University for supporting this work under Project No.R-2022-56.
Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
Computers Materials&Continua2022年9期