A Fault Tolerance Distributed Real-Time System. Design and Implementation

Doctoral Thesis / Dissertation, 2013

122 Pages, Grade: 90


List of Contents


List of Figures

List of Tables

List of Abbreviations

Chapter One
1.1 Overview
1.2 Related Works
1.3 Thesis Statement
1.4 Contributions
1.5 Outline of the Thesis

Chapter+ Two
2.1 Introduction
2.2 Distributed Systems Definition
2.3 Characterization of Distributed Systems
2.4 Software Concepts
2.4.1 Distributed Operating System Uniprocessor Operating Systems Multiprocessor Operating Systems Multicomputer Operating Systems
2.4.2 Network Operating Systems
2.5 Middleware
2.5 Positioning Middleware
2.5 Categories of Middleware
2.5.2 Distributed Tuples
2.5.2 Remote Procedure Call
2.5.2 Message-Oriented Middleware Distributed Object Middleware
2.6 Networking and Internetworking
2.6.1 Types of network
2.6.1 Personal area networks (PANs) Local area networks (LANs)
2.6.1 Wide area networks (WANs) Metropolitan area networks (MANs)
2.6.2 Communication Structure
2.6.3 Communication Protocols
2.6.4 Naming Names, Identifiers, and Addresses
2.6.4 Name Resolution
2.7 Migration
2.7 Data Migration
2.7.2 Computation Migration
2.7.3 Process Migration
2.8 Fault Tolerance
2.8.1 Failure, Error, and Faults
2.8 Techniques for Fault Tolerance

Chapter Three
3.1 Introduction
3.2 Basic Real-Time Concepts
3.3 Real-Time Systems
3.3.1 Real-Time System Design Issues
3.3.2 Real-time applications issues
3.4 Basic concepts for real-time task scheduling
3.4.1 Real-time task model
3.4.2 Scheduling Criteria
3.5 Scheduling definitions, algorithms and properties
3.5.1 Scheduling algorithms taxonomy
3.5.2 Scheduling properties
3.5.3 Multiprocessor Scheduling
3.6 Distributed Real-Time Systems
3.7 Scheduling Distributed Systems
3.8 The time/utility function

Chapter Four
4.1 Introduction
4.2 The proposed System Objective
4.3 Models
4.3.1 The System Model
4.3.2 Task Model
4.3.3 Timeliness Model
4.4 Scheduling Objectives of EOE-DRTSA
4.5 Scheduling Algorithm
4.5.1 EOE-CDRTSA collaborative scheduling algorithm Global scheduling with Static priorities Global scheduling with Dynamic priorities
4.5.2 EOE-PDRTSA independent scheduling algorithm Independent scheduling with Static priority algorithm Independent scheduling Dynamic priority algorithm
4.6 EOE-DRTSA Distributed RMI System design
4.6.1 EOE-DRTSA Distributed system RMI Java Design

Chapter five
5.1 Introduction
5.2 Objectives of EOE-DRTSA
5.3 Protocol Algorithm Description

Chapter Six
6.1 Introduction
6.2 Case Study (Flight reservation System)
6.3 The Objective of Flight reservation System
6.4 Features of Flight reservation System
6.5 System Architecture
6.6 The Hardware Description of the System
6.7 Software Requirements
6.8 Technologies Used
6.9 Flight reservation System Modules
6.10 User Interface
6.11 Scenarios
6.11.1 First Scenarios
6.11.2 Second Scenarios
6.11.3 Third Scenarios
6.12 Performance Parameters
6.13 Simulation

Chapter Seven
7.1 Introduction
7.2 Conclusions
7.3 Future Works



Now a day completed real-time systems are distributed. One of the working area of real-time scheduling is distributed scheduling. Task scheduling in distributed systems is dealt with two levels: on the level of each processor (local scheduling), and on the level of the allocation of tasks to processors (global scheduling).

In this thesis, a distributed real-time system with fault tolerance has been designed and called Fault Tolerance Distributed Real Time System FTDRTS. The system consists of heterogeneous processors act as servers and clients connected together via LAN communication network. This system has two types of scheduling schemes: (1) global model scheduling, (2) independent model scheduling for scheduling tasks in real time distributed manner. The time utility function TUF has been developed and called the DTUF (Developed TUF) function. This function gives another dimension and used to priorities’ tasks, based on whether they are Urgent or Important, or both, or neither.

A fault tolerance protocol called DRT-FTIP (Distributed Real Time - Fault Tolerance Integrity Protocol) has been developed. This protocol increases the integrity of the scheduling in distributed real time systems.

The proposed Distributed Real-Time system with its scheduling algorithms and integrity protocol have been designed using the Java Remote Method Invocation (RMI) and use the Flight Reservation System as a case study. The simulation results of this proposed distributed real-time system using global scheduling algorithm gives Deadline Satisfaction Ratio (DSR) equal 95%. While Accrued Utility Ratio (AUR) equal 0.7286.

List of Figures

1.1 Architecture of proposed system

2.1 A distributed system

2.2 Separating applications from operating system code

2.3 General structure of a multicomputer operating system

2.4 General structure of a network operating system.

2.5 General structure of a distributed system as middleware

2.6 ISO protocol stack

2.7 An ISO network message

3.1 Disciplines that impact on real-time systems engineering

3.2 Scheme of a real-time application

3.3 Task model

3.4 Task States

3.5 TUFs

4.1 Collaborative Scheduling System Architecture

4.2 Independent Scheduling System Architecture

4.3 Time Management Matrix

4.4 3D Time/Utility Functions (DTUF)

4.5 A basic RMI call with a stub and skeleton

4.6 RMI design steps

4.7 Stub and skeleton files

4.7 RMI registry architecture diagram

5.1 DRT-FTIP Integrity Protocol: Normal state behavior

5.2 DRT-FTIP Integrity Protocol: Anomaly state behavior

6.1 System Architecture

6.2 flight reservation system main form

6.3 Search Flight Window

6.4 Execution time for Global scheduling

6.5 Execution time for independent scheduling

6.6 Utilization of algorithm scheduling (independent ps,

List of Tables

2.1 Different forms of transparency in a distributed system

2.2 An overview between DOS (Distributed Operating Systems),NOS (Network Operating Systems), and middleware

6.1 Threads set example

List of Abbreviations

Abbildung in dieser Leseprobe nicht enthalten

Chapter One Introduction

1.1 Overview:

In a general-purpose computing system, the primary performance requirement is the correct functionality. For instance, if a user wishes to obtain a list of the first thousand prime numbers, a system meets a functionality requirement if it correctly outputs such a list to the user1.

In a real-time system, which can be thought of as a special case of generalpurpose computing systems, timeliness, along with correct functionality, is also a requisite. Consider, for example, a train that is attempting to make an emergency stop within 5 seconds. The fact that the train does stop demonstrates that it performs its task correctly. However, the completion of this task is not useful, to say the least, if the train does not stop on time (i.e., within 5 seconds). A train that does is said to meet the timeliness requirement1.

While A real-time application is typically composed of a number of cooperating activities, each contributing towards the overall goals of the application. The physical system being controlled dictates that these activities must perform computations within specific time intervals. For instance, safety considerations may dictate that an activity must respond to an alarm condition within several milliseconds of the receipt of the alarm signal2. In the physical world, the purpose of a real-time system is to have a physical effect within a chosen time-frame. Typically, a real-time system consists of a controlling system (computer) and a controlled system (environment). The controlling system interacts with its environment based on information available about the environment. On a real-time computer, which controls a device or process, sensors will provide readings at periodic intervals and the computer must respond by sending signals to actuators. There may be unexpected or irregular events and these must also receive a response. In all cases, there will be a time bound within which the response should be delivered. The ability of the computer to meet these demands depends on its capacity to perform the necessary computations in the given time. If a number of events occur close together, the computer will need to schedule the computations so that each response is provided within the required time bounds. It may be that, even so, the system is unable to meet all the possible unexpected demands. In the case that the system lacks sufficient resources; a system with unlimited resources and capable of processing at infinite speed could satisfy any such timing constraint. Failure to meet the timing constraint for a response can have different consequences; there may be no effect at all, or the effects may be minor or correctable, or the results may be catastrophic. Each task occurring in a real-time system has some timing properties. These timing properties should be considered when scheduling tasks on a real-time system3.

Real-time applications usually contain more activities that must be executed than there are processors on which to execute them. Consequently, several activities must share a single processor, and the question of how to schedule the activities for any specific processor, deciding which activity should run next on the processor must be answered. Necessarily, a prime concern in making scheduling decisions in real-time systems is satisfying the timing constraints placed on each individual activity, thereby satisfying the timing constraints placed on the entire application. Unfortunately, the activities to be scheduled are not independent.

Rather, they share data and devices, observe concurrency constraints on code execution, and send signals to one another. All of these interactions can be modeled as contention for shared resources that may only be used by one activity at a time. An activity that is waiting for access to a resource currently held by another activity is said to depend on that activity, and a dependency relationship is said to exist between them. Dependency relationships may encompass both precedence constraints, which express acceptable execution orderings of activities, and resource conflicts, which result from multiple concurrent requests for shared resources2.

Distributed real-time systems (DRTS)are widely employed in many cyberphysical control applications such as vehicle control application and multimedia communication application. Such systems typically require that a series of jobs be executed on a chain of processors and be completed within some end-to-end deadlines. This sequence of jobs is defined as a transaction and is periodically released. Resource competition among jobs from different transactions on a shared processor could severely increase job response times, potentially resulting in endto-end deadline misses. Therefore, it is important to assign priorities to jobs or transactions on each processor in order to guarantee the timing requirements of the transactions in a distributed real-time system4.

Scheduling of real-time multiprocessor systems has received increased attention recently. However, in most of these works target underloaded systems, the total application task utilization demand, U is always being less than the total available processing capacity of the system, which is m for an m-processor system. Majority of the researchs focus in multiprocessor real-time systems is on developing scheduling algorithms and understanding their schedulability utilization bounds i.e., task utilization bounds below which all task deadlines are met. The premise of this approach is that it is possible to determine the worst-case execution-time behaviors of applications (e.g., task arrival behaviors, task worstcase execution times) and thereby determine the total task utilization demands. Once the task utilization demand is known, task schedulability, the ability to meet all task deadlines can be ensured off-line by selecting the appropriate scheduling algorithm with a higher utilization bound5.

For some applications , it is difficult to determine worst-case execution-time behaviors a priori, as they are subject to run-time exigencies, such as execution time overruns and unpredictable thread arrival patterns, causing transient and permanent overloads. When overloads occur, often, such applications desire graceful timeliness degradation,e.g., meeting as many deadlines of high importance tasks as possible, irrespective of task urgency5.

When resource overloads occur, meeting deadlines of all application activities is impossible as the demand exceeds the supply. The urgency of an activity is typically orthogonal to the relative importance of the activity-e.g., the most urgent activity can be the least important, and vice versa; the most urgent can be the most important, and vice versa. Hence when overloads occur, completing the most important activities irrespective of activity urgency is often desirable. Thus, a clear distinction has to be made between the urgency and the importance of activities, during overloads6.

Past efforts on distributed scheduling can be broadly categorized into two classes: independent node scheduling and collaborative scheduling. In the independent scheduling approach, threads are scheduled at nodes using propagated thread scheduling parameters without any interaction with other nodes. Thread faults are managed by integrity protocols that run concurrent to thread execution.

Integrity protocols employ failure detectors (or FDs), and use them to detect thread failures. In the collaborative scheduling approach, nodes explicitly cooperate to construct system-wide thread schedules, detecting node failures using FDs while doing so7. There are a number of papers focused on distributed real-time systems , and others addressed distributed scheduling algorithms. Some of them are introduced in this chapter.

1.2 Related Works:

In year 1990 Raymond Keith Clark provided an algorithm, called DASA, that is effective for scheduling the class of real-time systems known as supervisory control systems. Simulation experiments that account for the time required to make scheduling decisions demonstrate that DASA provides equivalent or superior performance to other scheduling algorithms of interest under a wide range of conditions for parameterized, synthetic workloads. DASA performs particularly well during overloads, when it is impossible to complete all of the activities. This research makes a number of contributions to the field of computer science, including: a formal model for analyzing scheduling algorithms, the DASA scheduling algorithm, which integrates resource management with standard scheduling functions; results that demonstrate the efficacy of DASA in a variety of situations; and a simulator. In addition, this work may improve the current practices employed in designing and constructing supervisory control systems by encouraging the use of modern software engineering methodologies and reducing the amount of tuning that is required to produce systems that meet their real-time constraints , while providing improved scheduling, graceful degradation, and more freedom in modifying the system over time2.

In year 2004 Dhuha Basheer and Basil Shukr provided an Operating system, called RTDM (Real-time Deadline Multitask) which is a real-time operating system kernel for uniprocessor multitask system provides guaranteed response times to tasks. It is responsible for scheduling, managing, and executing periodic and sporadic real-time tasks. That real-time software is able to guarantee that all real-time tasks active on that kernel will meet their stated deadline under the operating system Windows8.

In year 2004 Haisang Wu, Binoy Ravindran, and E. Douglas Jenseny Peng Liz considered Real-Time CORBA 1.2's distributable threads (DTs), whose time constraints are specified using time/utility functions (TUFs), operating in legacy environments. In legacy environments, system node resources, both physical and logical are shared among time critical DTs and local applications that may also be time-critical. Hence, DTs that are scheduled using their propagated TUFs, as mandated by Real-Time CORBA 1.2's Case 2 approach, may suffer from performance degradation, if a node utility accrual (UA) scheduler achieves higher locally accrued utility by giving higher eligibility to local threads than to DTs. They presented decomposition techniques called UT, SCEQF, SCALL, OPTCON, and TUFS,which are specific to different classes of UA algorithms, such as those that use utility density, and those that use deadline as their key decision metric9.

In year 2004 Haisang Wu, Binoy Ravindran, and E. Douglas Jensen extended Jensen’s time/utility functions and utility accrual model with the concept of joint utility functions (or JUFs) that allow an activity’s utility to be described as a function of the completion times of other activities and their progress. They also specified the concept of progressive utility that generalizes the previously studied imprecise computational model, by describing an activity’s utility as a function of its progress. Given such an extended utility accrual model, they considered the scheduling criterion of maximizing the weighted sum of completion time, progressive, and joint utilities. They presented an algorithm called the Combined Utility Accrual algorithm (or CUA) for this criterion. Experimental measurements with an implementation of CUA on a POSIX RTOS illustrate the effectiveness of JUFs in a class of applications of interest to them10.

In year 2005 Dhuha Basheer and Basil Shukr provided evaluate time and space measurement tools for RTDM (Real-time Deadline Multitask) kernel. That measurement depends upon two kernel-level tools named the interrupt timing analysis tool and scheduler timing analysis tool11.

In year 2006 Binoy Ravindran, Edward Curley, Jonathan Anderson, and E. Douglas Jensenz considered the problem of recovering from failures of distributable threads in distributed real-time systems that operate under run-time uncertainties including those on thread execution times, thread arrivals, and node failure occurrences. They presented a scheduling algorithm called HUA and a thread integrity protocol called TPR12.

In year 2006 Edward Curley ,Binoy Ravindran, Jonathan Anderson, and E. Douglas Jensen considered the problem of recovering from failures of distributable threads in dynamic systems with overloads and (permanent/transient) network failures13.

In year 2006 Wenming Li, Krishna Kavi , and Robert Akl proposed a new algorithm that uses multiple scheduling strategies for eƥcient non-preemptive scheduling of tasks. Their goal was to improve the success ratio of the well-known Earliest Deadline First (EDF) approach when the load on the system is very high and to improve the overall performance in both underloaded and overloaded conditions. Their approach, known as group-EDF (gEDF) is based on dynamic grouping of tasks with deadlines that are very close to each other, and on using Shortest Job First (SJF) technique to schedule tasks within the group14.

In 2006 Shouwen Lai, Binoy Ravindran, Hyeonjoong Choy considered minimizing the system-level energy consumption through dynamic voltage scaling for embedded devices, while a) allowing concurrent access to shared objects through lock-free synchronization b) meeting (m; k)-constraint, and c) completing as many high importance tasks as possible. They presented a scheduling algorithm called Lock-Free Utility accrual Algorithm (or MK-LfUA) to meet these goals. At offline stage, they statistically determined task execution time, and set the optimal CPU speed that will minimize system-level energy consumption. At run-time, the algorithm dynamically adjusts the CPU speed to compensate for slack time, while taking into account the speed transition overhead15.

In 2007 Sherif Fahmy, Binoy Ravindran, and E. DOUGLAS. Jensen considered scheduling distributable real-time threads with dependencies in partially synchronous systems in the presence of node failure. They present a collaborative distributed real-time scheduling algorithm called DQBUA. The algorithm uses quorum systems to coordinate nodes’ activities when constructing a global schedule. DBQUA detects and resolves distributed deadlock in a timely manner and allows threads to access resources in order of their potential utility to the system. Their main contribution was handling resource dependencies using a distributed scheduling algorithm16.

In 2007 Jonathan S. Anderson, Binoy Ravindran, and E. Douglas Jensen, demonstrated a consensus utility accrual scheduling algorithm for distributable threads with run-time uncertainties in execution time, arrival models, and node crash failures algorithm called DUA-CLA algorithm. The DUA-CLA algorithm represents a unique approach to distributable thread scheduling in two respects. First, it unifies scheduling with a fault-tolerance strategy. Second, DUA-CLA takes a collaborative approach to the scheduling problem, rather than requiring nodes independently to schedule tasks without knowledge of other nodes' states. Global scheduling approaches where in a single, centralized scheduler makes all scheduling decisions have been proposed and implemented. DUA-CLA takes a via media, improving independent node scheduler decision making with partial knowledge of global system state17.

In year 2007 Kai Han, Binoy Ravindran, and E. DOUGLAS. Jensenz considered scheduling real-time distributable threads in the presence of node/link failures and message losses in large-scale network systems. they presented distributed scheduling algorithm called Real-Time Gossip with Low Message Overhead (or RTG-L), which provides assurances on thread time constraint satisfactions in large-scale unreliable networks18.

In an attempt to develop scheduling algorithm for distributed real time system Sherif F. Fahmy, Binoy Ravindran (2008), and E. DOUGLAS. Jensen studeied the distributable threads abstraction for programming and scheduling such systems, and presented a collaborative thread scheduling algorithm called the Quorum-Based Utility Accrual scheduling (or QBUA). they showed that QBUA satisfies (end-to-end) thread time constraints in the presence of crash failures and message losses, has efficient message and time complexities, and lower overhead and superior timeliness properties than past thread scheduling algorithms19.

In year 2008 Sherif F. Fahmy, Binoy Ravindran, E. DOUGLAS. Jensen considered the problem of scheduling distributable realtime threads under runtime uncertainties including those on thread execution times, thread arrivals, node failures, and message losses. They presented a distributed scheduling algorithm called ACUA that is designed under a partially synchronous model, allowing for probabilistically-described message delays20.

Hyeonjoong Cho, Binoy Ravindran , and E. DOUGLASouglas Jensen (2009) presented the first Utility Accrual (or UA) real-time scheduling algorithm for multiprocessors, called global Multiprocessor Utility Accrual scheduling algorithm (or gMUA). The algorithm involved an application model where realtime activities are subject to time/utility function time constraints, variable execution time demands, and resource overloads where the total activity utilization demand exceeds the total capacity of all processors21.

In year 2010 Piyush Garyali, Matthew Dellinger, and Binoy Ravindran investigated the problem of scheduling dependent real-time tasks for overloads on a multiprocessor system, yielding best-effort timing assurance. They developed a class of polynomial-time heuristic algorithms, called the Global Utility Accrual (GUA) class of algorithms and they developed a Linux-based real-time kernel called ChronOS3.

Sherif F. Fahmy (2010) studied the problem of scheduling and synchronization of distributable real-time threads | Real-Time CORBA's first-class abstraction for programming real-time, multi-node sequential behaviors. Distributable real-time threads can be, broadly scheduled using two paradigms: node independent scheduling, in which nodes independently construct thread schedules, based on node-level decomposition of distributable thread (or DT) scheduling parameters, and collaborative scheduling, in which nodes collaborate to construct system-wide thread schedules, which may or may not involve scheduling parameter decomposition. While significant literature exists on node independent scheduling, little is known about collaborative scheduling and its concomitant tradeoffs. They designed three collaborative scheduling algorithms, called ACUA, QBUA, and DQBUA. ACUA uses theory of consensus and QBUA uses theory of quorums for distributable thread schedule construction. DQBUA extends QBUA with lock-based, local and distributed concurrency control22.

In the same year (2010) Piyush Garyali considered the problem of scheduling real-time tasks on a multiprocessor system. His primary focus is scheduling on multiprocessor systems where the total task utilization demand, U, is greater than m, the number of processors on a multiprocessor system, and the total available processing capacity of the system. When U > m, the system is said to be overloaded; otherwise, the system is said to be underloaded. While significant literature exists on multiprocessor real-time scheduling during underloads, little is known about scheduling during overloads, in particular, in the presence of task dependencies, due to synchronization constraints. Garyali considered real-time tasks that are subject to time/utility function (TUF) time constraints, which allow task urgency to be expressed independently of task importance, the most urgent task being the least important. The urgency/importance decoupling allowed by TUFs is especially important during overloads, when not all tasks can be optimally completed. Garyali developed a class of polynomial-time heuristic algorithms, called the Global Utility Accrual (or GUA) class of algorithms. The GUA class of algorithms includes two algorithms, namely, the Non-Greedy Global Utility Accrual (or NG-GUA) and the Greedy Global Utility Accrual (or G-GUA) algorithms. NG-GUA and G-GUA differ in the way schedules are constructed towards meeting all task deadlines, when possible to do so23.

In year 2011 Matthew A. Dellinger investigated the problem of experimentally evaluating the scaling behaviors of existing multicore real-time task scheduling algorithms on large-scale multicore platforms. As chip manufacturers rapidly increase the core count of processors, it becomes imperative that multicore real-time scheduling algorithms keep pace. Thus, it must be determined if existing algorithms can scale to these new high core-count platforms. He presented an experimental analysis of the scalability of 16 multicore real-time scheduling algorithms. They involve multicore platforms ranging from 8 to 48 cores. The algorithms are implemented in a real-time Linux kernel. Their study show that it is possible to implement global fixed and dynamic priority and simple global utility accrual real-time scheduling algorithms which will scale to large-scale multicore platforms. Interestingly, and in contrast to the conclusion of prior research, their results reveal that some global scheduling algorithms (e.g. G-NP-EDF) is actually scalable on large core counts24.

In year 2012 , Dhuha Basheer and Basil Shukr provided scheduling algorithm which was used with the kernel RTDM called earliest resource release and deadline first ERRDF algorithm. The ERRDF is preemptive, priority driven scheduling algorithm25.

In year 2012 H.S.Behera, Naziya Raffat ,and Minarva Mallik had proposed Enhanced Maximum Urgency First (EMUF) scheduling algorithm with intelligent laxity has been proposed. This algorithm is a further improvement in EMUF algorithm is mainly suited for real time systems where meeting of deadlines is an important criterion for scheduling26.

Kai Han (2012) investigated scheduling distributed soft real-time tasks in unreliable (e.g., those with arbitrary node and network failures) and untrustworthy systems (e.g., those with Byzantine node behaviors). He presented a distributed real-time scheduling algorithm called Real-Time Gossip (or RTG). RTG considers a distributed (i.e., multi-node) task model where tasks are subject to Time/Utility Function (or TUF) end-to-end time constraints, and the scheduling optimality criterion of maximizing the total accrued utility. The algorithm helps achieving two novel contributions. First, RTG uses gossip for reliably propagating task scheduling parameters and for discovering task execution nodes. Second, the algorithm guards against potential disruption of message propagation due to Byzantine attacks using a mechanism called Launcher-Attacker-Susceptible- Infective-Recovered-consumer (or LASIRC). By doing so, the algorithm assures task timeliness behaviors, despite system unreliability and untrustworthiness27.

In year 2012 Santhi Baskaran, and P. Thambidurai proposed a dynamic energy efficient scheduling algorithm with weighted First Come First Served (WFCFS) scheme. This also involves the run-time behavior of tasks, to further explore the idle periods of processors for energy saving. Their algorithm is compared with the existing Modified Feedback Control Scheduling (MFCS), First Come First Served (FCFS), and Weighted scheduling (WS) algorithms that uses Service-Rate-Proportionate (SRP) Slack Distribution Technique28.

1.3 Thesis Statement:

Distributed real-time systems such as those found in industrial automation, and military surveillance must support timely end-to-end activities. Timeliness includes application-specific acceptability of end-to-end time constraint satisfaction, and of the predictability of that satisfaction. These activities may include computational, sensor, and actuator steps which levy a causal ordering of operations, contingent on interactions with physical systems.

Reasoning about end-to-end timeliness is a difficult and unsolved problem in such systems. A distinguishing feature of such systems is their relatively long activity execution time scales (e.g., milliseconds to minutes), which permits more time-costlier real-time resource management. Maintaining end-to-end properties (e.g., timeliness, connectivity) of a control or information flow requires a model of the flow's locus in space and time that can be reasoned about19.

For those reasons , the problem of scheduling distributed real-time systems was investigated in this thesis. And a proposed distributed real-time system have been designed with two styles of dynamic scheduling schemes (independent node scheduling and collaborative scheduling) that examine the computation times, real times requirements of the tasks to produce a feasible schedule. The schedule was driven by DTUF function of the urgency and importance for tasks. Also a

Distributed Real Time-Fault Tolerance Integrity Protocol (DRT-FTIP) has been designed to control the faults that occurred in this proposed system.

1.4 Contributions:

In this work, some contributions were introduced:

1. Designing a distributed real time system with fault tolerance protocol using RMI Java middleware technique for building this proposed system.
2. Developing TUF(Time/Utility function) called DTUF (Developed Time/Utility function) to take the effect of the two values (Importance, urgency) to determine the priority of the threads.
3. Design two types of scheduling schemes: collaborative and independent called EOE-CDRTSA, EOE-PDRTSA with two types of assigning priority :static and dynamic.
4. Developing the fault tolerance integrity protocol control called DRT-FTIP with the proposed Distributed real-time system . The Integrity Protocol discovers the fault and recovers it.

1.5 Outline of the thesis:

Chapter Two: Distributed Systems

This chapter will provide a complete review about distributed system, Characterization of Distributed Systems, Distributed Operating System types, Middleware and its Categories, Networking and Internetworking and network types, Naming, Migration and Fault Tolerance.

Chapter Three: Real-Time systems

The Real-Time systems chapter describes Real-Time systems, the Design Issues, and the Basic concepts for real-time task scheduling, Scheduling definitions, algorithms and properties and Scheduling algorithms taxonomy, Multiprocessor Scheduling, Distributed Real-Time Systems, Scheduling Distributed.

Chapter Four: System And Algorithms Design

The aim in this chapter is to introduce the proposed system and its scheduling schemes, its objective and architecture. And describes how Distributed RMI System was designed.

Chapter Five: End-To-End Integrity protocol Design

This chapter describes the proposed protocol and its objectives and algorithms.

Chapter Six: System Implementation

The implementation chapter describes the case study that had been used to implement the proposed system, the objective of flight reservation system and its features, the hardware description of the system, software requirements, the technologies used, finally the scenarios and the results.

Chapter Seven: Conclusions and Future Works

Abbildung in dieser Leseprobe nicht enthalten

Figure 1.1: Architecture of proposed system

Chapter Two Distributed Systems

2.1 Introduction:

Networks of computers are everywhere. The Internet is one, as are the many networks of which it is composed. Mobile phone networks, corporate networks, factory networks, campus networks, home networks, in-car networks, both separately and in combination, all share the essential characteristics that make them relevant subjects for study under the heading distributed systems.

2.2 Distributed Systems Definition:

Various definitions of distributed systems have been given in the literature, none of them is satisfactory, and none of them is in agreement with any of the others. For our purposes it is sufficient to give a loose characterization:

“ Distributed systems are a collection of independent computers that appears to its users as a single coherent system. ” 29

This definition has several important aspects. The first one is that a distributed system consists of components (i.e., computers) that are autonomous. A second aspect is that users (be the people or programs) think that they are dealing with a single system. This means that in one way or the other the autonomous components need to collaborate. How to establish this collaboration lies at the heart of developing distributed systems29.

A distributed system is a collection of loosely coupled processors interconnected by a communication network. From the point of view of a specific processor in a distributed system, the rest of the processors and their respective resources are remote, whereas its own resources are local. The processors in a distributed system may vary in size and function. They may include small microprocessors, workstations, minicomputers, and large general-purpose computer systems. These processors are referred to by a number of names, such as sites, nodes, computers, machines, and hosts, depending on the context in which they are mentioned. Generally, one host at one site, the server, has a resource that another host at another site, the client (or user), would like to use. A general structure of a distributed system is shown in Figure 2.130.

Abbildung in dieser Leseprobe nicht enthalten

Figure 2.1: A distributed system30

2.3 Characterization of Distributed Systems

A distributed system should make resources easily accessible; it should reasonably hide the fact that resources are distributed across a network; it should be open; and it should be scalable29.

1- Connecting Users and Resources

The main goal of a distributed system is to make it easy for users to access remote resources, and to share them with other users in a controlled way. There are many reasons for wanting to share resources .One obvious reason is that of(economics).

2- Transparency

An important goal of a distributed system is to hide the fact that its processes and resources are physically distributed across multiple computers. A distributed system that is able to present itself to users and applications as if it were only a single computer system is said to be transparent. The concept of transparency can be applied to several aspects of a distributed system, as shown in table (2-1)29.

Table 2-1: Different forms of transparency in a distributed system29

Abbildung in dieser Leseprobe nicht enthalten

3- Openness

An open distributed system is a system that offers services according to standard rules that describe the syntax and semantics of those services [29,30].

4- Scalability

Scalability is one of the most important design goals for developers of distributed systems. Scalability of a system can be measured along at least three different dimensions. First , a system can be scalable with respect to its size, that is it can easily add more users and resources to the system .Second, a geographically scalable system is one in which the users and resources may lie far apart .Third , a system can be administratively scalable, that is it can still be easy to manage even if it spans many independent administrative organizations [31,32].

2.4 Software Concepts:

Hardware for distributed systems is important, but it is software that largely determines what a distributed system actually looks like. Distributed systems are very much like traditional operating systems. First, they act as resource managers for the underlying hardware, allowing multiple users and applications to share resources such as CPUs, memories, peripheral devices, the network, and data of all kinds. Second, and perhaps more important is that distributed systems attempt to hide the intricacies and heterogeneous nature of the underlying hardware by providing a virtual machine on which applications can be easily executed. Operating systems for distributed computers can be roughly divided into two categories: tightly-coupled systems and loosely-coupled systems. In tightly-coupled systems, the operating system essentially tries to maintain a single, global view of the resources it manages. Loosely-coupled systems can be thought of as a collection of computers each running their own operating system. However, these operating systems work together to make their own services and resources available to the others. A tightly-coupled operating system is generally referred to as a distributed operating system (DOS), and is used for managing multiprocessors and homogeneous multi-computers. Like traditional uniprocessor operating systems, the main goal of a distributed operating system is to hide the intricacies of managing the underlying hardware such that it can be shared by multiple processes. The loosely-coupled network operating system (NOS) is used for heterogeneous multicomputer systems. Although managing the underlying hardware is an important issue for a NOS, the distinction from traditional operating systems comes from the fact local services are made available to remote clients. To actually come to a distributed system, enhancements to the services of network operating systems are needed such that a better support for distribution transparency is provided. These enhancements lead to what is known as middleware, and lie at the heart of modern distributed systems29.

2.4.1 Distributed Operating System:

There are two types of distributed operating systems: multiprocessors operating System , multicomputer operating System. Multiprocessors operating System manages the resources of a multiprocessors. A multicomputer operating System is an operating system that is developed for homogeneous multiprocessors. The functionality of distributed operating system is essentially the same as that traditional operating systems for uniprocessor systems ,except that they handle multiple CPUs as shown in table (2-2) .

Table 2-2: An overview between DOS (Distributed Operating Systems),NOS (Network Operating Systems), and middleware29.

Abbildung in dieser Leseprobe nicht enthalten Uniprocessor Operating Systems:

Operating systems have traditionally been built to manage computers with only a single CPU. The main goal of these systems is to allow users and applications an easy way of sharing resources such as the CPU, main memory, disks, and peripheral devices. Sharing resources mean that different applications can make use of the same hardware in an isolated fashion. To an application, it appears as if it has its own resources, and that there may be several applications executing on the same system at the same time, each with their own set of resources. In this sense, the operating system is said to implement a virtual machine, offering multitasking facilities to applications29. Figure 2.2 shows separating applications through a microkernel.

Figure 2.2: Separating applications from operating system code through a microkernel29.

Abbildung in dieser Leseprobe nicht enthalten Multiprocessor Operating Systems:

It is an important, but often not entirely obvious extension to uniprocessor operating systems. It is support for multiple processors having access to a shared memory. Conceptually, the extension is simple in that all data structures needed by the operating system to manage the hardware, including the multiple CPUs, are placed into shared memory. The main difference is that these data are now accessible by multiple processors, so that they have to be protected against concurrent access to guarantee consistency29. Multicomputer Operating Systems:

Operating systems for multicomputer are of a totally different structure and complexity than multiprocessor operating systems. This difference is caused by the fact that data structures for system wide resource management can no longer be easily shared by merely placing them in physically shared memory. Instead, the only means of communication is through message passing. Multicomputer operating systems are therefore generally organized as shown in Figure 2.329.

Figure 2.3: General structure of a multicomputer operating system29.

Abbildung in dieser Leseprobe nicht enthalten

2.4.2 Network Operating Systems:

The machines and their operating systems may be different , but they are all connected to each other in a computer network .Also ,network operating systems provide facilities to allow users to make use of the services available on a specific machine. Service that is commonly provided by network operating systems is to allow a user to log into another machine remotely by using a command such as rlogin machine. In contrast to distributed operating systems, network operating systems do not assume that the underlying hardware is homogeneous and that it should be managed as if it were a single system. Instead, they are generally constructed from a collection of uniprocessor systems, each with its own operating system, as shown in Figure 2.4. The machines and their operating systems may be different, but they are all connected to each other in a computer network. It is perhaps easiest to describe network operating systems by taking a closer look at some services they typically offer29.


Excerpt out of 122 pages


A Fault Tolerance Distributed Real-Time System. Design and Implementation
University of Mosul  (College of Computer Sciences And Mathematics)
Catalog Number
ISBN (eBook)
ISBN (Book)
File size
1714 KB
fault, tolerance, distributed, real-time, system, design, implementation
Quote paper
Amira Sallow (Author), 2013, A Fault Tolerance Distributed Real-Time System. Design and Implementation, Munich, GRIN Verlag, https://www.grin.com/document/270139


  • No comments yet.
Read the ebook
Title: A Fault Tolerance Distributed Real-Time System. Design and Implementation

Upload papers

Your term paper / thesis:

- Publication as eBook and book
- High royalties for the sales
- Completely free - with ISBN
- It only takes five minutes
- Every paper finds readers

Publish now - it's free