According to the NIST definition, "cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."
The NIST definition lists five essential characteristics of cloud computing: on-demand self-service, broad network access, resource pooling, rapid elasticity or expansion, and measured service.
This chapter provides an overview of cloud computing. Cloud computing is a paradigm that is widely used in distributed computing. To implement a service-oriented architecture, it uses a cyber infrastructure built on the concepts of grid computing, virtualization, utility computing, and software services [1]. In cloud computing, data are stored in a data center and manipulated through the internet. A cloud may consist of several thousand commodity servers. Virtualization concepts are used to deliver computing resources such as data centers, servers, virtual machines, and data storage [2]. A real-time system can exploit the immense computing capability and virtualized environment of the cloud for the execution of tasks. Cloud computing manages resources on a large scale, which makes it cost-effective and location-independent.
A cloud system comprises different types of components: the SaaS, PaaS, and IaaS layers, physical servers, virtual machines, a load balancer, the network, and cloud users. Cloud users submit cloudlets, which are transferred through the network. The load balancer distributes the tasks according to the scheduling policy. The cloud provides three types of service. Applications are provided as Software-as-a-Service: software subscriptions and the required software are delivered by the SaaS layer. Software development tools are provided by the PaaS layer, which supports running software under development for testing purposes. The IaaS layer consists of virtualized storage, compute, and network resources [4].
Cloud computing is a useful model offering a collection of configurable computing resources, such as data centers, servers, data storage, and application services, in real time. The main features of cloud computing are on-demand accessibility, scalability, management cost reduction, and location independence. Versatile critical applications are among the major workloads in the cloud environment, and critical applications require a reliable cloud environment. Transient errors occur when energetic neutron particles from space or alpha particles from packaging materials strike integrated circuits (ICs). These errors may change the state of memory bits or alter the register file. Transient faults may also cause loss of network connectivity, temporary unavailability of services, and timeouts while waiting for a response. Such errors can be catastrophic in cloud applications such as scientific research, financial, and safety-critical systems. They may appear as single-bit, multiple-bit, or burst-bit errors, causing either service downtime or invalid results. To reduce their effect, a fault-tolerant mechanism is required; the cloud is more effective and reliable when it is more fault-tolerant and more adaptable to demand. Fault tolerance permits a system to continue operating even in a faulty environment. Fault tolerance in the cloud is either reactive or proactive: reactive fault tolerance performs error recovery after faults are detected, whereas proactive fault tolerance prevents faults by predicting them beforehand. Popular fault-tolerant techniques include parity, two-dimensional parity, Hamming codes, checksums, and cyclic redundancy checking (CRC). Most of these schemes are either hardware- or software-based. In this thesis, we propose a software-based fault-tolerant technique.
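To make the simplest of these schemes concrete, a single even-parity bit can be sketched as follows. This is a minimal illustration for reference only, not the technique proposed in this thesis:

```python
def add_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def parity_ok(word):
    """An even number of 1s means no detectable error."""
    return sum(word) % 2 == 0

word = add_parity([1, 0, 1, 1])   # parity bit = 1
assert parity_ok(word)            # a clean word passes the check
word[2] ^= 1                      # flip one bit "in transit"
assert not parity_ok(word)        # the single-bit error is detected
```

Parity catches any odd number of flipped bits but misses even-sized errors, which is why the stronger schemes above (Hamming codes, checksums, CRC) are needed for multiple-bit and burst-bit errors.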
We propose a one's-complement-based effective fault tolerance (EFT) technique to detect and correct errors in cloud computing systems. The proposed EFT scheme achieves 99.9% error coverage while being 1.73 times faster than the checksum technique and 3.06 times faster than the CRC technique. We also propose an aggressive fault tolerance (AFT) technique, whose fault detection and recovery module detects faults and recovers from them using a smart decision agent. The smart decision agent makes decisions for different types of hardware, software, and communication faults. The experimental results show that the proposed AFT approach is 1.5 times faster than checkpointing, 2.0 times faster than resubmission, and 2.5 times faster than the replication scheme. Compared with existing techniques, our schemes reduce complexity and improve the performance of fault tolerance.
Versatile critical applications, such as scientific research, financial, and safety-critical applications, are being deployed in cloud computing, and they require a reliable cloud environment. In cloud computing, data are stored in a data center and manipulated through the internet. Useful data flows or control flows may suddenly change state due to transient, intermittent, permanent, omission, and timeout faults. The resulting single-bit, multiple-bit, and burst-bit errors may cause either service unavailability or invalid results. To reduce the effect of such errors, a fault-tolerant mechanism is essential [1] [2]. Cloud computing is more effective and reliable when it is more fault-tolerant and more adaptable to demand.
Cloud computing is a virtualization-based concept that is widely used in distributed computing. A microservice architecture consists of a group of small, self-directed services. Every service is self-directed and should implement a single business capability through well-designed application programming interfaces (APIs) [1]. To implement a microservice architecture, cloud computing uses a cyber infrastructure built on the concepts of virtualization, grid computing, software services, and utility computing [3]. Data are stored in a data center and manipulated through the internet. A cloud may consist of several thousand commodity servers. Virtualization of data centers, servers, virtual machines, and data storage is used to deliver computing resources. Cloud computing manages resources on a large scale, which makes it cost-effective and location-independent [4]. As the number of users grows, the cloud meets the increased demand by scaling horizontally or vertically [5].
Cloud services are illustrated in Figure 1.1 [6]. Cloud users submit tasks, which are transferred through the network. The load balancer distributes the tasks according to the scheduling policy. The cloud provides three types of service to complete the tasks successfully. Applications are provided as Software-as-a-Service: software subscriptions and the required software are delivered by the SaaS layer. Software development tools are provided by the Platform-as-a-Service (PaaS) layer, which supports running software under development for testing purposes. The IaaS layer provides virtualized storage, compute, and virtual private networks built from the physical machines [7].
Figure 1.1: Basic cloud service layout [6].
In the SaaS layer, software faults include state transition faults, operand errors, operand type errors, operation errors, early and late timing faults, timing overhead, protocol incompatibilities, data faults, logical faults, numerical exceptions, operating system faults, link timeout faults, user-defined exceptions, and unhandled exception faults. Transient and intermittent faults may occur in the processing elements (PEs) of virtual machines (VMs) [8]; these faults may also occur in hosts and data centers. Different types of faults typically occur in processing elements and memory modules, as discussed in [9]. A host, data center, or VM may become unavailable due to a full or failed disk. Communication faults include sending and receiving omission faults, early or late timing faults, packet corruption, and packet loss. Data must be transmitted with acceptable accuracy but may be corrupted during transmission; for this reason, an error detection mechanism is necessary to receive data free of single-bit, multiple-bit, and burst-bit errors. Transient faults may cause single-bit, multiple-bit, and burst-bit errors, whereas permanent faults involve physical damage to hosts, PEs, and memory.
Figure 1.2: Different faults occurring in the three service layers of the cloud [9].
The Platform-as-a-Service (PaaS) layer provides multiple tools for implementing applications. Application programming interface (API) installation faults and timeout faults may occur in this layer [8]. Other faults include response faults, state transition faults, and configuration changes.
In the Infrastructure-as-a-Service (IaaS) layer, faults include transient, intermittent, and permanent faults, cloudlet faults, and virtual machine faults such as PE faults, memory faults, and bandwidth over-utilization.
Different types of errors occur in different layers of cloud computing, as depicted in Figure 1.3 [10]. Application misconfiguration and user requirement errors occur in the user area, where input values may change state. Clients submit cloudlet tasks to the datacenter broker through the communication channel, where single-bit, multiple-bit, and burst-bit errors may occur in layer 1 (L1). Errors may occur in the processing elements (PEs), memory, and storage of a virtual machine in layer 2 (L2) [11]. Soft errors may occur in the SaaS layer of public, private, multiple, and community clouds during application processing or deployment in layers 3 and 4 (L3 and L4) [9]. Intermittent faults occur in the infrastructure for short periods during execution in layer 4 (L4) [12]. Response faults consist of value, state transition, and byzantine faults in layer 5 (L5). Synchronization problems occur among the cloud information service (CIS), datacenter broker, and datacenter servers in real time in layer 6 (L6) [13].
Figure 1.3: Errors occur in different layers of cloud computing [14].
Errors also occur in the network due to packet loss and packet corruption [15]. Time limits are set for sending, receiving, and acknowledging a packet in layer 7 (L7); when packets are not sent or received within this time, a timeout occurs. Timing faults, which include early and late faults, occur on the sending and receiving sides in layer 7 (L7). Aging faults occur in hardware due to time-dependent dielectric breakdown [16]. Omission faults cover denial of service and a full disk in layer 5 (L5). Interaction faults include timing overheads, service interdependencies, and protocol incompatibilities in layer 7 (L7) [17]. A fault tolerance manager increases reliability and availability by detecting errors and applying the right scheduling policy in the user area, layer 1 (L1) [18].
The objectives of this research are the detection and correction of the following errors in cloud computing:
Ø To identify the different types of transient errors in cloud computing, such as single-bit, multiple-bit, and burst-bit errors.
Ø To detect proactive errors, such as sending omission and receiving omission errors, in the cloud computing environment.
Ø To detect timeout errors.
Ø To detect result errors.
Ø To detect operand errors.
Ø To detect operand data type errors.
Since cloud computing is used in a broad spectrum of applications, fault-free services are required. Transient faults and noise change the state of control flow and data flow [12]. The main impacts of noise are disruption of a traveling signal and unwanted systematic alteration [1]; noise may also cause timeout faults and temporary inaccessibility of services. Parity checking is commonly used for single-bit error detection [19] [20]. The CRC technique works mainly by binary division. CRC, enhanced CRC, and CRC16 can detect single-bit, multiple-bit, and burst-bit errors [21] [19]. CRC16 improves reliability by detecting bursts of at most 16 bits [22]. But whenever a burst error is longer than 16 bits [19], it is detected only with probability 1 − 2^(−r), where r = 17, 18, ..., 64.
All burst errors with L ≤ r will be detected [58].
All burst errors with L = r + 1 will be detected with probability 1 − 2^(−(r−1)) [58].
All burst errors with L > r + 1 will be detected with probability 1 − 2^(−r) [58].
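As an illustration of the binary-division mechanism behind these guarantees, here is a minimal mod-2 CRC sketch; the 3-bit generator is chosen only for readability and is not any standard CRC-16 polynomial:

```python
def crc_remainder(data_bits, poly_bits):
    """CRC via binary (mod-2) long division: append r zero bits,
    then XOR the generator in wherever the leading bit is 1."""
    r = len(poly_bits) - 1
    dividend = list(data_bits) + [0] * r
    for i in range(len(data_bits)):
        if dividend[i]:
            for j, p in enumerate(poly_bits):
                dividend[i + j] ^= p
    return dividend[-r:]          # the r remainder bits

data = [1, 0, 1, 1, 0, 1]
poly = [1, 0, 1, 1]               # x^3 + x + 1, an illustrative generator
frame = data + crc_remainder(data, poly)
# An error-free frame divides evenly: the remainder is all zeros.
assert crc_remainder(frame, poly) == [0, 0, 0]
frame[2] ^= 1                     # inject a single-bit error
assert crc_remainder(frame, poly) != [0, 0, 0]
```

The receiver repeats the same division on the received frame and flags an error whenever the remainder is nonzero; the burst-detection probabilities quoted above describe how often that remainder can collapse to zero despite an error.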
The limitations of the CRC and checksum techniques are [19]:
1) CRC uses a complex shift register, and checksum uses a binary adder circuit.
2) They detect errors only after multiple steps.
3) They are reactive fault tolerance techniques.
4) Checksum detects only single-bit and multiple-bit errors [23].
CRC and checksum operate in the data link layer of the TCP/IP model, where they detect errors in data communications. Checksum can only detect errors, whereas CRC can both detect and correct them; however, when the burst length is greater than 16 bits, CRC is unable to detect the errors, and complex circuitry is required for error detection and correction. In contrast, our proposed EFT scheme detects errors after only two steps using the XNOR operation, and our AFT scheme detects and corrects errors without resending data, using a smart decision agent. Since cloud computing is increasingly used in critical applications, suitable fault-tolerant techniques are needed. Replication and resubmission techniques restart work from the initial state when a fault occurs [24]: if a VM, host, or PE fault occurs, replicas execute the failed cloudlets from the initial state, so the overhead of replication is more than one hundred percent [25]. The checkpointing technique rolls back to the last checkpoint and compares it with the failed part of the task; creating checkpoints increases overhead. Checkpointing, replication, and resubmission are backward-recovery techniques that solve the problem by rolling back [16], which also increases the workload and execution time of the cloud infrastructure. A maximally efficient smart decision is required to minimize the overhead of fault detection and recovery [26]. The aggressive fault tolerance technique is a smart combination of the checkpointing, resubmission, and replication methods that recovers from faults using a smart decision agent [27].
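The smart-decision idea can be sketched as a simple policy function. The fault names, the 0.5 progress threshold, and the rules below are illustrative assumptions for exposition, not the exact policy developed later in this thesis:

```python
def choose_recovery(fault_type, progress, has_checkpoint):
    """Pick the cheapest recovery action for a failed cloudlet.

    fault_type: an assumed label such as "host_failure" or "timeout";
    progress: fraction of the task completed (0.0 to 1.0);
    has_checkpoint: whether a saved checkpoint exists.
    """
    if fault_type in ("host_failure", "pe_failure"):
        # The hardware is gone: run a replica on another host.
        return "replication"
    if has_checkpoint and progress > 0.5:
        # Most work is done: roll back only to the last checkpoint.
        return "checkpointing"
    # Early-stage or transient failure: cheapest to start over.
    return "resubmission"

assert choose_recovery("host_failure", 0.9, True) == "replication"
assert choose_recovery("timeout", 0.8, True) == "checkpointing"
assert choose_recovery("timeout", 0.1, False) == "resubmission"
```

The point of such an agent is that no single backward-recovery method is always cheapest; selecting among them per fault is what keeps the combined overhead below that of any one method applied uniformly.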
First, we implement the effective fault tolerance (EFT) technique, which uses the XNOR operation to extend the datacenter broker policy in the cloud environment. The EFT technique detects and corrects single-bit, multiple-bit, and burst-bit errors of any length after only two steps. Second, we develop an aggressive fault tolerance (AFT) technique that uses a smart decision agent to extend the datacenter broker in the cloud; AFT combines checkpointing, resubmission, and replication under the smart decision agent. Transient errors occur when energetic neutron particles from space or alpha particles from packaging materials strike an integrated circuit (IC); they may cause single-bit, multiple-bit, and burst-bit errors in hardware, software, and communication. Noise is an important parameter of the physical layer in data communication [4], although error detection is performed in the data link layer. Permanent faults involve physical damage to hosts, PEs, and memory. A fault-tolerant mechanism ensures system reliability by improving the fault detection and recovery mechanisms [28] [29] [30].
The main contributions of this research work are as follows:
a) To improve the transient, permanent, omission, and timeout error detection mechanism using an error detection (ED) algorithm in the cloud environment.
b) To develop an algorithm that detects and corrects single-bit, multiple-bit, and burst-bit errors of any length after only two steps:
1) First, the server receives the data and its one's complement.
2) Second, it executes the XNOR operation between them.
c) To implement the effective fault tolerance (EFT) protocol, extending the datacenter broker policy using the XNOR operation in the cloud environment.
d) To verify the error coverage rate of the proposed technique by injecting different types of transient raw faults.
e) To develop algorithms for:
i. Data Packetizing (DP)
ii. Data Unpacketizing (DU)
iii. Error Detection (ED)
iv. Effective Error Correction (EEC)
v. Aggressive Fault Tolerance Using Smart Decision Agent.
vi. Error Coverage Analysis.
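The two steps in contribution (b) can be sketched as follows. This is a minimal illustration of the detection step only; the assumed bit-list representation is for exposition, and the packetizing and full EEC correction procedures are developed in later chapters:

```python
def ones_complement(bits):
    """Flip every bit: the sender transmits this alongside the data."""
    return [b ^ 1 for b in bits]

def xnor_check(data, comp):
    """Bitwise XNOR: 1 exactly where data[i] == comp[i].
    For an error-free (data, complement) pair every position XNORs
    to 0, so any 1 flags a position hit by an error in either copy."""
    return [1 - (d ^ c) for d, c in zip(data, comp)]

data = [1, 0, 1, 1, 0, 0, 1, 0]
comp = ones_complement(data)
assert xnor_check(data, comp) == [0] * 8       # clean pair: all zeros
comp[3] ^= 1                                   # single-bit error in transit
assert xnor_check(data, comp) == [0, 0, 0, 1, 0, 0, 0, 0]
```

Because each bit position is checked independently, the same test flags every position of a multiple-bit or burst error regardless of its length, which is what allows the two-step check to handle bursts of any length.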
The effective fault tolerance (EFT) scheme is able to detect and correct burst errors of any length, as well as single-bit and multiple-bit errors. The coverage of the proposed scheme is 99.98%, compared with 97.78% for CRC and 96.18% for checksum. Our experimental results show that our approach is 1.73 times faster than the checksum technique and 3.06 times faster than the CRC technique. The EFT protocol's space efficiency is the payload size divided by the total frame size: with a packet size of 1500 bytes and a frame size of 1538 bytes, this gives 97.53%. The overheads of single-bit, multiple-bit, and burst-bit errors differ; the overhead of burst-bit errors is greater than that of multiple-bit and single-bit errors. The aggressive fault tolerance (AFT) technique is a smart combination of the checkpointing, resubmission, and replication methods. The aggressive fault detection (AFD) module monitors messages using a heartbeat mechanism, and the aggressive fault recovery (AFR) module recovers from faults using a smart decision agent. With the smart decision agent, AFT is 1.5 times faster than checkpointing, 2.0 times faster than resubmission, and 2.5 times faster than replication.
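The 97.53% figure follows directly from the stated sizes, as a quick check shows:

```python
# Space efficiency of the EFT protocol under the frame layout stated
# above: a 1500-byte payload carried inside a 1538-byte frame
# (payload plus frame headers and framing overhead).
payload_size = 1500   # bytes
frame_size = 1538     # bytes
efficiency = payload_size / frame_size
assert round(efficiency * 100, 2) == 97.53
```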
Chapter 2 discusses the background related to improving reliability, availability, and usability in cloud computing. Different types of faults occur in different layers of the cloud computing environment, and the mechanisms of some related existing techniques are discussed. Researchers have identified these faults and tried to solve them with the best available mechanisms; among these problems, we have found some critical ones and solved them successfully.
Chapter 3 proposes the effective fault tolerance (EFT) technique to detect and correct faults effectively. It solves the problems proactively using the error detection (ED) and error correction (EC) schemes, which help to improve system reliability and availability. We also present the space and time complexity to characterize the required time.
Chapter 4 develops the aggressive fault tolerance (AFT) technique using a smart decision agent, describing how the agent detects and corrects errors by combining the checkpointing, replication, and resubmission mechanisms.
Chapter 5 presents the experimental setup, results, and analysis, showing the required tools and the performance improvement of our proposed schemes, which achieve better performance, reliability, and availability than existing techniques.
Chapter 6 concludes the work presented in this thesis and gives directions for future work.