A fault-tolerant system is a mechanism that keeps a cloud system running without harm to clients and supports the delivery of services between cloud providers and cloud users. A fault is the underlying defect, as shown in Figure 2.1; it is the root cause of a failure. An error is the indication or symptom of a fault: the difference between the actual output and the expected output. A failure occurs when an error makes the external behavior of the system incorrect, and it may damage the system. In short, faults lead to errors, and errors lead to failures [16]. Using "fault tolerance in cloud" and related keywords, I searched popular research sources for relevant work.
Faults cause errors, and a single fault may give rise to multiple errors. The resulting unexpected outputs lead to system failure, so even a single fault may cause a system failure [31].
Cloud computing is a distributed model that enables convenient, ubiquitous, on-demand network access to a shared pool of configurable computing resources such as servers, services, networks, storage and applications [32]. Its features include on-demand access, cost reduction, minimal management effort, scalability, and device and location independence. It has four deployment models: (i) public cloud, (ii) private cloud, (iii) hybrid cloud and (iv) community cloud. Fault tolerance techniques are applied to detect faults and recover from them in these deployment models. The cloud offers three types of services: (i) Software-as-a-Service (SaaS), (ii) Platform-as-a-Service (PaaS) and (iii) Infrastructure-as-a-Service (IaaS). Various faults occur in the cloud platform and prevent it from delivering the best service to cloud users [10].
Cloud computing has several characteristics, such as device and location independence, on-demand services, scalability, guaranteed quality of service, virtualization, security, multi-tenancy and fault tolerance [33].
Scalability and on-demand services: Cloud users are given on-demand access to resources and services. Resources are scaled up or down, horizontally or vertically, according to demand from cloud users [33].
User-centric interface: Cloud interfaces do not depend on the location of cloud users. They can be accessed through well-established interfaces such as internet browsers and web services [33].
Guaranteed Quality of Service (GQS): Cloud computing guarantees quality of service to users in terms of assured performance, processing capacity, bandwidth and memory size [33].
Autonomous system: Cloud users can reconfigure and combine software according to their requirements [33].
Cost: The cloud requires no capital expenditure or other upfront investment. Cloud computing follows the pay-as-you-go model, and services are provided as needed.
Virtualization: The utilization of resources is improved by virtualizing and distributing the servers and storage devices [33].
Multi-tenancy: A large number of users share the same resources, which allows for a centralized mechanism [34].
Loose coupling: The resources are loosely coupled, so the functioning of one resource barely affects the operation of the others.
Reliable delivery: Information must be delivered reliably between resources in the TCP/IP architecture. The cloud organization uses private network protocols [35].
High security: Security builds on the characteristics above. Because the resources are loosely coupled, jobs need a high level of security, particularly when parts of the cloud are damaged.
Cloud services can be grouped into three categories offered by cloud providers [36]. These are given below:
Software as a Service (SaaS): In this model, a complete application is delivered on demand to cloud users. Clients need not invest upfront before using the applications. They use the software on a subscription basis or under the pay-as-you-go model; providers include Google, Salesforce and Microsoft [9].
Platform as a Service (PaaS): In this model, integrated development environment tools are provided so that users can implement their own business policies. The cloud developer develops the application and runs it in the cloud environment to serve customers. A predefined configuration of operating system and application server is delivered to cloud users. For example, Force.com and Google's App Engine are provided as platforms [37].
Infrastructure as a Service (IaaS): In this model, virtualized resources are provided to run applications. The resources include virtual servers, hosts, machines, storage and computing capacity. Cloud users deploy their own applications on the cloud infrastructure of providers such as Amazon and GoGrid [38].
Four deployment models are used in cloud computing: the public cloud, private cloud, hybrid cloud and community cloud. Each deployment model has a distinct structure.
Public Cloud: Cloud users connect through the internet and access the cloud space readily. It offers availability of cloud resources and follows the "Pay As You Go" model. The advantages of the public cloud are cost effectiveness, reliability, high scalability, flexibility, utility-style costing and location independence. Its disadvantages are low security, limited customization and publicly shared resources. Clients share the same virtual infrastructure with limited configuration, security and availability adjustments [4].
Private Cloud: A cloud whose access is restricted to a specific organization or a particular group is called a private cloud. Its main advantages are higher security and privacy, more control, improved reliability, and cost and energy efficiency. However, scaling is more costly and the area of operation is restricted. It is more expensive than the public cloud [4].
Hybrid Cloud: A hybrid cloud consists of a public cloud and a private cloud. It provides scalability, security, cost efficiency and flexibility to customers. This cloud is used partly as a public cloud and partly as a private cloud [4].
Community Cloud: A cloud in which organizations with common requirements share systems and services within a specific community is called a community cloud. It reduces costs for the participating organizations, which share files, resources and infrastructure [4].
Cloud computing platforms make cloud computing more flexible and more reliable, and they offer many types of services [5]. The technologies on which cloud computing is based are listed below.
• Distributed system
• Grid computing
• Virtualization technology
• Client-Server model
• Utility computing
A distributed system is a collection of autonomous components located on different machines that work together to achieve common goals. The components exchange messages with one another in order to cooperate. Large tasks are solved by distributing them across the machines of the system. In short, cloud computing is a transparent distributed system that handles the processing, storage and management of data. Popular examples of distributed systems are Facebook, the World Wide Web (WWW) and ATM networks [39].
Grid computing is a collection of computers in different locations that are connected with each other to accomplish a common objective. These computer resources belong to different networks and different geographical regions. Grid computing breaks complex tasks down into smaller pieces, which are distributed for processing to complete the common objective [10].
Virtualization is a well-known technology that shares a single physical instance of an application or resource among several organizations or tenants (customers). When a large number of customers request access to cloud resources, it provides each of them with a pointer to specific virtual resources. Every virtual machine is distinct from the physical machine. Virtual machines provide an environment that is logically separated from the underlying hardware. The virtual machines are created on a host. A virtual machine is created and managed by software or firmware known as a hypervisor [16].
The client-server model is a well-known application model that distributes tasks or loads between the providers of a resource or service, called servers, and the requesters, called clients. Clients send requests and servers send responses back to the clients. When a client sends a request for service through the internet, the server accepts the request, processes it and delivers the result to the client. The server shares its resources with clients, but clients do not share theirs; examples are e-mail and the World Wide Web [40].
Utility computing is built on the traditional pay-as-you-use model. It offers computing assets on demand as a metered service. Cloud computing and grid computing are based on the concept of utility computing [14].
Cloud computing consists of several elements, such as clients, cloudlets, the datacenter broker, the cloud information service (CIS) and datacenters (virtual hosts, virtual machines and processing elements (PEs)), as shown in Figure 2.2 [41] [42].
Figure 2.2: Basic components of cloud architecture [41] [42].
Clients: These are typically the computers, mobile phones and browser-based thin clients used by end users. End users use these devices to accomplish tasks in the cloud [6].
Cloud Information Service (CIS): Each datacenter is registered with the cloud information service (CIS), and information about the cloud components is stored in a CIS table.
Datacenter Broker: The datacenter broker acts as a coordinator between software-as-a-service (SaaS) and cloud providers. Its main responsibility is to collect the available resources and provide quality of service to the clients of the cloud system. The CIS sends an acknowledgement to the broker about the available cloud resources [14].
Datacenter: The broker connects to the datacenter. A datacenter is a collection of virtualized hosts, virtual machines, processing elements, virtual networks and virtual storage. A datacenter is characterized by its architecture (for example x86), operating system, virtual machine monitor (VMM), host list, memory, bandwidth and storage [43].
Host: A host contains multiple virtual machines. The capacity of a host is expressed in million instructions per second (MIPS). The processing elements execute the cloudlets and send the results to the broker [44].
Virtual Machine: VMs are allocated to a host using the best-fit mechanism. The parameters of the host are its processing capacity, usually measured in million instructions per second (MIPS), memory size in megabytes (MB), storage size in terabytes (TB) and communication bandwidth in megabits per second (Mbps) [2].
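The best-fit placement mentioned above can be sketched as follows. This is a minimal illustration, not the allocation policy of any particular simulator: the `Host` and `Vm` classes, their fields, and the use of MIPS as the only fitting criterion are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Vm:
    vm_id: int
    mips: int  # requested processing capacity (MIPS)


@dataclass
class Host:
    host_id: int
    free_mips: int  # capacity not yet allocated (MIPS)
    vms: List[Vm] = field(default_factory=list)


def best_fit(hosts: List[Host], vm: Vm) -> Optional[Host]:
    """Place vm on the host that leaves the least spare MIPS; None if no host fits."""
    candidates = [h for h in hosts if h.free_mips >= vm.mips]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda h: h.free_mips - vm.mips)
    chosen.free_mips -= vm.mips
    chosen.vms.append(vm)
    return chosen


hosts = [Host(0, 4000), Host(1, 1500), Host(2, 2500)]
placed = best_fit(hosts, Vm(0, 1000))
print(placed.host_id)  # 1, the tightest fit for a 1000-MIPS request
```

A real placement would normally also check memory, storage and bandwidth; those checks are left out here to keep the idea visible.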
Cloudlet: A cloudlet is an application task whose length is measured in million instructions (MI); examples are social networking, content delivery and business applications. Cloudlets are executed by processing elements (PEs). The parameters of a cloudlet are the cloudlet ID, user ID, length (in million instructions), number of PEs, and input and output sizes (in MB). Cloudlets are submitted to a virtual machine and executed by its PEs [45].
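To make the cloudlet parameters concrete, the sketch below models a cloudlet and estimates its run time on processing elements. The class and field names are hypothetical, and the estimate (length in MI divided by the total MIPS of the PEs used, ignoring data transfer and queueing) is a simplifying assumption rather than a formula taken from the cited works.

```python
from dataclasses import dataclass


@dataclass
class Cloudlet:
    cloudlet_id: int
    user_id: int
    length_mi: int       # length in million instructions (MI)
    num_pes: int         # number of PEs requested
    input_size_mb: int
    output_size_mb: int


def estimated_runtime_s(cloudlet: Cloudlet, pe_mips: float) -> float:
    """Idealized run time in seconds, assuming the work splits evenly over the PEs."""
    return cloudlet.length_mi / (pe_mips * cloudlet.num_pes)


task = Cloudlet(cloudlet_id=1, user_id=7, length_mi=40_000, num_pes=2,
                input_size_mb=300, output_size_mb=300)
print(estimated_runtime_s(task, pe_mips=1000))  # 20.0
```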
Different types of faults may occur in cloud computing. Transient and intermittent faults may occur in the processing elements (PEs) of virtual machines (VMs) [8], and also in hosts and datacenters [7]. Faults usually occur in processing elements and memory modules. A host, datacenter or VM may become unavailable because of a full disk or a disk error [46]. Permanent faults are physical damage to a host, its PEs or its memory [47].
Figure 2.3: Single-bit, multiple-bit and burst-bit errors [48].
Data must be transmitted with acceptable accuracy, but it may be corrupted during transmission. An error detection mechanism is therefore necessary so that the received data are free of single-bit, multiple-bit and burst-bit errors [15]. A single-bit error occurs when a single bit of the data is altered, either from 1 to 0 or from 0 to 1, as shown in Figure 2.3 (a). A multiple-bit error occurs when bits at two or more different, non-consecutive positions are changed, as shown in Figure 2.3 (b). A burst-bit error occurs when multiple consecutive bits are erroneous (e.g. B5, B4, B3 and B2), as shown in Figure 2.3 (c) [20].
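The three error classes can be told apart by comparing the sent and received words bit by bit. The sketch below is an illustration under the definitions used in this section (a burst is a run of consecutive flipped bits); the function name and the 8-bit sample word are assumptions made for the example.

```python
def classify_bit_error(sent: int, received: int) -> str:
    """Classify the difference between two words as described in the text."""
    diff = sent ^ received          # 1-bits mark the positions that changed
    if diff == 0:
        return "no error"
    if bin(diff).count("1") == 1:
        return "single-bit"
    # Shift out trailing zeros; a contiguous run of ones then has the form 2**k - 1.
    run = diff >> ((diff & -diff).bit_length() - 1)
    if run & (run + 1) == 0:
        return "burst-bit"
    return "multiple-bit"


word = 0b10110010
print(classify_bit_error(word, word ^ 0b00001000))  # single-bit (B3 flipped)
print(classify_bit_error(word, word ^ 0b01000100))  # multiple-bit (B6 and B2)
print(classify_bit_error(word, word ^ 0b00111100))  # burst-bit (B5, B4, B3, B2)
```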
Figure 2.4: Taxonomy of the different types of faults that occur in cloud computing [49] [50].
Faults occur at different locations and layers of cloud computing. They are mainly divided into three groups: (i) hardware faults, (ii) communication faults and (iii) software faults, as shown in Figure 2.4. Faults prevent cloud users from obtaining the best service.
The hardware faults are transient faults, intermittent faults, disk-full faults, processing element (PE) faults, memory faults and storage faults [51] [52]. Transient faults may affect network connectivity, service availability and response time. For example, a virtual machine may be down or over-utilized, hardware may be damaged, or an acknowledgement may time out.
Hardware faults that occur in the cloud environment include the following:
1. Transient faults may cause the following errors:
• Single-bit errors
• Multiple-bit errors
• Burst-bit errors
2. Intermittent faults occur in the system suddenly and last for a short period of time. They may cause a device to malfunction and are resolved by replacing the device.
3. Permanent faults are physical damage to:
• Machine
• Processing elements (PEs)
• Memory
4. Physical machine fault – A physical machine contains multiple virtual machines, processing elements and memory. When a permanent fault occurs in it, the machine cannot deliver proper service.
5. Disk full – This is a kind of omission fault. When the disk is full, no data can be written.
6. Machine fault – Permanent faults occur in the memory or processing elements of a machine. They may bring the service down.
7. Processing element fault – The processing capacity of a processing element is measured in million instructions per second (MIPS). A processing element may become overloaded while cloudlets are being executed.
8. Memory fault – Transient faults cause single-bit, multiple-bit and burst-bit errors in memory, which can be cleared by rewriting the affected data.
Siddiqui [46] proposed a single-bit error detection scheme based on hardware fault tolerance for huge data in cloud computing. The scheme uses a concurrent error detection (CED) mechanism that is able to detect hardware faults. D. Mittal et al. [53] proposed a heartbeat protocol, which is used as a detection algorithm for hardware and software faults. M. K. Gokhroo et al. [54] proposed fault detection and mitigation using two fault detection time algorithms (fault detection time algorithm 1 and fault detection time algorithm 2). These two algorithms identify faults and correct them.
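The general idea behind heartbeat-based detection can be sketched as follows. This is a generic illustration and not the algorithm of [53]: the `HeartbeatMonitor` class, the node names and the 3-second timeout are all assumptions, and timestamps are passed in explicitly so that the example is deterministic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Suspects a node of being faulty if no heartbeat arrives within timeout_s."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        """Record a heartbeat received from node at time now (seconds)."""
        self.last_seen[node] = now

    def suspected(self, now: float) -> List[str]:
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.beat("vm-1", now=0.0)
monitor.beat("vm-2", now=0.0)
monitor.beat("vm-1", now=2.5)      # vm-2 stays silent
print(monitor.suspected(now=4.0))  # ['vm-2']
```

The timeout trades detection speed against false suspicion: a short timeout reports faults sooner but may flag a node that is merely slow.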
Software faults include software state transition faults, early and late timing faults, timing overhead, protocol incompatibilities, data faults, logical faults, numerical exceptions, operating system faults, link timeout faults, user-defined exceptions and unhandled exceptions [16] [51] [55].
Different types of software faults occur in the cloud computing environment, such as:
1. Data faults – Faults occur in different types of raw data. The resulting errors are single-bit, multiple-bit and burst-bit errors.
2. Unexpected input – When data are taken as input from different sources, various unexpected errors can occur in cloud computing.
3. Logical faults – Logical faults represent the explicit or implicit effect of physical errors on the performance of a system. The system cannot detect them in a specific way.
4. Operating system fault – The operating system may be corrupted or affected by problems such as malware, viruses, spyware and virus-infected software. Such faults change the control flow and the direction of the data flow.
5. Numerical exception – A numerical exception occurs when a mathematical operation produces an infinite value. Sometimes the sign of a value changes and causes unexpected events.
6. User-defined exception – In software, different types of faults are recovered from by using methods such as try-catch-throw, blocking mechanisms, N-version programming and self-checking.
7. Application faults – An application fault is unexpected behavior of an application, in which the expected outcome is not obtained.
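Two of the items in this list, numerical exceptions and recovery with try-catch, can be illustrated with a short sketch. The helper names are hypothetical. Because Python integers do not overflow, the sign change is simulated by interpreting the low 32 bits of a value as a signed integer, which is an assumption made only to show the effect.

```python
import math


def to_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def safe_divide(a: float, b: float) -> float:
    """Recover from a division fault by returning NaN instead of crashing."""
    try:
        return a / b
    except ZeroDivisionError:
        return math.nan


overflowed = 1e308 * 10                 # exceeds the float range, becomes infinite
wrapped = to_int32(2_147_483_647 + 1)   # largest 32-bit integer plus one
print(math.isinf(overflowed), wrapped, math.isnan(safe_divide(1.0, 0.0)))
# True -2147483648 True
```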
Chinnaiah [55] proposed an algorithm that achieves reliability for the depth-critical configuration of a software system, where faults cause unpredictable behavior and performance anomalies. A. Ledmi et al. [56] discussed optimizing fault tolerance in distributed systems. M. A. Rouf et al. [12] discussed state-of-the-art techniques to combat soft errors, broadly categorized into three types: (i) software-based schemes, (ii) hardware-based schemes, and (iii) hardware and software co-design schemes. S. Jaswal et al. [57] proposed a model for fault tolerance in the cloud environment. The trusted model is accessed mostly on demand and supports protected communication in cloud computing.
The communication faults are sending and receiving omission faults, early or late timing faults, packet corruption and packet loss. Transient faults may cause single-bit, multiple-bit and burst-bit errors [58].
Some communication faults are given below [20]:
1. Omission faults – A directional fault that arises under a denial-of-service attack or a full disk is called an omission fault.
2. Packet corruption – Single-bit, multiple-bit and burst-bit errors corrupt packets while data are transmitted through the channel, which changes the control flow and data flow. Packet corruption occurs mainly because of noise and temperature [15].
3. Packet loss – When one or more data packets fail to arrive at the destination from the source, packet loss has occurred because of network errors [15].
4. Timeout fault – A timeout fault is caused by early or late timing faults. The acknowledgement does not arrive in time and the session on the link times out [13].
5. Network congestion – When network traffic exceeds capacity and bandwidth is insufficient, network congestion occurs [28].
6. Protocol incompatibilities – Policy violation problems occur between two heterogeneous networks [28].
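Packet corruption and packet loss from the list above can be detected with two common techniques: a checksum carried with each packet and a sequence number per packet. The sketch below is illustrative only. The packet layout and function names are assumptions, and CRC-32 from the standard `zlib` module is used as one possible checksum, not as the method of the cited works.

```python
import zlib
from typing import List, Tuple

Packet = Tuple[int, bytes, int]  # (sequence number, payload, CRC-32 of payload)


def make_packet(seq: int, payload: bytes) -> Packet:
    return (seq, payload, zlib.crc32(payload))


def check_stream(packets: List[Packet], expected_count: int) -> Tuple[List[int], List[int]]:
    """Return (corrupted sequence numbers, missing sequence numbers)."""
    corrupted = [seq for seq, payload, crc in packets if zlib.crc32(payload) != crc]
    received = {seq for seq, _, _ in packets}
    missing = [seq for seq in range(expected_count) if seq not in received]
    return corrupted, missing


sent = [make_packet(i, f"block-{i}".encode()) for i in range(4)]
seq, payload, crc = sent[1]
damaged = (seq, bytes([payload[0] ^ 0x01]) + payload[1:], crc)  # flip one bit
arrived = [sent[0], damaged, sent[3]]                            # packet 2 is lost
print(check_stream(arrived, expected_count=4))  # ([1], [2])
```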
A cloud computing system is a new paradigm of transparent distributed system. It handles resources on a large scale in a cost-effective and location-independent manner. Since cloud computing is used in an increasingly broad spectrum of applications, fault-free services are required. Cloud computing becomes more effective and reliable as it becomes more fault-tolerant and more adaptable to demand. Fault tolerance permits a system to continue operating even in a faulty environment. It ensures system reliability by improving the fault detection and recovery mechanisms. Fault tolerance in the cloud is either reactive or proactive. Reactive fault tolerance performs error recovery after faults are detected [59]. Proactive fault tolerance, on the other hand, prevents faults by predicting them beforehand [60]. In both cases, fault tolerance takes effective steps to prevent failure and ensures system reliability [7] by improving the error detection and correction mechanisms [54].
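The difference between the reactive and proactive approaches can be sketched in a few lines. This is a toy illustration under stated assumptions: the VM names, the `RuntimeError` used to signal a fault, the risk scores and the 0.5 threshold are all hypothetical, and real systems use far richer detection and prediction mechanisms.

```python
from typing import Callable, List


def run_reactive(task: Callable[[str], str], vms: List[str]) -> str:
    """Reactive: run on the first VM and fail over to the next one after a fault."""
    last_error = None
    for vm in vms:
        try:
            return task(vm)
        except RuntimeError as err:   # the fault is handled after it has occurred
            last_error = err
    raise RuntimeError("all VMs failed") from last_error


def pick_proactive(vms: List[str], risk: Callable[[str], float], threshold: float) -> List[str]:
    """Proactive: avoid VMs whose predicted failure risk exceeds the threshold."""
    return [vm for vm in vms if risk(vm) <= threshold]


def flaky_task(vm: str) -> str:
    if vm == "vm-1":
        raise RuntimeError("vm-1 is down")
    return f"done on {vm}"


predicted = {"vm-1": 0.9, "vm-2": 0.1}  # made-up risk scores
print(run_reactive(flaky_task, ["vm-1", "vm-2"]))                          # done on vm-2
print(pick_proactive(["vm-1", "vm-2"], lambda vm: predicted[vm], 0.5))     # ['vm-2']
```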
Fault-tolerant techniques consider various parameters in the cloud environment, such as throughput, performance, availability, usability, response time, scalability, reliability, security and the service level agreement (SLA) [61].
Throughput– Throughput is an important metric for defining the performance of different fault-tolerant techniques. It measures the amount of data sent and received per unit of time.
Response Time- The response time is the sum of the input time, processing time and transmission time.
Scalability– Services are scaled up or down depending on the requirements of clients. Scaling can be horizontal or vertical. In this way, the availability of resources is improved smoothly.
Availability- Availability is denoted A(t) and is closely related to the reliability of the system. It is characterized using the mean time to failure (MTTF), defined in Equation 2.1 [62], and the mean time between failures (MTBF).
MTTF = \int_{0}^{\infty} R(t)\,dt    (2.1)
Usability- Usability is the degree to which users are satisfied with the available resources and can use them properly to achieve a goal effectively and efficiently.
Reliability- A reliable system delivers correct or acceptable results within the deadline. The reliability of a system depends on its applications running smoothly. It is denoted R(t) and defined in Equation 2.2 [62].
R(t) = \int_{t}^{\infty} f(x)\,dx    (2.2)
where f(x) is the probability density function of the time to failure.
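A small numerical example may help relate reliability and MTTF. The sketch assumes a constant failure rate, so that R(t) = exp(-rate * t), and takes MTTF as the integral of R(t) from zero to infinity, approximated here over a long finite horizon. It also uses the common steady-state expression MTTF / (MTTF + MTTR), where MTTR is the mean time to repair. The exponential model, the sample numbers and the function names are assumptions for illustration and are not taken from [62].

```python
import math


def reliability(t: float, failure_rate: float) -> float:
    """R(t) for a constant failure rate (exponential model, an assumption)."""
    return math.exp(-failure_rate * t)


def mttf_numeric(failure_rate: float, horizon: float, steps: int = 100_000) -> float:
    """Approximate the integral of R(t) from 0 to horizon with the trapezoidal rule."""
    dt = horizon / steps
    total = 0.5 * (reliability(0.0, failure_rate) + reliability(horizon, failure_rate))
    total += sum(reliability(i * dt, failure_rate) for i in range(1, steps))
    return total * dt


def steady_state_availability(mttf: float, mttr: float) -> float:
    """Fraction of time the system is up, given mean times to failure and repair."""
    return mttf / (mttf + mttr)


rate = 0.001                                   # failures per hour (illustrative)
mttf = mttf_numeric(rate, horizon=20_000.0)
print(round(mttf, 1))                          # close to 1 / rate = 1000 hours
print(steady_state_availability(1000.0, 10.0)) # about 0.99
```

For the exponential model the integral has the closed form 1 / rate, which is why the numerical result lands near 1000 hours.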
Overhead- Overhead is associated with the movement of cloudlets and inter-system communication. It should be minimized for effective fault tolerance in the cloud environment.
41] [42].
Clients: These are typically the computers, mobile phones and thin browser which are used by the end users. These devices can be used by the end user to accomplish a task in cloud [6].
Cloud Information Service (CIS): Datacenter is registered in cloud information service (CIS) and the information of the cloud components are stored in a table of CIS.
Datacenter Broker: The datacenter broker acts as a coordinator between software-as-as-services (SaaS) and cloud providers. The main responsibility of broker collects the available resources and provides quality of service to clients of cloud system. CIS sends the acknowledgement to the broker about available resources of cloud [14].
Datacenter: Broker connects to datacenter. A datacenter is a collection of virtualized hosts, virtual machines, processing elements, virtual networks and virtual storage. A datacenter consists of architecture X86, operating system, and virtual machine monitor (VMM), host list, memory, bandwidth and storage [43] etc.
Host: A host consists of multiple virtual machines. The capacity of host is defined million instructions per second. There are multiple virtual machines in a host. The processing elements are processed the cloudlets and sends the result to broker [44].
Virtual Machine: The VMs are allocated in a host with the best-fit mechanism. The parameters of the host are processing capacity usually measured in million instructions per second (MIPS), memory size is in megabyte (MB), storage size is in terabyte (TB) and communication bandwidth is in megabyte per second (Mbps) [2].
Cloudlet: The cloudlet is an application which consists of million instructions (it is also known as a task such as social networking, content delivery and business application etc.). These cloudlets are executed by processing element (PE). The parameters of a cloudlet are cloudlet Id, user Id, length (in million instructions), number of PEs, input and output size (in MB). Sixthly, the cloudlets are submitted to virtual machine and are executed by PEs [45].
Different types of faults may occur in cloud computing. For example, transient and intermittent faults may occur in the processing elements (PEs) of virtual machines (VMs) [8], and they may also occur in hosts and datacenters [7]. Faults usually occur in processing elements and memory modules. A host, datacenter or VM may become unavailable because of a full disk or a disk error [46]. Permanent faults are physical damage to a host, PEs or memory [47].
Figure 2.3: Single-bit, multiple-bit and burst-bit errors [48].
Data must be transmitted with acceptable accuracy, but it may be corrupted during transmission. For this reason, an error detection mechanism is necessary so that the received data are free of single-bit, multiple-bit and burst-bit errors [15]. A single-bit error occurs when one bit of the data is altered, either from 1 to 0 or from 0 to 1, as shown in Figure 2.3 (a). A multiple-bit error occurs when two or more bits at different, non-consecutive positions are altered, as shown in Figure 2.3 (b). A burst-bit error occurs when multiple consecutive bits (e.g. B5, B4, B3 and B2) are erroneous, as shown in Figure 2.3 (c) [20].
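The three error classes can be told apart by comparing the sent and received words bit by bit. The following Python sketch is illustrative only; it follows the definitions used in this section (a burst is a run of consecutive erroneous bits), and the function name classify_bit_error is hypothetical.

```python
def classify_bit_error(sent: int, received: int) -> str:
    """Classify the difference between two words as defined in this section."""
    diff = sent ^ received  # a 1 marks every bit position that changed
    if diff == 0:
        return "no error"
    flipped = bin(diff).count("1")
    if flipped == 1:
        return "single-bit error"
    # Drop trailing zeros; a contiguous run of ones then has the form 2**k - 1,
    # which is exactly the case where run & (run + 1) equals zero.
    lowest_set = (diff & -diff).bit_length() - 1
    run = diff >> lowest_set
    if run & (run + 1) == 0:
        return "burst-bit error"
    return "multiple-bit error"


sent = 0b10110010
print(classify_bit_error(sent, sent ^ 0b00000100))  # one bit flipped
print(classify_bit_error(sent, sent ^ 0b01000100))  # two separated bits flipped
print(classify_bit_error(sent, sent ^ 0b00111100))  # B5..B2 flipped
```

In practice the receiver does not know the sent word, so this comparison only illustrates the taxonomy; detection on a real channel relies on redundancy such as parity bits or checksums.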
Figure 2.4: Taxonomy of different types of faults occurring in cloud computing [49] [50].
Faults occur at different locations and layers of cloud computing. They are divided into three main categories: (i) hardware faults, (ii) communication faults and (iii) software faults, as shown in Figure 2.4. Faults prevent cloud users from receiving the best services.
The hardware faults are transient faults, intermittent faults, disk-full faults, processing element (PE) faults, memory faults and storage faults [51] [52]. Transient faults may affect network connectivity, service availability and response time. For example, a virtual machine may be down or over-utilized, hardware may be damaged, or an acknowledgement may time out.
Hardware faults that occur in the cloud environment include:
1. Transient faults may cause the following errors:
Ø Single bit
Ø Multiple bits
Ø Burst bits
2. Intermittent faults occur suddenly and last for a short period of time. They may cause a device to malfunction and are resolved by replacing the device.
3. Permanent faults are physical damage to:
Ø Machine
Ø Processor elements (PEs)
Ø Memory
4. Physical machine fault – A physical machine hosts multiple virtual machines, processor elements and memory. When a permanent fault occurs in it, the machine can no longer provide proper service.
5. Disk full – This is a kind of omission fault. When the disk is full, no data can be written.
6. Machine fault – Permanent faults occur in the memory or processor elements of a machine and may bring the service down.
7. Processor element fault – The processing capacity of a processor element is expressed in million instructions per second (MIPS). A PE may become overloaded while cloudlets are being executed.
8. Memory fault – Transient faults cause single-bit, multiple-bit and burst-bit errors in memory, which can be cleared by rewriting the affected data.
Siddiqui [46] proposed a single-bit error detection scheme based on hardware fault tolerance for large volumes of data in cloud computing. The scheme uses a concurrent error detection (CED) mechanism that is able to detect hardware faults. D. Mittal et al. [53] proposed a heartbeat protocol that is used to detect hardware and software faults. M. K. Gokhroo et al. [54] proposed fault detection and mitigation using two fault detection time algorithms (fault detection time-algorithm 1 and time-algorithm 2). These two algorithms identify faults and correct them.
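The general idea behind heartbeat-based detection can be sketched as follows. This is a generic illustration and not the protocol of [53]: it assumes that each node periodically reports a timestamp and that a node is suspected faulty once it has been silent for longer than a timeout. The class HeartbeatMonitor, its methods and the node names are hypothetical.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Track the last heartbeat of each node and flag nodes that went silent."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        # Record the time at which the node last reported that it is alive.
        self.last_seen[node] = now

    def suspected(self, now: float) -> List[str]:
        # A node is suspected when its silence exceeds the timeout.
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.beat("vm-1", now=0.0)
monitor.beat("vm-2", now=0.0)
monitor.beat("vm-1", now=2.5)  # vm-1 keeps reporting, vm-2 goes silent
print(monitor.suspected(now=4.0))
```

Time is passed in explicitly to keep the sketch deterministic; a deployed monitor would read a monotonic clock and would have to choose the timeout so that slow but healthy nodes are not flagged.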
Software faults include software state transition faults, early and late timing faults, timing overhead, protocol incompatibilities, data faults, logical faults, numerical exceptions, operating system faults, link timeout faults, user-defined exceptions and unhandled exceptions [16] [51] [55].
The following types of software faults occur in the cloud computing environment:
1. Data faults – Faults in raw data appear as single-bit, multiple-bit and burst-bit errors.
2. Unexpected input – When data are taken as input from different sources, various unexpected errors can occur.
3. Logical faults – Logical faults represent the explicit or implicit effect of physical errors on the behavior of a system. The system cannot detect them in any specific way.
4. Operating system fault – The operating system may be corrupted or affected by malware, viruses, spyware or virus-infected software. Such faults change the control flow and the data flow.
5. Numerical exception – A numerical exception occurs when a mathematical operation yields an infinite value. Sometimes the sign of a value changes and causes unexpected events.
6. User-defined exception – In software operation, faults are recovered with mechanisms such as the try-catch-throw method, blocking mechanisms, N-version programming and self-checking methods.
7. Application faults – An application fault is unexpected behavior of an application, in which the expected outcome is not produced.
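Numerical exceptions and user-defined exceptions from the list above can be combined in a short try-catch-throw sketch. It is illustrative only: the exception class CloudletFault and the functions safe_exp and run_with_recovery are hypothetical names, and returning a fallback value is just one possible recovery policy.

```python
import math


class CloudletFault(Exception):
    """Hypothetical user-defined exception for a failed computation."""


def safe_exp(x: float) -> float:
    """Compute e**x and turn numerical exceptions into a CloudletFault."""
    try:
        value = math.exp(x)  # raises OverflowError when the result is too large
    except OverflowError as exc:
        raise CloudletFault(f"overflow for input {x}") from exc
    if math.isinf(value) or math.isnan(value):
        # math.exp(inf) returns inf without raising, so check explicitly.
        raise CloudletFault(f"non-finite result for input {x}")
    return value


def run_with_recovery(x: float, fallback: float) -> float:
    """Catch the user-defined exception and recover with a fallback value."""
    try:
        return safe_exp(x)
    except CloudletFault:
        return fallback


print(run_with_recovery(1.0, fallback=-1.0))     # normal result
print(run_with_recovery(1000.0, fallback=-1.0))  # recovered from overflow
```

Wrapping the low-level error in a domain-specific exception lets the caller handle all computation faults in one place, which is the essence of the try-catch-throw method.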
Chinnaiah [55] proposed an algorithm that achieves reliability for depth-critical configurations of a software system, in which faults cause unpredictable behavior and performance anomalies. A. Ledmi et al. [56] discussed optimizing fault tolerance in distributed systems. M. A. Rouf et al. [12] discussed state-of-the-art techniques for combating soft errors, broadly categorized into three types: (i) software-based schemes, (ii) hardware-based schemes, and (iii) hardware and software co-design schemes. S. Jaswal et al. [57] proposed a model for fault tolerance in the cloud environment. The trust model is accessed mostly on demand and supports protecting communication in cloud computing.
Communication faults include send and receive omission faults, early or late timing faults, packet corruption and packet loss. Transient faults may cause single-bit, multiple-bit and burst-bit errors [58].
Some communication faults are given below [20]:
1. Omission faults – A directional fault that arises from a denial-of-service attack or a full disk is called an omission fault.
2. Packet corruption – Single-bit, multiple-bit and burst-bit errors corrupt packets while data are transmitted through the channel, which changes the control flow and the data flow. Packet corruption is mainly caused by noise and temperature [15].
3. Packet loss – Packet loss occurs when one or more data packets fail to arrive at the destination because of network errors [15].
4. Timeout fault – A timeout fault is an early or late timing fault: the acknowledgement does not arrive in time and the session on the link times out [13].
5. Network congestion – Network congestion occurs when network traffic exceeds capacity and the bandwidth is insufficient [28].
6. Protocol incompatibilities – Policy violations occur between two heterogeneous networks [28].
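Packet corruption is commonly detected by appending a checksum to each packet. The sketch below uses the CRC-32 function from Python's standard zlib module; the packet layout (payload followed by a 4-byte big-endian CRC) and the function names make_packet and is_intact are assumptions made for illustration, not a scheme from the cited works.

```python
import zlib


def make_packet(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def is_intact(packet: bytes) -> bool:
    """Recompute the CRC-32 of the payload and compare it with the trailer."""
    if len(packet) < 4:
        return False  # too short to contain a trailer
    payload, trailer = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer


packet = make_packet(b"cloudlet result")
corrupted = bytearray(packet)
corrupted[0] ^= 0x01  # flip a single bit, as a transient fault might

print(is_intact(packet))            # True
print(is_intact(bytes(corrupted)))  # False
```

A checksum only detects corruption; recovering from it still requires retransmission or an error-correcting code.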
Cloud computing is a new paradigm of transparent distributed systems. It handles resources on a large scale in a cost-effective and location-independent manner. Because cloud computing is used in an increasingly broad spectrum of applications, fault-free services are required. Cloud computing becomes more effective and reliable as it becomes more fault-tolerant and more adaptable to demand. Fault tolerance permits a system to continue operating even in a faulty environment and takes effective steps to prevent failure. It ensures system reliability [7] by improving the mechanisms for fault detection and recovery, including error detection and correction [54]. Fault tolerance in the cloud is either reactive or proactive. Reactive fault tolerance performs error recovery after faults are detected [59], whereas proactive fault tolerance prevents faults by predicting them beforehand [60].
Fault-tolerant techniques consider various parameters in the cloud environment, such as throughput, performance, availability, usability, response time, scalability, reliability, security and service level agreement (SLA) [61].
· Throughput – Throughput is an important metric for the performance of fault-tolerant techniques. It measures the amount of data sent and received per unit of time.
· Response Time – Response time is the sum of the input time, processing time and transmission time.
· Scalability – Services are scaled up or down, horizontally or vertically, depending on the requirements of clients. As a result, the availability of resources improves smoothly.
· Availability – Availability is denoted A(t) and increases with the reliability of a system. The mean time to failure (MTTF) is defined in Equation 2.1 [62], where R(t) is the reliability function; the ratio of MTTF to the mean time between failures (MTBF) is called the availability.
$$\mathrm{MTTF} = \int_{0}^{\infty} R(t)\,dt \tag{2.1}$$
· Usability – Usability is the degree of user satisfaction with the available resources and with their proper utilization to achieve a goal effectively and efficiently.
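· Reliability – A reliable system delivers correct or acceptable results within the deadline. The reliability of a system depends on its applications running smoothly. It is denoted R(t) and defined in Equation 2.2 [62], where f is the failure probability density function.
$$R(t) = \int_{t}^{\infty} f(\tau)\,d\tau \tag{2.2}$$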
· Overhead – Overhead is associated with the movement of cloudlets and inter-system communication. It should be minimized for effective fault tolerance in the cloud environment.
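As a worked illustration of the reliability and availability metrics, the sketch below assumes a constant failure rate λ, so that R(t) = exp(-λt), uses the common definition of MTTF as the integral of R(t) from 0 to infinity, and computes steady-state availability as MTTF / (MTTF + MTTR), where MTTR is the mean time to repair. The failure rate, the repair time and all function names are hypothetical values chosen for the example.

```python
import math


def reliability(t: float, failure_rate: float) -> float:
    """R(t) under the assumed exponential (constant failure rate) model."""
    return math.exp(-failure_rate * t)


def mttf_numeric(failure_rate: float, horizon: float, steps: int = 100_000) -> float:
    """Approximate the integral of R(t) over [0, horizon] (trapezoidal rule)."""
    dt = horizon / steps
    total = 0.5 * (reliability(0.0, failure_rate) + reliability(horizon, failure_rate))
    for i in range(1, steps):
        total += reliability(i * dt, failure_rate)
    return total * dt


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability from mean time to failure and to repair."""
    return mttf / (mttf + mttr)


failure_rate = 0.001  # assumed: one failure per 1000 hours on average
mttf = mttf_numeric(failure_rate, horizon=20_000.0)
print(round(mttf, 2), round(availability(mttf, mttr=10.0), 4))
```

For the exponential model the integral has the closed form 1/λ, so the numerical result is approximately 1000 hours; with an assumed repair time of 10 hours the availability is about 0.99.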