Elastic Load Balancing automatically distributes incoming traffic across multiple healthy EC2 instances.
ELB serves as a single point of contact to the client.
Increases application availability by allowing EC2 instances to be added or removed across one or more Availability Zones.
ELB benefits
is itself a distributed system that is fault tolerant and actively monitored
abstracts out the complexity of managing, maintaining, and scaling load balancers
can also serve as the first line of defense against network attacks.
can offload the work of encryption and decryption (SSL termination).
offers integration with Auto Scaling.
By default, routes each request independently to the registered instance with the smallest load.
If an EC2 instance fails, ELB automatically reroutes the traffic to the remaining running healthy EC2 instances.
If a failed EC2 instance is restored, Elastic Load Balancing restores the traffic to that instance.
Load Balancers only work across AZs within a region.
Elastic Load Balancing allows subnets to be added and creates a load balancer node in each Availability Zone where a subnet resides.
Elastic Load Balancer should have at least one subnet attached.
Only one subnet per AZ can be attached to the ELB. Attaching a subnet with an AZ already attached replaces the existing subnet.
Each subnet must have a CIDR block with at least a /27 bitmask and at least 8 free IP addresses, which ELB uses to establish connections with the back-end instances.
For High Availability, it is recommended to attach one subnet per AZ for at least two AZs, even if the instances are in a single subnet.
Subnets can be attached to or detached from the ELB, and it will start or stop sending requests to the instances in those subnets accordingly.
Security groups & NACLs should allow Inbound traffic, on the load balancer listener port, from the Client for an Internet ELB or VPC CIDR for an Internal ELB.
Security groups & NACLs should allow Outbound traffic to the back-end instances on both the instance listener port and the health check port.
NACLs, in addition, should allow responses on the ephemeral ports
All EC2 instances should allow incoming traffic from ELB.
For each request that a client makes through a load balancer, the load balancer maintains two connections: one with the client and one with the back-end instance.
For each connection, the load balancer manages an idle timeout; if no data has been sent or received over the connection for the idle timeout period (default 60 seconds), the load balancer closes the connection.
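As an illustration, a minimal boto3 sketch of raising the idle timeout on a Classic Load Balancer; "my-clb" is a placeholder name.

```python
import boto3

elb = boto3.client("elb")
elb.modify_load_balancer_attributes(
    LoadBalancerName="my-clb",
    LoadBalancerAttributes={
        "ConnectionSettings": {"IdleTimeout": 120}  # seconds; default is 60
    },
)
```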
Deleting a load balancer does not affect the instances registered with the load balancer and they would continue to run.
Elastic Load Balancing provides access logs that capture detailed information about requests sent to your load balancer. Each log contains information on when the request was received, the client's IP address, latencies, request paths, and server responses. You can use these access logs to analyze traffic patterns and to troubleshoot issues.
Launches or terminates instances based on specified conditions
Automatically registers new instances with specified load balancers.
Can launch across Availability Zones.
Can leverage On-Demand, Reserved, and Spot Instances
We recommend starting with minimum and desired capacities of 2 instances (1 per Availability Zone).
The time required for Elastic Load Balancing to scale can range from 1 to 7 minutes, depending on the changes in the traffic profile.
When Elastic Load Balancing scales, it updates the DNS record with the new list of IP addresses.
To ensure that clients are taking advantage of the increased capacity, Elastic Load Balancing uses a TTL setting on the DNS record of 60 seconds. It is critical that you factor this changing DNS record into your tests.
If you do not ensure that DNS is re-resolved or use multiple test clients to simulate increased load, the test may continue to hit a single IP address when Elastic Load Balancing has actually allocated many more IP addresses. Because your end-users will not all be resolving to that single IP address, your test will not be a realistic sampling of real-world behavior.
Properties:
Minimum Capacity: This is the lowest number of instances the group can have. If the group is already at this size and a CloudWatch alarm asks Auto Scaling to scale in further, Auto Scaling won't scale in.
Maximum Capacity: This is the most instances the group can have. If the group is already at this size and CloudWatch alarms ask it to scale out, Auto Scaling will not add more instances.
Desired Capacity: When you initially create your Auto Scaling group, the desired capacity will be the number of instances your group begins with. As CloudWatch alarms go off and request Auto Scaling to scale, Auto Scaling will change the desired capacity to whatever quantity of instances it needs to scale in or out to.
For example, you start your group with the following settings: Min: 2, Max: 10 and Desired: 5. Auto Scaling launches 5 instances.
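A minimal boto3 sketch of creating a group with those settings; the launch configuration name and subnet IDs are placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling")
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="my-asg",
    LaunchConfigurationName="my-launch-config",   # placeholder
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=5,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # one subnet per AZ
)
```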
Schedule Scaling:
Scale based on schedule; scale your application ahead of known load changes.
Example: Turning off your dev and test instances at night
Dynamic Scaling:
Excellent for general scaling
Allows your scaling to respond to unanticipated changes in traffic
Example: Scaling based on a CPU utilization CloudWatch alarm
Predictive Scaling:
Easiest to use
Scales based on machine learning algorithms
Example: Want to eliminate manual monitoring and adjustment of Auto Scaling
Recommendations:
Avoid thrashing (aggressive instance termination)
Scale out early, scale in slowly
Set the min and max capacity parameters carefully
Use lifecycle hooks (perform custom actions as Auto Scaling launches or terminates instances)
Stateful applications require additional automatic configuration of instances launched into Auto Scaling groups
Configured with a default capacity
The ELB controller is the service that stores all the configuration, monitors the load balancer, and manages the capacity used to handle client requests. ELB increases its capacity by utilizing either larger resources (scale up: resources with higher performance characteristics) or more individual resources (scale out).
AWS itself handles the scaling of ELB capacity; this is separate from the scaling of the EC2 instances to which the ELB routes requests, which is handled by Auto Scaling.
Time required for Elastic Load Balancing to scale can range from 1 to 7 minutes, depending on the changes in the traffic profile.
In certain scenarios, such as when a flash traffic spike is expected or a load test cannot be configured to increase traffic gradually, it is recommended to contact AWS support to have the load balancer “pre-warmed”. AWS needs the start and end dates, the expected request rate per second, and the total size of a typical request/response.
When scaled, Elastic Load Balancing service will update the Domain Name System (DNS) record of the load balancer so that the new resources have their respective IP addresses registered in DNS.
DNS record created includes a Time-to-Live (TTL) setting of 60 seconds.
By default, ELB will return multiple IP addresses when clients perform a DNS resolution, with the records being randomly ordered on each DNS resolution request.
It is recommended that clients re-resolve DNS at least every 60 seconds to take advantage of the increased capacity.
Classic Load Balancer (CLB)
Application Load Balancer (ALB)
Network Load Balancer (NLB)
For an HTTPS load balancer, Elastic Load Balancing uses a Secure Sockets Layer (SSL) negotiation configuration, known as a security policy, to negotiate SSL connections between a client and the load balancer.
A security policy is a combination of SSL protocols, SSL ciphers, and the Server Order Preference option.
ELB supports the following SSL/TLS protocol versions: TLS 1.2, TLS 1.1, TLS 1.0, and SSL 3.0.
To select a cipher, ELB supports the Server Order Preference option for negotiating connections between a client and a load balancer: the load balancer selects the first cipher in its own list that also appears in the client’s list of ciphers.
Elastic Load Balancer allows using a Predefined Security Policy or creating a Custom Security Policy for specific needs. If none is specified, ELB selects the latest Predefined Security Policy.
Originally, Application Load Balancers used to support only one certificate for a standard HTTPS listener (port 443). You had to use Wildcard or Multi-Domain (SAN) certificates to host multiple secure applications behind the same load balancer. The potential security risks with Wildcard certificates and the operational overhead of managing Multi-Domain certificates presented challenges.
With SNI (Server Name Indication) support you can associate multiple certificates with a listener and each secure application behind a load balancer can use its own certificate. You can use host conditions to define rules that forward requests to different target groups based on the hostname in the host header (also known as host-based routing). This enables you to support multiple domains using a single load balancer.
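A hedged boto3 sketch of what that looks like in practice: adding a second certificate to an existing HTTPS listener (SNI) and a host-based rule. All ARNs and hostnames are placeholders.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Attach an additional certificate to the HTTPS listener (SNI).
elbv2.add_listener_certificates(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/...",
    Certificates=[{"CertificateArn": "arn:aws:acm:...:certificate/..."}],
)

# Route requests for the second domain to its own target group (host-based routing).
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/...",
    Priority=10,
    Conditions=[{"Field": "host-header", "Values": ["api.example.com"]}],
    Actions=[{"Type": "forward",
              "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/api/..."}],
)
```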
Load balancer performs health checks on all registered instances, whether an instance is in a healthy or unhealthy state, to discover the availability of the EC2 instances (the load balancer periodically sends pings, attempts connections, or sends requests to health check the EC2 instances).
The health check status is InService for healthy instances and OutOfService for unhealthy ones.
Load balancer sends a request to each registered instance at the Ping Protocol, Ping Port and Ping Path every HealthCheck Interval seconds. It waits for the instance to respond within the Response Timeout period.
If the health checks exceed the Unhealthy Threshold for consecutive failed responses, the load balancer takes the instance out of service.
When the health checks exceed the Healthy Threshold for consecutive successful responses, the load balancer puts the instance back in service.
Load balancer only sends requests to the healthy EC2 instances and stops routing requests to the unhealthy instances.
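A boto3 sketch of a Classic ELB health check using those settings; the load balancer name and ping target are placeholders.

```python
import boto3

elb = boto3.client("elb")
elb.configure_health_check(
    LoadBalancerName="my-clb",
    HealthCheck={
        "Target": "HTTP:80/health",   # Ping Protocol, Ping Port, Ping Path
        "Interval": 30,               # HealthCheck Interval (seconds)
        "Timeout": 5,                 # Response Timeout (seconds)
        "UnhealthyThreshold": 2,      # consecutive failures before OutOfService
        "HealthyThreshold": 3,        # consecutive successes before InService
    },
)
```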
When you associate the ELB with an ASG, you allow the ASG to receive traffic from that ELB. When the health check type is ELB, the ASG becomes aware of unhealthy instances and terminates them.
A listener is the process that checks for connection requests from clients, using the configured protocol and port.
Listeners are configured with a protocol and a port for front-end (client to load balancer) connections, and a protocol and a port for back-end (load balancer to back-end instance) connections.
Listeners support HTTP, HTTPS, SSL, TCP protocols.
An X.509 certificate is required for HTTPS or SSL connections; the load balancer uses the certificate to terminate the connection and then decrypt requests from clients before sending them to the back-end instances.
If you want to use SSL, but don’t want to terminate the connection on the load balancer, use TCP for connections from the client to the load balancer, use the SSL protocol for connections from the load balancer to the back-end application, and deploy certificates on the back-end instances handling requests.
If you use an HTTPS/SSL connection for your back end, you can enable authentication on the back-end instance. This authentication can be used to ensure that back-end instances accept only encrypted communication, and to ensure that the back-end instance has the correct certificates.
ELB HTTPS listener does not support Client-Side SSL certificates.
As the Elastic Load Balancer intercepts the traffic between the client and the back end servers, the "back end server does not know the IP address, Protocol and the Port used between the Client and the Load balancer".
ELB provides "X-Forwarded headers" support to help back end servers track the same when using "HTTP" protocol;
X-Forwarded-For; request header to help back end servers identify the IP address of a client when you use an (HTTP/S) load balancer.
X-Forwarded-Proto; request header to help back end servers identify the protocol (HTTP/S) that a client used to connect to the server.
X-Forwarded-Port; request header to help back end servers identify the port that an (HTTP/S) load balancer uses to connect to the client.
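A small sketch of how a back-end application might recover the originating client IP from X-Forwarded-For; the header carries a comma-separated chain and the left-most entry is the client.

```python
# Sketch: the left-most entry of X-Forwarded-For is the originating client.
def client_ip(headers: dict) -> str:
    forwarded_for = headers.get("X-Forwarded-For", "")
    if forwarded_for:
        return forwarded_for.split(",")[0].strip()
    # Without the header, only the load balancer's address is visible.
    return headers.get("Remote-Addr", "")
```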
ELB provides "Proxy Protocol" support to help back end servers track the same when using "non-HTTP protocol" (TCP) , or when using "HTTPS" and not terminating the SSL connection on the load balancer.
Proxy Protocol; is an Internet protocol used to carry connection information from the source requesting the connection to the destination for which the connection was requested.
ELB uses Proxy Protocol version 1, which uses a human-readable header format with connection information such as the source IP address, destination IP address, and port numbers.
If the ELB is already behind a Proxy with the Proxy protocol enabled, enabling the Proxy Protocol on ELB would add the header twice.
the Proxy Protocol header helps you identify the IP address of a client when you have a load balancer that uses TCP for back-end connections. Because load balancers intercept traffic between clients and your instances, the access logs from your instance contain the load balancer's IP address instead of the originating client. You can parse the first line of the request to retrieve your client's IP address and the port number.
By default, the load balancer distributes incoming requests evenly across its enabled Availability Zones for e.g. If AZ-a has 5 instances and AZ-b has 2 instances, the load will still be distributed 50% across each of the AZs
Enabling Cross-Zone load balancing allows the ELB to distribute incoming requests evenly across all the back-end instances, regardless of the AZ.
Cross-zone load balancer reduces the need to maintain equivalent numbers of back-end instances in each Availability Zone, and improves application’s ability to handle the loss of one or more back-end instances.
It is still recommended to maintain approximately equivalent numbers of instances in each Availability Zone for higher fault tolerance.
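A boto3 sketch of enabling cross-zone load balancing on a Classic Load Balancer; the name is a placeholder (ALBs have it enabled by default).

```python
import boto3

elb = boto3.client("elb")
elb.modify_load_balancer_attributes(
    LoadBalancerName="my-clb",
    LoadBalancerAttributes={"CrossZoneLoadBalancing": {"Enabled": True}},
)
```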
By default, if an EC2 instance registered with the ELB is deregistered or becomes unhealthy, the load balancer immediately "closes the connection".
Connection draining can help the load balancer to complete the in-flight requests made while keeping the existing connections open, and preventing any new requests being sent to the instances that are de-registering or unhealthy.
Connection draining helps perform maintenance such as deploying software upgrades or replacing back-end instances without affecting customers’ experience.
Connection draining allows you to specify a maximum time (between 1 and 3,600 seconds and default 300 seconds) to keep the connections alive before reporting the instance as de-registered. Maximum timeout limit does not apply to connections to unhealthy instances.
If the instances are part of an Auto Scaling group and connection draining is enabled for your load balancer, Auto Scaling waits for the in-flight requests to complete, or for the maximum timeout to expire, before terminating instances due to a scaling event or health check replacement.
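A boto3 sketch of enabling connection draining with the 300-second default timeout on a Classic Load Balancer; the name is a placeholder.

```python
import boto3

elb = boto3.client("elb")
elb.modify_load_balancer_attributes(
    LoadBalancerName="my-clb",
    LoadBalancerAttributes={
        "ConnectionDraining": {"Enabled": True, "Timeout": 300}  # seconds
    },
)
```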
Provide basic load balancing across Amazon EC2 instances
Support load balancing across multiple Availability Zones
Operate at both the application layer (layer 7) and the connection layer (layer 4) of the OSI model.
Sticky sessions enable the load balancer to bind a user’s session to an instance and ensure all requests are sent to the same instance.
Stickiness remains for a period of time, which can be controlled by the application’s session cookie, if one exists, or through a cookie named AWSELB created by the Elastic Load Balancer.
Sticky sessions for ELB are disabled, by default.
Requirements: an HTTP/HTTPS load balancer; SSL traffic terminated on the ELB (ELB implements session stickiness on an HTTP/HTTPS listener by utilizing an HTTP cookie); and at least one healthy instance in each Availability Zone.
Stickiness policy configuration defines a cookie expiration, which establishes the duration of validity for each cookie.
If the application cookie is explicitly removed or expires, the session stops being sticky until a new application cookie is issued.
If an instance fails or becomes unhealthy, the load balancer stops routing request to that instance, instead chooses a new healthy instance based on the existing load balancing algorithm. The load balancer treats the session as now “stuck” to the new healthy instance, and continues routing requests to that instance even if the failed instance comes back.
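A boto3 sketch of duration-based stickiness (the AWSELB cookie) on a Classic Load Balancer HTTPS listener; the load balancer name, policy name, and port are placeholders.

```python
import boto3

elb = boto3.client("elb")

# Create a duration-based stickiness policy (AWSELB cookie, valid for 5 minutes).
elb.create_lb_cookie_stickiness_policy(
    LoadBalancerName="my-clb",
    PolicyName="my-sticky-policy",
    CookieExpirationPeriod=300,
)

# Attach the policy to the HTTPS listener.
elb.set_load_balancer_policies_of_listener(
    LoadBalancerName="my-clb",
    LoadBalancerPort=443,
    PolicyNames=["my-sticky-policy"],
)
```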
Operates at the layer 7 (application layer) and allows defining routing rules based on content across multiple services or containers running on one or more EC2 instances.
Support for Path-based routing, where listener rules can be configured to forward requests based on the URL in the request.
Support for routing requests to multiple services on a single EC2 instance by registering the instance using multiple ports.
Support for containerized applications. EC2 Container Service (ECS) can select an unused port when scheduling a task and register the task with a target group using this port, enabling efficient use of the clusters.
Support for monitoring the health of each service independently, as health checks are defined at the target group level and many CloudWatch metrics are reported at the target group level.
Attaching a target group to an Auto Scaling group enables you to scale each service dynamically based on demand.
Supports HTTP, HTTPS (Secure HTTP), HTTP/2 (also with TLS) protocols.
Supports WebSockets and Secure WebSockets natively for HTTP and HTTPS.
Supports Request tracing, by default.
Supports Sticky Sessions (Session Affinity).
Supports SSL termination, to decrypt the request on ALB before sending it to the underlying targets.
Supports layer 7 specific features like X-Forwarded-For headers to help determine the actual client IP, port and protocol.
Automatically scales its request handling capacity in response to incoming application traffic.
High Availability, by allowing you to specify more than one AZ.
Integrates with ACM to provision and bind an SSL/TLS certificate to the load balancer, thereby making the entire SSL offload process very easy.
Supports IPv6 addressing, for an Internet facing load balancer.
Supports Request Tracking, wherein a new custom identifier “X-Amzn-Trace-Id” HTTP header is injected on all requests to help track the request flow across various services.
Provides Access Logs, to record all requests sent to the load balancer and store the logs in S3, in compressed format, for later analysis.
Provides Delete Protection, to prevent the ALB from accidental deletion.
Supports Connection Idle Timeout
Integrates with CloudWatch to provide metrics such as request counts, error counts, error types, and request latency.
Integrates with AWS WAF, a web application firewall that helps protect web applications from attacks by allowing rules configuration based on IP addresses, HTTP headers, and custom URI strings.
Integrates with CloudTrail to receive a history of ALB API calls made on the AWS account.
Listener supports HTTP & HTTPS protocol with Ports from 1-65535.
ALB supports SSL Termination for HTTPS listener, which helps to offload the work of encryption and decryption so that the targets can focus on their main work.
HTTPS listener supports exactly one SSL server certificate on the listener.
Supports HTTP/2 with HTTPS listeners;
128 requests can be sent in parallel using one HTTP/2 connection.
ALB converts these to individual HTTP/1.1 requests and distributes them across the healthy targets in the target group using the round robin routing algorithm.
HTTP/2 uses front-end connections more efficiently resulting in fewer connections between clients and the load balancer.
Server-push feature of HTTP/2 is not supported.
Each listener has a default rule, and additional rules can optionally be defined. Each rule has a:
Priority – Rules are evaluated in priority order, from the lowest value to the highest value. The default rule has lowest priority.
Action – Each rule action has a type and a target group. Currently, the only supported type is forward. You can change the target group for a rule at any time.
Condition – Host condition: based on the host name in the host header; each host condition has one hostname, enabling support for multiple domains using a single ALB. Path condition (path-based routing): routes to different target groups based on the URL in the request; each path condition has one path pattern.
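A boto3 sketch of a path-based rule on an ALB listener; the ARNs are placeholders.

```python
import boto3

elbv2 = boto3.client("elbv2")
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/...",
    Priority=20,  # lower values are evaluated first; the default rule is evaluated last
    Conditions=[{"Field": "path-pattern", "Values": ["/images/*"]}],
    Actions=[{"Type": "forward",
              "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/images/..."}],
)
```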
Advantages over Classic Load Balancer;
Support for path-based routing & Support for host-based routing.
Support for routing requests to multiple applications on a single EC2 instance. Each instance or IP address can be registered with the same target group using multiple ports.
Support for registering targets by IP address, including targets outside the VPC for the load balancer.
Support containerized applications with ECS using Dynamic port mapping
Support monitoring the health of each service independently, as health checks and many CloudWatch metrics are defined at the target group level.
Attaching of target group to an Auto Scaling group enables scaling of each service dynamically based on demand.
Access logs contain additional information & stored in compressed format.
Improved load balancer performance.
Network Load Balancer operates at the connection level (Layer 4), routing connections to targets – EC2 instances, containers and IP addresses based on IP protocol data.
Network Load Balancer is suited for load balancing of TCP traffic.
NLB is capable of handling millions of requests per second while maintaining ultra-low latencies, being highly available.
NLB is optimized to handle sudden and volatile traffic patterns while using a single static IP address per Availability Zone.
Accept incoming traffic from clients and distribute this traffic across targets within the same Availability Zone.
API-compatible with Application Load Balancers, including full programmatic control of target groups and targets
NLB is integrated with other AWS services such as Auto Scaling, microservices , EC2 Container Service (ECS), CloudFormation, CodeDeploy, and AWS Config.
Distributes this traffic across the targets within the same Availability Zone.
Monitors the health of its registered targets and routes the traffic only to healthy targets.
If a health check fails and an unhealthy target is detected, it stops routing traffic to that target and reroutes traffic to remaining healthy targets.
if configured with multiple AZs and if all the targets in a single AZ fail, it routes traffic to healthy targets in the other AZs.
Preserves client side source IP allowing the back-end to see client IP address.
Provides a static IP per Availability Zone (subnet) that can be used by applications as the front-end IP of the load balancer.
Elastic IP per Availability Zone (subnet) can also be assigned, optionally, thereby providing a fixed IP.
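A boto3 sketch of creating an NLB with one Elastic IP per AZ via subnet mappings; the subnet and allocation IDs are placeholders.

```python
import boto3

elbv2 = boto3.client("elbv2")
elbv2.create_load_balancer(
    Name="my-nlb",
    Type="network",
    Scheme="internet-facing",
    SubnetMappings=[
        {"SubnetId": "subnet-aaaa1111", "AllocationId": "eipalloc-11111111"},
        {"SubnetId": "subnet-bbbb2222", "AllocationId": "eipalloc-22222222"},
    ],
)
```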
Supports network and application target health checks.
Network-level health check: is based on the overall response of the underlying target (instance or a container) to normal traffic. Target is marked unavailable if it is slow or unable to respond to new connection requests
Application-level health check: is based on a specific URL on a given target to test the application health deeper.
DNS Fail-over integrates with Route 53.
Route 53 will direct traffic to load balancer nodes in other AZs, if there are no healthy targets with NLB or if the NLB itself is unhealthy.
if NLB is unresponsive, Route 53 will remove the unavailable load balancer IP address from service and direct traffic to an alternate Network Load Balancer in another "Region".
Supports long-lived TCP connections ideal for WebSocket type of applications.
Uses the same API as Application Load Balancer; enables you to work with target groups, health checks, and load balance across multiple ports on the same EC2 instance to support containerized applications.
Integrated with CloudWatch to provide metrics such as Active Flow count, Healthy Host Count, New Flow Count, Processed bytes, and more.
Integrated with CloudTrail to track API calls to the NLB.
Use the VPC Flow Logs feature to record the traffic sent to the load balancer; it captures information about the IP traffic going to and from network interfaces in the VPC and is stored using CloudWatch Logs.
Zonal Isolation: the NLB is designed for application architectures in a single zone and automatically fails over to other healthy AZs if something fails in an AZ. It is still recommended to configure the load balancer and targets in multiple AZs for achieving high availability.
Load Balancing using IP addresses as Targets allows load balancing of any application hosted in AWS or on-premises using IP addresses of the application backends as targets. Helps migrate-to-cloud, burst-to-cloud or failover-to-cloud.
Applications hosted in on-premises locations can be used as targets over a Direct Connect connection and EC2-Classic (using ClassicLink).
Advantages over Classic Load Balancer
Ability to handle volatile workloads and scale to millions of requests per second, without the need of pre-warming.
Support for static IP/Elastic IP addresses for the load balancer.
Support for registering targets by IP address, including targets outside the VPC (on-premises) for the load balancer.
Support for routing requests to multiple applications on a single EC2 instance. Single instance or IP address can be registered with the same target group using multiple ports.
Support for containerized applications. Using Dynamic port mapping, ECS can select an unused port when scheduling a task and register the task with a target group using this port.
Support for monitoring the health of each service independently, as health checks are defined at the target group level and many CloudWatch metrics are reported at the target group level.
Attaching a target group to an Auto Scaling group enables scaling each service dynamically based on demand.
When you create a TLS listener, you must select a security policy.
Network Load Balancers do not support custom security policies.
A Network Load Balancer security policy consists of protocols and ciphers. Example: ELBSecurityPolicy-TLS-1-0-2015-04
The preferred secure option to provision and store certificates for terminating TLS on a Network Load Balancer is to use a single certificate per TLS listener, provided by AWS Certificate Manager.
If you want HTTPS clients to be authenticated by a web server using client certificate authentication, either configure the ELB with TCP listeners on TCP/443 and place the web servers behind it, or skip the ELB and have clients communicate with the web server directly, or set up a Route 53 record set with the public IP address (EIP) of the web server(s) so that client requests are routed directly to the web server(s).
Elastic Load Balancing publishes data points to Amazon CloudWatch about your load balancers and back-end instances. Elastic Load Balancing reports metrics to CloudWatch in 60-second intervals, and only when requests are flowing through the load balancer. If there are no requests flowing through the load balancer or no data for a metric, the metric is not reported.
Metrics available:
HealthyHostCount, UnHealthyHostCount: Number of healthy and unhealthy instances registered with the load balancer. Most useful statistics are average, min, and max
RequestCount: Number of requests completed or connections made during the specified interval (1 or 5 minutes). Most useful statistic is sum.
Latency: Time elapsed, in seconds, after the request leaves the load balancer until the headers of the response are received. Most useful statistic is average
SurgeQueueLength: Total number of requests that are pending routing. Load balancer queues a request if it is unable to establish a connection with a healthy instance in order to route the request. Maximum size of the queue is 1,024. Additional requests are rejected when the queue is full.
SpilloverCount; The total number of requests that were rejected because the surge queue is full. Should ideally be 0. Most useful statistic is sum.
HTTPCode_ELB_4XX, HTTPCode_ELB_5XX: Client and Server error code generated by the load balancer. Most useful statistic is sum.
HTTPCode_Backend_2XX, HTTPCode_Backend_3XX, HTTPCode_Backend_4XX, HTTPCode_Backend_5XX: Number of HTTP response codes generated by registered instances. Most useful statistic is sum.
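As an example, a boto3 sketch of a CloudWatch alarm on SpilloverCount for a Classic Load Balancer; the names and the SNS topic ARN are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="clb-spillover",
    Namespace="AWS/ELB",
    MetricName="SpilloverCount",
    Dimensions=[{"Name": "LoadBalancerName", "Value": "my-clb"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",  # any spillover should ideally alert
    AlarmActions=["arn:aws:sns:...:ops-alerts"],
)
```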
ELB provides "access logs" (disabled by default) that capture detailed information about all requests sent to your load balancer. Each log contains information such as the time the request was received, the client’s IP address, latencies, request paths, and server responses. They are saved in the Amazon S3 bucket and are disabled by default. You are only charged for S3 storage.
AWS "CloudTrail" can be used to capture all calls to the ELB API made by or on behalf of your AWS account and either made using ELB API directly, or indirectly through the AWS Management Console or AWS CLI. The files are saved in S3.
Logs collected by "CloudTrail" can be used to monitor the activity of your load balancers and determine what API call was made, what source IP address was used, who made the call, when it was made, and so on
Classic Load Balancer
Is ideal for simple load balancing of traffic across multiple EC2 instances, and operates at both the request level and connection level.
Is intended for applications that were built within the EC2-Classic network.
Application Load Balancer
Is ideal for microservices or container-based architectures where there is a need to route traffic to multiple services or load balance across multiple ports on the same EC2 instance.
Request level (layer 7), routing traffic to targets – EC2 instances, containers, IP addresses and Lambda functions based on the content of the request.
For advanced load balancing of HTTP and HTTPS traffic, and provides advanced request routing targeted at delivery of modern application architectures.
Simplifies and improves the security of the application, by ensuring that the latest SSL/TLS ciphers and protocols are used at all times.
Network Load Balancer
Connection level (Layer 4), routing connections to targets – EC2 instances, microservices, and containers – within VPC based on IP protocol data.
Is ideal for load balancing of both TCP and UDP traffic,
Is capable of handling millions of requests per second while maintaining ultra-low latencies.
Is optimized to handle sudden and volatile traffic patterns while using a single static IP address per AZ.
Is integrated with other popular AWS services such as Auto Scaling, ECS, CloudFormation and AWS Certificate Manager (ACM).
Network Load Balancer now Supports Cross-Zone Load Balancing
AWS recommends using Application Load Balancer for Layer 7 and Network Load Balancer for Layer 4 when using VPC.
Supported Protocols:
Classic ELB operates at both layer 4 and layer 7 and supports HTTP, HTTPS, TCP, SSL
ALB operates at layer 7 and supports HTTP, HTTPS, HTTP/2, WebSockets
NLB operates at the connection level (Layer 4) TCP and UDP
Supported Platforms: Classic ELB supports both EC2-Classic and EC2-VPC; ALB and NLB support only EC2-VPC.
Connection Draining: All Load Balancer types support connection draining.
Health Checks: All Load Balancer types support health checks. ALB provides health check improvements that allow the HTTP success codes used by the matcher (e.g. 200-399) to be configured.
CloudWatch Metrics: All, with ALB providing additional metrics.
Load Balancing to multiple ports on the same instance: Only ALB & NLB.
Deletion Protection: Only ALB & NLB support Deletion Protection.
Idle Connection Timeout: Specify a time period after which the connection is closed if no data has been sent or received. Both Classic ELB & ALB support idle connection timeout. NLB does not support idle connection timeout.
Cross-zone Load Balancing: All, however for Classic it needs to be enabled while for ALB it is always enabled.
Sticky Sessions (Cookies): Classic ELB & ALB support sticky sessions to maintain session affinity. NLB does not support sticky sessions.
Static IP and Elastic IP Address: NLB automatically provides a static IP per AZ (subnet) that can be used by applications as the front-end IP of the load balancer. NLB also allows the option to assign an Elastic IP per AZ (subnet). Classic ELB and ALB do not support static and Elastic IP addresses.
Preserve source IP address: NLB preserves the client-side source IP, allowing the back end to see the IP address of the client. Classic ELB and ALB do not preserve the client-side source IP; it needs to be retrieved using the X-Forwarded-For header or the Proxy Protocol.
WebSockets: Only ALB and NLB supports WebSockets.
PrivateLink Support: Only NLB supports PrivateLink (TCP, TLS).
SSL Termination/Offloading: SSL Termination helps decrypt requests from clients before sending them to targets and hence reducing the load. SSL certificate must be installed on the load balancer. All load balancers types support SSL Termination
Access Logs: All. Capture detailed information about requests sent to the load balancer. Each log contains information such as the request received time, the client’s IP address, latencies, request paths, and server responses, with ALB providing additional attributes.
Host-based Routing & Path-based Routing: Only ALB.
Auto Scaling provides the ability to ensure a correct number of EC2 instances are always running to handle the load of the application.
Auto Scaling; dynamically adds and removes EC2 instances (using Scaling policies), while ELB manages incoming requests by optimally routing traffic so that no one instance is overwhelmed.
ELB uses load balancers to monitor traffic and handle requests that come through the Internet.
Using ELB & Auto Scaling makes it easy to route traffic across a dynamically changing fleet of EC2 instances. Load balancer acts as a single point of contact for all incoming traffic to the instances in an Auto Scaling group.
Auto Scaling integrates with Elastic Load Balancing and enables to attach one or more load balancers to an existing Auto Scaling group.
ELB registers the EC2 instance using its IP address and routes requests to the primary IP address of the primary interface (eth0) of the instance.
After the ELB is attached, it automatically registers the instances in the group and distributes incoming traffic across the instances
When ELB is detached, it enters the Removing state while deregistering the instances in the group.
Auto Scaling adds instances to the ELB as they are launched, but this can be suspended. Instances launched during the suspension period are not added to load balancer, after resumption, and must be registered manually.
Auto Scaling can span multiple AZs within the same region, attempting to distribute instances evenly.
When one AZ becomes unhealthy or unavailable, Auto Scaling launches new instances (scheduled for replacement) in an unaffected AZ,
attempting to launch the new instances in the AZ with the fewest instances.
When the unhealthy AZ recovers, Auto Scaling redistributes the instances across all the healthy AZs.
For an unhealthy instance, the instance’s health check can be changed back to healthy manually but you will get an error if the instance is already terminating.
When your instance is terminated, any associated Elastic IP addresses are disassociated and EBS volumes are detached; they are not automatically associated with the new instance.
ELB can manage a single AZ or multiple AZs within a region.
It is recommended to take advantage of the safety and reliability of geographic redundancy by using Auto Scaling & ELB by spanning Auto Scaling groups across multiple AZs within a region and then setting up ELB to distribute incoming traffic across those AZs.
Incoming traffic is load balanced equally across all the AZs enabled for ELB.
Health Checks; Auto Scaling group determines the health state of each instance by periodically checking the results of EC2 instance status checks. If the instance fails the status checks, it is replaced.
ELB also performs health checks on the EC2 instances that are registered with it. Auto Scaling, by default, does not replace the instance if the ELB health check fails.
Auto Scaling determines the health status of the instances by checking the results of both EC2 instance status and Elastic Load Balancing instance health.
ELB health check with the instances should be used to ensure that traffic is routed only to the healthy instances.
Instance is marked unhealthy when is in a state other than running, the system status is impaired, or ELB reports the instance state as OutOfService.
After registering one or more load balancers with the Auto Scaling group, Auto Scaling group can be configured to use ELB metrics (such as request latency or request count) to scale the application automatically.
Launch Configuration; is a template (similar to an EC2 configuration) that an Auto Scaling group uses to launch EC2 instances. You need to select the AMI, the instance type, a key pair, one or more security groups, a block device mapping, and Basic (console default) or Detailed (CLI or API default) monitoring. It cannot be modified after creation, and can be associated with multiple Auto Scaling groups.
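A boto3 sketch of a launch configuration with detailed monitoring; the AMI, key pair, and security group IDs are placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling")
autoscaling.create_launch_configuration(
    LaunchConfigurationName="my-launch-config",
    ImageId="ami-0123456789abcdef0",          # placeholder AMI
    InstanceType="t3.micro",
    KeyName="my-key-pair",                    # placeholder key pair
    SecurityGroups=["sg-0123456789abcdef0"],  # placeholder security group
    InstanceMonitoring={"Enabled": True},     # Detailed monitoring
)
```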
Auto Scaling Group; collection of EC2 instances that share similar characteristics and are treated as a logical grouping for the purposes of instance scaling and management. Requires:
Launch configuration; to determine the EC2 template.
Minimum & Maximum capacity; to determine the number of instances when an autoscaling policy is applied. A group’s minimum capacity is the fewest number of instances the group can have running.
Desired capacity; to determine the number of instances the ASG must maintain at all times. If missing, it defaults to the minimum size. It's the default number of instances that should be running.
Availability Zones or Subnets; the subnets in which to launch the instances.
Metrics & Health Checks; metrics to determine when it should launch or terminate instances and health checks to determine if the instance is healthy or not.
If an instance becomes unhealthy, it terminates and launches a new instance.
Can also use scaling policies to increase or decrease the number of instances automatically to meet changing demands.
Can contain EC2 instances in one or more AZs within the same region.
Auto Scaling groups cannot span multiple regions.
To merge separate single-zone Auto Scaling groups , rezone one of the single-zone groups into a multi-zone group, and then delete the other groups.
Auto Scaling group can be associated with a single launch configuration.
When the launch configuration for the Auto Scaling group is changed, any new instances launched use the new configuration parameters, but the existing instances are not affected.
An Auto Scaling group can be deleted from the CLI only if it has no running instances; otherwise, the minimum and desired capacity must first be set to 0. This is handled automatically when deleting an ASG from the AWS Management Console.
Manual scaling;
Can be performed by changing the desired capacity limit of the Auto Scaling group or Attaching/Detaching instances to the Auto Scaling group.
Attaching/Detaching an EC2 instance can be done only if: the instance is running, the AMI used to launch it still exists, it is not a member of another Auto Scaling group, it is in the same AZ as the Auto Scaling group, and it is in the same VPC as the load balancer.
If the number of instances exceeds the maximum size of the group, the request fails.
When you detach instances without decrementing the desired capacity, Auto Scaling launches new instances to replace the ones that you detached.
If you detach an instance from an Auto Scaling group that is also registered with a load balancer, the instance is deregistered from the load balancer. If connection draining is enabled for your load balancer, Auto Scaling waits for the in-flight requests to complete.
Scheduled scaling;
Allows you to scale your application in response to predictable load changes, e.g. the last day of the month or the last day of a financial year.
Requires configuration of Scheduled actions, with the start time at which the scaling action should take effect, and the new minimum, maximum, and desired size the group should have.
Auto Scaling guarantees the order of execution for scheduled actions within the same group, but not for scheduled actions across groups.
Multiple scheduled actions can be specified, but each must have a unique time value; scheduled actions with overlapping times are rejected.
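A boto3 sketch of a recurring scheduled action that scales the group down every night; the group name is a placeholder and the cron expression is in UTC.

```python
import boto3

autoscaling = boto3.client("autoscaling")
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="my-asg",
    ScheduledActionName="scale-down-nightly",
    Recurrence="0 20 * * *",   # every day at 20:00 UTC
    MinSize=0,
    MaxSize=2,
    DesiredCapacity=0,
)
```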
Dynamic scaling; Scale automatically in response to the changing demand for e.g. scale out in case CPU utilization of the instance goes above 70% and scale in when the CPU utilization goes below 30%
Uses a combination of alarms & policies for scaling conditions.
An alarm is an object that watches a single metric over a specified time period. When the value of the metric breaches the defined threshold for the specified number of time periods, the alarm performs one or more actions (such as sending messages to Auto Scaling).
A policy is a set of instructions that tells Auto Scaling how to respond to alarm messages.
Dynamic scaling process is; (1) Amazon CloudWatch monitors the Auto Scaling group as the demand grows or shrinks, (2) when the change in the metrics breaches the threshold , the CloudWatch alarm performs an action (scale-in or scale-out).
"Target tracking scaling": Increase or decrease the current capacity of the group based on a target value for a specific metric. This is similar to the way that your thermostat maintains the temperature of your home—you select a temperature and the thermostat does the rest. You select a scaling metric and set a target value. Amazon EC2 Auto Scaling creates and manages the CloudWatch alarms that trigger the scaling policy and calculates the scaling adjustment based on the metric and the target value. The scaling policy adds or removes capacity as required to keep the metric at, or close to, the specified target value. It's self optimizing, it has an Algorithm that learns how your metric changes over time and uses that information to make sure that over and under scaling are minimized
Example: (1) Configure a target tracking scaling policy to keep the average aggregate CPU utilization of your Auto Scaling group at 40 percent. (2) Configure a target tracking scaling policy to keep the request count per target of your Application Load Balancer target group at 1000 for your Auto Scaling group.
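A boto3 sketch of the 40% CPU target tracking policy from example (1); the group name is a placeholder.

```python
import boto3

autoscaling = boto3.client("autoscaling")
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",
    PolicyName="cpu-target-40",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 40.0,  # keep average CPU at ~40%
    },
)
```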
"Step scaling": Increase or decrease the current capacity of the group based on a set of scaling adjustments, known as step adjustments, that vary based on the size of the alarm breach. When step adjustments are applied, and they increase or decrease the current capacity of your Auto Scaling group, the adjustments vary based on the size of the alarm breach. When you create a step scaling policy, you specify one or more step adjustments that automatically scale the number of instances dynamically based on the size of the alarm breach. Each step adjustment specifies the following: A lower bound for the metric value, An upper bound for the metric value & The amount by which to scale, based on the scaling adjustment type. When a scaling event happens on simple scaling, the policy must wait for the health checks to complete and the cooldown to expire before responding to an additional alarm. This causes a delay in increasing capacity especially when there is a sudden surge of traffic on your application. With step scaling, the policy can continue to respond to additional alarms even in the middle of the scaling event. Example: Threshold A - add 1 instance when CPU Utilization is between 40% and 50%, Threshold B - add 2 instances when CPU Utilization is between 50% and 70%, Threshold C - add 3 instances when CPU Utilization is between 70% and 90%
"SQS Scaling": There are some scenarios where you might think about scaling in response to activity in an Amazon SQS queue. If you use a target tracking scaling policy based on a custom Amazon SQS queue metric, dynamic scaling can adjust to the demand curve of your application more effectively. The issue with using a CloudWatch Amazon SQS metric like ApproximateNumberOfMessagesVisible for target tracking is that the number of messages in the queue might not change proportionally to the size of the Auto Scaling group that processes messages from the queue. That's because the number of messages in your SQS queue does not solely define the number of instances needed. The number of instances in your Auto Scaling group can be driven by multiple factors, including how long it takes to process a message and the acceptable amount of latency (queue delay). The solution is to use a backlog per instance metric with the target value being the acceptable backlog per instance to maintain
"Simple scaling": Increase or decrease the current capacity of the group based on a single scaling adjustment. Simple scaling relies on a metric as a basis for scaling. For example, you can set a CloudWatch alarm to have a CPU Utilization threshold of 80%, and then set the scaling policy to add 20% more capacity to your Auto Scaling group by launching new instances. Accordingly, you can also set a CloudWatch alarm to have a CPU utilization threshold of 30%. When the threshold is met, the Auto Scaling group will remove 20% of its capacity by terminating EC2 instances. Example: Add 1 instance when CPU Utilization is between 40% and 50%.
Multiple Policies; An Auto Scaling group can have more than one scaling policy attached to it at any given time.
Each Auto Scaling group would have at least two policies: one to scale out and another to scale in.
There is always a chance that both policies can instruct the Auto Scaling to Scale Out or Scale In at the same time. Auto Scaling chooses the policy that has the greatest impact on the Auto Scaling group for e.g. if two policies are triggered at the same time and Policy 1 instructs to scale out the instance by 1 while Policy 2 instructs to scale out the instances by 2, Auto Scaling will use the Policy 2 and scale out the instances by 2 as it has a greater impact
Auto Scaling Cooldown;
Period (default 300 seconds) that helps to ensure that Auto Scaling doesn’t launch or terminate additional instances before the previous scaling activity takes effect and allows the newly launched instances to start handling traffic and reduce load.
When manually scaling the Auto Scaling group, the default is not to wait for the cooldown period.
If an instance becomes unhealthy, Auto Scaling does not wait for the cooldown period to complete before replacing the unhealthy instance.
Cooldown periods are automatically applied to dynamic scaling activities for simple scaling policies and is not supported for step scaling policies.
Suspension;
Auto Scaling processes can be suspended and then resumed. This is useful for investigating a configuration problem or debugging an issue with the application without triggering the Auto Scaling process (a suspend/resume sketch follows the process list below).
Auto Scaling also performs Administrative Suspension, where it suspends processes for Auto Scaling groups that have been trying to launch instances for over 24 hours without succeeding.
Auto Scaling processes include;
Launch – Adds a new EC2 instance.
Terminate – Removes an EC2 instance from the group, decreasing its capacity.
HealthCheck – Checks the health of the instances.
ReplaceUnhealthy – Terminates instances that are marked as unhealthy and replace them.
AlarmNotification – Accepts notifications from CloudWatch alarms. If suspended, Auto Scaling does not automatically execute policies that would be triggered by an alarm.
ScheduledActions – Performs scheduled actions.
AddToLoadBalancer – Adds instances to the load balancer when they are launched.
AZRebalance – Balances the number of EC2 instances in the group across the Availability Zones in the region.
If an AZ either is removed from the ASG or becomes unhealthy or unavailable, Auto Scaling launches new instances in an unaffected AZ before terminating the unhealthy or unavailable instances.
When the unhealthy AZ returns to a healthy state, Auto Scaling automatically redistributes the instances evenly across the Availability Zones for the group.
Note that if you suspend AZRebalance and a scale out or scale in event occurs, Auto Scaling still tries to balance the Availability Zones for e.g. during scale out, it launches the instance in the Availability Zone with the fewest instances.
If you suspend Launch, AZRebalance neither launches new instances nor terminates existing instances. This is because AZRebalance terminates instances only after launching the replacement instances.
If you suspend Terminate, the ASG can grow up to 10% larger than its maximum size, because Auto Scaling allows this temporarily during rebalancing activities. If it cannot terminate instances, your ASG could remain above its maximum size until the Terminate process is resumed.
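A boto3 sketch of suspending and later resuming the AZRebalance process while investigating an issue; the group name is a placeholder.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Suspend AZRebalance while troubleshooting.
autoscaling.suspend_processes(
    AutoScalingGroupName="my-asg",
    ScalingProcesses=["AZRebalance"],
)

# ... investigate / fix ...

# Resume the process afterwards.
autoscaling.resume_processes(
    AutoScalingGroupName="my-asg",
    ScalingProcesses=["AZRebalance"],
)
```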
Troubleshooting:
AMI id does not exist or is still pending and cannot be used to launch instances.
Security group provided in the launch configuration does not exist.
Key pair associated with the EC2 instance does not exist.
Autoscaling group not found or is incorrectly configured.
AZ configured with the Autoscaling group is no longer supported because it might not be available.
Invalid EBS block device mappings.
Instance type is not supported in the AZ.
Capacity limits reached, either because of the restriction on the number of instances of that type that can be launched in a region, or because AWS is not able to provision the specified instance type in the AZ (e.g. no more Spot or On-Demand instance availability).
Lifecycle
Auto Scaling Lifecycle hooks enable you to perform custom actions by pausing instances as an Auto Scaling group launches or terminates them.
Each Auto Scaling group can have multiple lifecycle hooks. However, there is a limit on the number of hooks per Auto Scaling group
If an autoscaling:EC2_INSTANCE_LAUNCHING lifecycle hook is added, the state is moved to Pending:Wait .
If an autoscaling:EC2_INSTANCE_TERMINATING lifecycle hook is added, the state is moved to Terminating:Wait.
During scale-out and scale-in events, instances are put into a wait state (Pending:Wait or Terminating:Wait) and remain paused until either a continue action happens or the timeout period ends. With a hook, the instance remains in the wait state for a finite period of time: the default is 1 hour (3600 seconds), with a maximum of 48 hours or 100 times the heartbeat timeout, whichever is smaller.
The waiting time can be adjusted by: using the complete-lifecycle-action (CompleteLifecycleAction) command to continue to the next state; using the put-lifecycle-hook command with the –heartbeat-timeout parameter to set the heartbeat timeout; or restarting the timeout period by recording a heartbeat with the record-lifecycle-action-heartbeat (RecordLifecycleActionHeartbeat) command.
After the wait period, the Auto Scaling group continues the launch or terminate process (Pending:Proceed or Terminating:Proceed). Ways to act on a lifecycle hook include: a CloudWatch Events target that invokes a Lambda function when a lifecycle action occurs; a notification target (CloudWatch Events, SNS, SQS) for the lifecycle hook, which receives the message from EC2 Auto Scaling; or a script that runs on the instance as the instance starts (an API sketch follows at the end of this section).
Result of lifecycle hook is either ABANDON or CONTINUE.
Health Check Grace Period; does not start until the lifecycle hook completes and the instance enters the InService state
Cooldown period starts when the instance enters the InService state.
Standby state enables you to remove the instance from service, troubleshoot or make changes to it, and then put it back into service.
Lifecycle hooks can be used with Spot Instances. However, a lifecycle hook does not prevent an instance from terminating due to a change in the Spot Price.
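A boto3 sketch of a launch lifecycle hook and the completion call a bootstrap script (or a Lambda target) might make; all names and the instance ID are placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Hold new instances in Pending:Wait for up to 5 minutes.
autoscaling.put_lifecycle_hook(
    AutoScalingGroupName="my-asg",
    LifecycleHookName="wait-for-bootstrap",
    LifecycleTransition="autoscaling:EC2_INSTANCE_LAUNCHING",
    HeartbeatTimeout=300,
    DefaultResult="ABANDON",   # result applied if the timeout expires
)

# Later, from the bootstrap script or Lambda, once custom setup is done:
autoscaling.complete_lifecycle_action(
    AutoScalingGroupName="my-asg",
    LifecycleHookName="wait-for-bootstrap",
    LifecycleActionResult="CONTINUE",
    InstanceId="i-0123456789abcdef0",
)
```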