For applications receiving concurrent requests at rates of a million or more per second, clients expect reliable responses as fast as possible. Complexity increases when responses involve images, videos, or other large data.
These concurrent requests cannot be handled by one standalone server, so a cluster of multiple servers is created. These servers are replicas of the previous standalone server: each runs the same application and can accept and respond to requests independently.
Now, how does one divide these millions of requests across the servers? To handle this management, a dedicated component is brought into the picture, named the "Load Balancer."
A load balancer is a component that sits between a client pool and a server pool (or between any two layers) and distributes the workload across the components of the layer beneath it.
[ Diagram to show various placement of LB between layers ]
High Availability
Health Checks
Load balancer can detect unhealthy targets, stop sending traffic to them, and then spread the load across the remaining healthy targets.
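A health check can be sketched as a periodic probe against each target. This is a minimal illustration, assuming each backend exposes a `/health` endpoint and the addresses shown are placeholders:

```python
import urllib.request

# Hypothetical backend pool; addresses and the /health path are assumptions.
SERVERS = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]

def is_healthy(server: str, timeout: float = 2.0) -> bool:
    """Probe the server's health endpoint; any connection error or timeout
    marks the target as unhealthy."""
    try:
        with urllib.request.urlopen(server + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def healthy_targets(servers):
    """Keep only the targets that passed the most recent health probe;
    traffic is then spread across this filtered list."""
    return [s for s in servers if is_healthy(s)]
```

A real load balancer runs these probes on a schedule and requires several consecutive failures before ejecting a target, to avoid flapping.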
Security
TLS Termination
Some load balancers, such as the AWS Elastic Load Balancer, provide integrated certificate management and SSL/TLS decryption, allowing you to centrally manage the TLS settings of the load balancer and offload CPU-intensive work from your servers/application.
Layer 4 Load Balancing
Load balancing at the TCP/UDP (transport) layer.
Layer 7 Load Balancing
Load balancing at the HTTP/HTTPS (application) layer.
Session Persistence
Information about a user’s session is often stored locally in the browser. For example, in a shopping cart application the items in a user’s cart might be stored at the browser level until the user is ready to purchase them. Changing which server receives requests from that client in the middle of the shopping session can cause performance issues or outright transaction failure. In such cases, it is essential that all requests from a client are sent to the same server for the duration of the session. This is known as session persistence.
A load balancer can provide session persistence when the application needs it.
Another use case for session persistence is when an upstream server stores information requested by a user in its cache to boost performance. Switching servers would cause that information to be fetched for the second time, creating performance inefficiencies.
[ How to implement session persistence ]
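One common way to implement session persistence is IP-hash stickiness: hashing the client's IP so every request from that client maps to the same backend. A minimal sketch (server addresses are placeholders; real load balancers often use cookies instead, so that clients behind a shared NAT don't all land on one server):

```python
import hashlib

# Hypothetical backend pool; addresses are placeholders.
SERVERS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

def sticky_server(client_ip: str, servers=SERVERS) -> str:
    """Map a client IP to a fixed server so all requests in the
    session land on the same box (and hit the same local cache)."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```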
Dynamic addition/removal of servers in the cluster
Many fast-changing applications require new servers to be added or taken down on a constant basis. This is common in environments such as the Amazon Web Services (AWS) Elastic Compute Cloud (EC2), which enables users to pay only for the computing capacity they actually use, while at the same time ensuring that capacity scales up in response to traffic spikes. In such environments it greatly helps if the load balancer can dynamically add or remove servers from the group without interrupting existing connections.
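The dynamic add/remove behavior can be sketched as a thread-safe server pool: membership changes take effect for new requests only, so requests already dispatched to a server are not disturbed. This is an illustrative sketch, not any particular product's implementation:

```python
import threading

class ServerPool:
    """Round-robin pool whose membership can change at runtime.
    Changes only affect which server the *next* request is sent to;
    in-flight requests keep the server they were already given."""

    def __init__(self, servers):
        self._lock = threading.Lock()
        self._servers = list(servers)
        self._index = 0

    def add(self, server):
        with self._lock:
            if server not in self._servers:
                self._servers.append(server)

    def remove(self, server):
        with self._lock:
            if server in self._servers:
                self._servers.remove(server)

    def next_server(self):
        with self._lock:
            if not self._servers:
                raise RuntimeError("no servers available")
            server = self._servers[self._index % len(self._servers)]
            self._index += 1
            return server
```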
Predictive Analysis
Least Connection Method
Least Response Time Method
Least Bandwidth Method
Custom Load Method
When using this method, the load balancing appliance chooses a service that is not handling any active transactions. If all of the services in the load balancing setup are handling active transactions, the appliance selects the service with the smallest load.
Round Robin Method
Weighted Round Robin
IP Hash
Least Packets Method
This method selects the service that has received the fewest packets over a specified period of time.
Consistent hashing
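Two of the methods above in miniature, round robin and least connections. Server names and connection counts here are placeholders for illustration:

```python
import itertools

SERVERS = ["s1", "s2", "s3"]  # placeholder server names

# Round Robin: cycle through the servers in fixed order.
_rr = itertools.cycle(SERVERS)
def round_robin():
    return next(_rr)

# Least Connections: pick the server with the fewest active connections.
# A real balancer increments/decrements these counts as requests start/finish.
active = {s: 0 for s in SERVERS}
def least_connections():
    return min(active, key=active.get)
```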
In this case, redundant load balancers are set up so that the load balancer itself does not become a single point of failure.
Writing your own load balancer as an application
Specify the load balancing algorithm.
Take care of host recovery, addition, and removal.
Suitable for small systems.
[GKCS add description and image showing hash based routing and challenges in adding and removing server ]
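Hash-based routing (e.g., `hash(request) % N`) remaps almost every key when a server is added or removed. Consistent hashing addresses this: servers and keys are placed on a ring, and removing a server only remaps the keys that fell on its slice. A minimal sketch with virtual nodes (the hashing scheme and vnode count are illustrative choices):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring with virtual nodes. Adding or removing a
    server only remaps the keys on that server's slices of the ring."""

    def __init__(self, servers, vnodes=100):
        self._ring = []  # sorted list of (hash, server)
        self._vnodes = vnodes
        for s in servers:
            self.add(s)

    def add(self, server):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for (h, s) in self._ring if s != server]

    def get(self, key):
        """Route a key to the first server clockwise from its hash."""
        h = _hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

With hash-mod routing, changing N moves nearly all keys; here, keys that were not on the removed server keep their original assignment.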
Hardware Load Balancer
e.g., Citrix (NetScaler) appliances.
Software Load Balancers / Proxy Servers
Instead of writing an application yourself, existing software or web services can be used as load balancers.
Existing software includes HAProxy and NGINX.
Web services include AWS Elastic Load Balancer.
Do we need Session Persistence?
If a server uses a local cache, how can we avoid switching servers mid-session, so the cache does not have to be rebuilt on a new server?