Load Balancing

Why load balancing?

Load balancing is a technique used to distribute workloads evenly across multiple computing resources, such as servers, network links, or other devices, in order to optimize resource utilization, minimize response time, and maximize throughput. This technique helps ensure that no single resource is overwhelmed, thus maintaining a high level of performance and reliability.

A service performing load balancing is referred to as a load balancer, which often serves as the entry point for a group of resources. Some benefits of introducing load balancing into an architecture include:

  • Improved performance through caching, connection reuse, protocol optimization, etc.
  • High availability, redundancy, and scalability
  • Network optimization, such as geographic distribution
  • Security protection, such as rate limiting to mitigate DDoS attacks
  • Cost savings through caching and better resource utilization

Load Balancing Types

Load balancing is simply a technique for distributing requests according to given criteria, and it can be further categorized based on where the rules are implemented. For instance:

  • Application load balancing
  • Network load balancing
  • DNS load balancing
  • Hardware load balancing
  • Global Server Load Balancing

Load Balancer Types

Load balancers roughly fall into two categories: hardware load balancers and software load balancers.

  • Hardware Load Balancer: Provides load balancing at the hardware level. Requires an initial investment and can be further utilized to create virtual load balancers for centralized management.
  • Software Load Balancer: An installed application or third-party service that offers load balancing functions. Costs less to set up and manage.

Load Balancer Challenges

Load balancers have become a crucial component of modern system architectures. However, they also come with challenges:

  • The load balancer can become a single point of failure within the system if not designed with redundancy in mind.
  • Configuration management adds operational overhead.
  • Handling stateful scenarios becomes challenging.
  • Adding a load balancer incurs extra infrastructure cost.
  • Monitoring backend services becomes essential.

Algorithms

Algorithms are used to distribute load across servers. They can be categorized into two groups: static and dynamic. Dynamic algorithms take current server metrics (e.g., load, response time) into account, while static ones don't.

Round Robin

A simple approach that distributes incoming requests in cyclic order across all available servers (a minimal sketch follows the list below).

  • Pros

    • Ensures even distribution of traffic.
    • Easy to implement.
    • Works well when servers have similar capacities.
    • Suitable for stateless applications.
  • Cons

    • Does not take the current load or capacity of each server into account.
    • No session affinity, which can be problematic for stateful applications.
    • May not perform optimally when servers have different capacities or varying workloads.
    • The predictable distribution pattern could potentially be exploited by attackers.
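A minimal sketch of the cyclic rotation, assuming an in-memory list of hypothetical server names:

```python
import itertools

# Hypothetical backend pool; the names are placeholders.
servers = ["server-a", "server-b", "server-c"]

# itertools.cycle yields the servers in a fixed, repeating order.
pool = itertools.cycle(servers)

def pick_server() -> str:
    """Return the next server in round-robin order."""
    return next(pool)

for request_id in range(6):
    print(f"request {request_id} -> {pick_server()}")
```

Each server receives every third request here, regardless of how loaded it currently is, which illustrates both the simplicity and the main limitation listed above.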

Weighted Round Robin

An enhanced version of the round robin method that assigns a weight to each server based on its capacity or performance, such that incoming requests are distributed in proportion to these weights (see the sketch after the list).

  • Pros

    • Takes server capacity into account, enabling better resource utilization and performance.
    • Easily adjustable to accommodate changes in server capacities or additions of new servers.
  • Cons

    • Determining appropriate weights requires detailed benchmarking.
    • Weight management can introduce additional overhead.
    • Not responsive in environments with highly variable load patterns, as real-time load conditions are not considered.
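One simple way to realize the weighting, sketched under the assumption of static, hand-assigned integer weights (real-world implementations often interleave requests more smoothly, but the proportions are the same):

```python
import itertools

# Hypothetical weights: server-a is assumed to have twice
# the capacity of server-c.
weights = {"server-a": 2, "server-b": 2, "server-c": 1}

# Repeat each server according to its weight, then rotate through
# the expanded list; server-a receives 2 of every 5 requests.
expanded = [server for server, w in weights.items() for _ in range(w)]
pool = itertools.cycle(expanded)

def pick_server() -> str:
    return next(pool)
```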

Sticky Round Robin

A variant of the round robin algorithm that ensures requests from the same source are routed to the same server that handled the first request of the session, as sketched below.
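A minimal sketch, assuming sessions are identified by a hypothetical client id and assignments are kept in an in-memory dictionary:

```python
import itertools

servers = ["server-a", "server-b", "server-c"]
pool = itertools.cycle(servers)

# Remember which server handled each session's first request.
assignments: dict[str, str] = {}

def pick_server(client_id: str) -> str:
    if client_id not in assignments:
        # First request of the session: plain round robin.
        assignments[client_id] = next(pool)
    # Subsequent requests stick to the original server.
    return assignments[client_id]
```

In practice the session key is usually derived from a cookie or connection attributes, and the mapping must survive load balancer restarts or be re-derivable.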

Least Connection

An incoming request is routed to the server with the fewest active connections at the time of the request (see the sketch after the list).

  • Pros

    • Takes the load on each server into account, making better resource utilization more likely.
    • Performs better when servers have varying capacities and workloads.
  • Cons

    • More complex to implement.
    • Requires the load balancer to maintain the connection count, which can increase overhead.
    • Connection spikes can cause frequent rebalancing.
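A minimal sketch, assuming the balancer maintains an in-memory count of active connections per (hypothetical) server:

```python
# Hypothetical live connection counts tracked by the balancer.
active_connections = {"server-a": 12, "server-b": 3, "server-c": 7}

def pick_server() -> str:
    # Choose the server with the fewest active connections right now.
    return min(active_connections, key=active_connections.get)

def on_connection_open(server: str) -> None:
    active_connections[server] += 1

def on_connection_close(server: str) -> None:
    active_connections[server] -= 1
```

The bookkeeping in on_connection_open/on_connection_close is exactly the extra overhead mentioned above.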

Weighted Least Connection

An improved version of the least connection algorithm that takes both the current load and the capacity of each server into account (see the sketch after the list).

  • Pros

    • Capable of adjusting request distribution based on the real-time load on each server.
    • Suitable for environments with heterogeneous servers and variable load patterns.
  • Cons

    • More complex to implement.
    • Requires the load balancer to keep track of both active connections and server weights.
    • Determining appropriate weights for each server requires accurate performance metrics.
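A minimal sketch, assuming hypothetical static weights alongside the live connection counts; the server with the fewest connections per unit of capacity wins:

```python
# Hypothetical capacities (weights) and live connection counts.
weights = {"server-a": 4, "server-b": 2, "server-c": 1}
active_connections = {"server-a": 8, "server-b": 3, "server-c": 1}

def pick_server() -> str:
    # Normalize load by capacity: with these numbers the ratios are
    # 2.0, 1.5, and 1.0, so server-c is chosen.
    return min(weights, key=lambda s: active_connections[s] / weights[s])
```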

Least Response Time

Requests are distributed to the server with the lowest response time, aiming for efficient utilization of server resources and an optimal client experience (a sketch follows the list).

  • Pros

    • Requests are routed to the server that currently responds fastest.
    • Adjusts distribution based on real-time server performance.
  • Cons

    • More complex to implement.
    • Monitoring response times and dynamically adjusting the load can introduce additional overhead.
    • The existing load on each server is not directly considered.
    • Response times can vary due to network fluctuations or transient server issues, potentially causing frequent rebalancing.
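A minimal sketch, assuming the balancer keeps an exponentially weighted moving average of observed response times per (hypothetical) server; the smoothing dampens the transient fluctuations mentioned above:

```python
# Hypothetical smoothed response times in milliseconds.
avg_response_ms = {"server-a": 42.0, "server-b": 18.5, "server-c": 77.3}

ALPHA = 0.2  # smoothing factor for the moving average

def pick_server() -> str:
    # Route to the server with the lowest observed response time.
    return min(avg_response_ms, key=avg_response_ms.get)

def record_response(server: str, elapsed_ms: float) -> None:
    # Exponentially weighted moving average: recent samples count more,
    # but a single slow response does not dominate.
    avg_response_ms[server] = (1 - ALPHA) * avg_response_ms[server] + ALPHA * elapsed_ms
```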

IP Hash

Assigns client requests to servers based on a hash of the client's IP address (see the sketch after the list).

  • Pros

    • Requests from the same client IP address are consistently routed to the same server, which is beneficial for stateful applications, e.g., a geographically distributed service.
    • Easy to implement, with no need to maintain connection state on the load balancer.
  • Cons

    • Can lead to uneven distribution if client IP addresses are not well distributed.
    • Adding or removing servers can disrupt the hash mapping, causing some clients to be routed to different servers.
    • Does not take the current load or capacity of servers into account.
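A minimal sketch using a simple modulo mapping over a hashed client IP (server names are placeholders; real deployments often use consistent hashing to soften the remapping problem noted above):

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def pick_server(client_ip: str) -> str:
    # Hash the client IP and map it onto the server list; the same IP
    # always lands on the same server while the pool is unchanged.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

print(pick_server("203.0.113.7"))  # deterministic for a fixed pool
```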

Random

Instead of following a fixed sequence or using performance metrics, the load balancer selects a server at random to handle each request (see the sketch after the list).

  • Pros

    • Easy to implement.
    • The load will be evenly distributed across servers over the long run (if the random function is uniform).
  • Cons

    • Does not consider the current load or capacity of servers, which can lead to uneven distribution if server performance varies.
    • Potential imbalance in the short term.
    • No session affinity, which makes it unsuitable for stateful applications.
    • Can dilute the visibility of attack patterns; security systems that rely on detecting anomalies (e.g., for DDoS mitigation) might find it slightly more challenging to identify malicious traffic.
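A minimal sketch with a hypothetical server list:

```python
import random

servers = ["server-a", "server-b", "server-c"]

def pick_server() -> str:
    # Uniform choice: each server is equally likely for every request.
    return random.choice(servers)
```

Over many requests the counts converge toward an even split, but any short window can be skewed, which is the short-term imbalance listed above.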

Least Bandwidth

Incoming requests are distributed based on each server's current bandwidth usage (see the sketch after the list).

  • Pros

    • Adjusts distribution based on real-time network load.
    • Ensures that all servers are utilized more effectively by balancing the bandwidth usage.
  • Cons

    • More complex to implement.
    • Monitoring bandwidth and dynamically adjusting the load can introduce additional overhead.
    • Bandwidth usage can fluctuate in the short term, potentially causing frequent rebalancing.
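A minimal sketch, assuming the balancer periodically samples per-server bandwidth usage into an in-memory table (names and numbers are illustrative):

```python
# Hypothetical current bandwidth usage in Mbit/s, sampled by the balancer.
bandwidth_mbps = {"server-a": 310.0, "server-b": 95.5, "server-c": 180.2}

def pick_server() -> str:
    # Route to the server currently moving the least traffic.
    return min(bandwidth_mbps, key=bandwidth_mbps.get)
```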
