How to achieve high availability and performance in middleware?

Scalability = the ability of the system to scale dynamically when the input load changes
- the users should not feel any changes
horizontal vs. vertical scaling (adding more servers vs. adding resources to an existing server)
also the network bandwidth is important
- it does not matter if I have X servers: if the network bandwidth is low, the system is still slow for users
- we can reduce the network usage by caching
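
A minimal sketch of how caching reduces network usage. `fetch_from_backend` is a hypothetical stand-in for a remote call, and the counter exists only to make the saved round trips visible:

```python
from functools import lru_cache

network_calls = 0  # counts simulated round trips to the backend


def fetch_from_backend(key: str) -> str:
    """Hypothetical remote call; here it only simulates a network round trip."""
    global network_calls
    network_calls += 1
    return f"value-for-{key}"


@lru_cache(maxsize=1024)
def cached_fetch(key: str) -> str:
    # Only the first request for a given key reaches the backend.
    return fetch_from_backend(key)


for _ in range(100):
    cached_fetch("user:42")

print(network_calls)  # 1 (100 requests, one backend round trip)
```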

SLA = Service Level Agreement = guarantee of service availability
to be available means that the failure of a server instance does not affect the application's availability and performance
- if it fails, the copy of the failed object should finish the job
  - the copy has to be available
  - the processing state has to be replicated
- this is called Application failover
if a server instance fails, the operation should continue on another instance
- therefore, the processing state has to be replicated
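
A minimal sketch of application failover with replicated processing state. The `Instance` class, the job (summing a list) and the simulated crash at `fail_at` are all assumptions made for illustration; real middleware replicates state over the network:

```python
class Instance:
    """Hypothetical server instance that sums a list of items step by step."""

    def __init__(self, name: str):
        self.name = name
        self.state = {"next_index": 0, "total": 0}  # the processing state

    def step(self, items: list[int]) -> None:
        i = self.state["next_index"]
        self.state["total"] += items[i]
        self.state["next_index"] = i + 1


def run_with_failover(items, primary, replica, fail_at):
    """Process items on the primary; after a simulated crash, continue on the replica."""
    active = primary
    while active.state["next_index"] < len(items):
        if active is primary and active.state["next_index"] == fail_at:
            active = replica  # simulated crash: the replica takes over
            continue
        active.step(items)
        if active is primary:
            replica.state = dict(active.state)  # replicate state after each step
    return active.name, active.state["total"]


result = run_with_failover([1, 2, 3, 4], Instance("primary"), Instance("replica"), fail_at=2)
print(result)  # ('replica', 10)
```

Because the state is copied after every step, the replica resumes at item index 2 instead of starting over, and the job finishes with the same total.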

clients want the fastest response time
- achieved through caching, good configuration of the middleware/service and asynchronous handling of CPU-intensive tasks
QPS = queries per second, performance metric
- responses to frequent queries should be cached, which raises the achievable QPS
performance needs to be properly monitored (part of DevOps) and tuned
- tuning is based on collected, filtered, stored and presented data about the performance
  - from the application server, DB, OS etc. (has many utilities for monitoring)
- we can use commercial solutions
- open source examples: Elastic Stack

Load balancing

load balancers distribute the load across multiple app/object instances (which often run on different servers; the load can be shared equally or with preferences)
a load balancer also checks the health of the individual app instances
there are software and hardware load balancers
how to select the server?
- round robin
- with least active connections
- with least average response time (e.g. to receive the response header/receive the full response body)
- a hash of the client IP or of any other request element determines the server
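
A minimal sketch of three of the selection strategies above. The server names and the choice of SHA-256 for the hash are assumptions made for illustration:

```python
import hashlib
from itertools import cycle

servers = ["app-1", "app-2", "app-3"]  # hypothetical server names

# Round robin: hand out servers in a fixed rotating order.
_rotation = cycle(servers)


def pick_round_robin() -> str:
    return next(_rotation)


def pick_least_connections(active: dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def pick_by_ip_hash(client_ip: str) -> str:
    """The same client IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


print([pick_round_robin() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
print(pick_least_connections({"app-1": 5, "app-2": 2, "app-3": 7}))  # app-2
print(pick_by_ip_hash("203.0.113.7") == pick_by_ip_hash("203.0.113.7"))  # True
```

Selection by least average response time would look like `pick_least_connections`, with measured response times in place of connection counts.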
each server can have a limit on its maximum number of connections
- additional requests wait in a queue
server slow-start - to avoid timeouts, do not send the full load of requests to a server while it is still recovering
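
A minimal sketch of slow-start, assuming a linear ramp over a warm-up window. The 30-second window, the weight scale of 100 and the function name are hypothetical; real load balancers have their own ramp-up rules:

```python
def slow_start_weight(seconds_since_recovery: float,
                      warmup_seconds: float = 30.0,
                      full_weight: int = 100) -> int:
    """Share of the full load a recovering server should receive (linear ramp)."""
    if seconds_since_recovery >= warmup_seconds:
        return full_weight
    fraction = max(seconds_since_recovery, 0.0) / warmup_seconds
    # Keep at least weight 1 so the server receives some traffic.
    return max(1, int(full_weight * fraction))


for t in (0, 15, 30, 60):
    print(t, slow_start_weight(t))  # weights: 1, 50, 100, 100
```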

DNS load balancing: a DNS record (e.g. for a website) has multiple IP addresses assigned to it (e.g. multiple servers) and the requests are distributed among them using the Round Robin algorithm
pros
- very simple to implement
cons
- no checks of the servers' load or health
- clients cache the IP addresses, so reassigning them takes a long time to take full effect

a load balancer acting as a reverse proxy assigns the requests to multiple servers
- clients don’t know the IP addresses of the servers, they are “behind” the LB
performs health checks
- one negative health check does not mean the instance is unhealthy
  - there has to be a defined threshold of consecutive negative health checks before the server is removed from the list of active servers
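
A minimal sketch of the consecutive-failure threshold. The `HealthTracker` class and the threshold of 3 are assumptions made for illustration:

```python
class HealthTracker:
    """Marks a server unhealthy only after N consecutive failed checks."""

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures: dict[str, int] = {}

    def record(self, server: str, check_passed: bool) -> None:
        if check_passed:
            self.consecutive_failures[server] = 0  # any success resets the streak
        else:
            self.consecutive_failures[server] = self.consecutive_failures.get(server, 0) + 1

    def is_healthy(self, server: str) -> bool:
        return self.consecutive_failures.get(server, 0) < self.failure_threshold


tracker = HealthTracker(failure_threshold=3)
for passed in (False, False, True, False, False):
    tracker.record("app-1", passed)
print(tracker.is_healthy("app-1"))  # True: the success reset the streak
tracker.record("app-1", False)
print(tracker.is_healthy("app-1"))  # False: three failures in a row
```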
HTTP sticky sessions
- a server holds the sessions of its current users, so it is good to send a user's requests to the same server every time (to reuse the session data)
- Session
- 2 options:
  - passive cookie persistence - LB uses the Cookie from the server
  - active cookie persistence - LB adds its own Cookie
    - sticky cookie = the cookie added by the LB (it scans the requests and responses)
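
A minimal sketch of active cookie persistence. The cookie name `LB_SERVER`, the server names and the `route` function are hypothetical; cookies are modeled as plain dictionaries instead of real HTTP headers:

```python
from itertools import cycle

STICKY_COOKIE = "LB_SERVER"  # hypothetical cookie name chosen for this sketch
servers = ["app-1", "app-2"]
_rotation = cycle(servers)


def route(request_cookies: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Return the chosen server and the cookies the LB adds to the response."""
    pinned = request_cookies.get(STICKY_COOKIE)
    if pinned in servers:
        return pinned, {}  # known client: keep it on the same server
    chosen = next(_rotation)  # new client: pick a server, then pin it
    return chosen, {STICKY_COOKIE: chosen}


server, set_cookies = route({})  # first request, no cookie yet
again, _ = route(set_cookies)    # the client sends the sticky cookie back
print(server, again)  # app-1 app-1
```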

Session is persisted in the DB
- Sticky HTTP session is not required (as all servers have access to the Session data)
Session is held in the memory of one of the servers
- The primary server holds the session data in its memory, the secondary server holds a replica
- The Cookie contains all information about the primary/secondary servers + the location of the session in the memory of the primary server
- the LB routes according to the information in the Cookie: to the primary; if it is unhealthy, to the secondary; if that is also unhealthy, to any other server, but then the session information is lost (side effects)
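
A minimal sketch of the routing order described above (primary, then secondary, then any other server). The cookie layout and the `route_by_session` function are assumptions made for illustration:

```python
def route_by_session(cookie: dict, healthy: set[str], all_servers: list[str]):
    """Return (server, session_preserved) following primary -> secondary -> any."""
    for role in ("primary", "secondary"):
        server = cookie.get(role)
        if server in healthy:
            return server, True
    # Neither holder of the session is available: the session data is lost.
    for server in all_servers:
        if server in healthy:
            return server, False
    raise RuntimeError("no healthy server available")


cookie = {"primary": "app-1", "secondary": "app-2"}  # hypothetical cookie layout
pool = ["app-1", "app-2", "app-3"]
print(route_by_session(cookie, {"app-1", "app-2", "app-3"}, pool))  # ('app-1', True)
print(route_by_session(cookie, {"app-2", "app-3"}, pool))           # ('app-2', True)
print(route_by_session(cookie, {"app-3"}, pool))                    # ('app-3', False)
```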