Major communication protocols

HTTP

RMI

  • = remote method invocation
  • Java-specific and not widely used; it can tunnel over HTTP underneath

XML-RPC

SOAP

WebSocket

Synchronous & Asynchronous communication

Synchronous

  • only one socket is used for both request and response (after the request, the client waits for the response - it’s blocking for the client)
  • it’s fast, and an endpoint is defined only on the server side
  • examples: REST API, most web apps (see the sketch below)
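
A minimal sketch of the blocking behavior in Python (example.com is a placeholder) - the call does not return until the server’s response arrives over the same connection:

    from urllib.request import urlopen

    response = urlopen("https://example.com/")   # blocks until the response arrives
    print(response.status, len(response.read()))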

Asynchronous

  • separate sockets for requests and responses
  • both sides (client, server) don’t need to be available at the same time
  • the time between request and response can be large (even hours or days)
  • both client and server must define endpoint (both need to be reachable)
    • server needs to be reachable for request from client
    • client needs to be reachable for the response from the server
  • non-blocking, harder to implement
  • examples: batch processing, long computations, message queues
  • via intermediary:
    • solves the problem of servers that need to connect back to the client (which is hard due to NAT, firewalls or dynamic IP addresses)
      • NAT router mapping is usually temporary
      • home or corporate firewalls often block “unwanted” requests from “random” servers
        • they cannot tell a legitimate response carrying the result of an asynchronous operation apart from a cyberattack
      • client’s mobility and dynamic IP addresses
    • decouples the asynchronous communication and acts as a temporary buffer for requests/responses in both directions
    • examples of message queues: RabbitMQ, Apache Kafka (see the first sketch after this list)
  • via polling:
    • solves the same problem as intermediary (the server cannot connect back to client due to network restrictions)
    • client “polls” the server for status until the requested operation is completed
      • only then can the client get the response back
    • only clients create sockets (so this method works through firewalls)
    • it is inefficient due to a lot of unnecessary status check requests
    • this is the same kind of polling that AJAX does (see Komunikace)
    • it is common in the web environment, where the client runs in the browser (see the second sketch below)
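
A minimal sketch of the intermediary pattern in Python - a stdlib queue.Queue stands in for a real broker like RabbitMQ or Apache Kafka, only to show the decoupling (neither side talks to the other directly):

    import queue
    import threading
    import time

    broker = queue.Queue()               # in-process stand-in for RabbitMQ/Kafka

    def worker():                        # "server": consumes requests when it is ready
        job = broker.get()
        time.sleep(1)                    # a long-running operation
        print("processed:", job)

    threading.Thread(target=worker).start()
    broker.put({"task": "resize-image"}) # "client": only the broker must be reachable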
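
A minimal polling sketch in Python - the /start and /status endpoints and the JSON fields are hypothetical, the shape of the status-check loop is the point:

    import json
    import time
    from urllib.request import urlopen

    job = json.load(urlopen("https://api.example.com/start"))   # kick off the operation
    while True:
        status = json.load(urlopen(f"https://api.example.com/status/{job['id']}"))
        if status["state"] == "done":    # the operation finished, fetch the result
            print(status["result"])
            break
        time.sleep(5)                    # wait before the next (possibly wasted) status check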

How do sockets work?

  • TCP uses three-way handshake for establishing the connection (see Zahájení spojení)
  • socket = virtual communication channel (with a unique ID) created on both sides (client, server) after the handshake
  • the socket allows for a continuous exchange of data (reading/writing)
    • connection reuse is very important (it speeds up the communication)
    • HTTP Keep-Alive
    • HTTP Pipelining
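
A minimal sketch in Python (example.com is a placeholder): connect() performs the three-way handshake, and the resulting socket is then reused for reading and writing:

    import socket

    sock = socket.create_connection(("example.com", 80))           # SYN - SYN ACK - ACK
    sock.sendall(b"HEAD / HTTP/1.1\r\nHost: example.com\r\n\r\n")  # writing
    print(sock.recv(4096).decode(errors="replace"))                # reading from the same channel
    sock.close()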

HTTP Keep-Alive (persistent connections)

  • HTTP Keep-Alive is an agreement not to close the connection between client and server after each request/response
    • because opening a new TCP connection is quite expensive time-wise - especially when the server I am trying to reach is geographically far away (high RTT)
    • reusing the same connection reduces latency and overhead with creating new TCP connections
    • with HTTP Keep-Alive, first request needs 2x RTT (round-trip time) and every other request needs 1xRTT
      • without it, it’s always 2xRTT for each request
      • the time saved for n requests is 2n × RTT − (n + 1) × RTT = (n − 1) × RTT, where
      • 1xRTT = request - response
      • 2xRTT = SYN - SYN ACK - ACK (+ request) - response
  • limitation: requests are processed sequentially - a request is sent, the client waits for the response, and only then is the next request sent, etc. (see the sketch below)
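
A minimal sketch of connection reuse with Python’s http.client (example.com and the paths are placeholders) - HTTP/1.1 connections are persistent by default, so both requests share one socket and only the first pays the extra handshake RTT; note that the requests are still strictly sequential:

    import http.client

    conn = http.client.HTTPSConnection("example.com")   # one TCP + TLS connection
    for path in ("/", "/about"):
        conn.request("GET", path)        # reuses the same socket (Keep-Alive)
        resp = conn.getresponse()
        resp.read()                      # the body must be drained before the next request
        print(path, resp.status)
    conn.close()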

HTTP pipelining

  • an enhancement to HTTP Keep-Alive (persistent connections) that allows sending multiple requests without waiting for their responses
  • requests sent in batch are processed in the same order (FIFO) at the server
  • limitation:
    • “head-of-line blocking” - if the first request takes long to process, it blocks the responses to all other requests waiting in the “line” (it’s FIFO)
    • buffering of the requests can exhaust server’s resources
  • it is not used today (modern browsers ship with it disabled)

Multiple parallel connections

  • opening multiple TCP connections simultaneously
    • browsers typically allow 6 connections per host (a trade-off between higher parallelism and client/server overhead)
      • it is also a prevention against DoS attacks (so one client cannot exhaust the whole server)
    • so one slower response does not block the faster ones
  • domain sharding:
    • a workaround for the 6-connection limit per host
      • a server can “shard” its domain into subdomains (IPs), which all point to the same server/content
        • static1.example.com, static2.example.com, static3.example.com
      • a browser sees them as different hosts and allows 6 connections for each (so 3 shards give 6 × 3 = 18 connections at once, see the sketch below)
    • trade-off:
      • more DNS lookups, more TCP handshakes, higher complexity
      • server must implement different ways to prevent DoS attacks
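
A minimal sketch of parallel connections in Python (the sharded URLs are placeholders) - each thread opens its own TCP connection, so one slow response does not block the others:

    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    urls = [f"https://static{i}.example.com/app.js" for i in (1, 2, 3)]  # sharded hosts

    def fetch(url):
        with urlopen(url) as resp:                    # separate socket per request
            return url, resp.status

    with ThreadPoolExecutor(max_workers=6) as pool:   # mirrors the 6-connection limit
        for url, status in pool.map(fetch, urls):
            print(url, status)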

Serving an HTTP request

  1. user enters the URL to the browser
  2. DNS resolution: browser gets the IP address for the server
  3. TCP three way handshake + creating the socket for communication
  4. browser sends ACK and HTTP request
  5. the web server passes the request to the running web application, which serves it and sends the response back to the client
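
The steps above as a minimal raw-socket sketch in Python (example.com is a placeholder; error handling omitted):

    import socket

    host = "example.com"                              # 1. URL entered by the user
    ip = socket.getaddrinfo(host, 80)[0][4][0]        # 2. DNS resolution
    sock = socket.create_connection((ip, 80))         # 3. three-way handshake + socket
    sock.sendall(                                     # 4. HTTP request
        f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode()
    )
    print(sock.recv(4096).decode(errors="replace"))   # 5. response served by the web app
    sock.close()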

Keeping the state

  • HTTP is stateless, we are using different mechanisms to keep the state:
    • an Authorization header is copied into every request
    • Cookies - small bits of information sent with the request to inform the server about the state (or to keep a Session = temporary storage of the state on the webserver side)
    • Hypertext - original HTTP design principle
      • app state is represented by resources (hypermedia) and links define transitions between states
      • e.g. HATEOAS principles
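
A minimal sketch of the cookie mechanism in Python (example.com, the paths and the cookie are hypothetical) - the value from Set-Cookie is replayed in the Cookie header, so the server can look up the session state:

    import http.client

    conn = http.client.HTTPSConnection("example.com")
    conn.request("GET", "/login")
    resp = conn.getresponse()
    resp.read()
    cookie = resp.getheader("Set-Cookie", "").split(";")[0]      # e.g. "session=abc123"

    conn.request("GET", "/profile", headers={"Cookie": cookie})  # state sent back
    print(conn.getresponse().status)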

SOAP protocol

  • SOAP and WSDL (details in a separate note)

TLS + proxy servers and their modes

  • when communicating with servers via TLS, there can be different intermediaries (= proxies) intercepting the communication
  • different modes of proxies:
    • TLS Offloading (= SSL offloading)
      • the proxy terminates TLS (decrypts the traffic) and forwards plain-text requests/responses to the backend servers
    • TLS Bridging (= SSL bridging, re-encryption)
      • the same as TLS Offloading, but the requests/responses are re-encrypted by the proxy when going to/from the backend servers
      • this is more secure; the proxy can still read (and modify) the data, but it is also slower (decrypting and re-encrypting both ways)
    • TLS Pass-through (= SSL Pass-through)
      • proxy cannot decrypt data and just sends them over to the backend server
      • true end-to-end encryption, but it increases the load on backend servers
        • no content inspection on the proxy level (scanning for malware, logging etc.)
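
A minimal sketch of the offloading mode in Python (the certificate paths, ports and backend address are placeholders) - the proxy terminates TLS, can inspect the plaintext, and forwards it unencrypted to the backend; bridging would additionally wrap the backend connection in TLS again:

    import socket
    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("proxy-cert.pem", "proxy-key.pem")       # the proxy's own certificate

    listener = socket.create_server(("0.0.0.0", 8443))
    with ctx.wrap_socket(listener, server_side=True) as tls_listener:
        client, _ = tls_listener.accept()                        # TLS handshake with the client
        request = client.recv(65536)                             # decrypted: proxy can inspect/modify
        backend = socket.create_connection(("10.0.0.5", 8080))   # plaintext towards the backend
        backend.sendall(request)
        client.sendall(backend.recv(65536))                      # relay the response to the client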

SNI = modern solution to virtual hosting

  • with virtual hosting, we can host multiple websites on one machine (one IP address), which is great
  • but with HTTPS, the server needs to provide a certificate during the TLS handshake
    • to send the correct certificate, the server needs to know which domain/website the client is trying to reach
    • but the HTTP request with the Host header arrives only after the TLS handshake is finished
  • old solutions:
    • one website = one server (wasteful, not enough IPv4 addresses)
    • wildcard certificates - work only for subdomains of one domain, not for different websites on one server
    • SAN = multidomain certificates (for all domains on one server)
      • common on shared webhosting, but you don’t want to share one certificate with other websites
      • when a new website is added, a new certificate covering all the websites has to be issued - a management nightmare
  • SNI solves this problem by extending the TLS protocol
    • the name of the host is actually sent during the TLS handshake, so the server knows which certificate to send for verification
    • there is one privacy concern:
      • the first handshake message (the ClientHello carrying the SNI) is sent in plaintext, so an attacker can see which host you are trying to connect to
    • thanks to the plaintext SNI, even TLS pass-through proxies can intelligently load-balance requests (see the sketch below)
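
A minimal client-side sketch in Python (example.com is a placeholder) - the server_hostname value is what the SNI extension carries in the plaintext ClientHello, letting the server pick the matching certificate:

    import socket
    import ssl

    ctx = ssl.create_default_context()
    raw = socket.create_connection(("example.com", 443))
    tls = ctx.wrap_socket(raw, server_hostname="example.com")   # SNI sent in ClientHello
    print(tls.version(), tls.getpeercert()["subject"])          # certificate chosen via SNI
    tls.close()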