Stateless vs Stateful
Storing the canonical source of truth in the database and horizontally scaling by adding new stateless service instances as needed has been very effective.
- easy to scale horizontally
- only requires simple round-robin load balancing.
- The problem is that our applications do have state, and we are hitting limits where one database doesn’t cut it anymore.
- increased latency from the round trips to the database
- the complexity of the caching layer required to hide database latency problems
- In response, we’re sharding relational databases or using NoSQL databases. This gives up strong consistency, which causes part of the database abstraction to leak into the services.
- troublesome consistency issues
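A minimal sketch of the stateless model described above. It assumes an in-memory dict stands in for the database, and the classes `StatelessService` and `RoundRobinBalancer` are hypothetical stand-ins for service instances and a load balancer. Because no instance keeps anything between requests, any instance can serve any request, and scaling out is just appending another instance to the rotation.

```python
from typing import Dict, List

# Hypothetical stand-in for the shared database holding the canonical source of truth.
DATABASE: Dict[str, int] = {"alice": 10, "bob": 20}


class StatelessService:
    """Hypothetical service instance: reads what it needs from the database on every request."""

    def __init__(self, name: str) -> None:
        self.name = name

    def handle(self, user: str) -> str:
        balance = DATABASE[user]  # fetched fresh each time; nothing is kept afterwards
        return f"{self.name}: {user} has {balance}"


class RoundRobinBalancer:
    """Hands requests to instances in turn; it never needs to look at the request contents."""

    def __init__(self, instances: List[StatelessService]) -> None:
        self.instances = instances
        self.next_index = 0

    def add_instance(self, instance: StatelessService) -> None:
        # Scaling out: a new stateless instance can join the rotation immediately.
        self.instances.append(instance)

    def route(self, user: str) -> str:
        instance = self.instances[self.next_index % len(self.instances)]
        self.next_index += 1
        return instance.handle(user)


balancer = RoundRobinBalancer([StatelessService("svc-1"), StatelessService("svc-2")])
print(balancer.route("alice"))  # svc-1: alice has 10
print(balancer.route("alice"))  # svc-2: alice has 10
balancer.add_instance(StatelessService("svc-3"))
print(balancer.route("bob"))    # svc-3: bob has 20
```

The balancer stays trivial precisely because the instances are interchangeable; the cost is that every request goes back to the database.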
Data Shipping Paradigm
A client makes a service request. The service talks to the database and the database replies with some data. The service does some computation. A reply is sent to the client. And then the data disappears from the service.
The next request will be load balanced to a different machine and the whole process happens all over again. It’s wasteful to repeatedly pull resources into load-balanced services.
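The data shipping loop above can be sketched by counting database round trips. This assumes a dict as the database, a global counter as the measurement, and hypothetical instance names; each request is routed round-robin, the instance fetches the data, computes, replies, and then drops its copy.

```python
from itertools import cycle
from typing import Dict, List, Tuple

# Hypothetical database contents; the counter records every round trip to it.
DATABASE: Dict[str, List[int]] = {"alice": [3, 4, 5]}
db_round_trips = 0


def fetch_from_database(user: str) -> List[int]:
    global db_round_trips
    db_round_trips += 1
    return list(DATABASE[user])  # a copy of the data is shipped to the service


def handle_request(instance: str, user: str) -> Tuple[str, int]:
    data = fetch_from_database(user)  # 1. ship the data to the service
    result = sum(data)                # 2. compute on it
    return instance, result           # 3. reply; `data` is discarded when the function returns


instances = cycle(["svc-1", "svc-2", "svc-3"])  # round-robin over stateless instances
replies = [handle_request(next(instances), "alice") for _ in range(5)]
print(replies)
print(db_round_trips)  # 5
```

The same client asking the same question five times costs five database round trips, because no instance is allowed to remember the answer.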
- Data Locality: the data is right on that server, so there is no need to hit the database and latency is low.
- Function Shipping Paradigm: once the request has been handled, the data is left on the service. The next time the client makes a request, the request is routed to the same machine so it can operate on data that’s already in memory.
- Sticky connections: A client makes a request to a cluster of servers and the request is always routed to the same machine.
- open up a persistent HTTP connection or a TCP connection
- Once the connection breaks the stickiness is gone and the next request will be load balanced to another server.
- load balancing: it’s easy to overwhelm a single server with too many sticky connections.
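Sticky connections and function shipping can be sketched together. The assumptions: a dict stands in for the database, hypothetical `Server` and `Connection` classes model the machines and a persistent connection, and new connections are still assigned round-robin. The server is picked once, when the connection opens; every request on that connection lands on the same server, which keeps the data in memory after the first fetch. Reconnecting loses the stickiness.

```python
from itertools import cycle
from typing import Dict, List

# Hypothetical database contents and a counter of round trips to it.
DATABASE: Dict[str, List[int]] = {"alice": [3, 4, 5]}
db_round_trips = 0


def fetch_from_database(user: str) -> List[int]:
    global db_round_trips
    db_round_trips += 1
    return list(DATABASE[user])


class Server:
    """Keeps data in memory after a request so later requests can reuse it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local_data: Dict[str, List[int]] = {}

    def handle(self, user: str) -> int:
        if user not in self.local_data:  # only the first request pays for a database round trip
            self.local_data[user] = fetch_from_database(user)
        return sum(self.local_data[user])  # the computation runs where the data already lives


class Connection:
    """A persistent connection: the server is chosen once, when the connection is opened."""

    def __init__(self, server: Server) -> None:
        self.server = server

    def request(self, user: str) -> int:
        return self.server.handle(user)


servers = [Server("svc-1"), Server("svc-2")]
balancer = cycle(servers)  # new connections are still load balanced round-robin

conn = Connection(next(balancer))  # sticky to svc-1
results = [conn.request("alice") for _ in range(5)]
print(conn.server.name, results, db_round_trips)  # svc-1 [12, 12, 12, 12, 12] 1

conn = Connection(next(balancer))  # the connection broke; the reconnect lands on svc-2
conn.request("alice")
print(conn.server.name, db_round_trips)  # svc-2 2
```

Five requests on one connection cost a single database round trip, while the reconnect pays again on a new server. The sketch ignores invalidation: if the database row changes, the copy held in `svc-1` goes stale, which is one of the consistency questions a real system has to answer.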
Philips: Essentially, I see the world broken down into four potential application types:
1) Stateless applications: trivial to scale at the click of a button with no coordination. These can take advantage of Kubernetes deployments directly and work great behind Kubernetes Services or Ingress Services.
2) Stateful applications: postgres, mysql, etc., which generally exist as single processes and persist to disks. These systems generally should be pinned to a single machine and use a single Kubernetes persistent disk. These systems can be served by static configuration of pods, persistent disks, etc., or utilize StatefulSets.
3) Static distributed applications: zookeeper, cassandra, etc., which are hard to reconfigure at runtime but do replicate data around for data safety. These systems have configuration files that are hard to update consistently and are well-served by StatefulSets.
4) Clustered applications: etcd, redis, prometheus, vitess, rethinkdb, etc., which are built for dynamic reconfiguration and modern infrastructure where things are often changing. They have APIs to reconfigure members in the cluster and just need glue to be operated natively and seamlessly on Kubernetes, and thus the Kubernetes Operator concept.
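The "glue" for clustered applications can be sketched as a reconcile pass: observe the members the cluster reports, compare against the desired size, and call the application's own membership API to close the gap. Everything here is hypothetical: `FakeCluster`, `add_member`, `remove_member`, and the member names stand in for a real clustered application, and the sketch does not use any Kubernetes API.

```python
from typing import List


class FakeCluster:
    """Hypothetical stand-in for a clustered application that exposes a membership API."""

    def __init__(self, members: List[str]) -> None:
        self.members = list(members)
        self._next_id = len(members)

    def add_member(self) -> str:
        name = f"member-{self._next_id}"
        self._next_id += 1
        self.members.append(name)
        return name

    def remove_member(self, name: str) -> None:
        self.members.remove(name)


def reconcile(cluster: FakeCluster, desired_size: int) -> List[str]:
    """One pass of a control loop: observe the actual state, then act to move it toward the desired state."""
    actions: List[str] = []
    while len(cluster.members) < desired_size:
        actions.append(f"add {cluster.add_member()}")
    while len(cluster.members) > desired_size:
        victim = cluster.members[-1]  # remove the newest member first
        cluster.remove_member(victim)
        actions.append(f"remove {victim}")
    return actions


cluster = FakeCluster(["member-0"])
print(reconcile(cluster, 3))  # ['add member-1', 'add member-2']
print(reconcile(cluster, 3))  # [] (already converged)
print(reconcile(cluster, 2))  # ['remove member-2']
```

Running the pass again when nothing has changed does nothing, so an operator can safely repeat it whenever it observes a change and the cluster keeps converging toward the declared size.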