Database in K8S or outside?

My (ordered) recommendation:

0) If I can use a Database-as-a-Service in my project, I do it. I love to delegate the most difficult problems to my cloud provider!

1) If there is no DB-as-a-Service, I would go with a separate Kubernetes cluster for your database.

I do not mind the extra MB and CPU for running a K8S master if I can have a consistent API for all my infrastructure. Pinning your DB instances to specific nodes gives you a VM-like experience; I treat k8s here as my distributed init. Why a separate cluster? You might want to run the newest k8s for your app, while databases are more fragile and do not like changes.
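The pinning can be done with a plain node label and `nodeSelector`. A minimal sketch (pod name, label, and paths are illustrative, assuming the node was labeled with `kubectl label node <node> role=db`):

```yaml
# Hypothetical sketch: pin a Postgres pod to a labeled node,
# so it behaves like a "VM" with stable, node-local resources.
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  nodeSelector:
    role: db              # matches the label set on the chosen node
  containers:
    - name: postgres
      image: postgres:13
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      hostPath:
        path: /mnt/db-data   # node-local disk, hence the pinning
```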

2) If not K8S, a separate VM or VMs; but, you know, life is too short for provisioning your machines with ssh, Ansible, or Salt.

3) You can also go with a DB in K8S with persistent volumes (ofc). If you do not have a lot of data, you can get pretty far with it.
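For this option, a StatefulSet with a volume claim template is the usual shape. A minimal sketch (image, sizes, and names are illustrative, not a production setup):

```yaml
# Single-replica database with a persistent volume claimed per pod.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:       # one PVC per replica, survives pod restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```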

Notice: you MUST (RFC 2119) understand the consequences of such a decision. It also depends on the cost of your downtime, your SLA, your backup/recovery strategy, budget, …, and the sanity of your cloud provider. You will not die immediately if you do it.

To sum up: persistence is hard, and it did not get easier while we were getting excited about K8S. In the cloud, we delegate it. On premises, we have to cope with it and take extra care of our SQL or NoSQL databases.

There is hope. We have new databases, e.g., CockroachDB or FaunaDB, that are more resilient and better at hiding the complexity of data replication/clustering (see the Jepsen analyses). Who knows, we might get to a point when we are not afraid to run databases even on spot instances ;).

PS. Regarding running your own database on Azure: they like to restart your machines. The MS cloud will increase the speed at which you learn how to build your own robust database setup.


Centralized logging for Kubernetes with fluentd and Elasticsearch

I had to set up centralized monitoring for our production and staging applications. In my company, we saw a need for an alternative solution (introduction of pricing, etc.) to the Google Stackdriver logger we had in place.

Having had some bad experiences running Logstash, I decided to go with EFK (Elasticsearch, fluentd, and Kibana). Plus, I could reuse the existing fluentd deployment definition from the kubernetes GitHub repo, so it was easy to get something running in a short time.
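The core of that deployment is a DaemonSet, so one fluentd pod runs per node and tails the node's container logs. A simplified sketch of its shape (image tag, namespace, and the in-cluster `elasticsearch` service name are illustrative, not the exact addon manifest):

```yaml
# One fluentd pod per node, shipping /var/log to Elasticsearch.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd-es
  template:
    metadata:
      labels:
        app: fluentd-es
    spec:
      containers:
        - name: fluentd-es
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: elasticsearch   # assumes an in-cluster ES service
          volumeMounts:
            - name: varlog
              mountPath: /var/log    # node's log directory, read by fluentd
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
```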