A cloud platform is a set of virtual machines and the infrastructure around them: networks, subnets, routing rules, firewalls, and virtual disks. This is the foundation of a cloud provider, Infrastructure as a Service (IaaS). On top of this base layer, you can buy or rent other resources: databases, Kubernetes, Hadoop.
Building services above the base tier is more difficult: you need to be sure that all the services involved, not just yours, are up and running. One example is a PaaS that runs on Kubernetes infrastructure. It has a group of master nodes that host the control plane: kube-apiserver, scheduler, controller-manager, and a database. They manage the Kubernetes cluster and distribute the load. There is also a group of worker nodes on which the workloads are executed.
When a container starts in Kubernetes, it runs on one of these worker nodes. Communication begins: the masters form a quorum and exchange data. The masters also communicate with the workers, telling them what to run and what to do if a node fails. All of this happens inside the network, and the cluster has a load balancer that distributes traffic across the master nodes.
We can't simply deploy Kubernetes on top of the infrastructure with Terraform, because that process is hard to control: something can go wrong, and we would not be able to react quickly. We do need to react, so we use a step-by-step process that looks like this:
The flowchart shows how a Kubernetes cluster is created in a cloud infrastructure. Some steps and branches are omitted, but the general logic is:
- Initialize, create and configure the network
- Create and configure the master node
- Configure the management plane
- Create and start the worker node
- Reconfigure and start up the remaining resources
- Test availability and release the cluster to the client
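
The sequence above can be sketched as an ordered list of steps that run one after another and stop at the first failure, so that there is a clear point at which to react. This is only an illustration under stated assumptions, not the platform's actual code: `ClusterState`, `run_pipeline`, and the step names are hypothetical, and the step bodies are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ClusterState:
    """Hypothetical record of how far provisioning has progressed."""
    completed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None


Step = Tuple[str, Callable[[ClusterState], None]]


def run_pipeline(steps: List[Step], state: ClusterState) -> ClusterState:
    """Run steps in order; stop at the first failure and record its name."""
    for name, action in steps:
        try:
            action(state)
        except Exception:
            state.failed_step = name
            break
        state.completed.append(name)
    return state


def noop(state: ClusterState) -> None:
    """Placeholder for a real provisioning call."""


def broken(state: ClusterState) -> None:
    """Placeholder that simulates a step going wrong."""
    raise RuntimeError("worker node did not start")


# Hypothetical step names mirroring the list above.
STEPS: List[Step] = [
    ("network", noop),
    ("master_node", noop),
    ("control_plane", noop),
    ("worker_node", broken),  # simulate a failure in the fourth step
    ("remaining_resources", noop),
    ("availability_check", noop),
]

result = run_pipeline(STEPS, ClusterState())
print(result.completed)
print(result.failed_step)
```

Because the state records which steps completed and which one failed, a recovery routine or an operator knows exactly where the process stopped.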
How did we test?
There is such a thing as the test pyramid, which shows the ratio of test types in a project. At the bottom are unit tests, which should make up the majority; then come integration tests, and then end-to-end, functional, and other tests. The higher-level tests are more expensive to write and run, but you need fewer of them because almost everything is covered by unit tests. That is exactly the scheme we started with.
It turned out that the most popular methods don't suit our project, because it is built on distributed, sequential processes that execute asynchronously.
What disadvantages did we find?
Let's look at each type of test separately:
Unit tests: generally good, but there are caveats. First, they don't let us cover a multi-step process. Second, because of the specifics of our processes, unit tests are often inefficient: we don't have individual small functions, but we do have domain logic. We develop services using domain-driven design (DDD), and the logic is organized around domains and aggregates. For each step we have to prepare the state of the domain, then perform the step, and then run the checks. Unit tests don't lend themselves well to DDD, because in our case DDD is very much about working with state. Can this be covered by integration tests?
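
To make the "prepare the state, perform the step, run the checks" pattern concrete, here is a minimal sketch using Python's standard `unittest` module. The `Cluster` aggregate and the `create_worker_node` step are hypothetical stand-ins, not the real domain model; note that each test has to rebuild the state that earlier steps would normally have produced.

```python
import unittest
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Hypothetical aggregate holding the cluster's state."""
    network_ready: bool = False
    masters: List[str] = field(default_factory=list)
    workers: List[str] = field(default_factory=list)


def create_worker_node(cluster: Cluster, name: str) -> None:
    """Hypothetical step: valid only once the network and a master exist."""
    if not cluster.network_ready or not cluster.masters:
        raise ValueError("cluster is not ready for worker nodes")
    cluster.workers.append(name)


class CreateWorkerNodeTest(unittest.TestCase):
    def test_adds_worker_when_state_is_prepared(self):
        # Prepare the state that the earlier steps would have produced.
        cluster = Cluster(network_ready=True, masters=["master-0"])
        # Perform the step under test.
        create_worker_node(cluster, "worker-0")
        # Check the resulting state.
        self.assertEqual(cluster.workers, ["worker-0"])

    def test_rejects_worker_without_master(self):
        cluster = Cluster(network_ready=True)
        with self.assertRaises(ValueError):
            create_worker_node(cluster, "worker-0")


# Run the two tests explicitly so the example works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CreateWorkerNodeTest)
outcome = unittest.TextTestRunner().run(suite)
```

Even in this toy case, most of each test is state setup for a single step, and nothing verifies that the steps work together in sequence.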
Integration tests have their pros, but also obvious cons: they are expensive to run and not well suited to testing logic. An integration test is worth writing when it is the interaction with the database that must be tested: there is a fixed number of SQL queries, and we write a test for each of them. But a large number of logical operations leads to branching scenarios, and an integration test for each branch is slow to run, expensive, and hard to write and maintain.
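
How quickly branching scenarios multiply can be shown with simple arithmetic. The sketch below assumes, purely for illustration, five independent yes/no decisions with made-up names; real logic is rarely this regular, but the growth is the point.

```python
from itertools import product

# Hypothetical yes/no decisions made while a request is processed.
DECISIONS = ["decision_a", "decision_b", "decision_c", "decision_d", "decision_e"]


def enumerate_scenarios(decisions):
    """Return every combination of outcomes as a dict of decision -> bool."""
    return [
        dict(zip(decisions, outcome))
        for outcome in product([False, True], repeat=len(decisions))
    ]


scenarios = enumerate_scenarios(DECISIONS)
print(len(scenarios))  # 2 ** 5 combinations
```

Under this assumption, each additional independent decision doubles the number of scenarios that would each need their own integration test.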
End-to-end tests are good, but they are also time-consuming and difficult to write. They are hard to maintain: every time the services change, the tests need to change too. The frequency of changes is roughly the number of tests multiplied by the number of services. In addition, it is almost impossible to bring up the entire environment at once. If your service runs a virtualization infrastructure, you can't just emulate that infrastructure; you have to run virtual machines and networks, which is expensive and not always possible.
And what about API tests, you ask? The picture is mixed here: our API is quite thin. There are a few REST entities that the frontend works with, and behind them sits a layer of technical, infrastructure, and business logic that changes faster than the API. In addition, not everything goes through the API: some processes run inside the service, such as autoscaling, autohealing, and working with persistent volumes.