Service Discovery in Kubernetes: Combining the Best of Two Worlds

Before jumping to any Kubernetes specifics, let's talk about the service discovery problem in general.

What is Service Discovery

In the world of web service development, it's a common practice to run multiple copies of a service at the same time. Every such copy is a separate instance of the service represented by a network endpoint (i.e. some IP and port) exposing the service API. Traditionally, virtual or physical machines have been used to host such endpoints, with the shift towards containers in more recent times. Having multiple instances of the service running simultaneously increases its availability and helps to adjust the service capacity to meet the traffic demand. On the other hand, it also complicates the overall setup - before accessing the service, a client (the term client is intentionally used loosely here; oftentimes a client of some service is another service) needs to figure out the actual IP address and the port it should use. The situation becomes even more tricky if we add the ephemeral nature of instances to the equation. New instances come and existing instances go because of the non-zero failure rate, up- and downscaling, or maintenance. That's how a so-called service discovery problem arises.

Service discovery problem

Service discovery problem.

Read more

API Developers Never REST

Disclaimer: despite the controversial title, this article is not trying to show that RPC is a superior approach to REST, or GraphQL is superior to RPC. Instead, the goal of the article is to give you an overview of the approaches, their strengths and weaknesses. The final choice anyway will be a trade-off.

Even though HTTP is an application layer (i.e. L7), protocol, when it comes to API development, HTTP de facto plays the role of a lower-level transport mechanism.

There are multiple conceptually different approaches on how to implement an API on top of HTTP:

  • REST
  • RPC
  • GraphQL

...but the actual list of things an average API developer needs to be aware of is not limited by these three dudes. There are also JSON, gRPC, protobuf, and many other terms in the realm. Let's try to sort them out, once and for all!

What is REST? What is RPC? What is GraphQL? What is the difference between REST, RPC, and GraphQL?

Read more

Container Networking Is Simple!

Just kidding, it's not... But fear not and read on!

Working with containers always feels like magic. In a good way for those who understand the internals and in a terrifying - for those who don't. Luckily, we've been looking under the hood of the containerization technology for quite some time already and even managed to uncover that containers are just isolated and restricted Linux processes, that images aren't really needed to run containers, and on the contrary - to build an image we need to run some containers.

Now comes a time to tackle the container networking problem. Or, more precisely, a single-host container networking problem. In this article, we are going to answer the following questions:

  • How to virtualize network resources to make containers think each of them has a dedicated network stack?
  • How to turn containers into friendly neighbors, prevent them from interfering, and teach to communicate well?
  • How to reach the outside world (e.g. the Internet) from inside the container?
  • How to reach containers running on a machine from the outside world (aka port publishing)?

While answering these questions, we'll setup a container networking from scratch using standard Linux tools. As a result, it'll become apparent that the single-host container networking is nothing more than a simple combintion of the well-known Linux facilities:

  • network namespaces;
  • virtual Ethernet devices (veth);
  • virtual network switches (bridge);
  • IP routing and network address translation (NAT).

And for better or worse, no code is required to make the networking magic happen...

Read more

Traefik: canary deployments with weighted load balancing

Traefik is The Cloud Native Edge Router yet another reverse proxy and load balancer. Omitting all the Cloud Native buzzwords, what really makes Traefik different from Nginx, HAProxy, and alike is the automatic and dynamic configurability it provides out of the box. And the most prominent part of it is probably its ability to do automatic service discovery. If you put Traefik in front of Docker, Kubernetes, or even an old-fashioned VM/bare-metal deployment and show it how to fetch the information about the running services, it'll automagically expose them to the outside world. If you follow some conventions of course...

Read more

Service Proxy, Pod, Sidecar, oh my!

How services talk to each other?

Imagine you're developing a service... For certainty, let's call it A. It's going to provide some public HTTP API to its clients. However, to serve requests it needs to call another service. Let's call this upstream service - B.

Service A talks to Service B directly.

Obviously, neither network nor service B is ideal. If service A wants to decrease the impact of the failing upstream requests on its public API success rate, it has to do something about errors. For instance, it could start retrying failed requests.

Read more