What is service discovery, what does it do, and do you even need it? For the average user running applications on a single server or even a few servers, service discovery is not needed.
Service discovery becomes useful when running a cluster with multiple instances of applications, databases, and web and app servers. The applications are typically decoupled so you can scale individual instances across servers, or architected as microservices exposed via REST APIs. Instances often come and go or scale up and down.
This makes it important to have a registry where applications register their IP addresses and ports and can be discovered by other applications that depend on them. There is no need to hardcode IPs; dependent services can be found by name.
In trying to understand service discovery it may be useful to think of websites as services. When your browser needs to reach a particular website, the underlying DNS client queries a known DNS server (this is what the /etc/resolv.conf file in Linux lists) and gets the website's IP address. Behind the scenes a hierarchy of DNS servers, starting from the root servers, resolves domain names to their IP addresses.
A service discovery endpoint lists all available services and their corresponding addresses and ports.
Clients can query the registry with a service name to get its address. Most service discovery endpoints expose both a DNS interface and a REST API.
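At its core, a registry is just a map from service names to sets of addresses. A minimal in-memory sketch (purely illustrative; real endpoints like Consul add persistence, health checks, and the DNS/HTTP interfaces on top) might look like this:

```python
# Minimal in-memory service registry sketch (illustrative only).
# Real endpoints like Consul add persistence, health checks and
# expose the registry over DNS and an HTTP REST API.

class ServiceRegistry:
    def __init__(self):
        # service name -> list of (ip, port) instances
        self.services = {}

    def register(self, name, ip, port):
        """Add an instance under a service name (names can be shared)."""
        self.services.setdefault(name, []).append((ip, port))

    def deregister(self, name, ip, port):
        """Remove an instance, e.g. when it leaves the cluster."""
        instances = self.services.get(name, [])
        if (ip, port) in instances:
            instances.remove((ip, port))

    def lookup(self, name):
        """Clients query by name instead of hardcoding IPs."""
        return self.services.get(name, [])


registry = ServiceRegistry()
registry.register("mysql.db", "10.0.0.5", 3306)
registry.register("mysql.db", "10.0.0.6", 3306)

# A dependent service asks for mysql.db by name
print(registry.lookup("mysql.db"))
```

The service names and addresses above are made up for the example; the point is simply that clients resolve a name at lookup time, so instances can come and go freely.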
We will take a few liberties and use a simplified example to illustrate how service discovery works.
Let's take a typical app most users are familiar with: WordPress. WordPress depends on a database (MySQL), PHP, and a web server like Nginx or Apache. This is the basic architecture of most applications.
However, when scaling you want a decoupled architecture. WordPress is not designed to scale seamlessly across multiple instances, but for this example we will assume multiple WordPress instances can simply be decoupled and scaled.
Most scalable apps are designed to scale across instances and architected to manage databases, state and caching. Decoupling your application enables spinning up multiple instances on demand. Databases are similarly deployed to scale with replication, built-in redundancy and more.
While we are assuming containers as the underlying base hosting the apps here, they could be VMs or even bare metal servers. What matters is scaling to multiple instances; by abstracting away bare metal servers VMs make this easier, and containers even more so.
Let's use five WordPress PHP-FPM instances, a cluster of MySQL instances, and a load balancer to direct traffic to the WordPress instances. These instances will be spread across the cluster and can scale on any available servers to accommodate more load.
On startup the WordPress instances register themselves with the discovery endpoint as wordpress.app. Note that you can have multiple instances register under the same service name. Similarly, the MySQL instances register themselves with the endpoint as mysql.db.
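With Consul, for instance, registration can be done declaratively by dropping a service definition file into the agent's configuration directory. Here is a sketch, assuming a WordPress PHP-FPM instance listening on port 9000; the health check URL is hypothetical, and the name is kept DNS friendly since Consul resolves services as &lt;name&gt;.service.consul:

```json
{
  "service": {
    "name": "wordpress",
    "port": 9000,
    "check": {
      "http": "http://localhost:9000/health",
      "interval": "10s"
    }
  }
}
```

Each instance registers under the same service name, so a lookup for the name returns all healthy instances.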
The WordPress instances are configured to find their database at mysql.db by querying the endpoint. The endpoint will return the IP address and port of the mysql.db service. Multiple instances of the mysql.db service can exist with different IP addresses.
The load balancer is similarly configured to query the endpoint for wordpress.app. The endpoint will return all instances of wordpress.app via DNS SRV records or an HTTP API. The endpoint can also be configured to return services on a round robin basis, and can run health checks to remove any offline instances.
When the load balancer queries the endpoint for the instances, it gets the IP addresses and ports and can direct traffic accordingly. If there is a surge, more WordPress instances can easily be spun up and found via the service discovery endpoint.
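The round robin behaviour is simple to picture: successive lookups rotate through the registered instances so traffic spreads across them. A short sketch (illustrative only; Consul's DNS interface does this rotation for you, and the addresses below are made up):

```python
import itertools

# Addresses as a registry lookup for wordpress.app might return them
instances = [("10.0.0.11", 9000), ("10.0.0.12", 9000), ("10.0.0.13", 9000)]

# Cycle through the instances so each request hits the next backend,
# wrapping around to the first after the last one
pool = itertools.cycle(instances)

def next_backend():
    return next(pool)

for _ in range(4):
    print(next_backend())
```

If an instance fails its health check the registry drops it from the list, so the next lookup simply cycles over the remaining healthy backends.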
There are a few other ways to do this. Some discovery solutions have addons that populate load balancers like HAProxy and Nginx. In this case all service instances for the load balancer are automatically picked up and added to the load balancer configuration file. HAProxy 1.8 now has a runtime resolver built in that can parse SRV records, so out of the box it can discover all service backends via DNS.
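A sketch of what that HAProxy 1.8 configuration might look like, assuming a local Consul agent answering DNS queries on 127.0.0.1:8600 and a service registered as wordpress (the backend name and server count are arbitrary here):

```
# Point HAProxy's runtime resolver at the Consul agent's DNS interface
resolvers consul
    nameserver consul 127.0.0.1:8600
    accepted_payload_size 8192

backend wordpress
    balance roundrobin
    # Provision up to 5 server slots, filled in at runtime from the
    # SRV records Consul returns for the wordpress service
    server-template wp 5 _wordpress._tcp.service.consul resolvers consul check
```

As instances register and deregister, HAProxy's resolver updates the server slots without a reload.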
There are a number of things we have glossed over. How does the initial registration of services happen? How is the service name to IP association discovered? How do services deregister themselves when leaving the cluster? And more importantly, how does discovery itself happen? For instance, a conventional load balancer out of the box can typically only query by DNS unless you configure a custom solution to query an HTTP REST API.
Now let's move to a more real world implementation, and here we will use Flockport's own implementation of discovery to illustrate how it works.
Consul is a popular service discovery application. It offers both a DNS interface and a REST API for clients and can also do round robin addressing.
Service discovery depends on automation. It's not feasible to manually add and remove services from the endpoint; that does not scale. Similarly, there must be automated mechanisms for health checks, alerts, responding to load and spinning up new instances.
Flockport service discovery uses Consul. Once you add an endpoint it becomes available for use. Containers in a Flockport cluster are designed to check for any available endpoints on startup and, if found, register any services associated with the container with the endpoint. On being stopped they deregister the service.
Flockport lets you associate services with containers. On startup the container will publish any defined services to any available service endpoints.
This still leaves the question of discovery open. While the Consul endpoint can be queried via its REST API, it also offers service discovery via DNS resolution. With a simple configuration you can set up the endpoint as a recursive DNS server and then point your clustered systems' DNS to the Consul service endpoint IP.
This way any DNS queries will first hit the Consul endpoint, and Consul will forward the request to the recursor if the service name is not found.
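A sketch of that Consul configuration, assuming Google's public DNS as the upstream recursor. Consul answers DNS on port 8600 by default, so to receive queries on the standard port 53 the DNS port is also remapped:

```json
{
  "recursors": ["8.8.8.8"],
  "ports": {
    "dns": 53
  }
}
```

With this in place, /etc/resolv.conf on the cluster's systems can simply list the Consul endpoint's IP as the nameserver.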
Alternatively you can configure your local DNS service, e.g. Dnsmasq, to forward all queries for the .consul domain to the Consul endpoint. All Consul registered services use the .consul domain.
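With Dnsmasq that forwarding rule is a single line, assuming a Consul agent listening on its default DNS port 8600 on localhost (the file name is arbitrary):

```
# /etc/dnsmasq.d/10-consul
# Send queries for *.consul to the local Consul agent; everything
# else follows Dnsmasq's normal upstream resolution
server=/consul/127.0.0.1#8600
```

This keeps your existing resolver in charge of normal DNS while service lookups like mysql.db transparently go to Consul.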
The important thing is to have an infrastructure where application instances can register themselves automatically with the discovery endpoint, have health checks, and allow other applications that need these services to access the registry via an HTTP REST API or DNS.
When you use Consul manually without an orchestration platform, you will need to work out how services are registered with the endpoint, how service IPs are discovered, and how clients consume the registry. This is actually not too hard for any programmer, and most automation platforms offer a solution out of the box.
Here is a quick screencast of service discovery in action in Flockport.