Once you have containers across multiple hosts, networking becomes important. How do containers on different hosts talk to each other? Normally containers sit in a private NAT network on the host. They can reach the outside world but cannot be reached from outside. This is usually the default setup.
There are ways to design the network so containers are not in a private network. This involves bridging one of the host's physical network interfaces and connecting containers to that bridge. This way containers are in the same subnet as the host. If you do this across multiple hosts in the same subnet, all containers and hosts will be in the same subnet. This is a flat network.
But this needs control of the router, and in many scenarios, for instance on the cloud, that is usually not possible. In many cases network design or constraints call for other options.
The simplest option is routing. Take 2 servers in the same subnet connected to each other, each hosting its own container network. While the hosts can talk to each other, the containers on one host cannot reach the containers on the other. One easy way to fix this is to add a route on each host that points to the other host's container subnet via that host. While this works for a few hosts, it can quickly become unwieldy as the host count increases.
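As a minimal sketch of the static route approach, assume host A is 192.168.1.10 with containers on 10.0.3.0/24 and host B is 192.168.1.11 with containers on 10.0.4.0/24. These addresses and the `print_routes` helper are illustrative and not taken from any real setup. The script only prints the `ip route add` commands a given host would need, so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical inventory: "host_ip container_subnet", one host per line.
HOSTS="192.168.1.10 10.0.3.0/24
192.168.1.11 10.0.4.0/24"

# Print the routes a host needs: one per remote container subnet,
# with the remote host as the next hop.
print_routes() {
    local_ip="$1"
    echo "$HOSTS" | while read -r host_ip subnet; do
        if [ "$host_ip" != "$local_ip" ]; then
            echo "ip route add $subnet via $host_ip"
        fi
    done
}

# Routes for host A (192.168.1.10)
print_routes 192.168.1.10
```

Every new host adds a line to the inventory and a new route on every other host, which is why this becomes hard to manage by hand. IP forwarding (net.ipv4.ip_forward=1) also has to be enabled on each host for the routed traffic to pass.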
This is where applications like Quagga come in handy. They can automatically share, set up and manage routes across servers. This is the most scalable way to connect containers across hosts and should be the first choice.
Another option is to use overlay networks. Overlay networks are networks built on top of existing networks. Fortunately for Linux users Vxlan is part of the kernel.
Vxlan lets you build layer 2 overlays over layer 3 networks. It works like this: you add a vxlan device on each host, connect it to a standalone bridge, and then connect containers to the bridge. Now any containers on bridges connected to the vxlan devices are on the same layer 2 network. Vxlan also lets you segment networks as required with vxids.
We personally prefer layer 3 networks where container subnets across hosts are routed via hosts to each other. It's a much more scalable model than stretching a layer 2 network across servers. BGP lets you do this and is relatively easy to set up and use with tools like Quagga. We already have an article up on connecting container subnets with Quagga BGP.
Let's use 3 servers for this example. The first step is to add a vxlan device on each of the hosts.
ip link add vx0 type vxlan id 42 group 239.1.1.1 dev eth0
This creates a Vxlan device vx0 with vxid 42 attached to the eth0 interface. Here we are assuming eth0 is the interface the host uses to reach the other hosts that are going to be part of this Vxlan network. In this setup Vxlan uses multicast to find the other hosts, and 239.1.1.1 is the multicast group address. Vxids are used to segment networks when required.
Once we have created the vxlan devices we just need to create a standalone bridge on each host and add the vxlan device to it.
First let's add the bridge device with the brctl command. If you don't have the brctl utility, install the bridge-utils package. It is preinstalled on many distributions.
brctl addbr br0
Now add the vxlan device to the bridge.
brctl addif br0 vx0
Now any container or VM which connects to the br0 bridge on any of the hosts will be on the same layer 2 network, and they should be able to ping each other.
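Newly created link devices start in the down state, so the vxlan device and the bridge need to be brought up on each host before traffic can pass. The sketch below assumes the device names vx0 and br0 from the commands above. The `run` helper is a hypothetical convenience that only prints each command unless DRY_RUN=0 is set, in which case the commands run for real and need root.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 and run as root to apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Bring up the vxlan device and the bridge it is attached to.
bring_up_overlay() {
    run ip link set vx0 up
    run ip link set br0 up
}

bring_up_overlay

# To inspect the result on a live host:
#   ip -d link show vx0    # shows the vxlan id, group and port in use
#   brctl show br0         # lists vx0 as a member of the bridge
```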
Note there are no DHCP services on this network. Without DHCP you will need to manually set up networking in each container by adding a static IP and a default route.
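A static setup inside a container could look roughly like the sketch below. The addresses are assumptions for illustration: an overlay subnet of 10.0.50.0/24, a free address 10.0.50.11 for the container, a gateway at 10.0.50.1, and eth0 as the container's interface. The hypothetical `static_net_cmds` helper only prints the commands.

```shell
#!/bin/sh
# Print the commands to configure a container statically.
# Arguments (all illustrative): address with prefix, gateway, interface.
static_net_cmds() {
    addr="$1"
    gateway="$2"
    iface="${3:-eth0}"
    echo "ip addr add $addr dev $iface"
    echo "ip link set $iface up"
    echo "ip route add default via $gateway"
}

# Example: one container in the assumed 10.0.50.0/24 overlay
static_net_cmds 10.0.50.11/24 10.0.50.1
```

Run the printed commands as root inside the container. Each container needs its own unused address, which is the bookkeeping DHCP would otherwise take care of.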
Setting up DHCP ensures all containers or VMs connecting to the network get an IP and have networking set up automatically. Setting up DHCP is not difficult. You need to choose a subnet for the overlay network, let's say 10.0.50.0/24, and then start a Dnsmasq instance attached to the br0 bridge on one of the hosts. This instance will take care of DHCP services for the entire overlay network.
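A minimal sketch of the DHCP step on the chosen host, using the bridge created earlier (br0) and the 10.0.50.0/24 subnet. The host address 10.0.50.1, the lease range and the 12 hour lease time are assumptions for illustration. The `run` helper is hypothetical and only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 and run as root to apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Assumed addressing: this host takes 10.0.50.1 on br0 and hands out
# 10.0.50.10 to 10.0.50.250 with 12 hour leases.
start_overlay_dhcp() {
    # The bridge needs an address in the overlay subnet for dnsmasq to serve it.
    run ip addr add 10.0.50.1/24 dev br0
    # Listen only on br0 so the instance does not clash with other resolvers.
    run dnsmasq --interface=br0 --except-interface=lo --bind-interfaces \
        --dhcp-range=10.0.50.10,10.0.50.250,12h
}

start_overlay_dhcp
```

By default dnsmasq advertises its own address as the default router, so containers that get a lease would use 10.0.50.1 as their gateway.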
If you need containers on this network to access the internet, this can be done by adding a masquerading rule on the host that acts as the gateway for the overlay network.
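One way to write such a rule with iptables is sketched below. It assumes the overlay subnet is 10.0.50.0/24 and that this host is the default gateway for the containers. The hypothetical `run` helper only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 and run as root to apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

enable_overlay_nat() {
    # Let the host forward packets between the overlay and its uplink.
    run sysctl -w net.ipv4.ip_forward=1
    # Masquerade traffic leaving the overlay subnet for any other destination.
    run iptables -t nat -A POSTROUTING -s 10.0.50.0/24 ! -d 10.0.50.0/24 -j MASQUERADE
}

enable_overlay_nat
```

Traffic between containers stays untouched because the rule excludes destinations inside the overlay subnet.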
Flockport automates this process and makes it easier to add Vxlan networks with DHCP services and masquerading as required. Here is a screencast of Flockport setting up a Vxlan network.
Kernel version 3.14 and up is the minimum recommended for Vxlan networks. A note on Vxlan ports: the IANA-assigned Vxlan port is 4789, but the Linux Vxlan driver originally used port 8472 and still falls back to it when no destination port is given. If the Vxlan ports are not consistent across hosts the network will not work, so it's important to use a recent kernel and the same port on every host. Also, if you are using a firewall, please ensure the Vxlan UDP port (4789 if you follow the IANA assignment) is open between the hosts.
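One way to avoid a port mismatch is to set the port explicitly when creating the device, using the `dstport` option of `ip link`, and to open the same UDP port in the firewall. The sketch below is a variant of the earlier `ip link add` command and would be used in its place, not in addition to it. The INPUT rule is an assumption about a simple iptables firewall, and the hypothetical `run` helper only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 and run as root to apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Create vx0 with an explicit UDP port and allow that port through the firewall.
add_vxlan_with_port() {
    port="${1:-4789}"
    run ip link add vx0 type vxlan id 42 group 239.1.1.1 dev eth0 dstport "$port"
    run iptables -A INPUT -p udp --dport "$port" -j ACCEPT
}

add_vxlan_with_port 4789
```

Use the same port value on every host in the overlay.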
Please note that Vxlan as set up here relies on multicast, so it will not work on most clouds, as few cloud providers support multicast. Vxlan is a great option for your own setups, datacenters etc., but on the cloud you need to use other options. Using layer 3 networks and BGP is a good option for cloud based networks. We covered Wireguard previously, which is another option and provides encryption out of the box.