Container Networking is Hard Enough…
To those not versed in the dark networking arts, one of the mysteries of OpenShift (Red Hat's wrapper around Kubernetes) is how a pod communicates with the outside world.
This article is more about DNS on clusters, but the point is the same: things can get pretty complicated pretty quickly.
Let’s Add DNSMasq…
Recently I was grappling with this while debugging a Vagrant OpenShift cluster test suite, when someone smarter than me took the time to explain what was happening.
I wasn’t sure I’d got all the details, so I put together these diagrams to help me follow.
External DNS Lookup
Here’s the ‘simple’ case of a single-container pod pinging google.com:
The steps can be described linearly as:
- Process starts in container, and needs to know what google.com resolves to.
- Process looks up /etc/resolv.conf to see where DNS queries should be resolved.
- Process asks the DNS server at 10.0.2.15 on port 53 for google.com's IP address (an illustrative lookup is sketched after this list).
- DNSMasq determines that this is a query that needs to go to the outside world, so it passes the query on.
- In this particular setup, it passes the lookup out via DNSMasq's configured DNS exit point, which is eth1.
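To see where queries are actually going from the container's point of view, something like the following shows the resolver being used. This is an illustrative sketch using the IPs from this Vagrant setup, not output captured from it, and the answer section is elided:

$ nslookup google.com
Server:    10.0.2.15
Address:   10.0.2.15#53
...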
Even here I'm skirting over a lot. The ping process can also refer to /etc/host.conf, /etc/nsswitch.conf, /etc/gai.conf, and /etc/hosts, for example. And I use landrush to manage host lookups for my VMs (between the VMs and to/from the host).
In these diagrams I don't show the whole cluster; everything shown is happening on one node.
Also, the IP addresses for eth0 are the standard Vagrant-allocated IPs.
resolv.conf
In OpenShift, the resolv.conf file in the container is constructed by taking the resolv.conf from the host operating system and then placing a nameserver entry above the host's entries (this nameserver can be set in your node.yaml file).
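A minimal sketch of what that construction might produce, assuming the host has a single upstream nameserver (10.0.2.3 is a common VirtualBox NAT DNS address, but the entries and search domains depend entirely on your setup and namespace):

[root@master1 ~]# cat /etc/resolv.conf          # on the host
nameserver 10.0.2.3

$ cat /etc/resolv.conf                          # inside the container
nameserver 10.0.2.15
nameserver 10.0.2.3
search default.svc.cluster.local svc.cluster.local cluster.local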
DNSmasq
By default, this nameserver points to your host IP (10.0.2.15 in this Vagrant setup), which expects a DNS resolver (typically a dnsmasq server) to be sitting on that IP's port 53. If no value is set, it defaults to the kubernetes service IP, bypassing dnsmasq.
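For reference, the relevant setting looks roughly like this; the key name and file location vary between OpenShift versions, so treat it as a sketch rather than definitive syntax:

# node.yaml (path depends on your install)
# dnsIP is the nameserver that ends up at the top of each container's resolv.conf
dnsIP: 10.0.2.15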
DNSMasq uses the servers specified in the files in /etc/dnsmasq.d/*
According to this thread, there is no specific ordering among these servers; dnsmasq just asks each in turn until it gets an answer.
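I haven't reproduced the exact file that OpenShift drops into /etc/dnsmasq.d/ here, but the key idea is dnsmasq's server=/domain/address syntax, which routes queries for particular domains to particular servers. Something along these lines (filename and contents illustrative):

[root@master1 ~]# cat /etc/dnsmasq.d/origin-dns.conf
# cluster-internal names go to the OpenShift node process on localhost
server=/cluster.local/127.0.0.1
# reverse lookups for cluster IPs go the same way
server=/in-addr.arpa/127.0.0.1

Anything not matching those domains falls through to dnsmasq's other configured upstream servers.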
Local Cluster DNS Lookup
So that’s the ‘simple’ case of an external lookup.
Now we come to a local DNS lookup within the Kubernetes cluster.
The steps can be described linearly as:
- Process starts in container, and needs to know what kubernetes.default.svc.cluster.local resolves to.
- Process looks up /etc/resolv.conf to see where DNS queries should be resolved.
- Process asks the DNS server at 10.0.2.15 on port 53 for kubernetes.default.svc.cluster.local's address.
- DNSMasq determines that this is a query that needs to go to the cluster, so it passes it to the OpenShift node process to look up. That process is listening on port 53 of the localhost IP (127.0.0.1).
- The OpenShift node process either returns the IP address from its cache (which is why bouncing the node process can make some resolution issues go away), or passes the request on to the master process's DNS server (you can query each of these listeners directly with dig, as sketched just after this list).
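Poking at each listener in the chain with dig is a handy way to work out which hop is misbehaving. A minimal sketch using the IPs from this setup:

# ask dnsmasq (what the container's resolv.conf points at)
$ dig +short @10.0.2.15 kubernetes.default.svc.cluster.local

# ask the OpenShift node process directly
$ dig +short @127.0.0.1 kubernetes.default.svc.cluster.local

If the chain is healthy, both return the kubernetes service IP.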
To help see this setup, you can run this command on the host. In this setup, I have a node and a master OpenShift process running on one Vagrant VM:
[root@master1 ~]# netstat -nltp | grep 53
tcp        0      0 127.0.0.1:53      0.0.0.0:*    LISTEN      30998/openshift
tcp        0      0 10.0.2.15:53      0.0.0.0:*    LISTEN      31034/dnsmasq
tcp        0      0 0.0.0.0:8053      0.0.0.0:*    LISTEN      29316/openshift
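Reading that output: the OpenShift node process is serving cluster DNS on 127.0.0.1:53 (the address dnsmasq forwards cluster queries to), dnsmasq itself is on the host IP 10.0.2.15:53 (the address the containers' resolv.conf points at), and the openshift process on 8053 is, as far as I can tell, the master's DNS server that the node process falls back to.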
This is based on work in progress from the second edition of Docker in Practice.
Get 39% off with the code: 39miell2