Pod DNS Problems

25 Feb 2022 08:42 k3s core-dns dns

I’ve got an extra instance of CoreDNS running in my cluster, serving *.k3s.differentpla.net, with LoadBalancer and Ingress names registered in it, and it’s working fine for queries from outside the cluster. It’s not working fine for queries from inside the cluster. What’s up with that?

It’s DNS. It’s always DNS.

Motivation

While setting up an ArgoCD project, I set the Repository URL to https://git.k3s.differentpla.net/USER/REPO.git, but it failed, complaining that the name didn’t resolve. This threw me because it works fine from outside the cluster.

Debugging DNS Resolution

See https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/.

$ kubectl run dnsutils -it \
  --restart=Never --rm \
  --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 -- /bin/bash

You probably need to press Enter.

DNS queries:

root@dnsutils:/# nslookup kubernetes.default
Server:		10.43.0.10
Address:	10.43.0.10#53

Name:	kubernetes.default.svc.cluster.local
Address: 10.43.0.1

dig (unlike nslookup) doesn’t honour search directives in /etc/resolv.conf by default, so dig kubernetes.default won’t work unless you add the +search option.

k3s.differentpla.net?

root@dnsutils:/# nslookup git.k3s.differentpla.net
Server:		10.43.0.10
Address:	10.43.0.10#53

** server can't find git.k3s.differentpla.net: NXDOMAIN

command terminated with exit code 1

So, yeah, DNS lookup of *.k3s.differentpla.net from inside a pod isn’t working.

Do external names resolve? I’d probably have noticed if they didn’t, but let’s check:

root@dnsutils:/# nslookup blog.differentpla.net
Server:		10.43.0.10
Address:	10.43.0.10#53

Non-authoritative answer:
blog.differentpla.net	canonical name = differentpla-net.github.io.
Name:	differentpla-net.github.io
Address: 185.199.110.153
...etc.

It also works if I specify my router explicitly:

root@dnsutils:/# nslookup git.k3s.differentpla.net 192.168.28.1
Server:		192.168.28.1
Address:	192.168.28.1#53

Name:	git.k3s.differentpla.net
Address: 192.168.28.13

So the problem is – probably – that the cluster-default CoreDNS (at 10.43.0.10) isn’t using my router (at 192.168.28.1) for DNS resolution.

Pod’s DNS Policy

The default DNS policy for a pod is "ClusterFirst". With this policy, queries that don’t match the cluster domain suffix are supposed to be forwarded to the upstream nameserver inherited from the node.

I’m not convinced by that, however, because when I look at /etc/resolv.conf in a container, it has nameserver 10.43.0.10, which is the ClusterIP service for CoreDNS:

$ kubectl --namespace kube-system get service kube-dns
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.43.0.10   <none>        53/UDP,53/TCP,9153/TCP   66d

CoreDNS DNS Policy

The DNS Policy for CoreDNS is "Default" (which – confusingly – isn’t the default). Anyhow:

$ kubectl --namespace kube-system get pod coredns-96cc4f57d-cztp4 -o yaml | grep dnsPolicy
  dnsPolicy: Default

The documentation says:

"Default": The Pod inherits the name resolution configuration from the node that the pods run on.

…which is odd, because that pod is running on rpi405:

$ kubectl --namespace kube-system get pod coredns-96cc4f57d-cztp4 -o json | jq -r '.spec.nodeName'
rpi405

…and DNS resolution works correctly on that node:

ubuntu@rpi405:~$ nslookup git.k3s.differentpla.net
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
Name:	git.k3s.differentpla.net
Address: 192.168.28.13

So: is k3s doing something out-of-spec with CoreDNS? Or am I just misunderstanding how it’s supposed to work?

CoreDNS ConfigMap

$ kubectl --namespace kube-system get cm coredns -o yaml
...
data:
  Corefile: |
    .:53 {
        ...
        forward . /etc/resolv.conf
    }
...

Nothing particularly surprising there. What’s in /etc/resolv.conf in the CoreDNS pod?

$ kubectl --namespace kube-system exec -it coredns-96cc4f57d-cztp4 -- /bin/sh
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "139317fa90a415ce57a358ea81920eca242b4f258ffd4adc7c02544415eb5a4c": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown

That’s a problem: The CoreDNS container is so stripped down that it doesn’t even contain a shell.

Debugging with an ephemeral debug container

See https://kubernetes.io/docs/tasks/debug-application-cluster/debug-running-pod/.

$ kubectl --namespace kube-system debug -it coredns-96cc4f57d-cztp4 --image=busybox
Defaulting debug container name to debugger-lfc7b.
error: ephemeral containers are disabled for this cluster (error from server: "the server could not find the requested resource").

Another problem: ephemeral containers are behind a feature gate, and k3s has it disabled by default.

I might have been able to work around this by editing the CoreDNS Deployment to explicitly add a debugging container, but upon further reading, it seems that containers within a pod don’t share a filesystem (unless you explicitly use a Volume), which means that I’d not be able to inspect CoreDNS’s /etc/resolv.conf, anyway.

CoreDNS logging

See https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#are-dns-queries-being-received-processed:

Turn on CoreDNS logging by editing the config map:

$ kubectl --namespace kube-system edit cm coredns

apiVersion: v1
data:
  Corefile: |
    .:53 {
        log     # <--
        errors
        health
# ...

$ kubectl exec -i -t dnsutils -- nslookup git.k3s.differentpla.net
...

$ kubectl --namespace kube-system logs coredns-96cc4f57d-cztp4
...
[INFO] 127.0.0.1:55248 - 51746 "HINFO IN 7620827858334401234.2272423027405113866. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.012383467s
[INFO] 10.42.0.179:43855 - 12121 "A IN git.k3s.differentpla.net.default.svc.cluster.local. udp 68 false 512" NXDOMAIN qr,aa,rd 161 0.000657711s
[INFO] 10.42.0.179:49057 - 9870 "A IN git.k3s.differentpla.net.svc.cluster.local. udp 60 false 512" NXDOMAIN qr,aa,rd 153 0.000627026s
[INFO] 10.42.0.179:56370 - 40719 "A IN git.k3s.differentpla.net.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000681747s
[INFO] 10.42.0.179:37850 - 36437 "A IN git.k3s.differentpla.net. udp 42 false 512" NXDOMAIN qr,rd,ra 138 0.103363513s

Nothing directly interesting in there. Turn logging off again.
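The four queries from 10.42.0.179 are the pod’s resolver walking its search list before trying the name as written. Here’s a minimal Go sketch of that expansion. It assumes the usual Kubernetes pod resolv.conf (a search list of default.svc.cluster.local, svc.cluster.local and cluster.local, with ndots:5), which isn’t shown above, and candidates is a hypothetical helper, not part of any library:

```go
package main

import (
	"fmt"
	"strings"
)

// candidates returns the names a stub resolver would try, in order, for a
// query name, given a resolv.conf search list and ndots value.
func candidates(name string, search []string, ndots int) []string {
	// A trailing dot marks the name as fully qualified: skip the search list.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	absolute := name + "."
	var viaSearch []string
	for _, domain := range search {
		viaSearch = append(viaSearch, name+"."+domain+".")
	}
	// With at least ndots dots, the name is tried as written first;
	// otherwise the search list is walked first.
	if strings.Count(name, ".") >= ndots {
		return append([]string{absolute}, viaSearch...)
	}
	return append(viaSearch, absolute)
}

func main() {
	// Assumed values for the pod's resolv.conf (not shown in this post).
	search := []string{"default.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, c := range candidates("git.k3s.differentpla.net", search, 5) {
		fmt.Println(c)
	}
}
```

The name has only three dots, which is fewer than five, so the three search-list variants come first and the bare name comes last, in the same order as the log lines.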

Aside: Custom overrides?

What is interesting is that the log contains these:

[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server

It seems that, at some point in November 2021, support was added for customizing the cluster CoreDNS server. That bears further investigation: maybe I can get rid of my custom instance of CoreDNS (which would be cleaner), or maybe I can explicitly forward k3s.differentpla.net to it (which would fix the problem at hand).

Container filesystem

The /etc/resolv.conf file for the CoreDNS container must be stored somewhere. k3s uses containerd, so we can go looking on the relevant node:

ubuntu@rpi405:~$ sudo ctr c ls | grep coredns
b472a447...    docker.io/rancher/mirrored-coredns-coredns:1.8.6              io.containerd.runc.v2
da61d860...    docker.io/rancher/mirrored-coredns-coredns:1.8.6              io.containerd.runc.v2

Note: The long identifiers are truncated for readability in this section.

Why are there two of them? Dunno. There’s only one container in the pod:

$ kubectl --namespace kube-system get pod coredns-96cc4f57d-cztp4 -o json | jq '.spec.containers | length'
1

Having said that, however:

$ kubectl --namespace kube-system get pod coredns-96cc4f57d-cztp4 -o json | gron | grep b472
json.status.containerStatuses[0].lastState.terminated.containerID = "containerd://b472a447...";

$ kubectl --namespace kube-system get pod coredns-96cc4f57d-cztp4 -o json | gron | grep da61
json.status.containerStatuses[0].containerID = "containerd://da61d860...";

So it looks like we want da61.... You’ll note that /etc/resolv.conf is mounted explicitly:

ubuntu@rpi405:~$ sudo ctr c info da61d860... | gron | grep resolv
json.Spec.mounts[11].destination = "/etc/resolv.conf";
json.Spec.mounts[11].source = "/var/lib/rancher/k3s/agent/containerd/io.containerd.grpc.v1.cri/sandboxes/304b.../resolv.conf";
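If gron isn’t to hand, the same lookup can be sketched in Go. This assumes the JSON shape implied by the gron output above (a top-level Spec object holding a mounts array whose entries have destination and source fields). mountSource and the sample data are made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// containerInfo models only the part of the container info JSON used here.
type containerInfo struct {
	Spec struct {
		Mounts []struct {
			Destination string `json:"destination"`
			Source      string `json:"source"`
		} `json:"mounts"`
	} `json:"Spec"`
}

// mountSource returns the host path mounted at destination inside the
// container, or an error if there is no such mount.
func mountSource(infoJSON []byte, destination string) (string, error) {
	var info containerInfo
	if err := json.Unmarshal(infoJSON, &info); err != nil {
		return "", err
	}
	for _, m := range info.Spec.Mounts {
		if m.Destination == destination {
			return m.Source, nil
		}
	}
	return "", fmt.Errorf("no mount with destination %q", destination)
}

func main() {
	// Hypothetical, heavily trimmed sample; real output has many more fields.
	sample := []byte(`{"Spec":{"mounts":[
		{"destination":"/proc","source":"proc"},
		{"destination":"/etc/resolv.conf","source":"/example/sandboxes/SANDBOX_ID/resolv.conf"}
	]}}`)
	src, err := mountSource(sample, "/etc/resolv.conf")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(src)
}
```

In practice you’d feed it the output of the ctr c info command instead of the sample.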

What’s in it?

ubuntu@rpi405:~$ sudo cat /var/lib/rancher/k3s/agent/containerd/io.containerd.grpc.v1.cri/sandboxes/304b.../resolv.conf
nameserver 8.8.8.8

Well, there’s your problem.

CoreDNS, as default-configured by k3s, uses Google’s DNS servers at 8.8.8.8, rather than locally-configured DNS servers. So it asks 8.8.8.8 about git.k3s.differentpla.net; 8.8.8.8 in turn asks my externally-facing DNS server, which knows nothing about that name and returns NXDOMAIN.

Rancher: Troubleshooting DNS

It turns out that the Rancher docs have a “Troubleshooting DNS” page, with a section entitled Check upstream nameservers in resolv.conf. It says to try this:

$ kubectl run -i --restart=Never --rm test-${RANDOM} --image=ubuntu --overrides='{"kind":"Pod", "apiVersion":"v1", "spec": {"dnsPolicy":"Default"}}' -- sh -c 'cat /etc/resolv.conf'
nameserver 8.8.8.8
pod "test-19198" deleted

That just confirms it. But where is it coming from, and can I change it? Should I change it?

Whence 8.8.8.8?

Searching for "8.8.8.8" in the k3s repository on GitHub takes me to commit a4df9f4, in which k3s “validates” the nameservers configured in the node’s /etc/resolv.conf and, if they’re not valid, uses 8.8.8.8 by default.

To check for validity, it uses the Go function IsGlobalUnicast, which is implemented as follows:

func (ip IP) IsGlobalUnicast() bool {
	return (len(ip) == IPv4len || len(ip) == IPv6len) &&
		!ip.Equal(IPv4bcast) &&
		!ip.IsUnspecified() &&
		!ip.IsLoopback() &&
		!ip.IsMulticast() &&
		!ip.IsLinkLocalUnicast()
}

The node’s /etc/resolv.conf looks like this:

$ cat /etc/resolv.conf
# ...systemd warnings elided...

nameserver 127.0.0.53
...

…and that’s a loopback address, so it thinks this file is invalid.
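To see how that check treats the addresses that have come up so far, here’s a small sketch. usableNameserver is a hypothetical helper, not a k3s function; it only wraps the standard library’s net.ParseIP and IsGlobalUnicast:

```go
package main

import (
	"fmt"
	"net"
)

// usableNameserver reports whether addr parses as an IP address that
// net.IP.IsGlobalUnicast accepts.
func usableNameserver(addr string) bool {
	ip := net.ParseIP(addr)
	return ip != nil && ip.IsGlobalUnicast()
}

func main() {
	// The systemd-resolved stub, my router, and Google's public DNS.
	for _, addr := range []string{"127.0.0.53", "192.168.28.1", "8.8.8.8"} {
		fmt.Printf("%-14s %v\n", addr, usableNameserver(addr))
	}
}
```

The loopback stub address is rejected, while the router and 8.8.8.8 both pass. Note that Go counts private ranges such as 192.168.0.0/16 as global unicast.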

According to my reading of the source code in that commit, it ought to next try /run/systemd/resolve/resolv.conf, which looks like this:

$ cat /run/systemd/resolve/resolv.conf
# ...systemd warnings elided...

nameserver 192.168.28.1
nameserver fe80::...
...

That nameserver 192.168.28.1 is my router, so it ought to be valid, but I think it’s being tripped up by the nameserver fe80::..., which is a link-local unicast address (even though it’s also my router), and that’s causing it to reject the whole file. At that point, it gives up and defaults to using 8.8.8.8.

My router is set up for “Stateless” RA mode, so the fe80::... nameserver is apparently being invented by each client.
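Here’s a sketch of the all-or-nothing behaviour described above, as I read it. It is not the k3s code: resolvFile, nameservers, fileUsable and pick are hypothetical names, and fe80::1 stands in for the elided link-local address of my router.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// resolvFile pairs a path with resolv.conf-style content, so the sketch
// does not need to read from the filesystem.
type resolvFile struct {
	path    string
	content string
}

// nameservers extracts the addresses from the "nameserver" lines.
func nameservers(content string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "nameserver" {
			out = append(out, fields[1])
		}
	}
	return out
}

// fileUsable rejects the whole file if it has no nameservers or if any
// one of them is not a global unicast address (the assumed rule).
func fileUsable(content string) bool {
	servers := nameservers(content)
	if len(servers) == 0 {
		return false
	}
	for _, s := range servers {
		ip := net.ParseIP(s)
		if ip == nil || !ip.IsGlobalUnicast() {
			return false
		}
	}
	return true
}

// pick returns the path of the first usable file, or "" if none is usable.
func pick(files []resolvFile) string {
	for _, f := range files {
		if fileUsable(f.content) {
			return f.path
		}
	}
	return ""
}

func main() {
	files := []resolvFile{
		{"/etc/resolv.conf", "nameserver 127.0.0.53\n"},
		{"/run/systemd/resolve/resolv.conf", "nameserver 192.168.28.1\nnameserver fe80::1\n"},
	}
	if p := pick(files); p != "" {
		fmt.Println("using", p)
	} else {
		fmt.Println("no usable file; falling back to nameserver 8.8.8.8")
	}
}
```

With both nameserver lines present, neither file passes and the sketch falls back to 8.8.8.8. If the fe80::1 line is dropped, it picks /run/systemd/resolve/resolv.conf instead.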

Using a custom override

As noted above, CoreDNS supports importing custom zones by placing files in the /etc/coredns/custom directory. I’ll cover this in a separate post.