apt upgrade on K3s node
Given the problems I had when I last upgraded everything on my K3s cluster, I’m going to put a runbook together for doing it “properly”.
Drain the server
% kubectl drain roger-nuc0
node/roger-nuc0 cordoned
error: unable to drain node "roger-nuc0" due to error:[cannot delete Pods with local storage (use --delete-emptydir-data to override): ..., cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): ..., cannot delete Pods declare no controller (use --force to override): default/dnsutils], continuing command...
There are pending nodes to be drained:
roger-nuc0
...
So it’s cordoned the node, but it can’t drain it. This is annoying, because it won’t even start draining the node until you resolve all of the problems. Let’s try again. First we’ll delete the uncontrolled pod:
% kubectl delete pod dnsutils
pod "dnsutils" deleted
Then we’ll run the command again, adding the suggested options:
% kubectl drain roger-nuc0 --ignore-daemonsets --delete-emptydir-data
node/roger-nuc0 already cordoned
Warning: ignoring DaemonSet-managed Pods: ...
evicting pod grafana/grafana-5f7f6d4d8c-lksrj
...
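Once the drain finishes, it's worth confirming that nothing other than DaemonSet-managed pods is left on the node. This isn't from the original session, just a quick sketch of how to check:

# Should list only DaemonSet-managed pods once the drain is done
kubectl get pods --all-namespaces --field-selector spec.nodeName=roger-nuc0 -o wide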
Checks
At this point, I’m going to check that my cluster’s still working. This is probably paranoia, and should be automated.
- The Longhorn dashboard shows that the volumes are all still healthy.
- Grafana is still working. There was a brief gap while various pods were being evicted, but everything looks basically OK.
- ArgoCD looks healthy. Refreshing the apps succeeded, which implies that Gitea is still OK.
- I can log into Gitea and everything seems to be OK.
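If I did automate it, a starting point might be something like the following. It's only a sketch, and it just checks that nodes and pods look healthy rather than poking Longhorn, Grafana, ArgoCD or Gitea specifically:

# All nodes should be Ready; roger-nuc0 will also show SchedulingDisabled while cordoned
kubectl get nodes
# Anything that isn't Running or Completed deserves a closer look
kubectl get pods --all-namespaces | grep -vE 'Running|Completed'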
Upgrade the server
sudo apt update
sudo apt upgrade
sudo shutdown -r now
And then we wait.
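The earlier drain left the node cordoned, so once it's rebooted and showing as Ready again, it presumably needs uncordoning before anything gets scheduled back onto it. Something like this (not captured from the original session):

kubectl get nodes          # wait until roger-nuc0 shows Ready again
kubectl uncordon roger-nuc0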
Upgrade the agents
Repeat for the agent nodes. I did them in reverse order: roger-nuc3, roger-nuc2, roger-nuc1.
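If I wanted to script it, the per-agent sequence might look something like this. It's a rough sketch, assuming passwordless ssh and sudo on each node; the sleep is a crude way of letting each node actually go down before waiting for it to come back:

for node in roger-nuc3 roger-nuc2 roger-nuc1; do
  # Drain with the same options as before
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  # Upgrade and reboot the node (-y because this isn't interactive)
  ssh "$node" 'sudo apt update && sudo apt upgrade -y && sudo shutdown -r now'
  # Give it time to go down, then wait for it to rejoin
  sleep 60
  kubectl wait --for=condition=Ready "node/$node" --timeout=15m
  kubectl uncordon "$node"
done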