Erlang cluster on Kubernetes: CertificateRequest cleanup
In an earlier post, I showed how to use an init container to create CertificateRequest objects, which cert-manager signs, returning the certificates. A new request is created every time a pod starts, so stale CertificateRequest objects accumulate over time. We should clean those up.
I opted to use a CronJob.
Dockerfile
The Dockerfile is very similar to the one for the init container:
```dockerfile
FROM docker.io/alpine

# We need coreutils for a version of 'date' that can do --date '15 minutes ago'.
# We need curl and jq to access the Kubernetes API and to parse the responses.
RUN apk add --no-cache coreutils curl jq

WORKDIR /erlclu-request-cleanup
COPY erlclu-request-cleanup.sh erlclu-request-cleanup.sh
ENTRYPOINT ["/erlclu-request-cleanup/erlclu-request-cleanup.sh"]
```
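The Dockerfile installs coreutils because, as its comment says, the busybox date in the Alpine base image can't handle relative dates such as "15 minutes ago". If you'd rather the script failed loudly when that package is missing, a guard along these lines is one option. It's a sketch: has_gnu_date is a hypothetical helper, not part of the original script, and it assumes GNU date identifies itself with "GNU coreutils" in its --version output.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if 'date' is GNU coreutils, not busybox.
has_gnu_date() {
    # GNU date prints e.g. "date (GNU coreutils) 9.1" for --version.
    # busybox date rejects the option (error on stderr), so grep sees
    # nothing on stdout and returns non-zero.
    date --version 2>/dev/null | grep -q 'GNU coreutils'
}
```

Calling has_gnu_date || exit 1 near the top of erlclu-request-cleanup.sh would make a misbuilt image fail straight away, instead of quietly running with an empty expiry time.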
erlclu-request-cleanup.sh
The cleanup script looks like this:
```sh
#!/bin/sh

# Anything created before this time (UTC, RFC 3339) counts as expired.
# Needs GNU coreutils 'date'; busybox 'date' can't parse "N minutes ago".
expiry=$(date --date="${MAX_AGE_MINS} minutes ago" --utc +"%Y-%m-%dT%H:%M:%SZ")

# Credentials and namespace mounted into the pod for its service account.
AUTH_TOKEN="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
NAMESPACE="$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)"
CA_CERT_BUNDLE=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt

cert_manager_api_base_url="https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/apis/cert-manager.io/v1"
certificate_requests_base_url="${cert_manager_api_base_url}/namespaces/${NAMESPACE}/certificaterequests"

# List our CertificateRequest objects, filtered by the app label.
certificate_requests=$(curl -s -X GET \
    --header "Accept: application/json" \
    --header "Authorization: Bearer ${AUTH_TOKEN}" \
    --cacert "${CA_CERT_BUNDLE}" \
    "${certificate_requests_base_url}?labelSelector=app=${APPLICATION_LABEL}")

# Names of the requests created before the expiry time. Timestamps in this
# fixed UTC format sort lexicographically, so jq's string comparison works.
expired_requests=$(printf '%s\n' "$certificate_requests" | \
    jq --arg expiry "$expiry" -r '.items[] | select(.metadata.creationTimestamp < $expiry) | .metadata.name')

# Delete each expired request.
for x in $expired_requests; do
    curl -s -X DELETE \
        --header "Authorization: Bearer ${AUTH_TOKEN}" \
        --cacert "${CA_CERT_BUNDLE}" \
        "${certificate_requests_base_url}/$x" --output /dev/null
done
```
- It works out an expiry time for certificate requests (for example, 15 minutes ago). We need the coreutils version of date, rather than the busybox one, for this to work.
- It queries for all of our CertificateRequest objects, i.e. those labelled app=${APPLICATION_LABEL}.
- It runs the response through jq to find those that have expired.
- It deletes each of the expired objects.
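The jq filter works because Kubernetes serialises creationTimestamp as an RFC 3339 timestamp in UTC (for example 2022-11-05T11:30:00Z), and the script formats expiry the same way, so comparing them as strings gives the same answer as comparing them as times. To try that logic without a cluster or jq, here is the same comparison sketched in awk. select_expired and the sample names are hypothetical, and the input is assumed to be one "name creationTimestamp" pair per line.

```shell
#!/bin/sh
# Print the names whose timestamp is strictly earlier than the expiry ($1).
# Reads hypothetical "name creationTimestamp" lines on stdin.
select_expired() {
    # Appending "" forces a string comparison; fixed-format UTC timestamps
    # sort lexicographically in time order.
    awk -v expiry="$1" '($2 "") < (expiry "") { print $1 }'
}

printf '%s\n' \
    'erlclu-abc12 2022-11-05T11:30:00Z' \
    'erlclu-def34 2022-11-05T11:50:00Z' \
    | select_expired 2022-11-05T11:45:00Z
# prints: erlclu-abc12
```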
By looking for requests that are at least 15 minutes old, we don’t accidentally delete those that are still in the process of being issued. This assumes that issuing a certificate takes less than 15 minutes; I’m comfortable with that.
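To make the margin concrete: with MAX_AGE_MINS set to 15, a run at 12:00:00Z only deletes requests created before 11:45:00Z. The sketch below computes that cut-off from an explicit "now" in seconds since the epoch, rather than with the script's "15 minutes ago" form, which makes it easy to check by hand. expiry_from_epoch is a hypothetical helper, and it assumes a date that accepts --date=@SECONDS, as GNU coreutils date does.

```shell
#!/bin/sh
# Hypothetical helper: expiry cut-off = now - max age, formatted like the
# script's expiry (UTC, RFC 3339, second precision).
expiry_from_epoch() {
    now_secs=$1      # current time, seconds since 1970-01-01T00:00:00Z
    max_age_mins=$2  # same meaning as MAX_AGE_MINS in the CronJob
    date --utc --date="@$((now_secs - max_age_mins * 60))" +"%Y-%m-%dT%H:%M:%SZ"
}

# 1667649600 is 2022-11-05T12:00:00Z; in real use: now=$(date --utc +%s)
expiry_from_epoch 1667649600 15   # prints 2022-11-05T11:45:00Z
```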
CronJob
The CronJob looks like this:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: erlclu-request-cleanup
  namespace: erlclu
spec:
  schedule: "8/15 * * * *"
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: request-cleanup
              image: docker.k3s.differentpla.net/erlclu-request-cleanup:0.1.0
              imagePullPolicy: Always
              env:
                - name: APPLICATION_LABEL
                  value: erlclu
                - name: MAX_AGE_MINS
                  value: "15"
          restartPolicy: OnFailure
          activeDeadlineSeconds: 90
          serviceAccountName: erlclu
```
It runs every 15 minutes, starting at 8 minutes past the hour: H+8, H+23, H+38, H+53.
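In the minute field, 8/15 is read as "start at minute 8, then every 15 minutes until the end of the hour". The hypothetical helper below expands a start/step pair the same way, just to show where those four times come from; the CronJob controller does this for you.

```shell
#!/bin/sh
# Hypothetical helper: list the minutes matched by "start/step" in a cron
# minute field (minutes run 0..59).
expand_minute_step() {
    start=$1
    step=$2
    m=$start
    out=""
    while [ "$m" -le 59 ]; do
        out="$out $m"
        m=$((m + step))
    done
    printf '%s\n' "${out# }"   # strip the leading space
}

expand_minute_step 8 15   # prints: 8 23 38 53
```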
I’ve set successfulJobsHistoryLimit and failedJobsHistoryLimit to 1, so the most recent successful Job and the most recent failed Job are kept, and you can look at their logs afterwards. On the other hand, this leaves the completed pod lying around, which is untidy.
Service account
When we originally defined the erlclu service account, we only granted it create and get permissions on certificaterequests. The cleanup job also needs list and delete, so we add them to the Role:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: erlclu
  name: certificate-requester
rules:
  - apiGroups: ["cert-manager.io"]
    resources: ["certificaterequests"]
    verbs: ["create", "get", "list", "delete"]
```