Erlang cluster on Kubernetes: TLS distribution
In the previous post, we got clustering working without TLS. Lifting from the investigation that I wrote up here, I’ll add TLS distribution to my Erlang cluster, but only with server certificates and with no verification (for now).
We need server certificates to allow encryption. We can’t use verification because the certificates don’t match the host names.
Enabling TLS distribution
To enable TLS distribution, we need to make some changes to vm.args.src:
%...
-proto_dist inet_tls
-ssl_dist_optfile ${ROOTDIR}/inet_tls_dist.config
%...
This tells the Erlang runtime to use the inet_tls_dist module for distribution (the _dist suffix is implicit), and provides a file containing the various TLS options. The ROOTDIR environment variable is set by the startup script; we use it to ensure a fully-qualified path to the config file.
Configuring TLS distribution
The configuration file looks like this:
[
    {server, [
        {certfile, "/secrets/erlclu-dist-tls.crt"},
        {keyfile, "/secrets/erlclu-dist-tls.key"},
        {verify, verify_none},
        {secure_renegotiate, true}
    ]},
    {client, [
        {verify, verify_none},
        {secure_renegotiate, true}
    ]}
].
This configures the server’s certificate and private key, and it disables peer verification for both the server and the client. This allows us to get encryption (because of the keypair), but not authentication (because we’re not actually verifying the certificates).
The documentation says that certfile must be a PEM file containing both the certificate and key. This isn't true; you can use certfile and keyfile to specify them separately.
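For completeness, the single-file form that the documentation describes would look something like this. The .pem path is hypothetical; I'm sticking with separate files:

[
    {server, [
        %% Hypothetical: a single PEM containing both the certificate and
        %% the private key, as described in the documentation.
        {certfile, "/secrets/erlclu-dist-tls.pem"},
        {verify, verify_none},
        {secure_renegotiate, true}
    ]},
    {client, [
        {verify, verify_none},
        {secure_renegotiate, true}
    ]}
].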
Creating a self-signed certificate
My ultimate goal is to use a cert-manager Issuer object to manage a private CA for issuing mTLS certificates. For now, however, we'll make do with a self-signed certificate. I'll use my elixir-certs script for this:
./certs self-signed \
--out-cert erlclu-dist-tls.crt --out-key erlclu-dist-tls.key \
--template server \
--subject "/CN=inet_tls_dist"
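If you don't have the elixir-certs script to hand, something roughly equivalent can be done with plain openssl. This is a sketch: the key-usage extensions won't exactly match the script's server template, but with verify_none that doesn't matter:

# Roughly equivalent self-signed certificate using plain openssl.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=inet_tls_dist" \
    -keyout erlclu-dist-tls.key \
    -out erlclu-dist-tls.crt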
Kubernetes TLS secret
We need to put the certificate in a secret:
kubectl --namespace erlclu \
create secret tls erlclu-dist-tls \
--cert=erlclu-dist-tls.crt \
--key=erlclu-dist-tls.key
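As an optional sanity check, you can describe the secret; kubectl stores the data under tls.crt and tls.key for a tls-type secret, and describe shows the key names and sizes without printing the values:

# Shows the data keys (tls.crt, tls.key) and their sizes, not the values.
kubectl --namespace erlclu describe secret erlclu-dist-tls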
Using the secret
…and we need to mount the secret where the pod is expecting it:
#...
spec:
  containers:
    - name: erlclu
      #...
      volumeMounts:
        - name: erlclu-dist-tls
          mountPath: /secrets
  volumes:
    - name: erlclu-dist-tls
      secret:
        secretName: erlclu-dist-tls
        items:
          - key: tls.key
            path: erlclu-dist-tls.key
          - key: tls.crt
            path: erlclu-dist-tls.crt
#...
Is it working?
Well, according to the home page, all of the nodes are talking to each other.
Is their communication actually encrypted? That's harder to verify; I'll cover it in two later posts. One will be a complete hack; the other will use Wireshark.
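In the meantime, one cheap sanity check (it needs a working remote console, which the next section deals with) is that the runtime really was started with the TLS distribution flags. This only shows that the flags were picked up, not that the traffic is encrypted; init:get_argument/1 is a standard OTP call:

%% In a remote console on one of the nodes:
1> init:get_argument(proto_dist).
{ok,[["inet_tls"]]}
%% init:get_argument(ssl_dist_optfile) similarly returns the resolved
%% path to the options file.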
Broken remote console
This does introduce one problem, however: the kubectl exec ... bin/erlclu remote_console command no longer works.
I asked on the Erlang forums, but in the meantime, I did my own investigation.
Before connecting to the remote node, the script uses erl_call to "ping" the node (check for aliveness); erl_call knows nothing about TLS distribution, so this fails.
We can work around the first problem by using nodetool instead:
kubectl --namespace erlclu exec -it deploy/erlclu -- env "USE_NODETOOL=1" /erlclu/bin/erlclu remote_console
Once we get that working, it still fails, because the startup script needs the correct TLS distribution options to pass to erlexec.
To fix that, put the two new arguments on the same line in vm.args.src:
-proto_dist inet_tls -ssl_dist_optfile ${ROOTDIR}/inet_tls_dist.config
Because the startup script is looking for /^-proto_dist/, it picks up both arguments. This doesn't seem to break the default Erlang runtime parsing.
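To make that a little more concrete, here's the shape of the extraction the start script does. This is an illustrative sketch, not the actual script contents:

# Illustrative sketch only, not the real start script: a pattern anchored at
# ^-proto_dist matches the whole line, so with both flags on one line the
# extracted value carries -ssl_dist_optfile along with it.
PROTO_DIST="$(grep '^-proto_dist' "$VMARGS_PATH")"
# PROTO_DIST now holds:
#   -proto_dist inet_tls -ssl_dist_optfile ${ROOTDIR}/inet_tls_dist.config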
All of this works, but it takes several seconds to connect. I can think of a few potential solutions:
- Expose the remote console over SSH. This is the one I ended up choosing, because it’s what we do at work. It was top-of-mind, and I wanted to know more about how it works.
- Write a custom distribution protocol that doesn’t require TLS for localhost connections. CouchDB, as far as I can tell, does this. This is perhaps more user-friendly, because it works the same as before, and is thus less surprising. I might spend some time looking at this in future.
- Patch the startup script to stub out (or remove) the ping_or_exit function. This only occurred to me much later, so I've not looked into it.