A bug in libgnutls
An Electric Imp customer reported that, after a recent server deploy, their agent could no longer connect to a Google API.
We could reproduce it with the following agent code:
local req = http.get("https://sheets.googleapis.com/$discovery/rest?version=v4");
local res = req.sendsync(); // send the request and wait for the response
server.log(res.statuscode); // should be 200; was 60
In an Electric Imp agent, HTTP status codes < 100 are curl error codes; error 60 is
It was possible to work around the problem by turning off certificate validation. Obviously this is not ideal.
- We couldn’t reproduce the problem in our staging environment. This is good, because it gives us a working platform to compare with.
- I could reproduce the problem on my development PC. This is good, because it means it’s quicker to experiment with tests and fixes.
- I couldn’t reproduce the problem using
Because this had something to do with server certificate verification, I guessed that it had something to do with the root CA store, which lives in /etc/ssl/certs/ on Ubuntu and is managed by the ca-certificates package.
Looking at the installed version of this package on the working versus non-working servers (including my PC) showed that a recent Ubuntu update had made some changes to the root certificates and, presumably, broken something.
That is: the (working) server in staging was using an older version of the package, compared to the (broken) server in production (and my development PC).
We pre-load the CA trust list in our code (it’s expensive to parse all those certificates on every outbound agent connection). Fortunately, there’s a single PEM file containing all of the certificates, /etc/ssl/certs/ca-certificates.crt, which makes it easy to try alternative CA bundles when testing.
I grabbed the (working) ca-certificates.crt file from the staging server and confirmed that an agent running on my PC worked with it. Then I double-checked that it didn’t work with the system-installed file.
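A similar bundle swap can be tried outside the agent. The following is a minimal sketch in Python, not the code we actually ran: the function name handshake_ok is made up for illustration, and Python's ssl module is backed by OpenSSL rather than GnuTLS, so this only shows how a given bundle behaves with a different TLS library.

```python
import socket
import ssl


def handshake_ok(host, cafile, port=443, timeout=10.0):
    """Return True if `host` presents a chain that verifies against `cafile`.

    Only the certificates in `cafile` are trusted; the system default
    store is not loaded when an explicit file is given.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        # The TLS handshake reached certificate verification and failed there.
        return False
```

Calling it once with the bundle copied from staging and once with /etc/ssl/certs/ca-certificates.crt gives a quick side-by-side check for one host.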
Comparing the two files revealed that several certificates had been removed from the CA bundle, including the root CA used by googleapis.com. This shouldn’t have been a problem, because one of the intermediate CA certificates that Google are using is in the CA bundle.
But: other clients (curl, etc.) were still working, just not ours.
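The comparison of the two bundles can be sketched roughly as follows. This is an illustrative Python version, not the tool used at the time; the function names are invented for the example. It identifies each certificate by the SHA-256 hash of its DER encoding, so differences in ordering or whitespace between the files don't matter.

```python
import hashlib
import re
import ssl

# Matches one PEM-encoded certificate, including its BEGIN/END armor lines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def bundle_fingerprints(pem_text):
    """Return the set of SHA-256 fingerprints of all certificates in a bundle."""
    fingerprints = set()
    for block in PEM_CERT.findall(pem_text):
        der = ssl.PEM_cert_to_DER_cert(block)  # strip armor, decode base64
        fingerprints.add(hashlib.sha256(der).hexdigest())
    return fingerprints


def removed_certificates(old_pem, new_pem):
    """Fingerprints present in the old bundle but missing from the new one."""
    return bundle_fingerprints(old_pem) - bundle_fingerprints(new_pem)
```

Reading the staging copy and the system-installed ca-certificates.crt as text and passing them as old_pem and new_pem yields the fingerprints of the certificates that the update dropped.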
At this point, while I got on with putting together an MCVE, I posted a question on security.stackexchange.com, where Steffen Ullrich pointed out that there had been a bug in OpenSSL: it ignored trusted intermediate CA certificates and then failed because it couldn’t find the root CA.
This exactly describes the problem that we were seeing.
Except that we use GnuTLS for our certificate handling, not OpenSSL. For agents, anyway.
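To make the failure mode concrete, here is a toy model in Python. It is not how GnuTLS or OpenSSL implement path validation, and the certificate names are placeholders; it only contrasts a verifier that accepts a trusted intermediate as an anchor with one that insists on reaching a trusted self-signed root.

```python
def verifies(leaf, issuer_of, trusted, accept_trusted_intermediate):
    """Walk issuer links from `leaf` and report whether the chain verifies.

    issuer_of: maps each certificate name to its issuer's name
        (a root maps to itself).
    trusted: names of certificates present in the local CA bundle.
    accept_trusted_intermediate: True models a verifier that stops at any
        trusted certificate; False models one that only accepts a trusted
        self-signed root.
    """
    current = leaf
    seen = set()
    while current not in seen:
        seen.add(current)
        issuer = issuer_of.get(current)
        is_root = issuer == current
        if current in trusted and (is_root or accept_trusted_intermediate):
            return True
        if issuer is None or is_root:
            return False  # reached the top without finding a usable anchor
        current = issuer
    return False  # issuer loop


# Placeholder chain: leaf -> intermediate -> root (self-signed).
issuer_of = {"leaf": "intermediate", "intermediate": "root", "root": "root"}
old_bundle = {"intermediate", "root"}  # before the update
new_bundle = {"intermediate"}          # root removed by the update

print(verifies("leaf", issuer_of, old_bundle, False))  # True
print(verifies("leaf", issuer_of, new_bundle, False))  # False
print(verifies("leaf", issuer_of, new_bundle, True))   # True
```

The three results line up with what we saw: the root-only behaviour works with the old bundle and fails with the new one, while a client that honours the trusted intermediate keeps working.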
Today, I managed to put together a short-ish piece of C++, cobbled together from the GnuTLS examples, that reproduces the problem, and posted it with a question on Stack Overflow. I won’t bother reproducing it here.
A little while later, after a bit of back-and-forth in comments, Steffen replied saying that he’d found a corresponding bug report against libgnutls28 on Ubuntu 14.04.
That bug report includes a patch.