Exact same issue here. Looking at the ESP32 console, we receive the following:
E event.c:18:mg_error 0x39 TLS handshake: -0x2700
Cesanta also advised that our ca.pem was out of date. What’s interesting is that FW built with mDash v1.2.13 + esp-idf v3.3.6 failed to reconnect, whereas FW with mDash v1.2.16 + esp-idf v4.4.6 could reconnect even though they had identical ca.pem files. Very odd.
It appears Cesanta has rolled back the mDash certificate today which allowed our units with old FW to reconnect, but that is likely just a temporary fix. We need to update ca.pem on the devices as you’ve noted.
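For anyone decoding the console line above: assuming the TLS stack on the device is mbedTLS (which ESP-IDF ships), -0x2700 corresponds to MBEDTLS_ERR_X509_CERT_VERIFY_FAILED in mbedtls/x509.h, meaning the server's certificate chain could not be verified against the CA the device trusts. Below is a minimal sketch for pulling the code out of a log line. The helper name and the three-entry table are illustrative only and not part of mDash.

```python
import re

# Small subset of mbedTLS X.509 error codes (values as defined in mbedtls/x509.h).
# Assumption: the firmware's TLS layer is mbedTLS; other stacks use other codes.
MBEDTLS_X509_ERRORS = {
    -0x2700: "MBEDTLS_ERR_X509_CERT_VERIFY_FAILED",
    -0x2780: "MBEDTLS_ERR_X509_CERT_UNKNOWN_FORMAT",
    -0x2800: "MBEDTLS_ERR_X509_BAD_INPUT_DATA",
}


def decode_handshake_error(line):
    """Return (code, name) for a 'TLS handshake: -0x....' log line, or None."""
    match = re.search(r"TLS handshake: (-?0x[0-9a-fA-F]+)", line)
    if match is None:
        return None
    code = int(match.group(1), 16)  # int() accepts a sign and a 0x prefix in base 16
    return code, MBEDTLS_X509_ERRORS.get(code, "unknown")


if __name__ == "__main__":
    print(decode_handshake_error("E event.c:18:mg_error 0x39 TLS handshake: -0x2700"))
```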
Thanks for your post. We were advised by Cesanta tech support that this was a temporary fix lasting less than 30 days. They also advised the following:
“Certificates from LetsEncrypt have life span of 3 months.
Our automatic updates trigger 1 month before the expiration day.
Meaning, that if your devices manage to connect, you have maximum 1 month to
update your devices. After that, NOTHING could help you reconnect your devices remotely.
You’d need to go and physically update the certificates.”
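To make the arithmetic in that quote concrete, here is a rough sketch that assumes a 90-day certificate lifetime (the "3 months") and a renewal that fires 30 days before expiry (the "1 month"). The function name and the example issue date are made up for illustration and do not reflect Cesanta's actual schedule.

```python
from datetime import date, timedelta

CERT_LIFETIME = timedelta(days=90)   # assumption: "3 months" taken as 90 days
RENEW_BEFORE = timedelta(days=30)    # assumption: "1 month" taken as 30 days


def renewal_window(issued):
    """Return (renew_on, expires_on) for a certificate issued on `issued`."""
    expires_on = issued + CERT_LIFETIME
    renew_on = expires_on - RENEW_BEFORE
    return renew_on, expires_on


if __name__ == "__main__":
    # Hypothetical issue date, chosen only to show the calculation.
    renew_on, expires_on = renewal_window(date(2024, 1, 1))
    print("auto-renewal triggers:", renew_on)
    print("old certificate expires:", expires_on)
    print("days left for devices on the old chain:", (expires_on - renew_on).days)
```

Under these assumptions, a certificate that was due for renewal and then kept in service has at most 30 days left, which is the update window described in the quote.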
What is very troubling is that this has never been a problem for the past 4 years. I can find no documentation on the process of obtaining the ca.pem file or on exactly how to update it in our firmware.
This seems like a big deal. I’ve got devices with customers, some of which are 3 years old.
@cesanta I’ve never seen any mention of keeping CA certs up to date on remote devices. Is this something that has been written up and we’ve missed?
What I’m most confused by is how ca.pem was created in the first place. Does anyone know the details of this? Can it be fixed by flashing a new FW build?
ca.pem and its expiration are just common TLS knowledge.
Not mentioned explicitly in the documentation.
If you’re most confused about how ca.pem was created in the first place, you should educate yourself on the build process and on the TLS essentials.
That is very helpful! This explains why our devices running mDash 1.2.16 are functional even though they have a discrete ca.pem file loaded that is expired.
FWIW, we are trying to roll out 1.2.16 images to our fleet. Unfortunately, this only works if devices were online before reckoning day. If not, the devices are bricks.
Thanks for your post, very helpful! We are trying to update our fleet to 1.2.16, but we’re running into substantial issues. And as Autodog pointed out, we will have to recall all units that are not online.
If you are able to build with the 1.2.16 version of mDash, I think you’ll be okay. We took the minimal example from the mDash GitHub examples, built it, and it did successfully log on.
Yes - migrating from 1.2.13 to 1.2.16 was very painful for us, many breaking changes. Make sure you test thoroughly before deploying. There were many things that didn’t transition smoothly for us. Your mileage may vary.
I think this needs to be fixed, you have paying customers in this forum who have missed this detail and are being caused pain. Would be very beneficial to include this information in the mDash/Mongoose OS documentation somewhere.
@cesanta can you point me to some details on the build process so I can educate myself?
What I don’t understand is why this is just happening now - my ca.pem that exists on all my devices seems to be the same which is:
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            44:af:b0:80:d6:a3:27:ba:89:30:39:86:2e:f8:40:6b
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: O = Digital Signature Trust Co., CN = DST Root CA X3
        Validity
            Not Before: Sep 30 21:12:19 2000 GMT
            Not After : Sep 30 14:01:15 2021 GMT
        Subject: O = Digital Signature Trust Co., CN = DST Root CA X3
Looks like the cert expired a few years ago and never had an issue somehow?
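If you want to script this check across several ca.pem files rather than eyeball it, here is a small sketch that takes the "Not After" string exactly as it appears in the dump above and compares it with a given time. `ssl.cert_time_to_seconds` is in the Python standard library. The `cert_expired` helper is just an illustrative name.

```python
import ssl
from datetime import datetime, timezone


def cert_expired(not_after, now=None):
    """Return True if `now` (default: current UTC time) is past `not_after`.

    `not_after` uses the openssl text format, e.g. "Sep 30 14:01:15 2021 GMT".
    """
    expiry = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return now > expiry


if __name__ == "__main__":
    # "Not After" value copied from the DST Root CA X3 dump in this post.
    print(cert_expired("Sep 30 14:01:15 2021 GMT"))
```

Note that this only tells you the date has passed. Whether a given TLS stack actually rejects a connection because of it depends on how that stack builds and validates the chain.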
From my limited testing, fortunately the fix seems to be as easy as creating a new FW that includes the ca.pem file I’ve created per the instructions in the Cesanta post and adding it to the filesystem.
Still trying to work out what I do for devices that have been offline for a little while and may come back online after the 30 day countdown…
If such an extension is possible, that would be a HUGE help to all of our customers. The logistical nightmare of swapping out bricks in the field makes my head (and wallet) hurt.
Would be interested in @cesanta’s view on this one.
I concur it would be a huge benefit to all of us! We have found anything built with 1.2.16 successfully logs in and remains online. We are considering developing a do-nothing app and deploying it, to keep our customers online past the deadline. In this way we would still be able to deploy the app once we get it tested.
@klimbot thank you for digging out the cause of the TLS failure.
LetsEncrypt wants to migrate entirely to their own CAs, because they are now bundled by browsers. Thus they shortened the expiration time of their original cert, signed by another company. That caused this issue.
Yes, they made a hint on how to keep requesting old stuff.
We made this change, and let it run for a week.
Then we’ll revert to the new root. We want to make sure the updates work, and catch those people who do not update.
This is necessary, because if we keep running on the old cert and do not test until doomsday, it’ll be too late to fix then.
We very much dislike the “many more months to update” attitude, as experience tells that things get undone, and turn un-fixable then. So please do NOT expect this to continue for months. Update NOW.
Agree, we are working on an immediate fix and will roll out ASAP to online devices, but many of our fleet come online/offline periodically so we really need it rolled back after you test to ensure we can actually reach the majority of our fleet over the next couple of months.
If I can, I’d love to suggest something like the following timeline, based on the Let’s Encrypt timeline (assuming it works for all):
April 14th - Cesanta tests any root CA changes that are required, and customers check that updated devices are connecting to mDash as expected without X3 cert in chain
April 15th or 16th - Cesanta moves to new root CA that includes X3 in chain
April 16th – May 31st - Customers of mDash continue to update fleet devices that come online
June 1st - Cesanta requests new cert from Let’s Encrypt valid till end of Sept
June 1st - New cert still allows customers to update remote devices that come online
Aug 1st - Cesanta auto-update cert process requests new cert 2 months after previous as per Cesanta automated process
Aug 2nd - Any issues that pop up can be resolved just as they have been by Cesanta rolling back cert, extending max window 1 month.
Sept 1st - No X3 based certs available anymore - remote devices on old cert will be bricked