HTTPS request causes HW WDT reset on ESP8266

joepelderman · June 24, 2019, 12:24pm

In our project, most communication is handled with an mDash shadow, but we need access to a REST API for some functionality. During development these requests went over HTTP, but for production we need to use HTTPS. We changed the endpoint URLs to HTTPS and at first everything seemed to work just fine: all the requests completed. But after the device has been running for a while and these requests are made again, the hardware watchdog timer triggers.

[Jun 24 12:13:48.399] LedTimer.c:603           free heap size: 22328
[Jun 24 12:13:48.451] mg_ssl_if_mbedtls.c:35  0x3fff231c ciphersuite: TLS-ECDHE-RSA-WITH-AES-128-GCM-SHA256
[Jun 24 12:13:48.966] SW ECDH curve 3
[Jun 24 12:13:50.139] 
[Jun 24 12:13:50.139] HW WDT @ 0x40268c05
[Jun 24 12:13:50.139]  A0: 0x40268b58  A1: 0x3ffffba0  A2: 0x00000028  A3: 0x3fff2890
[Jun 24 12:13:50.145]  A4: 0x0000000c  A5: 0x00000000  A6: 0x3ffffbdc  A7: 0x3ffffbdc
[Jun 24 12:13:50.151]  A8: 0x00000009  A9: 0x000018ce A10: 0x3ffeec90 A11: 0xffff8000
[Jun 24 12:13:50.156] A12: 0x3ffffbdc A13: 0x3fff2890 A14: 0xffffffff A15: 0x3ffffbdc
[Jun 24 12:13:50.162] 
[Jun 24 12:13:50.162] (exc SP: 0x3ffff9e0)
[Jun 24 12:13:50.165] 
[Jun 24 12:13:50.165] --- BEGIN CORE DUMP ---
[Jun 24 12:13:50.167] mos: catching core dump
[Jun 24 12:13:53.013] ....
[Jun 24 12:14:01.718] ---- END CORE DUMP ----

Once it has failed, the device reboots and all subsequent HTTPS requests also fail, causing even more reboots. The minimum free heap when it first happens is 22328 bytes. Could too little heap space be what is causing this? I would have expected a memory-related exception in that case, not a HW WDT reset. There are three separate API requests that need to take place periodically. When I upgrade only one of them to HTTPS it keeps working, but when I enable HTTPS on two or all three requests, the above behaviour happens. Any ideas on what is causing this are much appreciated!

lsm · June 25, 2019, 12:58pm

Looks like the WDT is triggered because the handshake takes too long to execute.
Not sure why that manifests only after a certain time has passed.
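
For a rough sense of how long the handshake ran before the watchdog fired, the timestamps in the log above can be subtracted. A minimal Python sketch, using only timestamps copied from that log (the helper name `elapsed_seconds` is made up for illustration):

```python
from datetime import datetime

def elapsed_seconds(start: str, end: str) -> float:
    """Return the seconds between two HH:MM:SS.mmm timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# Timestamps copied from the log: the ciphersuite line and the "HW WDT" line.
handshake_start = "12:13:48.451"
wdt_fired = "12:13:50.139"

gap = elapsed_seconds(handshake_start, wdt_fired)
print(f"{gap:.3f} s between handshake start and HW WDT")
```

By this measure the reset came roughly 1.7 s after the ciphersuite was logged, and about 1.2 s after the `SW ECDH curve 3` line.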

Could you please clarify why MQTT is not enough? What endpoints are you calling over HTTPS?

joepelderman · June 26, 2019, 9:24am

The product is a smart light with an app; every user can manage the settings of their own devices. Since it is not possible to limit mDash API access for different API keys, we’ve introduced our own server that communicates with the app (so individual app users don’t need an mDash API key). This server in turn updates the light’s status over mDash. Two of the HTTP requests to our server couple devices to users. One of the requests updates the desired shadow over HTTP (since I can’t update the mDash desired shadow through the regular means, only the reported state).

Since we control the server, is there a different cipher we could use that would take less time to complete?

lsm · June 26, 2019, 9:59am

Thank you for the background.
This bit is still somewhat unclear: “Two of the HTTP requests to our server couple devices to users. One of the requests updates the desired shadow over HTTP (since I can’t update the mDash desired shadow through the regular means, only the reported state).” Could you elaborate, please? A quick bullet-point list with the sequence of calls might clarify a lot.

I’m still trying to figure out why an external shadow update requires the device to use an HTTPS endpoint. It feels wrong - but let’s go ahead and brainstorm the best solution here.

joepelderman · June 26, 2019, 12:15pm

Thanks for your quick reply!

As for the shadow update, I agree: it feels wrong not to be able to set the desired state in the shadow from the device in the same way as the reported state. If you know of a better way to do it, I will gladly get rid of the HTTPS endpoint that we use now.

Below is a list of the whole device flow:

1. The device starts in AP mode. A user (app) connects and passes the Wi-Fi credentials and a user token received from our server to the device over an RPC call.
2. The device switches to STA mode and sends an HTTPS request to our server to get mDash credentials for the device.
3. Once mDash is connected, the device sends an HTTP request to our server to confirm successful onboarding.
4. The light should now operate normally. On a device state change (e.g. a user button is pressed), the new state needs to be communicated to the shadow. This is done with an HTTP request to our server, which in turn changes the desired state in mDash through the API.

This mechanism worked fine on HTTP but has trouble when using HTTPS.

lsm · June 26, 2019, 1:45pm

Thank you.

Why do you need your device to change the desired state? Idiomatic shadow usage assumes that the device only changes the reported state. Note that some clouds, like Azure, explicitly disallow changing desired from the device side! The desired state is for an external actor saying “hey, device, I need you to be in state X”. A device setting desired itself indicates a design flaw. A device must get the delta and set reported equal to desired, thus clearing the delta. If you need to delete a desired key, it is not the device that should do it.
I assume you’re using the Mongoose OS shadow API, which only allows changing state.reported (following Azure). If you still want full access to the shadow, don’t use the shadow API. Use the MQTT connection method instead of the dash lib (see Connect Mongoose OS to mDash) and direct shadow access over MQTT (see Remote control via device shadow - overview).
That said, the functionality your custom backend implements is:
- auto-provisioning on mDash. How do you authenticate new devices btw?
- ability to change certain shadow keys for some external users
- (assuming) relay device status changes to the mobile app via WS or long poll
  Does it make sense for mDash to provide that functionality out of the box, or not?
It might be worth looking at http://mdash.net/#/pwa

joepelderman · June 26, 2019, 2:25pm

Both the light itself and the app have a “light switch”. If I switch the device on in the app, the desired state gets set and the device turns on; after this, the device updates its reported state. If I use the physical button on the device to turn the light on, the app still thinks the switch is off. If I updated just the reported state of the device, I would immediately get back a shadow delta telling the device to go off, since the desired state has not changed. I therefore need to update the desired shadow so the app knows the new state of the light. Idiomatic shadow usage assumes no user interaction on the device, but since there is user interaction here, updating the desired shadow is my only option. Or am I missing something?
Sounds good, I will look into that!
Correct

The devices are authenticated by their ESP8266 device ID, checked against a list of known IDs.
Yes, a way to relay device status to the app would be a great mDash feature, but is the desired state not intended for this purpose?

Your link does not seem to go anywhere but the main page of mDash.
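
The delta problem with the physical button can be sketched in a few lines of Python. This is a simplified model of a shadow (a delta is just the desired keys whose values differ from reported), not the mDash implementation; the key name `on` and the function `shadow_delta` are assumptions for illustration:

```python
def shadow_delta(desired: dict, reported: dict) -> dict:
    """Simplified model: keys in desired whose value differs from reported."""
    return {k: v for k, v in desired.items() if reported.get(k) != v}

# The app switched the light off earlier, and the device followed.
desired = {"on": False}
reported = {"on": False}
assert shadow_delta(desired, reported) == {}

# The physical button is pressed; the device updates only reported.
reported["on"] = True
delta_reported_only = shadow_delta(desired, reported)
print(delta_reported_only)  # the delta asks the device to switch off again

# If desired is updated as well, no delta remains.
desired["on"] = True
delta_both = shadow_delta(desired, reported)
print(delta_both)
```

In this model, updating only reported leaves a delta that pushes the light back off, while updating desired as well clears it.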

lsm · June 26, 2019, 2:52pm

Thank you.

If the device itself has a physical switch, it IS an “external actor”, and it should update the desired state. That’s a valid use case, thanks for the background.
The link: please log in to mDash first, then follow the link.

joepelderman · June 26, 2019, 4:57pm

Thanks!

To sum up, the only way to stop the HTTPS requests from failing is to not send them at all and to use the shadow only via the MQTT connection method, which gives access to both the reported and desired states. Does this mean that using the shadow API simultaneously with your own HTTPS requests is not recommended on the ESP8266, or do you think that using other ciphers for the encryption could also make it work?

lsm · June 27, 2019, 2:51pm

With ESP8266, it is very unlikely.
It is lucky that even one TLS connection works. Having two TLS connections, and also some RAM for the device logic, realistically, is too much.
The ESP8266 has about 50k free RAM at device init - that’s the maximum you can get. The minimum is about 10k, at which point RAM fragmentation kicks in and makes the ESP8266 very unstable. You could have it work even with less than 10k, but the general observation is that ~10k is the limit.
Now, mbedtls connection (unpatched) takes (depending on the build) from 15k to >25k of RAM. Device logic takes whatever RAM it takes. A handshake also takes RAM.
So, having two simultaneous TLS connections is a luxury.
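
To put numbers on that, here is a rough back-of-the-envelope check in Python using only figures from this thread: 22328 bytes free before the request, 15k to 25k per unpatched mbedtls connection, and ~10k as the practical floor. Treating "k" as 1024 bytes and ignoring the extra RAM a handshake needs are assumptions of this sketch:

```python
KB = 1024  # assumption: "k" in the thread means 1024 bytes

free_heap = 22328                # from the log line "free heap size: 22328"
tls_cost = (15 * KB, 25 * KB)    # unpatched mbedtls connection, low and high estimate
stable_floor = 10 * KB           # below this, fragmentation makes the ESP8266 unstable

def heap_after(connections: int, cost_per_conn: int) -> int:
    """Heap left after opening the given number of additional TLS connections."""
    return free_heap - connections * cost_per_conn

best = heap_after(1, tls_cost[0])
worst = heap_after(1, tls_cost[1])
print(f"one more TLS connection leaves {worst} to {best} bytes, floor is {stable_floor}")
```

Even with the low estimate, one more TLS connection leaves 6968 bytes, which is already under the ~10k mark; with the high estimate the heap is exhausted outright.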

That said, your usage of the second connection is a kludge, so it is better to fix the root cause: use only one TLS connection.

joepelderman · June 28, 2019, 2:33pm

Thanks for the clarification, I agree it is not the way to fix it, but I was curious anyway. Thanks for all your help!