mDashNotify seems missing some notifications

#1

I have created a simple Node.js app that doesn’t do anything but logs the incoming messages based on the example provided.
I am using ESP32 (ESP IDF 3.3.1 based) with latest mDash library 1.2.5-idf.
In my code, I have a call to
int res = mDashNotify("MyTopic", "{%Q:%Q}", valname, valstr};
The return code is 1, but some of the notifications (every 5-10th) don’t reach the Node.js app.
It is interesting, that since the notification is triggered by a change in the shadow variable, I get the ‘updated’ event, but not my custom notification sent to MyTopic.
It seems like a possible race condition.
Have anybody notice a similar behavior?
Is there a limit on the notifications?
Thanks,
Istvan.

#2

Turn on logging everywhere. Send sequence number in the message, catch the message that does not get logged. Increase log level on a device side to 3, capture console logs to the log file.

#3

Let me summarize what I did to locate the problem:

  • added a notification counter in notification JSON string when sending via mDashNotify to track lost notifications (attached C++ code, first image)
  • created a simple Node.js app that listens to event from mDash and prints them (attached last image, without API key, please ignore the cleanness of the code). Tried on PC an RPi in parallel, they behave same.
  • tried to increase mDashSetLogLevel to 3, but then I am constantly disconnected from mDash when I call mDashNotify, therefore no notification goes through (different issue I guess, 3rd screenshot attached - feel free to fork another issue for that). So I fall back to log level 2.

The notifications are still being dropped. Logs attached. See screenshot 2, notifications 12 and 13 are not received, although should be according to logs.

Best regards,
Istvan.

PS: a minor issue - the firmware version is in my case 20200605 instead of 20200105 - could this be due a wrong Jan/Jun conversion in mDash library?

#4

Here are my finding after further examinations:

  • I have created a sample that sends a notification on a button press, without triggering state update (and thus generating less messages)
  • some notifications are still lost, especially when button presses are more frequent than 1 second
  • It seems that there is a 1-sec processing loop either in mDash library on or in mDash cloud, since the reception of the notifications at nodejs side is (more-or-less) at 1-sec ticks.
  • Then I introduced a second notification on same event (effectively two mDashNotify calls in sequence per button click). The second notification never arrives. I think this is the easiest way to reproduce the issue - just call mDashNotify several times in a row.

Based on this, I conclude that somewhere in the system there is a queue of length 1, meaning that new notifications will override the yet non-delivered notifications. State updates may interfere with this as well. And this behavior is causing loss of notifications on nodejs side.

So the question is how to solve this, and whether a fix is needed?

Hope this helps,
Istvan.

#5

Yes, there is an issue with Jan/Jun, thanks for reporting.
Fixed, the next release will have the fix.

I see online/offline events - it looks like your device flaps. Maybe that’s the reason for the lost events. Can you confirm that the connection is being lost periodically?

#6

Your theory seems plausible.

Sent data gets buffered into the outbout IO buffer, and networking code wraps that buffer into the Websocket frame and sends to the mDash. mDash gets the frame and parses it as a JSON-RPC message.

Maybe, if two notifications are written fast, an output buffer would have two JSON strings, which will be subsequently wrapped into a single Websocket frame.

The receiving side on mDash should handle that well - the JSON strings might cross websocket frame borders, but in reality that might not be true, and mDash might choke on the “double json” frame and disconnect, thinking that frame is invalid, or simply ignore the second JSON string.

Let us verify this theory.

#7

Only when I increase log level with
mDashSetLogLevel(3);
The connection is totally unstable.

When I use log level 2, the connection is stable.

The notification issue remains.

#8
  1. Date parsing bug fixed, 1.2.6 released
  2. The notification issue theory was wrong. The root cause was a notification buffer size (was 1) on mDash side. If the receiver is not fast enough to fetch notifications, they get lost. Increased buffer size to 10. Thus the issue still can manifest itself if a notification catcher is too slow, and a device producing notifications is fast.
#9

Wow, that was fast!
It’s great you found out the cause.
Although I am not sure it is up to the slowness of the receiver or slowness of delivery mechanism, since I was using a plain websocket in nodejs (which we use in our 100k client ecosystem with much higher speeds with no problems).
What bothers me is the ~1 second notification delivery tick. That looks odd, since it wasn’t enforced neither by esp32 fw nor by my nodejs code (unless something weird is going on in websocket implementation).
Nevertheless, great progress, thanks!

#10

Could you expand on 1s delivery ?
It should be close to instant.

#11

Regarding 1.2.6: are there release notes available for new mDash releases?

#12

Do you mean, notes for the client library?
Not at the moment. We’ll start making notes in a future releases.

#13

Yes - just the client library. It would be helpful to have a summary of what has changed between releases and when it makes sense to sync to new releases.

-AD