Robust Build Management And Diagnostics Features With A Focus At The Edge


The first blog in this series focused on the state of scaling up Industrial IoT, as well as the need for a flexible and versatile architecture that works in the cloud, on-premise and at the edge. This second blog focuses on the importance of robust build management and diagnostics features, particularly at the edge.

An IoT Platform is a fundamental piece of any IoT solution, enabling connectivity, provisioning, device management, automation, dashboarding and data analytics for the connected devices and the data captured from them. It provides a ready-to-use set of features that greatly speeds up the development of applications and of the infrastructure required to manage and monitor the plethora of IoT devices, and it promises to take care of scalability and some amount of cross-device compatibility. Such platforms originated in what was commonly referred to as ‘IoT Middleware’ but have evolved to be much more comprehensive.

It is important to note that many IoT startups that started 4-5 years back built their own IoT Platforms in their initial and formative years, because robust IoT Platforms did not exist back then. A plethora (maybe too many) of such Platforms exist today in all flavors, catering to different needs, verticals and cost points. There are open-source IoT Platforms, horizontal IoT Platforms, as well as vertical-specific integrated IoT Platforms. But the majority of them are Cloud-based today.

At a high level, any IoT Platform performs the following tasks.

The emergence of edge IoT platforms
Historically, all the data was typically sent to the Cloud, and the Gateway was essentially a North-South bridge: it took data from the sensors in different protocols, converted or translated it, and then sent it to the Cloud using IoT protocols like CoAP or MQTT over IP-based networking. Data management of sensor data and Over-The-Air (OTA) upgrades of sensor devices were also built into the Gateways.
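
The bridging role described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sensor wire formats, the field names and the topic layout are all hypothetical, and the actual hand-off to an MQTT or CoAP client is left out.

```python
import json
import struct


def decode_binary_frame(frame: bytes) -> dict:
    # Hypothetical 4-byte frame: uint16 sensor id, then int16 temperature
    # in tenths of a degree Celsius, both big-endian.
    sensor_id, raw_temp = struct.unpack(">Hh", frame)
    return {"sensor_id": sensor_id, "temperature_c": raw_temp / 10}


def decode_text_line(line: str) -> dict:
    # Hypothetical ASCII format: "<sensor id>,<temperature in Celsius>".
    sensor_id, temp = line.strip().split(",")
    return {"sensor_id": int(sensor_id), "temperature_c": float(temp)}


def to_cloud_message(reading: dict, gateway_id: str) -> tuple:
    # Returns (topic, JSON payload); the topic layout is an assumption.
    topic = f"gateways/{gateway_id}/sensors/{reading['sensor_id']}"
    return topic, json.dumps(reading, sort_keys=True)


frame = struct.pack(">Hh", 7, 215)  # sensor 7 reporting 21.5 degrees
print(to_cloud_message(decode_binary_frame(frame), "gw1"))
print(to_cloud_message(decode_text_line("7,21.5\n"), "gw1"))
```

Both sensor formats end up as the same normalized message, which is the point of the translation step: the cloud side only ever sees one schema, whatever the sensors speak.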

But now, with the move from the Cloud to the Edge, the Gateway and the edge model have gained prominence. Besides the basics of protocol conversion and device management, the Edge IoT Platform now needs to provide many of the features inherent to Cloud IoT platforms as well, including storage, data analytics, data normalization, visualization etc. All of this needs to be done in a more resource-constrained environment.

Importance of connectivity and build management
Managing a large number of edge devices in the field can be a huge challenge. Without a robust infrastructure to manage, maintain and monitor these edge devices, many IoT solution startups have had their entire ROI go south, with multiple truck/personnel rolls even for basic debugging and updating of devices in the field. Interestingly, most of these aspects are either not implemented or are side-stepped during Pilots and Proofs of Concept, but they come back to bite you as you scale. Several IoT solution providers have highlighted these pains as they scale. Connectivity of sensors, edge devices and gateways, their buffering capacity, energy harvesting, battery life and the compatibility of their builds with each other can all be a source of a large number of problems in running and supporting the IoT system.

A few common things that go wrong in a practical large IoT deployment are:

  • Wireless connectivity can go down
  • Sensor/Edge device may need a power reboot
  • Sensor and Edge are not on compatible builds and are unable to do OTA upgrades
  • Edge analytics need a special update for a couple of devices but not all
  • A sensor was sold two years back and now the customer is bringing it online (can happen in Industrial IoT)
  • JSON (or any other format) based data is not reaching the cloud (can be a connection or edge build issue)
  • Edge build is crashing
  • API is failing after a minor update on edge

All the above issues happen frequently, and hence it is necessary to have an automated debugging and mitigation plan. If all of the above (the list is only a sample) has to be managed manually, there is absolutely no way an IoT system can be built to scale.
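
One way to picture an automated mitigation plan is as a playbook that maps each known failure to an ordered list of remote actions, escalating to a person only when those are exhausted. The sketch below is an assumption for illustration, not a description of any particular platform: the failure names are taken from the list above, while every step name is a hypothetical placeholder.

```python
from enum import Enum


class Failure(Enum):
    LINK_DOWN = "wireless connectivity down"
    DEVICE_HUNG = "sensor or edge device needs a power reboot"
    BUILD_MISMATCH = "sensor and edge builds incompatible"
    EDGE_CRASH = "edge build crashing"
    API_REGRESSION = "API failing after an edge update"


# Ordered automated steps per failure; every step name is a hypothetical
# placeholder for whatever remote action the platform actually exposes.
PLAYBOOK = {
    Failure.LINK_DOWN: ["buffer_data_locally", "retry_connection"],
    Failure.DEVICE_HUNG: ["remote_power_cycle"],
    Failure.BUILD_MISMATCH: ["push_compatible_build"],
    Failure.EDGE_CRASH: ["restart_edge_service", "roll_back_build"],
    Failure.API_REGRESSION: ["roll_back_build"],
}


def next_step(failure: Failure, attempts_so_far: int) -> str:
    # Walk the playbook in order; hand over to a human only when
    # every automated step has already been tried.
    steps = PLAYBOOK.get(failure, [])
    if attempts_so_far < len(steps):
        return steps[attempts_so_far]
    return "escalate_to_operator"


for attempt in range(3):
    print(attempt, next_step(Failure.EDGE_CRASH, attempt))
```

Keeping the playbook as data rather than code means new failure types can be added without touching the dispatch logic, and a truck roll becomes the last entry rather than the first response.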

Just assume 100 sensors and 100 edges in a system in a factory, a very modest scale for an IoT system. A simple combinatorial analysis will show that a single factory can experience 3200 IoT failures! As the number of sensors in a system grows, say to 10,000, the total number of failures will exceed 320,000. No one can manually manage such a complex system without automating the IoT support system.

Most so-called IoT edge platforms do not yet support edge analytics well. At the edge, three builds usually need to be managed: sensor firmware, OS and analytics. Analytics typically needs to be updated far more often, while the update frequency of the other two is much lower. All three builds need to be kept in sync automatically during an update. The second important aspect is the OTA of analytics: analytics packages are far larger, on the order of 400 MB, whereas a system package is much smaller. Hence OTA for the analytics package requires a more fail-safe, fragmented approach.
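
A fragmented, fail-safe transfer can be sketched as follows: the package is split into chunks, each chunk carries its own checksum, and the edge device accepts only verified chunks so that an interrupted or corrupted transfer resumes from what is missing instead of starting over. This is a minimal sketch under assumptions; the function names are hypothetical, SHA-256 is one reasonable choice of checksum, and the chunk size is tiny only to keep the example readable.

```python
import hashlib


def build_manifest(package: bytes, chunk_size: int) -> list:
    # Server side: one SHA-256 digest per chunk of the package.
    return [
        hashlib.sha256(package[i:i + chunk_size]).hexdigest()
        for i in range(0, len(package), chunk_size)
    ]


def apply_chunk(received: dict, index: int, data: bytes, manifest: list) -> bool:
    # Edge side: keep a chunk only if its digest matches the manifest.
    if hashlib.sha256(data).hexdigest() != manifest[index]:
        return False
    received[index] = data
    return True


def missing_chunks(received: dict, manifest: list) -> list:
    # Indices the edge still has to request after a partial transfer.
    return [i for i in range(len(manifest)) if i not in received]


def assemble(received: dict, manifest: list) -> bytes:
    # Refuse to install anything until every chunk has been verified.
    if missing_chunks(received, manifest):
        raise ValueError("package incomplete")
    return b"".join(received[i] for i in range(len(manifest)))


package = b"analytics-package-v2"
manifest = build_manifest(package, 4)
received = {}
apply_chunk(received, 0, package[0:4], manifest)
apply_chunk(received, 2, b"XXXX", manifest)      # corrupted chunk is rejected
print(missing_chunks(received, manifest))        # chunks still to be fetched
```

The same manifest can also list the sensor firmware and OS builds that the analytics package expects, which is one way to keep the three builds in sync during an update.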

In any IoT solution, the operator needs to know what happened, why and where. Sensor network failure data is available internally at the sensor node. Internet connectivity data, server data, gateway data and API logs are also available. But as things scale, there needs to be a central automated tool that applies some amount of machine learning to the available log files to find the relationship between an issue and its root cause. Essentially, the log files need to be parsed and put into a somewhat structured data format so that the issue-cause relationship can be auto-detected and mitigation logic built in. In any large-scale IoT deployment, this automation ideally needs to be done at an architectural level; otherwise it is difficult to handle failures when they happen, and the subsequent rectification eats away resources, time and margin.
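
To make the idea concrete, the sketch below parses raw log lines into structured records and then counts which events tend to precede a given issue on the same device. It is a deliberately simple stand-in for the machine learning step, not the real thing: the log format, the event names and the five-minute window are all assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical log format: "<ISO timestamp> <device id> <LEVEL> <event name>"
LINE = re.compile(r"^(?P<ts>\S+) (?P<device>\S+) (?P<level>[A-Z]+) (?P<event>\S+)$")


def parse(lines):
    # Turn raw lines into dict records; lines that do not match are skipped.
    records = []
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            record = match.groupdict()
            record["ts"] = datetime.fromisoformat(record["ts"])
            records.append(record)
    return records


def likely_causes(records, issue, window=timedelta(minutes=5)):
    # Count other events seen on the same device within `window`
    # before each occurrence of `issue`, most frequent first.
    counts = Counter()
    for rec in records:
        if rec["event"] != issue:
            continue
        for other in records:
            same_device = other["device"] == rec["device"]
            gap = rec["ts"] - other["ts"]
            if same_device and other["event"] != issue and timedelta(0) <= gap <= window:
                counts[other["event"]] += 1
    return counts.most_common()


sample = [
    "2024-05-01T10:00:00 edge-1 WARN wifi_rssi_low",
    "2024-05-01T10:02:00 edge-1 ERROR publish_failed",
    "2024-05-01T11:00:00 edge-2 WARN wifi_rssi_low",
    "2024-05-01T11:01:00 edge-2 ERROR publish_failed",
    "not a log line",
]
print(likely_causes(parse(sample), "publish_failed"))
```

Once logs are in this structured form, the counting step can be swapped for a proper statistical or learned model without changing how the data is collected.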

At and, many of these automation aspects have already been implemented and a few more are in development. These automation tools were built and implemented over the last 4 years as part of a scalable IoT system for the strategic investor Novatec (the largest equipment manufacturer for the plastics industry in the US). Like other Industrial IoT startups, the team also had its share of painful troubles in debugging issues and had to send people out to the field for the areas that were not automated.

System health and diagnostics
As IoT deployments scale, it is imperative to monitor the system health of the server instances, sensor electronics, edge electronics, running processes and so on. For example, a system of 10,000 sensors may consist of 10,000 sensor electronics, 1000 gateway/edge electronics and 100 servers. Any of them may go down, need a restart, or need a diagnosis for a fix, which in turn could be just a patch or a system update. Server health data is available via API from the public cloud, and one can extract the same level of API-driven health data from the Gateway (hub)/Edge device and the sensor electronics. Tracking all these system processes is very important. This calls for a methodical, unified Dashboard for Time Series data and an Alarm/SMS-driven system to alert the system admin that a particular sensor or server is either going down or may go down. The typical rule-of-thumb ‘To Do’ fixes should also be automated as much as possible.
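
The distinction between a node that is down and one that may go down can be sketched as a small classification rule over health data. Everything specific here is an assumption chosen for illustration: the two metrics (heartbeat age and disk usage), the thresholds and the status names are hypothetical, and a real system would feed the result into its dashboard and Alarm/SMS channel.

```python
from dataclasses import dataclass


@dataclass
class Health:
    node_id: str
    seconds_since_heartbeat: float
    disk_used_percent: float


def classify(h: Health, heartbeat_timeout: float = 120.0, disk_warn: float = 85.0) -> str:
    # "down": no heartbeat within the timeout.
    # "at_risk": still alive but a resource is close to its limit.
    if h.seconds_since_heartbeat > heartbeat_timeout:
        return "down"
    if h.disk_used_percent >= disk_warn:
        return "at_risk"
    return "ok"


def alerts(fleet):
    # Only nodes that need attention are reported to the admin.
    return [(h.node_id, classify(h)) for h in fleet if classify(h) != "ok"]


fleet = [
    Health("sensor-17", 30.0, 40.0),
    Health("edge-3", 45.0, 91.0),
    Health("server-1", 600.0, 50.0),
]
print(alerts(fleet))
```

The same rule applies unchanged to sensors, edge devices and servers, which is what makes a unified dashboard practical as the fleet grows.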

To summarize in very simple words, the whole idea of IoT is to automate, or to reduce the manpower needed, in order to deliver better service and information. Hence, if checking the health of the IoT system 24×7 requires additional manpower, the basic premise is defeated.

At and, the goal was to make the life of the IoT admin easy by doing predictive maintenance of the IoT system itself.

If you liked this, please share it, and also read our first blog in this series, which focused on the state of scaling up Industrial IoT as well as the need for a flexible and versatile architecture that works in the cloud, on-premise and at the edge.

About the Authors

Dr. Biplab Pal is the CTO and Founder of MachineSense. He has been a pioneer and practitioner in the IoT domain, with years of experience in sensors and sensor networks, and started his IoT company back in 2012. For the last two decades he has been instrumental in developing and deploying IoT solutions across factories, structural health monitoring for bridges and high-value assets, water and energy management, predictive maintenance of machines and power quality.

Som Pal Choudhury is a Partner at Bharat Innovation Fund, a $100M venture fund focused on core technology startups from India. He is also an advisor to several IoT companies, part of the core group of IoTForum, Co-Founder/Co-Chair of India’s premium conference IoTNext, and a frequent speaker and thought leader on IoT. He has been involved in the IoT/M2M space since early last decade, as the first employee of a Smart Grid company, American Grid, and has traversed the entire ‘IoT stack’ from ‘Sensor to the Cloud’ while working for companies like Analog Devices and NETGEAR.