Challenges in Running AI On Embedded Computers


In the next 5 years, AI is going to be inside everything, and edge AI is going to be the next big thing within AI.

Curated by Vinay Prabhakar Minj

Most of the work going on in the AI space today is on the application side, solving industry problems, and people want to get solutions to market as fast as possible. For this reason, most AI libraries are built for the cloud. Within that stack, the first challenge starts with data.

You need to train a model. But the biggest challenge is not creating the model; it is finding data and arranging it so that the model can be trained efficiently. So there is a large market for getting the right data.

The second part is building a model. Many applications, such as face recognition and object detection, do not require creating a model from scratch, because ready-made models for these common tasks are already available; one just needs to choose the right one for the application. After that, the model usually needs further training, since most pre-trained models were trained on different data sets and will not give good output out of the box.

Since every library is good at one thing or another, one needs to carefully choose the one that suits the application.

Once you have a working model, it must be used purposefully to generate enough money. Some of the questions the business side should focus on are: What should the model be used for? What is the market looking for?

Apart from all this, there is major competition, since almost everyone in AI works directly on the cloud for model training and data. However, AI can run equally well on embedded computers, something few people know.

The number of libraries available for AI today stands at 800-plus. That is also the scale of competition in the AI space.

A Lot of AI Will Happen on the Edge

When running your model in the cloud, you typically use Google's engine APIs or Amazon's object recognition models, paying for each piece of data processed. Once the solution is built, over a period of time you end up paying a lot of money.

Instead, the same model can be coded onto hardware. On an embedded device you can get cloud-like performance without paying on a long-term basis: all the analysis happens on the device, which sends only a small amount of data to the cloud.
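The split described above can be sketched in Python: inference runs entirely on the device, and only a compact result, rather than the raw data, is sent upstream. The "model" here is a trivial stand-in, not a real inference library, and there is no real cloud endpoint.

```python
import json

def classify_frame(frame):
    """Stand-in for a locally run model: counts bright pixels.
    A real edge deployment would call an optimised inference library here."""
    bright = sum(1 for px in frame if px > 200)
    return "person" if bright > 2 else "empty"

def summarise_for_cloud(frame):
    """All heavy analysis happens on the device; only a tiny JSON
    summary ever leaves it, keeping per-request cloud costs near zero."""
    label = classify_frame(frame)
    return json.dumps({"label": label, "pixels": len(frame)})

# A 5-pixel "frame" stands in for camera data that never leaves the device.
payload = summarise_for_cloud([10, 250, 240, 230, 30])
```

The point of the sketch is the shape of the payload: a few bytes of result instead of the whole frame, which is what removes the per-data cloud bill.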

Edge AI is going to be the next big thing within AI. The data that an AI library sends down to the processor for computation is essentially vector calculations. For example, when gaming took off in the past, GPUs were optimised for exactly this kind of fast mathematics. Now there are Vector Processing Units on chips that can process the vectors (data) coming from an AI program very quickly.
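To make the "vector calculations" claim concrete: a neural-network layer is essentially one big dot product, and a vector unit executes several of its multiply-accumulates per step instead of one. The chunked version below only mimics that lane-by-lane behaviour in plain Python for illustration; real speedups come from the hardware, not from this code.

```python
def dot_scalar(a, b):
    """One multiply-accumulate per step, as a plain CPU loop would do."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_chunked(a, b, width=4):
    """Mimics a 4-lane vector unit: consume `width` element pairs per
    step, the way Neon or an NPU processes several lanes at once."""
    total = 0.0
    for i in range(0, len(a), width):
        total += sum(x * y for x, y in zip(a[i:i + width], b[i:i + width]))
    return total

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [0.5] * 8  # a toy weight vector
```

Both functions compute the same result; the chunked loop simply needs a quarter as many steps, which is where vector hardware gets its throughput.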

People are unaware that a lot of AI applications can be built on embedded SoCs that have hardware FPUs (ARMv8) or vector processing units (often marketed as neural processing units).

The scale might sound unbelievable, but when computers came in, people never imagined they would become this big. Gradually they became a necessity in our lives. And AI is going to be much bigger than that. In the next 5 years, AI is going to be inside everything; a lot of products will come with AI built into them.

Currently, there are 20-plus players dealing with AI technology. Some of the popular ones are Tencent, TensorFlow, Caffe, Chainer, ONNX and PyTorch.

There is a perception in the industry that to run AI on the edge you need a lot of computation. This is not correct.

Challenges Faced in Embedded

Challenges such as model size and choosing the appropriate model and framework (TensorFlow or PyTorch: which one to use?) are commonly faced by people who work on AI. The system-side challenges, however, are something most people are not aware of. These include:

  • Platform: In the Cloud, the code is pre-installed. But for embedded, you need to fetch the source and compile it into the machine.
  • Fetching: Finding the right source is also a challenge. One needs to do a lot of research to find the right source and the correct patches, and then to configure it.
  • Architecture support: Mostly ARM is used, either ARMv7 or ARMv8. With Neon support in ARMv8, hardware FPUs (Floating Point Units) can provide around 5x performance. This is very advantageous, but challenging to exploit as well.
  • Compilation: Cross-compiling does not give the required performance. So to get the best performance and make a library work optimally on the hardware, you need to compile it natively, which takes a lot of time.
  • Installation: You may encounter several error messages after compilation. Care needs to be taken to install libraries correctly, especially for new machines.
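Before investing in a long native build, it helps to confirm the board actually exposes the Neon unit the compiler will target. On Linux, CPU features are advertised in /proc/cpuinfo: ARMv7 kernels list "neon", while ARMv8 kernels list "asimd" (Advanced SIMD). A minimal sketch, with an illustrative sample of cpuinfo text (on a real board you would read the file itself):

```python
# Illustrative excerpt of what an ARMv8 board's /proc/cpuinfo may contain.
SAMPLE_CPUINFO = """\
processor : 0
model name : ARMv8 Processor rev 3 (v8l)
Features : fp asimd evtstrm aes crc32
"""

def neon_available(cpuinfo_text):
    """Return True if the Neon / Advanced SIMD unit is advertised.
    ARMv7 kernels report 'neon'; ARMv8 kernels report 'asimd'."""
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("features"):
            flags = line.split(":", 1)[1].split()
            return "neon" in flags or "asimd" in flags
    return False

has_neon = neon_available(SAMPLE_CPUINFO)
```

On the device itself, `neon_available(open("/proc/cpuinfo").read())` gives the real answer before you commit to architecture-specific build flags.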

Beyond these, almost all libraries have dependencies: additional packages that are needed to run the library. Each tool, framework or library has, on average, 4–5 dependencies.

Those packages also need to be compiled natively to get optimal performance from the AI libraries installed on your embedded system. Docker is the quickest way to run packages, but it is not optimised at runtime and must be used with care.
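Because every dependency must itself be built natively before the library that needs it, getting the build order right is part of the work. A sketch using a depth-first topological sort; the package names and dependency edges here are made up for illustration, not a real library's requirements:

```python
def build_order(deps):
    """Return packages in an order where every dependency is compiled
    before the package that needs it (depth-first topological sort).
    Assumes the graph has no cycles."""
    order, seen = [], set()

    def visit(pkg):
        if pkg in seen:
            return
        seen.add(pkg)
        for d in deps.get(pkg, []):  # build prerequisites first
            visit(d)
        order.append(pkg)

    for pkg in deps:
        visit(pkg)
    return order

# Hypothetical graph: one AI library and its 4-5 native dependencies.
deps = {
    "ai-library": ["blas", "protobuf", "zlib", "image-io"],
    "image-io": ["zlib"],
}
plan = build_order(deps)
```

For the graph above, every prerequisite lands before the package that needs it, so `zlib` is compiled before `image-io`, and everything before `ai-library`.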

There is a perception in the industry that to run AI on the edge you need a lot of computation, or ever-better computing power to obtain enough performance. This is not correct. With that approach, when the solution is sold, your cost goes up as the systems have to handle more complex and larger data. The focus should instead be on better engineering that increases efficiency as complexity grows and lowers the price over time, without compromising on technological advantage.

Shunya OS optimises AI libraries for embedded systems, enabling developers to build AI-on-edge solutions very fast and at very low cost

Optimising AI for Embedded 

Shunya OS examines the underlying hardware to see whether a CPU, GPU or NPU is available for processing and how it can be specifically optimised for the AI library being used. It auto-configures OpenGL and OpenCL wherever available (OpenGL is a graphics optimisation library, while OpenCL is a compute optimisation library).
It further checks the CPU capabilities of the underlying hardware and tunes the library with architecture- and machine-specific optimisations to improve performance.
What does this mean for the AI application engineer? They can get a cloud-like optimised platform directly on the edge, so their code runs efficiently.
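The capability-based configuration described above can be sketched as a simple fallback chain. The capability flags and backend names below are illustrative only, not Shunya OS's actual API; the idea is just that the platform probes the SoC once and routes the AI library to the fastest path it finds.

```python
def pick_backend(capabilities):
    """Choose the fastest available compute path, falling back gracefully.
    `capabilities` would come from probing the SoC (hypothetical flags)."""
    if capabilities.get("npu"):
        return "npu"            # dedicated vector/neural processing unit
    if capabilities.get("opencl"):
        return "gpu-opencl"     # general-purpose compute on the GPU
    if capabilities.get("neon"):
        return "cpu-neon"       # ARM Advanced SIMD on the CPU
    return "cpu-generic"        # unoptimised last resort

# E.g. a board with Neon and an OpenCL-capable GPU, but no NPU:
backend = pick_backend({"neon": True, "opencl": True})
```

The ordering encodes the engineering judgement: prefer dedicated vector hardware, then GPU compute, then CPU SIMD, and only then fall back to plain code.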
Somewhat similar solutions are provided by Arm NN, Apache TVM, Intel OpenVINO and Nvidia TensorRT. Some of these are tied to proprietary hardware or architectures, but TVM is almost platform-agnostic, working from Intel to Arm to even FPGAs, and I think it is currently the best open-source project in this category.


About the author

This is an extract from a speech presented by Nikhil Bhaskaran, Founder, Shunya OS, at IOTSHOE.IN 2019. Shunya OS is an operating system that optimises AI libraries for embedded systems, enabling developers to build AI-on-edge solutions very fast and at very low cost.

Nikhil has more than a decade of experience in core electronics from design to production. He has lived in Shenzhen, China for 8 years and has deep experience of embedded hardware. He is also the founder of –  the largest IoT & AI innovators community in Pune.