Optimizing Data Storage Environment: From Edge to Fog to Cloud

One of the biggest factors driving data is video, and it is pushing the requirement for more and more storage/internal memory. Hence the need to create an efficient data storage environment.

Curated by Vinay Prabhakar Minj 

Somewhere around 2021-2022, a billion devices are expected on the IoT spectrum. When you have so many devices connected to and talking to each other, they will be absorbing a lot of data, say to the tune of 155 yottabytes (a yottabyte is equivalent to a thousand zettabytes). This is a very large number, which is hard to imagine right now.

This figure (155 YB) actually refers to the amount of data absorbed, not stored, by IoT devices, because not all of this data is useful. Only a small fraction, i.e. 0.0013 percent of the generated data, is stored on the edge device. And of the data that is stored, only 4.2 percent (88 EB) will be transmitted to the cloud.
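
A quick back-of-the-envelope check of these figures, as a Python sketch (the 155 YB, 0.0013 percent and 4.2 percent values are from the talk; everything else is plain arithmetic):

    # Back-of-the-envelope check of the data-volume figures quoted above.
    generated_yb = 155                 # data absorbed by IoT devices, in yottabytes
    generated_eb = generated_yb * 1e6  # 1 YB = 1,000,000 EB

    stored_eb = generated_eb * 0.0013 / 100  # 0.0013 percent is stored on the edge
    to_cloud_eb = stored_eb * 4.2 / 100      # 4.2 percent of stored data reaches the cloud

    print(f"Stored on the edge: {stored_eb:,.0f} EB")   # ~2,015 EB, i.e. about 2 ZB
    print(f"Sent to the cloud:  {to_cloud_eb:,.0f} EB") # ~85 EB, close to the quoted 88 EB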

Data has many facets to it. It can be used as a record, for communication, to improve efficiency, or as a currency. It is said that data is the new oil of the digital economy.

With AI everywhere, connected via 5G and learning from data everywhere, the entire earth is going to be like one big brain. This creates the need for both temporary and long-term storage.

Edge to Cloud architecture 

Initially, when IoT applications were developed, they mostly involved an IoT device and the cloud. It was expected that all the data from the IoT device would be transmitted to the cloud, and the cloud would do all of the computation for the IoT device to make a decision. But it was later realised that there were certain limitations. To rely on the cloud, latencies must be really low and bandwidth must be high. And if you need real-time analytics, then you can't depend on the cloud.

Seeing this, the concept of Edge to Cloud was developed: data can be stored at the edge, some computation can be done there, and the rest of the data can then be transferred to the cloud.
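
A minimal sketch of this split, assuming a hypothetical send_to_cloud() upload stub in place of a real cloud SDK:

    # Illustrative edge-to-cloud split: compute locally, forward only a summary.
    # send_to_cloud() is a hypothetical stub standing in for a real upload API.
    from statistics import mean

    def send_to_cloud(payload: dict) -> None:
        print("uploading:", payload)  # stand-in for an HTTPS/MQTT upload

    def process_on_edge(samples: list[float]) -> None:
        # Raw samples stay on the edge device; only the aggregates travel upstream.
        summary = {"count": len(samples), "mean": mean(samples), "max": max(samples)}
        send_to_cloud(summary)

    process_on_edge([21.4, 21.9, 22.3, 22.1])  # e.g. temperature readings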

Another concept, of having a middle tier, also came up; it is known as the fog.

So now we have the edge, the fog and the cloud. In all three tiers, there is computation as well as storage at different levels.

Video evolving in different segments with AI technology

One of the biggest factors driving data is video. It is present in all segments, be it consumer, medical, automobile, surveillance or enterprise. For example, AR/VR video content comes in 4K or 8K quality, while 720p videos are widely used for surveillance. It is video that creates the need for more and more storage.

Devices causing more data storage

1) Smartphones: The amount of data processing taking place in a smartphone is pushing the requirement for more and more storage/internal memory.

2) Automotive: Several automotive applications demand a lot of data. A market estimate says that a typical car with a lot of electronics will generate around 2 terabytes of data.

Fog computing

One driver of fog computing is the smart city. It has a lot of edge devices, and all their data needs to be assimilated, processed and sent to the backend (which has high-capacity storage). That is where the fog layer comes in.

Tiered structure of Fog

  • At the bottom are the edge devices, which have a limited amount of storage.
  • In the middle is the fog layer.
  • Above them all is the backend/enterprise device with high-capacity storage.

Devices such as surveillance cameras, set-top boxes, remote controls and 5G gateways are present at the edge (access network). In the fog, you have caching servers (aggregation network). And data centers are present in the cloud (core network).

The storage requirements of these three will differ. For example, the access network requires devices that are small, low-powered and store gigabytes of data at fast speeds. The aggregation network requires a little more storage (preferably in terabytes) with decent speed. And the core network requires a lot of storage (preferably in petabytes).

So, at the edge one can look at microSD cards, embedded MultiMediaCards (eMMC) and USB drives. In the fog, you can have solid-state drives (SSDs) sitting on the caching servers, and in the cloud, hard drives.
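
This tier-to-media mapping can be summarised in a small sketch (the media and capacity orders of magnitude are from the description above; they are illustrative, not product specifications):

    # Rough storage profile per tier, as described above (illustrative values only).
    STORAGE_TIERS = {
        "edge (access network)":     {"media": ["microSD", "eMMC", "USB"], "capacity": "gigabytes"},
        "fog (aggregation network)": {"media": ["SSD"], "capacity": "terabytes"},
        "cloud (core network)":      {"media": ["hard drives"], "capacity": "petabytes"},
    }

    for tier, profile in STORAGE_TIERS.items():
        print(f"{tier}: {', '.join(profile['media'])} ({profile['capacity']})")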

Challenges in Fog computing storage

  • Performance (lower latency and higher speed).
  • Scalability (high capacity, and storage upgrades without major changes to the architecture).
  • Cost (US$ per gigabyte should be optimised for the kind of solution).
  • Reliability (stored data should not be lost, even after months or years).
  • Endurance (should withstand more write cycles, i.e. allow more data to be written to it).
  • Serviceability (should be easy to maintain, with support for the storage element).

New requirements for storage

Data gathered from customers calls for storage with a wider temperature range, a higher endurance rating, longer retention, and innovative ways to manage it remotely. Out in the field, the storage device must also survive diverse temperatures and vibrations.

Also, storage meets a variety of workloads: sequential data, random data, and read- or write-intensive data. All of these impact storage, so it is important to make a solution that fits them all.
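
As a sketch of how such workloads might be characterised before choosing a part (the 0.7 and 0.5 thresholds are illustrative assumptions, not industry figures):

    # Illustrative workload characterisation; the thresholds are assumptions.
    def classify_workload(sequential_fraction: float, write_fraction: float) -> str:
        pattern = "sequential" if sequential_fraction > 0.7 else "random"
        bias = "write-intensive" if write_fraction > 0.5 else "read-intensive"
        return f"{pattern}, {bias}"

    print(classify_workload(0.9, 0.8))  # e.g. a surveillance recorder: sequential, write-intensive
    print(classify_workload(0.2, 0.3))  # e.g. a lookup-heavy cache: random, read-intensive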

Three different scenarios

1) An IoT device communicating with the cloud.

2) Computation on the IoT edge, with a bit of data then transferred to the cloud (for long-term storage and further computation).

3) Multiple edge devices connecting to an edge gateway (which could do fog based analytics) and then transferring to the cloud.

If you develop any IoT application, it will likely fit one of the above scenarios.

Role of Flash Storage in IoT  

  • You can store the OS boot code and application code on the Flash device.
  • You can also store data on it.
  • It enables machine learning and video analytics on the edge.
  • It helps overcome the latency and bandwidth issues of relying on the cloud.

Role of Flash in embedded applications

1) Operational storage: holds the OS boot code and application code (needs data retention and reliability).

2) Functional storage: data is constantly written to and stored on it (needs endurance and performance).

Both of these require different behaviour from a Flash device. However, retention and endurance are coupled: an increase in retention causes a significant decrease in endurance. To overcome this problem, physical partitions can be created such that tuning one region does not affect the other. This is called smart partitioning.
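
A conceptual sketch of smart partitioning (the partition names, sizes and tuning labels are hypothetical; real devices expose such splits through vendor tools):

    # Conceptual model of smart partitioning: one physical device, two regions
    # tuned differently. All sizes and labels are illustrative, not datasheet figures.
    from dataclasses import dataclass

    @dataclass
    class FlashPartition:
        name: str
        size_gb: int
        tuned_for: str  # "retention" (code, rarely rewritten) or "endurance" (data, written constantly)

    device = [
        FlashPartition("operational", size_gb=4, tuned_for="retention"),   # OS boot + application code
        FlashPartition("functional", size_gb=28, tuned_for="endurance"),   # logs and sensor data
    ]

    for p in device:
        print(f"{p.name}: {p.size_gb} GB, optimised for {p.tuned_for}")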

Endurance and workload

Consider a 400GB card and an 8GB high-endurance card. The former is rated for 1,000 program/erase cycles, which gives 400TB; that is, during its entire lifespan, one can write 400TB of data to it. The latter is rated for 50,000 program/erase cycles, which again gives 400TB.

So, of the two, the 8GB card can be chosen when you don't want to store a lot of data at once but want the card to last for several years of constant rewriting.

With the 400GB card, you can store a lot of data, but under constant rewriting its lifespan is shorter. Accordingly, one can choose whichever is better for the application.
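
The arithmetic behind this comparison, written out as a sketch (the once-a-day full-overwrite workload is an illustrative assumption):

    # Total bytes written (TBW) = capacity × rated program/erase (P/E) cycles.
    def tbw_tb(capacity_gb: int, pe_cycles: int) -> float:
        return capacity_gb * pe_cycles / 1000  # GB·cycles -> TB

    print(tbw_tb(400, 1_000))  # 400.0 TB for the 400GB card
    print(tbw_tb(8, 50_000))   # 400.0 TB for the 8GB high-endurance card

    # Same TBW, very different lifespans for a workload that overwrites the
    # whole card once a day (an illustrative assumption):
    for name, pe_cycles in [("400GB card", 1_000), ("8GB card", 50_000)]:
        print(f"{name}: ~{pe_cycles / 365:.1f} years of daily full overwrites")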

Considerations when choosing a Flash device

  • Always look at the read/write performance.
  • Look at the endurance, in terms of program/erase (P/E) cycles or terabytes written (TBW).
  • Look at data retention, i.e. how long data can reside on the card in a power-off condition. Data can be lost over the years if no power is provided to the device.
  • Reliability of the data
  • Temperature range
  • Security
  • Special features such as Host Lock and Health Monitor
  • Price and availability
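
These criteria can be turned into a simple first-pass check (a sketch; all field names and figures are made up for illustration):

    # Illustrative first-pass check of a candidate flash device against requirements.
    # All field names and figures here are made up for illustration.
    requirements = {"min_write_mbps": 30, "min_tbw_tb": 100,
                    "min_temp_c": -25, "max_temp_c": 85}

    candidate = {"write_mbps": 60, "tbw_tb": 400, "temp_range_c": (-40, 85),
                 "features": {"host lock", "health monitor"}}

    fits = (candidate["write_mbps"] >= requirements["min_write_mbps"]
            and candidate["tbw_tb"] >= requirements["min_tbw_tb"]
            and candidate["temp_range_c"][0] <= requirements["min_temp_c"]
            and candidate["temp_range_c"][1] >= requirements["max_temp_c"])

    print("candidate fits" if fits else "candidate rejected")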

About the author

This article is an extract from a speech presented by Leo Jose, Field Application Engineer, Western Digital, at IOTSHOW.IN 2019.

Jose has a demonstrated history of working in the semiconductor industry. He worked as a Senior Field Applications Engineer at SanDisk before it was acquired by Western Digital. Prior to that, he was associated with NXP, eInfochips Inc. and Techntona Soft Solutions.