Free Software and Open Hardware for Industrial Automation

It is already possible to implement a complete architecture for Industrial Automation based on industry standards, Free Software and Open Hardware. Yet, some standards are questionable, a few components are missing and some technical problems still need to be resolved.
  • Last Update: 2019-02-13
  • Version: 001
  • Language: en

All technologies needed to build a fully open source / open hardware system for industrial automation are now available, mostly from European Free Software publishers and Open Hardware vendors. We will explain in this article how to build such a system. We will then discuss possible limitations of certain standards and problems which remain to be resolved. 

Architecture

A possible architecture for an industrial automation system has four segments:

  • Cloud;
  • Edge;
  • IoT;
  • Client.

The diagram below illustrates the role of each segment:

The Cloud segment, which can be a public or a private cloud, runs backend servers for applications such as ERP or a data lake. It could even run MES applications, as long as some form of proxy at the edge handles the real-time part of manufacturing execution.

The Edge segment runs services that must be deployed next to the production line. This includes services that provide real-time control logic to IoT, real-time signal processing to a Remote Radio Head (RRH), a real-time proxy of a cloud-hosted MES, network management services, etc. ERP could also be deployed at the edge if the wide area network (WAN) is not reliable enough to access the cloud segment.

The IoT segment performs very simple operations to convert analog signals to digital and vice versa. An IoT device usually runs a micro-controller, possibly without an operating system. It is connected to the Edge through either a local area network (LAN) or a wide area network (WAN). A typical LAN is Ethernet or Wi-Fi. A typical WAN is LTE, NR, NB-IoT, etc.
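The analog/digital conversion performed by the IoT segment can be illustrated with a minimal sketch of an ideal converter. The function names, the 3.3 V reference voltage and the 12-bit resolution are assumptions chosen for illustration; they do not describe any particular micro-controller.

```python
def adc_quantize(voltage: float, v_ref: float = 3.3, bits: int = 12) -> int:
    """Convert an analog voltage into the integer code an ideal ADC would report."""
    levels = (1 << bits) - 1                 # highest code, e.g. 4095 for 12 bits
    clamped = min(max(voltage, 0.0), v_ref)  # an ADC saturates outside its range
    return round(clamped / v_ref * levels)


def dac_reconstruct(code: int, v_ref: float = 3.3, bits: int = 12) -> float:
    """Convert an integer code back into the voltage an ideal DAC would output."""
    levels = (1 << bits) - 1
    return code / levels * v_ref


if __name__ == "__main__":
    code = adc_quantize(1.0)
    print(code, dac_reconstruct(code))
```

The round trip loses at most half a quantisation step, which is the only "processing" such a device needs to perform. Everything more complex can stay at the Edge.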

The Client segment provides a user interface to display and control the system. It can also run some local processing.

Standards

Industrial automation is an industry in search of standards, yet it is dominated by mutually incompatible protocols from Siemens, Schneider, Beckhoff, Omron, etc.

Most parts of an industrial automation system could now be implemented based on four standards:

  • OPC-UA (IEC 62541);
  • TSN (IEEE 802.1);
  • POSIX (ISO/IEC 9945-1:1996);
  • HTML5 (W3C).

OPC-UA defines a standard way to exchange messages between Cloud, Edge and IoT. It provides a standard way to describe message content, address each device and ensure a limited form of resiliency with OPC-UA PubSub. It is a fast evolving standard closely linked to the progress of Industry 4.0. It has a lot of open source implementations.
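To make device addressing more concrete, here is a minimal sketch that parses the textual NodeId notation used by OPC-UA ("ns=<namespace index>;<type>=<identifier>"). The class name, the function name and the example node "Line1.Oven.Temperature" are hypothetical. A real project would rely on an OPC-UA library rather than on this parser.

```python
from typing import NamedTuple, Union


class NodeId(NamedTuple):
    namespace: int                  # index into the server's namespace table
    id_type: str                    # "i" numeric, "s" string, "g" GUID, "b" opaque
    identifier: Union[int, str]


def parse_node_id(text: str) -> NodeId:
    """Parse a textual NodeId such as "ns=2;s=Line1.Oven.Temperature"."""
    namespace = 0                   # namespace 0 is implied when "ns=" is absent
    rest = text
    if text.startswith("ns="):
        ns_part, sep, rest = text.partition(";")
        if not sep:
            raise ValueError(f"missing identifier in {text!r}")
        namespace = int(ns_part[3:])
    id_type, sep, value = rest.partition("=")
    if not sep or id_type not in ("i", "s", "g", "b"):
        raise ValueError(f"unsupported NodeId {text!r}")
    return NodeId(namespace, id_type, int(value) if id_type == "i" else value)


if __name__ == "__main__":
    print(parse_node_id("ns=2;s=Line1.Oven.Temperature"))
```

The same notation addresses a sensor on an IoT device, a service at the Edge or a record in the Cloud, which is what makes a single addressing scheme across segments attractive.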

TSN defines a standard for deterministic networking that can handle time constraints typical of video or field buses. Its specification is very wide and covers scheduled traffic (see "Time-Sensitive Networking: From Theory to Implementation in Industrial Automation"). However, it is still far from being widely implemented or adopted. Implementing it requires changes in the Ethernet driver code or chipset. Its open source implementation for Linux is still partial. 

POSIX is one of the very few standards for service deployment which is independent of the operating system. The GNU C Library is a mature implementation of POSIX licensed as Free Software which has always taken care to provide upward compatibility.

HTML5 is another standard for service deployment which is independent of the operating system. Although this is little known, HTML5 can run autonomous processes offline and has the power to completely replace POSIX at the Edge. It defines a wide range of APIs which cover data persistence (IndexedDB), multi-processing (Web Workers) and local service provisioning (Service Workers). Pyodide demonstrates how HTML5 can be used to deploy autonomous A.I. processing inside a browser.

Cloud segment

The cloud segment mainly requires a mature Service Lifecycle Management (SLM) system that can automate the lifecycle (build, configure, run) and all aspects of service management:

  • provisioning;
  • orchestration;
  • monitoring;
  • accounting;
  • disaster recovery;
  • resiliency;
  • billing.

A service can be anything: a database, a virtual machine (VM), an HTTPS front-end, ERP, MES, data lake, etc.
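The build, configure and run lifecycle can be sketched as an idempotent loop that performs only the stages still missing for a service. This is an illustration under our own assumptions: the names `Service` and `converge` are hypothetical and do not correspond to the API of SlapOS or of any other SLM system.

```python
from dataclasses import dataclass, field

STEPS = ("build", "configure", "run")   # lifecycle stages named in the text


@dataclass
class Service:
    name: str
    done: list = field(default_factory=list)   # stages already completed


def converge(service: Service) -> list:
    """Perform only the missing stages, in order, and return what was done."""
    performed = []
    for step in STEPS:
        if step not in service.done:
            service.done.append(step)   # a real system would do the work here
            performed.append(step)
    return performed


if __name__ == "__main__":
    erp = Service("erp")
    print(converge(erp))   # first pass performs every stage
    print(converge(erp))   # second pass finds nothing left to do
```

Because running the loop again on an up-to-date service changes nothing, it can be executed periodically on every server without coordination.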

Services are deployed on bare metal servers similar to those used by Facebook (Open Compute Project). The use of virtualisation or OS namespaces/jails for services is optional. Thanks to the recursive architecture of SlapOS, service lifecycle management can be self-deployed and self-tested. Slave services provide a way to partition a single service into multiple sub-services which can be provisioned individually.

Networking is based on full IPv6 (just like Facebook infrastructure) with modern low latency routing (RFC 6126).

Multi-protocol data collection is implemented through the fluentd protocol.

ERP and MES on the cloud segment are based on proprietary standards. It could be interesting to see how to integrate them using the reformulation of the ISA-95 standard promoted by the OPC Foundation.

As far as we know, there is no standard OPC-UA profile for Service Lifecycle Management. It could be interesting to reformulate the SLAP protocol in OPC-UA.

Cloud Segment Components
Purpose | URL | Vendor | Provides | Based on
Intel Xeon Server | https://www.opencompute.org | ITRenew | POSIX |
Firmware | https://www.linuxboot.org | ITRenew | POSIX |
Secure Boot Operating System | https://elbe-rfs.org | Linutronix | POSIX |
OPC-UA | https://open62541.org | | OPC-UA | C
Data Center Management, Service Lifecycle Management, Edge Management, IoT Management | https://slapos.nexedi.com | Nexedi | | POSIX
Resilient WAN, IPv6 Range Management | https://re6st.nexedi.com | Nexedi | IPv6, RFC 6126 | POSIX
Manufacturing Execution System, Enterprise Resource Planning | https://erp5.nexedi.com | Nexedi | | POSIX
Data Lake, Out-of-core processing | https://wendelin.nexedi.com | Nexedi | fluentd | POSIX, fluentd

Edge segment

The Edge segment is remotely driven by the cloud segment. It is autonomous enough to keep on operating in case of Wide Area Network (WAN) outage.
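One common way to tolerate a WAN outage is to buffer outgoing records at the Edge and forward them once the link returns. The sketch below illustrates this idea only. The class `StoreAndForward`, the `send` callback and the buffer size are hypothetical and not taken from any of the components listed in this article.

```python
from collections import deque


class StoreAndForward:
    """Buffer records locally while the WAN is down, flush them when it returns."""

    def __init__(self, send, max_pending: int = 1000):
        self._send = send                            # callable returning True on success
        self._pending = deque(maxlen=max_pending)    # oldest records are dropped when full

    def submit(self, record) -> None:
        """Queue a record, then try to deliver everything that is waiting."""
        self._pending.append(record)
        self.flush()

    def flush(self) -> int:
        """Send buffered records in order and stop at the first failure."""
        sent = 0
        while self._pending and self._send(self._pending[0]):
            self._pending.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        return len(self._pending)
```

A record is removed from the buffer only after the callback reports success, so ordering is preserved across an outage. The bounded buffer is a design choice: on a small micro-server, losing the oldest data is usually preferable to exhausting memory.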

The Edge segment infrastructure can be based on micro-servers (Olimex open hardware) or on high-performance servers (OCP open hardware).

The Edge segment usually provides a Content Delivery Network (CDN) to support high-performance HTTPS content delivery and caching. Another common type of edge service is a multi-protocol data collection gateway that converts any data collection protocol (ex. MQTT, syslog, OPC-UA) into the data collection protocol expected by the cloud segment, in our case fluentd.
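The role of such a gateway can be sketched as a set of small converters that all produce the same event shape. Fluentd events consist of a tag, a time and a record, and the sketch keeps that triple. The tag prefixes, the function names and the simplified "host program: message" syslog format are assumptions made for illustration.

```python
import json


def from_mqtt(topic: str, payload: bytes, now: float) -> tuple:
    """Turn an MQTT-style (topic, payload) pair into a (tag, time, record) event."""
    tag = "mqtt." + topic.replace("/", ".")
    return (tag, int(now), json.loads(payload))   # assumes a JSON payload


def from_syslog(line: str, now: float) -> tuple:
    """Turn a simplified 'host program: message' line into the same event shape."""
    header, _, message = line.partition(": ")
    host, _, program = header.partition(" ")
    return ("syslog." + program, int(now), {"host": host, "message": message})


if __name__ == "__main__":
    print(from_mqtt("line1/oven/temp", b'{"celsius": 180}', 1550000000.0))
    print(from_syslog("edge1 plc: door open", 1550000000.0))
```

Once every source is reduced to the same triple, the cloud segment only needs to understand one protocol, whatever the device vendor chose.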

The Edge segment can provide IPv6 addresses to neighbouring IoT devices and access to a resilient WAN through multiple links (ex. LTE/NR, FTTx, etc.).

The Edge segment can run a local Service Lifecycle Management (SLM) system dedicated to the IoT devices connected to the same local area network (LAN).

Some proposed standards for OPC-UA profiles at the Edge are emerging.

Edge Segment Components
Purpose | URL | Vendor | Provides | Based on
Intel Xeon Server | https://www.opencompute.org | ITRenew | POSIX |
Micro-server | https://www.olimex.com/Products/OLinuXino | Olimex | POSIX |
Firmware | https://www.linuxboot.org | ITRenew | POSIX |
Secure Boot Operating System | https://elbe-rfs.org | Linutronix | POSIX |
OPC-UA | https://open62541.org | | OPC-UA | C
Service Lifecycle Management, IoT Management, Content Delivery Network | https://slapos.nexedi.com | Nexedi | | POSIX
Resilient WAN, IPv6 announcement | https://re6st.nexedi.com | Nexedi | IPv6, RFC 6126 | POSIX
Data collection gateway | https://www.fluentd.org | ARM | fluentd | POSIX, fluentd, MQTT, syslog, kafka, OPC-UA, etc.
IoT Logic | ? | | OPC-UA | OPC-UA

IoT segment

The IoT segment is considered here as an extension of the Edge segment. The purpose of each IoT device is to act as an extension of the "IoT Logic" service deployed at the edge, by doing the additional processing required to convert digital events to and from analog signals. Communication between the Edge and IoT segments may be subject to the strict time constraints typical of real-time applications.

For example, the Edge segment could implement a complete emulation of the LTE radio physical layer (a.k.a. eNodeB) whereas the IoT segment acts as a Remote Radio Head (RRH) which modulates a 2,600 MHz frequency with the resulting 20 MHz signal. Although this example is related to telecommunication infrastructure rather than industrial automation, it clearly shows the difference in algorithmic complexity between the Edge and the IoT segments.

The IoT segment does simple and slowly evolving - yet possibly very fast - processing whereas the edge segment does complex (and constantly evolving) processing.

The IoT segment may have too few resources (RAM, CPU, storage) to run a full POSIX operating system. The Edge segment runs a full-blown POSIX operating system.

The IoT segment does not provide much standard API, for now. The Edge segment is based on POSIX standard.

The only thing common to all possible IoT firmware seems to be the C language. OPC-UA also provides common ground to simplify interfacing Edge and IoT (addressing, payload schema, payload transport, etc.), just like USB simplified interfacing personal computers with an ever-growing ecosystem of devices.

Yet, multiple incompatible abstractions still exist as a possible API for IoT software developers:

  • POSIX OS API abstraction (μClinux);
  • custom OS API abstraction (RIOT, Mongoose, FreeRTOS, mbed, etc.);
  • Language API abstraction (Python, JavaScript, Lua, etc.).

The market is still very fragmented. Also, as far as we know, there is no standard OPC-UA library for IoT.

IoT Segment Components
Purpose | URL | Vendor | Provides | Based on
IoT | https://www.olimex.com/Products/IoT | Olimex | C SDK |
IoT Firmware | https://micropython.org | | Python | C
IoT Firmware | https://duktape.org | | Javascript | C
IoT Firmware | https://github.com/cmbahadir/opcua-pubsub-esp32 | | OPC-UA | C
IoT Firmware | https://www.riot-os.org | INRIA, FU Berlin | | C
IoT Firmware | https://mongoose-os.com | Cesanta | | C
IoT Firmware | https://www.freertos.org | Amazon | | C
Data collection | https://fluentbit.io | ARM | fluentd | C
IoT Processing | ? | | OPC-UA | OPC-UA

Client segment

The client uses the Teres open hardware laptop from Olimex. It runs an HTML5 browser provided by the Qt framework, whose web engine derives from Chromium.

Data analysis and visualisation are based on the Iodide and Pyodide frameworks created by Mozilla.

The client segment could in theory act as an Edge segment or as an IoT segment. HTML5 can actually do much more than what most developers believe. Implementing a complete A.I. engine in HTML5 is quite easy. Such an engine could drive an IoT in real time. 

Client Segment Components
Purpose | URL | Vendor | Provides | Based on
Laptop | https://www.olimex.com/Products/DIY-Laptop | Olimex | POSIX |
Browser Firmware | http://doc.qt.io/QtWebBrowser | Qt | HTML5 | POSIX
Data analysis and visualisation | https://github.com/iodide-project/iodide | Mozilla | | HTML5
IoT Logic | ? | | OPC-UA | OPC-UA
IoT Processing | ? | | OPC-UA | OPC-UA

Risks

The adoption of OPC-UA and TSN for industrial automation involves certain risks or questions, listed below.

TSN could become a beautiful standard without implementation. IEEE standards such as 802.11 have already experienced this issue. The Point Coordination Function (PCF), which provides a way to ensure a form of determinism over Wi-Fi and to solve the hidden station problem, is still implemented by virtually no chipset (except Atmel). The TSN standard is so wide that it could be uneconomical for any vendor to implement it entirely. This could prevent interoperability from happening soon. Even Intel seems to be struggling to implement TSN entirely with OpenAvnu (see "The Road Towards a Linux TSN Infrastructure").

TSN is a layer-2 standard in a layer-3 world. Routing is the dominant form of networking between cloud, edge and IoT nowadays. One could argue that this makes TSN unsuitable for a modern networking infrastructure which combines distributed radio (ex. LTE, 5G) and wired networks (ex. Ethernet, CPRI, USB, etc.). Routing (see "Delay-based Metric Extension for the Babel Routing Protocol") and traffic control (see "tc-fq_codel (8) - Linux Man Pages") approaches might make more sense for a truly unified architecture.

TSN could be overkill for industrial automation. A complete LTE/NR physical signal can be transported to IoT over a standard 10 GbE switch and processed at the Edge. Is there anything in industrial automation with stricter time constraints than that?
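A back-of-the-envelope calculation gives an idea of the headroom involved. It assumes a 20 MHz LTE carrier sampled at 30.72 MHz, I/Q samples of 16 bits each (our assumption, actual sample widths vary) and no framing overhead. It only addresses throughput. It says nothing about latency or jitter, which depend on the switch and on the traffic sharing the link.

```python
SAMPLE_RATE_HZ = 30_720_000   # standard sampling rate of a 20 MHz LTE carrier
BITS_PER_SAMPLE = 2 * 16      # assumption: 16-bit I plus 16-bit Q
LINK_BPS = 10_000_000_000     # nominal 10 GbE line rate


def iq_bitrate(antennas: int = 1) -> int:
    """Raw I/Q bit rate in bit/s, ignoring framing and control overhead."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * antennas


def link_share(antennas: int = 1) -> float:
    """Fraction of the 10 GbE link consumed by the raw I/Q stream."""
    return iq_bitrate(antennas) / LINK_BPS


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, iq_bitrate(n), f"{link_share(n):.1%}")
```

Under these assumptions a single antenna stream needs 983.04 Mbit/s, slightly less than 10% of the link.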

OPC-UA does not define standard payloads. Vendors of OPC-UA hardware could embed binary data into payloads as a way to ensure their data formats remain secret and mutually incompatible with other vendors.

POSIX or TRON might be a better HAL for IoT. Instead of trying to invent yet another Hardware Abstraction Layer (HAL), it might be easier to rely on proven abstractions such as POSIX or iTRON, which have been deployed in the industry and supported for decades. A/UX BSD Unix could run on a 512 KB Macintosh with a 68030 CPU. μClinux requires less than 200 KB to operate. RIOT provides partial POSIX support. eCos and RTEMS provide both POSIX and TRON APIs.

An existing OS could be a better HAL for IoT. Instead of trying to invent yet another Hardware Abstraction Layer (HAL), it might be easier to rely on existing abstractions such as Mongoose OS or LiteOS.

Unsolved Problems and Opportunities

Current standards (OPC-UA, TSN, POSIX, HTML5) do not provide a solution for the following problems:

  1. Time-sensitive routing (TSR);
  2. Standard API for non-POSIX IoT;
  3. Standard cross-platform build and OTA upgrade for non-POSIX IoT.

Selected technologies (SlapOS, open62541) have some limitations:

  1. Lack of implementation of TSN for most network controllers;
  2. Lack of proven resiliency of OPC-UA PubSub in most implementations;
  3. Lack of implementation of non-POSIX service lifecycle management in SlapOS;
  4. Lack of time sensitive orchestration in SlapOS.

Each unresolved problem can be viewed as an opportunity for Open Hardware and Free Software in the field of industrial automation:

  • better support of TSN in Linux kernel;
  • time sensitive routing (TSR) protocol;
  • proven resilient PubSub implementation;
  • time sensitive extension of SlapOS;
  • implementation of IoT support in SlapOS including cross-platform build and OTA upgrade;
  • OPC-UA schema for SlapOS SLM;
  • OPC-UA schema for ERP;
  • OPC-UA schema for MES;
  • standard library for OPC-UA IoT logic and processing;
  • OPC-UA schema for fluentd;
  • OPC-UA support in fluentd and fluentbit.

Technology Dead-ends

Two technologies should be banned from any industrial automation project:

  • Docker;
  • OpenStack.

Docker is not a bad technology. However, most users tend to believe that it provides portability from one Linux distribution to another. This is not the case, due to the kernel ABI mismatch problem, a problem that is not specific to Docker but affects Linux binary portability in general. As a result, running Docker binary images at the Edge provides no guarantee of stability unless both the Docker image and the Edge server are based on the same Linux kernel with the same compilation options.

Other (fixable) issues with Docker - and LXC containers - include lack of support for some system calls, increased difficulty in debugging kernel-related issues (ex. network corruption) and lack of repeatable builds in China due to network restrictions. Another (non-fixable) issue with Docker is that it is based on Linux, not on POSIX. It is thus not portable to other POSIX operating systems (ex. OpenBSD, μClinux).

All current Docker limitations were solved in SlapOS 10 years ago.

The OpenStack case is different. Any project using OpenStack has a very high probability of explicitly wasting taxpayers' money and implicitly promoting proprietary solutions (Huawei, Amazon, Google, Microsoft, etc.).

OpenStack is a bloated project run by a bloated community that has produced unstable software and wasted huge amounts of taxpayers' money. Its design does not follow the basic principles of self-converging systems defined by Mark Burgess, without which it is impossible to reliably operate a large complex system. As a consequence, OpenStack systems need to be entirely rebooted from time to time. The average number of unexpected reboots of an OpenStack VM operated by OVH or Rackspace is 1 to 5 per year. This compares with 0.11 reboots per year for an average bare metal server operated by OVH.

One of the most famous OpenStack projects is the French government sovereign cloud. Rather than using reliable European technologies (GANDI, NIftyName, Proxmox, SlapOS, etc.), highly subsidised companies such as Orange, Thales, Bull (now Atos) and SFR decided to support OpenStack. Ten years later, Orange operates an OpenStack cloud... provided by Huawei and based on a heavily modified version of OpenStack. French taxpayers' money has thus sponsored Chinese industry and proprietary software rather than French or European pioneering SMEs and Free Software.

Nearly all European research projects based on OpenStack have produced very few results that are in use today: Reservoir, CompatibleOne, EASI Clouds, Nuage, Andromede, etc. SlapOS is one of the few stealth results produced by two of these projects.

Many large companies which tried to operate their own OpenStack cloud also failed and now rely on Amazon AWS, Microsoft Azure (ex. Walmart) or Google Cloud. The list of failures cannot be published here because few CIOs are ready to admit them in public. However, anyone can find examples of failures online, such as "British Telecom threatens to abandon OpenStack in its current form". Failure is so frequent with OpenStack that it is now part of its own marketing, with all kinds of suspicious arguments such as the size of the team (reminder: it takes 2 days for a single engineer to deploy SlapOS entirely).

Scalable Business Models

An Open Hardware / Free Software solution for industrial automation could be widely adopted if it is:

  • available worldwide;
  • supported worldwide.

Five business models can support this level of scalability:

  • luxury service (ex. McKinsey);
  • branded support (ex. RHEL);
  • branded hardware (ex. Olimex);
  • online services (ex. ViFiB);
  • copyright licensing (ex. MySQL, LASO).

The luxury service business model consists of selling highly qualified, personalised service at a high price. What the customer gets is the certainty of receiving consistent service from bright brains. It requires setting up a specific education and knowledge sharing process across the organisation.

Branded support consists of distributing a package of Free Software under a proprietary brand and attaching high quality services to this brand. Red Hat's Linux distribution is based on this idea. Red Hat provides at the same time a system with branded support (RHEL) and a system with no support (CentOS), which share the same code.

Branded hardware consists of distributing open hardware devices under a brand. What the customer gets is the certainty of getting a working device. It requires setting up a global logistics network.

Online services consist of providing services online that support the implementation of Free Software. What the customer gets is effortless deployment and maintenance. It requires automating the maintenance service, based on data protected by trade secret.

Copyright licensing consists of providing the same code as a Free Software package, but under a different license. MySQL, for example, is installed in many Cisco routers. Cisco has been licensing MySQL code under a proprietary license.

References