Wendelin Exanalytics Libre

Wendelin combines scikit-learn machine learning and NEO distributed storage for out-of-core data analytics in Python.
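As a rough illustration of what "out-of-core" means here, the sketch below computes a statistic over a file while holding only one chunk in memory at a time. It uses only the Python standard library and is not Wendelin's API; the function names `write_samples` and `chunked_mean` and the chunk size are assumptions made for the example. In a Numpy-based stack, each chunk would typically be an array slice instead.

```python
from array import array


def write_samples(path, count):
    """Write `count` float64 values (0.0, 1.0, 2.0, ...) to a binary file."""
    with open(path, "wb") as f:
        for start in range(0, count, 1000):
            stop = min(start + 1000, count)
            array("d", (float(i) for i in range(start, stop))).tofile(f)


def chunked_mean(path, chunk_items=4096):
    """Return the mean of a float64 file, reading one chunk at a time."""
    item_size = array("d").itemsize
    total = 0.0
    seen = 0
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_items * item_size)
            if not raw:
                break
            chunk = array("d")
            chunk.frombytes(raw)  # only this chunk is held in memory
            total += sum(chunk)
            seen += len(chunk)
    return total / seen if seen else float("nan")
```

Calling `write_samples(path, 10_000)` and then `chunked_mean(path)` processes the whole file, yet peak memory depends on `chunk_items` rather than on the file size.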

The Wendelin architecture is based on five layers:

  • Analytics layer: Wendelin leverages a wide variety of Numpy-based analytics libraries such as scikit-learn, Pandas, NLTK and OpenCV-python.
  • Storage layer: Wendelin stores native Python objects on NEO distributed storage, which eliminates the format conversion steps found in other NoSQL technologies.
  • Elasticity layer: Wendelin distributes data processing scripts across a cluster using ERP5 active Python object technology. Scripts are stored on NEO and can be modified in real time without restarting the system.
  • Deployment layer: Wendelin deployment is automated by the SlapOS mesh computing operating system. SlapOS automatically optimizes analytics libraries for the target CPU.
  • Infrastructure layer: Wendelin can be deployed on commodity hardware, a private cloud or a public cloud.
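The storage layer's idea of persisting native Python objects without an intermediate format can be illustrated with a small sketch. It uses the standard library's `shelve` module purely as a local stand-in; it is not NEO's API, and the helper name `store_and_reload`, the key and the sample record are hypothetical.

```python
import os
import shelve
import tempfile


def store_and_reload(path, key, obj):
    """Persist a Python object as-is, then read it back through a new handle."""
    with shelve.open(path) as db:
        db[key] = obj  # pickled transparently: no schema, no manual conversion
    with shelve.open(path) as db:
        return db[key]


with tempfile.TemporaryDirectory() as tmp:
    measurement = {"sensor": "s-01", "samples": [0.5, 1.25, 2.0], "shape": (3,)}
    restored = store_and_reload(os.path.join(tmp, "objects"), "run-1", measurement)
```

The object comes back with its Python types intact (the tuple is still a tuple), which is the property the storage layer relies on to avoid conversion steps.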

The Wendelin architecture provides key features not found in other platforms:

  • Python-based
  • native code compiler for key algorithms
  • GPU compiler for key algorithms
  • native storage of low-level matrix data structures
  • best machine learning algorithms
  • wide scientific community thanks to Numpy
  • support for 30+ years of FORTRAN optimizations
  • distributed multi-index
  • orthogonal index/storage topology for high throughput and fast access

Wendelin vs. HADOOP

Wendelin focuses on Python-based data analytics, and in particular on the Numpy standard, whereas HADOOP mostly relates to the Java programming world. As a result, Wendelin can benefit more quickly from the growing homogenization of scientific computing around Python.

Some similarities nevertheless exist between the two architectures, as illustrated in the following table, which lists typical software components used in each.

                                    Wendelin       HADOOP
High-level programming language     Python         Java
Low-level language
Standard data structure             Numpy          N/A
Native x86 compiler                 Numba          N/A
GPU compiler                        Parakeet       N/A
Machine learning                    Scikit-learn   Weka
Distributed storage                 NEO            Spark
Distributed processing              ERP5 Activity  Job Tracker
Management portal                   ERP5 Data      Cloudera Manager
Natural language processing         NLTK           Lucene
Video processing
Financial statistics
Distributed index
Cloud deployment and orchestration  SlapOS         Zookeeper