Wendelin Exanalytics Libre

WENDELIN combines Scikit Learn machine learning and NEO distributed storage for out-of-core data analytics in python

Abstract

This short HowTo shows how to ingest data into the Wendelin platform using fluentd. You need a running Wendelin instance and must know its URL and the username and password used to access it. No additional configuration is needed on the Wendelin side, as it comes preconfigured.

See wendelin-HowTo.Install.Wendelin.Standalone for how to install Wendelin.

For the purpose of this HowTo we ingest a simple JSON record, but the data can be anything.

Step 1: Install fluentd and the Wendelin fluentd plugin

root@debian: ~$ apt install ruby ruby-dev
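# --user-install puts the gems under ~/.gem of the invoking user, so run these as the user who will start fluentd
ivan@debian: ~$ gem install --user-install fluentd
ivan@debian: ~$ gem install --user-install fluent-plugin-wendelin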

Step 2: Clone the Wendelin fluentd plugin repository and start fluentd

Before this step you need to know your Wendelin instance's URL, username and password.

ivan@debian: ~$ git clone https://lab.nexedi.com/nexedi/fluent-plugin-wendelin.git
ivan@debian: ~$ cd fluent-plugin-wendelin/example
# set proper username / password and URL in configuration file!
ivan@debian: ~/fluent-plugin-wendelin/example$ vi to_wendelin.conf
ivan@debian: ~/fluent-plugin-wendelin/example$ ~/.gem/ruby/2.7.0/bin/fluentd -v -c to_wendelin.conf

Step 3: Ingest

ivan@debian: ~$ curl -X POST -d 'json={"foo1":"bar1"}' http://localhost:8888/test_sensor.test_product
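
The same ingestion can be scripted. The following is a minimal Python sketch, assuming fluentd listens on http://localhost:8888 as in the curl call above. The helper names build_ingest_request and ingest are hypothetical and are not part of fluentd or Wendelin. The body is form-encoded, which should be equivalent to what curl sends with -d.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same fluentd HTTP endpoint as in the curl example above.
FLUENTD_URL = "http://localhost:8888"


def build_ingest_request(tag, record, base_url=FLUENTD_URL):
    """Build the POST request that the curl example sends.

    The tag (e.g. "test_sensor.test_product") becomes the URL path and
    the record is sent as a form field named "json".
    """
    body = urllib.parse.urlencode({"json": json.dumps(record)}).encode("ascii")
    url = "{}/{}".format(base_url.rstrip("/"), tag)
    return urllib.request.Request(url, data=body, method="POST")


def ingest(tag, record, base_url=FLUENTD_URL):
    """Send one record to fluentd and return the HTTP status code."""
    request = build_ingest_request(tag, record, base_url)
    with urllib.request.urlopen(request) as response:
        return response.status


# Usage (requires the fluentd process from Step 2 to be running):
# ingest("test_sensor.test_product", {"foo1": "bar1"})
```

Separating request construction from sending makes it possible to inspect the URL and body without a running fluentd.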

Step 4: Check that everything was successfully ingested on the Wendelin side

Wendelin's data model is quite complex. For the purpose of this HowTo it is enough to see where the data was ingested. In this example it is ingested into a "Data Stream" object with the reference "test_sensor-test_product". This object's view shows its size, which should increase after each "curl" call.

You can also use the following commands to read what was ingested into the Data Stream (use them as a template and insert the proper values for your setup):

ivan@debian: ~$ curl -su <your_wendelin_user>:<your_wendelin_password>  <Wendelin_URL>/erp5/data_stream_module/<Ingested_Data_Stream_Id> -r 0-19 > instance1.msgpack
ivan@debian: ~$ python
>>> import msgpack
>>> # open in binary mode: msgpack data is bytes, not text
>>> msgpack.unpackb(open("instance1.msgpack", "rb").read())
[1596106294, {'foo1': 'bar1'}]
# 1596106294 is the timestamp value inserted by the Wendelin fluentd plugin.
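
The download step can also be done from Python. The sketch below builds the same authenticated, ranged request as the curl command and formats the timestamp of a decoded entry. The helper names build_range_request and describe_record are hypothetical, and reading the timestamp as Unix epoch seconds is an assumption. The URL and credentials must be replaced with the values of your setup, as in the template above.

```python
import base64
import datetime
import urllib.request


def build_range_request(data_stream_url, user, password, first=0, last=19):
    """Build a GET request for bytes first..last of a Data Stream.

    Mirrors "curl -u user:password URL -r 0-19": HTTP basic
    authentication plus a Range header.
    """
    credentials = "{}:{}".format(user, password).encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Range": "bytes={}-{}".format(first, last),
    }
    return urllib.request.Request(data_stream_url, headers=headers)


def describe_record(entry):
    """Split a decoded [timestamp, record] pair and format the timestamp.

    Assumption: the timestamp is given in Unix epoch seconds.
    """
    timestamp, record = entry
    moment = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    return moment.isoformat(), record


# Usage (needs network access and the third-party msgpack package):
# request = build_range_request(data_stream_url, user, password)
# with urllib.request.urlopen(request) as response:
#     print(describe_record(msgpack.unpackb(response.read())))
```

For the entry shown above, describe_record would report 2020-07-30T10:51:34+00:00 if the timestamp is indeed in epoch seconds.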