Wendelin Exanalytics Libre

WENDELIN combines Scikit Learn machine learning and NEO distributed storage for out-of-core data analytics in python

Analyse: Work with Ingested Data


Wendelin Out of Core Computing
  • Wendelin.Core enables computation beyond limits of existing RAM
  • We have integrated Wendelin and Wendelin.Core With Jupyter
  • ERP5 Kernel (out-of-core compliant) vs. Python 2 Kernel (default)

Todo: Head to Jupyter (Notebook)

Wendelin-ERP5 - Juypter Interface
  • Head to Juypter http://[x].pydata-class.erp5.cn
  • Start a new ERP5 Notebook
  • This will make sure you use the ERP5 Kernel
  • The Python 2 Kernel is the default Jupyter Kernel
  • Using Python 2 will disregard Wendelin and Wendelin.Core, so it's basic Jupyter
  • Using ERP5 Kernel will use Wendelin.core in the background
  • To make good use of it, all code written should be Out-of-core "compatible"
  • For example you should not just load a large file into memory (see below)

Todo: Learn ERP5 Kernel (Notebook)

Wendelin-ERP5 - Juypter Interface Introduction
  • Note you have to connect to Wendelin/ERP5
  • The reference you set will store your notebook in the Date Notebook Module
  • Passing login/password will authenticate Juypter with Wendelin/ERP5
  • Note that your ERP5_URL in this case should be your internal url
  • You can retrieve it be running erp5-show -s in your webrunner terminal
  • Note, outside of the tutorial we would set the external IPv6 adress of ZOPE

Todo: Getting Started (Notebook)

Wendelin-ERP5 - Juypter Access Wendelin
  • Connect, set arbitrary reference and authenticate

Todo: Accessing Objects (Notebook)

Wendelin-ERP5 - Juypter Accessing Objects
  • Import necessary libs
  • Type context , this will give you the Wendelin/ERP5 Object
  • Type context.data_stream_module["1"] to get your uploaded sound file
  • Accessing data works the same ways throughout [IPv6]:30002/erp5/[module_name]/[id]
  • All modules you see on the Wendelin/ERP5 start page can be accessed like this
  • Once you have an object you can manipulate it
  • Note that accessing a file by internal id (1) is only one way
  • The standard way would be using the reference of the respective object, which will also allow to user portal_catalog to query

Todo: Accessing Data Itself (Notebook)

Wendelin-ERP5 - Juypter Accessing Data Itself
  • Try to get the length of the file using getData and via iterate
  • Note then when using ERP5 kernel all manipulations should be "Big Data Aware"
  • Just loading a file via getData() works for small files, but will break with volume
  • It's important to understand that manipulations outside of Wendelin.Core need to be Big Data "compatible"
  • Internally Wendelin.Core will run all manipulations "context-aware"
  • An alternative way to work would be to create your scripts inside Wendelin/ERP5 and call them from Juypter
  • Scripts/Manipulations are stored in Data Operations Module

Todo: Compute Fourier (Notebook)

Wendelin-ERP5 - Juypter Compute Fourier Series
  • Proceed to fetch data using getData for now
  • Extract one channel, save it back to Wendelin and compute FFT
  • Note, that ERP5 kernel at this time doesn't support %matplotlib inline
  • Note the way to call methods from Wendelin/ERP5 (Base_renderAsHtml )
  • Wendelin/ERP5 has a system of method acquistion. Every module can come with its own module specific methods and method names are always context specific ([object_name]_[method_name] ). Base methods on the other hand are core methods of Wendelin/ERP5 and applicable to more than one object.

Todo: Display Fourier (Notebook)

Wendelin-ERP5 - Juypter Display Fourier Series
  • Check the rendered Fourier graphs of your recorded sound file

Todo: Save Image (Notebook)

Wendelin-ERP5 - Juypter Save Image
  • Save the image back to Wendelin/ERP5.

Todo: Create BigFile Reader (Notebook)

Wendelin-ERP5 - Create Big File Class
  • Add a new class BigFileReader
  • Allows to pass out-of-core objects

Todo: Rerun using Big File Reader (Notebook)

Wendelin-ERP5 - Juypter Rerun using Big File Reader
  • Rerun using the Big File Reader
  • Now one more step is out of core compliant
  • Verify graphs render the same
  • We are now showing how to step by step convert our code to being Out-of-Core compatible
  • This will only be possible for code we write ourselves
  • Whenever we have to rely on 3rd party libraries, there is no guarantee that data will be handled in the correct way. The only option to be truly Out-of-Core is to either make sure the 3rd party methods used are compatible and fixing them accordingly/committing back or to reimplement a 3rd party library completely.

Todo: Redraw from Wendelin (Notebook)

Wendelin-ERP5 - Juypter Recover Plot
  • Redraw the plot directly from data stored in Wendelin/ERP5

Todo: Verify Images are Stored

Wendelin-ERP5 - Image Module
  • Head back to Wendelin/ERP5
  • Go to Image module and verify your stored images are there.

Todo: Verify Data Arrays are Stored

Wendelin-ERP5 - Data Array Module
  • Switch to the Data Array module
  • Verify all computed files are there.