WENDELIN combines scikit-learn machine learning and NEO distributed storage for out-of-core data analytics in Python
Analyse: Work with Ingested Data
Out-of-Core
- Wendelin.Core enables computation beyond the limits of available RAM
- We have integrated Wendelin and Wendelin.Core with Jupyter
- ERP5 Kernel (out-of-core compliant) vs. Python 2 Kernel (default)
- Head to Jupyter
http://[x].pydata-class.erp5.cn
- Start a new ERP5 Notebook
- This will make sure you use the ERP5 Kernel
- The Python 2 Kernel is the default Jupyter Kernel
- Using the Python 2 Kernel bypasses Wendelin and Wendelin.Core, so you get plain Jupyter
- Using the ERP5 Kernel will use Wendelin.Core in the background
- To make good use of it, all code written should be Out-of-core "compatible"
- For example you should not just load a large file into memory (see below)
- Note you have to connect to Wendelin/ERP5
- The reference you set will store your notebook in the Data Notebook Module
- Passing login/password will authenticate Jupyter with Wendelin/ERP5
- Note that your ERP5_URL in this case should be your internal url
- You can retrieve it by running
erp5-show -s
in your webrunner terminal
- Note that outside of the tutorial we would set the external IPv6 address of Zope
- Connect, set arbitrary reference and authenticate
- Import necessary libs
- Type
context
, this will give you the Wendelin/ERP5 Object
- Type
context.data_stream_module["1"]
to get your uploaded sound file
- Accessing data works the same way throughout
[IPv6]:30002/erp5/[module_name]/[id]
- All modules you see on the Wendelin/ERP5 start page can be accessed like this
- Once you have an object you can manipulate it
- Note that accessing a file by its internal id (1) is only one way
- The standard way is to use the reference of the respective object, which also allows you to use portal_catalog to query
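The URL pattern above can be sketched as a small helper. This is only an illustration: the function name `object_url` and the `scheme` default are assumptions (the tutorial shows only host, port and path), and the address used is a documentation-only placeholder.

```python
def object_url(host, module_name, object_id, port=30002, scheme="https"):
    """Build the URL of a Wendelin/ERP5 object from its module name and id.

    `scheme` is an assumption; the tutorial only shows host, port and path.
    """
    # IPv6 literals must be wrapped in square brackets inside a URL.
    if ":" in host and not host.startswith("["):
        host = "[" + host + "]"
    return "%s://%s:%d/erp5/%s/%s" % (scheme, host, port, module_name, object_id)


# 2001:db8::1 is a documentation-only address, used here as a placeholder.
url = object_url("2001:db8::1", "data_stream_module", "1")
print(url)
```

The path part mirrors the notebook expression context.data_stream_module["1"]: module name first, then the object id.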
Todo: Accessing Data Itself (Notebook)
- Try to get the length of the file using
getData
and via iterate
- Note that when using the ERP5 kernel all manipulations should be "Big Data Aware"
- Just loading a file via getData() works for small files, but will break with volume
- It's important to understand that manipulations outside of Wendelin.Core need to be Big Data "compatible"
- Internally Wendelin.Core will run all manipulations "context-aware"
- An alternative way to work would be to create your scripts inside Wendelin/ERP5 and call them from Jupyter
- Scripts/Manipulations are stored in Data Operations Module
- Proceed to fetch data using
getData
for now
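The difference between the two ways of measuring the file can be sketched as follows. An in-memory `io.BytesIO` stands in for the Data Stream here, so this does not reproduce the actual Wendelin/ERP5 calls; the function names and the chunk size are illustrative assumptions.

```python
import io


def length_all_at_once(stream):
    """Read everything into memory, then measure it (fine for small files only)."""
    return len(stream.read())


def length_by_chunks(stream, chunk_size=64 * 1024):
    """Count bytes while holding at most one chunk in memory."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        total += len(chunk)
    return total


payload = b"\x00\x01" * 100000  # 200,000 bytes of stand-in data
size_a = length_all_at_once(io.BytesIO(payload))
size_b = length_by_chunks(io.BytesIO(payload), chunk_size=4096)
print(size_a, size_b)
```

Both functions return the same number, but only the chunked version keeps its memory use bounded as the file grows.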
- Extract one channel, save it back to Wendelin and compute FFT
- Note that the ERP5 kernel does not currently support
%matplotlib inline
- Note the way to call methods from Wendelin/ERP5 (
Base_renderAsHtml
)
- Wendelin/ERP5 has a system of method acquisition. Every module can come with its own module-specific methods, and method names are always context-specific (
[object_name]_[method_name]
). Base methods on the other hand are core methods of Wendelin/ERP5 and applicable to more than one object.
- Check the rendered Fourier graphs of your recorded sound file
- Save the image back to Wendelin/ERP5.
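The channel extraction and FFT step can be sketched with NumPy alone. The signal below is synthetic (two sine tones standing in for a recorded stereo file), and the sample rate is an assumption; saving results back to Wendelin/ERP5 and rendering the plot are not shown.

```python
import numpy as np

sample_rate = 8000                        # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate  # one second of audio

# Synthetic stereo recording: 440 Hz on the left channel, 1000 Hz on the right.
left = np.sin(2 * np.pi * 440 * t)
right = np.sin(2 * np.pi * 1000 * t)
stereo = np.column_stack([left, right])   # shape (n_samples, 2)

# Extract one channel and compute the magnitude spectrum of the real signal.
channel = stereo[:, 0]
spectrum = np.abs(np.fft.rfft(channel))
freqs = np.fft.rfftfreq(len(channel), d=1.0 / sample_rate)

# The strongest bin should sit at the tone placed in the left channel.
peak_hz = freqs[np.argmax(spectrum)]
print(peak_hz)
```

`np.fft.rfft` is used because the input is real-valued, so only the non-negative frequencies need to be computed.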
Todo: Create BigFile Reader (Notebook)
- Add a new class BigFileReader
- It allows out-of-core objects to be passed to code that expects a file
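A minimal sketch of such a reader is shown below. It is not the class from the tutorial notebook: it assumes the wrapped object supports `len()` and slicing (as `bytes`, `memoryview` or a memory-mapped array do), and the method set is limited to what a typical file consumer needs.

```python
import io


class BigFileReader(object):
    """Minimal read-only, file-like view over a sliceable, sized object."""

    def __init__(self, data):
        self.data = data  # assumption: supports len() and slicing
        self.pos = 0

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self.pos + offset
        elif whence == io.SEEK_END:
            new_pos = len(self.data) + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        self.pos = max(0, new_pos)
        return self.pos

    def read(self, size=-1):
        # Only the requested slice is materialised, never the whole object.
        end = len(self.data) if size is None or size < 0 else self.pos + size
        chunk = bytes(self.data[self.pos:end])
        self.pos += len(chunk)
        return chunk


reader = BigFileReader(b"RIFF0123456789")
header = reader.read(4)
reader.seek(-2, io.SEEK_END)
tail = reader.read()
```

Because the reader only slices what is asked for, a consumer that reads in bounded pieces never forces the full object into memory.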
Todo: Rerun using Big File Reader (Notebook)
- Rerun using the Big File Reader
- Now one more step is Out-of-Core compliant
- Verify graphs render the same
- This shows how to convert our code, step by step, to be Out-of-Core compatible
- This is only possible for code we write ourselves
- Whenever we have to rely on 3rd party libraries, there is no guarantee that data will be handled in the correct way. To be truly Out-of-Core, we must either verify that the 3rd party methods we use are compatible (fixing them and committing the fixes back where needed) or reimplement the 3rd party library completely.
- Redraw the plot directly from data stored in Wendelin/ERP5
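What "Out-of-Core compatible" means for code we write ourselves can be illustrated with a chunked computation. NumPy's `np.memmap` is used here as a stand-in for an array that lives outside RAM; it is not the Wendelin.Core API, and the function name, file name and chunk size are illustrative assumptions.

```python
import os
import tempfile

import numpy as np


def chunked_mean(array, chunk_size=100000):
    """Mean of a 1-D array, touching only one chunk at a time."""
    total = 0.0
    for start in range(0, len(array), chunk_size):
        block = array[start:start + chunk_size]
        total += float(np.sum(block, dtype=np.float64))
    return total / len(array)


# Write sample data to disk, then map it instead of loading it.
path = os.path.join(tempfile.mkdtemp(), "samples.dat")
np.arange(1000000, dtype=np.float64).tofile(path)

samples = np.memmap(path, dtype=np.float64, mode="r")
mean_value = chunked_mean(samples)
print(mean_value)
```

A 3rd party routine that copies its whole input internally would defeat this pattern, which is why each library call has to be checked individually.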
Todo: Verify Images are Stored
- Head back to Wendelin/ERP5
- Go to the Image module and verify that your stored images are there.
Todo: Verify Data Arrays are Stored
- Switch to the Data Array module
- Verify all computed files are there.