WENDELIN combines Scikit Learn machine learning and NEO distributed storage for out-of-core data analytics in python
This short HowTo shows how to ingest data into the Wendelin platform using ebulk. To follow it, you need a running Wendelin instance and must know its URL and the username and password used to access it. No additional configuration is needed on the Wendelin side, as it comes preconfigured.
See wendelin-HowTo.Install.Wendelin.Standalone for how to install Wendelin. Note that the proposed data lake functionality must have been selected during installation.
Step 1: Install ebulk
root@debian: ~$ add-apt-repository ppa:rporchetto/ebulk-ppa
root@debian: ~$ apt-get update
root@debian: ~$ apt-get install ebulk
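Before moving on, it can help to confirm that the ebulk executable is reachable from your shell. The snippet below is a minimal sketch: require_cmd is a hypothetical helper name introduced here for illustration (it is not part of ebulk), and it relies only on the POSIX "command -v" builtin.

```shell
#!/bin/sh
# require_cmd NAME: succeed if NAME resolves to something the shell can run
# (executable in PATH, builtin, function or alias); otherwise print a hint
# to stderr and fail.
# Hypothetical helper for illustration only, not shipped with ebulk.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found in PATH; re-check Step 1" >&2
    return 1
}
```

A possible use after the installation is: require_cmd ebulk && echo "ebulk is installed"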
Step 2: Configure ebulk client
Before this step, you need to know your Wendelin instance's URL, username and password.
# when asked, enter your Wendelin instance's URL; if you followed the installation HowTo it should be
# the following: https://<ip_v4>/erp5
ivan@debian: ~$ ebulk set-data-lake-url
# when asked, enter your username and password; if you followed the installation HowTo, the "erp5-show -s" command should provide them
ivan@debian: ~$ ebulk store-credentials
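If you are unsure about the URL you entered, you can rebuild it from the instance's IPv4 address and check that the server responds. This is only a sketch under stated assumptions: lake_url and lake_reachable are hypothetical helper names, the URL pattern is the one quoted above for instances set up with the installation HowTo, and the -k option assumes the instance may be using a self-signed certificate.

```shell
#!/bin/sh
# lake_url IPV4: print the URL in the https://<ip_v4>/erp5 form that the
# installation HowTo is said to produce. Hypothetical helper, not part of ebulk.
lake_url() {
    printf 'https://%s/erp5\n' "$1"
}

# lake_reachable URL: succeed if the server answers at all, whatever the
# HTTP status code. Hypothetical helper, not part of ebulk.
#   -k            skip certificate verification (assumes a self-signed cert)
#   -s            no progress output
#   -o /dev/null  discard the response body
#   --max-time    give up after 10 seconds
lake_reachable() {
    curl -k -s -o /dev/null --max-time 10 "$1"
}
```

For example, with the placeholder address 192.0.2.10: lake_reachable "$(lake_url 192.0.2.10)" && echo "instance is reachable"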
Step 3: Initialize your data set and push it to Wendelin
# this step prepares your folder by creating ebulk's metadata files inside it
ivan@debian: ~$ ebulk init <Your_BIG_Data_Set>
# the actual push to Wendelin
ivan@debian: ~$ ebulk push <Your_BIG_Data_Set>
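The two commands above can be wrapped in a small script that first checks the data set folder, so that an empty or mistyped path is caught before anything is sent. This is a minimal sketch: dataset_ready and ingest are hypothetical helper names, and chaining with && assumes that ebulk exits with a non-zero status on failure, which is not confirmed here. ebulk may also ask questions interactively, so the wrapper is meant to be run from a terminal.

```shell
#!/bin/sh
# dataset_ready DIR: succeed only if DIR is an existing directory that
# contains at least one regular file somewhere beneath it.
# Hypothetical helper for illustration, not part of ebulk.
dataset_ready() {
    [ -d "$1" ] || { echo "error: '$1' is not a directory" >&2; return 1; }
    if [ -z "$(find "$1" -type f | head -n 1)" ]; then
        echo "error: '$1' contains no files" >&2
        return 1
    fi
    return 0
}

# ingest DIR: run the init and push commands from this step in order,
# stopping at the first one that fails.
# Assumption: ebulk signals failure through a non-zero exit status.
ingest() {
    dataset_ready "$1" && ebulk init "$1" && ebulk push "$1"
}
```

It would then be called as: ingest <Your_BIG_Data_Set>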
Step 4: Check your data is ingested at Wendelin side
If you followed Wendelin's installation HowTo with the "data lake" functionality selected, a default data lake web user interface will be available under this URL:
Your newly uploaded data set should be listed there.