Nexedi was recently asked which Free Software is currently suitable for reporting in the context of enterprise applications. Our advice is: Jedox PALO and Jupyter. We also expect that Olapy will be suitable some day.
The reason we love PALO at Nexedi is that it is, to our knowledge, the only reporting tool that accountants can use without any support from the IT department. PALO was implemented for the international NGO Aide et Action, where it produced complex budget reports based on multi-dimensional analysis of accounting transactions.
What makes PALO better than other tools is that once the OLAP cube has been built (an easy step), users can produce all sorts of complex reports directly from their Excel spreadsheet (or to some extent from LibreOffice). Once the cube has been prepared, the IT department no longer has to spend any effort on reporting. Preparing an OLAP cube that can serve users' needs for a couple of years takes a few hours to a few days. Return on investment is thus massive. Flexibility for end users is extraordinary.
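To make the idea of a cube concrete, here is a minimal sketch in plain Python. It is not PALO's engine or API: the transaction records, the dimension names (project, account, month) and the helper `build_cube` are all hypothetical. The sketch pre-aggregates amounts for every combination of dimension values, including an "all values" roll-up along each dimension, which is roughly why a spreadsheet user can read any subtotal without asking for a new query.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # marker meaning "every value of this dimension"

# Hypothetical accounting transactions: (project, account, month, amount)
TRANSACTIONS = [
    ("education", "salaries", "2017-01", 1200.0),
    ("education", "travel", "2017-01", 300.0),
    ("education", "salaries", "2017-02", 1250.0),
    ("health", "salaries", "2017-01", 900.0),
    ("health", "travel", "2017-02", 150.0),
]


def build_cube(rows):
    """Sum amounts for every combination of dimension values and roll-ups."""
    cube = defaultdict(float)
    for *dims, amount in rows:
        # Each dimension contributes either its own value or the ALL roll-up.
        for key in product(*[(value, ALL) for value in dims]):
            cube[key] += amount
    return cube


cube = build_cube(TRANSACTIONS)

# Any subtotal is now a simple lookup.
salaries_total = cube[(ALL, "salaries", ALL)]
education_january = cube[("education", ALL, "2017-01")]
grand_total = cube[(ALL, ALL, ALL)]
print(salaries_total, education_january, grand_total)
```

A real OLAP engine adds hierarchies, sparse storage and a query language on top of this idea, but the principle of answering questions from prepared aggregates is the same.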
Nexedi was able to automate the deployment of PALO with SlapOS, which demonstrated that PALO can be deployed massively in an enterprise environment or used to build a SaaS/PaaS.
However, Jedox's open source licensing policy has been changing constantly. To our knowledge, the latest version of Jedox is no longer Free Software. The term "open source" can hardly be found in any recent content on the corporate web site. This means that one should either rely on version 5.1 and improve it, or simply consider that Jedox has become proprietary software.
In our opinion, Jedox is a typical example of an open source company financed by venture capital that starts with a freemium business model and ends up with a fully proprietary business model. The evolution of Jedox's business model should serve as a reminder to everyone that there is no guaranteed future for any open source company financed by venture capital and based on a freemium business model.
That said, Jedox software is absolutely great and serves very well the purpose of empowering end users while reducing the workload of IT departments when it comes to multi-dimensional reporting. PALO version 5.1, which remains available as Free Software, still works very well and has no better equivalent available under a Free Software license.
Users with big data sets and scalability requirements should however rely on Jedox's commercial support (purchased directly from Jedox, not through a system integrator) in order to ensure that Jedox's PALO engine can actually scale up on their data set. Jedox proprietary software supports GPU acceleration. Only proprietary versions of PALO are actually supported. Direct contract with Jedox is thus the only way to ensure that reporting performance will not be limited by some missing features or some unpublished bug fixes.
Nexedi was asked to train the sales managers of a gaming company in the use of Pandas and Jupyter for reporting. We suddenly discovered that some companies are ready to train their staff in Python programming and empower them to process data in any possible way. After all, the Python language was initially created by Guido van Rossum to teach programming to children and was used by graphic designers at Industrial Light & Magic to create blockbuster movies such as Terminator.
There is therefore no reason why a sales manager or executive would not be able to achieve what a graphic designer can achieve: use scripting to automate IT tasks.
Nexedi also found out that the most difficult task in reporting consists of defining which data selection should be used as input for aggregation or visualisation. No matter which technology is used (SQL in Birt, set theory as in Business Objects, slicing and selecting in Pandas), some users have a hard time specifying a selection in a data set. The addition of a graphical user interface does not seem to change this observation. Some users will be able to work faster with a graphical user interface, but those who have a hard time without it will still have a hard time with it.
This led us to the following rule of thumb: any report that cannot be produced with PALO can only be produced by users who are able to learn Python.
This explains why we recommend a tool called Jupyter as a general purpose tool for executives, not only for the IT department. Jupyter is what people call a "notebook". It combines Python code, text and visual representations of data in a single web page. Jupyter is used by most data scientists in the world. It is extremely popular in finance. We have tested it with various executives of large French corporations: they were able to download it, install it and use it. As long as one knows what a matrix or an array of floating point numbers is, one can use Jupyter. In most countries (France, China, Russia, etc.), all executives had to learn this concept during their first year of college, if not before.
Jupyter supports all Python libraries (scikit-learn for machine learning, Pandas for data aggregation, Keras for deep learning, etc.). It also supports other languages useful for data processing: R for statistics, Julia for numerical computing, etc. It is thus a kind of universal platform for data science that costs nothing. And for users who require a graphical user interface, Dataiku can serve as a graphical frontend that generates Jupyter notebooks (Dataiku is proprietary software).
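As an illustration of what such a notebook cell can look like, here is a minimal sketch using Pandas. The sales records, column names and figures are hypothetical; in practice the data would come from a file or a database connector. The cell shows the two steps discussed above: first selecting the rows that should feed the report, then aggregating them.

```python
import pandas as pd

# Hypothetical sales records, as they might arrive from a CSV export.
sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "south"],
        "product": ["game_a", "game_b", "game_a", "game_b", "game_a"],
        "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2"],
        "revenue": [100.0, 80.0, 120.0, 60.0, 90.0],
    }
)

# Selection: keep only the rows that should feed the report.
q1 = sales[sales["quarter"] == "Q1"]

# Aggregation: total revenue per region for the selected rows.
q1_by_region = q1.groupby("region")["revenue"].sum()

# Cross-tabulation: regions as rows, products as columns, all quarters.
by_region_product = sales.pivot_table(
    index="region",
    columns="product",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
)

print(q1_by_region)
print(by_region_product)
```

The selection line is usually the part that users find hardest to get right; the aggregation that follows is mostly mechanical.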
Jupyter is perfect as a single user platform. It can access data through various connectors and process it locally. Jupyter can also be used as a Web based platform shared by a few users who connect to various data sources with simple access rights.
Standard Jupyter has a few limitations though:
Jupyter's limitations can be addressed by using the Wendelin big data platform and its dedicated Jupyter backend. With Wendelin, all data can be stored as native NumPy ndarrays in a distributed transactional database that can scale up to thousands of terabytes using a redundant array of inexpensive computers. Wendelin can thus act as a huge data lake shared by thousands of Jupyter users.
Access to data is controlled for every user through access rules. Access rights can be based on any arbitrary combination of project, data type, position in the company, date, etc. Scripts launched by a user and running in background or concurrently acquire the user's permissions.
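The following sketch illustrates the general idea of rule-based access in plain Python. It is not Wendelin's API: the `User` and `DataSet` classes, the two rules and the `run_script` helper are hypothetical names chosen for this example. Each rule is a predicate over a user and a data set, and a script only ever sees the data sets that the user who launched it is allowed to read.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class User:
    name: str
    position: str
    projects: frozenset


@dataclass(frozen=True)
class DataSet:
    name: str
    project: str
    data_type: str
    created: date


# Each rule is a predicate; access is granted if any rule accepts the pair.
def project_member(user, data_set):
    return data_set.project in user.projects


def director_sees_recent_sales(user, data_set):
    return (
        user.position == "director"
        and data_set.data_type == "sales"
        and data_set.created >= date(2017, 1, 1)
    )


RULES = [project_member, director_sees_recent_sales]


def can_read(user, data_set, rules=RULES):
    return any(rule(user, data_set) for rule in rules)


def run_script(user, data_sets, script):
    """Run a script with the launching user's permissions, not the server's."""
    visible = [d for d in data_sets if can_read(user, d)]
    return script(visible)


alice = User("alice", "director", frozenset({"mobile"}))
bob = User("bob", "analyst", frozenset({"mobile"}))

DATA_SETS = [
    DataSet("mobile_logs", "mobile", "logs", date(2017, 3, 1)),
    DataSet("console_sales", "console", "sales", date(2017, 5, 1)),
    DataSet("old_console_sales", "console", "sales", date(2015, 5, 1)),
]


def list_names(data_sets):
    return [d.name for d in data_sets]


alice_view = run_script(alice, DATA_SETS, list_names)
bob_view = run_script(bob, DATA_SETS, list_names)
print(alice_view, bob_view)
```

The same script returns different results for the two users because the filtering happens before the script runs, which is the property that matters for background and concurrent jobs.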
Wendelin can efficiently save thousands of Jupyter notebooks and execute them concurrently on a cluster of dozens of computers without wasting system memory, as would happen on most Python based systems. This was made possible thanks to wendelin.core, an out-of-core technology that currently has no equivalent on the market. Wendelin.core acts as a shared, distributed virtual memory manager that ensures that a single slice of an array is never loaded more than once into RAM. It is thus possible to run dozens of Python processes on a server without duplicating data a dozen times in RAM. It is also possible to access an array of 1 TB on a server that has only 128 GB of RAM.
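The out-of-core principle can be illustrated with a much simpler mechanism than wendelin.core: NumPy's `numpy.memmap`, which maps a file on a single machine into an array. This is only an analogy and does not use the wendelin.core API; unlike wendelin.core, it is neither distributed nor transactional. The file name and array size below are hypothetical and kept tiny so that the sketch runs anywhere. What it shows is that a process can open an array without loading it, and that only the slices it touches are read into memory.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name and array size, kept tiny so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "measurements.dat")
n_rows, n_cols = 10_000, 8

# Create the file-backed array and fill it chunk by chunk.
data = np.memmap(path, dtype="float64", mode="w+", shape=(n_rows, n_cols))
for start in range(0, n_rows, 1_000):
    stop = start + 1_000
    # Every column of a row holds the row number, broadcast across columns.
    data[start:stop] = np.arange(start, stop, dtype="float64")[:, None]
data.flush()
del data  # close the writable mapping

# Reopen read-only: opening the array does not load its content.
view = np.memmap(path, dtype="float64", mode="r", shape=(n_rows, n_cols))

# Only the pages backing this small window need to be read from disk.
window_mean = float(view[2_000:2_010, 0].mean())
print(window_mean)
```

Several processes that map the same file this way can share the operating system's cached pages rather than each holding a private copy, which is a single-machine version of the memory saving described above.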
Wendelin can be integrated with other big data technologies using fluentd and embulk. Fluentd provides compatibility with dozens of real time ingestion protocols (MQTT, Kafka, FTP, syslog, etc.). Embulk provides an efficient way to copy data from another system (e.g. Hadoop, Spark) into Wendelin. Embulk also supports dozens of protocols.
Typical use cases of Wendelin are:
The uncertainty about Jedox PALO's future, which is inherent to venture capital financing, makes it necessary to consider a future alternative to PALO. Olapy is an experimental library that implements the MDX query language on top of Pandas. It is already able to act as an OLAP engine with Excel. Olapy is currently in its early days and quite immature. We expect that within a year or two, it will be integrated with wendelin.core and provide even higher scalability than PALO. We also expect that Web based spreadsheets such as OfficeJS will provide an efficient alternative to the Jedox suite.
The best aspect of Olapy is that it will be integrated with Jupyter and thus give Jupyter users the combination of OLAP reporting and other forms of reporting such as machine learning.
Until this happens, PALO is still the way to go for Excel based OLAP reports.
For many kinds of reports, PALO is probably the best reporting tool because it was designed to be used by average Excel users and thus saves IT departments a lot of time. PALO is available as Free Software (version 5.1) or proprietary software (version 7). Acquiring a commercial license for version 7 may be required in many cases: SAP connectivity, scalability, etc.
Whenever PALO is not suitable or proprietary software is not an option, we advise using Jupyter and teaching users how to write Python scripts. In every big company, in every department and on every site, there is a significant share of users who can actually write Python code. Many executives with an MBA actually have no difficulty writing Python code.
For those users who refuse to write Python code, we recommend Dataiku proprietary software, as long as everything produced by Dataiku is converted and saved into Jupyter notebooks. Just like Jedox, Dataiku is financed through venture capital, which means that its future cannot be guaranteed.
For any application that requires sharing a large number of data sets among a large number of users, we recommend using Wendelin in combination with Jupyter.
Last, we recommend keeping an eye on Olapy as a possible alternative to Jedox PALO.