Wendelin Exanalytics Libre

WENDELIN combines Scikit Learn machine learning and NEO distributed storage for out-of-core data analytics in Python

Table of Contents

  • Known slowdowns
  • Regression hunt
  • Paths in catalog

    This section describes common design mistakes that hurt performance.

    Never get an object from its uid when you can get it from its path

    Looking an object up by its uid means querying the catalog, and the catalog is not meant for fetching single objects. When both the uid and the path of an object are available, use the path.

    Bad Example:

      def f(self, object_uid, object_path, **kw):
        # The catalog returns brains; the object still has to be loaded from them
        object = self.getPortalObject().portal_catalog(uid=object_uid)[0].getObject()
        # do something on object
    

    Good Example:

      def f(self, object_uid, object_path, **kw):
        object = self.restrictedTraverse(object_path)
        # do something on object
    

    If the function or script that fetches the object is the action of an HTML form, the request very likely contains both object_uid and object_path.

    Never use type() to test the type of a variable

    Bad Example:

      type(a) == type('')
    

    Good Example:

      isinstance(a, str)
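
    The rule is stated without a rationale. Besides making one builtin call instead of two type() calls plus a comparison, isinstance() also accepts subclasses, which an exact type comparison rejects. A minimal sketch, where TaggedStr is a hypothetical subclass introduced only for illustration:

```python
# Minimal sketch comparing the two ways of checking for a string.
# TaggedStr is a hypothetical str subclass used only for illustration.
class TaggedStr(str):
    pass


def is_str_by_type(a):
    # Exact type comparison: two type() calls, rejects subclasses of str.
    return type(a) == type('')


def is_str_by_isinstance(a):
    # One isinstance() call, accepts str and any subclass of str.
    return isinstance(a, str)


plain = 'foo'
derived = TaggedStr('foo')

print(is_str_by_type(plain), is_str_by_isinstance(plain))      # True True
print(is_str_by_type(derived), is_str_by_isinstance(derived))  # False True
# isinstance() also accepts a tuple of types in a single call.
print(isinstance(3, (int, float)))  # True
```
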
    

    Never call what produces nothing

    Never call a function or invoke a macro if the context already contains all the information needed to know that it will produce nothing, either directly or indirectly (through side effects). It is better, for example, to test a variable first and only invoke the macro or function when that variable guarantees that it will produce something required.

    Bad Example:

      def f(test, b):
        if test:
          pass  # do something with b
    
      def g(test):
        # do something
        f(test, 2)
        # do something else
    

    Good Example:

      def f(test, b):
        pass  # do something with b
    
      def g(test):
        # do something
        if test:
          f(test, 2)
        # do something else
    

    Applying this change to page templates (developer mode rendering) improved performance by about ten percent when developer mode is disabled.
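
    The gain comes from not paying for a call whose result is already known to be empty. A minimal Python sketch with hypothetical names (f_unguarded, g_guarded and the calls counter are illustrative only) that counts how often f is entered over ten invocations, only one of which has test set:

```python
# Minimal sketch: count how often f is entered with and without a guard
# in the caller. All names here are hypothetical.
calls = {'f': 0}


def f_unguarded(test, b):
    calls['f'] += 1
    if test:
        return b * 2  # stands in for "do something with b"
    return None


def g_unguarded(test):
    # f is always called, even when test already says it will do nothing.
    return f_unguarded(test, 2)


def f_guarded(b):
    calls['f'] += 1
    return b * 2  # stands in for "do something with b"


def g_guarded(test):
    # The caller checks the condition and skips the call entirely.
    if test:
        return f_guarded(2)
    return None


flags = [False] * 9 + [True]

calls['f'] = 0
for flag in flags:
    g_unguarded(flag)
unguarded_calls = calls['f']

calls['f'] = 0
for flag in flags:
    g_guarded(flag)
guarded_calls = calls['f']

print(unguarded_calls, guarded_calls)  # 10 1
```
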

    Never split TAL directives when they can fit in the same HTML tag

    Do not generate unnecessary nesting in page templates.

    Bad Example:

      <tal:block tal:define="foo bar">
        <tal:block tal:repeat="baz foo">
          <!-- Do something with baz -->
        </tal:block>
      </tal:block>

    Good Example:

      <tal:block tal:define="foo bar"
                 tal:repeat="baz foo">
        <!-- Do something with baz -->
      </tal:block>

    Never call a function in a loop without caching it first

    Bad Example:

      <tal:block tal:repeat="item some_list">
        <tal:block metal:use-macro="here/some_page_template/macros/master" />
      </tal:block>

    Good Example:

      <tal:block tal:define="some_page_template_master nocall:here/some_page_template/macros/master"
                 tal:repeat="item some_list">
        <tal:block metal:use-macro="some_page_template_master" />
      </tal:block>
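
    The same idea expressed in Python: a lookup whose result does not change between iterations is done once before the loop. A minimal sketch, where the Template class and its lookup counter are hypothetical stand-ins for the here/some_page_template/macros/master traversal:

```python
# Minimal sketch: a loop-invariant lookup done once instead of per item.
# Template and its counter are hypothetical stand-ins for the traversal.
class Template:
    def __init__(self):
        self.lookups = 0
        self._macros = {'master': '<macro body>'}

    @property
    def macros(self):
        # Each access stands for a full path traversal.
        self.lookups += 1
        return self._macros


def render_naive(template, items):
    # Looks the macro up again for every item.
    return [template.macros['master'] for _ in items]


def render_cached(template, items):
    # Looks the macro up once and reuses it inside the loop.
    master = template.macros['master']
    return [master for _ in items]


items = range(1000)
naive = Template()
cached = Template()
assert render_naive(naive, items) == render_cached(cached, items)
print(naive.lookups, cached.lookups)  # 1000 1
```
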

    Never undo work

    Never post-process the data returned by a function if that function already handled, at some point, the data you want to end up with. It is better to write another function that returns the intermediate result you are interested in.

    Bad Example:

      def foo(self):
        return [obj.getId() for obj in self.objectValues()]
    
      def bar(self):
        return [self.restrictedTraverse(object_id) for object_id in self.foo()]
    

    Good Example:

      def foo(self):
        return [obj.getId() for obj in self.objectValues()]
    
      def bar(self):
        return self.objectValues()
    

    This case is especially common when foo returns the result of a CachingMethod: since objects must not be cached, we cache their paths and then call restrictedTraverse to get the objects back from those paths. Sometimes the result is then used to get the paths from the objects again, which undoes the work done just before, for no benefit.
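
    A minimal sketch of the CachingMethod case, using functools.lru_cache purely as a stand-in for CachingMethod; the DATABASE dictionary (playing the ZODB) and the traverse() helper (playing restrictedTraverse) are hypothetical. When the caller needs paths, it should use the cached paths directly instead of traversing to objects and reading the paths back:

```python
from functools import lru_cache

# Hypothetical stand-in for the ZODB: path -> object.
DATABASE = {'/erp5/person/%d' % i: {'path': '/erp5/person/%d' % i}
            for i in range(5)}
traversals = {'count': 0}


def traverse(path):
    # Stands in for restrictedTraverse(); counted to make its cost visible.
    traversals['count'] += 1
    return DATABASE[path]


@lru_cache(maxsize=None)  # stand-in for CachingMethod
def cached_person_paths():
    # Only paths are cached (as an immutable tuple), never objects.
    return tuple(sorted(DATABASE))


def person_values():
    # Rebuilds the objects from the cached paths.
    return [traverse(path) for path in cached_person_paths()]


def person_paths_undoing_work():
    # Traverses every path only to read the path back: pure loss.
    return [obj['path'] for obj in person_values()]


def person_paths_direct():
    # Uses the cached intermediate result directly.
    return list(cached_person_paths())


assert person_paths_undoing_work() == person_paths_direct()
print(traversals['count'])  # 5
```
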

    Never use len() on the objectIds() of a BTree

    Call len() directly on the BTree instead: len(btree) rather than len(btree.objectIds()). The result is the same, but the computation time is not: with a BTree containing about 300000 objects, the first form takes less than 1 s while the second takes 40 min.
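
    A minimal sketch of why the two forms can differ so much, assuming (as an illustration, not a description of the actual BTree folder code) that the container maintains a running count of its entries while objectIds() has to walk every stored key. CountedFolder and its keys_materialized counter are hypothetical:

```python
# Minimal sketch, assuming a container that maintains a running count of
# its entries. CountedFolder is hypothetical, not the real BTree folder.
class CountedFolder:
    def __init__(self):
        self._data = {}
        self._count = 0
        self.keys_materialized = 0  # instrumentation for the sketch only

    def add(self, key, value):
        if key not in self._data:
            self._count += 1  # maintained on write, read cheaply later
        self._data[key] = value

    def objectIds(self):
        # Materializes every id: the cost grows with the number of entries.
        # A ZODB-backed BTree may also have to load every bucket to do this.
        ids = list(self._data)
        self.keys_materialized += len(ids)
        return ids

    def __len__(self):
        # Returns the maintained counter without touching the entries.
        return self._count


folder = CountedFolder()
for i in range(10000):
    folder.add('id%d' % i, i)

fast = len(folder)              # reads the counter
slow = len(folder.objectIds())  # builds a 10000-element list first
print(fast, slow, folder.keys_materialized)  # 10000 10000 10000
```
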

    Never use the "not equal" operator in SQL queries

    The SQL engine usually cannot use an index when a column is compared for inequality, but it can when the column is compared for equality (and, with some index types, for less-than/greater-than conditions as well).

    So always prefer

    foo in ('bar', 'baz')

    to

    foo <> 'hoge'

    when possible.
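
    A minimal sketch of checking this on a given engine, using SQLite only because it ships with Python; the item table, the item_foo index and the data are hypothetical, and MySQL's planner may decide differently, so the same check should be repeated with EXPLAIN on the real database. EXPLAIN QUERY PLAN reports whether each query can SEARCH through the index on foo or has to SCAN the table:

```python
import sqlite3

# SQLite stands in for MySQL here; table, index and data are hypothetical.
connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE item (id INTEGER PRIMARY KEY, foo TEXT, payload TEXT)')
connection.execute('CREATE INDEX item_foo ON item (foo)')
connection.executemany(
    'INSERT INTO item (foo, payload) VALUES (?, ?)',
    [(value, 'x') for value in ('bar', 'baz', 'hoge') * 100])


def query_plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = connection.execute('EXPLAIN QUERY PLAN ' + sql).fetchall()
    return ' '.join(str(row[-1]) for row in rows)


in_plan = query_plan("SELECT * FROM item WHERE foo IN ('bar', 'baz')")
ne_plan = query_plan("SELECT * FROM item WHERE foo <> 'hoge'")
print(in_plan)
print(ne_plan)
```
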

    Known slowdowns

    This section lists the known slowdowns that need investigation, or are already being investigated, before they can be solved.

    • object.getDestinationPaymentTitle() is slower than object.getDestinationPaymentValue().getTitle()
      • The cause is in ERP5Type/PropertySheet/Arrow.py: some accessors are defined specially there. This still needs a precise explanation.

    Regression hunt

    "Never hunt two rabbits at the same time."

    When trying to optimise a system S made of two components A and B, if you find that a change in B has decreased performance, do not touch A at all until you completely understand and optimise B. Otherwise, the changes you make to A may hide or cancel out the performance decrease in B, and may even make it impossible to later optimise B by comparing versions of B and understanding their impact.

    Paths in catalog

    Note: the following section only mentions paths, but the same applies to relative_url and id.

    Paths are stored in the catalog for exactly two purposes:

    • Finding an object in the ZODB when its uid is known.
      • For example, when searching the catalog and fetching the objects that match the conditions.
    • Finding a catalog entry when the only thing known about an object is its path.
      • This does not include the case where the ZODB and the object are available, since you can then retrieve the object's uid and pass it to the catalog instead.

    If you use catalog paths in any other case, chances are you are doing it wrong and will get poor performance.

    An example of this is looking up all objects that have a certain path as a category value. MySQL optimises the join poorly because it does not know that path is unique (and, in the design explained above, it does not need to know).
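
    A minimal sketch of the recommended pattern, using SQLite and a simplified, hypothetical two-table schema rather than the real ERP5 catalog tables: when the category object is available, take its uid and filter on it, instead of joining on its path. It shows the shape of the two queries only; the costs on MySQL with real data volumes are not reproduced here.

```python
import sqlite3

# Simplified, hypothetical schema: the real catalog tables have many more
# columns. path is indexed but, as in the text, not declared UNIQUE.
db = sqlite3.connect(':memory:')
db.executescript('''
    CREATE TABLE catalog (uid INTEGER PRIMARY KEY, path TEXT);
    CREATE INDEX catalog_path ON catalog (path);
    CREATE TABLE category (uid INTEGER, category_uid INTEGER);
    CREATE INDEX category_category_uid ON category (category_uid);
''')
db.executemany('INSERT INTO catalog (uid, path) VALUES (?, ?)', [
    (1, '/erp5/organisation_module/acme'),
    (2, '/erp5/person_module/alice'),
    (3, '/erp5/person_module/bob'),
    (4, '/erp5/person_module/carol'),
])
# alice and carol have acme as a category value; bob does not.
db.executemany('INSERT INTO category (uid, category_uid) VALUES (?, ?)',
               [(2, 1), (4, 1)])


def members_by_path_join(path):
    # Joins through the path column on every query.
    return sorted(row[0] for row in db.execute(
        'SELECT category.uid FROM category'
        ' JOIN catalog ON category.category_uid = catalog.uid'
        ' WHERE catalog.path = ?', (path,)))


def members_by_uid(category_uid):
    # Filters on the uid directly; no path comparison involved.
    return sorted(row[0] for row in db.execute(
        'SELECT uid FROM category WHERE category_uid = ?', (category_uid,)))


acme_uid = 1  # e.g. read from the object itself when it is available
print(members_by_uid(acme_uid))  # [2, 4]
```
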