Soledad performance improvements

The following is an unordered list of possible performance improvements to be analysed, sorted, and possibly implemented.

Single user mode

  • the upload of data is currently queued, which prevents it from using all of the available network bandwidth. This can be improved by enabling batching (already implemented) and returning early for each piece of data transferred.
  • the use of couchdb to also store sync/db metadata implies too many couch requests for each piece of data that is transferred. There might be room for improvement, but that requires re-evaluating the current couchdb backend.
  • the server is a wsgi reference implementation. It could be improved either by taking advantage of twisted async for request handling and couch access, or by some other approach that is not yet clear.
  • possible improvements on gpg usage:
    • initialization of the gpg object may take a long time in some environments because of the use of --list-config --with-colons to get the version.
    • the use of temporary keyrings implies creating a keyring and importing keys for each enc/dec/sign/verify operation. Measuring the time of these operations might shed light on the possibility of reusing the keyring instead of always setting up a new one.
  • Use DeferredLock in pragmas.py.
  • Evaluate the time taken by SQLite PRAGMAs, as they might be slow and might be triggered more often than needed.
  • Review async code for blocking calls that might be blocking the reactor.
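
As a starting point for the PRAGMA evaluation above, here is a minimal timing sketch. It assumes the stdlib sqlite3 module as a stand-in for pysqlcipher, and the helper name time_pragmas and the sample PRAGMAs are hypothetical, not taken from pragmas.py. Numbers from an in-memory database will differ from those of an encrypted on-disk one, so this only shows the shape of the measurement.

```python
import sqlite3
import time


def time_pragmas(conn, pragmas, repeat=100):
    """Return {pragma: mean seconds per execution} over `repeat` runs."""
    timings = {}
    for pragma in pragmas:
        start = time.perf_counter()
        for _ in range(repeat):
            # fetchall() forces the statement to run to completion.
            conn.execute("PRAGMA " + pragma).fetchall()
        timings[pragma] = (time.perf_counter() - start) / repeat
    return timings


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample; replace with the PRAGMAs the client really issues.
    sample = ["synchronous = NORMAL", "temp_store = MEMORY", "journal_mode"]
    for pragma, seconds in time_pragmas(conn, sample).items():
        print("%-24s %.1f us" % (pragma, seconds * 1e6))
    conn.close()
```

Counting how many times each PRAGMA is issued per operation, next to the mean cost, would show whether the total time or the call count is the bigger problem.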

Problems already identified and fixed:

  • Use of TCP_NODELAY flag on couchdb sockets: leap.se/code/issues/8264
  • Use of HAVE_USLEEP flag on pysqlcipher compile.
  • Get rid of design docs on couch.
  • Stop using view functions for storing metadata (use _local docs).
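
For reference, TCP_NODELAY disables Nagle's algorithm, so small writes are sent immediately instead of being coalesced. The sketch below only shows how the flag is set on a plain Python socket. It is not the change made for the issue linked above, and the helper name make_nodelay_socket is hypothetical.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    sock = make_nodelay_socket()
    # A non-zero value means TCP_NODELAY is enabled on this socket.
    print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    sock.close()
```

The flag matters most for request/response traffic made of many small messages, which matches the many small couch requests per piece of data described earlier.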

Multi user mode (pixelated/tw use)

  • the use of one sqlcipher db for each user might be a problem, as sqlite does not focus on scalability. The storage could be replaced with something else that still provides crypto, like mariadb for example.
  • the use of one reactor for the server and all clients could be reviewed to (1) ensure that no blocking calls block the reactor, and (2) maybe use more than one reactor, so that blocking client code does not block the server for other clients.
  • this mode would also benefit from the improvements listed for single user mode.
  • it is important to note that using the sync in this mode seems to be a requirement to maintain compatibility between “organizational mode” (i.e. keys in the server) and “activist mode” (i.e. keys in the client).

Other notes

  • we decided to collect all this discussion on a single page and strive to use public channels to address it. This will probably start next week.
  • leap has been focusing on setting up environments for proper testing and performance measurements. The idea is to have a timeline with graphs of metrics for resource usage (ram, cpu) and download/upload time for specific cases. Some work still has to be done in that direction, but some things have already been worked out (the tools to do that, and some performance testing scripts).
  • during this week’s hackathon a lot has been discussed about how to address problems and how to contribute. Being able to openly criticize and also being open to criticism seems to be a good thing to focus on. Listening to the people that were present, it seems that both teams are open to contributions, as long as they can maintain a good balance between their own agendas.