Sodium internals and differences between language implementations

cubuspl42 2020-02-09 19:08:46 UTC #1

I'm trying to better understand how Sodium is implemented. Unfortunately, the book doesn't have a chapter on implementing an FRP system at all.

After a brief investigation of the Java/JavaScript source code, I got the impression that the most complex part of Sodium is maintaining the topological sorting based on ranks. Is that correct?

I noticed that the JS version uses a Synchronous Cycle Collection algorithm with four colors. Is there any specific reason why only the JS version uses it? Is it related to the lack of weakrefs/finalizers?

Is there any document/source where the problem itself (requirements/challenges) of implementing an FRP system in a real-world imperative language is described? Investigating an actual solution is helpful, but I'm worried that I'm only reverse engineering the actual problem that is being solved.

Thank you in advance.

the-real-blackh 2020-02-10 00:09:24 UTC #2

Hi @cubuspl42, we don't have a document about how to implement an FRP system, so I'll try to give a sketch of it...

Yes, the topological sorting based on ranks is the most complex part, but the memory management is also complex in some implementations.

RANKING

The rule is that if node B depends on node A then A is executed first. We do this by 1. assigning ranks based on walking the directed graph and ensuring our invariant holds (and dealing with loops), and 2. putting every outstanding job into a priority queue and pulling them off the head, so we execute them in rank order.

This isn't the most efficient way of doing things, but it's easy to understand. A simple optimization with a huge effect is to have a shortcut when the queue contains 1 item. The most efficient implementation would maintain the execution order statically, but change it in response to the SWITCH primitive.
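As a rough illustration of the two steps described above, here is a minimal TypeScript sketch of rank-ordered execution. The names `Job`, `RankQueue`, `enqueue` and `drain` are hypothetical and are not taken from Sodium's source; the sorted array stands in for a real priority queue.

```typescript
// Hypothetical names throughout; this is not Sodium's actual API.
interface Job {
  rank: number;     // rank of the node this job belongs to
  run: () => void;  // work to perform for that node
}

class RankQueue {
  private jobs: Job[] = [];

  // Insert while keeping the array sorted by ascending rank.
  // Linear insertion keeps the sketch short; a binary heap would scale better.
  enqueue(job: Job): void {
    const i = this.jobs.findIndex((j) => j.rank > job.rank);
    this.jobs.splice(i === -1 ? this.jobs.length : i, 0, job);
  }

  // Run jobs lowest rank first. A job may enqueue further jobs while running.
  drain(): void {
    let job = this.jobs.shift();
    while (job !== undefined) {
      job.run();
      job = this.jobs.shift();
    }
  }
}

// b depends on a, so a gets rank 0 and b gets rank 1.
const order: string[] = [];
const queue = new RankQueue();
queue.enqueue({ rank: 1, run: () => { order.push("b"); } });
queue.enqueue({ rank: 0, run: () => { order.push("a"); } });
queue.drain();
```

Even though the job for b is enqueued first, the job for a runs first because its rank is lower.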

MEMORY MANAGEMENT

Objects are kept alive in the reverse of how they are in the observer pattern. So if B listens to A, then...

Observer pattern: B is kept alive by A
FRP: A is kept alive by B

This makes a lot of sense if you think of FRP as a way to solve all the problems with the observer pattern. The purpose of the observer pattern is to reverse dependencies, but it fails to do so for the memory dependency. So we fix that.

Since FRP uses the observer pattern under the covers, the easiest way to implement it is to deregister all listeners upon finalization of an FRP object. We also hold a reference to the object being listened to, which establishes the memory dependency in that direction.

In JavaScript, weak references and finalizers are not possible. So, we use a reference counting system with the four-colour cycle collection algorithm you mentioned.
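To make the direction of the memory dependency concrete, here is a minimal reference-counting sketch in TypeScript. It is an illustration under assumptions, not Sodium's code: the class `FrpNode` and its methods are hypothetical, and the cycle collection part is left out entirely, so it only works for graphs without loops.

```typescript
// Hypothetical sketch; names are not Sodium's, and cycles are not handled.
class FrpNode {
  readonly name: string;
  private readonly sources: FrpNode[];
  private readonly listeners = new Set<FrpNode>();
  private refCount = 0;

  constructor(name: string, sources: FrpNode[] = []) {
    this.name = name;
    this.sources = sources;
  }

  // Called when application code or a downstream node starts using this node.
  // On the first use, register with each source and keep that source alive.
  retain(): void {
    if (this.refCount++ === 0) {
      for (const s of this.sources) {
        s.listeners.add(this);
        s.retain();
      }
    }
  }

  // Called when a user is done with this node. On the last release,
  // deregister from each source and let go of it in turn.
  release(): void {
    if (--this.refCount === 0) {
      for (const s of this.sources) {
        s.listeners.delete(this);
        s.release();
      }
    }
  }

  isActive(): boolean { return this.refCount > 0; }
  listenerCount(): number { return this.listeners.size; }
}

// b listens to a. Holding b keeps a active; releasing b lets a go.
const a = new FrpNode("a");
const b = new FrpNode("b", [a]);
b.retain();
const activeWhileHeld = a.isActive();
b.release();
const activeAfterRelease = a.isActive();
```

In this sketch a never holds b active: b's registration in a's listener set exists only while something else is using b, which is the reverse of the plain observer pattern.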

I haven't surveyed the FRP landscape for a while, so I can't tell you whether anyone else has documented this. Learning to implement this thing was a huge amount of work!

If you've got any other questions, ask away!

cubuspl42 2020-02-10 08:13:25 UTC #3

Thank you for your answer!

"The most efficient implementation would maintain the execution order statically, but change it in response to the SWITCH primitive."

What do you mean exactly? One way is to keep it in a DAG, and do, in every transaction, a traditional one-shot topological sorting with multiple roots, where each sink that fires an event in the current transaction is a root. But would that be much better performance-wise?
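For reference, the one-shot approach asked about here could look roughly like the following TypeScript sketch: a depth-first search started from every sink that fired, with the reversed post-order giving a valid execution order for the reachable part of the graph. The `Graph` representation and the function name `orderFromRoots` are assumptions made for this example.

```typescript
// Hypothetical representation: each node id maps to the ids that depend on it.
type Graph = Map<string, string[]>;

// Returns the nodes reachable from the given roots in an order where every
// node comes before all nodes that depend on it. Assumes the graph is acyclic.
function orderFromRoots(graph: Graph, roots: string[]): string[] {
  const visited = new Set<string>();
  const postOrder: string[] = [];

  const visit = (id: string): void => {
    if (visited.has(id)) return;
    visited.add(id);
    for (const next of graph.get(id) ?? []) visit(next);
    postOrder.push(id); // emitted only after everything downstream of it
  };

  for (const root of roots) visit(root);
  return postOrder.reverse();
}

// a and b feed c, c feeds d; e and f form an unrelated part of the graph.
const graph: Graph = new Map([
  ["a", ["c"]],
  ["b", ["c"]],
  ["c", ["d"]],
  ["e", ["f"]],
]);
const active = orderFromRoots(graph, ["a", "b"]);
```

Only nodes reachable from the fired sinks are visited, so e and f never appear in `active`. The cost is a fresh traversal in every transaction.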

the-real-blackh 2020-02-10 20:43:46 UTC #4

By "maintain the execution order statically", this is what I mean:

The DAG can be translated into an ordered sequence of operations. In most languages you would do this by constructing a structure of objects where each object references the next node in the sequence directly.

Or if you can actually generate code, you're even better off. LLVM is an option. This would strip out all the function call overhead and allow for much better optimization.

You'd have to watch out for the performance of switches with this approach, though.

But - one thing to remember about FRP is that there aren't actually that many real-world applications where its performance is a problem.

cubuspl42 2020-02-10 22:40:13 UTC #5

"But - one thing to remember about FRP is that there aren't actually that many real-world applications where its performance is a problem."

I'm writing a game level editor in TypeScript. In a relatively small level with 2500 objects, the simplest single event propagation in Sodium takes around 100ms. It's possible that I'm doing something wrong, but I don't think so. I really hope that there's room for optimisations, because it would be a pity to drop FRP for a pragmatic reason like performance.

The LLVM way is not really low-hanging fruit, and not really applicable to the Web, but it is still interesting.

"In most languages you would do this by constructing a structure of objects where each object references the next node in the sequence directly."

It sounds a bit like a linked list representing a topological sorting of the whole FRP graph. Is that correct, or am I missing something? In a single transaction, only one or a few streams actually fire, while the graph can have thousands of vertices.

the-real-blackh 2020-02-10 23:26:45 UTC #6

If you're running into performance problems, then we should do something about it. You're not doing something wrong. The performance really is that bad. Until now we've been focused on just getting it to work right.

Here's something really easy you can do:

In Transaction.prioritized(), add a special case for when prioritizedQ is empty. You'll find this here in prioritized() on line 227:

github.com

SodiumFRP/sodium-cxx/blob/master/sodium/transaction.h

What we do here is to have a variable called prioritized_single. That's used when there are no entries in the queue.

And, in this file in process_transactional() on line 310...

github.com

SodiumFRP/sodium-cxx/blob/master/sodium/transaction.cpp

...you'll see that when we pull an item off the prioritizedQ, we always pick prioritized_single first if it's not null.

This small change makes a huge difference to performance.

the-real-blackh 2020-02-10 23:38:30 UTC #7

Basically, yes. You might even achieve the same thing by just adding extra fields to the existing objects that are linked according to the ranks. Event propagation can start at different points, depending on which sinks you send things into. There can be more than one in a single transaction.

You'd have to work it all out. I haven't done this.

It's potentially a bit complex. For example, if you send into two sinks, and they are then merged, e.g.

StreamSink<Integer> a = new StreamSink();
StreamSink<Integer> b = new StreamSink();
Stream<Integer> c = a.orElse(b);
Stream<Integer> d = c.map((a) -> a + 1);

If you send into a and b in the same transaction, then ... let's process a first. Then when you get to c, you have to process b next before processing c and d. Whether you process b or not depends on whether you sent a value to b in the same transaction.

So what I'm proposing is to make as much of this static as possible because the priority queue is a performance killer.
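One possible shape for such a static order, written as a TypeScript sketch of the a/b/c/d example above. Everything here is hypothetical (`StaticNode`, `plan`, `runTransaction`) and it ignores SWITCH, which would require rebuilding the plan. The plan is computed once, and each transaction walks it in order, skipping nodes none of whose inputs fired.

```typescript
// Hypothetical sketch of a precomputed execution plan; not Sodium's design.
interface StaticNode {
  name: string;
  inputs: number[]; // indices of upstream nodes, always earlier in the plan
  compute: (inputs: (number | undefined)[]) => number | undefined;
}

// Built once, in dependency order: a, b, c = a.orElse(b), d = c.map(x => x + 1).
const plan: StaticNode[] = [
  { name: "a", inputs: [], compute: () => undefined },
  { name: "b", inputs: [], compute: () => undefined },
  { name: "c", inputs: [0, 1], compute: ([a, b]) => (a !== undefined ? a : b) },
  { name: "d", inputs: [2], compute: ([c]) => (c === undefined ? undefined : c + 1) },
];

// sends maps a sink's index in the plan to the value sent in this transaction.
// Returns what each node fired (undefined means it did not fire).
function runTransaction(sends: Map<number, number>): (number | undefined)[] {
  const fired: (number | undefined)[] = plan.map(() => undefined);
  plan.forEach((node, i) => {
    if (sends.has(i)) {
      fired[i] = sends.get(i);
      return;
    }
    const ins = node.inputs.map((j) => fired[j]);
    // Skip nodes whose inputs were all silent in this transaction.
    if (ins.some((v) => v !== undefined)) fired[i] = node.compute(ins);
  });
  return fired;
}

const both = runTransaction(new Map([[0, 10], [1, 20]]));
const onlyB = runTransaction(new Map([[1, 5]]));
```

No priority queue is involved, and whether b is processed before c depends only on whether something was sent to b. Note that this version still scans every entry of the plan, even the ones it skips.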

the-real-blackh 2020-02-10 23:46:01 UTC #8

An improvement on the C++ optimization above would be to have it use prioritized_single if the thing to add has a smaller rank than the smallest entry in prioritizedQ, rather than deciding on the basis of whether prioritizedQ is empty or not.

The invariant is that if prioritized_single is non-null, then its rank is smaller than the smallest rank in prioritizedQ.
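A minimal TypeScript sketch of that idea follows. It is not a port of the C++ code: `PrioritizedEntries`, `Entry` and the `queueInserts` counter are made up for illustration, a sorted array stands in for the real queue, and ties in rank are sent to the queue, so the single slot is never larger than anything queued.

```typescript
// Hypothetical sketch of the single-slot fast path; names are not Sodium's.
interface Entry {
  rank: number;
  label: string;
}

class PrioritizedEntries {
  private single: Entry | null = null;
  private queue: Entry[] = []; // kept sorted by ascending rank
  queueInserts = 0;            // how often the slower queue path was taken

  add(e: Entry): void {
    if (this.single === null) {
      const min = this.queue.length > 0 ? this.queue[0] : undefined;
      // The slot is free: use it if e is smaller than everything queued.
      if (min === undefined || e.rank < min.rank) {
        this.single = e;
        return;
      }
    } else if (e.rank < this.single.rank) {
      // e beats the current slot holder: swap, and queue the old holder.
      const old = this.single;
      this.single = e;
      this.insertSorted(old);
      return;
    }
    this.insertSorted(e);
  }

  // The single slot, when occupied, always holds the next entry to run.
  next(): Entry | undefined {
    if (this.single !== null) {
      const e = this.single;
      this.single = null;
      return e;
    }
    return this.queue.shift();
  }

  private insertSorted(e: Entry): void {
    this.queueInserts++;
    const i = this.queue.findIndex((q) => q.rank > e.rank);
    this.queue.splice(i === -1 ? this.queue.length : i, 0, e);
  }
}

// A simple chain, where each step schedules the next, never touches the queue.
const chain = new PrioritizedEntries();
for (let rank = 0; rank < 3; rank++) {
  chain.add({ rank, label: "step" + rank });
  chain.next();
}
```

After the loop, `chain.queueInserts` is still 0, which is the case this optimization targets: straight-line propagation where only one job is outstanding at a time.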

cubuspl42 2020-02-11 16:23:30 UTC #9

Again, thank you for your answers.

I'm not asking you to solve this for me, I only want to make sure that we're on the same page here.

Currently my understanding is that if one prepares a static topological sorting of the whole graph, then there's no way (known to me) to avoid processing the whole graph every time, and the whole graph can be 100 times bigger than any reasonable subset active in a single transaction (does that sound like a reasonable assumption?). The current queue-based approach doesn't have this issue and processes only the active subset (am I right?).

cubuspl42 2020-02-11 16:26:48 UTC #10

I have another extremely simple question. Did you try to implement classic topological sorting algorithms (https://en.wikipedia.org/wiki/Topological_sorting), like Kahn's or the DFS-based one? Or, maybe, the current approach is a variation of one of these algorithms in a non-obvious way?
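For readers unfamiliar with it, Kahn's algorithm repeatedly emits a node with no remaining incoming edges and then removes its outgoing edges. A compact TypeScript sketch, with the hypothetical function name `kahnSort` and an edge-list input chosen for this example:

```typescript
// Kahn's algorithm over an edge list. Each edge [from, to] means "to depends
// on from". Returns a topological order, or null if the graph has a cycle.
function kahnSort(edges: [string, string][]): string[] | null {
  const indegree = new Map<string, number>();
  const outgoing = new Map<string, string[]>();

  for (const [from, to] of edges) {
    if (!indegree.has(from)) indegree.set(from, 0);
    indegree.set(to, (indegree.get(to) ?? 0) + 1);
    const list = outgoing.get(from) ?? [];
    list.push(to);
    outgoing.set(from, list);
  }

  // Start with every node that has no incoming edges.
  const ready: string[] = [];
  indegree.forEach((degree, node) => {
    if (degree === 0) ready.push(node);
  });

  const sorted: string[] = [];
  let node = ready.shift();
  while (node !== undefined) {
    sorted.push(node);
    for (const next of outgoing.get(node) ?? []) {
      const degree = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, degree);
      if (degree === 0) ready.push(next);
    }
    node = ready.shift();
  }

  // If some node was never emitted, it sits on a cycle.
  return sorted.length === indegree.size ? sorted : null;
}

const sortedNodes = kahnSort([["a", "c"], ["b", "c"], ["c", "d"]]);
const cyclic = kahnSort([["x", "y"], ["y", "x"]]);
```

Unlike the rank-based scheme, this sorts the whole graph in one pass and has no notion of which nodes are active in a given transaction.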

the-real-blackh 2020-02-12 23:08:10 UTC #11

No, I didn't look at those algorithms. The algorithm I'm using comes from Ingo Maier and Martin Odersky's paper "Deprecating the observer pattern".

cubuspl42 2020-02-13 12:38:05 UTC #12

That was an interesting read. It seems that the primary motivation of the authors was to optimize for inactive subgraphs of the DAG, i.e. to exit early when a vertex is calm (an unchanged cell, a filtered stream, ...). That seems reasonable. Topological sorting once per transaction would indeed require processing the whole subgraph containing the active roots (sinks).

The algorithm itself seems to be fine, I'll investigate the Sodium implementation starting with the hint you provided. Thank you again for all the answers!