Programming Models for Concurrency


1: Abstract

2: Introduction

There is no widely accepted best practice for programming highly concurrent complex control systems. UML 1.4 can perhaps be viewed as the most ambitious current attempt to provide a standard model for concurrency programming.

In Chapter 2 of the UML 1.4 specification, the precise semantics of the model are described. These involve:

- a single FIFO event queue per state machine: the first message in the queue is always serviced next
- run-to-completion semantics for each event
- events that are neither handled nor explicitly deferred in the current state are discarded

This model is the one most commonly used when programming concurrency in e.g. C++. It is also quite similar to e.g. OHaskell, a fairly recent attempt at offering concurrency support in the popular functional programming language Haskell. Ericsson programmers may also find it rather similar to the semantics of the Ericsson-proprietary language PLEX.

Looking at the developments in telecoms, the programming languages Chill, EriPascal, HLPLEX and Erlang (the most recent and most widely used of these languages) all use a slightly different approach:

- a process may block, waiting for a specific message, so that synchronous calls (blocking RPCs) are easy to express
- message reception is selective: the first message matching one of the given patterns is serviced, while all other messages remain implicitly buffered

These languages evolved out of experiences from building complex telecom software. This document attempts to illustrate the qualitative differences between the "UML" approach and the "Erlang" approach, when applied to a typical telecom problem.

3: Setup

4: Experiments

I decided to use a POTS example similar to that found in the book "Concurrent Programming in Erlang" (Armstrong et al., ISBN 0-13-508301-X, page 240). This example clearly illustrates the problems of writing control systems for telecoms. For various reasons, the program used here and the one in the book are not the same. In the book, a "middleware" layer, tele_os, is provided that interfaces to the hardware. The corresponding middleware layer in this example is called 'lim', for Line Interface Module.

4.1: Original Implementation

The original implementation uses the classical Erlang programming model: one Erlang process per subscriber, selective message reception, and blocking (RPC-style) calls towards the hardware API.

Measurements

In the following table, I have collected a few code metrics. Briefly explained: "Lines of code" is the number of source lines; "Branches" the number of alternative execution paths (function clauses and case/receive alternatives); "Functions" the number of function definitions; and "Timing Dependencies" the number of places where the code must handle a message ordering that is an artifact of the programming model rather than of the problem itself.

                      Original
Lines of code              161
Branches                    50
Functions                   16
Timing Dependencies          0

4.2: Event-based Programming, blocking APIs (asynch_0)

In order to implement an event-based control module in Erlang, I wrote a small event loop:
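The event loop itself is not reproduced here, but a minimal sketch of the idea (module and function names are illustrative, not taken from the measured code) might look like:

```erlang
-module(event_loop).
-export([start/2]).

%% Spawn a process that delivers every incoming message, in strict
%% arrival order, to a handler function. There is no selective
%% receive: the handler must cope with any event in any state.
start(Handler, InitState) ->
    spawn(fun() -> loop(Handler, InitState) end).

loop(Handler, State) ->
    receive
        Event ->
            loop(Handler, Handler(Event, State))
    end.
```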

The actual program was quite easy to write in Erlang (in fact, the commonly used "generic server" framework in Erlang is based on this kind of event-driven programming). The resulting module is significantly shorter than the original. A subjective opinion is that it is more difficult to follow, given this particular problem.

It should be noted that this module "cheats", in the sense that it still uses a hardware API with blocking semantics. This API is implemented using synchronous wrappers, each sending an asynchronous message to the hardware and then blocking until the answer arrives (implicitly buffering all other messages in the meantime). This assumes that one can program a "blocking RPC", something that is readily done in Erlang, but not in OHaskell or UML 1.4. It should be noted that a blocking RPC method is introduced in UML 1.5, even though the UML semantics do not allow the programmer to easily describe such functionality in UML itself.
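Such a synchronous wrapper is straightforward to sketch in Erlang. The 'lim' registered name comes from the example; the exact message format below is an assumption:

```erlang
-module(lim_client).
-export([call/1]).

%% Send an asynchronous request to the hardware middleware and block
%% until the matching reply arrives. The selective receive matches on
%% a unique reference, so all other messages stay buffered in the
%% mailbox until we return.
call(Request) ->
    Ref = make_ref(),
    lim ! {self(), Ref, Request},
    receive
        {Ref, Reply} ->
            Reply
    end.
```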

Another thing to note is that the code relies rather heavily on function-head pattern matching, a distinctive Erlang feature. One could have written the code using case statements instead; this would have increased the number of branches in the code by roughly the number of functions. The same goes for the later examples, where this factor would become more dominant. It is unclear whether this would make the comparison more or less fair, so I have simply written the code the way I thought best.
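To illustrate the difference, here are the two styles side by side (with made-up state and event names, not the actual module):

```erlang
-module(pots_style).
-export([idle/1, idle_case/1]).

%% Function-head pattern matching: one clause per state/event pair.
idle({lim, offhook}) -> getting_first_digit;
idle({_Pid, request_connection}) -> ringing;
idle(_Other) -> idle.

%% The same logic with a case statement adds one branch construct
%% per function, inflating the branch count.
idle_case(Event) ->
    case Event of
        {lim, offhook} -> getting_first_digit;
        {_Pid, request_connection} -> ringing;
        _Other -> idle
    end.
```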

Measurements

                      Original  asynch_0
Lines of code              161       104
Branches                    50        34
Functions                   16        18
Timing Dependencies          0         0

Semantics

Are the two implementations equivalent? Actually, no. Both do, as far as I can tell, solve the problem and adhere to the specification (though note that the specification is quite primitive).

Let's consider a specific message sequence to illustrate the differences:

{lim,offhook} - {lim,onhook} - {Pid,request_connection}

If the A:onhook and Pid:request_connection arrive at roughly the same time, internal scheduling details may determine which message appears first in the message queue of subscriber A. In the original implementation, this doesn't matter; the onhook message will be serviced first, as it appears first in the list of patterns in the 'receive' clause. The request_connection message will then be serviced in the idle state and will be accepted. Note that in this programming model, we can actually control the behaviour in this situation, by prioritizing the match patterns.
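Note that a flat receive with ordered patterns still takes the first *message* in the queue that matches any of its patterns; to genuinely give onhook priority over other already-queued messages, one can scan the mailbox for it first using a nested receive with a zero timeout. This is a sketch; the original module may express the prioritization differently:

```erlang
-module(prio).
-export([next_event/0]).

%% Scan the whole mailbox for a pending onhook first; only if none is
%% queued (after 0 = don't wait) fall through and take the messages
%% in arrival order.
next_event() ->
    receive
        {lim, onhook} = E ->
            E
    after 0 ->
        receive
            E ->
                E
        end
    end.
```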

In the asynch_0 implementation, the message that arrives first will be serviced first, so if the request_connection message arrives before the onhook, the connection request will be rejected. We have very little control over this aspect in this programming model.

The behaviours in this case are both valid. From a user's point of view, it is perhaps more desirable that the call be serviced when possible. On the other hand, even in the original implementation, the "optimized" behaviour is only possible if the onhook message is actually in the message queue. This is a delicate timing matter, and in this particular case, we may well be prepared to ignore the subtle difference. However, we will see other variations on the same theme below, with more dire consequences.

4.3: Event-based, One non-blocking API (asynch_1)

This version is based on the previous example (asynch_0), but with the hardware API changed to non-blocking semantics. This complicates the state machine, and introduces a number of timing dependencies. This is intended to simulate the style of programming required to solve the problem in e.g. OHaskell or UML 1.4.
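To give a flavour of the complication (state and message names below are illustrative, not the actual asynch_1 code): a single blocking call splits into a request and a separate reply event, forcing an intermediate state in which unrelated events can interleave:

```erlang
-module(asynch_sketch).
-export([getting_first_digit/2, wait_stop_tone/2]).

%% A digit arrives: ask the hardware to stop the dial tone
%% (fire-and-forget) and park in an intermediate state until the
%% reply comes back.
getting_first_digit({lim, {digit, D}}, Data) ->
    lim ! {self(), stop_tone},
    {wait_stop_tone, Data#{pending_digit => D}}.

%% The reply arrives: continue with the buffered digit.
wait_stop_tone({lim, stop_tone_reply}, Data = #{pending_digit := D}) ->
    {getting_more_digits, Data#{digits => [D]}};
%% But the subscriber may hang up while the request is outstanding;
%% every such interleaving is a timing dependency that must be coded
%% for explicitly.
wait_stop_tone({lim, onhook}, Data) ->
    {wait_stop_tone_then_idle, Data}.
```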

I have tested the basic functionality, but not the timing-dependent code. This type of code is generally difficult to test, as it often requires tricky fault injection techniques. One may run a profiler on the code to make sure that all the code has been covered, but there may still be timing dependencies that the programmer hasn't thought of that are not represented in the code. I encountered a number of such bugs when writing the code.

Measurements

                      Original  asynch_0  asynch_1
Lines of code              161       104       175
Branches                    50        34        64
Functions                   16        18        37
Timing Dependencies          0         0         6

4.4: Event-based, Two non-blocking APIs (asynch_2)

This version goes one step further and also makes the number analysis API non-blocking. Even though the number analysis API consists of only one function, and it is only used within a limited segment of the state machine, implementation complexity grows substantially. The reason is that the existing timing dependencies interact with the new ones, multiplying the effect.
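The effect can be sketched as follows (again with illustrative names): with two requests outstanding at once, the replies may arrive in either order, and each ordering, combined with events such as onhook, needs its own handling:

```erlang
-module(asynch2_sketch).
-export([wait_tone_and_analysis/2]).

%% Both a stop_tone request and a number-analysis request are
%% outstanding; the two replies can arrive in either order, so one
%% follow-up state is needed for each.
wait_tone_and_analysis({lim, stop_tone_reply}, Data) ->
    {wait_analysis, Data};
wait_tone_and_analysis({analyser, {analysis_reply, R}}, Data) ->
    {wait_stop_tone, Data#{analysis => R}};
%% ...and onhook can still interleave with either reply, multiplying
%% the number of intermediate states that must be handled.
wait_tone_and_analysis({lim, onhook}, Data) ->
    {wait_tone_and_analysis_then_idle, Data}.
```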

Measurements

                      Original  asynch_0  asynch_1  asynch_2
Lines of code              161       104       175       203
Branches                    50        34        64        70
Functions                   16        18        37        37
Timing Dependencies          0         0         6        13

5: Conclusion

Some work remains on finding truly language-neutral comparison techniques, but there seems to be a clear indication that the non-blocking event-based programming style leads to increasingly complex code as more asynchronous APIs are introduced.

An important observation is that the timing dependencies introduced in the implementations do not correspond to actual timing dependencies in the real world. There is, for example, no actual correlation between the pressing of a digit on the telephone and the sending of a tone request reply from our switch hardware. The dependencies exist in our implementation solely due to the programming model.

The original program is free from such arbitrary timing dependencies. This makes it stable in the sense that it is not affected if parts of the code are distributed, if hardware is upgraded, changing the capacity and thereby the timing aspects of the system, or if a supporting component is made more (or less) complex.

It should be noted that the 'defer' semantics in UML make it possible to avoid some of the complexity described above. However, they do not eliminate the timing dependencies, since messages to be deferred must be declared explicitly. This means that one also needs to clearly identify all vital messages and carefully verify that these messages cannot be accidentally discarded in any state where they might arrive. This must of course be repeated each time the model is altered or extended, as new states or new messages are introduced.

An alternative would be to enforce the convention that all unknown messages are always deferred. An obvious problem with this is that deferred messages must also be explicitly "recalled"; thus one must first know that a message has been deferred in order to recall it. I believe that there is a "recall_all" function, but this should probably only be used to flush unknown messages.

Looking more closely at what makes the first program stable, we can identify the following core requirements:

- the ability to block, waiting for a specific message (i.e. to make synchronous calls towards other components)
- selective message reception, which implicitly buffers messages that are not yet relevant, so that they can be serviced in the appropriate state

Ulf Wiger
Last modified: Fri Feb 6 13:58:56 MET 2004