Multicores

Since computers with multiple cores are the norm today, let us discuss how this can be exploited with D.

While concurrency and parallelism are actually different concepts, we do not discuss them separately, because programmers usually encounter them at the same time anyway. As a result, this section provides only an overview.

Threads

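Threads managed by the operating system (for example, POSIX threads) are the basic mechanism. The low-level thread class lives in core.thread; std.concurrency builds on it and adds message passing between threads.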

import std.stdio;
import std.concurrency;

void spawnedFunc(Tid tid)
{
  // Receive a message from the owner thread.
  receive(
    (int i) { writeln("Received the number ", i);}
  );

  // Send a message back to the owner thread
  // indicating success.
  send(tid, true);
}

void main()
{
  // Start spawnedFunc in a new thread.
  auto tid = spawn(&spawnedFunc, thisTid);

  // Send the number 42 to this new thread.
  send(tid, 42);

  // Receive the result code.
  auto wasSuccessful = receiveOnly!(bool);
  assert(wasSuccessful);
  writeln("Successfully printed number.");
}

Once you use threads, keep in mind that module-level ("global") variables in D are thread-local by default: every thread gets its own copy. To make a variable global to the whole process, declare it with the shared qualifier.

shared int countA;  // global counter
int countB;         // thread-local counter
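
To make the difference visible, the following is a minimal sketch rather than a definitive recipe. It reuses the two counters from above; the names worker and runWorkerThread are hypothetical helpers introduced only for this illustration. The shared counter is updated with atomicOp from core.atomic, because plain read-modify-write operations on shared variables are rejected by the compiler.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;
import std.stdio : writeln;

shared int countA;  // one instance for the whole process
int countB;         // one instance per thread

// Runs in the new thread and increments both counters there.
void worker()
{
    atomicOp!"+="(countA, 1); // shared data needs atomic (or locked) access
    countB += 1;              // changes only the worker's own copy
}

// Hypothetical helper: run worker in a separate thread and wait for it.
void runWorkerThread()
{
    auto t = new Thread(&worker);
    t.start();
    t.join();
}

void main()
{
    runWorkerThread();
    writeln("countA = ", atomicLoad(countA)); // prints "countA = 1"
    writeln("countB = ", countB);             // prints "countB = 0"
}
```

The main thread sees the worker's update to countA, while its own countB is untouched.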

Tasks

Tasks are the lightweight alternative to threads: units of work that are scheduled by the program itself rather than by the operating system. Related mechanisms are often called user-level threads. Since tasks are usually executed by a thread pool, they run in parallel as well.

import std.algorithm, std.parallelism, std.range;

void main() {
    // Parallel reduce can be combined with
    // std.algorithm.map to interesting effect.
    // The following example (thanks to Russel Winder)
    // calculates pi by quadrature using
    // std.algorithm.map and TaskPool.reduce.
    // getTerm is evaluated in parallel as needed by
    // TaskPool.reduce.
    //
    // Timings on an Athlon 64 X2 dual core machine:
    //
    // TaskPool.reduce:       12.170 s
    // std.algorithm.reduce:  24.065 s

    immutable n = 1_000_000_000;
    immutable delta = 1.0 / n;

    real getTerm(int i)
    {
        immutable x = ( i - 0.5 ) * delta;
        return delta / ( 1.0 + x * x ) ;
    }

    immutable pi = 4.0 * taskPool.reduce!"a + b"(
        std.algorithm.map!getTerm(iota(n))
    );
}

Go, Rust, Erlang, and other languages use similar mechanisms. Try the vibe.d web framework for an integrated environment in this style.

However, D is not specialized for this style and permits some mistakes. If a task calls a blocking function (for example, to read a file), the thread executing it blocks as well. Since the maximum parallelism is bounded by the number of running threads in the underlying pool, this can leave a core idle even though more tasks are waiting to run. Specialized runtimes can simply start another thread to compensate for the blocked one. In D, the pool size must be maintained manually, which is easy to forget. You are fine if you can restrict yourself to asynchronous I/O.
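
One way to maintain the pool size manually is to give blocking tasks a pool of their own. The following is only a sketch under assumptions: blockingWork is a hypothetical stand-in for a blocking call (it merely sleeps), runBlockingTasks is a helper invented for this example, and the factor of two workers per core is an arbitrary choice, not a recommendation.

```d
import core.thread : Thread;
import core.time : msecs;
import std.parallelism : TaskPool, task, totalCPUs;
import std.stdio : writeln;

// Hypothetical stand-in for a blocking call such as reading a file.
int blockingWork(int id)
{
    Thread.sleep(100.msecs);
    return id * 2;
}

// Runs two blocking tasks on a dedicated pool and returns the sum.
int runBlockingTasks()
{
    // Assumption: more workers than cores, so that a sleeping worker
    // does not leave a core idle. The factor 2 is arbitrary.
    auto ioPool = new TaskPool(totalCPUs * 2);
    scope (exit) ioPool.finish(); // let the workers terminate afterwards

    auto t1 = task!blockingWork(1);
    auto t2 = task!blockingWork(2);
    ioPool.put(t1);
    ioPool.put(t2);

    // yieldForce waits until the result of a task is available.
    return t1.yieldForce + t2.yieldForce;
}

void main()
{
    writeln(runBlockingTasks()); // prints 6
}
```

The default taskPool stays reserved for CPU-bound work in this arrangement, so blocking calls cannot starve it.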

Synchronization

Once you have parallelism, concurrency usually arises as well, and with it the need for synchronization. Of course, D provides the usual suspects: locks (core.sync.mutex), semaphores (core.sync.semaphore), and atomic operations (core.atomic).
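
As a rough illustration of two of these primitives, the sketch below increments one counter with an atomic operation and another one under a lock. All names (increment, runCounters, the counters, and the constants) are made up for this example. Note the assumption that the locked counter is declared __gshared, which makes it process-global without the compiler checks of shared, so the mutex is the only thing protecting it.

```d
import core.atomic : atomicLoad, atomicOp;
import core.sync.mutex : Mutex;
import core.thread : Thread;
import std.stdio : writeln;

shared int atomicCounter;     // updated with atomic operations
__gshared int lockedCounter;  // plain int, protected by counterLock
__gshared Mutex counterLock;

enum numThreads = 4;
enum iterations = 10_000;

// Executed by every worker thread.
void increment()
{
    foreach (_; 0 .. iterations)
    {
        atomicOp!"+="(atomicCounter, 1);

        synchronized (counterLock)
        {
            lockedCounter += 1; // only one thread at a time gets here
        }
    }
}

// Hypothetical helper: start all workers and wait for them.
void runCounters()
{
    counterLock = new Mutex;
    Thread[] workers;
    foreach (_; 0 .. numThreads)
    {
        auto t = new Thread(&increment);
        t.start();
        workers ~= t;
    }
    foreach (t; workers)
        t.join();
}

void main()
{
    runCounters();
    writeln(atomicLoad(atomicCounter)); // prints 40000
    writeln(lockedCounter);             // prints 40000
}
```

Without the atomic operation or the lock, increments from different threads could be lost and the totals would vary from run to run.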

Memory Model

Currently, D has no formally specified memory model. However, you can assume that it will follow the intent of the C++ memory model, which means sequential consistency for data-race-free programs. In other words: if you properly synchronize shared data with locks and the other primitives from the standard library, everything behaves as you would intuitively expect. It only becomes difficult if your requirements are so tight that lower-level mechanisms must be used.
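
To give an impression of such a lower-level mechanism, here is a sketch that publishes a value through a flag with release and acquire ordering from core.atomic. It assumes the C++-like semantics described above, which D does not formally guarantee. The names payload, ready, producer, and receivePayload are hypothetical.

```d
import core.atomic : MemoryOrder, atomicLoad, atomicStore;
import core.thread : Thread;
import std.stdio : writeln;

__gshared int payload;  // plain data, written before the flag is set
shared bool ready;      // flag that publishes the payload

void producer()
{
    payload = 42;
    // Release store: under the assumed semantics, the write to payload
    // becomes visible to any thread that observes ready == true with
    // an acquire load.
    atomicStore!(MemoryOrder.rel)(ready, true);
}

// Hypothetical helper: start the producer and wait for its value.
int receivePayload()
{
    auto t = new Thread(&producer);
    t.start();

    // Acquire load: spin until the producer has published the payload.
    while (!atomicLoad!(MemoryOrder.acq)(ready))
        Thread.yield();

    immutable result = payload;
    t.join();
    return result;
}

void main()
{
    writeln(receivePayload()); // prints 42
}
```

Code like this is easy to get wrong, which is why locks and message passing are the better default.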