It is well known that modern functional programming languages are naturally
amenable to parallel programming. Achieving efficient parallelism using
functional languages, however, remains difficult. Perhaps the most important
reason for this is their lack of support for efficient in-place updates, i.e.,
mutation, which is central to the implementation of both parallel algorithms
and the run-time system services (e.g., schedulers and synchronization
primitives) used to execute them.

In this paper, we propose techniques for efficient mutation in parallel
functional languages. To this end, we couple the memory manager with the thread
scheduler to make reading and updating data allocated by nested threads
efficient. We describe the key algorithms behind these techniques, implement
them in the MLton Standard ML compiler, and present an empirical evaluation. Our
experiments show that the approach performs well, significantly improving
efficiency over existing functional language implementations.