abusing Rust's type system for sound bounds check elision

published on 2019-03-13

Memory safety enforced through borrow checking is one of Rust's main selling points over traditional systems programming languages like C and C++. Amongst other things, it ensures that one can point at values within datastructures with the guarantee that by the time you follow that so-called reference, the datastructure will still exist and, further, that the value you have a reference to has not been changed.

Consider this example. It allocates memory on the heap for at least six integers, creates a reference to one of these integers and tries to dereference that reference after dropping (also known as freeing or deallocating) the vector.

let my_vec = vec![1, 1, 2, 3, 5, 8];
let my_reference = &my_vec[4];
std::mem::drop(my_vec);
assert!(*my_reference == 5);

The Rust compiler will throw the following error for this code:

error[E0505]: cannot move out of `my_vec` because it is borrowed
  |
1 |     let my_reference = &my_vec[4];
  |                         ------ borrow of `my_vec` occurs here
2 |     std::mem::drop(my_vec);
  |                    ^^^^^^ move out of `my_vec` occurs here
3 |     assert!(*my_reference == 5);
  |             ------------- borrow later used here

And for good reason too! The memory that my_reference is pointing to has been returned to the operating system by the call to std::mem::drop and anything may have happened to that memory. It may have been reclaimed by another thread of our program and overwritten with an arbitrary sequence of bytes, or, if we're lucky, it will be outside the program's assigned range of virtual memory, in which case dereferencing my_reference would be an illegal operation and crash our program. The compiler saved us from making a so-called use after free error, a common class of errors in systems programming languages.

While that's great, these safety measures can also be limiting. Suppose we want to implement a vector-backed binary tree. We may try something like this first:

struct Tree<T> {
    inner: Vec<Node<T>>,
}

struct Node<T> {
    val: T,
    left_child: Option<&Node<T>>,
    right_child: Option<&Node<T>>,
}

but the compiler won't accept it. It cannot automatically infer that a Node's references to its children will live inside the same vector as the node itself. Therefore, it may very well be possible that they are dropped or mutated by the time we dereference them. We could try to make this explicit by adding a reference to the tree from every node and stating that every child node should live at least as long as the tree the node is in. One could attempt to use explicit lifetimes for this:

struct Node<'tree, T> {
    containing_tree: &'tree Tree<T>,
    val: T,
    left_child: Option<&'tree Node<'tree, T>>,
    right_child: Option<&'tree Node<'tree, T>>,
}

Indeed, this would guarantee that the children of this node live long enough to prevent use after free errors. However, by adding a reference to the tree we have created new problems! Recall that the borrow checker does not allow us to mutate data that has live references to it, so creating a Node with a reference to a tree would prevent us from adding it to the tree as that involves mutation of the tree! It seems that references are fundamentally incompatible with self-referential datastructures.
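
To make the conflict concrete, here is a minimal sketch with a hypothetical Entry type (simpler than the Tree and Node above, but suffering from the same problem): as soon as an element we want to insert borrows the vector, pushing it requires a mutable borrow of that same vector, which the borrow checker rejects.

struct Entry<'a> {
    parent: Option<&'a Entry<'a>>,
}

let mut entries: Vec<Entry> = Vec::new();
entries.push(Entry { parent: None });

// immutable borrow of the vector...
let root = &entries[0];
let child = Entry { parent: Some(root) };

// ...so this mutable use of `entries` is rejected:
// error[E0502]: cannot borrow `entries` as mutable because it is also
// borrowed as immutable
entries.push(child);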

beyond references

So what can we do? There are three common strategies for dealing with self-referential datastructures in Rust. One is to disable the guard rails and work with raw pointers as one would in C. The borrow checker won't check whether the pointers point to live memory, or whether data is mutated while it is being referenced. The burden of maintaining soundness and correctness of the code is then left entirely to the programmer. This approach works, but opens us up to many of the dangerous errors that Rust seeks to prevent. We'd rather not take that route.

A safer strategy is to use runtime borrow checking mechanisms. Instead of using references and having the compiler prove them correct, we can use concepts like reference counting and interior mutability to enforce memory safety at runtime. In a nutshell, reference counting involves tracking how many references exist during the execution of the program. Every time we create a reference to a value, its reference counter is incremented, and whenever a reference is dropped, we decrement it. When the counter drops to zero, we can safely free the memory. Since each node manages its own memory, we don't even need a backing struct anymore:

use std::rc::Rc;
use std::cell::RefCell;

struct Tree<T> {
    val: T,
    left_child: Option<Rc<RefCell<Tree<T>>>>,
    right_child: Option<Rc<RefCell<Tree<T>>>>,
}

This totally works. The RefCell wrapper takes care of interior mutability, which means that we can mutate the value inside it even when there are multiple references to the cell. To ensure no BadStuff™ happens, it keeps track of the number of active readers and writers of the value and makes sure that no reading and writing happens at the same time. When it detects a violation, it panics, safely aborting the program with an error message. This is essentially borrow checking, but done empirically instead of with a compiler proof.
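
To see this runtime check in action, here is a small standalone sketch (not part of the tree code above): taking a second mutable borrow of a RefCell while the first one is still alive gets past the compiler, but panics when executed.

use std::cell::RefCell;

let cell = RefCell::new(5);

// first mutable borrow, kept alive by its binding
let first = cell.borrow_mut();

// second mutable borrow: accepted by the compiler, but panics at runtime
// with a BorrowMutError ("already borrowed")
let second = cell.borrow_mut();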

Note that we have strayed somewhat from our initial goal. We wanted to implement a vector-backed tree, but ended up with a tree that allocates for every node and has to perform reference counting and borrow checking at runtime. Is there a way to build an efficient tree while staying within the realm of safe Rust?

intro indices

Fortunately, there is. Instead of using references to point to nodes in the tree, we can use indices. Indices are sort of like references in that they refer to a single value within a datastructure, but they aren't subject to the compiler's borrow checker. That means we won't have the same issues with mutability that we had with references. To the compiler, indices are just meaningless values like any other. In order to read or write values inside a datastructure using indices, we temporarily upgrade them to references and use those references to read or write data. In this way, you can think of indices as weak references. Until they are actually used to create references, no restriction is placed on the usage of the tree and its nodes, and whenever we perform an upgrade, the compiler's borrow checker protects us from using the resulting references in an unsafe manner.

Let's have another shot at our vector-backed tree:

struct Tree<T> {
    inner: Vec<Node<T>>,
}

impl<T> Tree<T> {
    fn index_mut(&mut self, ix: usize) -> &mut Node<T> {
        &mut self.inner[ix]
    }
}

struct Node<T> {
    val: T,
    left_child: Option<usize>,
    right_child: Option<usize>,
}

impl<T> Node<T> {
    fn new(val: T) -> Self {
        Self {
            val,
            left_child: None,
            right_child: None,
        }
    }
}

And confirm that it works as intended:

let mut tree = Tree {
    inner: vec![
        Node::new(1), Node::new(1), Node::new(2),
        Node::new(3), Node::new(5), Node::new(8)
    ],
};
// create two coexistent indices, no problem
let ix1 = 1;
let ix2 = 4;

// create a mutable reference, use it to update an internal value,
// and then immediately drop the reference
tree.index_mut(ix1).left_child = Some(5);

// create a second mutable reference, which is fine since the first
// one was dropped already
let my_mut_ref = tree.index_mut(ix2);

// the compiler won't allow this, since there is already a live mutable
// reference into the tree (my_mut_ref) that is still used below
let another_mut_ref = tree.index_mut(ix1);

my_mut_ref.val = 32;

That's great, but there is one remaining problem: since we are able to create arbitrary indices, we can also create indices to nodes that don't exist in the backing vector. We therefore need to check whether the provided index is valid every time we upgrade one to a reference. This is called bounds checking, and it is a form of runtime check. It is done implicitly and automatically whenever a vector is indexed through the square bracket notation. Its overhead is much, much lower than that of reference counting and interior mutability checks. Still, there are situations in which even this small overhead can be eliminated by exploiting Rust's type system to guarantee index validity.
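
As a quick standalone illustration of that check (not specific to our tree), square bracket indexing panics when the index is out of bounds, while the get method makes the same check explicit by returning an Option:

let v = vec![1, 1, 2, 3, 5, 8];

// explicit check: an out-of-bounds index yields None instead of panicking
assert!(v.get(10).is_none());

// implicit check: uncommenting this line would panic with "index out of bounds"
// let _ = v[10];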

modules and opaque types

One way to guarantee that indices are valid for the lifetime of the tree is to only create indices when an item is inserted and to make sure that items are never removed. Luckily, the Rust module system provides a way to implement this. We can introduce a new index type whose constructor is private to its module. Outside of that module, it will then be impossible to create a value of this index type without resorting to unsafe.

mod our_tree {
    /// Zero cost wrapper around usize.
    pub struct TreeIndex(usize);

    pub struct Tree<T> {
        inner: Vec<Node<T>>,
    }

    impl<T> Tree<T> {
        // Index function. Note that it takes a `TreeIndex` now instead of
        // a `usize`.
        pub fn index_mut(&mut self, ix: TreeIndex) -> &mut Node<T> {
            &mut self.inner[ix.0]
        }

        // Inserts a value into the tree and returns its index.
        pub fn insert(&mut self, node: Node<T>) -> TreeIndex {
            let len = self.inner.len();
            self.inner.push(node);
            TreeIndex(len)
        }

        // We also need a function creating new trees since this type
        // is now also in a module.
        pub fn new() -> Self {
            Self {
                inner: Vec::new(),
            }
        }
    }

    pub struct Node<T> {
        val: T,
        left_child: Option<TreeIndex>,
        right_child: Option<TreeIndex>,
    }

    impl<T> Node<T> {
        pub fn new(val: T) -> Self {
            Self {
                val,
                left_child: None,
                right_child: None,
            }
        }
        
        pub fn get_val(&self) -> &T {
            &self.val
        }
    }
}

In the above implementation, the only way to safely construct a TreeIndex is to call insert on a tree, which returns that value's index. Since the tree does not have any deletion methods, this index will correspond to a valid node for the tree's entire existence.

let mut my_tree = our_tree::Tree::new();

// get an index from an insertion
let my_ix: our_tree::TreeIndex = my_tree.insert(our_tree::Node::new(1i32));

// retrieve value using index
assert!(1i32 == *my_tree.index_mut(my_ix).get_val());

// try creating an index ourselves - doesn't work!
let bad_ix = our_tree::TreeIndex(3);

The compiler will throw the following error, preventing us from creating an invalid index:

error[E0603]: tuple struct `TreeIndex` is private
   |
   |     let bad_ix = our_tree::TreeIndex(3);
   |                            ^^^^^^^^^

So that means we're done, right? We can elide the check on the validity of indices whenever retrieving values, since we have convinced ourselves that they will always be valid! Not quite yet, unfortunately. While it is true that indices will always be valid for the tree that created them, the same is not true for other trees! For example, we could do this:

let mut empty_tree = our_tree::Tree::new();
let mut non_empty_tree = our_tree::Tree::new();
let ix = non_empty_tree.insert(our_tree::Node::new("hello world"));

// try to retrieve a value from `empty_tree` using `non_empty_tree`'s
// index!
let _: &&str = empty_tree.index_mut(ix).get_val();

This compiles fine since all the types check out and no borrowing rules are violated, but executing the code results in a panic!

Thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', /rustc/2aa4c46cfdd726e97360c2734835aa3515e8c858/src/libcore/slice/mod.rs:2461:14

The validity check that trips this panic is still very much essential to guarantee memory safety. Having an append-only datastructure together with insertion-generated indices is apparently not enough. Somehow we need to encode, for every index, which tree originally created it, and then enforce that it can only be used to index nodes of that specific tree. Is there a way to do this in a way that is enforced by the type checker and adds no runtime overhead?

no two closures are alike

This is where things get tricky. It turns out that we can, but we have to rely on some peculiarities of Rust's type system. In particular, we'll exploit the fact that no two closures are considered identical; no two distinct functions even have the same type! This is somewhat at odds with most people's intuition, as it conflicts with the very natural function extensionality principle, which states that two mathematical functions are identical whenever they have the same signature and agree on all inputs. Note that not even the concepts of signature and function type align here.

Consider, for example, the following snippet:

fn compare_funcs_on_zero<F>(f1: F, f2: F) -> bool
    where F: Fn(i32) -> i32
{
    f1(0) == f2(0)
}

compare_funcs_on_zero(|x| x + 17, |x| x + 17);

The Rust compiler won't compile this, even though the closures are semantically completely identical.

error[E0308]: mismatched types
   |
7  | compare_funcs_on_zero(|x| x + 17, |x| x + 17);
   |                                   ^^^^^^^^^^ expected closure, found a different closure
   |
   = note: expected type `[closure@src/main.rs:7:23: 7:33]`
              found type `[closure@src/main.rs:7:35: 7:45]`
   = note: no two closures, even if identical, have the same type
   = help: consider boxing your closure and/or using it as a trait object

Interesting, but it's not immediately clear how we can exploit this to solve our index problem. To see how, consider the possibility of associating a closure with a datatype. That datatype would become generic with respect to that closure type, just as it would with any other type. The only difference is that a non-capturing closure carries no data, so nothing is actually stored inside the type; the function just gets associated with it.

This generic type, for example,

struct MyDataType<F> {
    func: F,
    val: i32,
}

impl<F> MyDataType<F>
    where F: Fn(i32) -> i32
{
    fn get_fn_val(&self) -> i32 {
        (self.func)(self.val)
    }
}

could be considered as a simple data type, with an associated function func_F for every instantiation of MyDataType with function type F:

struct MyDataType {
    val: i32,
}

fn func_F(val: i32) -> i32;

impl MyDataType {
    fn get_fn_val(&self) -> i32 {
        func_F(self.val)
    }
}

In other words, the functions that look like they are part of the datatype are actually stored inside the program itself. In this sense, there is no memory overhead to adding a generic closure to a datatype.
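
As a small sanity check of that claim (not from the original post, but using the MyDataType defined above): a non-capturing closure is a zero-sized type, so the generic closure field adds no bytes to the struct's layout.

let with_closure = MyDataType { func: |x: i32| x + 17, val: 42 };

// the closure field occupies zero bytes, so the struct is exactly as large
// as its i32 payload
assert!(std::mem::size_of_val(&with_closure) == std::mem::size_of::<i32>());
assert!(with_closure.get_fn_val() == 59);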

The big trick is now to associate an arbitrary closure with every tree and have the tree generate indices tagged with its own associated closure type. If we then only accept indices with the right associated closure type to access our vector, we can guarantee that only indices created by the tree itself can be used on that tree. This sounds really abstract, so let's make it more concrete.

First, we add closures to our tree and indices:

mod our_tree {
    pub struct Tree<T, F> {
        inner: Vec<Node<T, F>>,
        _f: F,
    }

    pub struct TreeIndex<F> {
        inner: usize,
        _f: F,
    }
…

We're never actually interested in calling these closures, and to signal that, we can make them act on an empty type. Since there is no way to create a value of an empty type, there is no way to call the closure.

    pub enum Empty {}

    impl<T, F> Tree<T, F>
        where F: Copy + Fn(Empty) -> (),
    {
        // Note that closure type F of the index is equal to the closure
        // type of this struct
        pub fn index_mut(&mut self, ix: TreeIndex<F>) -> &mut Node<T, F> {
            &mut self.inner[ix.inner]
        }

        // Inserts a value into the tree and returns an index with the
        // same closure type as this struct
        pub fn insert(&mut self, node: Node<T, F>) -> TreeIndex<F> {
            let len = self.inner.len();
            self.inner.push(node);
            TreeIndex { inner: len, _f: self._f }
        }
        
        pub fn new(_f: F) -> Self {
            let inner = Vec::new();
            Self { inner, _f }
        }
    }
    
    pub struct Node<T, F> {
        val: T,
        left_child: Option<TreeIndex<F>>,
        right_child: Option<TreeIndex<F>>,
    }

    impl<T, F> Node<T, F> {
        pub fn new(val: T) -> Self {
            Self {
                val,
                left_child: None,
                right_child: None,
            }
        }
        
        pub fn get_val(&self) -> &T {
            &self.val
        }
    }
}

And that should do it. Let's try it out:

// create a binary tree from a dummy closure
let mut tree1 = our_tree::Tree::new(|_|());
// get an index with the same associated dummy closure
let ix1 = tree1.insert(our_tree::Node::new(5i32));
// do lookup with that index
assert!(5 == *tree1.index_mut(ix1).get_val());

// and do the same for another tree
let mut tree2 = our_tree::Tree::new(|_|());
let ix2 = tree2.insert(our_tree::Node::new(3i32));
assert!(3 == *tree2.index_mut(ix2).get_val());

// try using the index of one tree for the other
tree2.index_mut(ix1);

As we hoped, this throws an error for the last line:

error[E0308]: mismatched types
   |
73 | tree2.index_mut(ix1);
   |                 ^^^ expected closure, found a different closure
   |
   = note: expected type `our_tree::TreeIndex<[closure@src/main.rs:68:45: 68:50]>`
              found type `our_tree::TreeIndex<[closure@src/main.rs:61:45: 61:50]>`
   = note: no two closures, even if identical, have the same type
   = help: consider boxing your closure and/or using it as a trait object

The message is super cryptic and would probably thoroughly confuse you if you stumbled upon it without going through the above, but it's exactly what we wanted! This means that we can finally elide the index validity check without fear of introducing memory unsafety:

        pub fn index_mut(&mut self, ix: TreeIndex<F>) -> &mut Node<T, F> {
            unsafe {
                self.inner.get_unchecked_mut(ix.inner)
            }
        }

An unsafe block is used internally, but that's only necessary because we can't prove our invariants to the compiler in the language of Rust. The method itself should be safe because of the invariants that we established earlier: a TreeIndex<F> can only be obtained by calling insert on a tree with the same closure type F, the tree never removes nodes, so every index it hands out stays within bounds, and since no two closures have the same type, an index can only ever be used on the tree that created it.

Finally, recall that the closures are only used to enforce the relation between trees and indices and are never actually called during program execution. It's even likely that the optimizer recognizes that they are never called and removes them from the resulting binary altogether. Thus, they introduce no runtime overhead, constituting a zero-cost abstraction.

final tally

What does all this work net us in terms of performance? Surprisingly little. It saves us a bounds check, and thus a branch, on every index access. But modern CPUs are masters of branch prediction: they speculate on the outcome of a check and continue executing code as if that result were already known, and there is hardly a branch as predictable as a bounds check. Performance improvements will therefore be in the low single-digit percentages.

On the other hand, there is a substantial increase in complexity, as well as an influx of other problems. For example, because of the additional closure type parameter, it's no longer possible to store an arbitrary number of trees in other datastructures. And when we add a fixed number of trees to a datastructure, that datastructure itself becomes generic with respect to these closure types, quickly creating an unmaintainable mess.

So while it's definitely cool to see that our concept works, let us avoid the classic scientist pitfall. After being so preoccupied with whether or not we could, we should probably stop to think if we should.

errata update (2019-03-22)

After submitting this post to the Rust subreddit, the competent people there found a vulnerability in the above construction. While the logic should be sound when two trees are passed distinct closures, it fails whenever they are passed the same one. For example, this snippet compiles even though it is unsound:

let closure = |_| ();

let mut tree1 = our_tree::Tree::new(closure);
let ix1 = tree1.insert(our_tree::Node::new(5i32));
let mut tree2 = our_tree::Tree::<i32, _>::new(closure);
// try using the index of one tree for the other
tree2.index_mut(ix1);

The problem is that closures can be cloneable or even copyable, which means that a single closure can be used to construct multiple trees, which will then have interchangeable indices. To solve this, we'd need a way to express in the type system that a closure is not cloneable. Another problem, pointed out by reddit user /u/matthieum, is that Rust considers closures created at the same source location to have the same type. Therefore, even the following would work.

fn creator<T>() -> Tree<T, impl Copy + Fn(Empty) -> ()> {
    Tree::new(|_| ())
}

let mut tree1 = creator();
let ix1 = tree1.insert(Node::new(5i32));

let mut tree2 = creator::<i32>();
// try using the index of one tree for the other
tree2.index_mut(ix1);

If we were truly determined to make this approach work, one of the last feasible options seems to be restricting tree instantiation to a creator function which takes an arbitrary mutable reference to some value and captures it in the closure we pass to the tree.

fn creator<'a, T>(x: &'a mut ()) -> our_tree::Tree<T, impl Fn(Empty) -> () + 'a> {
    our_tree::Tree::new(move |_| *x)
}

This would guarantee that each closure has a separate type. The price paid is even worse ergonomics, since we wouldn't just be hauling around a closure type, but also a reference to a dummy value and all the borrow checker restrictions that come with it.

Fortunately, there are more viable alternatives. A few years prior, /u/Gankro wrote a post demonstrating a similar approach that uses lifetimes instead of closures to encode index validity, and it seems to work better. That proof of concept has since been turned into the indexing crate.