
What Is in a Rust Allocator?

Recently I was reading the documentation for jemalloc and wondered what actually goes into a custom memory allocator for Rust. To add jemalloc, or any other custom allocator, to a Rust application, one uses the `#[global_allocator]` macro. Conveniently, the documentation for `GlobalAlloc` actually has a full example allocator for us to explore. It starts out as follows:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::AtomicUsize;

const ARENA_SIZE: usize = 128 * 1024;
const MAX_SUPPORTED_ALIGN: usize = 4096;

#[repr(C, align(4096))] // 4096 == MAX_SUPPORTED_ALIGN
struct SimpleAllocator {
    arena: UnsafeCell<[u8; ARENA_SIZE]>,
    remaining: AtomicUsize, // we allocate from the top, counting down
}
```

We begin by defining a constant arena size of 128 kB, which will be statically allocated at startup. Arenas are often faster because they make fewer memory allocation requests to the operating system and also reduce memory fragmentation, at the cost of memory overhead due to the unused space in the arena. In almost every real-world application, a memory allocator would use many smaller arenas as opposed to one large one, and also have the capability to add additional arenas at runtime as the existing ones fill up.

Next we define the representation via `repr`. The representation describes the layout of this type in memory. The default representation is Rust, which means the Rust compiler can do whatever it wants for optimization purposes. `repr(C)` lays out the memory like C/C++ do, which is useful for interoperability with other languages. It is also somewhat more predictable, as those layouts do not change anymore. The Rust reference has a detailed breakdown of the different layout options.

In this case we also define a memory alignment, which means we are saying that the memory address of the allocator should be a multiple of 4096. Rust will also make sure that each of the fields will be aligned in this way, and add padding in between the fields if necessary. We do this because clients will later request memory with a certain memory alignment themselves, and aligning the overall array means any aligned offset inside the array will also be aligned automatically.
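As a quick illustration (my own sketch, not part of the original example, using a hypothetical `Aligned` type), we can check the effect of such an alignment attribute:

```rust
use std::mem;

// Hypothetical toy struct, only to demonstrate the attribute; the real
// SimpleAllocator uses the same #[repr(C, align(4096))] annotation.
#[repr(C, align(4096))]
struct Aligned {
    arena: [u8; 128],
}

fn main() {
    // The type's alignment is what we asked for...
    assert_eq!(mem::align_of::<Aligned>(), 4096);
    // ...so any instance's address is a multiple of 4096.
    let a = Aligned { arena: [0; 128] };
    assert_eq!(&a as *const Aligned as usize % 4096, 0);
}
```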

The struct itself has two fields. The arena is a fixed-size byte array, wrapped in an `UnsafeCell`. `UnsafeCell` is the base for all the regular, safe cells like `Cell` or `RefCell`, all of which enable interior mutability. This is required because `GlobalAlloc`'s `alloc` method takes a shared `&self` reference, not a mutable `&mut self` one, and returns a `*mut u8` pointer into the interior array. `UnsafeCell` gives us the required escape hatch to return a mutable pointer into a struct we only have a shared reference to.

The second field `remaining` is an offset from the start of the arena, effectively acting as a pointer into the arena. Because we are counting down "from the top," it starts out with the size of the arena, going down as memory is allocated. The benefit of this approach is that checking if we have enough memory available to satisfy a request is as simple as checking if `remaining` is at least as large as the requested amount of memory.¹

`remaining` is an `AtomicUsize`. Atomics are used to signal to both the compiler and the CPU that the ordering of memory accesses is important to ensure correctness. If we were using a plain `usize`, the compiler or the CPU could decide that it would be faster to reorder reads or writes, compromising correctness. Rust inherits the C++ atomics memory model, which comes with a few ways of describing access restrictions, which we will dig into in a bit. The Rustonomicon also has a whole section on atomics with more details.²
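To get a feel for the API (this snippet is mine, not the post's), note that an `AtomicUsize` cannot be read or written like a plain integer; every access goes through a method that takes a memory ordering:

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

// Atomically subtract `size` from the counter, returning the previous
// value. fetch_sub is a simpler cousin of the fetch_update used later
// by the allocator.
fn take(counter: &AtomicUsize, size: usize) -> usize {
    counter.fetch_sub(size, SeqCst)
}

fn main() {
    let remaining = AtomicUsize::new(1024);
    assert_eq!(take(&remaining, 128), 1024); // returns the old value
    assert_eq!(remaining.load(SeqCst), 896); // new value after subtraction
}
```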

```rust
#[global_allocator]
static ALLOCATOR: SimpleAllocator = SimpleAllocator {
    arena: UnsafeCell::new([0x55; ARENA_SIZE]),
    remaining: AtomicUsize::new(ARENA_SIZE),
};
```

This is the bit that actually selects the globally used allocator, using the aforementioned `#[global_allocator]` macro. It is simply creating a static instance of the allocator, initializing the arena with `0x55`.³

```rust
unsafe impl Sync for SimpleAllocator {}
```

We then implement the `Sync` marker trait for our allocator, which does not have any methods, but simply tells the compiler that it is safe to share references to the allocator across thread boundaries. This is an unsafe trait, and it requires us to make sure that the resulting behaviour is actually sound. We will see below how we actually accomplish this.

Now let us look at the implementation for `GlobalAlloc`, again an unsafe trait.

```rust
unsafe impl GlobalAlloc for SimpleAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size();
        let align = layout.align();
```

The first required method is `alloc`, which requests some memory from the allocator according to a `Layout`, expecting a pointer to that memory. If there is no memory matching the request available, we return a null pointer. `Layout` has two methods that are interesting to us, `size` and `align`, which return the desired size and alignment respectively. The size is just a number of bytes, and the alignment works just the same as we noted above when looking at `repr`.
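As a small aside (my own sketch, not part of the original example), `Layout` values can be inspected and constructed directly, and the constructor enforces the alignment contract:

```rust
use std::alloc::Layout;

fn main() {
    // The layout of a concrete type carries its size and alignment.
    let layout = Layout::new::<u32>();
    assert_eq!(layout.size(), 4);
    assert_eq!(layout.align(), 4);

    // Hand-built layouts must have a nonzero power-of-two alignment.
    assert!(Layout::from_size_align(24, 16).is_ok());
    assert!(Layout::from_size_align(24, 3).is_err());
}
```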

```rust
        // `Layout` contract forbids making a `Layout` with align=0, or align not power of 2.
        // So we can safely use a mask to ensure alignment without worrying about UB.
        let align_mask_to_round_down = !(align - 1);
```

This comment mentions a contract, a requirement that is not captured outside of documentation. We can assume that alignment is a non-zero power of two. This is important for the next step. We set up a bitmask that we will use for ensuring the correct alignment.

Alignment is a power of two by contract, which means it is a single set bit. By decrementing it by one, we transform e.g. 8 (`00001000`) to 7 (`00000111`), and then invert that to set all high bits up to and including the original one, `11111000`. We will then `&` this onto our memory address later, which acts as a round down by setting all the lower bits to zero. Rounding down works because we made sure to align the overall array, and are allocating from the top down, so that rounding down introduces a gap "above" the newly allocated memory. If we were counting from the bottom up, we would be rounding up instead to introduce a gap "below."
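The bit trick can be checked in isolation; this is my own sketch of the rounding, assuming (per the contract) that `align` is a nonzero power of two:

```rust
// Round `addr` down to the nearest multiple of `align`.
// Assumes align is a nonzero power of two, as the Layout contract requires.
fn round_down(addr: usize, align: usize) -> usize {
    let align_mask_to_round_down = !(align - 1);
    addr & align_mask_to_round_down
}

fn main() {
    assert_eq!(round_down(21, 8), 16); // 0b10101 & 0b…11000 == 0b10000
    assert_eq!(round_down(16, 8), 16); // already-aligned addresses pass through
    assert_eq!(round_down(4095, 4096), 0);
}
```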

```rust
        if align > MAX_SUPPORTED_ALIGN {
            return null_mut();
        }
```

At this point we just check if the client requested an alignment higher than we have chosen to support, in which case we return a null pointer to signal that we cannot fulfill the request. Again, this is dependent on the alignment of the arena array.

```rust
        let mut allocated = 0;
        if self
            .remaining
            .fetch_update(SeqCst, SeqCst, |mut remaining| {
                if size > remaining {
                    return None;
                }
                remaining -= size;
                remaining &= align_mask_to_round_down;
                allocated = remaining;
                Some(remaining)
            })
            .is_err()
        {
            return null_mut();
        };
```

This is where the heavy lifting of allocation is done, and there is a lot to unpack here. We recall that `remaining` is an `AtomicUsize` describing an offset from the start of the arena, counting down as we allocate memory. Atomics rely on the underlying platform to maintain the correct order of operations, which is why they cannot be read or set like regular variables, but have a special set of methods to access them. In this case we are using the `fetch_update` method, which is a form of read-modify-write instruction. As mentioned above, if we were not using an atomic, either the compiler or the CPU itself could decide to reorder the different reads and writes to the counter for performance optimizations, compromising correctness.

`remaining` starts out pointing at the start of the previous allocation, if any, so we just subtract the size of the new allocation. Note that this is also where we use our bitmask to round down the address to get the right alignment.

The function passed to the `fetch_update` method returns `Some(new_value)`, in this case `Some(remaining)`, while `fetch_update` itself returns a `Result` containing the previous value: `Ok` if the update succeeded, `Err` if the closure returned `None`. Incidentally, the passed function might also be called multiple times if there is contention from other threads. The mutable variable `allocated` in the outer scope allows us to extract the new value of `remaining` without re-fetching `remaining` and risking a data race.
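The `Ok`/`Err` behaviour can be seen in a stripped-down version of the same pattern (my own sketch, omitting the alignment mask):

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

// Try to carve `size` bytes off the counter. Returns the new offset on
// success, or None when there is not enough space left. When the closure
// returns None, fetch_update returns Err and leaves the value untouched.
fn try_allocate(remaining: &AtomicUsize, size: usize) -> Option<usize> {
    let mut allocated = 0;
    remaining
        .fetch_update(SeqCst, SeqCst, |mut rem| {
            if size > rem {
                return None; // not enough space
            }
            rem -= size;
            allocated = rem;
            Some(rem)
        })
        .ok()
        .map(|_| allocated)
}

fn main() {
    let remaining = AtomicUsize::new(100);
    assert_eq!(try_allocate(&remaining, 30), Some(70));
    assert_eq!(try_allocate(&remaining, 100), None); // only 70 left
    assert_eq!(remaining.load(SeqCst), 70); // failed update changed nothing
}
```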

`fetch_update` also takes two ordering arguments, which in this case are both `SeqCst`, short for sequentially consistent. These are atomic memory orderings inherited from C++'s memory model, and they specify the level of synchronization required when accessing this memory. There are a few different levels, of which this is the strictest, guaranteeing that no reads and writes are reordered such that they switch order with the access to `remaining`. This is important to avoid data races that could lead to overlapping allocations or over-committing memory that is not actually available.

```rust
        self.arena.get().cast::<u8>().add(allocated)
    }
```

We get a pointer to the start of the arena, add the offset of the newly allocated region from the start of the arena, and then return it as a pointer to that same region.

This is the reason for wrapping the arena in an `UnsafeCell`: it allows us to get a mutable pointer to its contents from a shared reference. We cannot use `std::ptr::addr_of_mut!` or just coerce a reference to the right type because we only have a shared reference to `self`, and thus cannot derive a mutable reference without interior mutability. We could use a `RefCell` and its `as_ptr` method though.⁴
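A minimal sketch of that escape hatch (mine, with a hypothetical miniature `Arena` type), getting a `*mut u8` out of a shared reference:

```rust
use std::cell::UnsafeCell;

// Hypothetical miniature arena, just to show UnsafeCell::get.
struct Arena {
    bytes: UnsafeCell<[u8; 16]>,
}

impl Arena {
    // &self is a shared reference, yet we can hand out a mutable pointer.
    fn ptr_at(&self, offset: usize) -> *mut u8 {
        // Creating the pointer is fine; dereferencing it is the caller's
        // responsibility, hence the unsafe block around `add`.
        unsafe { self.bytes.get().cast::<u8>().add(offset) }
    }
}

fn main() {
    let arena = Arena { bytes: UnsafeCell::new([0; 16]) };
    unsafe {
        *arena.ptr_at(4) = 0xAA;
        assert_eq!(*arena.ptr_at(4), 0xAA);
    }
}
```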

```rust
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {}
```

Releasing memory in this case is a no-op, which means that memory, once allocated, is gone forever, even after the contained values have been dropped. In a real-world application, we would probably want to keep track of memory that has become available again, so that we could reuse it for new allocations.

And that is all there is to it. Of course this is a very simple allocator, but it is actually functional. It does not dynamically request memory from the operating system, but that is a common pattern in embedded development, where there is no operating system. Still, in those scenarios we would probably want to reuse freed memory, which would require keeping track of which regions in the arena are currently in use and which are not, if we can live with the memory overhead required to store that information.


source: https://blog.sulami.xyz/posts/what-is-in-a-rust-allocator/