blob: d17c4591b2e4040034f8d6dec10c2eaf2f301694 [file] [log] [blame]
# libpas - Phil's Awesome System
Libpas is a configurable memory allocator toolkit designed to enable adoption
of isoheaps. Currently, libpas's main client is WebKit's bmalloc project, where
it's used as a replacement for all of bmalloc's functionality (bmalloc::api,
IsoHeap<>, and Gigacage). Libpas's jit_heap API is also used by WebKit's
ExecutableAllocator.
# How To Build And Test
This section describes how to build libpas standalone. You'll be doing this a
lot when making changes to libpas. It's wise to run libpas's tests before
trying out your change in any larger system (like WebKit) since libpas tests
are great at catching bugs. If libpas passes its own tests then basic browsing
will seem to work. In production, libpas gets built as part of some other
project (like bmalloc), which just pulls all of libpas's files into that
project's build system.
Build and test:
./build_and_test.sh
Just build:
./build.sh
Just test:
./test.sh
Clean:
./clean.sh
On my M1 machine, I usually do this:
./build_and_test.sh -a arm64e
This avoids building fat arm64+arm64e binaries.
The libpas build will, by default, build (and test, if you're use
`build_and_test.sh` or `test.sh`) both a testing variant and a default variant.
The testing variant has testing-only assertions. Say you're doing some
speed tests and you just want to build the default variant:
./build.sh -v default
By default, libpas builds Release, but you can change that:
./build.sh -c Debug
All of the tools (`build.sh`, `test.sh`, `build_and_test.sh`, and `clean.sh`)
take the same options (`-h`, `-c <config>`, `-s <sdk>`, `-a <arch>`,
`-v <variant>`, `-t <target>`, and `-p <port>`).
# Synopsis
Libpas is a toolkit for creating memory allocators. When built as a dylib, it
exposes a bunch of different memory allocators, with different configurations
ranging from sensible to deranged, without overriding malloc and friends. For
production use, libpas is meant to be built as part of some larger malloc
project; for example, when libpas sees the PAS_BMALLOC macro, it will provide
everything that WebKit's bmalloc library needs to create an allocator.
Libpas's toolkit of allocators has three building blocks:
- The segregated heap. This implements something like simple segregated
storage. Roughly speaking, size classes hold onto collections of pages, and
each page contains just objects of that size. The segregated heap also has
some support for pages having multiple size classes, but this is intended as
a "cold start" mode to reduce internal fragmentation. The segregated heap
works great as an isoheap implementation since libpas makes it easy and
efficient to create _lots_ of heaps, and heaps have special optimizations for
having only one size class. Segregated heaps are also super fast. They beat
the original bmalloc at object churn performance. A typical segregated
allocation operation does not need to use any atomic operations or fences
and only relies on a handful of loads, some addition and possibly bit math,
and a couple stores. Ditto for a typical segregated deallocation operation.
- The bitfit heap. This implements first-fit allocation using bit-per-minalign-
granule. Bitfit heaps are super space-efficient but not nearly as fast as
segregated heaps. Bitfit heaps have an upper bound on the sizes they can
handle, but can be configured to allocate objects up to hundreds of KB.
Bitfit heaps are not appropriate for isoheaps. Bitfit is used mostly for
"marge" allocations (larger than segregated's medium but smaller than large)
and for the jit_heap.
- The large heap. This implements a cartesian-tree-based first-fit allocator
for arbitrary sized objects. Large heaps are appropriate for isoheaps; for
example they remember the type of every free object in memory and they
remember where the original object boundaries are.
Each of the building blocks can be configured in a bunch of ways. Libpas uses
a C template programming style so that the segregated and bitfit heaps can be
configured for different page sizes, minaligns, security-performance
trade-offs, and so on.
All of the heaps are able to participate in physical page sharing. This means
that anytime any system page of memory managed by the heap becomes totally
empty, it becomes eligible for being returned to the OS via decommit. Libpas's
decommit strategy is particularly well tuned so as to compensate for the
inherent memory overheads of isoheaping. Libpas achieves much better memory
usage than bmalloc because it returns pages sooner than bmalloc would have, and
it does so with neglibile performance overheads. For example, libpas can be
configured to return a page to the OS anytime it has been free for 300ms.
Libpas is a heavy user of fine-grained locking and intricate lock dancing. Some
data structures will be protected by any of a collection of different locks,
and lock acquisition involves getting the lock, checking if you got the right
lock, and possibly relooping. Libpas's algorithms are designed around:
- Reducing the likelihood that any long-running operation would want to hold a
lock that any frequently-running operation would ever need. For example,
decommitting a page requires holding a lock and making a syscall, but the
lock that gets held is one that has a low probability of ever being needed
during any other operation. That "low probability" is not guaranteed but it
is made so thanks to lots of different tricks.
- Reducing the likelihood that two operations that run in a row that both grab
locks would need to grab different locks. Libpas has a bunch of tricks to
make it so that data structures that get accessed together dynamically adapt
to using the same locks to reduce the total number of lock acquisitions.
Finally, libpas is designed for having lots of tests, including unit and mock
tests. While libpas itself is written in C (to reduce friction of adoptiong it
in either C or C++ codebases), the test suite is written in C++. Ideally
everything that libpas does has a test.