Raptor
Raptor is a domain-specific MLIR compiler for neural networks (ONNX format)
targeting in-memory computing / processing-in-memory (PIM) architectures.
It progressively lowers the ONNX-MLIR representation through a set of custom
MLIR dialects down to target-specific artifacts (currently JSON code for the
pimsim-nn simulator).
Overview
PIM architectures perform most of the computation directly in memory.
Raptor's first supported target is pimsim-nn, which simulates a chip with:
- a shared host memory,
- a number of cores that do most of the computation directly in their memory (vector ops, vmm/mvm on ReRAM crossbars),
- no branching instructions (branchless architecture) and no hardware loop support — any repeated work (e.g. convolutions) must be unrolled into explicit per-iteration instructions.
Because of this, the number of emitted instructions explodes quickly, and the compiler must optimize aggressively at every stage to keep compilation tractable.
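To make that cost concrete, here is a minimal back-of-the-envelope sketch, not Raptor code: the function name, the shape constants, and the counting model are illustrative assumptions, but they show why full unrolling makes instruction counts scale with the entire iteration space of a convolution.

```cpp
#include <cstdint>
#include <iostream>

// Rough instruction-count model for a fully unrolled 2D convolution on a
// branchless target with no hardware loops: every step of the iteration
// space becomes explicit emitted work.
int64_t unrolledConvSteps(int64_t outH, int64_t outW, int64_t outC,
                          int64_t inC, int64_t kH, int64_t kW) {
  // One step per output element per (input channel, kernel position). A real
  // lowering batches many steps into crossbar mvm ops, but the emitted
  // instruction count still grows with the whole iteration space.
  return outH * outW * outC * inC * kH * kW;
}

int main() {
  // A single modest 3x3 layer already yields on the order of 1e8 steps.
  std::cout << unrolledConvSteps(56, 56, 64, 64, 3, 3) << "\n"; // 115605504
}
```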
A second target, PulPim, is planned for an accelerator with RISC-V cores
each carrying its own in-memory computing unit and crossbars. It will live in
a dedicated dialect (future work).
Targets and simulators
pimsim-nn (under backend-simulators/pim/pimsim-nn) is used for
performance estimates (latency, energy), but does not functionally execute
the JSON code it consumes. To validate the numerical correctness of the JSON
code produced by Raptor (or, for comparison, by the pimcomp compiler), we use
a Rust simulator we maintain in-tree at
backend-simulators/pim/pim-simulator.
Compilation pipeline
The PIM-related sources live under src/PIM and the tests under test/PIM.
When working on this codebase, most changes should stay confined to those
trees (you only need to look outside, e.g. at onnx-mlir or llvm, for
framework-level details).
High-level lowering flow:
```
ONNX-MLIR ──► Spatial ──► Pim (tensor) ──► Pim (bufferized) ──► PIM JSON
```
- ONNX → Spatial (`src/PIM/Conversion/ONNXToSpatial`). Lowers ONNX ops into the `spat` dialect (`src/PIM/Dialect/Spatial`). Spatial models a high-level spatial in-memory accelerator: vmm/mvm operations are accelerated by storing a constant RHS matrix into a crossbar. Crossbars cannot be re-programmed during execution, have a limited fixed size, and only a limited number of them is available per core (see the first sketch after this list). Conversion patterns are split by op family under `Conversion/ONNXToSpatial/Patterns/{Math,NN,Tensor}` (Conv, Gemm, MatMul, Elementwise, ReduceMean, Pool, Relu, Sigmoid, Softmax, Concat, Gather, Reshape, Resize, Split).
- Spatial → Pim (`src/PIM/Conversion/SpatialToPim`). Lowers Spatial to the `pim` dialect (`src/PIM/Dialect/Pim`), which materializes PIM cores (`pim.core`), inter-core communication (`pim.send`/`pim.receive`), halts, and crossbar-level operations.
- Merge compute nodes (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`). A DCP-inspired heuristic (Dynamic Critical Path; see the original DCP scheduling paper by Kwok & Ahmad, IEEE TPDS 1996) that coarsens the virtual node graph and decides how to group compute nodes onto cores. Our implementation is only DCP-inspired: it is a heuristic with assumptions that differ from the paper's (a different cost model, constraints from crossbar capacity and core resources, and a windowed coarsening loop instead of full-graph reprioritization; see the second sketch after this list). The `dcp-critical-window-size` option controls how many lowest-slack virtual nodes each coarsening iteration considers (0 = legacy full-graph analysis). Related sources: `DCPGraph/DCPAnalysis.cpp`, `Graph.cpp`/`.hpp`, `MergeComputeNodesPass.cpp`.
- Bufferization (`src/PIM/Dialect/Pim/Transforms/Bufferization`). Converts tensor-semantics PIM IR into memref-semantics PIM IR using the standard MLIR `BufferizableOpInterface` machinery (`OpBufferizationInterfaces.*`, `PimBufferization.td`).
- PIM code generation (`src/PIM/Pass/PimCodegen`):
  - `HostConstantFolding`: folds host-side constants.
  - `MaterializeHostConstantsPass`: materializes the remaining host constants for emission.
  - `VerificationPass`: checks invariants before emission.
  - `EmitPimJsonPass`: emits the final PIM JSON consumed by `pimsim-nn` and `pim-simulator`.
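First sketch: the crossbar constraints the ONNX → Spatial step works under. This is a hedged illustration only; `fitsOnOneCore` and its square-tiling scheme are assumptions for exposition, not Raptor's actual placement logic.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical capacity check for mapping a constant RHS matrix onto
// fixed-size crossbars: the matrix is tiled into crossbarSize x crossbarSize
// blocks, and a core can host at most crossbarCount of them.
bool fitsOnOneCore(int64_t rows, int64_t cols,
                   int64_t crossbarSize, int64_t crossbarCount) {
  int64_t rowTiles = (rows + crossbarSize - 1) / crossbarSize; // ceiling div
  int64_t colTiles = (cols + crossbarSize - 1) / crossbarSize;
  return rowTiles * colTiles <= crossbarCount;
}

int main() {
  // A 4096x4096 Gemm weight tiles into 2x2 = 4 blocks at crossbar size 2048.
  std::cout << fitsOnOneCore(4096, 4096, 2048, 8) << "\n"; // 1: 4 <= 8
  std::cout << fitsOnOneCore(4096, 4096, 2048, 2) << "\n"; // 0: must split across cores
}
```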
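Second sketch: the windowed, DCP-inspired coarsening idea. Again a simplified stand-in, not the implementation in `MergeComputeNodesPass.cpp`: the `Node` fields, the capacity check, and the merge rule are assumptions, and the slack recomputation a real critical-path analysis would do between iterations is omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified stand-in for a virtual compute node: only what the sketch needs.
struct Node {
  double slack;  // latest start minus earliest start; low slack = critical
  int crossbars; // crossbars occupied by this node's vmm/mvm operations
};

// Stand-in for the real feasibility checks (crossbar capacity, core
// resources, dependence constraints, ...).
static bool canMerge(const Node &a, const Node &b, int crossbarsPerCore) {
  return a.crossbars + b.crossbars <= crossbarsPerCore;
}

// Windowed coarsening: each iteration only considers the `window` nodes with
// the lowest slack as merge seeds; window == 0 plays the role of the legacy
// full-graph analysis.
void coarsen(std::vector<Node> &graph, std::size_t window,
             int crossbarsPerCore) {
  bool changed = true;
  while (changed) {
    changed = false;
    std::sort(graph.begin(), graph.end(),
              [](const Node &a, const Node &b) { return a.slack < b.slack; });
    const std::size_t limit =
        window == 0 ? graph.size() : std::min(window, graph.size());
    for (std::size_t i = 0; i < limit && !changed; ++i)
      for (std::size_t j = i + 1; j < graph.size() && !changed; ++j)
        if (canMerge(graph[i], graph[j], crossbarsPerCore)) {
          graph[i].crossbars += graph[j].crossbars; // fuse onto one core
          graph.erase(graph.begin() + static_cast<std::ptrdiff_t>(j));
          changed = true; // slacks would be recomputed before the next pass
        }
  }
}
```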
Supporting pieces:
- `src/PIM/Compiler`: PIM-specific compiler options (crossbar size/count, core count, DCP window, experimental conv impl, concat error handling, …) and the `PimCodeGen` entry points.
- `src/PIM/Common`: shared utilities (`PimCommon`, `LabeledList`).
- `src/PIM/Pass`: auxiliary passes (`MessagePass`, `CountInstructionPass`) and the `PIMPasses.h` registry used by `PimAccelerator`.
- `src/PIM/PimAccelerator.{cpp,hpp}`: the accelerator entry point; registers dialects and passes and plugs Raptor into the ONNX-MLIR driver.
Key compiler options
Pass these on the `onnx-mlir` command line when compiling for PIM:
- `--maccel=PIM`: select the PIM accelerator.
- `--EmitSpatial` / `--EmitPim` / `--EmitPimBufferized` / `--EmitPimCodegen`: stop the pipeline at the requested stage (default: `EmitPimCodegen`).
- `--pim-only-codegen`: assume the input is already bufferized PIM IR and run only the codegen tail.
- `--crossbar-size=<N>` / `--crossbar-count=<N>`: crossbar dimensions and per-core crossbar count.
- `--core-count=<N>`: number of cores (`-1` picks the minimum).
- `--dcp-critical-window-size=<N>`: DCP coarsening window (0 = legacy full-graph analysis).
- `--use-experimental-conv-impl`: alternative convolution lowering.
- `--ignore-concat-error`: soft-fail a corner case in `ConcatOp`.
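For reference, options like these are typically declared with LLVM's `llvm::cl` support library. The snippet below is a hedged sketch of what such declarations look like; the actual definitions live in `src/PIM/Compiler`, and the variable names, defaults, and help strings here are assumptions.

```cpp
#include "llvm/Support/CommandLine.h"

// Sketch of llvm::cl option declarations; flag names match the list above,
// but defaults and descriptions are illustrative only.
static llvm::cl::opt<int> crossbarSize(
    "crossbar-size",
    llvm::cl::desc("Rows/columns of each crossbar"),
    llvm::cl::init(256));

static llvm::cl::opt<int> dcpCriticalWindowSize(
    "dcp-critical-window-size",
    llvm::cl::desc("Lowest-slack virtual nodes considered per coarsening "
                   "iteration (0 = legacy full-graph analysis)"),
    llvm::cl::init(0));
```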
Validation
Functional validation lives in validation/ and drives the Rust
pim-simulator to compare Raptor's output against a reference.
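The comparison is a numerical one. As a rough illustration of the kind of check involved (the real logic lives in the Rust `pim-simulator` and the `validation/` scripts; the tolerances below are assumptions, not the project's actual thresholds):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Elementwise tolerance check in the spirit of numpy's allclose.
bool allClose(const std::vector<float> &got, const std::vector<float> &ref,
              float rtol = 1e-3f, float atol = 1e-5f) {
  if (got.size() != ref.size())
    return false;
  for (std::size_t i = 0; i < got.size(); ++i)
    if (std::fabs(got[i] - ref[i]) > atol + rtol * std::fabs(ref[i]))
      return false;
  return true;
}
```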
Per-operation validation (from `validation/`):

```sh
validate.py \
  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
  --onnx-include-dir ../onnx-mlir/include
```
End-to-end network validation (example: first 4 layers of YOLOv11n):

```sh
validate.py \
  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
  --onnx-include-dir ../onnx-mlir/include \
  --operations-dir ./networks/yolo11n/depth_04 \
  --crossbar-size 2048
```
Available networks under `validation/networks/`: `vgg16`, `yolo11n`.
Available operations under `validation/operations/`: `add`, `conv`, `div`, `gather`, `gemm`, `gemv`, `mul`, `pool`, `reduce_mean`, `relu`, `resize`, `sigmoid`, `softmax`, `split`.
Rebuilding
Release build (fast):
```sh
cmake --build ./cmake-build-release --target onnx-mlir -j 30
```
A slower debug build is also available; configure it the same way but with
`-DCMAKE_BUILD_TYPE=Debug` (see the build instructions below).
Build
Protobuf
Use the following commands to install protobuf:
```sh
git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
cd protobuf
mkdir build
cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release
ninja
sudo ninja install
```
You can now remove the protobuf repo directory with:
```sh
cd ../..
rm -rf protobuf
```
MLIR
Follow the first part of the onnx-mlir build instructions to build MLIR.
Remember to set `-DCMAKE_BUILD_TYPE=Debug` when developing on Raptor.
Moreover, when compiling with the Debug build type, it is suggested to use mold as the linker (you will need to install it if you don't have it already) to reduce memory usage during linking. Enable it by setting the option:

```sh
-DLLVM_USE_LINKER=mold
```
Raptor
Use the following commands to build Raptor.
Remember to set `-DCMAKE_BUILD_TYPE=Debug` when developing on Raptor.
Here too it is suggested to use mold as the linker, to reduce link time and memory usage, by setting the options:

```sh
-DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
```
```sh
git submodule update --init --recursive
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build/lib/cmake/mlir
mkdir build && cd build
cmake .. -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR}
cmake --build .
```
If the build fails because protobuf is missing unsigned integer (`uint*_t`) definitions,
patch the problematic files by adding `#include <cstdint>` to their includes.
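For example (which file fails depends on the protobuf checkout; the include goes before the file's existing includes):

```cpp
// Top of the failing protobuf source/header:
#include <cstdint>  // provides uint8_t, uint32_t, uint64_t, ...
```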