# Raptor

Raptor is a domain-specific MLIR compiler for neural networks (ONNX format) targeting in-memory computing / processing-in-memory (PIM) architectures. It progressively lowers ONNX-MLIR through a set of MLIR dialects down to target-specific artifacts (currently JSON code for the `pimsim-nn` simulator).

## Overview

PIM architectures perform most of the computation directly in memory. Raptor's first supported target is `pimsim-nn`, which simulates a chip with:

- a shared host memory,
- a number of cores that do most of the computation directly in their memory (vector ops, vmm/mvm on ReRAM crossbars),
- no branching instructions (branchless architecture) and no hardware loop support — any repeated work (e.g. convolutions) must be unrolled into explicit per-iteration instructions.

Because of this, the number of emitted instructions explodes quickly, and the compiler must optimize aggressively at every stage to keep compilation tractable.

A second target, `PulPim`, is planned for an accelerator with RISC-V cores, each carrying its own in-memory computing unit and crossbars. It will live in a dedicated dialect (future work).

### Targets and simulators

`pimsim-nn` (under `backend-simulators/pim/pimsim-nn`) is used for **performance** estimates (latency, energy), but it does not functionally execute the JSON code it consumes. To validate the numerical correctness of the JSON code produced by Raptor (or, for comparison, by the `pimcomp` compiler), we use a Rust simulator maintained in-tree at `backend-simulators/pim/pim-simulator`.

## Compilation pipeline

The PIM-related sources live under `src/PIM` and the tests under `test/PIM`. When working on this codebase, most changes should stay confined to those trees (you only need to look outside, e.g. at `onnx-mlir` or `llvm`, for framework-level details).

High-level lowering flow:

```
ONNX-MLIR ──► Spatial ──► Pim (tensor) ──► Pim (bufferized) ──► PIM JSON
```

1. **ONNX → Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
   Lowers ONNX ops into the `spat` dialect (`src/PIM/Dialect/Spatial`). Spatial models a high-level spatial in-memory accelerator: vmm/mvm operations are accelerated by storing a constant RHS matrix into a crossbar. Crossbars cannot be re-programmed during execution, have a limited fixed size, and only a limited number of them is available per core. Conversion patterns are split by op family under `Conversion/ONNXToSpatial/Patterns/{Math,NN,Tensor}` (Conv, Gemm, MatMul, Elementwise, ReduceMean, Pool, Relu, Sigmoid, Softmax, Concat, Gather, Reshape, Resize, Split).
2. **Spatial → Pim** (`src/PIM/Conversion/SpatialToPim`).
   Lowers Spatial to the `pim` dialect (`src/PIM/Dialect/Pim`), which materializes PIM cores (`pim.core`), inter-core communication (`pim.send` / `pim.receive`), halts, and crossbar-level operations.
3. **Merge compute nodes** (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
   A DCP-inspired heuristic (Dynamic Critical Path — see the original scheduling paper by Kwok & Ahmad, [DCP-eScience2007](https://clouds.cis.unimelb.edu.au/papers/DCP-eScience2007.pdf)) that coarsens the virtual node graph and decides how to group compute nodes onto cores. Our implementation is only DCP-*inspired*: it is a heuristic with different assumptions from the paper (a different cost model, constraints from crossbar capacity / core resources, and a windowed coarsening loop instead of full-graph reprioritization). The `dcp-critical-window-size` option controls how many lowest-slack virtual nodes each coarsening iteration considers (0 = legacy full-graph analysis). Related sources: `DCPGraph/DCPAnalysis.cpp`, `Graph.cpp/.hpp`, `MergeComputeNodesPass.cpp`.
4. **Bufferization** (`src/PIM/Dialect/Pim/Transforms/Bufferization`).
   Converts tensor-semantics PIM IR into memref-semantics PIM IR using the standard MLIR `BufferizableOpInterface` machinery (`OpBufferizationInterfaces.*`, `PimBufferization.td`).
5. **PIM code generation** (`src/PIM/Pass/PimCodegen`):
   - `HostConstantFolding` — folds host-side constants.
   - `MaterializeHostConstantsPass` — materializes the remaining host constants for emission.
   - `VerificationPass` — checks invariants before emission.
   - `EmitPimJsonPass` — emits the final PIM JSON consumed by `pimsim-nn` and `pim-simulator`.

Supporting pieces:

- `src/PIM/Compiler` — PIM-specific compiler options (crossbar size/count, core count, DCP window, experimental conv impl, concat error handling, …) and the `PimCodeGen` entry points.
- `src/PIM/Common` — shared utilities (`PimCommon`, `LabeledList`).
- `src/PIM/Pass` — auxiliary passes (`MessagePass`, `CountInstructionPass`) and the `PIMPasses.h` registry used by `PimAccelerator`.
- `src/PIM/PimAccelerator.{cpp,hpp}` — accelerator entry point: registers dialects and passes, and plugs Raptor into the ONNX-MLIR driver.

## Key compiler options

Pass these on the `onnx-mlir` command line when compiling for PIM:

- `--maccel=PIM` — select the PIM accelerator.
- `--EmitSpatial` / `--EmitPim` / `--EmitPimBufferized` / `--EmitPimCodegen` — stop the pipeline at the requested stage (default: `EmitPimCodegen`).
- `--pim-only-codegen` — assume the input is already bufferized PIM IR and run only the codegen tail.
- `--crossbar-size=` / `--crossbar-count=` — crossbar dimensions and per-core count.
- `--core-count=` — number of cores (`-1` picks the minimum).
- `--dcp-critical-window-size=` — DCP coarsening window (0 = legacy).
- `--use-experimental-conv-impl` — alternative convolution lowering.
- `--ignore-concat-error` — soft-fail a corner case in `ConcatOp`.

## Validation

Functional validation lives in `validation/` and drives the Rust `pim-simulator` to compare Raptor's output against a reference.
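Conceptually, validation boils down to an elementwise closeness check between the simulator's output and a reference tensor. The sketch below illustrates such a check only; it is not the actual comparison logic of `validate.py`, and the function name and tolerances are illustrative:

```python
import math

def outputs_match(reference, candidate, rtol=1e-3, atol=1e-5):
    """Return True when two flat float sequences agree elementwise
    within the given relative/absolute tolerances (illustrative only)."""
    if len(reference) != len(candidate):
        return False
    return all(math.isclose(r, c, rel_tol=rtol, abs_tol=atol)
               for r, c in zip(reference, candidate))

# Tiny numerical noise stays within tolerance; a real mismatch does not.
ref = [0.0, 0.5, 1.0, -2.25]
print(outputs_match(ref, [x + 1e-6 for x in ref]))  # True
print(outputs_match(ref, [x + 0.1 for x in ref]))   # False
```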
Per-operation validation (from `validation/`):

```
validate.py \
  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
  --onnx-include-dir ../onnx-mlir/include
```

End-to-end network validation (example: the first 4 layers of YOLOv11n):

```
validate.py \
  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
  --onnx-include-dir ../onnx-mlir/include \
  --operations-dir ./networks/yolo11n/depth_04 \
  --crossbar-size 2048
```

Available networks under `validation/networks/`: `vgg16`, `yolo11n`.
Available operations under `validation/operations/`: `add`, `conv`, `div`, `gather`, `gemm`, `gemv`, `mul`, `pool`, `reduce_mean`, `relu`, `resize`, `sigmoid`, `softmax`, `split`.

## Rebuilding

Release build (fast):

```
cmake --build /home/nico/raptor/raptor/cmake-build-release --target onnx-mlir -j 30
```

A slower debug build is also available — configure it the same way but with `-DCMAKE_BUILD_TYPE=Debug` (see the installation instructions below).

## Build

### Protobuf

Use the following commands to install protobuf:

```
git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
cd protobuf
mkdir build
cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release
ninja
sudo ninja install
```

You can then remove the protobuf repo directory:

```
cd ../..
rm -rf protobuf
```

### MLIR

Follow the first part of the instructions [here](onnx-mlir/docs/BuildOnLinuxOSX.md) to build MLIR. Remember to set `-DCMAKE_BUILD_TYPE=Debug` when developing on Raptor.

When compiling with the Debug build type, it is also suggested to use mold as the linker (install it if you don't have it already) to reduce memory usage during linking. Enable it with:

```
-DLLVM_USE_LINKER=mold
```

### Raptor

Use the following commands to build Raptor. Remember to set `-DCMAKE_BUILD_TYPE=Debug` when developing on Raptor.
In this case too, mold is suggested as the linker to reduce link time and memory usage; set the options:

```
-DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
```

```
git submodule update --init --recursive
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build/lib/cmake/mlir
mkdir build && cd build
cmake .. -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR}
cmake --build .
```

If the build fails because of protobuf missing uint definitions, patch the problematic files by adding `#include ` to their includes.
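After a successful build, the quickest sanity check on a compiled model is that the emitted PIM JSON artifact actually parses. The sketch below only demonstrates that check; the file name and the keys in the stand-in artifact are hypothetical, not the schema emitted by `EmitPimJsonPass`:

```python
import json
from pathlib import Path

def load_pim_json(path):
    """Parse an emitted PIM JSON artifact, failing loudly if it is malformed."""
    return json.loads(Path(path).read_text())

# Stand-in artifact: keys are hypothetical, used only to exercise the check.
Path("model.pim.json").write_text('{"cores": [], "instructions": []}')
artifact = load_pim_json("model.pim.json")
print(sorted(artifact))  # ['cores', 'instructions']
```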