# Raptor

Raptor is a domain-specific MLIR compiler for neural networks in ONNX format, targeting in-memory computing / processing-in-memory (PIM) architectures. It extends ONNX-MLIR with a PIM accelerator and progressively lowers ONNX-MLIR through custom MLIR dialects to simulator artifacts.

The current target is the PIM simulator stack under `backend-simulators/pim`. Raptor emits binary per-core `.pim` instruction files by default, plus `memory.bin`, `config.json`, and weight binaries. It can also emit per-core JSON instruction files with `--pim-emit-json`.

## Overview

PIM architectures perform most computation directly in memory. The supported target models a chip with:

- shared host memory,
- multiple PIM cores,
- ReRAM crossbars for vector-matrix / matrix-vector work,
- explicit communication between cores,
- no hardware branch or loop support in emitted simulator code.

Because repeated work such as convolutions is eventually made explicit, emitted instruction counts can grow quickly. Most compiler work therefore focuses on lowering, scheduling, memory layout, and code-generation optimizations.

### Targets and simulators

- `backend-simulators/pim/pim-simulator` is the in-tree Rust functional simulator used by validation. It reads Raptor's `pim/` artifact directory and compares simulator output against native ONNX-MLIR execution.
- `backend-simulators/pim/pimsim-nn` is the performance simulator submodule.

The helper scripts in `pimcomp_utils/` are for comparison with PIMCOMP-NN and contain local paths; treat them as local utilities, not portable workflows.

## Compilation pipeline

The PIM sources live under `src/PIM` and tests under `test/PIM`. CMake exposes them to ONNX-MLIR through generated shim directories under `onnx-mlir/src/Accelerators/PIM` and `onnx-mlir/test/accelerators/PIM`.

High-level lowering flow:

```
ONNX-MLIR -> Spatial -> Pim (tensor) -> Pim (bufferized) -> PIM artifacts
```

1. **ONNX -> Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
   Lowers supported ONNX ops into the `spat` dialect (`src/PIM/Dialect/Spatial`). Conversion patterns are split by op family under `Patterns/{Math,NN,Tensor}` and currently cover Conv, Gemm, MatMul, elementwise Add/Mul/Div, ReduceMean, pooling, Relu, Sigmoid, Softmax, Concat, Gather, Reshape, Resize, and Split.
2. **Merge compute nodes** (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
   Builds a compute graph, schedules it with the PEFT scheduler, and materializes the merge schedule into Spatial IR. Supporting scheduling code lives under `MergeComputeNodes/Scheduling`.
3. **Spatial -> Pim** (`src/PIM/Conversion/SpatialToPim`).
   Lowers Spatial operations to the `pim` dialect (`src/PIM/Dialect/Pim`), including `pim.core`, `pim.core_batch`, communication, tensor packing, global tensor materialization, and return-path normalization.
4. **Bufferization** (`src/PIM/Dialect/Pim/Transforms/Bufferization`).
   Converts tensor-semantics PIM IR into memref-semantics PIM IR using MLIR's bufferization interfaces.
5. **Static memory coalescing** (`src/PIM/Dialect/Pim/Transforms/StaticMemoryCoalescing`).
   Reuses compatible local memref allocations inside PIM cores before codegen.
6. **PIM code generation** (`src/PIM/Pass/PimCodegen` and `src/PIM/Compiler`).
   Folds host constants, materializes remaining host constants, verifies PIM IR, emits `.pim` core files, writes weights, and writes `memory.bin` / `config.json`.

Supporting pieces:

- `src/PIM/Common` - shared IR, filesystem, diagnostics, reports, and utility helpers.
- `src/PIM/Compiler` - PIM compiler options, memory/address planning, binary instruction format, artifact writing, weight emission, and codegen entry points.
- `src/PIM/Conversion/SpatialToGraphviz` - optional Spatial graphviz conversion pass.
- `src/PIM/Pass` - pass registration and auxiliary passes.
- `src/PIM/PimAccelerator.{cpp,hpp}` - ONNX-MLIR accelerator entry point.
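
As a rough mental model of the lowering flow above, the sketch below lists the stages in order and reports which ones have run when the pipeline stops at a given `--Emit*` flag. The pairing of each flag with a stage is an assumption inferred from the flag names documented under "Key compiler options", not read from the compiler source. `STAGES` and `stages_up_to` are hypothetical names that do not exist in Raptor.

```python
# Ordered lowering stages, each paired with the flag assumed to stop after it.
# Assumption: the flag-to-stage pairing is inferred from the flag names only.
STAGES = [
    ("Spatial", "--EmitSpatial"),
    ("Pim (tensor)", "--EmitPim"),
    ("Pim (bufferized)", "--EmitPimBufferized"),
    ("PIM artifacts", "--EmitPimCodegen"),
]


def stages_up_to(stop_flag):
    """Return the stage names reached when the pipeline stops at stop_flag."""
    names = []
    for name, flag in STAGES:
        names.append(name)
        if flag == stop_flag:
            return names
    raise ValueError(f"unknown stop flag: {stop_flag}")


print(stages_up_to("--EmitPimBufferized"))
```

The model is deliberately coarse: intermediate passes such as merging compute nodes and static memory coalescing are not represented as separate stop points.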
## Key compiler options

Pass these to `onnx-mlir` when compiling for PIM:

- `--maccel=PIM` - select the PIM accelerator.
- `--EmitSpatial`, `--EmitPim`, `--EmitPimBufferized`, `--EmitPimCodegen` - stop the PIM pipeline at the requested stage. The PIM default is `--EmitPimCodegen`.
- `--core-count=` - required positive core count for PIM compilation.
- `--crossbar-size=` - crossbar width/height. Default in code is `2`.
- `--crossbar-count=` - crossbars per core. Default in code is `256`.
- `--pim-merge-scheduler=peft` - merge scheduler. `peft` is the only accepted value in the current code.
- `--pim-only-codegen` - assume input is already bufferized PIM IR and only run the codegen tail.
- `--pim-emit-json` - also emit `core_*.json` instruction files alongside `core_*.pim`.
- `--use-experimental-conv-impl` - use the alternate convolution lowering.
- `--ignore-concat-error` - soft-fail a ConcatOp corner case.

Example:

```bash
./build_release/Release/bin/onnx-mlir model.onnx -o /tmp/raptor/model \
  --maccel=PIM --EmitPimCodegen \
  --crossbar-size=2048 --crossbar-count=256 --core-count=1000
```

This writes PIM artifacts under `/tmp/raptor/pim/`.

## Validation

Functional validation lives in `validation/`. It compiles ONNX models, builds a native ONNX-MLIR reference runner, generates random inputs, runs Raptor, runs the Rust PIM simulator, and compares outputs.

Python dependencies used by the validation scripts are `numpy`, `onnx`, and `colorama`. The simulator requires the Rust toolchain.
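
The input-generation and comparison steps of that flow can be sketched as follows. This is a minimal illustration, not the code in `validation/validate.py`: the helper names `make_inputs`, `max_abs_diff`, and `outputs_match`, the input value range, and the example threshold are all assumptions. It uses only the standard library so that it runs without `numpy`.

```python
import random


def make_inputs(count, seed):
    """Generate reproducible pseudo-random inputs from a fixed seed."""
    rng = random.Random(seed)
    # Assumption: values in [-1, 1]; the real scripts may use another range.
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]


def max_abs_diff(reference, simulated):
    """Largest per-element absolute difference between two flat outputs."""
    if len(reference) != len(simulated):
        raise ValueError("output lengths differ")
    return max((abs(r - s) for r, s in zip(reference, simulated)), default=0.0)


def outputs_match(reference, simulated, threshold):
    """True when every element differs by at most threshold."""
    return max_abs_diff(reference, simulated) <= threshold


reference = make_inputs(4, seed=0)
simulated = [x + 1e-4 for x in reference]  # stand-in for simulator output
print(outputs_match(reference, simulated, threshold=1e-3))
```

Seeding the generator makes a failing run reproducible, and gating on the maximum per-element difference means a single badly wrong element fails the comparison even when the average error is small.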
Per-operation validation from the repository root:

```bash
python3 validation/validate.py \
  --raptor-path build_release/Release/bin/onnx-mlir \
  --onnx-include-dir onnx-mlir/include \
  --core-count 1000
```

Validate one network or a subset by pointing `--operations-dir` at any directory containing `.onnx` files:

```bash
python3 validation/validate.py \
  --raptor-path build_release/Release/bin/onnx-mlir \
  --onnx-include-dir onnx-mlir/include \
  --operations-dir validation/networks/yolo11n/depth_04 \
  --crossbar-size 2048 --crossbar-count 256 --core-count 1000
```

Useful validation options:

- `--simulator-dir ` - override the auto-detected `backend-simulators/pim/pim-simulator` path.
- `--threshold ` - maximum allowed per-element output difference.
- `--seed ` - RNG seed for generated inputs.
- `--command-timeout-seconds ` - timeout for compiler, runner, and simulator subprocesses.
- `--verbose` - print subprocess logs and average PIM pass timings.
- `--clean` - remove generated validation artifacts and exit.

Each validation run writes artifacts in the model workspace, for example under `validation/operations/gemm/small/`:

- `inputs/` - generated input CSV files.
- `outputs/` - native ONNX-MLIR reference outputs.
- `raptor/` - compiler artifacts, including `*.onnx.mlir`, dialect dumps under `dialects/`, reports under `reports/`, and final PIM artifacts under `pim/`.
- `runner/` - generated reference runner source, build tree, and shared library.
- `simulation/out.bin` - raw simulator output used for comparison.

The compiler currently dumps dialect snapshots such as `spatial0.mlir`, `spatial1_dcp_merged.mlir`, `pim0.mlir`, `pim1_buff.mlir`, `pim2_coalesced.mlir`, `pim3_folded.mlir`, and `pim4_materialized.mlir` when an output directory is available.
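
When inspecting a workspace by hand, it can help to check that the layout listed above is actually present. The sketch below is a hypothetical helper: `missing_artifacts` is not part of the validation scripts. The expected paths are taken from this README, and requiring at least one `core_*.pim` file is an assumption about how the `pim/` directory is populated.

```python
from pathlib import Path

# Paths taken from the workspace layout described in this README.
EXPECTED_DIRS = ["inputs", "outputs", "raptor/pim", "runner"]
EXPECTED_FILES = [
    "raptor/pim/config.json",
    "raptor/pim/memory.bin",
    "simulation/out.bin",
]


def missing_artifacts(workspace):
    """Return the expected workspace entries that are absent."""
    root = Path(workspace)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    pim_dir = root / "raptor/pim"
    # Assumption: a successful codegen run leaves at least one core file.
    if not (pim_dir.is_dir() and any(pim_dir.glob("core_*.pim"))):
        missing.append("raptor/pim/core_*.pim")
    return missing


print(missing_artifacts("validation/operations/gemm/small"))
```

An empty list means the workspace has everything this sketch looks for; it does not check that the files are well-formed.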
To rerun the simulator manually with tracing after validation has produced a `raptor/pim/` directory:

```bash
cd backend-simulators/pim/pim-simulator
cargo run --no-default-features --features tracing --release \
  --package pim-simulator --bin pim-simulator -- \
  -f /path/to/workspace/raptor/pim \
  -o /path/to/workspace/simulation/out.bin \
  -d ,,,,...
```

With `--features tracing`, the simulator writes per-core traces as `TraceCore0`, `TraceCore1`, ... next to `out.bin`. The validator normally computes the `-d` ranges from `raptor/pim/config.json` and model output shapes.

Available validation networks under `validation/networks/`: `vgg16`, `yolo11n`, `yolo11nv2`.

Available operation suites under `validation/operations/`: `add`, `concat`, `conv`, `div`, `gather`, `gemm`, `gemv`, `matmul`, `mul`, `pool`, `reduce_mean`, `relu`, `reshape`, `resize`, `sigmoid`, `softmax`, `split`.

Generated operation tests can be regenerated with:

```bash
python3 validation/operations/gen_tests.py
```

## Build

Initialize submodules first:

```bash
git submodule update --init --recursive
```

The project follows ONNX-MLIR's build requirements. The CI workflow documents the currently used versions and setup:

- CMake 4.3.0 in CI,
- LLVM/MLIR checked out under `onnx-mlir/llvm-project`,
- Protobuf `v34.0`,
- Rust stable for `pim-simulator`,
- Python packages `numpy`, `onnx`, `colorama` for validation.

### Protobuf

Install Protobuf if your system does not already provide a compatible version:

```bash
git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
cmake -S protobuf -B protobuf/build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -Dprotobuf_BUILD_TESTS=OFF
cmake --build protobuf/build
sudo cmake --install protobuf/build
```

You can then remove the temporary checkout:

```bash
rm -rf protobuf
```

### MLIR

Follow the ONNX-MLIR instructions in `onnx-mlir/docs/BuildOnLinuxOSX.md` to build LLVM/MLIR.
The local Raptor build expects `MLIR_DIR` to point at the MLIR CMake package, for example:

```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
```

If your LLVM build directory is named `build` instead of `build_release`, adjust the path accordingly.

### Raptor

Configure a release build:

```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
cmake -S . -B build_release -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR}
```

Configure a debug build similarly:

```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_debug/lib/cmake/mlir
cmake -S . -B build_debug -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR}
```

For debug development, using `mold` can reduce link time and memory use:

```bash
cmake -S . -B build_debug -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR} \
  -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
  -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
  -DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
```

Build the compiler with CMake:

```bash
cmake --build ./build_release
cmake --build ./build_debug
```

Do not invoke `ninja` directly for this project; use `cmake --build` so CMake's configuration and generated shims stay consistent.

If a build fails because Protobuf headers are missing fixed-width integer definitions, patch the affected Protobuf-generated files by adding `#include `.

## Tests

The Rust simulator has its own tests:

```bash
cd backend-simulators/pim/pim-simulator
cargo test
```

## Repository Layout

- `src/PIM/` - PIM accelerator implementation.
- `test/PIM/` - PIM C++ unit tests.
- `validation/` - functional validation scripts, ONNX operation tests, network slices, and pimsim config generation.
- `backend-simulators/pim/pim-simulator/` - in-tree Rust functional simulator.
- `backend-simulators/pim/pimsim-nn/` - performance simulator submodule.
- `pimcomp_utils/` - local comparison helpers for PIMCOMP-NN.
- `.github/actions/` and `.github/workflows/validate_operations.yml` - CI setup for MLIR/Protobuf caching, building Raptor, and validation.