Update README and AGENTS
Validate Operations / validate-operations (push) Waiting to run

This commit is contained in:
ilgeco
2026-05-27 15:09:30 +02:00
parent c6b02af7a9
commit 013ae0ac2a
2 changed files with 243 additions and 185 deletions
+2 -2
View File
@@ -1,7 +1,7 @@
- Always read the full README.md before doing anything.
- Build commands:
- `cmake --build ./build_release --target onnx-mlir -j 30`
- `cmake --build ./build_debug --target onnx-mlir -j 30`
- `cmake --build ./build_release`
- `cmake --build ./build_debug`
- Never use `ninja` directly: it bypasses cmake's configuration and invalidates the build cache.
# Code changes
+236 -178
View File
@@ -1,168 +1,178 @@
# Raptor
Raptor is a domain-specific MLIR compiler for neural networks (ONNX format)
targeting in-memory computing / processing-in-memory (PIM) architectures.
It progressively lowers ONNX-MLIR through a set of MLIR dialects down to
target-specific artifacts (currently JSON code for the `pimsim-nn` simulator).
Raptor is a domain-specific MLIR compiler for neural networks in ONNX format,
targeting in-memory computing / processing-in-memory (PIM) architectures. It
extends ONNX-MLIR with a PIM accelerator and progressively lowers ONNX-MLIR
through custom MLIR dialects to simulator artifacts.
The current target is the PIM simulator stack under `backend-simulators/pim`.
Raptor emits binary per-core `.pim` instruction files by default, plus
`memory.bin`, `config.json`, and weight binaries. It can also emit per-core JSON
instruction files with `--pim-emit-json`.
## Overview
PIM architectures perform most of the computation directly in memory.
Raptor's first supported target is `pimsim-nn`, which simulates a chip with:
- a shared host memory,
- a number of cores that do most of the computation directly in their memory
(vector ops, vmm/mvm on ReRAM crossbars),
- no branching instructions (branchless architecture) and no hardware loop
support — any repeated work (e.g. convolutions) must be unrolled into
explicit per-iteration instructions.
PIM architectures perform most computation directly in memory. The supported
target models a chip with:
- shared host memory,
- multiple PIM cores,
- ReRAM crossbars for vector-matrix / matrix-vector work,
- explicit communication between cores,
- no hardware branch or loop support in emitted simulator code.
Because of this, the amount of emitted instructions explodes quickly and the
compiler must optimize aggressively at every stage to keep compilation
tractable.
A second target, `PulPim`, is planned for an accelerator with RISC-V cores
each carrying its own in-memory computing unit and crossbars. It will live in
a dedicated dialect (future work).
Because repeated work such as convolutions is eventually made explicit, emitted
instruction counts can grow quickly. Most compiler work therefore focuses on
lowering, scheduling, memory layout, and code-generation optimizations.
### Targets and simulators
`pimsim-nn` (under `backend-simulators/pim/pimsim-nn`) is used for
**performance** estimates (latency, energy), but does not functionally execute
the JSON code it consumes. To validate the numerical correctness of the JSON
code produced by Raptor (or, for comparison, by the `pimcomp` compiler), we use
a Rust simulator we maintain in-tree at
`backend-simulators/pim/pim-simulator`.
- `backend-simulators/pim/pim-simulator` is the in-tree Rust functional
simulator used by validation. It reads Raptor's `pim/` artifact directory and
compares simulator output against native ONNX-MLIR execution.
- `backend-simulators/pim/pimsim-nn` is the performance simulator submodule.
The helper scripts in `pimcomp_utils/` are for comparison with PIMCOMP-NN and
contain local paths; treat them as local utilities, not portable workflows.
## Compilation pipeline
The PIM-related sources live under `src/PIM` and the tests under `test/PIM`.
When working on this codebase, most changes should stay confined to those
trees (you only need to look outside, e.g. at `onnx-mlir` or `llvm`, for
framework-level details).
The PIM sources live under `src/PIM` and tests under `test/PIM`. CMake exposes
them to ONNX-MLIR through generated shim directories under
`onnx-mlir/src/Accelerators/PIM` and `onnx-mlir/test/accelerators/PIM`.
High-level lowering flow:
```
ONNX-MLIR ──► Spatial ──► Pim (tensor) ──► Pim (bufferized) ──► PIM code
ONNX-MLIR -> Spatial -> Pim (tensor) -> Pim (bufferized) -> PIM artifacts
```
1. **ONNX Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
Lowers ONNX ops into the `spat` dialect (`src/PIM/Dialect/Spatial`).
Spatial models a high-level spatial in-memory accelerator: vmm/mvm
operations are accelerated by storing a constant RHS matrix into a
crossbar. Crossbars cannot be re-programmed during execution, have a
limited fixed size, and there is a limited number of them per core.
Conversion patterns are split by op family under
`Conversion/ONNXToSpatial/Patterns/{Math,NN,Tensor}` (Conv, Gemm, MatMul,
Elementwise, ReduceMean, Pool, Relu, Sigmoid, Softmax, Concat, Gather,
Reshape, Resize, Split, etc...).
1. **ONNX -> Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
Lowers supported ONNX ops into the `spat` dialect
(`src/PIM/Dialect/Spatial`). Conversion patterns are split by op family under
`Patterns/{Math,NN,Tensor}` and currently cover Conv, Gemm, MatMul,
elementwise Add/Mul/Div, ReduceMean, pooling, Relu, Sigmoid, Softmax,
Concat, Gather, Reshape, Resize, and Split.
2. **Spatial → Pim** (`src/PIM/Conversion/SpatialToPim`).
Lowers Spatial to the `pim` dialect (`src/PIM/Dialect/Pim`), which
materializes PIM cores (`pim.core`), inter-core communication
(`pim.send` / `pim.receive`), halts, and crossbar-level operations.
2. **Merge compute nodes**
(`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
Builds a compute graph, schedules it with the PEFT scheduler, and materializes
the merge schedule into Spatial IR. Supporting scheduling code lives under
`MergeComputeNodes/Scheduling`.
3. **Merge compute nodes** (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
A PEFT heuristic that coarsens the virtual node graph and decides how to group compute
nodes onto cores. Our implementation is only DCP-*inspired*: it is a
heuristic with different assumptions from the paper (different cost
model, constraints from crossbar capacity / core resources, and a
windowed coarsening loop instead of full-graph reprioritization). The
`dcp-critical-window-size` option controls how many lowest-slack virtual
nodes each coarsening iteration considers (0 = legacy full-graph
analysis). Related sources: `DCPGraph/DCPAnalysis.cpp`, `Graph.cpp/.hpp`,
`MergeComputeNodesPass.cpp`.
3. **Spatial -> Pim** (`src/PIM/Conversion/SpatialToPim`).
Lowers Spatial operations to the `pim` dialect (`src/PIM/Dialect/Pim`),
including `pim.core`, `pim.core_batch`, communication, tensor packing, global
tensor materialization, and return-path normalization.
4. **Bufferization** (`src/PIM/Dialect/Pim/Transforms/Bufferization`).
Converts tensor-semantics PIM IR into memref-semantics PIM IR using the
standard MLIR `BufferizableOpInterface` machinery
(`OpBufferizationInterfaces.*`, `PimBufferization.td`).
Converts tensor-semantics PIM IR into memref-semantics PIM IR using MLIR's
bufferization interfaces.
5. **Static memory coalescing** (`src/PIM/Dialect/Pim/Transforms/StaticMemoryCoalescing`).
Conservatively reuses same-typed local memref allocations inside PIM cores
after bufferization and before code generation.
5. **Static memory coalescing**
(`src/PIM/Dialect/Pim/Transforms/StaticMemoryCoalescing`).
Reuses compatible local memref allocations inside PIM cores before codegen.
6. **PIM code generation** (`src/PIM/Pass/PimCodegen`):
- `HostConstantFolding` — folds host-side constants.
- `MaterializeHostConstantsPass` materializes the remaining host
constants for emission.
- `VerificationPass` — checks invariants before emission.
- `EmitPimJsonPass` — emits the final PIM JSON consumed by `pimsim-nn`
and `pim-simulator`.
6. **PIM code generation** (`src/PIM/Pass/PimCodegen` and
`src/PIM/Compiler`).
Folds host constants, materializes remaining host constants, verifies PIM IR,
emits `.pim` core files, writes weights, and writes `memory.bin` /
`config.json`.
Supporting pieces:
- `src/PIM/Compiler` — PIM-specific compiler options (crossbar size/count,
core count, DCP window, experimental conv impl, concat error handling, …)
and `PimCodeGen` entry points.
- `src/PIM/Common` — shared utilities (`PimCommon`, `LabeledList`).
- `src/PIM/Pass` — auxiliary passes (`MessagePass`)
and the `PIMPasses.h` registry used by `PimAccelerator`.
- `src/PIM/PimAccelerator.{cpp,hpp}` — accelerator entry point: registers
dialects, passes, and plugs Raptor into the ONNX-MLIR driver.
- `src/PIM/Common` - shared IR, filesystem, diagnostics, reports, and utility
helpers.
- `src/PIM/Compiler` - PIM compiler options, memory/address planning, binary
instruction format, artifact writing, weight emission, and codegen entry
points.
- `src/PIM/Conversion/SpatialToGraphviz` - optional Spatial graphviz conversion
pass.
- `src/PIM/Pass` - pass registration and auxiliary passes.
- `src/PIM/PimAccelerator.{cpp,hpp}` - ONNX-MLIR accelerator entry point.
## Key compiler options
Pass these on the `onnx-mlir` command line when compiling for PIM:
Pass these to `onnx-mlir` when compiling for PIM:
- `--maccel=PIM` select the PIM accelerator.
- `--EmitSpatial` / `--EmitPim` / `--EmitPimBufferized` / `--EmitPimCodegen`
stop the pipeline at the requested stage (default: `EmitPimCodegen`).
- `--pim-only-codegen` — assume the input is already bufferized PIM IR and
run only the codegen tail.
- `--crossbar-size=<N>` / `--crossbar-count=<N>` — crossbar dimensions and
per-core count.
- `--core-count=<N>` — number of cores. Required for PIM compilation.
- `--pim-merge-scheduler={peft,dcp}` — scheduler used by the Spatial
merge-compute-nodes pass (default: `peft`).
- `--dcp-critical-window-size=<N>` — DCP coarsening window (0 = legacy).
- `--use-experimental-conv-impl` alternative convolution lowering.
- `--ignore-concat-error` — soft-fail corner case in `ConcatOp`.
- `--maccel=PIM` - select the PIM accelerator.
- `--EmitSpatial`, `--EmitPim`, `--EmitPimBufferized`,
`--EmitPimCodegen` - stop the PIM pipeline at the requested stage. The PIM
default is `--EmitPimCodegen`.
- `--core-count=<N>` - required positive core count for PIM compilation.
- `--crossbar-size=<N>` - crossbar width/height. Default in code is `2`.
- `--crossbar-count=<N>` - crossbars per core. Default in code is `256`.
- `--pim-merge-scheduler=peft` - merge scheduler. `peft` is the only accepted
value in the current code.
- `--pim-only-codegen` - assume input is already bufferized PIM IR and only run
the codegen tail.
- `--pim-emit-json` - also emit `core_*.json` instruction files alongside
`core_*.pim`.
- `--use-experimental-conv-impl` - use the alternate convolution lowering.
- `--ignore-concat-error` - soft-fail a ConcatOp corner case.
Example:
```bash
./build_release/Release/bin/onnx-mlir model.onnx -o /tmp/raptor/model \
--maccel=PIM --EmitPimCodegen \
--crossbar-size=2048 --crossbar-count=256 --core-count=1000
```
This writes PIM artifacts under `/tmp/raptor/pim/`.
## Validation
Functional validation lives in `validation/` and drives the Rust
`pim-simulator` to compare Raptor's output against a reference.
Functional validation lives in `validation/`. It compiles ONNX models, builds a
native ONNX-MLIR reference runner, generates random inputs, runs Raptor, runs
the Rust PIM simulator, and compares outputs.
Per-operation validation (from `validation/`):
Python dependencies used by the validation scripts are `numpy`, `onnx`, and
`colorama`. The simulator requires the Rust toolchain.
```
validate.py \
--raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
--onnx-include-dir ../onnx-mlir/include \
Per-operation validation from the repository root:
```bash
python3 validation/validate.py \
--raptor-path build_release/Release/bin/onnx-mlir \
--onnx-include-dir onnx-mlir/include \
--core-count 1000
```
End-to-end network validation (example: first 4 layers of YOLOv11n):
Validate one network or a subset by pointing `--operations-dir` at any directory
containing `.onnx` files:
```
validate.py \
--raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
--onnx-include-dir ../onnx-mlir/include \
--operations-dir ./networks/yolo11n/depth_04 \
```bash
python3 validation/validate.py \
--raptor-path build_release/Release/bin/onnx-mlir \
--onnx-include-dir onnx-mlir/include \
--operations-dir validation/networks/yolo11n/depth_04 \
--crossbar-size 2048 --crossbar-count 256 --core-count 1000
```
Each validation run writes debugging artifacts into the benchmark's workspace
directory (for example `validation/operations/gemm/small/`):
- `inputs/` — generated input CSVs used for the run.
- `outputs/` — reference outputs dumped by the native ONNX runner.
- `raptor/` — compiler artifacts:
`*.onnx.mlir`, `dialects/spatial0.mlir`, `dialects/spatial1_dcp_merged.mlir`,
`dialects/pim0.mlir`, `dialects/pim1_buff.mlir`, `dialects/pim2_coalesced.mlir`,
`dialects/pim3_folded.mlir`, `dialects/pim4_materialized.mlir`,
`pim/config.json`, `pim/core_*.pim`, `pim/memory.bin`, and reports under
`raptor/reports/` such as `dcp_merge_report.txt`,
`memory_report.txt`, and `static_memory_coalescing_report.txt`.
- `runner/` — generated reference runner source, build tree, and shared library.
- `simulation/out.bin` — raw simulator output dump used for output comparison.
Useful validation options:
- `--simulator-dir <path>` - override the auto-detected
`backend-simulators/pim/pim-simulator` path.
- `--threshold <float>` - maximum allowed per-element output difference.
- `--seed <int>` - RNG seed for generated inputs.
- `--command-timeout-seconds <float>` - timeout for compiler, runner, and
simulator subprocesses.
- `--verbose` - print subprocess logs and average PIM pass timings.
- `--clean` - remove generated validation artifacts and exit.
That means you usually do not need to rerun standalone `--EmitSpatial` or
`--EmitPim` commands while debugging validation failures: the per-pass dialect
dumps are already available under `raptor/dialects/`.
Each validation run writes artifacts in the model workspace, for example under
`validation/operations/gemm/small/`:
- `inputs/` - generated input CSV files.
- `outputs/` - native ONNX-MLIR reference outputs.
- `raptor/` - compiler artifacts, including `*.onnx.mlir`, dialect dumps under
`dialects/`, reports under `reports/`, and final PIM artifacts under `pim/`.
- `runner/` - generated reference runner source, build tree, and shared library.
- `simulation/out.bin` - raw simulator output used for comparison.
The validator does not currently expose a simulator tracing flag, but once a
validation has produced `raptor/pim/` you can rerun the simulator manually with
tracing enabled:
The compiler currently dumps dialect snapshots such as `spatial0.mlir`,
`spatial1_dcp_merged.mlir`, `pim0.mlir`, `pim1_buff.mlir`,
`pim2_coalesced.mlir`, `pim3_folded.mlir`, and
`pim4_materialized.mlir` when an output directory is available.
To rerun the simulator manually with tracing after validation has produced a
`raptor/pim/` directory:
```bash
cd backend-simulators/pim/pim-simulator
@@ -174,90 +184,138 @@ cargo run --no-default-features --features tracing --release \
```
With `--features tracing`, the simulator writes per-core traces as
`simulation/TraceCore0`, `simulation/TraceCore1`, ... next to `simulation/out.bin`.
The validator normally computes the `-d` dump ranges from `raptor/pim/config.json`
and the model output shapes. If you need a clean slate before rerunning, use:
`TraceCore0`, `TraceCore1`, ... next to `out.bin`. The validator normally
computes the `-d` ranges from `raptor/pim/config.json` and model output shapes.
Available validation networks under `validation/networks/`: `vgg16`,
`yolo11n`, `yolo11nv2`.
Available operation suites under `validation/operations/`: `add`, `concat`,
`conv`, `div`, `gather`, `gemm`, `gemv`, `matmul`, `mul`, `pool`,
`reduce_mean`, `relu`, `reshape`, `resize`, `sigmoid`, `softmax`, `split`.
Generated operation tests can be regenerated with:
```bash
validate.py --clean
python3 validation/operations/gen_tests.py
```
Available networks under `validation/networks/`: `vgg16`, `yolo11n`.
Available operations under `validation/operations/`: `add`, `conv`, `div`,
`gather`, `gemm`, `gemv`, `mul`, `pool`, `reduce_mean`, `relu`, `resize`,
`sigmoid`, `softmax`, `split`.
## Rebuilding
Release build (fast):
```
cmake --build /home/nico/raptor/raptor/cmake-build-release --target onnx-mlir -j 30
```
A slower debug build is also available — configure it the same way but with
`-DCMAKE_BUILD_TYPE=Debug` (see installation instructions below).
## Build
Initialize submodules first:
```bash
git submodule update --init --recursive
```
The project follows ONNX-MLIR's build requirements. The CI workflow documents
the currently used versions and setup:
- CMake 4.3.0 in CI,
- LLVM/MLIR checked out under `onnx-mlir/llvm-project`,
- Protobuf `v34.0`,
- Rust stable for `pim-simulator`,
- Python packages `numpy`, `onnx`, `colorama` for validation.
### Protobuf
Use the following commands to install protobuf:
```
Install Protobuf if your system does not already provide a compatible version:
```bash
git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
cd protobuf
mkdir build
cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release
ninja
sudo ninja install
cmake -S protobuf -B protobuf/build -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-Dprotobuf_BUILD_TESTS=OFF
cmake --build protobuf/build
sudo cmake --install protobuf/build
```
You can now remove the protobuf repo directory with:
```
cd ../..
You can then remove the temporary checkout:
```bash
rm -rf protobuf
```
### Mlir
### MLIR
Follow the first part of instructions [here](onnx-mlir/docs/BuildOnLinuxOSX.md) to build mlir.
Follow the ONNX-MLIR instructions in
`onnx-mlir/docs/BuildOnLinuxOSX.md` to build LLVM/MLIR. The local Raptor build
expects `MLIR_DIR` to point at the MLIR CMake package, for example:
Remember to set ```-DCMAKE_BUILD_TYPE=Debug``` for developing on Raptor
Moreover, if compiling with build type debug, it is also suggested to use
mold as linker (you will need to install it if you don't have it already)
to reduce memory usage during linking. You can use it by setting the options:
```
-DLLVM_USE_LINKER=mold
```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
```
If your LLVM build directory is named `build` instead of `build_release`, adjust
the path accordingly.
### Raptor
Use the following commands to build Raptor.
Configure a release build:
Remember to set ```-DCMAKE_BUILD_TYPE=Debug``` for developing on Raptor.
Also in this case, it is suggested to use mold as linker to reduce link time and memory usage,
setting the options:
```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
cmake -S . -B build_release -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DONNX_MLIR_ACCELERATORS=PIM \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DMLIR_DIR=${MLIR_DIR}
```
Configure a debug build similarly:
```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_debug/lib/cmake/mlir
cmake -S . -B build_debug -G Ninja \
-DCMAKE_BUILD_TYPE=Debug \
-DONNX_MLIR_ACCELERATORS=PIM \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DMLIR_DIR=${MLIR_DIR}
```
For debug development, using `mold` can reduce link time and memory use:
```bash
cmake -S . -B build_debug -G Ninja \
-DCMAKE_BUILD_TYPE=Debug \
-DONNX_MLIR_ACCELERATORS=PIM \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DMLIR_DIR=${MLIR_DIR} \
-DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
```
```
git submodule update --init --recursive
Build the compiler with CMake:
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build/lib/cmake/mlir
mkdir build && cd build
cmake .. -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DONNX_MLIR_ACCELERATORS=PIM \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DMLIR_DIR=${MLIR_DIR}
cmake --build .
```bash
cmake --build ./build_release
cmake --build ./build_debug
```
If the build fails because of protobuf missing uint definitions,
just patch the problematic files by adding ```#include <cstdint>``` to their includes.
Do not invoke `ninja` directly for this project; use `cmake --build` so CMake's
configuration and generated shims stay consistent.
If a build fails because Protobuf headers are missing fixed-width integer
definitions, patch the affected Protobuf-generated files by adding
`#include <cstdint>`.
## Tests
The Rust simulator has its own tests:
```bash
cd backend-simulators/pim/pim-simulator
cargo test
```
## Repository Layout
- `src/PIM/` - PIM accelerator implementation.
- `test/PIM/` - PIM C++ unit tests.
- `validation/` - functional validation scripts, ONNX operation tests, network
slices, and pimsim config generation.
- `backend-simulators/pim/pim-simulator/` - in-tree Rust functional simulator.
- `backend-simulators/pim/pimsim-nn/` - performance simulator submodule.
- `pimcomp_utils/` - local comparison helpers for PIMCOMP-NN.
- `.github/actions/` and `.github/workflows/validate_operations.yml` - CI setup
for MLIR/Protobuf caching, building Raptor, and validation.