Update README and AGENTS

Author: ilgeco
Date: 2026-05-27 15:09:30 +02:00
Parent: c6b02af7a9
Commit: 013ae0ac2a
2 changed files with 243 additions and 185 deletions
+2 -2
````diff
@@ -1,7 +1,7 @@
 - Always read the full README.md before doing anything.
 - Build commands:
-- `cmake --build ./build_release --target onnx-mlir -j 30`
-- `cmake --build ./build_debug --target onnx-mlir -j 30`
+- `cmake --build ./build_release`
+- `cmake --build ./build_debug`
 - Never use `ninja` directly: it bypasses cmake's configuration and invalidates the build cache.
 # Code changes
````
+236 -178
````diff
@@ -1,168 +1,178 @@
 # Raptor
-Raptor is a domain-specific MLIR compiler for neural networks (ONNX format)
-targeting in-memory computing / processing-in-memory (PIM) architectures.
-It progressively lowers ONNX-MLIR through a set of MLIR dialects down to
-target-specific artifacts (currently JSON code for the `pimsim-nn` simulator).
+Raptor is a domain-specific MLIR compiler for neural networks in ONNX format,
+targeting in-memory computing / processing-in-memory (PIM) architectures. It
+extends ONNX-MLIR with a PIM accelerator and progressively lowers ONNX-MLIR
+through custom MLIR dialects to simulator artifacts.
+The current target is the PIM simulator stack under `backend-simulators/pim`.
+Raptor emits binary per-core `.pim` instruction files by default, plus
+`memory.bin`, `config.json`, and weight binaries. It can also emit per-core JSON
+instruction files with `--pim-emit-json`.
````
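The new README text says that a default PIM compile emits per-core `.pim` files plus `memory.bin`, `config.json`, and weight binaries. Below is a minimal Bash sketch for sanity-checking such an output directory. The function name `check_pim_artifacts` is hypothetical. Weight binaries are not checked because the diff does not name their files.

```bash
# Hypothetical helper: sanity-check a Raptor `pim/` artifact directory.
# It checks only the names the README mentions: config.json, memory.bin, and
# at least one core_*.pim file. core_*.json files appear only with
# --pim-emit-json, so they are not required here. Requires Bash (compgen).
check_pim_artifacts() {
  local dir="$1" missing=0 f
  for f in config.json memory.bin; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # compgen -G succeeds only if the glob matches at least one path.
  if ! compgen -G "$dir/core_*.pim" > /dev/null; then
    echo "missing: $dir/core_*.pim" >&2
    missing=1
  fi
  return "$missing"
}
```

For instance, `check_pim_artifacts /tmp/raptor/pim` returns non-zero and lists the missing files if a compile stopped before emission.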
````diff
 ## Overview
-PIM architectures perform most of the computation directly in memory.
-Raptor's first supported target is `pimsim-nn`, which simulates a chip with:
-- a shared host memory,
-- a number of cores that do most of the computation directly in their memory
-  (vector ops, vmm/mvm on ReRAM crossbars),
-- no branching instructions (branchless architecture) and no hardware loop
-  support — any repeated work (e.g. convolutions) must be unrolled into
-  explicit per-iteration instructions.
-Because of this, the amount of emitted instructions explodes quickly and the
-compiler must optimize aggressively at every stage to keep compilation
-tractable.
-A second target, `PulPim`, is planned for an accelerator with RISC-V cores
-each carrying its own in-memory computing unit and crossbars. It will live in
-a dedicated dialect (future work).
+PIM architectures perform most computation directly in memory. The supported
+target models a chip with:
+- shared host memory,
+- multiple PIM cores,
+- ReRAM crossbars for vector-matrix / matrix-vector work,
+- explicit communication between cores,
+- no hardware branch or loop support in emitted simulator code.
+Because repeated work such as convolutions is eventually made explicit, emitted
+instruction counts can grow quickly. Most compiler work therefore focuses on
+lowering, scheduling, memory layout, and code-generation optimizations.
 ### Targets and simulators
-`pimsim-nn` (under `backend-simulators/pim/pimsim-nn`) is used for
-**performance** estimates (latency, energy), but does not functionally execute
-the JSON code it consumes. To validate the numerical correctness of the JSON
-code produced by Raptor (or, for comparison, by the `pimcomp` compiler), we use
-a Rust simulator we maintain in-tree at
-`backend-simulators/pim/pim-simulator`.
+- `backend-simulators/pim/pim-simulator` is the in-tree Rust functional
+  simulator used by validation. It reads Raptor's `pim/` artifact directory and
+  compares simulator output against native ONNX-MLIR execution.
+- `backend-simulators/pim/pimsim-nn` is the performance simulator submodule.
+The helper scripts in `pimcomp_utils/` are for comparison with PIMCOMP-NN and
+contain local paths; treat them as local utilities, not portable workflows.
````
````diff
 ## Compilation pipeline
-The PIM-related sources live under `src/PIM` and the tests under `test/PIM`.
-When working on this codebase, most changes should stay confined to those
-trees (you only need to look outside, e.g. at `onnx-mlir` or `llvm`, for
-framework-level details).
+The PIM sources live under `src/PIM` and tests under `test/PIM`. CMake exposes
+them to ONNX-MLIR through generated shim directories under
+`onnx-mlir/src/Accelerators/PIM` and `onnx-mlir/test/accelerators/PIM`.
 High-level lowering flow:
 ```
-ONNX-MLIR ──► Spatial ──► Pim (tensor) ──► Pim (bufferized) ──► PIM code
+ONNX-MLIR -> Spatial -> Pim (tensor) -> Pim (bufferized) -> PIM artifacts
 ```
-1. **ONNX → Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
-   Lowers ONNX ops into the `spat` dialect (`src/PIM/Dialect/Spatial`).
-   Spatial models a high-level spatial in-memory accelerator: vmm/mvm
-   operations are accelerated by storing a constant RHS matrix into a
-   crossbar. Crossbars cannot be re-programmed during execution, have a
-   limited fixed size, and there is a limited number of them per core.
-   Conversion patterns are split by op family under
-   `Conversion/ONNXToSpatial/Patterns/{Math,NN,Tensor}` (Conv, Gemm, MatMul,
-   Elementwise, ReduceMean, Pool, Relu, Sigmoid, Softmax, Concat, Gather,
-   Reshape, Resize, Split, etc...).
-2. **Spatial → Pim** (`src/PIM/Conversion/SpatialToPim`).
-   Lowers Spatial to the `pim` dialect (`src/PIM/Dialect/Pim`), which
-   materializes PIM cores (`pim.core`), inter-core communication
-   (`pim.send` / `pim.receive`), halts, and crossbar-level operations.
-3. **Merge compute nodes** (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
-   A PEFT heuristic that coarsens the virtual node graph and decides how to group compute
-   nodes onto cores. Our implementation is only DCP-*inspired*: it is a
-   heuristic with different assumptions from the paper (different cost
-   model, constraints from crossbar capacity / core resources, and a
-   windowed coarsening loop instead of full-graph reprioritization). The
-   `dcp-critical-window-size` option controls how many lowest-slack virtual
-   nodes each coarsening iteration considers (0 = legacy full-graph
-   analysis). Related sources: `DCPGraph/DCPAnalysis.cpp`, `Graph.cpp/.hpp`,
-   `MergeComputeNodesPass.cpp`.
+1. **ONNX -> Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
+   Lowers supported ONNX ops into the `spat` dialect
+   (`src/PIM/Dialect/Spatial`). Conversion patterns are split by op family under
+   `Patterns/{Math,NN,Tensor}` and currently cover Conv, Gemm, MatMul,
+   elementwise Add/Mul/Div, ReduceMean, pooling, Relu, Sigmoid, Softmax,
+   Concat, Gather, Reshape, Resize, and Split.
+2. **Merge compute nodes**
+   (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
+   Builds a compute graph, schedules it with the PEFT scheduler, and materializes
+   the merge schedule into Spatial IR. Supporting scheduling code lives under
+   `MergeComputeNodes/Scheduling`.
+3. **Spatial -> Pim** (`src/PIM/Conversion/SpatialToPim`).
+   Lowers Spatial operations to the `pim` dialect (`src/PIM/Dialect/Pim`),
+   including `pim.core`, `pim.core_batch`, communication, tensor packing, global
+   tensor materialization, and return-path normalization.
 4. **Bufferization** (`src/PIM/Dialect/Pim/Transforms/Bufferization`).
-   Converts tensor-semantics PIM IR into memref-semantics PIM IR using the
-   standard MLIR `BufferizableOpInterface` machinery
-   (`OpBufferizationInterfaces.*`, `PimBufferization.td`).
-5. **Static memory coalescing** (`src/PIM/Dialect/Pim/Transforms/StaticMemoryCoalescing`).
-   Conservatively reuses same-typed local memref allocations inside PIM cores
-   after bufferization and before code generation.
-6. **PIM code generation** (`src/PIM/Pass/PimCodegen`):
-   - `HostConstantFolding` — folds host-side constants.
-   - `MaterializeHostConstantsPass` materializes the remaining host
-     constants for emission.
-   - `VerificationPass` — checks invariants before emission.
-   - `EmitPimJsonPass` — emits the final PIM JSON consumed by `pimsim-nn`
-     and `pim-simulator`.
+   Converts tensor-semantics PIM IR into memref-semantics PIM IR using MLIR's
+   bufferization interfaces.
+5. **Static memory coalescing**
+   (`src/PIM/Dialect/Pim/Transforms/StaticMemoryCoalescing`).
+   Reuses compatible local memref allocations inside PIM cores before codegen.
+6. **PIM code generation** (`src/PIM/Pass/PimCodegen` and
+   `src/PIM/Compiler`).
+   Folds host constants, materializes remaining host constants, verifies PIM IR,
+   emits `.pim` core files, writes weights, and writes `memory.bin` /
+   `config.json`.
 Supporting pieces:
-- `src/PIM/Compiler` — PIM-specific compiler options (crossbar size/count,
-  core count, DCP window, experimental conv impl, concat error handling, …)
-  and `PimCodeGen` entry points.
-- `src/PIM/Common` — shared utilities (`PimCommon`, `LabeledList`).
-- `src/PIM/Pass` — auxiliary passes (`MessagePass`)
-  and the `PIMPasses.h` registry used by `PimAccelerator`.
-- `src/PIM/PimAccelerator.{cpp,hpp}` — accelerator entry point: registers
-  dialects, passes, and plugs Raptor into the ONNX-MLIR driver.
+- `src/PIM/Common` - shared IR, filesystem, diagnostics, reports, and utility
+  helpers.
+- `src/PIM/Compiler` - PIM compiler options, memory/address planning, binary
+  instruction format, artifact writing, weight emission, and codegen entry
+  points.
+- `src/PIM/Conversion/SpatialToGraphviz` - optional Spatial graphviz conversion
+  pass.
+- `src/PIM/Pass` - pass registration and auxiliary passes.
+- `src/PIM/PimAccelerator.{cpp,hpp}` - ONNX-MLIR accelerator entry point.
````
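During validation, the lowering stages above leave per-pass dialect snapshots under a workspace's `raptor/dialects/` directory. The file names come from the validation section of this README. The hypothetical Bash helper below walks those snapshots in pipeline order and prints the last one present, which can show where a compile stopped. Two assumptions apply: that each snapshot is written after the correspondingly named stage, and that a missing file means later stages did not run.

```bash
# Hypothetical helper: print the last per-pass dialect snapshot found in a
# `raptor/dialects/` directory. The order mirrors the lowering flow (Spatial,
# merge, Pim, bufferization, coalescing, folding, materialization); this
# file-to-stage mapping is an assumption.
DIALECT_SNAPSHOTS=(
  spatial0.mlir
  spatial1_dcp_merged.mlir
  pim0.mlir
  pim1_buff.mlir
  pim2_coalesced.mlir
  pim3_folded.mlir
  pim4_materialized.mlir
)

last_dialect_snapshot() {
  local dir="$1" last="" name
  for name in "${DIALECT_SNAPSHOTS[@]}"; do
    if [ -f "$dir/$name" ]; then
      last="$name"
    else
      break  # stop at the first stage whose snapshot is missing
    fi
  done
  if [ -z "$last" ]; then
    echo "none"
    return 1
  fi
  echo "$last"
}
```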
````diff
 ## Key compiler options
-Pass these on the `onnx-mlir` command line when compiling for PIM:
-- `--maccel=PIM` select the PIM accelerator.
-- `--EmitSpatial` / `--EmitPim` / `--EmitPimBufferized` / `--EmitPimCodegen`
-  stop the pipeline at the requested stage (default: `EmitPimCodegen`).
-- `--pim-only-codegen` — assume the input is already bufferized PIM IR and
-  run only the codegen tail.
-- `--crossbar-size=<N>` / `--crossbar-count=<N>` — crossbar dimensions and
-  per-core count.
-- `--core-count=<N>` — number of cores. Required for PIM compilation.
-- `--pim-merge-scheduler={peft,dcp}` — scheduler used by the Spatial
-  merge-compute-nodes pass (default: `peft`).
-- `--dcp-critical-window-size=<N>` — DCP coarsening window (0 = legacy).
-- `--use-experimental-conv-impl` alternative convolution lowering.
-- `--ignore-concat-error` — soft-fail corner case in `ConcatOp`.
+Pass these to `onnx-mlir` when compiling for PIM:
+- `--maccel=PIM` - select the PIM accelerator.
+- `--EmitSpatial`, `--EmitPim`, `--EmitPimBufferized`,
+  `--EmitPimCodegen` - stop the PIM pipeline at the requested stage. The PIM
+  default is `--EmitPimCodegen`.
+- `--core-count=<N>` - required positive core count for PIM compilation.
+- `--crossbar-size=<N>` - crossbar width/height. Default in code is `2`.
+- `--crossbar-count=<N>` - crossbars per core. Default in code is `256`.
+- `--pim-merge-scheduler=peft` - merge scheduler. `peft` is the only accepted
+  value in the current code.
+- `--pim-only-codegen` - assume input is already bufferized PIM IR and only run
+  the codegen tail.
+- `--pim-emit-json` - also emit `core_*.json` instruction files alongside
+  `core_*.pim`.
+- `--use-experimental-conv-impl` - use the alternate convolution lowering.
+- `--ignore-concat-error` - soft-fail a ConcatOp corner case.
+Example:
+```bash
+./build_release/Release/bin/onnx-mlir model.onnx -o /tmp/raptor/model \
+  --maccel=PIM --EmitPimCodegen \
+  --crossbar-size=2048 --crossbar-count=256 --core-count=1000
+```
+This writes PIM artifacts under `/tmp/raptor/pim/`.
````
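The new text documents `--core-count` as required and positive, so a wrapper script can reject a bad value before invoking the compiler. The Bash sketch below only assembles the `onnx-mlir` argument list from flags documented in this section; it does not run the compiler. The function name `pim_compile_args` is hypothetical. Its crossbar defaults copy the example invocation (2048 and 256), not the in-code defaults.

```bash
# Hypothetical helper: print an onnx-mlir argument list for a PIM compile,
# one argument per line. Only flags documented in the README are used.
# Usage: pim_compile_args MODEL OUTPUT CORE_COUNT [CROSSBAR_SIZE] [CROSSBAR_COUNT]
pim_compile_args() {
  local model="$1" out="$2" cores="$3"
  local xbar_size="${4:-2048}" xbar_count="${5:-256}"
  # --core-count is required and must be a positive integer.
  if ! [[ "$cores" =~ ^[1-9][0-9]*$ ]]; then
    echo "error: core count must be a positive integer, got '$cores'" >&2
    return 1
  fi
  printf '%s\n' "$model" -o "$out" \
    --maccel=PIM --EmitPimCodegen \
    "--crossbar-size=$xbar_size" "--crossbar-count=$xbar_count" \
    "--core-count=$cores"
}

# Example:
#   mapfile -t args < <(pim_compile_args model.onnx /tmp/raptor/model 1000)
#   ./build_release/Release/bin/onnx-mlir "${args[@]}"
```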
````diff
 ## Validation
-Functional validation lives in `validation/` and drives the Rust
-`pim-simulator` to compare Raptor's output against a reference.
-Per-operation validation (from `validation/`):
-```
-validate.py \
-  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
-  --onnx-include-dir ../onnx-mlir/include \
+Functional validation lives in `validation/`. It compiles ONNX models, builds a
+native ONNX-MLIR reference runner, generates random inputs, runs Raptor, runs
+the Rust PIM simulator, and compares outputs.
+Python dependencies used by the validation scripts are `numpy`, `onnx`, and
+`colorama`. The simulator requires the Rust toolchain.
+Per-operation validation from the repository root:
+```bash
+python3 validation/validate.py \
+  --raptor-path build_release/Release/bin/onnx-mlir \
+  --onnx-include-dir onnx-mlir/include \
   --core-count 1000
 ```
-End-to-end network validation (example: first 4 layers of YOLOv11n):
-```
-validate.py \
-  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
-  --onnx-include-dir ../onnx-mlir/include \
-  --operations-dir ./networks/yolo11n/depth_04 \
+Validate one network or a subset by pointing `--operations-dir` at any directory
+containing `.onnx` files:
+```bash
+python3 validation/validate.py \
+  --raptor-path build_release/Release/bin/onnx-mlir \
+  --onnx-include-dir onnx-mlir/include \
+  --operations-dir validation/networks/yolo11n/depth_04 \
   --crossbar-size 2048 --crossbar-count 256 --core-count 1000
 ```
-Each validation run writes debugging artifacts into the benchmark's workspace
-directory (for example `validation/operations/gemm/small/`):
-- `inputs/` — generated input CSVs used for the run.
-- `outputs/` — reference outputs dumped by the native ONNX runner.
-- `raptor/` — compiler artifacts:
-  `*.onnx.mlir`, `dialects/spatial0.mlir`, `dialects/spatial1_dcp_merged.mlir`,
-  `dialects/pim0.mlir`, `dialects/pim1_buff.mlir`, `dialects/pim2_coalesced.mlir`,
-  `dialects/pim3_folded.mlir`, `dialects/pim4_materialized.mlir`,
-  `pim/config.json`, `pim/core_*.pim`, `pim/memory.bin`, and reports under
-  `raptor/reports/` such as `dcp_merge_report.txt`,
-  `memory_report.txt`, and `static_memory_coalescing_report.txt`.
-- `runner/` — generated reference runner source, build tree, and shared library.
-- `simulation/out.bin` — raw simulator output dump used for output comparison.
-That means you usually do not need to rerun standalone `--EmitSpatial` or
-`--EmitPim` commands while debugging validation failures: the per-pass dialect
-dumps are already available under `raptor/dialects/`.
+Useful validation options:
+- `--simulator-dir <path>` - override the auto-detected
+  `backend-simulators/pim/pim-simulator` path.
+- `--threshold <float>` - maximum allowed per-element output difference.
+- `--seed <int>` - RNG seed for generated inputs.
+- `--command-timeout-seconds <float>` - timeout for compiler, runner, and
+  simulator subprocesses.
+- `--verbose` - print subprocess logs and average PIM pass timings.
+- `--clean` - remove generated validation artifacts and exit.
+Each validation run writes artifacts in the model workspace, for example under
+`validation/operations/gemm/small/`:
+- `inputs/` - generated input CSV files.
+- `outputs/` - native ONNX-MLIR reference outputs.
+- `raptor/` - compiler artifacts, including `*.onnx.mlir`, dialect dumps under
+  `dialects/`, reports under `reports/`, and final PIM artifacts under `pim/`.
+- `runner/` - generated reference runner source, build tree, and shared library.
+- `simulation/out.bin` - raw simulator output used for comparison.
````
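`--threshold` caps the per-element difference between simulator and reference outputs. The diff does not show how `validate.py` implements that comparison, so the following is only an illustration of the idea. It is a Bash/awk sketch that compares two comma-, space-, or newline-separated numeric text files element by element, prints the maximum absolute difference, and fails when that difference exceeds the threshold or the lengths differ. The function name and the text input format are assumptions; it does not read `out.bin`.

```bash
# Hypothetical helper, illustrative only (not validate.py's logic): compare
# two numeric text files element by element. Prints the max absolute
# difference; returns 1 if it exceeds THRESHOLD or the element counts differ.
# Usage: compare_outputs REFERENCE ACTUAL THRESHOLD
compare_outputs() {
  local ref="$1" act="$2" thr="$3"
  # Split both files into one value per line, then pair them side by side.
  paste -d' ' <(tr ', ' '\n\n' < "$ref" | grep -v '^$') \
              <(tr ', ' '\n\n' < "$act" | grep -v '^$') |
    awk -v thr="$thr" '
      NF != 2 { print "length mismatch at element " NR; bad = 1; exit }
      {
        d = $1 - $2
        if (d < 0) d = -d
        if (d > max) max = d
      }
      END {
        if (bad) exit 1
        printf "max abs diff: %g\n", max
        if (max > thr) exit 1
        exit 0
      }'
}
```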
````diff
-The validator does not currently expose a simulator tracing flag, but once a
-validation has produced `raptor/pim/` you can rerun the simulator manually with
-tracing enabled:
+The compiler currently dumps dialect snapshots such as `spatial0.mlir`,
+`spatial1_dcp_merged.mlir`, `pim0.mlir`, `pim1_buff.mlir`,
+`pim2_coalesced.mlir`, `pim3_folded.mlir`, and
+`pim4_materialized.mlir` when an output directory is available.
+To rerun the simulator manually with tracing after validation has produced a
+`raptor/pim/` directory:
 ```bash
 cd backend-simulators/pim/pim-simulator
@@ -174,90 +184,138 @@ cargo run --no-default-features --features tracing --release \
 ```
 With `--features tracing`, the simulator writes per-core traces as
-`simulation/TraceCore0`, `simulation/TraceCore1`, ... next to `simulation/out.bin`.
-The validator normally computes the `-d` dump ranges from `raptor/pim/config.json`
-and the model output shapes. If you need a clean slate before rerunning, use:
+`TraceCore0`, `TraceCore1`, ... next to `out.bin`. The validator normally
+computes the `-d` ranges from `raptor/pim/config.json` and model output shapes.
+Available validation networks under `validation/networks/`: `vgg16`,
+`yolo11n`, `yolo11nv2`.
+Available operation suites under `validation/operations/`: `add`, `concat`,
+`conv`, `div`, `gather`, `gemm`, `gemv`, `matmul`, `mul`, `pool`,
+`reduce_mean`, `relu`, `reshape`, `resize`, `sigmoid`, `softmax`, `split`.
+Generated operation tests can be regenerated with:
 ```bash
-validate.py --clean
+python3 validation/operations/gen_tests.py
 ```
-Available networks under `validation/networks/`: `vgg16`, `yolo11n`.
-Available operations under `validation/operations/`: `add`, `conv`, `div`,
-`gather`, `gemm`, `gemv`, `mul`, `pool`, `reduce_mean`, `relu`, `resize`,
-`sigmoid`, `softmax`, `split`.
-## Rebuilding
-Release build (fast):
-```
-cmake --build /home/nico/raptor/raptor/cmake-build-release --target onnx-mlir -j 30
-```
-A slower debug build is also available — configure it the same way but with
-`-DCMAKE_BUILD_TYPE=Debug` (see installation instructions below).
````
````diff
 ## Build
+Initialize submodules first:
+```bash
+git submodule update --init --recursive
+```
+The project follows ONNX-MLIR's build requirements. The CI workflow documents
+the currently used versions and setup:
+- CMake 4.3.0 in CI,
+- LLVM/MLIR checked out under `onnx-mlir/llvm-project`,
+- Protobuf `v34.0`,
+- Rust stable for `pim-simulator`,
+- Python packages `numpy`, `onnx`, `colorama` for validation.
 ### Protobuf
-Use the following commands to install protobuf:
-```
+Install Protobuf if your system does not already provide a compatible version:
+```bash
 git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
-cd protobuf
-mkdir build
-cd build
-cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release
-ninja
-sudo ninja install
+cmake -S protobuf -B protobuf/build -G Ninja \
+  -DCMAKE_BUILD_TYPE=Release \
+  -Dprotobuf_BUILD_TESTS=OFF
+cmake --build protobuf/build
+sudo cmake --install protobuf/build
 ```
-You can now remove the protobuf repo directory with:
-```
-cd ../..
+You can then remove the temporary checkout:
+```bash
 rm -rf protobuf
 ```
-### Mlir
-Follow the first part of instructions [here](onnx-mlir/docs/BuildOnLinuxOSX.md) to build mlir.
-Remember to set ```-DCMAKE_BUILD_TYPE=Debug``` for developing on Raptor
-Moreover, if compiling with build type debug, it is also suggested to use
-mold as linker (you will need to install it if you don't have it already)
-to reduce memory usage during linking. You can use it by setting the options:
-```
--DLLVM_USE_LINKER=mold
+### MLIR
+Follow the ONNX-MLIR instructions in
+`onnx-mlir/docs/BuildOnLinuxOSX.md` to build LLVM/MLIR. The local Raptor build
+expects `MLIR_DIR` to point at the MLIR CMake package, for example:
+```bash
+MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
 ```
+If your LLVM build directory is named `build` instead of `build_release`, adjust
+the path accordingly.
````
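The MLIR note says to adjust `MLIR_DIR` when the LLVM build directory is named `build` rather than `build_release`. One option is to let a script pick whichever of the two exists. In the Bash sketch below, the function name `find_mlir_dir` and the search order are assumptions. The existence check uses `MLIRConfig.cmake`, the package file that CMake's `find_package(MLIR CONFIG)` reads from that directory.

```bash
# Hypothetical helper: print the first MLIR CMake package directory found
# under an LLVM checkout, trying build_release before build.
# Usage: MLIR_DIR=$(find_mlir_dir "$(pwd)/onnx-mlir/llvm-project")
find_mlir_dir() {
  local llvm_root="$1" build candidate
  for build in build_release build; do
    candidate="$llvm_root/$build/lib/cmake/mlir"
    # MLIRConfig.cmake marks a usable MLIR CMake package directory.
    if [ -f "$candidate/MLIRConfig.cmake" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "error: no MLIR CMake package under $llvm_root" >&2
  return 1
}
```

A debug LLVM tree such as `build_debug` is not searched; add it to the loop if your layout uses one.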
````diff
 ### Raptor
-Use the following commands to build Raptor.
-Remember to set ```-DCMAKE_BUILD_TYPE=Debug``` for developing on Raptor.
-Also in this case, it is suggested to use mold as linker to reduce link time and memory usage,
-setting the options:
-```
+Configure a release build:
+```bash
+MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
+cmake -S . -B build_release -G Ninja \
+  -DCMAKE_BUILD_TYPE=Release \
+  -DONNX_MLIR_ACCELERATORS=PIM \
+  -DLLVM_ENABLE_ASSERTIONS=ON \
+  -DMLIR_DIR=${MLIR_DIR}
+```
+Configure a debug build similarly:
+```bash
+MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_debug/lib/cmake/mlir
+cmake -S . -B build_debug -G Ninja \
+  -DCMAKE_BUILD_TYPE=Debug \
+  -DONNX_MLIR_ACCELERATORS=PIM \
+  -DLLVM_ENABLE_ASSERTIONS=ON \
+  -DMLIR_DIR=${MLIR_DIR}
+```
+For debug development, using `mold` can reduce link time and memory use:
+```bash
+cmake -S . -B build_debug -G Ninja \
+  -DCMAKE_BUILD_TYPE=Debug \
+  -DONNX_MLIR_ACCELERATORS=PIM \
+  -DLLVM_ENABLE_ASSERTIONS=ON \
+  -DMLIR_DIR=${MLIR_DIR} \
   -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
   -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
   -DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
 ```
-```
-git submodule update --init --recursive
-MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build/lib/cmake/mlir
-mkdir build && cd build
-cmake .. -G Ninja \
-  -DCMAKE_BUILD_TYPE=Release \
-  -DONNX_MLIR_ACCELERATORS=PIM \
-  -DLLVM_ENABLE_ASSERTIONS=ON \
-  -DMLIR_DIR=${MLIR_DIR}
-cmake --build .
+Build the compiler with CMake:
+```bash
+cmake --build ./build_release
+cmake --build ./build_debug
 ```
-If the build fails because of protobuf missing uint definitions,
-just patch the problematic files by adding ```#include <cstdint>``` to their includes.
+Do not invoke `ninja` directly for this project; use `cmake --build` so CMake's
+configuration and generated shims stay consistent.
+If a build fails because Protobuf headers are missing fixed-width integer
+definitions, patch the affected Protobuf-generated files by adding
+`#include <cstdint>`.
````
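The debug configuration passes the `-fuse-ld=mold` linker flags unconditionally. A script shared across machines with and without `mold` could add them only when `mold` is on `PATH`. The Bash sketch below prints the configure arguments rather than running CMake. The function name `raptor_configure_args` is hypothetical, and the base flags are copied from the configure commands in this section.

```bash
# Hypothetical helper: print `cmake` configure arguments for a Raptor build,
# one per line, adding the mold linker flags only when mold is installed.
# Usage: raptor_configure_args BUILD_DIR BUILD_TYPE MLIR_DIR
raptor_configure_args() {
  local build_dir="$1" build_type="$2" mlir_dir="$3"
  local args=(
    -S . -B "$build_dir" -G Ninja
    "-DCMAKE_BUILD_TYPE=$build_type"
    -DONNX_MLIR_ACCELERATORS=PIM
    -DLLVM_ENABLE_ASSERTIONS=ON
    "-DMLIR_DIR=$mlir_dir"
  )
  if command -v mold > /dev/null 2>&1; then
    args+=(
      -DCMAKE_EXE_LINKER_FLAGS=-fuse-ld=mold
      -DCMAKE_SHARED_LINKER_FLAGS=-fuse-ld=mold
      -DCMAKE_MODULE_LINKER_FLAGS=-fuse-ld=mold
    )
  fi
  printf '%s\n' "${args[@]}"
}

# Example:
#   mapfile -t cfg < <(raptor_configure_args build_debug Debug "$MLIR_DIR")
#   cmake "${cfg[@]}" && cmake --build ./build_debug
```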
````diff
+## Tests
+The Rust simulator has its own tests:
+```bash
+cd backend-simulators/pim/pim-simulator
+cargo test
+```
+## Repository Layout
+- `src/PIM/` - PIM accelerator implementation.
+- `test/PIM/` - PIM C++ unit tests.
+- `validation/` - functional validation scripts, ONNX operation tests, network
+  slices, and pimsim config generation.
+- `backend-simulators/pim/pim-simulator/` - in-tree Rust functional simulator.
+- `backend-simulators/pim/pimsim-nn/` - performance simulator submodule.
+- `pimcomp_utils/` - local comparison helpers for PIMCOMP-NN.
+- `.github/actions/` and `.github/workflows/validate_operations.yml` - CI setup
+  for MLIR/Protobuf caching, building Raptor, and validation.
````