# Raptor

Raptor is a domain-specific MLIR compiler for neural networks in ONNX format,
targeting in-memory computing / processing-in-memory (PIM) architectures. It
extends ONNX-MLIR with a PIM accelerator and progressively lowers ONNX-MLIR
through custom MLIR dialects to simulator artifacts.

The current target is the PIM simulator stack under `backend-simulators/pim`.
Raptor emits binary per-core `.pim` instruction files by default, plus
`memory.bin`, `config.json`, and weight binaries. It can also emit per-core JSON
instruction files with `--pim-emit-json`.

## Overview

PIM architectures perform most computation directly in memory. The supported
target models a chip with:
- shared host memory,
- multiple PIM cores,
- ReRAM crossbars for vector-matrix / matrix-vector work,
- explicit communication between cores,
- no hardware branch or loop support in emitted simulator code.

Because repeated work such as convolutions is eventually made explicit, emitted
instruction counts can grow quickly. Most compiler work therefore focuses on
lowering, scheduling, memory layout, and code-generation optimizations.

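To get a feel for how quickly fully unrolled code can grow, the sketch below counts matrix-vector operations for a single convolution layer. It assumes an im2col-style mapping in which the flattened kernel matrix is tiled over square crossbars and every output position issues one operation per tile. The function names and the mapping itself are illustrative assumptions, not Raptor's actual lowering.

```python
import math


def crossbar_tiles(rows: int, cols: int, crossbar_size: int) -> int:
    """Number of square crossbars needed to hold a rows x cols weight matrix."""
    return math.ceil(rows / crossbar_size) * math.ceil(cols / crossbar_size)


def unrolled_mvm_count(in_channels, out_channels, kernel, out_h, out_w, crossbar_size):
    """Matrix-vector operations emitted when every output position is unrolled."""
    # Assumed im2col mapping: one weight-matrix row per (channel, ky, kx) element.
    rows = in_channels * kernel * kernel
    tiles = crossbar_tiles(rows, out_channels, crossbar_size)
    # Without hardware loops, each output pixel contributes its own instructions.
    return out_h * out_w * tiles


# A 3x3 convolution, 64 -> 128 channels, producing a 56x56 output map.
print(unrolled_mvm_count(64, 128, 3, 56, 56, crossbar_size=256))   # 9408
print(unrolled_mvm_count(64, 128, 3, 56, 56, crossbar_size=2048))  # 3136
```

Under these assumptions the count scales with the output area, which is why larger crossbars and better scheduling matter for emitted code size.
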
### Targets and simulators

- `backend-simulators/pim/pim-simulator` is the in-tree Rust functional
  simulator used by validation. It reads Raptor's `pim/` artifact directory and
  compares simulator output against native ONNX-MLIR execution.
- `backend-simulators/pim/pimsim-nn` is the performance simulator submodule.
  The helper scripts in `pimcomp_utils/` are for comparison with PIMCOMP-NN and
  contain local paths; treat them as local utilities, not portable workflows.

## Compilation pipeline

The PIM sources live under `src/PIM` and tests under `test/PIM`. CMake exposes
them to ONNX-MLIR through generated shim directories under
`onnx-mlir/src/Accelerators/PIM` and `onnx-mlir/test/accelerators/PIM`.

High-level lowering flow:

```
ONNX-MLIR -> Spatial -> Pim (tensor) -> Pim (bufferized) -> PIM artifacts
```

1. **ONNX -> Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
   Lowers supported ONNX ops into the `spat` dialect
   (`src/PIM/Dialect/Spatial`). Conversion patterns are split by op family under
   `Patterns/{Math,NN,Tensor}` and currently cover Conv, Gemm, MatMul,
   elementwise Add/Mul/Div, ReduceMean, pooling, Relu, Sigmoid, Softmax,
   Concat, Gather, Reshape, Resize, and Split.

2. **Merge compute nodes**
   (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
   Builds a compute graph, schedules it with the PEFT scheduler, and materializes
   the merge schedule into Spatial IR. Supporting scheduling code lives under
   `MergeComputeNodes/Scheduling`.

3. **Spatial -> Pim** (`src/PIM/Conversion/SpatialToPim`).
   Lowers Spatial operations to the `pim` dialect (`src/PIM/Dialect/Pim`),
   including `pim.core`, `pim.core_batch`, communication, tensor packing, global
   tensor materialization, and return-path normalization.

4. **Bufferization** (`src/PIM/Dialect/Pim/Transforms/Bufferization`).
   Converts tensor-semantics PIM IR into memref-semantics PIM IR using MLIR's
   bufferization interfaces.

5. **Static memory coalescing**
   (`src/PIM/Dialect/Pim/Transforms/StaticMemoryCoalescing`).
   Reuses compatible local memref allocations inside PIM cores before codegen.

6. **PIM code generation** (`src/PIM/Pass/PimCodegen` and
   `src/PIM/Compiler`).
   Folds host constants, materializes remaining host constants, verifies PIM IR,
   emits `.pim` core files, writes weights, and writes `memory.bin` /
   `config.json`.

Supporting pieces:
- `src/PIM/Common` - shared IR, filesystem, diagnostics, reports, and utility
  helpers.
- `src/PIM/Compiler` - PIM compiler options, memory/address planning, binary
  instruction format, artifact writing, weight emission, and codegen entry
  points.
- `src/PIM/Conversion/SpatialToGraphviz` - optional Spatial graphviz conversion
  pass.
- `src/PIM/Pass` - pass registration and auxiliary passes.
- `src/PIM/PimAccelerator.{cpp,hpp}` - ONNX-MLIR accelerator entry point.

## Key compiler options

Pass these to `onnx-mlir` when compiling for PIM:

- `--maccel=PIM` - select the PIM accelerator.
- `--EmitSpatial`, `--EmitPim`, `--EmitPimBufferized`,
  `--EmitPimCodegen` - stop the PIM pipeline at the requested stage. The PIM
  default is `--EmitPimCodegen`.
- `--core-count=<N>` - required positive core count for PIM compilation.
- `--crossbar-size=<N>` - crossbar width/height. Default in code is `2`.
- `--crossbar-count=<N>` - crossbars per core. Default in code is `256`.
- `--pim-merge-scheduler=peft` - merge scheduler. `peft` is the only accepted
  value in the current code.
- `--pim-only-codegen` - assume input is already bufferized PIM IR and only run
  the codegen tail.
- `--pim-emit-json` - also emit `core_*.json` instruction files alongside
  `core_*.pim`.
- `--use-experimental-conv-impl` - use the alternate convolution lowering.
- `--ignore-concat-error` - soft-fail a ConcatOp corner case.

Example:

```bash
./build_release/Release/bin/onnx-mlir model.onnx -o /tmp/raptor/model \
  --maccel=PIM --EmitPimCodegen \
  --crossbar-size=2048 --crossbar-count=256 --core-count=1000
```

This writes PIM artifacts under `/tmp/raptor/pim/`.

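When the compiler is driven from a script, the same invocation can be assembled programmatically. The following is a minimal sketch that uses only the flags listed above; the helper name `build_pim_command` and its parameters are hypothetical and not part of Raptor.

```python
import shlex
from pathlib import Path


def build_pim_command(onnx_mlir, model, output_base, core_count,
                      crossbar_size=None, crossbar_count=None, emit_json=False):
    """Assemble an onnx-mlir invocation for the PIM accelerator."""
    if core_count <= 0:
        raise ValueError("--core-count must be a positive integer")
    cmd = [str(onnx_mlir), str(model), "-o", str(output_base),
           "--maccel=PIM", "--EmitPimCodegen", f"--core-count={core_count}"]
    # Optional flags are only added when set, so compiler defaults still apply.
    if crossbar_size is not None:
        cmd.append(f"--crossbar-size={crossbar_size}")
    if crossbar_count is not None:
        cmd.append(f"--crossbar-count={crossbar_count}")
    if emit_json:
        cmd.append("--pim-emit-json")
    return cmd


cmd = build_pim_command(Path("build_release/Release/bin/onnx-mlir"), "model.onnx",
                        "/tmp/raptor/model", core_count=1000,
                        crossbar_size=2048, crossbar_count=256)
print(shlex.join(cmd))
# Pass `cmd` to subprocess.run(cmd, check=True) to actually invoke the compiler.
```

Building the argument list instead of a shell string avoids quoting problems when model or output paths contain spaces.
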
## Validation

Functional validation lives in `validation/`. It compiles ONNX models, builds a
native ONNX-MLIR reference runner, generates random inputs, runs Raptor, runs
the Rust PIM simulator, and compares outputs.

Python dependencies used by the validation scripts are `numpy`, `onnx`, and
`colorama`. The simulator requires the Rust toolchain.

Per-operation validation from the repository root:

```bash
python3 validation/validate.py \
  --raptor-path build_release/Release/bin/onnx-mlir \
  --onnx-include-dir onnx-mlir/include \
  --core-count 1000
```

Validate one network or a subset by pointing `--operations-dir` at any directory
containing `.onnx` files:

```bash
python3 validation/validate.py \
  --raptor-path build_release/Release/bin/onnx-mlir \
  --onnx-include-dir onnx-mlir/include \
  --operations-dir validation/networks/yolo11n/depth_04 \
  --crossbar-size 2048 --crossbar-count 256 --core-count 1000
```

Useful validation options:
- `--simulator-dir <path>` - override the auto-detected
  `backend-simulators/pim/pim-simulator` path.
- `--threshold <float>` - maximum allowed per-element output difference.
- `--seed <int>` - RNG seed for generated inputs.
- `--command-timeout-seconds <float>` - timeout for compiler, runner, and
  simulator subprocesses.
- `--verbose` - print subprocess logs and average PIM pass timings.
- `--clean` - remove generated validation artifacts and exit.

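To illustrate what a per-element threshold means in practice, here is a minimal sketch of such a check. It assumes the difference is measured as an absolute value over flattened outputs; the validator's actual metric may differ, and the function names are illustrative.

```python
def max_abs_difference(reference, simulated):
    """Largest per-element absolute difference between two flat sequences."""
    if len(reference) != len(simulated):
        raise ValueError("outputs have different element counts")
    return max((abs(r - s) for r, s in zip(reference, simulated)), default=0.0)


def outputs_match(reference, simulated, threshold):
    """True when every element differs by at most `threshold`."""
    return max_abs_difference(reference, simulated) <= threshold


reference = [0.50, 1.25, -3.00]
simulated = [0.50, 1.26, -3.00]
print(outputs_match(reference, simulated, threshold=0.05))   # True
print(outputs_match(reference, simulated, threshold=0.001))  # False
```
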
Each validation run writes artifacts in the model workspace, for example under
`validation/operations/gemm/small/`:
- `inputs/` - generated input CSV files.
- `outputs/` - native ONNX-MLIR reference outputs.
- `raptor/` - compiler artifacts, including `*.onnx.mlir`, dialect dumps under
  `dialects/`, reports under `reports/`, and final PIM artifacts under `pim/`.
- `runner/` - generated reference runner source, build tree, and shared library.
- `simulation/out.bin` - raw simulator output used for comparison.

The compiler currently dumps dialect snapshots such as `spatial0.mlir`,
`spatial1_dcp_merged.mlir`, `pim0.mlir`, `pim1_buff.mlir`,
`pim2_coalesced.mlir`, `pim3_folded.mlir`, and
`pim4_materialized.mlir` when an output directory is available.

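When a compilation fails partway, checking which snapshots exist can hint at how far the pipeline got. The sketch below assumes the snapshots are written in the order listed above and that they sit in the workspace's `raptor/dialects/` directory; the helper name `last_snapshot` is hypothetical.

```python
from pathlib import Path

# Snapshot names from the paragraph above, in assumed pipeline order.
SNAPSHOTS = [
    "spatial0.mlir",
    "spatial1_dcp_merged.mlir",
    "pim0.mlir",
    "pim1_buff.mlir",
    "pim2_coalesced.mlir",
    "pim3_folded.mlir",
    "pim4_materialized.mlir",
]


def last_snapshot(dialects_dir):
    """Return the latest pipeline snapshot present in dialects_dir, or None."""
    present = [name for name in SNAPSHOTS if (Path(dialects_dir) / name).is_file()]
    return present[-1] if present else None


# Prints None when the directory or its snapshots do not exist yet.
print(last_snapshot("validation/operations/gemm/small/raptor/dialects"))
```
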
To rerun the simulator manually with tracing after validation has produced a
`raptor/pim/` directory:

```bash
cd backend-simulators/pim/pim-simulator
cargo run --no-default-features --features tracing --release \
  --package pim-simulator --bin pim-simulator -- \
  -f /path/to/workspace/raptor/pim \
  -o /path/to/workspace/simulation/out.bin \
  -d <addr0>,<size0>,<addr1>,<size1>,...
```

With `--features tracing`, the simulator writes per-core traces as
`TraceCore0`, `TraceCore1`, ... next to `out.bin`. The validator normally
computes the `-d` ranges from `raptor/pim/config.json` and model output shapes.

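If you need to build the `-d` argument by hand, the helper below joins address/size pairs into the comma-separated form shown in the command. It is only a sketch: the addresses are made up, the sizes are computed as element count times 4 bytes purely for illustration, and the real values and units must come from `config.json` and the simulator.

```python
def format_dump_ranges(ranges):
    """Join (address, size) pairs into the comma-separated -d argument."""
    parts = []
    for address, size in ranges:
        if address < 0 or size <= 0:
            raise ValueError(f"invalid range: address={address}, size={size}")
        parts.extend((str(address), str(size)))
    return ",".join(parts)


# Hypothetical example: two outputs of 10 and 4 four-byte elements.
ranges = [(0, 10 * 4), (64, 4 * 4)]
print(format_dump_ranges(ranges))  # 0,40,64,16
```
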
Available validation networks under `validation/networks/`: `vgg16`,
`yolo11n`, `yolo11nv2`.

Available operation suites under `validation/operations/`: `add`, `concat`,
`conv`, `div`, `gather`, `gemm`, `gemv`, `matmul`, `mul`, `pool`,
`reduce_mean`, `relu`, `reshape`, `resize`, `sigmoid`, `softmax`, `split`.

Generated operation tests can be regenerated with:

```bash
python3 validation/operations/gen_tests.py
```

## Build

Initialize submodules first:

```bash
git submodule update --init --recursive
```

The project follows ONNX-MLIR's build requirements. The CI workflow documents
the currently used versions and setup:
- CMake 4.3.0 in CI,
- LLVM/MLIR checked out under `onnx-mlir/llvm-project`,
- Protobuf `v34.0`,
- Rust stable for `pim-simulator`,
- Python packages `numpy`, `onnx`, `colorama` for validation.

### Protobuf

Install Protobuf if your system does not already provide a compatible version:

```bash
git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
cmake -S protobuf -B protobuf/build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -Dprotobuf_BUILD_TESTS=OFF
cmake --build protobuf/build
sudo cmake --install protobuf/build
```

You can then remove the temporary checkout:

```bash
rm -rf protobuf
```

### MLIR

Follow the ONNX-MLIR instructions in
`onnx-mlir/docs/BuildOnLinuxOSX.md` to build LLVM/MLIR. The local Raptor build
expects `MLIR_DIR` to point at the MLIR CMake package, for example:

```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
```

If your LLVM build directory is named `build` instead of `build_release`, adjust
the path accordingly.

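If you are unsure which build directory name your checkout uses, a small script can probe the candidates. This is a sketch rather than part of the build system: `find_mlir_dir` is a hypothetical helper, and it only checks the two directory names mentioned above unless you pass others.

```python
from pathlib import Path


def find_mlir_dir(repo_root, build_names=("build_release", "build")):
    """Return the first existing MLIR CMake package directory, or None."""
    llvm_root = Path(repo_root) / "onnx-mlir" / "llvm-project"
    for name in build_names:
        candidate = llvm_root / name / "lib" / "cmake" / "mlir"
        if candidate.is_dir():
            return candidate
    return None


mlir_dir = find_mlir_dir(".")
print(mlir_dir if mlir_dir else "MLIR CMake package not found; build LLVM/MLIR first")
```

For a debug configuration, call it with `build_names=("build_debug",)` and pass the result to CMake as `-DMLIR_DIR`.
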
### Raptor

Configure a release build:

```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
cmake -S . -B build_release -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR}
```

Configure a debug build similarly:

```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_debug/lib/cmake/mlir
cmake -S . -B build_debug -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR}
```

For debug development, using `mold` can reduce link time and memory use:

```bash
cmake -S . -B build_debug -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR} \
  -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
  -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
  -DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
```

Build the compiler with CMake:

```bash
cmake --build ./build_release
cmake --build ./build_debug
```

Do not invoke `ninja` directly for this project; use `cmake --build` so CMake's
configuration and generated shims stay consistent.

If a build fails because Protobuf headers are missing fixed-width integer
definitions, patch the affected Protobuf-generated files by adding
`#include <cstdint>`.

## Tests

The Rust simulator has its own tests:

```bash
cd backend-simulators/pim/pim-simulator
cargo test
```

## Repository Layout

- `src/PIM/` - PIM accelerator implementation.
- `test/PIM/` - PIM C++ unit tests.
- `validation/` - functional validation scripts, ONNX operation tests, network
  slices, and pimsim config generation.
- `backend-simulators/pim/pim-simulator/` - in-tree Rust functional simulator.
- `backend-simulators/pim/pimsim-nn/` - performance simulator submodule.
- `pimcomp_utils/` - local comparison helpers for PIMCOMP-NN.
- `.github/actions/` and `.github/workflows/validate_operations.yml` - CI setup
  for MLIR/Protobuf caching, building Raptor, and validation.