# Raptor

Raptor is a domain-specific MLIR compiler for neural networks (ONNX format)
targeting in-memory computing / processing-in-memory (PIM) architectures.
It progressively lowers ONNX-MLIR through a set of MLIR dialects down to
target-specific artifacts (currently JSON code for the `pimsim-nn` simulator).

## Overview

PIM architectures perform most of the computation directly in memory.
Raptor's first supported target is `pimsim-nn`, which simulates a chip with:

- a shared host memory,
- a number of cores that do most of the computation directly in their memory
  (vector ops, vmm/mvm on ReRAM crossbars),
- no branching instructions (branchless architecture) and no hardware loop
  support — any repeated work (e.g. convolutions) must be unrolled into
  explicit per-iteration instructions.

Because of this, the number of emitted instructions explodes quickly, and the
compiler must optimize aggressively at every stage to keep compilation
tractable.

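To make the blow-up concrete, here is a back-of-the-envelope count for one
fully unrolled convolution. The layer sizes are hypothetical and the
one-instruction-per-MAC model is a simplification, not the exact instruction
mix the compiler emits:

```shell
# Illustrative only: rough count of explicit per-iteration instructions
# for a fully unrolled 2D convolution (no padding, stride 1), assuming
# one instruction per multiply-accumulate tap. All sizes are made up.
h=32; w=32; c_in=64; c_out=64; k=3
out_positions=$(( (h - k + 1) * (w - k + 1) * c_out ))
taps=$(( k * k * c_in ))
echo $(( out_positions * taps ))   # prints 33177600
```

Even this small 32x32 layer needs tens of millions of explicit instructions,
which is why every lowering stage must shrink the IR aggressively.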
A second target, `PulPim`, is planned for an accelerator with RISC-V cores,
each carrying its own in-memory computing unit and crossbars. It will live in
a dedicated dialect (future work).

### Targets and simulators

`pimsim-nn` (under `backend-simulators/pim/pimsim-nn`) is used for
**performance** estimates (latency, energy), but does not functionally execute
the JSON code it consumes. To validate the numerical correctness of the JSON
code produced by Raptor (or, for comparison, by the `pimcomp` compiler), we use
a Rust simulator we maintain in-tree at
`backend-simulators/pim/pim-simulator`.

## Compilation pipeline

The PIM-related sources live under `src/PIM` and the tests under `test/PIM`.
When working on this codebase, most changes should stay confined to those
trees (you only need to look outside, e.g. at `onnx-mlir` or `llvm`, for
framework-level details).

High-level lowering flow:

```
ONNX-MLIR ──► Spatial ──► Pim (tensor) ──► Pim (bufferized) ──► PIM JSON
```

1. **ONNX → Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
   Lowers ONNX ops into the `spat` dialect (`src/PIM/Dialect/Spatial`).
   Spatial models a high-level spatial in-memory accelerator: vmm/mvm
   operations are accelerated by storing a constant RHS matrix into a
   crossbar. Crossbars cannot be re-programmed during execution, have a
   limited fixed size, and there is a limited number of them per core.
   Conversion patterns are split by op family under
   `Conversion/ONNXToSpatial/Patterns/{Math,NN,Tensor}` (Conv, Gemm, MatMul,
   Elementwise, ReduceMean, Pool, Relu, Sigmoid, Softmax, Concat, Gather,
   Reshape, Resize, Split).

2. **Spatial → Pim** (`src/PIM/Conversion/SpatialToPim`).
   Lowers Spatial to the `pim` dialect (`src/PIM/Dialect/Pim`), which
   materializes PIM cores (`pim.core`), inter-core communication
   (`pim.send` / `pim.receive`), halts, and crossbar-level operations.

3. **Merge compute nodes** (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
   A DCP-inspired heuristic (Dynamic Critical Path — see the original
   scheduling paper by Kwok & Ahmad,
   [DCP-eScience2007](https://clouds.cis.unimelb.edu.au/papers/DCP-eScience2007.pdf))
   that coarsens the virtual node graph and decides how to group compute
   nodes onto cores. Our implementation is only DCP-*inspired*: it is a
   heuristic with different assumptions from the paper (different cost
   model, constraints from crossbar capacity / core resources, and a
   windowed coarsening loop instead of full-graph reprioritization). The
   `dcp-critical-window-size` option controls how many lowest-slack virtual
   nodes each coarsening iteration considers (0 = legacy full-graph
   analysis). Related sources: `DCPGraph/DCPAnalysis.cpp`, `Graph.cpp/.hpp`,
   `MergeComputeNodesPass.cpp`.

4. **Bufferization** (`src/PIM/Dialect/Pim/Transforms/Bufferization`).
   Converts tensor-semantics PIM IR into memref-semantics PIM IR using the
   standard MLIR `BufferizableOpInterface` machinery
   (`OpBufferizationInterfaces.*`, `PimBufferization.td`).

5. **PIM code generation** (`src/PIM/Pass/PimCodegen`):
   - `HostConstantFolding` — folds host-side constants.
   - `MaterializeHostConstantsPass` — materializes the remaining host
     constants for emission.
   - `VerificationPass` — checks invariants before emission.
   - `EmitPimJsonPass` — emits the final PIM JSON consumed by `pimsim-nn`
     and `pim-simulator`.

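The crossbar capacity constraint from step 1 can be sketched with a simple
ceil-division tiling estimate. The sizes here are hypothetical, and this is
not the compiler's actual placement logic, only the counting argument behind
it:

```shell
# Hypothetical sizes; a constant RHS matrix larger than one crossbar must
# be split into tiles, and each tile occupies one crossbar for the whole
# run (crossbars cannot be re-programmed during execution).
rows=300; cols=500; xbar=256
row_tiles=$(( (rows + xbar - 1) / xbar ))   # ceil(rows / xbar)
col_tiles=$(( (cols + xbar - 1) / xbar ))   # ceil(cols / xbar)
echo $(( row_tiles * col_tiles ))           # prints 4
```

With a limited number of crossbars per core, such tile counts directly bound
how many vmm/mvm weights a single core can host.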
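The windowed selection in step 3 can be pictured as follows. This is a
conceptual sketch with made-up node names and slack values, not the code in
`DCPAnalysis.cpp`: each coarsening iteration sorts the virtual nodes by slack
and keeps only the `dcp-critical-window-size` lowest.

```shell
# Conceptual sketch only: pick the N lowest-slack virtual nodes for one
# coarsening iteration. Node names and slack values are made up.
window_size=2
printf '%s\n' '0 conv1' '5 relu1' '1 conv2' '9 add1' \
  | sort -n -k1,1 \
  | head -n "$window_size" \
  | cut -d' ' -f2          # prints conv1, then conv2
```

With `window_size=0` (the legacy mode), the `head` filter would be skipped
and every node analyzed each iteration.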
Supporting pieces:
- `src/PIM/Compiler` — PIM-specific compiler options (crossbar size/count,
  core count, DCP window, experimental conv impl, concat error handling, …)
  and `PimCodeGen` entry points.
- `src/PIM/Common` — shared utilities (`PimCommon`, `LabeledList`).
- `src/PIM/Pass` — auxiliary passes (`MessagePass`, `CountInstructionPass`)
  and the `PIMPasses.h` registry used by `PimAccelerator`.
- `src/PIM/PimAccelerator.{cpp,hpp}` — accelerator entry point: registers
  dialects, passes, and plugs Raptor into the ONNX-MLIR driver.

## Key compiler options

Pass these on the `onnx-mlir` command line when compiling for PIM:

- `--maccel=PIM` — select the PIM accelerator.
- `--EmitSpatial` / `--EmitPim` / `--EmitPimBufferized` / `--EmitPimCodegen`
  — stop the pipeline at the requested stage (default: `EmitPimCodegen`).
- `--pim-only-codegen` — assume the input is already bufferized PIM IR and
  run only the codegen tail.
- `--crossbar-size=<N>` / `--crossbar-count=<N>` — crossbar dimensions and
  per-core count.
- `--core-count=<N>` — number of cores (`-1` picks the minimum).
- `--dcp-critical-window-size=<N>` — DCP coarsening window (0 = legacy).
- `--use-experimental-conv-impl` — alternative convolution lowering.
- `--ignore-concat-error` — soft-fail corner case in `ConcatOp`.
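
Putting several options together, a compile might look like this. The model
filename and the numeric values are placeholders, not recommendations; the
leading `echo` only prints the command so the sketch is safe to paste, remove
it to actually run the compiler:

```shell
# Placeholder invocation: model.onnx and all numeric values are examples.
# Drop the leading echo to run the real compile.
echo onnx-mlir --maccel=PIM \
  --crossbar-size=1024 --crossbar-count=8 \
  --core-count=-1 \
  --dcp-critical-window-size=16 \
  model.onnx
```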
## Validation

Functional validation lives in `validation/` and drives the Rust
`pim-simulator` to compare Raptor's output against a reference.

Per-operation validation (from `validation/`):

```
validate.py \
  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
  --onnx-include-dir ../onnx-mlir/include
```

End-to-end network validation (example: first 4 layers of YOLOv11n):

```
validate.py \
  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
  --onnx-include-dir ../onnx-mlir/include \
  --operations-dir ./networks/yolo11n/depth_04 \
  --crossbar-size 2048
```

Available networks under `validation/networks/`: `vgg16`, `yolo11n`.
Available operations under `validation/operations/`: `add`, `conv`, `div`,
`gather`, `gemm`, `gemv`, `mul`, `pool`, `reduce_mean`, `relu`, `resize`,
`sigmoid`, `softmax`, `split`.

## Rebuilding

Release build (fast):

```
cmake --build /home/nico/raptor/raptor/cmake-build-release --target onnx-mlir -j 30
```

A slower debug build is also available — configure it the same way but with
`-DCMAKE_BUILD_TYPE=Debug` (see the build instructions below).

## Build

### Protobuf

Use the following commands to install protobuf:

```
git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
cd protobuf
mkdir build
cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release
ninja
sudo ninja install
```

You can now remove the protobuf repo directory with:

```
cd ../..
rm -rf protobuf
```

### MLIR

Follow the first part of the instructions [here](onnx-mlir/docs/BuildOnLinuxOSX.md) to build MLIR.

Remember to set `-DCMAKE_BUILD_TYPE=Debug` when developing on Raptor.

If compiling with the Debug build type, it is also recommended to use mold
as the linker (install it first if you don't already have it) to reduce
memory usage during linking. Enable it with:

```
-DLLVM_USE_LINKER=mold
```

### Raptor

Use the following commands to build Raptor.

Remember to set `-DCMAKE_BUILD_TYPE=Debug` when developing on Raptor.

Here too, mold is recommended as the linker to reduce link time and memory
usage; enable it with:

```
-DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
```

```
git submodule update --init --recursive

MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build/lib/cmake/mlir
mkdir build && cd build
cmake .. -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR}
cmake --build .
```

If the build fails because protobuf is missing `uint` definitions,
patch the problematic files by adding `#include <cstdint>` to their includes.
|