# Raptor

Raptor is a domain-specific MLIR compiler for neural networks (ONNX format)
targeting in-memory computing / processing-in-memory (PIM) architectures.
It progressively lowers ONNX-MLIR through a set of MLIR dialects down to
target-specific artifacts (currently JSON code for the `pimsim-nn` simulator).

## Overview

PIM architectures perform most of the computation directly in memory.
Raptor's first supported target is `pimsim-nn`, which simulates a chip with:

- a shared host memory,
- a number of cores that do most of the computation directly in their memory
  (vector ops, vmm/mvm on ReRAM crossbars),
- no branching instructions (branchless architecture) and no hardware loop
  support — any repeated work (e.g. convolutions) must be unrolled into
  explicit per-iteration instructions.

Because of this, the number of emitted instructions explodes quickly, and the
compiler must optimize aggressively at every stage to keep compilation
tractable.

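To make the blow-up concrete, here is a back-of-the-envelope count for one
fully unrolled convolution. The layer sizes are hypothetical and the
one-instruction-per-MAC model is a simplification, not the exact instruction
mix the compiler emits:

```shell
# Illustrative only: rough count of explicit per-iteration instructions
# for a fully unrolled 2D convolution (no padding, stride 1), assuming
# one instruction per multiply-accumulate tap. All sizes are made up.
h=32; w=32; c_in=64; c_out=64; k=3
out_positions=$(( (h - k + 1) * (w - k + 1) * c_out ))
taps=$(( k * k * c_in ))
echo $(( out_positions * taps ))   # prints 33177600
```

Even this small 32x32 layer needs tens of millions of explicit instructions,
which is why every lowering stage must shrink the IR aggressively.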
A second target, `PulPim`, is planned for an accelerator with RISC-V cores,
each carrying its own in-memory computing unit and crossbars. It will live in
a dedicated dialect (future work).

### Targets and simulators

`pimsim-nn` (under `backend-simulators/pim/pimsim-nn`) is used for
**performance** estimates (latency, energy), but does not functionally execute
the JSON code it consumes. To validate the numerical correctness of the JSON
code produced by Raptor (or, for comparison, by the `pimcomp` compiler), we use
a Rust simulator we maintain in-tree at
`backend-simulators/pim/pim-simulator`.

## Compilation pipeline

The PIM-related sources live under `src/PIM` and the tests under `test/PIM`.
When working on this codebase, most changes should stay confined to those
trees (you only need to look outside, e.g. at `onnx-mlir` or `llvm`, for
framework-level details).

High-level lowering flow:

```
ONNX-MLIR ──► Spatial ──► Pim (tensor) ──► Pim (bufferized) ──► PIM JSON
```

1. **ONNX → Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
   Lowers ONNX ops into the `spat` dialect (`src/PIM/Dialect/Spatial`).
   Spatial models a high-level spatial in-memory accelerator: vmm/mvm
   operations are accelerated by storing a constant RHS matrix into a
   crossbar. Crossbars cannot be re-programmed during execution, have a
   limited fixed size, and there is a limited number of them per core.
   Conversion patterns are split by op family under
   `Conversion/ONNXToSpatial/Patterns/{Math,NN,Tensor}` (Conv, Gemm, MatMul,
   Elementwise, ReduceMean, Pool, Relu, Sigmoid, Softmax, Concat, Gather,
   Reshape, Resize, Split).

2. **Spatial → Pim** (`src/PIM/Conversion/SpatialToPim`).
   Lowers Spatial to the `pim` dialect (`src/PIM/Dialect/Pim`), which
   materializes PIM cores (`pim.core`), inter-core communication
   (`pim.send` / `pim.receive`), halts, and crossbar-level operations.

3. **Merge compute nodes** (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
   A DCP-inspired heuristic (Dynamic Critical Path — see the original
   scheduling paper by Kwok & Ahmad,
   [DCP-eScience2007](https://clouds.cis.unimelb.edu.au/papers/DCP-eScience2007.pdf))
   that coarsens the virtual node graph and decides how to group compute
   nodes onto cores. Our implementation is only DCP-*inspired*: it is a
   heuristic with different assumptions from the paper (different cost
   model, constraints from crossbar capacity / core resources, and a
   windowed coarsening loop instead of full-graph reprioritization). The
   `dcp-critical-window-size` option controls how many lowest-slack virtual
   nodes each coarsening iteration considers (0 = legacy full-graph
   analysis). Related sources: `DCPGraph/DCPAnalysis.cpp`, `Graph.cpp/.hpp`,
   `MergeComputeNodesPass.cpp`.

4. **Bufferization** (`src/PIM/Dialect/Pim/Transforms/Bufferization`).
   Converts tensor-semantics PIM IR into memref-semantics PIM IR using the
   standard MLIR `BufferizableOpInterface` machinery
   (`OpBufferizationInterfaces.*`, `PimBufferization.td`).

5. **PIM code generation** (`src/PIM/Pass/PimCodegen`):
   - `HostConstantFolding` — folds host-side constants.
   - `MaterializeHostConstantsPass` — materializes the remaining host
     constants for emission.
   - `VerificationPass` — checks invariants before emission.
   - `EmitPimJsonPass` — emits the final PIM JSON consumed by `pimsim-nn`
     and `pim-simulator`.

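The crossbar capacity constraint from step 1 can be sketched with a simple
ceil-division tiling estimate. The sizes here are hypothetical, and this is
not the compiler's actual placement logic, only the counting argument behind
it:

```shell
# Hypothetical sizes; a constant RHS matrix larger than one crossbar must
# be split into tiles, and each tile occupies one crossbar for the whole
# run (crossbars cannot be re-programmed during execution).
rows=300; cols=500; xbar=256
row_tiles=$(( (rows + xbar - 1) / xbar ))   # ceil(rows / xbar)
col_tiles=$(( (cols + xbar - 1) / xbar ))   # ceil(cols / xbar)
echo $(( row_tiles * col_tiles ))           # prints 4
```

With a limited number of crossbars per core, such tile counts directly bound
how many vmm/mvm weights a single core can host.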
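The windowed selection in step 3 can be pictured as follows. This is a
conceptual sketch with made-up node names and slack values, not the code in
`DCPAnalysis.cpp`: each coarsening iteration sorts the virtual nodes by slack
and keeps only the `dcp-critical-window-size` lowest.

```shell
# Conceptual sketch only: pick the N lowest-slack virtual nodes for one
# coarsening iteration. Node names and slack values are made up.
window_size=2
printf '%s\n' '0 conv1' '5 relu1' '1 conv2' '9 add1' \
  | sort -n -k1,1 \
  | head -n "$window_size" \
  | cut -d' ' -f2          # prints conv1, then conv2
```

With `window_size=0` (the legacy mode), the `head` filter would be skipped
and every node analyzed each iteration.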
Supporting pieces:
- `src/PIM/Compiler` — PIM-specific compiler options (crossbar size/count,
  core count, DCP window, experimental conv impl, concat error handling, …)
  and `PimCodeGen` entry points.
- `src/PIM/Common` — shared utilities (`PimCommon`, `LabeledList`).
- `src/PIM/Pass` — auxiliary passes (`MessagePass`, `CountInstructionPass`)
  and the `PIMPasses.h` registry used by `PimAccelerator`.
- `src/PIM/PimAccelerator.{cpp,hpp}` — accelerator entry point: registers
  dialects, passes, and plugs Raptor into the ONNX-MLIR driver.

## Key compiler options

Pass these on the `onnx-mlir` command line when compiling for PIM:

- `--maccel=PIM` — select the PIM accelerator.
- `--EmitSpatial` / `--EmitPim` / `--EmitPimBufferized` / `--EmitPimCodegen`
  — stop the pipeline at the requested stage (default: `EmitPimCodegen`).
- `--pim-only-codegen` — assume the input is already bufferized PIM IR and
  run only the codegen tail.
- `--crossbar-size=<N>` / `--crossbar-count=<N>` — crossbar dimensions and
  per-core count.
- `--core-count=<N>` — number of cores (`-1` picks the minimum).
- `--dcp-critical-window-size=<N>` — DCP coarsening window (0 = legacy).
- `--use-experimental-conv-impl` — alternative convolution lowering.
- `--ignore-concat-error` — soft-fail corner case in `ConcatOp`.
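
Putting several options together, a compile might look like this. The model
filename and the numeric values are placeholders, not recommendations; the
leading `echo` only prints the command so the sketch is safe to paste, remove
it to actually run the compiler:

```shell
# Placeholder invocation: model.onnx and all numeric values are examples.
# Drop the leading echo to run the real compile.
echo onnx-mlir --maccel=PIM \
  --crossbar-size=1024 --crossbar-count=8 \
  --core-count=-1 \
  --dcp-critical-window-size=16 \
  model.onnx
```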
## Validation

Functional validation lives in `validation/` and drives the Rust
`pim-simulator` to compare Raptor's output against a reference.

Per-operation validation (from `validation/`):

```
validate.py \
  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
  --onnx-include-dir ../onnx-mlir/include
```

End-to-end network validation (example: first 4 layers of YOLOv11n):

```
validate.py \
  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
  --onnx-include-dir ../onnx-mlir/include \
  --operations-dir ./networks/yolo11n/depth_04 \
  --crossbar-size 2048
```

Available networks under `validation/networks/`: `vgg16`, `yolo11n`.
Available operations under `validation/operations/`: `add`, `conv`, `div`,
`gather`, `gemm`, `gemv`, `mul`, `pool`, `reduce_mean`, `relu`, `resize`,
`sigmoid`, `softmax`, `split`.

## Rebuilding

Release build (fast):

```
cmake --build /home/nico/raptor/raptor/cmake-build-release --target onnx-mlir -j 30
```

A slower debug build is also available — configure it the same way but with
`-DCMAKE_BUILD_TYPE=Debug` (see the build instructions below).

## Build

### Protobuf

Use the following commands to install protobuf:

```
git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
cd protobuf
mkdir build
cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release
ninja
sudo ninja install
```

You can now remove the protobuf repo directory with:

```
cd ../..
rm -rf protobuf
```

### MLIR

Follow the first part of the instructions [here](onnx-mlir/docs/BuildOnLinuxOSX.md) to build MLIR.

Remember to set `-DCMAKE_BUILD_TYPE=Debug` when developing on Raptor.

If compiling with the Debug build type, it is also recommended to use mold
as the linker (install it first if you don't already have it) to reduce
memory usage during linking. Enable it with:

```
-DLLVM_USE_LINKER=mold
```

### Raptor

Use the following commands to build Raptor.

Remember to set `-DCMAKE_BUILD_TYPE=Debug` when developing on Raptor.

Here too, mold is recommended as the linker to reduce link time and memory
usage; enable it with:

```
-DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
```

```
git submodule update --init --recursive

MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build/lib/cmake/mlir
mkdir build && cd build
cmake .. -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR}
cmake --build .
```

If the build fails because protobuf is missing `uint` definitions,
patch the problematic files by adding `#include <cstdint>` to their includes.
|