
# Raptor

Raptor is a domain-specific MLIR compiler for neural networks in ONNX format,
targeting in-memory computing / processing-in-memory (PIM) architectures. It
extends ONNX-MLIR with a PIM accelerator and progressively lowers ONNX ops
through custom MLIR dialects (`spat`, then `pim`) to simulator artifacts.

The current target is the PIM simulator stack under `backend-simulators/pim`.
Raptor emits binary per-core `.pim` instruction files by default, plus
`memory.bin`, `config.json`, and weight binaries. It can also emit per-core JSON
instruction files with `--pim-emit-json`.

## Overview

PIM architectures perform most computation directly in memory. The supported
target models a chip with:
- shared host memory,
- multiple PIM cores,
- ReRAM crossbars for vector-matrix / matrix-vector work,
- explicit communication between cores,
- no hardware branch or loop support in emitted simulator code.

Because the target has no branches or loops, repeated work such as convolutions
must eventually be made explicit in the emitted code, so instruction counts can
grow quickly. Most compiler work therefore focuses on lowering, scheduling,
memory layout, and code-generation optimizations.
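
As a rough illustration of why explicit repetition grows quickly, the sketch
below counts the output positions of a convolution and the crossbar tiles a
weight matrix would occupy. The im2col-style mapping (one row per kernel
element, one column per output channel), the tile formula, and the function
names are assumptions made for this example; they are not taken from Raptor's
lowering.

```python
import math


def conv_output_positions(height, width, kernel, stride=1, padding=0):
    """Number of spatial output positions of a square-kernel convolution."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    return out_h * out_w


def crossbar_tiles(rows, cols, crossbar_size):
    """Square crossbars needed to hold a rows x cols weight matrix (assumed tiling)."""
    return math.ceil(rows / crossbar_size) * math.ceil(cols / crossbar_size)


# Hypothetical layer: 3x3 convolution, 64 -> 128 channels, 56x56 input, padding 1.
positions = conv_output_positions(56, 56, kernel=3, padding=1)  # 56 * 56 = 3136
tiles = crossbar_tiles(3 * 3 * 64, 128, crossbar_size=256)      # 3 row tiles * 1 column tile = 3
# Without loops, each (position, tile) pair needs its own explicit operation.
explicit_ops = positions * tiles                                # 9408
print(positions, tiles, explicit_ops)
```

Even this single small layer yields thousands of explicit operations under
these assumptions, before any communication or data movement is counted.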

### Targets and simulators

- `backend-simulators/pim/pim-simulator` is the in-tree Rust functional
  simulator used by validation. It reads Raptor's `pim/` artifact directory and
  compares simulator output against native ONNX-MLIR execution.
- `backend-simulators/pim/pimsim-nn` is the performance simulator submodule.
  The helper scripts in `pimcomp_utils/` are for comparison with PIMCOMP-NN and
  contain machine-specific local paths; treat them as local utilities, not
  portable workflows.

## Compilation pipeline

The PIM sources live under `src/PIM` and tests under `test/PIM`. CMake exposes
them to ONNX-MLIR through generated shim directories under
`onnx-mlir/src/Accelerators/PIM` and `onnx-mlir/test/accelerators/PIM`.

High-level lowering flow:

```
ONNX-MLIR -> Spatial -> Pim (tensor) -> Pim (bufferized) -> PIM artifacts
```

1. **ONNX -> Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
   Lowers supported ONNX ops into the `spat` dialect
   (`src/PIM/Dialect/Spatial`). Conversion patterns are split by op family under
   `Patterns/{Math,NN,Tensor}` and currently cover Conv, Gemm, MatMul,
   elementwise Add/Mul/Div, ReduceMean, pooling, Relu, Sigmoid, Softmax,
   Concat, Gather, Reshape, Resize, and Split.

2. **Merge compute nodes**
   (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
   Builds a compute graph, schedules it with the PEFT scheduler, and materializes
   the merge schedule into Spatial IR. Supporting scheduling code lives under
   `MergeComputeNodes/Scheduling`.

3. **Spatial -> Pim** (`src/PIM/Conversion/SpatialToPim`).
   Lowers Spatial operations to the `pim` dialect (`src/PIM/Dialect/Pim`),
   including `pim.core`, `pim.core_batch`, communication, tensor packing, global
   tensor materialization, and return-path normalization.

4. **Bufferization** (`src/PIM/Dialect/Pim/Transforms/Bufferization`).
   Converts tensor-semantics PIM IR into memref-semantics PIM IR using MLIR's
   bufferization interfaces.

5. **Static memory coalescing**
   (`src/PIM/Dialect/Pim/Transforms/StaticMemoryCoalescing`).
   Reuses compatible local memref allocations inside PIM cores before codegen.

6. **PIM code generation** (`src/PIM/Pass/PimCodegen` and
   `src/PIM/Compiler`).
   Folds host constants, materializes remaining host constants, verifies PIM IR,
   emits `.pim` core files, writes weights, and writes `memory.bin` /
   `config.json`.
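
The idea behind step 5 can be illustrated with a generic greedy buffer-reuse
sketch: allocations whose live ranges do not overlap share one slot. This is
not Raptor's pass. It assumes that "compatible" means equal size, that live
ranges are given as inclusive `(first_use, last_use)` indices, and the name
`coalesce_allocations` is hypothetical.

```python
def coalesce_allocations(allocations):
    """Assign each allocation to a reusable slot.

    allocations: iterable of (name, size, first_use, last_use) tuples.
    Returns a dict mapping allocation name to slot index.
    """
    slots = []  # each slot is [size, last_use of its current occupant]
    assignment = {}
    # Visit allocations in order of first use so freed slots can be reused.
    for name, size, first_use, last_use in sorted(allocations, key=lambda a: a[2]):
        for index, slot in enumerate(slots):
            # Reuse only a same-sized slot whose occupant is already dead.
            if slot[0] == size and slot[1] < first_use:
                slot[1] = last_use
                assignment[name] = index
                break
        else:
            slots.append([size, last_use])
            assignment[name] = len(slots) - 1
    return assignment


example = [
    ("a", 64, 0, 2),
    ("b", 64, 1, 3),   # overlaps with "a", needs its own slot
    ("c", 64, 3, 5),   # "a" is dead by step 3, so its slot is reused
    ("d", 128, 4, 6),  # different size, gets a new slot
]
print(coalesce_allocations(example))
```

In this example four allocations fit in three slots, because `c` reuses the
slot that `a` no longer needs.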

Supporting pieces:
- `src/PIM/Common` - shared IR, filesystem, diagnostics, reports, and utility
  helpers.
- `src/PIM/Compiler` - PIM compiler options, memory/address planning, binary
  instruction format, artifact writing, weight emission, and codegen entry
  points.
- `src/PIM/Conversion/SpatialToGraphviz` - optional Spatial graphviz conversion
  pass.
- `src/PIM/Pass` - pass registration and auxiliary passes.
- `src/PIM/PimAccelerator.{cpp,hpp}` - ONNX-MLIR accelerator entry point.

## Key compiler options

Pass these to `onnx-mlir` when compiling for PIM:

- `--maccel=PIM` - select the PIM accelerator.
- `--EmitSpatial`, `--EmitPim`, `--EmitPimBufferized`,
  `--EmitPimCodegen` - stop the PIM pipeline at the requested stage. The PIM
  default is `--EmitPimCodegen`.
- `--core-count=<N>` - number of PIM cores. Required; must be positive.
- `--crossbar-size=<N>` - crossbar width/height. Default in code is `2`.
- `--crossbar-count=<N>` - crossbars per core. Default in code is `256`.
- `--pim-merge-scheduler=peft` - merge scheduler. `peft` is the only accepted
  value in the current code.
- `--pim-only-codegen` - assume input is already bufferized PIM IR and only run
  the codegen tail.
- `--pim-emit-json` - also emit `core_*.json` instruction files alongside
  `core_*.pim`.
- `--use-experimental-conv-impl` - use the alternate convolution lowering.
- `--ignore-concat-error` - soft-fail a ConcatOp corner case.

Example:

```bash
./build_release/Release/bin/onnx-mlir model.onnx -o /tmp/raptor/model \
  --maccel=PIM --EmitPimCodegen \
  --crossbar-size=2048 --crossbar-count=256 --core-count=1000
```

This writes PIM artifacts under `/tmp/raptor/pim/`.
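
To check quickly what a compile produced, a small helper can summarize the
artifact directory. The sketch below only looks for the file names mentioned
in this README (`core_*.pim`, `core_*.json`, `memory.bin`, `config.json`).
The helper name `summarize_pim_artifacts` is hypothetical, and weight binaries
are not checked because their naming is not described here.

```python
from pathlib import Path


def summarize_pim_artifacts(pim_dir):
    """Report which expected PIM artifacts are present in pim_dir."""
    pim_dir = Path(pim_dir)
    return {
        # Binary per-core instruction files, emitted by default.
        "core_files": sorted(p.name for p in pim_dir.glob("core_*.pim")),
        # Per-core JSON files, present only with --pim-emit-json.
        "json_files": sorted(p.name for p in pim_dir.glob("core_*.json")),
        "has_memory_bin": (pim_dir / "memory.bin").is_file(),
        "has_config_json": (pim_dir / "config.json").is_file(),
    }
```

For the example above, `summarize_pim_artifacts("/tmp/raptor/pim")` would
return a dictionary listing the core files found and whether `memory.bin` and
`config.json` exist.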

## Validation

Functional validation lives in `validation/`. It compiles ONNX models, builds a
native ONNX-MLIR reference runner, generates random inputs, runs Raptor, runs
the Rust PIM simulator, and compares outputs.

Python dependencies used by the validation scripts are `numpy`, `onnx`, and
`colorama`. The simulator requires the Rust toolchain.

Per-operation validation from the repository root:

```bash
python3 validation/validate.py \
  --raptor-path build_release/Release/bin/onnx-mlir \
  --onnx-include-dir onnx-mlir/include \
  --core-count 1000
```

Validate one network or a subset by pointing `--operations-dir` at any directory
containing `.onnx` files:

```bash
python3 validation/validate.py \
  --raptor-path build_release/Release/bin/onnx-mlir \
  --onnx-include-dir onnx-mlir/include \
  --operations-dir validation/networks/yolo11n/depth_04 \
  --crossbar-size 2048 --crossbar-count 256 --core-count 1000
```
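
When scripting several validation runs, it can help to assemble the command
line programmatically. The sketch below builds an argument list using only the
flags shown above; the helper name `build_validate_command` is hypothetical,
and the script path assumes the command runs from the repository root.

```python
def build_validate_command(raptor_path, onnx_include_dir, core_count,
                           operations_dir=None, crossbar_size=None,
                           crossbar_count=None):
    """Return an argv list for validation/validate.py (flags as listed in this README)."""
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    cmd = [
        "python3", "validation/validate.py",
        "--raptor-path", str(raptor_path),
        "--onnx-include-dir", str(onnx_include_dir),
        "--core-count", str(core_count),
    ]
    # Optional flags are appended only when a value is given.
    if operations_dir is not None:
        cmd += ["--operations-dir", str(operations_dir)]
    if crossbar_size is not None:
        cmd += ["--crossbar-size", str(crossbar_size)]
    if crossbar_count is not None:
        cmd += ["--crossbar-count", str(crossbar_count)]
    return cmd


cmd = build_validate_command(
    "build_release/Release/bin/onnx-mlir", "onnx-mlir/include", 1000,
    operations_dir="validation/networks/yolo11n/depth_04",
    crossbar_size=2048, crossbar_count=256,
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the
repository root.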

Useful validation options:
- `--simulator-dir <path>` - override the auto-detected
  `backend-simulators/pim/pim-simulator` path.
- `--threshold <float>` - maximum allowed per-element output difference.
- `--seed <int>` - RNG seed for generated inputs.
- `--command-timeout-seconds <float>` - timeout for compiler, runner, and
  simulator subprocesses.
- `--verbose` - print subprocess logs and average PIM pass timings.
- `--clean` - remove generated validation artifacts and exit.
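
The `--threshold` check can be pictured as a per-element comparison between
the reference output and the simulator output. The sketch below is an
illustration only: it assumes an absolute difference and an inclusive bound,
which may differ from what `validate.py` actually does, and the function name
is hypothetical.

```python
def within_threshold(reference, simulated, threshold):
    """True if both sequences have equal length and every element differs by at most threshold."""
    if len(reference) != len(simulated):
        return False
    # all() also fails on NaN, since comparisons with NaN are always False.
    return all(abs(r - s) <= threshold for r, s in zip(reference, simulated))


print(within_threshold([1.0, 2.0], [1.0005, 1.9995], 1e-3))
print(within_threshold([1.0, 2.0], [1.0, 2.1], 1e-3))
```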

Each validation run writes artifacts in the model workspace, for example under
`validation/operations/gemm/small/`:
- `inputs/` - generated input CSV files.
- `outputs/` - native ONNX-MLIR reference outputs.
- `raptor/` - compiler artifacts, including `*.onnx.mlir`, dialect dumps under
  `dialects/`, reports under `reports/`, and final PIM artifacts under `pim/`.
- `runner/` - generated reference runner source, build tree, and shared library.
- `simulation/out.bin` - raw simulator output used for comparison.

The compiler currently dumps dialect snapshots such as `spatial0.mlir`,
`spatial1_dcp_merged.mlir`, `pim0.mlir`, `pim1_buff.mlir`,
`pim2_coalesced.mlir`, `pim3_folded.mlir`, and
`pim4_materialized.mlir` when an output directory is available.

To rerun the simulator manually with tracing after validation has produced a
`raptor/pim/` directory:

```bash
cd backend-simulators/pim/pim-simulator
cargo run --no-default-features --features tracing --release \
  --package pim-simulator --bin pim-simulator -- \
  -f /path/to/workspace/raptor/pim \
  -o /path/to/workspace/simulation/out.bin \
  -d <addr0>,<size0>,<addr1>,<size1>,...
```

With `--features tracing`, the simulator writes per-core traces as
`TraceCore0`, `TraceCore1`, ... next to `out.bin`. The validator normally
computes the `-d` ranges from `raptor/pim/config.json` and model output shapes.
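
For a manual run, the `-d` value is a flat comma-separated list of address and
size pairs. The sketch below only formats such a list; how the addresses and
sizes are derived from `config.json` is not shown here, and the helper name
and the example values are hypothetical.

```python
def format_dump_ranges(ranges):
    """Format (address, size) pairs as 'addr0,size0,addr1,size1,...' for -d."""
    parts = []
    for address, size in ranges:
        if address < 0 or size <= 0:
            raise ValueError(f"invalid range: address={address}, size={size}")
        parts.extend((str(address), str(size)))
    return ",".join(parts)


# Hypothetical output ranges, purely for illustration.
print(format_dump_ranges([(0, 4096), (4096, 40)]))  # prints 0,4096,4096,40
```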

Available validation networks under `validation/networks/`: `vgg16`,
`yolo11n`, `yolo11nv2`.

Available operation suites under `validation/operations/`: `add`, `concat`,
`conv`, `div`, `gather`, `gemm`, `gemv`, `matmul`, `mul`, `pool`,
`reduce_mean`, `relu`, `reshape`, `resize`, `sigmoid`, `softmax`, `split`.

Generated operation tests can be regenerated with:

```bash
python3 validation/operations/gen_tests.py
```

## Build

Initialize submodules first:

```bash
git submodule update --init --recursive
```

The project follows ONNX-MLIR's build requirements. The CI workflow documents
the currently used versions and setup:
- CMake 4.3.0 in CI,
- LLVM/MLIR checked out under `onnx-mlir/llvm-project`,
- Protobuf `v34.0`,
- Rust stable for `pim-simulator`,
- Python packages `numpy`, `onnx`, `colorama` for validation.

### Protobuf

Install Protobuf if your system does not already provide a compatible version:

```bash
git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
cmake -S protobuf -B protobuf/build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -Dprotobuf_BUILD_TESTS=OFF
cmake --build protobuf/build
sudo cmake --install protobuf/build
```

You can then remove the temporary checkout:

```bash
rm -rf protobuf
```

### MLIR

Follow the ONNX-MLIR instructions in
`onnx-mlir/docs/BuildOnLinuxOSX.md` to build LLVM/MLIR. The local Raptor build
expects `MLIR_DIR` to point at the MLIR CMake package, for example:

```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
```

If your LLVM build directory is named `build` instead of `build_release`, adjust
the path accordingly.

### Raptor

Configure a release build:

```bash
# Quote the path so checkouts in directories containing spaces still work.
MLIR_DIR="$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir"
cmake -S . -B build_release -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR="${MLIR_DIR}"
```

Configure a debug build similarly:

```bash
# Quote the path so checkouts in directories containing spaces still work.
MLIR_DIR="$(pwd)/onnx-mlir/llvm-project/build_debug/lib/cmake/mlir"
cmake -S . -B build_debug -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR="${MLIR_DIR}"
```

For debug development, using `mold` can reduce link time and memory use:

```bash
cmake -S . -B build_debug -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR="${MLIR_DIR}" \
  -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
  -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
  -DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
```

Build the compiler with CMake:

```bash
# Build whichever tree you configured above; each command needs its own configure step.
cmake --build ./build_release
cmake --build ./build_debug
```

Do not invoke `ninja` directly for this project; use `cmake --build` so CMake's
configuration and generated shims stay consistent.

If a build fails because fixed-width integer types (for example `uint32_t`) are
undeclared in Protobuf headers, patch the affected Protobuf-generated files by
adding `#include <cstdint>`.

## Tests

The Rust simulator has its own tests:

```bash
cd backend-simulators/pim/pim-simulator
cargo test
```

## Repository layout

- `src/PIM/` - PIM accelerator implementation.
- `test/PIM/` - PIM C++ unit tests.
- `validation/` - functional validation scripts, ONNX operation tests, network
  slices, and pimsim config generation.
- `backend-simulators/pim/pim-simulator/` - in-tree Rust functional simulator.
- `backend-simulators/pim/pimsim-nn/` - performance simulator submodule.
- `pimcomp_utils/` - local comparison helpers for PIMCOMP-NN.
- `.github/actions/` and `.github/workflows/validate_operations.yml` - CI setup
  for MLIR/Protobuf caching, building Raptor, and validation.