@@ -1,7 +1,7 @@
|
|||||||
- Always read the full README.md before doing anything.
|
- Always read the full README.md before doing anything.
|
||||||
- Build commands:
|
- Build commands:
|
||||||
- `cmake --build ./build_release --target onnx-mlir -j 30`
|
- `cmake --build ./build_release`
|
||||||
- `cmake --build ./build_debug --target onnx-mlir -j 30`
|
- `cmake --build ./build_debug`
|
||||||
- Never use `ninja` directly: it bypasses cmake's configuration and invalidates the build cache.
|
- Never use `ninja` directly: it bypasses cmake's configuration and invalidates the build cache.
|
||||||
|
|
||||||
# Code changes
|
# Code changes
|
||||||
|
|||||||
@@ -1,168 +1,178 @@
|
|||||||
# Raptor
|
# Raptor
|
||||||
|
|
||||||
Raptor is a domain-specific MLIR compiler for neural networks (ONNX format)
|
Raptor is a domain-specific MLIR compiler for neural networks in ONNX format,
|
||||||
targeting in-memory computing / processing-in-memory (PIM) architectures.
|
targeting in-memory computing / processing-in-memory (PIM) architectures. It
|
||||||
It progressively lowers ONNX-MLIR through a set of MLIR dialects down to
|
extends ONNX-MLIR with a PIM accelerator and progressively lowers ONNX-MLIR
|
||||||
target-specific artifacts (currently JSON code for the `pimsim-nn` simulator).
|
through custom MLIR dialects to simulator artifacts.
|
||||||
|
|
||||||
|
The current target is the PIM simulator stack under `backend-simulators/pim`.
|
||||||
|
Raptor emits binary per-core `.pim` instruction files by default, plus
|
||||||
|
`memory.bin`, `config.json`, and weight binaries. It can also emit per-core JSON
|
||||||
|
instruction files with `--pim-emit-json`.
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
PIM architectures perform most of the computation directly in memory.
|
PIM architectures perform most computation directly in memory. The supported
|
||||||
Raptor's first supported target is `pimsim-nn`, which simulates a chip with:
|
target models a chip with:
|
||||||
- a shared host memory,
|
- shared host memory,
|
||||||
- a number of cores that do most of the computation directly in their memory
|
- multiple PIM cores,
|
||||||
(vector ops, vmm/mvm on ReRAM crossbars),
|
- ReRAM crossbars for vector-matrix / matrix-vector work,
|
||||||
- no branching instructions (branchless architecture) and no hardware loop
|
- explicit communication between cores,
|
||||||
support — any repeated work (e.g. convolutions) must be unrolled into
|
- no hardware branch or loop support in emitted simulator code.
|
||||||
explicit per-iteration instructions.
|
|
||||||
|
|
||||||
Because of this, the amount of emitted instructions explodes quickly and the
|
Because repeated work such as convolutions is eventually made explicit, emitted
|
||||||
compiler must optimize aggressively at every stage to keep compilation
|
instruction counts can grow quickly. Most compiler work therefore focuses on
|
||||||
tractable.
|
lowering, scheduling, memory layout, and code-generation optimizations.
|
||||||
|
|
||||||
A second target, `PulPim`, is planned for an accelerator with RISC-V cores
|
|
||||||
each carrying its own in-memory computing unit and crossbars. It will live in
|
|
||||||
a dedicated dialect (future work).
|
|
||||||
|
|
||||||
### Targets and simulators
|
### Targets and simulators
|
||||||
|
|
||||||
`pimsim-nn` (under `backend-simulators/pim/pimsim-nn`) is used for
|
- `backend-simulators/pim/pim-simulator` is the in-tree Rust functional
|
||||||
**performance** estimates (latency, energy), but does not functionally execute
|
simulator used by validation. It reads Raptor's `pim/` artifact directory and
|
||||||
the JSON code it consumes. To validate the numerical correctness of the JSON
|
compares simulator output against native ONNX-MLIR execution.
|
||||||
code produced by Raptor (or, for comparison, by the `pimcomp` compiler), we use
|
- `backend-simulators/pim/pimsim-nn` is the performance simulator submodule.
|
||||||
a Rust simulator we maintain in-tree at
|
The helper scripts in `pimcomp_utils/` are for comparison with PIMCOMP-NN and
|
||||||
`backend-simulators/pim/pim-simulator`.
|
contain local paths; treat them as local utilities, not portable workflows.
|
||||||
|
|
||||||
## Compilation pipeline
|
## Compilation pipeline
|
||||||
|
|
||||||
The PIM-related sources live under `src/PIM` and the tests under `test/PIM`.
|
The PIM sources live under `src/PIM` and tests under `test/PIM`. CMake exposes
|
||||||
When working on this codebase, most changes should stay confined to those
|
them to ONNX-MLIR through generated shim directories under
|
||||||
trees (you only need to look outside, e.g. at `onnx-mlir` or `llvm`, for
|
`onnx-mlir/src/Accelerators/PIM` and `onnx-mlir/test/accelerators/PIM`.
|
||||||
framework-level details).
|
|
||||||
|
|
||||||
High-level lowering flow:
|
High-level lowering flow:
|
||||||
|
|
||||||
```
|
```
|
||||||
ONNX-MLIR ──► Spatial ──► Pim (tensor) ──► Pim (bufferized) ──► PIM code
|
ONNX-MLIR -> Spatial -> Pim (tensor) -> Pim (bufferized) -> PIM artifacts
|
||||||
```
|
```
|
||||||
|
|
||||||
1. **ONNX → Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
|
1. **ONNX -> Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
|
||||||
Lowers ONNX ops into the `spat` dialect (`src/PIM/Dialect/Spatial`).
|
Lowers supported ONNX ops into the `spat` dialect
|
||||||
Spatial models a high-level spatial in-memory accelerator: vmm/mvm
|
(`src/PIM/Dialect/Spatial`). Conversion patterns are split by op family under
|
||||||
operations are accelerated by storing a constant RHS matrix into a
|
`Patterns/{Math,NN,Tensor}` and currently cover Conv, Gemm, MatMul,
|
||||||
crossbar. Crossbars cannot be re-programmed during execution, have a
|
elementwise Add/Mul/Div, ReduceMean, pooling, Relu, Sigmoid, Softmax,
|
||||||
limited fixed size, and there is a limited number of them per core.
|
Concat, Gather, Reshape, Resize, and Split.
|
||||||
Conversion patterns are split by op family under
|
|
||||||
`Conversion/ONNXToSpatial/Patterns/{Math,NN,Tensor}` (Conv, Gemm, MatMul,
|
|
||||||
Elementwise, ReduceMean, Pool, Relu, Sigmoid, Softmax, Concat, Gather,
|
|
||||||
Reshape, Resize, Split, etc...).
|
|
||||||
|
|
||||||
2. **Spatial → Pim** (`src/PIM/Conversion/SpatialToPim`).
|
2. **Merge compute nodes**
|
||||||
Lowers Spatial to the `pim` dialect (`src/PIM/Dialect/Pim`), which
|
(`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
|
||||||
materializes PIM cores (`pim.core`), inter-core communication
|
Builds a compute graph, schedules it with the PEFT scheduler, and materializes
|
||||||
(`pim.send` / `pim.receive`), halts, and crossbar-level operations.
|
the merge schedule into Spatial IR. Supporting scheduling code lives under
|
||||||
|
`MergeComputeNodes/Scheduling`.
|
||||||
|
|
||||||
3. **Merge compute nodes** (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
|
3. **Spatial -> Pim** (`src/PIM/Conversion/SpatialToPim`).
|
||||||
A PEFT heuristic that coarsens the virtual node graph and decides how to group compute
|
Lowers Spatial operations to the `pim` dialect (`src/PIM/Dialect/Pim`),
|
||||||
nodes onto cores. Our implementation is only DCP-*inspired*: it is a
|
including `pim.core`, `pim.core_batch`, communication, tensor packing, global
|
||||||
heuristic with different assumptions from the paper (different cost
|
tensor materialization, and return-path normalization.
|
||||||
model, constraints from crossbar capacity / core resources, and a
|
|
||||||
windowed coarsening loop instead of full-graph reprioritization). The
|
|
||||||
`dcp-critical-window-size` option controls how many lowest-slack virtual
|
|
||||||
nodes each coarsening iteration considers (0 = legacy full-graph
|
|
||||||
analysis). Related sources: `DCPGraph/DCPAnalysis.cpp`, `Graph.cpp/.hpp`,
|
|
||||||
`MergeComputeNodesPass.cpp`.
|
|
||||||
|
|
||||||
4. **Bufferization** (`src/PIM/Dialect/Pim/Transforms/Bufferization`).
|
4. **Bufferization** (`src/PIM/Dialect/Pim/Transforms/Bufferization`).
|
||||||
Converts tensor-semantics PIM IR into memref-semantics PIM IR using the
|
Converts tensor-semantics PIM IR into memref-semantics PIM IR using MLIR's
|
||||||
standard MLIR `BufferizableOpInterface` machinery
|
bufferization interfaces.
|
||||||
(`OpBufferizationInterfaces.*`, `PimBufferization.td`).
|
|
||||||
|
|
||||||
5. **Static memory coalescing** (`src/PIM/Dialect/Pim/Transforms/StaticMemoryCoalescing`).
|
5. **Static memory coalescing**
|
||||||
Conservatively reuses same-typed local memref allocations inside PIM cores
|
(`src/PIM/Dialect/Pim/Transforms/StaticMemoryCoalescing`).
|
||||||
after bufferization and before code generation.
|
Reuses compatible local memref allocations inside PIM cores before codegen.
|
||||||
|
|
||||||
6. **PIM code generation** (`src/PIM/Pass/PimCodegen`):
|
6. **PIM code generation** (`src/PIM/Pass/PimCodegen` and
|
||||||
- `HostConstantFolding` — folds host-side constants.
|
`src/PIM/Compiler`).
|
||||||
- `MaterializeHostConstantsPass` — materializes the remaining host
|
Folds host constants, materializes remaining host constants, verifies PIM IR,
|
||||||
constants for emission.
|
emits `.pim` core files, writes weights, and writes `memory.bin` /
|
||||||
- `VerificationPass` — checks invariants before emission.
|
`config.json`.
|
||||||
- `EmitPimJsonPass` — emits the final PIM JSON consumed by `pimsim-nn`
|
|
||||||
and `pim-simulator`.
|
|
||||||
|
|
||||||
Supporting pieces:
|
Supporting pieces:
|
||||||
- `src/PIM/Compiler` — PIM-specific compiler options (crossbar size/count,
|
- `src/PIM/Common` - shared IR, filesystem, diagnostics, reports, and utility
|
||||||
core count, DCP window, experimental conv impl, concat error handling, …)
|
helpers.
|
||||||
and `PimCodeGen` entry points.
|
- `src/PIM/Compiler` - PIM compiler options, memory/address planning, binary
|
||||||
- `src/PIM/Common` — shared utilities (`PimCommon`, `LabeledList`).
|
instruction format, artifact writing, weight emission, and codegen entry
|
||||||
- `src/PIM/Pass` — auxiliary passes (`MessagePass`)
|
points.
|
||||||
and the `PIMPasses.h` registry used by `PimAccelerator`.
|
- `src/PIM/Conversion/SpatialToGraphviz` - optional Spatial graphviz conversion
|
||||||
- `src/PIM/PimAccelerator.{cpp,hpp}` — accelerator entry point: registers
|
pass.
|
||||||
dialects, passes, and plugs Raptor into the ONNX-MLIR driver.
|
- `src/PIM/Pass` - pass registration and auxiliary passes.
|
||||||
|
- `src/PIM/PimAccelerator.{cpp,hpp}` - ONNX-MLIR accelerator entry point.
|
||||||
|
|
||||||
## Key compiler options
|
## Key compiler options
|
||||||
|
|
||||||
Pass these on the `onnx-mlir` command line when compiling for PIM:
|
Pass these to `onnx-mlir` when compiling for PIM:
|
||||||
|
|
||||||
- `--maccel=PIM` — select the PIM accelerator.
|
- `--maccel=PIM` - select the PIM accelerator.
|
||||||
- `--EmitSpatial` / `--EmitPim` / `--EmitPimBufferized` / `--EmitPimCodegen`
|
- `--EmitSpatial`, `--EmitPim`, `--EmitPimBufferized`,
|
||||||
— stop the pipeline at the requested stage (default: `EmitPimCodegen`).
|
`--EmitPimCodegen` - stop the PIM pipeline at the requested stage. The PIM
|
||||||
- `--pim-only-codegen` — assume the input is already bufferized PIM IR and
|
default is `--EmitPimCodegen`.
|
||||||
run only the codegen tail.
|
- `--core-count=<N>` - required positive core count for PIM compilation.
|
||||||
- `--crossbar-size=<N>` / `--crossbar-count=<N>` — crossbar dimensions and
|
- `--crossbar-size=<N>` - crossbar width/height. Default in code is `2`.
|
||||||
per-core count.
|
- `--crossbar-count=<N>` - crossbars per core. Default in code is `256`.
|
||||||
- `--core-count=<N>` — number of cores. Required for PIM compilation.
|
- `--pim-merge-scheduler=peft` - merge scheduler. `peft` is the only accepted
|
||||||
- `--pim-merge-scheduler={peft,dcp}` — scheduler used by the Spatial
|
value in the current code.
|
||||||
merge-compute-nodes pass (default: `peft`).
|
- `--pim-only-codegen` - assume input is already bufferized PIM IR and only run
|
||||||
- `--dcp-critical-window-size=<N>` — DCP coarsening window (0 = legacy).
|
the codegen tail.
|
||||||
- `--use-experimental-conv-impl` — alternative convolution lowering.
|
- `--pim-emit-json` - also emit `core_*.json` instruction files alongside
|
||||||
- `--ignore-concat-error` — soft-fail corner case in `ConcatOp`.
|
`core_*.pim`.
|
||||||
|
- `--use-experimental-conv-impl` - use the alternate convolution lowering.
|
||||||
|
- `--ignore-concat-error` - soft-fail a ConcatOp corner case.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
./build_release/Release/bin/onnx-mlir model.onnx -o /tmp/raptor/model \
|
||||||
|
--maccel=PIM --EmitPimCodegen \
|
||||||
|
--crossbar-size=2048 --crossbar-count=256 --core-count=1000
|
||||||
|
```
|
||||||
|
|
||||||
|
This writes PIM artifacts under `/tmp/raptor/pim/`.
|
||||||
|
|
||||||
## Validation
|
## Validation
|
||||||
|
|
||||||
Functional validation lives in `validation/` and drives the Rust
|
Functional validation lives in `validation/`. It compiles ONNX models, builds a
|
||||||
`pim-simulator` to compare Raptor's output against a reference.
|
native ONNX-MLIR reference runner, generates random inputs, runs Raptor, runs
|
||||||
|
the Rust PIM simulator, and compares outputs.
|
||||||
|
|
||||||
Per-operation validation (from `validation/`):
|
Python dependencies used by the validation scripts are `numpy`, `onnx`, and
|
||||||
|
`colorama`. The simulator requires the Rust toolchain.
|
||||||
|
|
||||||
```
|
Per-operation validation from the repository root:
|
||||||
validate.py \
|
|
||||||
--raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
|
```bash
|
||||||
--onnx-include-dir ../onnx-mlir/include \
|
python3 validation/validate.py \
|
||||||
--core-count 1000
|
--raptor-path build_release/Release/bin/onnx-mlir \
|
||||||
|
--onnx-include-dir onnx-mlir/include \
|
||||||
|
--core-count 1000
|
||||||
```
|
```
|
||||||
|
|
||||||
End-to-end network validation (example: first 4 layers of YOLOv11n):
|
Validate one network or a subset by pointing `--operations-dir` at any directory
|
||||||
|
containing `.onnx` files:
|
||||||
|
|
||||||
```
|
```bash
|
||||||
validate.py \
|
python3 validation/validate.py \
|
||||||
--raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
|
--raptor-path build_release/Release/bin/onnx-mlir \
|
||||||
--onnx-include-dir ../onnx-mlir/include \
|
--onnx-include-dir onnx-mlir/include \
|
||||||
--operations-dir ./networks/yolo11n/depth_04 \
|
--operations-dir validation/networks/yolo11n/depth_04 \
|
||||||
--crossbar-size 2048 --crossbar-count 256 --core-count 1000
|
--crossbar-size 2048 --crossbar-count 256 --core-count 1000
|
||||||
```
|
```
|
||||||
|
|
||||||
Each validation run writes debugging artifacts into the benchmark's workspace
|
Useful validation options:
|
||||||
directory (for example `validation/operations/gemm/small/`):
|
- `--simulator-dir <path>` - override the auto-detected
|
||||||
- `inputs/` — generated input CSVs used for the run.
|
`backend-simulators/pim/pim-simulator` path.
|
||||||
- `outputs/` — reference outputs dumped by the native ONNX runner.
|
- `--threshold <float>` - maximum allowed per-element output difference.
|
||||||
- `raptor/` — compiler artifacts:
|
- `--seed <int>` - RNG seed for generated inputs.
|
||||||
`*.onnx.mlir`, `dialects/spatial0.mlir`, `dialects/spatial1_dcp_merged.mlir`,
|
- `--command-timeout-seconds <float>` - timeout for compiler, runner, and
|
||||||
`dialects/pim0.mlir`, `dialects/pim1_buff.mlir`, `dialects/pim2_coalesced.mlir`,
|
simulator subprocesses.
|
||||||
`dialects/pim3_folded.mlir`, `dialects/pim4_materialized.mlir`,
|
- `--verbose` - print subprocess logs and average PIM pass timings.
|
||||||
`pim/config.json`, `pim/core_*.pim`, `pim/memory.bin`, and reports under
|
- `--clean` - remove generated validation artifacts and exit.
|
||||||
`raptor/reports/` such as `dcp_merge_report.txt`,
|
|
||||||
`memory_report.txt`, and `static_memory_coalescing_report.txt`.
|
|
||||||
- `runner/` — generated reference runner source, build tree, and shared library.
|
|
||||||
- `simulation/out.bin` — raw simulator output dump used for output comparison.
|
|
||||||
|
|
||||||
That means you usually do not need to rerun standalone `--EmitSpatial` or
|
Each validation run writes artifacts in the model workspace, for example under
|
||||||
`--EmitPim` commands while debugging validation failures: the per-pass dialect
|
`validation/operations/gemm/small/`:
|
||||||
dumps are already available under `raptor/dialects/`.
|
- `inputs/` - generated input CSV files.
|
||||||
|
- `outputs/` - native ONNX-MLIR reference outputs.
|
||||||
|
- `raptor/` - compiler artifacts, including `*.onnx.mlir`, dialect dumps under
|
||||||
|
`dialects/`, reports under `reports/`, and final PIM artifacts under `pim/`.
|
||||||
|
- `runner/` - generated reference runner source, build tree, and shared library.
|
||||||
|
- `simulation/out.bin` - raw simulator output used for comparison.
|
||||||
|
|
||||||
The validator does not currently expose a simulator tracing flag, but once a
|
The compiler currently dumps dialect snapshots such as `spatial0.mlir`,
|
||||||
validation has produced `raptor/pim/` you can rerun the simulator manually with
|
`spatial1_dcp_merged.mlir`, `pim0.mlir`, `pim1_buff.mlir`,
|
||||||
tracing enabled:
|
`pim2_coalesced.mlir`, `pim3_folded.mlir`, and
|
||||||
|
`pim4_materialized.mlir` when an output directory is available.
|
||||||
|
|
||||||
|
To rerun the simulator manually with tracing after validation has produced a
|
||||||
|
`raptor/pim/` directory:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd backend-simulators/pim/pim-simulator
|
cd backend-simulators/pim/pim-simulator
|
||||||
@@ -174,90 +184,138 @@ cargo run --no-default-features --features tracing --release \
|
|||||||
```
|
```
|
||||||
|
|
||||||
With `--features tracing`, the simulator writes per-core traces as
|
With `--features tracing`, the simulator writes per-core traces as
|
||||||
`simulation/TraceCore0`, `simulation/TraceCore1`, ... next to `simulation/out.bin`.
|
`TraceCore0`, `TraceCore1`, ... next to `out.bin`. The validator normally
|
||||||
The validator normally computes the `-d` dump ranges from `raptor/pim/config.json`
|
computes the `-d` ranges from `raptor/pim/config.json` and model output shapes.
|
||||||
and the model output shapes. If you need a clean slate before rerunning, use:
|
|
||||||
|
Available validation networks under `validation/networks/`: `vgg16`,
|
||||||
|
`yolo11n`, `yolo11nv2`.
|
||||||
|
|
||||||
|
Available operation suites under `validation/operations/`: `add`, `concat`,
|
||||||
|
`conv`, `div`, `gather`, `gemm`, `gemv`, `matmul`, `mul`, `pool`,
|
||||||
|
`reduce_mean`, `relu`, `reshape`, `resize`, `sigmoid`, `softmax`, `split`.
|
||||||
|
|
||||||
|
Generated operation tests can be regenerated with:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
validate.py --clean
|
python3 validation/operations/gen_tests.py
|
||||||
```
|
```
|
||||||
|
|
||||||
Available networks under `validation/networks/`: `vgg16`, `yolo11n`.
|
|
||||||
Available operations under `validation/operations/`: `add`, `conv`, `div`,
|
|
||||||
`gather`, `gemm`, `gemv`, `mul`, `pool`, `reduce_mean`, `relu`, `resize`,
|
|
||||||
`sigmoid`, `softmax`, `split`.
|
|
||||||
|
|
||||||
## Rebuilding
|
|
||||||
|
|
||||||
Release build (fast):
|
|
||||||
|
|
||||||
```
|
|
||||||
cmake --build /home/nico/raptor/raptor/cmake-build-release --target onnx-mlir -j 30
|
|
||||||
```
|
|
||||||
|
|
||||||
A slower debug build is also available — configure it the same way but with
|
|
||||||
`-DCMAKE_BUILD_TYPE=Debug` (see installation instructions below).
|
|
||||||
|
|
||||||
## Build
|
## Build
|
||||||
|
|
||||||
|
Initialize submodules first:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git submodule update --init --recursive
|
||||||
|
```
|
||||||
|
|
||||||
|
The project follows ONNX-MLIR's build requirements. The CI workflow documents
|
||||||
|
the currently used versions and setup:
|
||||||
|
- CMake 4.3.0 in CI,
|
||||||
|
- LLVM/MLIR checked out under `onnx-mlir/llvm-project`,
|
||||||
|
- Protobuf `v34.0`,
|
||||||
|
- Rust stable for `pim-simulator`,
|
||||||
|
- Python packages `numpy`, `onnx`, `colorama` for validation.
|
||||||
|
|
||||||
### Protobuf
|
### Protobuf
|
||||||
|
|
||||||
Use the following commands to install protobuf:
|
Install Protobuf if your system does not already provide a compatible version:
|
||||||
```
|
|
||||||
|
```bash
|
||||||
git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
|
git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
|
||||||
cd protobuf
|
cmake -S protobuf -B protobuf/build -G Ninja \
|
||||||
mkdir build
|
-DCMAKE_BUILD_TYPE=Release \
|
||||||
cd build
|
-Dprotobuf_BUILD_TESTS=OFF
|
||||||
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release
|
cmake --build protobuf/build
|
||||||
ninja
|
sudo cmake --install protobuf/build
|
||||||
sudo ninja install
|
|
||||||
```
|
```
|
||||||
|
|
||||||
You can now remove the protobuf repo directory with:
|
You can then remove the temporary checkout:
|
||||||
```
|
|
||||||
cd ../..
|
```bash
|
||||||
rm -rf protobuf
|
rm -rf protobuf
|
||||||
```
|
```
|
||||||
|
|
||||||
### Mlir
|
### MLIR
|
||||||
|
|
||||||
Follow the first part of instructions [here](onnx-mlir/docs/BuildOnLinuxOSX.md) to build mlir.
|
Follow the ONNX-MLIR instructions in
|
||||||
|
`onnx-mlir/docs/BuildOnLinuxOSX.md` to build LLVM/MLIR. The local Raptor build
|
||||||
|
expects `MLIR_DIR` to point at the MLIR CMake package, for example:
|
||||||
|
|
||||||
Remember to set ```-DCMAKE_BUILD_TYPE=Debug``` for developing on Raptor
|
```bash
|
||||||
|
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
|
||||||
Moreover, if compiling with build type debug, it is also suggested to use
|
|
||||||
mold as linker (you will need to install it if you don't have it already)
|
|
||||||
to reduce memory usage during linking. You can use it by setting the options:
|
|
||||||
```
|
|
||||||
-DLLVM_USE_LINKER=mold
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
If your LLVM build directory is named `build` instead of `build_release`, adjust
|
||||||
|
the path accordingly.
|
||||||
|
|
||||||
### Raptor
|
### Raptor
|
||||||
|
|
||||||
Use the following commands to build Raptor.
|
Configure a release build:
|
||||||
|
|
||||||
Remember to set ```-DCMAKE_BUILD_TYPE=Debug``` for developing on Raptor.
|
```bash
|
||||||
|
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
|
||||||
Also in this case, it is suggested to use mold as linker to reduce link time and memory usage,
|
cmake -S . -B build_release -G Ninja \
|
||||||
setting the options:
|
|
||||||
```
|
|
||||||
-DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
|
|
||||||
-DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
|
|
||||||
-DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
|
|
||||||
```
|
|
||||||
|
|
||||||
```
|
|
||||||
git submodule update --init --recursive
|
|
||||||
|
|
||||||
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build/lib/cmake/mlir
|
|
||||||
mkdir build && cd build
|
|
||||||
cmake .. -G Ninja \
|
|
||||||
-DCMAKE_BUILD_TYPE=Release \
|
-DCMAKE_BUILD_TYPE=Release \
|
||||||
-DONNX_MLIR_ACCELERATORS=PIM \
|
-DONNX_MLIR_ACCELERATORS=PIM \
|
||||||
-DLLVM_ENABLE_ASSERTIONS=ON \
|
-DLLVM_ENABLE_ASSERTIONS=ON \
|
||||||
-DMLIR_DIR=${MLIR_DIR}
|
-DMLIR_DIR=${MLIR_DIR}
|
||||||
cmake --build .
|
|
||||||
```
|
```
|
||||||
|
|
||||||
If the build fails because of protobuf missing uint definitions,
|
Configure a debug build similarly:
|
||||||
just patch the problematic files by adding ```#include <cstdint>``` to their includes.
|
|
||||||
|
```bash
|
||||||
|
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_debug/lib/cmake/mlir
|
||||||
|
cmake -S . -B build_debug -G Ninja \
|
||||||
|
-DCMAKE_BUILD_TYPE=Debug \
|
||||||
|
-DONNX_MLIR_ACCELERATORS=PIM \
|
||||||
|
-DLLVM_ENABLE_ASSERTIONS=ON \
|
||||||
|
-DMLIR_DIR=${MLIR_DIR}
|
||||||
|
```
|
||||||
|
|
||||||
|
For debug development, using `mold` can reduce link time and memory use:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cmake -S . -B build_debug -G Ninja \
|
||||||
|
-DCMAKE_BUILD_TYPE=Debug \
|
||||||
|
-DONNX_MLIR_ACCELERATORS=PIM \
|
||||||
|
-DLLVM_ENABLE_ASSERTIONS=ON \
|
||||||
|
-DMLIR_DIR=${MLIR_DIR} \
|
||||||
|
-DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
|
||||||
|
-DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
|
||||||
|
-DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
|
||||||
|
```
|
||||||
|
|
||||||
|
Build the compiler with CMake:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cmake --build ./build_release
|
||||||
|
cmake --build ./build_debug
|
||||||
|
```
|
||||||
|
|
||||||
|
Do not invoke `ninja` directly for this project; use `cmake --build` so CMake's
|
||||||
|
configuration and generated shims stay consistent.
|
||||||
|
|
||||||
|
If a build fails because Protobuf headers are missing fixed-width integer
|
||||||
|
definitions, patch the affected Protobuf-generated files by adding
|
||||||
|
`#include <cstdint>`.
|
||||||
|
|
||||||
|
## Tests
|
||||||
|
|
||||||
|
The Rust simulator has its own tests:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd backend-simulators/pim/pim-simulator
|
||||||
|
cargo test
|
||||||
|
```
|
||||||
|
|
||||||
|
## Repository Layout
|
||||||
|
|
||||||
|
- `src/PIM/` - PIM accelerator implementation.
|
||||||
|
- `test/PIM/` - PIM C++ unit tests.
|
||||||
|
- `validation/` - functional validation scripts, ONNX operation tests, network
|
||||||
|
slices, and pimsim config generation.
|
||||||
|
- `backend-simulators/pim/pim-simulator/` - in-tree Rust functional simulator.
|
||||||
|
- `backend-simulators/pim/pimsim-nn/` - performance simulator submodule.
|
||||||
|
- `pimcomp_utils/` - local comparison helpers for PIMCOMP-NN.
|
||||||
|
- `.github/actions/` and `.github/workflows/validate_operations.yml` - CI setup
|
||||||
|
for MLIR/Protobuf caching, building Raptor, and validation.
|
||||||
|
|||||||
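The new README text describes `--threshold <float>` as the maximum allowed per-element output difference between the native ONNX-MLIR reference and the simulator output. A minimal sketch of what such a check could look like is given below. It is an illustration only and not the code in `validation/validate.py`; the function names `max_abs_difference` and `within_threshold` are hypothetical, and treating non-finite values as failures is an assumption.

```python
import math


def max_abs_difference(reference, simulated):
    """Return the largest absolute per-element difference (hypothetical helper)."""
    if len(reference) != len(simulated):
        raise ValueError("reference and simulated outputs differ in length")
    return max((abs(r - s) for r, s in zip(reference, simulated)), default=0.0)


def within_threshold(reference, simulated, threshold):
    """Return True when every element pair differs by at most `threshold`.

    Assumption: a length mismatch or any NaN/inf value counts as a failure.
    """
    if len(reference) != len(simulated):
        return False
    for r, s in zip(reference, simulated):
        if not (math.isfinite(r) and math.isfinite(s)):
            return False
        if abs(r - s) > threshold:
            return False
    return True


if __name__ == "__main__":
    ref = [1.0, 2.0, 3.0]
    sim = [1.0005, 1.9995, 3.0]
    print(within_threshold(ref, sim, 1e-3))
    print(max_abs_difference(ref, sim))
```

A per-element bound is stricter than an averaged error metric: a single badly wrong output element fails the run even when the other elements match exactly.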