@@ -1,7 +1,7 @@
- Always read the full README.md before doing anything.
- Build commands:
  - `cmake --build ./build_release --target onnx-mlir -j 30`
  - `cmake --build ./build_debug --target onnx-mlir -j 30`
  - `cmake --build ./build_release`
  - `cmake --build ./build_debug`
- Never use `ninja` directly: it bypasses cmake's configuration and invalidates the build cache.

# Code changes

@@ -1,168 +1,178 @@
# Raptor

Raptor is a domain-specific MLIR compiler for neural networks (ONNX format)
targeting in-memory computing / processing-in-memory (PIM) architectures.
It progressively lowers ONNX-MLIR through a set of MLIR dialects down to
target-specific artifacts (currently JSON code for the `pimsim-nn` simulator).
Raptor is a domain-specific MLIR compiler for neural networks in ONNX format,
targeting in-memory computing / processing-in-memory (PIM) architectures. It
extends ONNX-MLIR with a PIM accelerator and progressively lowers ONNX-MLIR
through custom MLIR dialects to simulator artifacts.

The current target is the PIM simulator stack under `backend-simulators/pim`.
Raptor emits binary per-core `.pim` instruction files by default, plus
`memory.bin`, `config.json`, and weight binaries. It can also emit per-core JSON
instruction files with `--pim-emit-json`.

## Overview

PIM architectures perform most of the computation directly in memory.
Raptor's first supported target is `pimsim-nn`, which simulates a chip with:
- a shared host memory,
- a number of cores that do most of the computation directly in their memory
(vector ops, vmm/mvm on ReRAM crossbars),
- no branching instructions (branchless architecture) and no hardware loop
support — any repeated work (e.g. convolutions) must be unrolled into
explicit per-iteration instructions.
PIM architectures perform most computation directly in memory. The supported
target models a chip with:
- shared host memory,
- multiple PIM cores,
- ReRAM crossbars for vector-matrix / matrix-vector work,
- explicit communication between cores,
- no hardware branch or loop support in emitted simulator code.

Because of this, the amount of emitted instructions explodes quickly and the
compiler must optimize aggressively at every stage to keep compilation
tractable.

A second target, `PulPim`, is planned for an accelerator with RISC-V cores
each carrying its own in-memory computing unit and crossbars. It will live in
a dedicated dialect (future work).
Because repeated work such as convolutions is eventually made explicit, emitted
instruction counts can grow quickly. Most compiler work therefore focuses on
lowering, scheduling, memory layout, and code-generation optimizations.
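
To get a feel for how quickly explicit work grows, the sketch below counts the output positions of a single convolution using the standard output-size formula. It assumes, purely for illustration, that every output position becomes at least one explicit iteration in the emitted code; the function names are hypothetical and this is not Raptor's cost model.

```bash
#!/usr/bin/env bash
# Output extent of a convolution along one axis (standard formula):
#   out = floor((size + 2*pad - kernel) / stride) + 1
conv_out_dim() {
  local size=$1 kernel=$2 stride=$3 pad=$4
  echo $(( (size + 2 * pad - kernel) / stride + 1 ))
}

# Number of output positions of a 2D convolution. Assumption: each position
# must be written out explicitly because the target has no loops or branches.
unrolled_positions() {
  local h=$1 w=$2 kernel=$3 stride=$4 pad=$5
  local oh ow
  oh=$(conv_out_dim "$h" "$kernel" "$stride" "$pad")
  ow=$(conv_out_dim "$w" "$kernel" "$stride" "$pad")
  echo $(( oh * ow ))
}

# 224x224 input, 3x3 kernel, stride 1, padding 1
unrolled_positions 224 224 3 1 1   # prints 50176
```

Even this single layer yields tens of thousands of positions before tiling across crossbars is taken into account.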

### Targets and simulators

`pimsim-nn` (under `backend-simulators/pim/pimsim-nn`) is used for
**performance** estimates (latency, energy), but does not functionally execute
the JSON code it consumes. To validate the numerical correctness of the JSON
code produced by Raptor (or, for comparison, by the `pimcomp` compiler), we use
a Rust simulator we maintain in-tree at
`backend-simulators/pim/pim-simulator`.
- `backend-simulators/pim/pim-simulator` is the in-tree Rust functional
simulator used by validation. It reads Raptor's `pim/` artifact directory and
compares simulator output against native ONNX-MLIR execution.
- `backend-simulators/pim/pimsim-nn` is the performance simulator submodule.
The helper scripts in `pimcomp_utils/` are for comparison with PIMCOMP-NN and
contain local paths; treat them as local utilities, not portable workflows.

## Compilation pipeline

The PIM-related sources live under `src/PIM` and the tests under `test/PIM`.
When working on this codebase, most changes should stay confined to those
trees (you only need to look outside, e.g. at `onnx-mlir` or `llvm`, for
framework-level details).
The PIM sources live under `src/PIM` and tests under `test/PIM`. CMake exposes
them to ONNX-MLIR through generated shim directories under
`onnx-mlir/src/Accelerators/PIM` and `onnx-mlir/test/accelerators/PIM`.

High-level lowering flow:

```
ONNX-MLIR ──► Spatial ──► Pim (tensor) ──► Pim (bufferized) ──► PIM code
ONNX-MLIR -> Spatial -> Pim (tensor) -> Pim (bufferized) -> PIM artifacts
```

1. **ONNX → Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
Lowers ONNX ops into the `spat` dialect (`src/PIM/Dialect/Spatial`).
Spatial models a high-level spatial in-memory accelerator: vmm/mvm
operations are accelerated by storing a constant RHS matrix into a
crossbar. Crossbars cannot be re-programmed during execution, have a
limited fixed size, and there is a limited number of them per core.
Conversion patterns are split by op family under
`Conversion/ONNXToSpatial/Patterns/{Math,NN,Tensor}` (Conv, Gemm, MatMul,
Elementwise, ReduceMean, Pool, Relu, Sigmoid, Softmax, Concat, Gather,
Reshape, Resize, Split, etc...).
1. **ONNX -> Spatial** (`src/PIM/Conversion/ONNXToSpatial`).
Lowers supported ONNX ops into the `spat` dialect
(`src/PIM/Dialect/Spatial`). Conversion patterns are split by op family under
`Patterns/{Math,NN,Tensor}` and currently cover Conv, Gemm, MatMul,
elementwise Add/Mul/Div, ReduceMean, pooling, Relu, Sigmoid, Softmax,
Concat, Gather, Reshape, Resize, and Split.

2. **Spatial → Pim** (`src/PIM/Conversion/SpatialToPim`).
Lowers Spatial to the `pim` dialect (`src/PIM/Dialect/Pim`), which
materializes PIM cores (`pim.core`), inter-core communication
(`pim.send` / `pim.receive`), halts, and crossbar-level operations.
2. **Merge compute nodes**
(`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
Builds a compute graph, schedules it with the PEFT scheduler, and materializes
the merge schedule into Spatial IR. Supporting scheduling code lives under
`MergeComputeNodes/Scheduling`.

3. **Merge compute nodes** (`src/PIM/Dialect/Spatial/Transforms/MergeComputeNodes`).
A PEFT heuristic that coarsens the virtual node graph and decides how to group compute
nodes onto cores. Our implementation is only DCP-*inspired*: it is a
heuristic with different assumptions from the paper (different cost
model, constraints from crossbar capacity / core resources, and a
windowed coarsening loop instead of full-graph reprioritization). The
`dcp-critical-window-size` option controls how many lowest-slack virtual
nodes each coarsening iteration considers (0 = legacy full-graph
analysis). Related sources: `DCPGraph/DCPAnalysis.cpp`, `Graph.cpp/.hpp`,
`MergeComputeNodesPass.cpp`.
3. **Spatial -> Pim** (`src/PIM/Conversion/SpatialToPim`).
Lowers Spatial operations to the `pim` dialect (`src/PIM/Dialect/Pim`),
including `pim.core`, `pim.core_batch`, communication, tensor packing, global
tensor materialization, and return-path normalization.

4. **Bufferization** (`src/PIM/Dialect/Pim/Transforms/Bufferization`).
Converts tensor-semantics PIM IR into memref-semantics PIM IR using the
standard MLIR `BufferizableOpInterface` machinery
(`OpBufferizationInterfaces.*`, `PimBufferization.td`).
Converts tensor-semantics PIM IR into memref-semantics PIM IR using MLIR's
bufferization interfaces.

5. **Static memory coalescing** (`src/PIM/Dialect/Pim/Transforms/StaticMemoryCoalescing`).
Conservatively reuses same-typed local memref allocations inside PIM cores
after bufferization and before code generation.
5. **Static memory coalescing**
(`src/PIM/Dialect/Pim/Transforms/StaticMemoryCoalescing`).
Reuses compatible local memref allocations inside PIM cores before codegen.

6. **PIM code generation** (`src/PIM/Pass/PimCodegen`):
- `HostConstantFolding` — folds host-side constants.
- `MaterializeHostConstantsPass` — materializes the remaining host
constants for emission.
- `VerificationPass` — checks invariants before emission.
- `EmitPimJsonPass` — emits the final PIM JSON consumed by `pimsim-nn`
and `pim-simulator`.
6. **PIM code generation** (`src/PIM/Pass/PimCodegen` and
`src/PIM/Compiler`).
Folds host constants, materializes remaining host constants, verifies PIM IR,
emits `.pim` core files, writes weights, and writes `memory.bin` /
`config.json`.

Supporting pieces:
- `src/PIM/Compiler` — PIM-specific compiler options (crossbar size/count,
core count, DCP window, experimental conv impl, concat error handling, …)
and `PimCodeGen` entry points.
- `src/PIM/Common` — shared utilities (`PimCommon`, `LabeledList`).
- `src/PIM/Pass` — auxiliary passes (`MessagePass`)
and the `PIMPasses.h` registry used by `PimAccelerator`.
- `src/PIM/PimAccelerator.{cpp,hpp}` — accelerator entry point: registers
dialects, passes, and plugs Raptor into the ONNX-MLIR driver.
- `src/PIM/Common` - shared IR, filesystem, diagnostics, reports, and utility
helpers.
- `src/PIM/Compiler` - PIM compiler options, memory/address planning, binary
instruction format, artifact writing, weight emission, and codegen entry
points.
- `src/PIM/Conversion/SpatialToGraphviz` - optional Spatial graphviz conversion
pass.
- `src/PIM/Pass` - pass registration and auxiliary passes.
- `src/PIM/PimAccelerator.{cpp,hpp}` - ONNX-MLIR accelerator entry point.

## Key compiler options

Pass these on the `onnx-mlir` command line when compiling for PIM:
Pass these to `onnx-mlir` when compiling for PIM:

- `--maccel=PIM` — select the PIM accelerator.
- `--EmitSpatial` / `--EmitPim` / `--EmitPimBufferized` / `--EmitPimCodegen`
— stop the pipeline at the requested stage (default: `EmitPimCodegen`).
- `--pim-only-codegen` — assume the input is already bufferized PIM IR and
run only the codegen tail.
- `--crossbar-size=<N>` / `--crossbar-count=<N>` — crossbar dimensions and
per-core count.
- `--core-count=<N>` — number of cores. Required for PIM compilation.
- `--pim-merge-scheduler={peft,dcp}` — scheduler used by the Spatial
merge-compute-nodes pass (default: `peft`).
- `--dcp-critical-window-size=<N>` — DCP coarsening window (0 = legacy).
- `--use-experimental-conv-impl` — alternative convolution lowering.
- `--ignore-concat-error` — soft-fail corner case in `ConcatOp`.
- `--maccel=PIM` - select the PIM accelerator.
- `--EmitSpatial`, `--EmitPim`, `--EmitPimBufferized`,
`--EmitPimCodegen` - stop the PIM pipeline at the requested stage. The PIM
default is `--EmitPimCodegen`.
- `--core-count=<N>` - required positive core count for PIM compilation.
- `--crossbar-size=<N>` - crossbar width/height. Default in code is `2`.
- `--crossbar-count=<N>` - crossbars per core. Default in code is `256`.
- `--pim-merge-scheduler=peft` - merge scheduler. `peft` is the only accepted
value in the current code.
- `--pim-only-codegen` - assume input is already bufferized PIM IR and only run
the codegen tail.
- `--pim-emit-json` - also emit `core_*.json` instruction files alongside
`core_*.pim`.
- `--use-experimental-conv-impl` - use the alternate convolution lowering.
- `--ignore-concat-error` - soft-fail a ConcatOp corner case.
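
When picking `--crossbar-size`, `--crossbar-count`, and `--core-count`, a back-of-the-envelope capacity check can help. The sketch below assumes a weight matrix is split into `size x size` tiles with one tile per crossbar; that tiling is an illustrative assumption, not a description of Raptor's actual mapping, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# ceil(a / b) for positive integers
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# Crossbar tiles needed for a rows x cols weight matrix, assuming one
# size x size tile per crossbar (illustrative assumption).
crossbars_needed() {
  local rows=$1 cols=$2 size=$3
  echo $(( $(ceil_div "$rows" "$size") * $(ceil_div "$cols" "$size") ))
}

# Lower bound on cores if each core offers `count` crossbars.
min_cores() {
  local tiles=$1 count=$2
  ceil_div "$tiles" "$count"
}

tiles=$(crossbars_needed 4096 1000 2048)   # 2 * 1 = 2 tiles
min_cores "$tiles" 256                      # prints 1
```

The result is only a lower bound for one layer; scheduling and communication constraints may require more cores.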

Example:

```bash
./build_release/Release/bin/onnx-mlir model.onnx -o /tmp/raptor/model \
  --maccel=PIM --EmitPimCodegen \
  --crossbar-size=2048 --crossbar-count=256 --core-count=1000
```

This writes PIM artifacts under `/tmp/raptor/pim/`.
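
A quick sanity check of the artifact directory can be scripted. The sketch below looks only for the files this README names (`config.json`, `memory.bin`, and at least one `core_*.pim`); the function name is hypothetical, and weight binaries are not checked because their naming is not specified here.

```bash
#!/usr/bin/env bash
# Return 0 if DIR contains the artifacts named in this README, 1 otherwise.
check_pim_artifacts() {
  local dir=$1 missing=0 f
  for f in config.json memory.bin; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # At least one per-core instruction file is expected.
  if ! ls "$dir"/core_*.pim > /dev/null 2>&1; then
    echo "missing: $dir/core_*.pim" >&2
    missing=1
  fi
  return "$missing"
}

check_pim_artifacts /tmp/raptor/pim || echo "artifact directory incomplete"
```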

## Validation

Functional validation lives in `validation/` and drives the Rust
`pim-simulator` to compare Raptor's output against a reference.
Functional validation lives in `validation/`. It compiles ONNX models, builds a
native ONNX-MLIR reference runner, generates random inputs, runs Raptor, runs
the Rust PIM simulator, and compares outputs.

Per-operation validation (from `validation/`):
Python dependencies used by the validation scripts are `numpy`, `onnx`, and
`colorama`. The simulator requires the Rust toolchain.

```
validate.py \
  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
  --onnx-include-dir ../onnx-mlir/include \
Per-operation validation from the repository root:

```bash
python3 validation/validate.py \
  --raptor-path build_release/Release/bin/onnx-mlir \
  --onnx-include-dir onnx-mlir/include \
  --core-count 1000
```

End-to-end network validation (example: first 4 layers of YOLOv11n):
Validate one network or a subset by pointing `--operations-dir` at any directory
containing `.onnx` files:

```
validate.py \
  --raptor-path ../cmake-build-release/Release/bin/onnx-mlir \
  --onnx-include-dir ../onnx-mlir/include \
  --operations-dir ./networks/yolo11n/depth_04 \
```bash
python3 validation/validate.py \
  --raptor-path build_release/Release/bin/onnx-mlir \
  --onnx-include-dir onnx-mlir/include \
  --operations-dir validation/networks/yolo11n/depth_04 \
  --crossbar-size 2048 --crossbar-count 256 --core-count 1000
```

Each validation run writes debugging artifacts into the benchmark's workspace
directory (for example `validation/operations/gemm/small/`):
- `inputs/` — generated input CSVs used for the run.
- `outputs/` — reference outputs dumped by the native ONNX runner.
- `raptor/` — compiler artifacts:
`*.onnx.mlir`, `dialects/spatial0.mlir`, `dialects/spatial1_dcp_merged.mlir`,
`dialects/pim0.mlir`, `dialects/pim1_buff.mlir`, `dialects/pim2_coalesced.mlir`,
`dialects/pim3_folded.mlir`, `dialects/pim4_materialized.mlir`,
`pim/config.json`, `pim/core_*.pim`, `pim/memory.bin`, and reports under
`raptor/reports/` such as `dcp_merge_report.txt`,
`memory_report.txt`, and `static_memory_coalescing_report.txt`.
- `runner/` — generated reference runner source, build tree, and shared library.
- `simulation/out.bin` — raw simulator output dump used for output comparison.
Useful validation options:
- `--simulator-dir <path>` - override the auto-detected
`backend-simulators/pim/pim-simulator` path.
- `--threshold <float>` - maximum allowed per-element output difference.
- `--seed <int>` - RNG seed for generated inputs.
- `--command-timeout-seconds <float>` - timeout for compiler, runner, and
simulator subprocesses.
- `--verbose` - print subprocess logs and average PIM pass timings.
- `--clean` - remove generated validation artifacts and exit.
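
To rerun a few operation suites with a fixed seed, the invocations can be generated from a small helper. The sketch below is a dry run that only prints the commands; it uses flags listed above and assumes, without confirmation from the validator's source, that `--operations-dir` also picks up `.onnx` files in subdirectories of a suite. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print (dry run) one validate.py invocation per operation suite.
# Usage: print_validate_cmds <seed> <suite>...
print_validate_cmds() {
  local seed=$1
  shift
  local suite
  for suite in "$@"; do
    echo python3 validation/validate.py \
      --raptor-path build_release/Release/bin/onnx-mlir \
      --onnx-include-dir onnx-mlir/include \
      --operations-dir "validation/operations/$suite" \
      --core-count 1000 --seed "$seed"
  done
}

print_validate_cmds 42 gemm conv
```

Dropping the leading `echo` inside the loop would run the commands instead of printing them.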

That means you usually do not need to rerun standalone `--EmitSpatial` or
`--EmitPim` commands while debugging validation failures: the per-pass dialect
dumps are already available under `raptor/dialects/`.
Each validation run writes artifacts in the model workspace, for example under
`validation/operations/gemm/small/`:
- `inputs/` - generated input CSV files.
- `outputs/` - native ONNX-MLIR reference outputs.
- `raptor/` - compiler artifacts, including `*.onnx.mlir`, dialect dumps under
`dialects/`, reports under `reports/`, and final PIM artifacts under `pim/`.
- `runner/` - generated reference runner source, build tree, and shared library.
- `simulation/out.bin` - raw simulator output used for comparison.

The validator does not currently expose a simulator tracing flag, but once a
validation has produced `raptor/pim/` you can rerun the simulator manually with
tracing enabled:
The compiler currently dumps dialect snapshots such as `spatial0.mlir`,
`spatial1_dcp_merged.mlir`, `pim0.mlir`, `pim1_buff.mlir`,
`pim2_coalesced.mlir`, `pim3_folded.mlir`, and
`pim4_materialized.mlir` when an output directory is available.

To rerun the simulator manually with tracing after validation has produced a
`raptor/pim/` directory:

```bash
cd backend-simulators/pim/pim-simulator
@@ -174,90 +184,138 @@ cargo run --no-default-features --features tracing --release \
```

With `--features tracing`, the simulator writes per-core traces as
`simulation/TraceCore0`, `simulation/TraceCore1`, ... next to `simulation/out.bin`.
The validator normally computes the `-d` dump ranges from `raptor/pim/config.json`
and the model output shapes. If you need a clean slate before rerunning, use:
`TraceCore0`, `TraceCore1`, ... next to `out.bin`. The validator normally
computes the `-d` ranges from `raptor/pim/config.json` and model output shapes.

Available validation networks under `validation/networks/`: `vgg16`,
`yolo11n`, `yolo11nv2`.

Available operation suites under `validation/operations/`: `add`, `concat`,
`conv`, `div`, `gather`, `gemm`, `gemv`, `matmul`, `mul`, `pool`,
`reduce_mean`, `relu`, `reshape`, `resize`, `sigmoid`, `softmax`, `split`.

Generated operation tests can be regenerated with:

```bash
validate.py --clean
python3 validation/operations/gen_tests.py
```

Available networks under `validation/networks/`: `vgg16`, `yolo11n`.
Available operations under `validation/operations/`: `add`, `conv`, `div`,
`gather`, `gemm`, `gemv`, `mul`, `pool`, `reduce_mean`, `relu`, `resize`,
`sigmoid`, `softmax`, `split`.

## Rebuilding

Release build (fast):

```
cmake --build /home/nico/raptor/raptor/cmake-build-release --target onnx-mlir -j 30
```

A slower debug build is also available — configure it the same way but with
`-DCMAKE_BUILD_TYPE=Debug` (see installation instructions below).

## Build

Initialize submodules first:

```bash
git submodule update --init --recursive
```

The project follows ONNX-MLIR's build requirements. The CI workflow documents
the currently used versions and setup:
- CMake 4.3.0 in CI,
- LLVM/MLIR checked out under `onnx-mlir/llvm-project`,
- Protobuf `v34.0`,
- Rust stable for `pim-simulator`,
- Python packages `numpy`, `onnx`, `colorama` for validation.

### Protobuf

Use the following commands to install protobuf:
```
Install Protobuf if your system does not already provide a compatible version:

```bash
git clone --depth 1 --branch v34.0 https://github.com/protocolbuffers/protobuf
cd protobuf
mkdir build
cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release
ninja
sudo ninja install
cmake -S protobuf -B protobuf/build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -Dprotobuf_BUILD_TESTS=OFF
cmake --build protobuf/build
sudo cmake --install protobuf/build
```

You can now remove the protobuf repo directory with:
```
cd ../..
You can then remove the temporary checkout:

```bash
rm -rf protobuf
```

### Mlir
### MLIR

Follow the first part of instructions [here](onnx-mlir/docs/BuildOnLinuxOSX.md) to build mlir.
Follow the ONNX-MLIR instructions in
`onnx-mlir/docs/BuildOnLinuxOSX.md` to build LLVM/MLIR. The local Raptor build
expects `MLIR_DIR` to point at the MLIR CMake package, for example:

Remember to set ```-DCMAKE_BUILD_TYPE=Debug``` for developing on Raptor

Moreover, if compiling with build type debug, it is also suggested to use
mold as linker (you will need to install it if you don't have it already)
to reduce memory usage during linking. You can use it by setting the options:
```
-DLLVM_USE_LINKER=mold
```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
```

If your LLVM build directory is named `build` instead of `build_release`, adjust
the path accordingly.

### Raptor

Use the following commands to build Raptor.
Configure a release build:

Remember to set ```-DCMAKE_BUILD_TYPE=Debug``` for developing on Raptor.

Also in this case, it is suggested to use mold as linker to reduce link time and memory usage,
setting the options:
```
-DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
-DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
```

```
git submodule update --init --recursive

MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build/lib/cmake/mlir
mkdir build && cd build
cmake .. -G Ninja \
```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_release/lib/cmake/mlir
cmake -S . -B build_release -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR}
cmake --build .
```

If the build fails because of protobuf missing uint definitions,
just patch the problematic files by adding ```#include <cstdint>``` to their includes.
Configure a debug build similarly:

```bash
MLIR_DIR=$(pwd)/onnx-mlir/llvm-project/build_debug/lib/cmake/mlir
cmake -S . -B build_debug -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR}
```

For debug development, using `mold` can reduce link time and memory use:

```bash
cmake -S . -B build_debug -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DONNX_MLIR_ACCELERATORS=PIM \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DMLIR_DIR=${MLIR_DIR} \
  -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" \
  -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" \
  -DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=mold"
```

Build the compiler with CMake:

```bash
cmake --build ./build_release
cmake --build ./build_debug
```

Do not invoke `ninja` directly for this project; use `cmake --build` so CMake's
configuration and generated shims stay consistent.

If a build fails because Protobuf headers are missing fixed-width integer
definitions, patch the affected Protobuf-generated files by adding
`#include <cstdint>`.

## Tests

The Rust simulator has its own tests:

```bash
cd backend-simulators/pim/pim-simulator
cargo test
```

## Repository Layout

- `src/PIM/` - PIM accelerator implementation.
- `test/PIM/` - PIM C++ unit tests.
- `validation/` - functional validation scripts, ONNX operation tests, network
slices, and pimsim config generation.
- `backend-simulators/pim/pim-simulator/` - in-tree Rust functional simulator.
- `backend-simulators/pim/pimsim-nn/` - performance simulator submodule.
- `pimcomp_utils/` - local comparison helpers for PIMCOMP-NN.
- `.github/actions/` and `.github/workflows/validate_operations.yml` - CI setup
for MLIR/Protobuf caching, building Raptor, and validation.