# scalometer - parallel kernel benchmarking

This project provides a tool for benchmarking parallelization strategies on kernels found in HPC applications.
It is designed to make adding kernels and parallelization strategies easy.

## Features

- **Kernel Registry**: A registry that allows the user to register and execute different computational kernels easily.
- **Parallelization Strategies**: Two strategies for parallelizing the execution of kernel loops:
  - **OpenMP**: Uses OpenMP directives to parallelize the outermost loop.
  - **Eventify**: Uses the Eventify tasking system for parallelism.
- **Kernel Execution**: Kernels such as **STREAM TRIAD** and **DAXPY** are implemented, and their execution can be timed and compared across different parallelization strategies.

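
As a rough illustration of the OpenMP strategy, the sketch below parallelizes an outer loop over `i` and calls a user-supplied loop body for each index. The names `run_outer_loop_omp` and `triad` and their signatures are hypothetical and not taken from `strategy.hpp`. It assumes a positive thread count and a compiler with OpenMP enabled (e.g. `-fopenmp` for GCC or Clang); without that flag the pragma is ignored and the loop runs serially.

```cpp
#include <vector>

// Hypothetical helper, not the project's actual API: runs body(i) for every
// i in [start, end), splitting the outer loop across n_threads OpenMP threads.
template <typename Body>
void run_outer_loop_omp(int start, int end, int n_threads, Body body) {
    #pragma omp parallel for num_threads(n_threads)
    for (int i = start; i < end; ++i) {
        body(i);
    }
}

// Example loop body: STREAM TRIAD, a[i] = b[i] + scalar * c[i].
// Each iteration writes a distinct element, so iterations are independent.
inline void triad(std::vector<double>& a, const std::vector<double>& b,
                  const std::vector<double>& c, double scalar, int n_threads) {
    run_outer_loop_omp(0, static_cast<int>(a.size()), n_threads,
                       [&](int i) { a[i] = b[i] + scalar * c[i]; });
}
```

Passing the loop body as a callable keeps the kernel code independent of the strategy, which is the separation the feature list describes.
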
## Contact

If you run into problems or have feature requests, you are welcome to open issues and pull requests.
You can also contact the author, Patrick Lipka (patrick.lipka@sipearl.com).

## Project Structure

```
.
├── bin/                    # Compiled executable
├── include/                # Header files
│   ├── kernels.hpp         # Kernel and KernelRegistry declarations
│   ├── strategy.hpp        # Parallelization strategies (OpenMP, Eventify)
│   └── utils.hpp           # Utility functions for initialization
├── src/                    # Source files
│   ├── kernels.cpp         # Kernel and KernelRegistry implementations
│   ├── strategy.cpp        # Parallelization strategies (OpenMP, Eventify)
│   └── main.cpp            # Main entry point for benchmarking
├── Makefile                # Makefile to build the project
└── README.md               # Project documentation
```

## Requirements

- C++20 or higher
- OpenMP support (for the OpenMP parallelization strategy)
- Eventify library (for the Eventify parallelization strategy)
- Limitation: Installations of all implemented parallelization strategies are mandatory at this point. Selective compilation of strategies might be added later if needed.

### Dependencies:

- **Eventify**: Ensure that the Eventify library is properly installed and that the environment variable `EVENTIFY_ROOT` points to the root directory of the Eventify installation.

## Building the Project

To build the project, run:

```
make
```

This will compile the source files and generate an executable called `benchmark` in the `bin/` directory.
As in the STREAM benchmark's Makefile, the vector sizes are defined by the preprocessor variable `VECTOR_SIZE`, which can be set in the Makefile.

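
A common way to consume such a compile-time size is a guarded default in the source, sketched here under the assumption that the Makefile passes `-DVECTOR_SIZE=<n>` to the compiler. The default value and the helper `make_input_vector` are illustrative only and not taken from the project.

```cpp
#include <cstddef>
#include <vector>

// Fallback used only when the build does not pass -DVECTOR_SIZE=<n>.
// The value is an arbitrary placeholder, not the project's default.
#ifndef VECTOR_SIZE
#define VECTOR_SIZE 1000000
#endif

// Hypothetical helper: allocates one zero-filled input vector of the
// configured size.
inline std::vector<double> make_input_vector() {
    return std::vector<double>(static_cast<std::size_t>(VECTOR_SIZE), 0.0);
}
```

Because the size is fixed at compile time, changing it requires a rebuild.
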
### Clean Up

To remove all compiled files and the executable, run:

```
make clean
```

## Usage

### Running the Benchmark

To run a kernel benchmark, use the following command:

```
./bin/benchmark <kernel_name> <strategy> <num_threads_or_tasks>
```

- `<kernel_name>`: The name of the kernel to run. Example: `stream_triad`
- `<strategy>`: The parallelization strategy to use. Available options: `omp` (for OpenMP) and `eventify` (for Eventify).
- `<num_threads_or_tasks>`: The number of threads or tasks to use for parallel execution. This depends on the parallelization strategy (e.g., number of threads for OpenMP, number of tasks for Eventify).

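
The three positional arguments could be validated along the following lines. This is a minimal sketch, not the code in `main.cpp`: the struct `BenchmarkArgs` and the function `parse_args` are hypothetical, and rejecting non-positive counts is an assumption about what counts as valid input.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical container for the three command-line arguments.
struct BenchmarkArgs {
    std::string kernel_name;
    std::string strategy;  // "omp" or "eventify"
    int num_threads_or_tasks;
};

// Returns the parsed arguments, or std::nullopt if the argument count,
// the strategy name, or the thread/task count is invalid.
inline std::optional<BenchmarkArgs> parse_args(int argc, const char* const argv[]) {
    if (argc != 4) return std::nullopt;

    const std::string strategy = argv[2];
    if (strategy != "omp" && strategy != "eventify") return std::nullopt;

    int count = 0;
    try {
        std::size_t pos = 0;
        const std::string raw = argv[3];
        count = std::stoi(raw, &pos);
        if (pos != raw.size()) return std::nullopt;  // trailing characters, e.g. "4x"
    } catch (const std::exception&) {
        return std::nullopt;  // not a number, or out of range for int
    }
    if (count <= 0) return std::nullopt;

    return BenchmarkArgs{argv[1], strategy, count};
}
```

Kernel names are deliberately not checked here, since the set of valid names depends on what is registered.
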
### Example:

To run the `stream_triad` kernel with the OpenMP strategy using 4 threads:

```
./bin/benchmark stream_triad omp 4
```

To run the `daxpy` kernel with the Eventify strategy using 8 tasks:

```
./bin/benchmark daxpy eventify 8
```

### Error Handling

- If an invalid kernel name is provided, the program will print an error message and list available kernels.

Example of an invalid kernel name:

```
$ ./bin/benchmark invalid_kernel omp 4
Kernel not found: invalid_kernel
Available kernels are:
- stream_triad
- daxpy
```

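
A message in the format shown above could be assembled as in the following sketch. The helper `kernel_not_found_message` is hypothetical; how the project actually collects the kernel names and prints the message is not shown in this README.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds the "kernel not found" message from the
// requested name and the list of registered kernel names.
inline std::string kernel_not_found_message(const std::string& requested,
                                            const std::vector<std::string>& available) {
    std::ostringstream out;
    out << "Kernel not found: " << requested << '\n'
        << "Available kernels are:\n";
    for (const auto& name : available) {
        out << "- " << name << '\n';
    }
    return out.str();
}
```
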
## Adding New Kernels

To add a new kernel to the project, follow these steps:

1. **Define the Kernel**:
   - Open the `src/kernels.cpp` file and scroll to the section where new kernels are registered (around the `initialize_registry` function).
   - Use the existing kernels (`stream_triad` and `daxpy`) as templates. Create a new kernel by passing a lambda to the `register_kernel` method.
   - The number, types, and initialization of arguments can be chosen freely.
   - Note that you only need to provide the loop body (or the inner loops of a loop nest). The outer loop with induction variable `int i` is already defined as part of the parallelization strategy.

For example, to add a new **vector product** kernel, you can do the following:

```
registry->register_kernel("vector_product", [&]() {
    // Shared pointers keep the vectors alive inside the copied lambdas.
    auto a = std::make_shared<std::vector<float>>();
    auto b = std::make_shared<std::vector<float>>();
    auto c = std::make_shared<std::vector<float>>();

    // Allocate the vectors and initialize the inputs.
    auto prepare = [=]() {
        a->resize(VECTOR_SIZE);
        b->resize(VECTOR_SIZE);
        c->resize(VECTOR_SIZE);
        initialize_vector(*b);
        initialize_vector(*c);
    };

    // Only the loop body is given here; the strategy supplies the outer loop over i.
    auto execute = [=](int kernel_start_idx, int kernel_end_idx, int n_tasks_or_threads) {
        strategy::execute_strategy(strategy_name, kernel_start_idx, kernel_end_idx, n_tasks_or_threads, [&](int i) {
            (*a)[i] = (*b)[i] * (*c)[i]; // Element-wise vector product
        });
    };

    return Kernel("vector_product", execute, prepare);
});
```

In this example:
- `a`, `b`, and `c` are the vectors used for the operation.
- `prepare` resizes all three vectors and fills the inputs `b` and `c` with random values using the `initialize_vector` function.
- `execute` contains the vector product logic, where each element of vector `a` is computed as the product of the corresponding elements of vectors `b` and `c`.

2. **Register the Kernel**:
   - The new kernel should be automatically registered when the `initialize_registry` function is called. This is done dynamically through the registry.

3. **Use the Kernel**:
   - Once you have added the kernel to the registry, you can run it just like the existing kernels using the `./bin/benchmark` command. For example:

   ```
   ./bin/benchmark vector_product omp 4
   ```

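
The registration mechanism used in the steps above can be pictured as a map from kernel names to factory lambdas. The sketch below is a simplified stand-in, not the project's `Kernel` or `KernelRegistry` from `kernels.hpp`: the types `SketchKernel` and `SketchRegistry` and all their members are assumptions made for illustration.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the project's Kernel type (assumption).
struct SketchKernel {
    std::string name;
    std::function<void()> prepare;               // allocation and initialization
    std::function<void(int, int, int)> execute;  // start index, end index, threads/tasks
};

// Minimal name-to-factory registry; the real KernelRegistry may differ.
class SketchRegistry {
public:
    using Factory = std::function<SketchKernel()>;

    void register_kernel(const std::string& name, Factory factory) {
        factories_[name] = std::move(factory);
    }

    // Builds the kernel on demand; throws if the name is unknown.
    SketchKernel create(const std::string& name) const {
        const auto it = factories_.find(name);
        if (it == factories_.end()) {
            throw std::out_of_range("Kernel not found: " + name);
        }
        return it->second();
    }

    // Names of all registered kernels, in alphabetical order.
    std::vector<std::string> names() const {
        std::vector<std::string> result;
        for (const auto& entry : factories_) result.push_back(entry.first);
        return result;
    }

private:
    std::map<std::string, Factory> factories_;
};
```

Storing factories rather than ready-made kernels means a kernel's vectors are only created for the kernel that is actually requested.
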
### Notes on Adding Kernels:

- Kernels must be registered with a **name** (e.g., `"vector_product"`) and should include the corresponding **allocations and data initialization** (`prepare`) and **kernel logic** (`execute`).
- For now, kernels must consist of at least an outer loop.
- The kernel’s execution should be parallelizable with all available strategies (currently `omp` (OpenMP) and `eventify` (Eventify)). You can add more strategies by extending the `strategy` namespace.
- The `VECTOR_SIZE` preprocessor variable defines the size of the input data and should be appropriate for the kernel you are implementing.

## Known Issues and Limitations

- The instantiation of Eventify's `task_system` is included in the kernel timing, leading to a constant overhead compared to OpenMP. On NVIDIA Grace, this overhead is 2.8 ms. Whether to include it in the timing is still under discussion.

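
To make the two options concrete, the sketch below times a callable with `std::chrono::steady_clock` and shows both measurement variants: one-time setup inside or outside the timed region. All names (`time_ms`, `time_including_setup`, `time_excluding_setup`) are placeholders, and `setup` merely stands in for constructing the task system; no Eventify API is used or implied.

```cpp
#include <chrono>
#include <utility>

// Returns the wall-clock time of f() in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Variant A: one-time setup is part of the measurement, which matches the
// behavior described in the limitation above.
template <typename Setup, typename Run>
double time_including_setup(Setup setup, Run run) {
    return time_ms([&] { setup(); run(); });
}

// Variant B: setup happens before the clock starts, so only run() is timed.
template <typename Setup, typename Run>
double time_excluding_setup(Setup setup, Run run) {
    setup();
    return time_ms(run);
}
```

The difference between the two results for the same kernel approximates the constant setup overhead.
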
## Contributing

Feel free to submit issues or pull requests to improve the project.

## License

This project is licensed under the MIT License.