# scalometer - parallel kernel benchmarking
This project provides a tool for benchmarking parallelization strategies on kernels found in HPC applications.
It is designed to make adding kernels and parallelization strategies easy.
## Features
- **Kernel Registry**: A registry that allows the user to register and execute different computational kernels easily.
- **Parallelization Strategies**: Two strategies for parallelizing the execution of kernel loops:
  - **OpenMP**: Uses OpenMP directives to parallelize the outermost loop.
  - **Eventify**: Uses the Eventify tasking system for parallelism.
- **Kernel Execution**: Kernels such as **STREAM TRIAD** and **DAXPY** are implemented, and their execution can be timed and compared across different parallelization strategies.
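To illustrate what the two strategies do with a kernel's outer loop, here is a minimal sketch of a dispatch function shaped like the `strategy::execute_strategy(strategy_name, start, end, n_tasks_or_threads, body)` call used in the kernel example under "Adding New Kernels". The OpenMP branch uses a standard `parallel for` with a `num_threads` clause. The `"eventify"` branch is only a stand-in and does not use the Eventify API: it splits the index range into `n` contiguous chunks and runs each on a `std::thread`, which approximates running the loop as a fixed number of tasks. The helper names `run_omp` and `run_chunked_tasks` are hypothetical.

```cpp
#include <algorithm>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

namespace strategy {

// OpenMP: distribute the outer loop over n_threads threads.
// Without OpenMP enabled (e.g. no -fopenmp), the pragma is ignored and the loop runs serially.
inline void run_omp(int start, int end, int n_threads,
                    const std::function<void(int)>& body) {
#pragma omp parallel for num_threads(n_threads)
    for (int i = start; i < end; ++i) {
        body(i);
    }
}

// Stand-in for a task-based strategy. This is NOT the Eventify API: the range
// [start, end) is split into n_tasks contiguous chunks, each run on a std::thread.
inline void run_chunked_tasks(int start, int end, int n_tasks,
                              const std::function<void(int)>& body) {
    const int total = end - start;
    if (total <= 0) {
        return;
    }
    const int chunk = (total + n_tasks - 1) / n_tasks;  // ceiling division
    std::vector<std::thread> workers;
    for (int t = 0; t < n_tasks; ++t) {
        const int lo = start + t * chunk;
        const int hi = std::min(end, lo + chunk);
        if (lo >= hi) {
            break;  // fewer non-empty chunks than tasks
        }
        workers.emplace_back([lo, hi, &body] {
            for (int i = lo; i < hi; ++i) {
                body(i);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();  // wait for all chunks before returning
    }
}

// Dispatches on the strategy name as given on the command line.
inline void execute_strategy(const std::string& strategy_name, int start, int end,
                             int n_tasks_or_threads,
                             const std::function<void(int)>& body) {
    if (n_tasks_or_threads < 1) {
        throw std::invalid_argument("need at least one thread or task");
    }
    if (strategy_name == "omp") {
        run_omp(start, end, n_tasks_or_threads, body);
    } else if (strategy_name == "eventify") {
        // Placeholder: a real build would hand the range to Eventify here.
        run_chunked_tasks(start, end, n_tasks_or_threads, body);
    } else {
        throw std::invalid_argument("unknown strategy: " + strategy_name);
    }
}

}  // namespace strategy
```

Compile with `-fopenmp` (GCC/Clang) to enable the OpenMP branch; without it, the loop still produces correct results but runs on one thread.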
## Contact
If you run into problems or have feature requests, feel free to open issues and pull requests.
You can also contact the author, Patrick Lipka (patrick.lipka@sipearl.com).
## Project Structure
```
.
├── bin/ # Compiled executable
├── include/ # Header files
│ ├── kernels.hpp # Kernel and KernelRegistry declarations
│ ├── strategy.hpp # Parallelization strategies (OpenMP, Eventify)
│ └── utils.hpp # Utility functions for initialization
├── src/ # Source files
│ ├── kernels.cpp # Kernel and KernelRegistry implementations
│ ├── strategy.cpp # Parallelization strategies (OpenMP, Eventify)
│ └── main.cpp # Main entry point for benchmarking
├── Makefile # Makefile to build the project
└── README.md # Project documentation
```
## Requirements
- C++20 or higher
- OpenMP support (for OpenMP parallelization strategy)
- Eventify library (for Eventify parallelization strategy)
- Limitation: Installations of all implemented parallelization strategies are currently mandatory. Selective compilation of strategies might be added later if needed.
### Dependencies:
- **Eventify**: Ensure that the Eventify library is properly installed and the environment variable `EVENTIFY_ROOT` points to the root directory of the Eventify installation.
## Building the Project
To build the project, run:
```
make
```
This will compile the source files and generate an executable called `benchmark` in the `bin/` directory.
As in the STREAM benchmark's Makefile, the vector size is defined by the preprocessor variable `VECTOR_SIZE`, which can be set in the Makefile.
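A common way to give such a macro a default while still allowing an override from the build is the `#ifndef` guard pattern that STREAM also uses. The sketch below assumes a hypothetical default of 10,000,000 elements and `double` data; the project's actual default and element types may differ, and `kVectorSize` and `make_vector` are illustrative names. Passing `-DVECTOR_SIZE=<n>` to the compiler (for example through the compiler flags in the Makefile) overrides the default.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical default size; override at compile time, e.g. -DVECTOR_SIZE=50000000.
#ifndef VECTOR_SIZE
#define VECTOR_SIZE 10000000
#endif

static_assert(VECTOR_SIZE > 0, "VECTOR_SIZE must be positive");

// Typed constant so the size is not re-expanded as a bare macro everywhere.
constexpr std::size_t kVectorSize = VECTOR_SIZE;

// Allocates one zero-initialized vector of the configured size.
inline std::vector<double> make_vector() {
    return std::vector<double>(kVectorSize);
}
```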
### Clean Up
To remove all compiled files and the executable, run:
```
make clean
```
## Usage
### Running the Benchmark
To run a kernel benchmark, use the following command:
```
./bin/benchmark <kernel_name> <strategy> <num_threads_or_tasks>
```
- `<kernel_name>`: The name of the kernel to run. Example: `stream_triad`
- `<strategy>`: The parallelization strategy to use. Available options: `omp` (for OpenMP) and `eventify` (for Eventify).
- `<num_threads_or_tasks>`: The number of threads or tasks to use for parallel execution. This depends on the parallelization strategy (e.g., number of threads for OpenMP, number of tasks for Eventify).
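This README does not show how `bin/benchmark` parses its arguments. As an illustration only, a minimal validation of the three positional arguments could look like the following; `BenchmarkArgs` and `parse_args` are hypothetical names. The kernel name is deliberately left unchecked, since unknown kernels are reported by the registry (see Error Handling).

```cpp
#include <exception>
#include <optional>
#include <string>

// Hypothetical parsed form of: benchmark <kernel_name> <strategy> <num_threads_or_tasks>
struct BenchmarkArgs {
    std::string kernel_name;
    std::string strategy;
    int n_tasks_or_threads = 0;
};

// Returns std::nullopt if the argument count, strategy, or thread/task count is invalid.
inline std::optional<BenchmarkArgs> parse_args(int argc, const char* const argv[]) {
    if (argc != 4) {
        return std::nullopt;
    }
    BenchmarkArgs args;
    args.kernel_name = argv[1];  // validated later by the kernel registry
    args.strategy = argv[2];
    if (args.strategy != "omp" && args.strategy != "eventify") {
        return std::nullopt;
    }
    const std::string count = argv[3];
    try {
        std::size_t consumed = 0;
        const int n = std::stoi(count, &consumed);
        // Reject trailing characters (e.g. "8x") and non-positive counts.
        if (consumed != count.size() || n < 1) {
            return std::nullopt;
        }
        args.n_tasks_or_threads = n;
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
    return args;
}
```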
### Example:
To run the `stream_triad` kernel with the OpenMP strategy using 4 threads:
```
./bin/benchmark stream_triad omp 4
```
To run the `daxpy` kernel with the Eventify strategy using 8 tasks:
```
./bin/benchmark daxpy eventify 8
```
### Error Handling
- If an invalid kernel name is provided, the program will print an error message and list available kernels.
Example of an invalid kernel name:
```
$ ./bin/benchmark invalid_kernel omp 4
Kernel not found: invalid_kernel
Available kernels are:
- stream_triad
- daxpy
```
## Adding New Kernels
To add a new kernel to the project, follow these steps:
1. **Define the Kernel**:
   - Open the `src/kernels.cpp` file and scroll to the section where kernels are registered (around the `initialize_registry` function).
   - Use the existing kernels (`stream_triad` and `daxpy`) as templates. Create a new kernel by passing a lambda to the `register_kernel` method.
   - The number, types, and initialization of the arguments can be chosen freely.
   - You only need to provide the loop body (including any inner loops of a loop nest). The outer loop with induction variable `int i` is already defined by the parallelization strategy.

For example, to add an element-wise **vector product** kernel, you can do the following:
```cpp
registry->register_kernel("vector_product", [&]() {
    // Shared pointers keep the data alive for as long as the lambdas below hold copies of them
    auto a = std::make_shared<std::vector<float>>();
    auto b = std::make_shared<std::vector<float>>();
    auto c = std::make_shared<std::vector<float>>();

    // prepare: allocate all three vectors and initialize the inputs b and c
    auto prepare = [=]() {
        a->resize(VECTOR_SIZE);
        b->resize(VECTOR_SIZE);
        c->resize(VECTOR_SIZE);
        initialize_vector(*b);
        initialize_vector(*c);
    };

    // execute: the strategy owns the outer loop over i; only the loop body is supplied here
    auto execute = [=](int kernel_start_idx, int kernel_end_idx, int n_tasks_or_threads) {
        strategy::execute_strategy(strategy_name, kernel_start_idx, kernel_end_idx, n_tasks_or_threads, [&](int i) {
            (*a)[i] = (*b)[i] * (*c)[i]; // Element-wise vector product
        });
    };

    return Kernel("vector_product", execute, prepare);
});
```
In this example:
- `a`, `b`, and `c` are the vectors used for the operation. They are held in `std::shared_ptr`s so that the `prepare` and `execute` lambdas, which capture them by value, share the same data.
- `prepare` allocates all three vectors and fills the inputs `b` and `c` with random values using the `initialize_vector` function.
- `execute` contains the kernel logic: each element of `a` is computed as the product of the corresponding elements of `b` and `c`.
2. **Register the Kernel**:
   - No further step is needed: the new kernel is registered when the `initialize_registry` function is called.
3. **Use the Kernel**:
   - Once the kernel is in the registry, you can run it just like the existing kernels using the `./bin/benchmark` command. For example:
     ```
     ./bin/benchmark vector_product omp 4
     ```
### Notes on Adding Kernels:
- Kernels must be registered with a **name** (e.g., `"vector_product"`) and should include the corresponding **allocations and data initialization** (`prepare`) and **kernel logic** (`execute`).
- For now, kernels must consist of at least an outer loop.
- Each kernel's execution should be parallelizable with all available strategies (currently `omp` (OpenMP) and `eventify` (Eventify)). You can add more strategies by extending the `strategy` namespace.
- The `VECTOR_SIZE` preprocessor variable defines the size of the input data and should be appropriate for the kernel you are implementing.
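The division of labor described above (the strategy owns the outer loop over `int i`, the kernel supplies the loop body including any inner loops) can be illustrated with a loop-nest kernel. The following is a hypothetical matrix-vector product, not one of the project's kernels, and it uses a serial `run_outer_loop` stand-in instead of a real strategy so that it compiles on its own.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Serial stand-in for a parallelization strategy: it owns the outer loop over i.
inline void run_outer_loop(int start, int end, const std::function<void(int)>& body) {
    for (int i = start; i < end; ++i) {
        body(i);
    }
}

// Hypothetical kernel: y = A * x, with A stored row-major as an n x n matrix.
// The body receives the outer index i (one row) and runs the inner loop over j itself.
inline void matvec(const std::vector<double>& A, const std::vector<double>& x,
                   std::vector<double>& y, int n) {
    run_outer_loop(0, n, [&](int i) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j) {
            sum += A[static_cast<std::size_t>(i) * n + j] * x[j];
        }
        y[i] = sum;  // each i writes a distinct element, so rows can run in parallel
    });
}
```

Because every iteration of the outer loop writes only `y[i]`, swapping `run_outer_loop` for an OpenMP or task-based strategy would not introduce data races.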
## Known Issues and Limitations
- The instantiation of Eventify's `task_system` is included in the kernel timing, which adds a constant overhead compared to OpenMP. On NVIDIA Grace, this overhead is 2.8 ms. Whether it should be included in the timing is still under discussion.
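One possible way to make this overhead visible without settling the question is to time the one-time setup and the kernel separately and report both. The sketch below is hypothetical: `time_phases` and `Timings` are illustrative names, and `setup` stands for whatever one-time work a strategy needs (such as constructing a task system); it does not call the Eventify API.

```cpp
#include <chrono>
#include <functional>

// Wall-clock durations of the two phases, in milliseconds.
struct Timings {
    double setup_ms;   // one-time setup, e.g. creating a task system
    double kernel_ms;  // the kernel execution only
};

inline Timings time_phases(const std::function<void()>& setup,
                           const std::function<void()>& kernel) {
    using Clock = std::chrono::steady_clock;  // monotonic, suitable for intervals
    const auto t0 = Clock::now();
    setup();
    const auto t1 = Clock::now();
    kernel();
    const auto t2 = Clock::now();
    // Convert a clock duration to fractional milliseconds.
    const auto to_ms = [](Clock::duration d) {
        return std::chrono::duration<double, std::milli>(d).count();
    };
    return {to_ms(t1 - t0), to_ms(t2 - t1)};
}
```

Reporting `setup_ms + kernel_ms` alongside `kernel_ms` would allow OpenMP and Eventify to be compared both with and without the constant setup overhead.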
## Contributing
Feel free to submit issues or pull requests to improve the project.
## License
This project is licensed under the MIT License.