- **Kernel Registry**: A registry that allows the user to register and execute different computational kernels easily.
- **Parallelization Strategies**: Two strategies for parallelizing the execution of kernel loops:
- **OpenMP**: Uses OpenMP directives to parallelize the outermost loop.
- **Eventify**: Uses the Eventify tasking system for parallelism.
- **Kernel Execution**: Kernels such as **STREAM TRIAD** and **DAXPY** are implemented, and their execution can be timed and compared across different parallelization strategies.
- Limitation: Providing installs for all implemented parallelization strategies is mandantory at this point. Selective compilation of strategies might be added later if needed.
- **Eventify**: Ensure that the Eventify library is properly installed and the environment variable `EVENTIFY_ROOT` points to the root directory of the Eventify installation.
## Building the Project
To build the project, run:
```
make
```
This will compile the source files and generate an executable called `benchmark` in the `bin/` directory.
-`<kernel_name>`: The name of the kernel to run. Example: `stream_triad`
-`<strategy>`: The parallelization strategy to use. Available options: `omp` (for OpenMP) and `eventify` (for Eventify).
-`<num_threads_or_tasks>`: The number of threads or tasks to use for parallel execution. This depends on the parallelization strategy (e.g., number of threads for OpenMP, number of tasks for Eventify).
### Example:
To run the `stream_triad` kernel with the OpenMP strategy using 4 threads:
```
./bin/benchmark stream_triad omp 4
```
To run the `daxpy` kernel with the Eventify strategy using 8 tasks:
```
./bin/benchmark daxpy eventify 8
```
### Error Handling
- If an invalid kernel name is provided, the program will print an error message and list available kernels.
- The number, types and initialization of arguments can be choosen freely.
- Note that you only need to provide the loop body / inner loops of a loop nest. The outer loop with induction variable `int i` is defined as part of the parallelization strategy already.
-`a`, `b`, and `c` are the vectors used for the operation.
-`prepare` initializes these vectors and fills them with random values using the `initialize_vector` function.
-`execute` contains the vector product logic, where each element in vector `a` is computed as the product of corresponding elements in vectors `b` and `c`.
2.**Register the Kernel**:
- The new kernel should be automatically registered when the `initialize_registry` function is called. This is done dynamically through the registry.
3.**Use the Kernel**:
- Once you have added the kernel to the registry, you can run it just like the existing kernels using the `./bin/benchmark` command. For example:
```
./bin/benchmark vector_product omp 4
```
### Notes on Adding Kernels:
- Kernels must be registered with a **name** (e.g., `"vector_product"`) and should include the corresponding **allocations and data initialization** (`prepare`) and **kernel logic** (`execute`).
- Kernels must consist out of an outer loop at least for now.
- The kernel’s execution should be parallelizable using all of the available strategies (`omp` (OpenMP) and `eventify` (Eventify) for now). You can add more strategies by extending the `strategy` namespace.
- The `VECTOR_SIZE` preprocessor variable defines the size of the input data and should be appropriate for the kernel you are implementing.
- The instantiation of Eventify's `task_system` is inckluded in the kernel timing, leading to a constant overhead compared to OpenMP. On NVIDIA Grace, this is 2.8 ms. It's ongoning discussion whether to include it or not.