# scalometer - parallel kernel benchmarking

This project provides a tool for benchmarking parallelization strategies on kernels found in HPC applications.
It is designed to make adding kernels and parallelization strategies easy.

## Features

- **Kernel Registry**: A registry that allows the user to register and execute different computational kernels easily.
- **Parallelization Strategies**: Two strategies for parallelizing the execution of kernel loops:
  - **OpenMP**: Uses OpenMP directives to parallelize the outermost loop.
  - **Eventify**: Uses the Eventify tasking system for parallelism.
- **Kernel Execution**: Kernels such as **STREAM TRIAD** and **DAXPY** are implemented, and their execution can be timed and compared across different parallelization strategies.
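
For orientation, the two kernels named above follow the standard definitions: STREAM TRIAD computes `a[i] = b[i] + scalar * c[i]` and DAXPY computes `y[i] = alpha * x[i] + y[i]`. The sketch below shows sequential reference versions and a small timing helper. The function names and the `time_seconds` helper are illustrative assumptions and are not taken from the project's sources.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// STREAM TRIAD reference: a[i] = b[i] + scalar * c[i]
inline void stream_triad(std::vector<double>& a, const std::vector<double>& b,
                         const std::vector<double>& c, double scalar) {
  assert(b.size() == a.size() && c.size() == a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] = b[i] + scalar * c[i];
  }
}

// DAXPY reference: y[i] = alpha * x[i] + y[i]
inline void daxpy(double alpha, const std::vector<double>& x,
                  std::vector<double>& y) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < y.size(); ++i) {
    y[i] = alpha * x[i] + y[i];
  }
}

// Hypothetical helper: wall-clock seconds spent in a single call of `fn`.
template <typename Fn>
double time_seconds(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Wrapping a kernel call in `time_seconds` once per strategy gives numbers that can be compared directly, provided the vectors are large enough that the run time is well above the timer resolution.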

## Contact

If you run into problems or have feature requests, feel free to open an issue or a pull request.
You may also contact the author, Patrick Lipka (patrick.lipka@sipearl.com).

## Project Structure
```
.
├── bin/              # Compiled executable
├── include/          # Header files
│   ├── kernels.hpp   # Kernel and KernelRegistry declarations
│   ├── strategy.hpp  # Parallelization strategies (OpenMP, Eventify)
│   └── utils.hpp     # Utility functions for initialization
├── src/              # Source files
│   ├── kernels.cpp   # Kernel and KernelRegistry implementations
│   ├── strategy.cpp  # Parallelization strategies (OpenMP, Eventify)
│   └── main.cpp      # Main entry point for benchmarking
├── Makefile          # Makefile to build the project
└── README.md         # Project documentation
```

## Requirements

- C++20 or higher
- OpenMP support (for OpenMP parallelization strategy)
- Eventify library (for Eventify parallelization strategy)
- Limitation: All implemented parallelization strategies must currently be installed to build the project. Selective compilation of strategies might be added later if needed.

### Dependencies:

- **Eventify**: Ensure that the Eventify library is properly installed and the environment variable `EVENTIFY_ROOT` points to the root directory of the Eventify installation.

## Building the Project

To build the project, run:

```
make
```

This will compile the source files and generate an executable called `benchmark` in the `bin/` directory.
Similar to the STREAM benchmark's Makefile, the vector size is defined by the preprocessor variable `VECTOR_SIZE`, which can be set in the Makefile.
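
A compile-time size like this is commonly consumed in the sources with a guarded default, so that a value passed on the compiler command line (for example `-DVECTOR_SIZE=20000000` with GCC or Clang) takes precedence. The following is a minimal sketch of that pattern. The fallback value and the helper `make_benchmark_vector` are assumptions for illustration, not the project's actual code.

```cpp
#include <cstddef>
#include <vector>

// Assumed fallback: only used when the build does not pass -DVECTOR_SIZE=<n>.
#ifndef VECTOR_SIZE
#define VECTOR_SIZE 1000000
#endif

// Hypothetical helper: allocates one zero-filled vector of the configured size.
inline std::vector<double> make_benchmark_vector() {
  static_assert(VECTOR_SIZE > 0, "VECTOR_SIZE must be positive");
  return std::vector<double>(static_cast<std::size_t>(VECTOR_SIZE), 0.0);
}
```

Because the size is fixed at compile time, changing it requires a rebuild (`make clean` followed by `make`).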

### Clean Up

To remove all compiled files and the executable, run:

```
make clean
```

## Usage

### Running the Benchmark

To run a kernel benchmark, use the following command:

```
./bin/benchmark <kernel_name> <strategy> <num_threads_or_tasks>
```

- `<kernel_name>`: The name of the kernel to run. Example: `stream_triad`
- `<strategy>`: The parallelization strategy to use. Available options: `omp` (for OpenMP) and `eventify` (for Eventify).
- `<num_threads_or_tasks>`: The number of threads or tasks to use for parallel execution. This depends on the parallelization strategy (e.g., number of threads for OpenMP, number of tasks for Eventify).
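
The three positional arguments can be validated before any kernel is run. The sketch below shows one possible way to do this. The `BenchmarkArgs` struct and the `parse_args` function are hypothetical names introduced here, and the project's `main.cpp` may handle its arguments differently.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical container for the three positional arguments.
struct BenchmarkArgs {
  std::string kernel_name;
  std::string strategy;
  int num_threads_or_tasks;
};

// Returns the parsed arguments, or std::nullopt if the command line is malformed.
inline std::optional<BenchmarkArgs> parse_args(int argc,
                                               const char* const argv[]) {
  if (argc != 4) return std::nullopt;  // program name + 3 arguments

  const std::string strategy = argv[2];
  if (strategy != "omp" && strategy != "eventify") return std::nullopt;

  try {
    const std::string count_text = argv[3];
    std::size_t consumed = 0;
    const int count = std::stoi(count_text, &consumed);
    // Reject trailing characters ("4x") and non-positive counts.
    if (consumed != count_text.size() || count < 1) return std::nullopt;
    return BenchmarkArgs{argv[1], strategy, count};
  } catch (const std::exception&) {
    return std::nullopt;  // std::stoi failed: not a number or out of range
  }
}
```

The kernel name is deliberately not checked here, since the set of valid names is only known to the kernel registry.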

### Example:

To run the `stream_triad` kernel with the OpenMP strategy using 4 threads:

```
./bin/benchmark stream_triad omp 4
```

To run the `daxpy` kernel with the Eventify strategy using 8 tasks:

```
./bin/benchmark daxpy eventify 8
```

### Error Handling

- If an invalid kernel name is provided, the program will print an error message and list available kernels.

Example of an invalid kernel name:

```
$ ./bin/benchmark invalid_kernel omp 4
Kernel not found: invalid_kernel
Available kernels are:
  - stream_triad
  - daxpy
```
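
A name-to-kernel map is enough to produce a message of this shape. The sketch below uses a simplified stand-in class, `SimpleRegistry`, which is an assumption for illustration. The project's real `KernelRegistry` is declared in `include/kernels.hpp` and may differ in interface and behavior.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's registry: kernel name -> callable.
class SimpleRegistry {
 public:
  void register_kernel(const std::string& name, std::function<void()> run) {
    if (kernels_.find(name) == kernels_.end()) order_.push_back(name);
    kernels_[name] = std::move(run);
  }

  // Returns an empty string if the kernel exists, otherwise the error text
  // including the list of registered kernels.
  std::string lookup_error(const std::string& name) const {
    if (kernels_.count(name) != 0) return "";
    std::ostringstream out;
    out << "Kernel not found: " << name << "\n"
        << "Available kernels are:\n";
    for (const auto& known : order_) out << "  - " << known << "\n";
    return out.str();
  }

 private:
  std::map<std::string, std::function<void()>> kernels_;
  std::vector<std::string> order_;  // registration order, used for the listing
};
```

Keeping the registration order in a separate vector lets the listing appear in the order the kernels were added rather than alphabetically.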

## Adding New Kernels

To add a new kernel to the project, follow these steps:

1. **Define the Kernel**:
    - Open the `src/kernels.cpp` file and scroll to the section where new kernels are registered (around the `initialize_registry` function).
    - Use the existing kernels (`stream_triad` and `daxpy`) as templates. Create a new kernel by adding a lambda to the `register_kernel` method.
    - The number, types and initialization of arguments can be chosen freely.
    - Note that you only need to provide the loop body / inner loops of a loop nest. The outer loop with induction variable `int i` is defined as part of the parallelization strategy already.

    For example, to add a new **vector product** kernel, you can do the following:

    ```
    registry->register_kernel("vector_product", [&]() {
      auto a = std::make_shared<std::vector<float>>();
      auto b = std::make_shared<std::vector<float>>();
      auto c = std::make_shared<std::vector<float>>();

      auto prepare = [=]() {
        a->resize(VECTOR_SIZE);
        b->resize(VECTOR_SIZE);
        c->resize(VECTOR_SIZE);
        initialize_vector(*b);
        initialize_vector(*c);
      };

      auto execute = [=](int kernel_start_idx, int kernel_end_idx, int n_tasks_or_threads) {
        strategy::execute_strategy(strategy_name, kernel_start_idx, kernel_end_idx, n_tasks_or_threads, [&](int i) {
          (*a)[i] = (*b)[i] * (*c)[i];  // Vector product operation
        });
      };

      return Kernel("vector_product", execute, prepare);
    });
    ```

    In this example:
    - `a`, `b`, and `c` are the vectors used for the operation.
    - `prepare` resizes all three vectors to `VECTOR_SIZE` and fills the inputs `b` and `c` with random values using the `initialize_vector` function.
    - `execute` contains the kernel logic: each element of `a` is computed as the product of the corresponding elements of `b` and `c` (an element-wise product).

2. **Register the Kernel**:
    - The new kernel is registered automatically when the `initialize_registry` function is called, so no separate registration step is required.

3. **Use the Kernel**:
    - Once you have added the kernel to the registry, you can run it just like the existing kernels using the `./bin/benchmark` command. For example:

    ```
    ./bin/benchmark vector_product omp 4
    ```

### Notes on Adding Kernels:

- Kernels must be registered with a **name** (e.g., `"vector_product"`) and should include the corresponding **allocations and data initialization** (`prepare`) and **kernel logic** (`execute`).
- Kernels must consist of at least an outer loop for now.
- The kernel’s execution should be parallelizable with every available strategy (currently `omp` for OpenMP and `eventify` for Eventify). You can add more strategies by extending the `strategy` namespace.
- The `VECTOR_SIZE` preprocessor variable defines the size of the input data and should be appropriate for the kernel you are implementing.
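
To illustrate why a kernel supplies only its loop body, the sketch below shows how an OpenMP-style strategy could own the outer loop and call the body once per index. The function names `run_outer_loop_omp` and `elementwise_product`, and the choice of an exclusive end index, are assumptions made for this example. This is not the project's `strategy::execute_strategy` implementation, and it does not use the Eventify API.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical OpenMP-style strategy: owns the outer loop over
// [start_idx, end_idx) and calls `body(i)` for each index.
// If the code is compiled without OpenMP support (e.g. without -fopenmp),
// the pragma is ignored and the loop runs sequentially.
inline void run_outer_loop_omp(int start_idx, int end_idx, int n_threads,
                               const std::function<void(int)>& body) {
  assert(n_threads >= 1);
  #pragma omp parallel for num_threads(n_threads)
  for (int i = start_idx; i < end_idx; ++i) {
    body(i);
  }
}

// Kernel side: only the loop body is supplied. Iterations must be independent
// of each other so that any strategy may run them in any order.
inline std::vector<float> elementwise_product(const std::vector<float>& b,
                                              const std::vector<float>& c,
                                              int n_threads) {
  assert(b.size() == c.size());
  std::vector<float> a(b.size());
  run_outer_loop_omp(0, static_cast<int>(a.size()), n_threads,
                     [&](int i) { a[i] = b[i] * c[i]; });
  return a;
}
```

A further strategy would follow the same shape: accept the index range, the worker count and the body, and decide internally how the indices are distributed.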


## Contributing

Feel free to submit issues or pull requests to improve the project.

## License

This project is licensed under the MIT License.