Usage¶

Getting started¶

The recommended way to install pika is through spack:

spack install pika

See

spack info pika

for available options.

pika is currently available in the following repositories.

Manual installation¶

If you’d like to build pika manually you will need CMake 3.22.0 or greater and a recent C++ compiler supporting C++17:

GCC 11 or greater
clang 13 or greater

Additionally, pika depends on:

header-only Boost 1.71.0 or greater
hwloc 1.11.5 or greater
fmt 9.0.0 or greater

pika optionally depends on:

gperftools/tcmalloc, jemalloc, or mimalloc. It is highly recommended to use one of these allocators as they perform significantly better than the system allocators. You can set the allocator through the CMake variable PIKA_WITH_MALLOC. If you want to use the system allocator (e.g. for debugging) you can do so by setting PIKA_WITH_MALLOC=system.
CUDA 12.0 or greater. CUDA support can be enabled with PIKA_WITH_CUDA=ON. pika can also be built with nvc++ from the NVIDIA HPC SDK. In the latter case, set CMAKE_CXX_COMPILER to nvc++.
HIP 5.2.0 or greater. HIP support can be enabled with PIKA_WITH_HIP=ON.
whip when CUDA or HIP support is enabled.
MPI. MPI support can be enabled with PIKA_WITH_MPI=ON.
Boost.Context on macOS or exotic platforms which are not supported by the default user-level thread implementations in pika. This can be enabled with PIKA_WITH_BOOST_CONTEXT=ON.
stdexec. stdexec support can be enabled with PIKA_WITH_STDEXEC=ON (currently tested with the tag nvhpc-24.09). The integration is experimental. See Relation to std::execution and stdexec for more information about the integration.

If you are using nix you can also use the shell.nix file provided at the root of the repository to quickly enter a development environment:

nix-shell <pika-root>/shell.nix

The nixpkgs version is not pinned and may break occasionally.

Including in CMake projects¶

Once installed, pika can be used in a CMake project through the pika::pika target:

find_package(pika REQUIRED)
add_executable(app main.cpp)
target_link_libraries(app PRIVATE pika::pika)

Other ways of depending on pika are likely to work but are not officially supported.

Customizing the pika installation¶

The most important CMake options are listed in Getting started. Below is a more complete list of CMake options you can use to customize the installation.

PIKA_WITH_MALLOC: This defaults to mimalloc which requires mimalloc to be installed. Can be set to tcmalloc, jemalloc, mimalloc, or system. Setting it to system can be useful in debug builds.
PIKA_WITH_CUDA: Enable CUDA support.
PIKA_WITH_HIP: Enable HIP support.
PIKA_WITH_MPI: Enable MPI support.
PIKA_WITH_STDEXEC: Enable stdexec support.
PIKA_WITH_APEX: Enable APEX support.
PIKA_WITH_TRACY: Enable Tracy support.
PIKA_WITH_BOOST_CONTEXT: Use Boost.Context for user-level thread context switching.
PIKA_WITH_TESTS: Enable tests. Tests can be built with cmake --build . --target tests and run with ctest --output-on-failure.
PIKA_WITH_EXAMPLES: Enable examples. Binaries will be placed under bin in the build directory.

Testing¶

Tests and examples are disabled by default and can be enabled with PIKA_WITH_TESTS, PIKA_WITH_TESTS_{BENCHMARKS,REGRESSIONS,UNIT}, and PIKA_WITH_EXAMPLES. The tests must be explicitly built before running them, e.g. with cmake --build . --target tests && ctest --output-on-failure.

Controlling the number of threads and thread bindings¶

By default, the thread pool created by the pika runtime has one thread per core on the system. The number of threads can be controlled explicitly with a few environment variables or command line options. The most straightforward ways of changing the number of threads are the environment variable PIKA_THREADS and the --pika:threads command line option. Both take an explicit number of threads. They also support the special values cores (the default, use one thread per core) and all (use one thread per hyperthread).

Note

Command line options always take precedence over environment variables.

Process masks¶

Many batch systems and e.g. MPI can set a process mask on the application to restrict which cores it can run on. pika will by default take this process mask into account when determining how many threads to use for the runtime. hwloc-bind can also be used to manually set a process mask on the application. When a process mask is set, the default behaviour is to use only one thread per core in the process mask. Setting the number of threads to a number higher than the number of cores available in the mask is not allowed. Using all as the number of threads will use all the hyperthreads in the process mask.

The process mask can explicitly be ignored with the environment variable PIKA_IGNORE_PROCESS_MASK=1 or the command line option --pika:ignore-process-mask. A process mask set on the process can explicitly be overridden with the environment variable PIKA_PROCESS_MASK or the command line option --pika:process-mask. When the process mask is ignored, pika behaves as if no process mask is set and all cores or hyperthreads can be used by the runtime. PIKA_PROCESS_MASK and --pika:process-mask take an explicit hexadecimal string (beginning with 0x) representing the process mask to use. --pika:print-bind can be used to verify that the bindings used by pika are correct. Exporting the environment variable PIKA_PRINT_BIND (any value) is equivalent to using the --pika:print-bind option.
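The hexadecimal process mask uses the same layout as the taskset format produced by hwloc-calc. Bit i is set if processing unit (PU) i may be used. The following C++17 sketch builds such a string from a list of PU indices. The helper name make_process_mask is hypothetical. The sketch assumes OS PU indices with bit 0 corresponding to PU 0, and it only handles PUs 0 to 63. For real use, hwloc-calc remains the recommended tool.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: build a taskset-style hex mask ("0x...") in which
// bit i is set if OS processing unit i may be used. Limited to PUs 0..63.
std::string make_process_mask(const std::vector<unsigned>& pus)
{
    std::uint64_t mask = 0;
    for (unsigned pu : pus)
    {
        assert(pu < 64 && "this sketch only supports up to 64 PUs");
        mask |= std::uint64_t{1} << pu;
    }
    std::ostringstream os;
    os << "0x" << std::hex << mask;
    return os.str();
}
```

For example, make_process_mask({0, 1, 2, 3}) yields "0xf", which could be passed as --pika:process-mask=0xf to restrict the runtime to PUs 0 to 3.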

Note

If you find yourself in a situation where you need to explicitly generate a process mask, we recommend the use of hwloc-calc. hwloc-calc produces the format expected by pika with the --taskset command line option. The man page of hwloc-calc contains useful examples of generating different process masks.

In addition to hwloc-calc, hwloc-distrib (man page) can be useful if you need to generate multiple process masks that e.g. don’t overlap.

pika binds (or pins) worker threads to cores by default (except on macOS where thread binding is not supported) to avoid threads being scheduled on different cores, generally improving performance. Thread binding can be disabled by setting the environment variable PIKA_BIND or the command line option --pika:bind to the value none. Threads will in this case not be bound to any particular core and are free to migrate between cores. This is not recommended for most use cases, but can be beneficial e.g. if the system is oversubscribed and threads from different processes would otherwise be competing for time on the same core. The default value for the binding option is balanced, which will bind threads to cores in a “balanced” way, placing threads on consecutive cores, avoiding the use of hyperthreads (if available). A value of compact will fill all hyperthreads on a core with worker threads before filling the next core.
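The difference between balanced and compact can be illustrated by computing where each worker thread would land on a machine with a fixed number of hyperthreads (PUs) per core. This is a simplified model of the two policies as described above, not pika's implementation. It assumes a uniform topology. For balanced, it only covers the case of at most one thread per core, because the text above does not describe placement beyond that.

```cpp
#include <cassert>
#include <utility>

enum class bind_policy { balanced, compact };

// Returns {core, pu_within_core} for worker thread `thread` under a
// simplified model of the binding policies. Assumes every core has
// `pus_per_core` hyperthreads.
std::pair<unsigned, unsigned> place_thread(bind_policy policy,
    unsigned thread, unsigned num_cores, unsigned pus_per_core)
{
    assert(pus_per_core > 0);
    if (policy == bind_policy::compact)
    {
        // Fill all hyperthreads of a core before moving to the next core.
        assert(thread < num_cores * pus_per_core);
        return {thread / pus_per_core, thread % pus_per_core};
    }
    // balanced: consecutive cores, first hyperthread of each core only.
    assert(thread < num_cores);
    return {thread, 0};
}
```

Consider four threads on a machine with four cores and two hyperthreads per core. Under this model, balanced uses all four cores, while compact packs the threads onto cores 0 and 1.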

Note

Command line options always take precedence over environment variables.

Note

The PIKA_THREADS, PIKA_IGNORE_PROCESS_MASK, and PIKA_BIND environment variables were added in 0.32.0.

Interaction with OpenMP¶

When pika is used together with OpenMP, extra care may be needed to ensure pika uses the correct process mask. This is because with OpenMP the main thread participates in parallel regions and if OpenMP binds threads to cores, the main thread may have a mask set to a single core before pika can read the mask. Typically, OpenMP will bind threads to cores if the OMP_PROC_BIND or OMP_PLACES environment variables are set. Some implementations of OpenMP (e.g. LLVM) set the binding of the main thread only at the first parallel region, which means that if pika is initialized before the first parallel region, the mask will most likely be read correctly. Other implementations (e.g. GNU) set the binding of the main thread in global constructors, which may run before pika can read the process mask. In that case you may need to either use PIKA_IGNORE_PROCESS_MASK/--pika:ignore-process-mask to use all cores on the system or explicitly set a mask with --pika:process-mask. If there is a process mask already set in the environment that is launching the application (e.g. in a SLURM job), you can read the mask before the application runs with hwloc (see pika-bind helper script for a more convenient option):

./app --pika:process-mask=$(hwloc-bind --get --taskset)

`pika-bind` helper script¶

Since version 0.20.0, the pika-bind helper script is bundled with pika. pika-bind sets the PIKA_PROCESS_MASK environment variable based on process mask information found before the pika runtime is started, and then runs the given command. pika-bind is a more convenient alternative to manually setting PIKA_PROCESS_MASK when pika is used together with a runtime that may reset the process mask of the main thread, like OpenMP.

Command line options¶

pika’s behaviour can be controlled with command line options, or environment variables. Not all command line options are exposed as environment variables. When both are present, command line options take precedence over environment variables.

If a command line option is not exposed as an environment variable, but it is necessary to set it, it is possible to use the PIKA_COMMANDLINE_OPTIONS environment variable.

For example, the following disables thread binding and explicitly sets the number of threads:

export PIKA_COMMANDLINE_OPTIONS="--pika:bind=none --pika:threads=4"

Logging¶

The pika runtime uses spdlog for logging. Warnings and more severe messages are logged by default. To change the logging level, set the PIKA_LOG_LEVEL environment variable to a value between 0 (trace) and 6 (off) (the values correspond to levels in spdlog). The log messages are sent to stderr by default. The destination can be changed by setting the PIKA_LOG_DESTINATION environment variable. Supported values are:

cerr
cout
any other value is interpreted as a path to a file
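The documented interpretation of PIKA_LOG_LEVEL and PIKA_LOG_DESTINATION can be summarized in a small sketch. The function names are hypothetical and this is not pika's code. The level names are illustrative and listed in spdlog's numeric order. Level 3 corresponds to warnings, which matches the default of logging warnings and more severe messages.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Level names in spdlog's numeric order: 0 (trace) through 6 (off).
std::string_view log_level_name(int level)
{
    static constexpr std::array<std::string_view, 7> names{
        "trace", "debug", "info", "warning", "error", "critical", "off"};
    assert(level >= 0 && level <= 6);
    return names[static_cast<std::size_t>(level)];
}

enum class log_sink { standard_error, standard_output, file };

// cerr and cout select the standard streams; any other value is a file path.
log_sink classify_log_destination(std::string_view destination)
{
    if (destination == "cerr") return log_sink::standard_error;
    if (destination == "cout") return log_sink::standard_output;
    return log_sink::file;
}
```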

pika will by default print messages in the following format:

[2024-04-18 13:45:07.095279283] [pika] [info] [host:machine/----] [pid:2786603] [tid:2786607] [pool:0000/0003/0003] [parent:----/----] [task:0x7fa6a4077cf0/pika_main] [set_thread_state.cpp:205:set_thread_state] set_thread_state: thread(0x7fa6a802c8d0), description(<unknown>), new state(pending), old state(suspended)

The fields are as follows:

[2024-04-18 13:45:07.095279283]: The timestamp of the message.
[pika]: An identifier present in all pika’s logs.
[info]: The severity level of the message.
[host:machine/----]: The hostname and the MPI rank of the process (---- if MPI is disabled).
[pid:2786603]: The process id as reported by the operating system.
[tid:2786607]: The thread id as reported by the operating system.
[pool:0000/0003/0003]: The pika thread pool and worker thread ids: the first component is the thread pool id, the second is the global worker thread id (unique across all thread pools), and the third is the local worker thread id (unique only within the current thread pool).
[parent:----/----]: The id and description of the parent task that spawned the current task.
[task:0x7fa6a4077cf0/pika_main]: The id and description of the current task.
[set_thread_state.cpp:205:set_thread_state]: The file, line number, and function where the message was logged.
The logged message is printed last.

The pool field is [pool:----/----/----] when a message is logged from a thread that does not belong to the pika runtime. The main thread will only have the global thread id set, e.g. [pool:----/0004/----].
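When post-processing logs, the pool field can be split into its three components. The following C++17 sketch (parse_pool_field and pool_field are hypothetical names) assumes the default format shown above. It also assumes that "----" marks an unset id and that ids are printed in decimal.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

struct pool_field
{
    std::optional<int> pool_id;          // thread pool id
    std::optional<int> global_thread_id; // unique across all thread pools
    std::optional<int> local_thread_id;  // unique within the thread pool
};

namespace {
// "----" (or anything else that is not fully numeric) maps to nullopt.
std::optional<int> parse_id(std::string_view s)
{
    int value = 0;
    auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), value);
    if (ec != std::errc{} || ptr != s.data() + s.size()) return std::nullopt;
    return value;
}
} // namespace

// Parses e.g. "[pool:0000/0003/0003]"; returns nullopt if the shape is wrong.
std::optional<pool_field> parse_pool_field(std::string_view field)
{
    constexpr std::string_view prefix = "[pool:";
    if (field.size() <= prefix.size() ||
        field.substr(0, prefix.size()) != prefix || field.back() != ']')
        return std::nullopt;
    field.remove_prefix(prefix.size());
    field.remove_suffix(1);

    auto const first = field.find('/');
    if (first == std::string_view::npos) return std::nullopt;
    auto const second = field.find('/', first + 1);
    if (second == std::string_view::npos) return std::nullopt;

    return pool_field{parse_id(field.substr(0, first)),
        parse_id(field.substr(first + 1, second - first - 1)),
        parse_id(field.substr(second + 1))};
}
```

For the main thread example above, "[pool:----/0004/----]", only global_thread_id is set (to 4).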

Task ids and descriptions are logged as ----/---- when there is no current or parent task. Task descriptions are only printed when enabled with APEX or Tracy support, or with the CMake option PIKA_WITH_THREAD_DEBUG_INFO.

The log message format can be changed by setting the environment variable PIKA_LOG_FORMAT to a format string supported by spdlog. The custom fields defined by pika can be accessed with the following:

%j: The hostname and MPI rank.
%w: The thread pool and worker thread ids.
%q: The parent task id and description.
%k: The current task id and description.

Debugging¶

Writing task-based applications can be tricky, and debugging them even more so. This section describes a few options and tools that can be helpful when debugging applications using pika.

Segmentation faults and stack overflows¶

Stack overflows are a common problem because user-level threads have small default stack sizes. When using the mmap-based stack allocation (the default on platforms that support it), pika provides a configuration option to enable guard pages at the end of a stack. When enabled, a protected page is allocated such that reading or writing within a page of the end of the stack triggers a segmentation fault. The option can be enabled by exporting PIKA_USE_GUARD_PAGES=1. The option is disabled by default.

Additionally, pika can install signal handlers that print information about failures, such as backtraces. These handlers will handle the most common events, such as interrupts, segmentation faults, illegal instructions etc. and they can be enabled with the environment variable PIKA_INSTALL_SIGNAL_HANDLERS=1. The verbosity can be controlled with PIKA_EXCEPTION_VERBOSITY (this also controls how much information pika exceptions capture and print). The default value of 1 will print a backtrace. 2 or higher will print additional information about the pika build. 0 will print the minimum information.

Info

Many MPI implementations also install signal handlers or have options for enabling them. libSegFault.so (part of glibc-tools, more info in this blog post) also provides a way to install a signal handler. These can be useful alternatives to the signal handlers provided by pika. Depending on the type of issue you are debugging, different signal handlers can be more or less helpful as they print slightly different information.

pika does not install any signal handlers by default. They have to be enabled explicitly using the option described above. Keep in mind that if multiple libraries try to set signal handlers, they will likely overwrite each other such that only one is active at a time. Which signal handler is actually used may depend on when a failure happens, order of linking, order of initialization, etc. and may even be non-deterministic.

The signal handler in pika for segmentation faults is a simplified version of the regular signal handler and always prints a limited amount of information. This is because it needs to be able to handle segmentation faults caused by stack overflows. Regular signal handlers run on the stack of the failing thread. When that stack has already overflowed, the signal handler triggers another segmentation fault, and the program is terminated before anything can be printed. The handler in pika for segmentation faults uses sigaltstack, which allows the signal handler to run on a separate stack, guaranteeing that it can print some information on failure. The handler will print a message similar to:

Segmentation fault caught by pika's SIGSEGV handler (enabled with
PIKA_INSTALL_SIGNAL_HANDLERS=1).

This may be caused by a stack overflow, in which case you can increase the
stack sizes by modifying the configuration options PIKA_SMALL_STACK_SIZE
(default), PIKA_MEDIUM_STACK_SIZE, PIKA_LARGE_STACK_SIZE, or
PIKA_HUGE_STACK_SIZE.

Segmentation fault at address: 0x00007fdb152935f8
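The sigaltstack mechanism described above can be illustrated with a minimal POSIX sketch. This is not pika's handler, and install_segv_handler_on_altstack is a hypothetical name. The sketch registers a dedicated stack with sigaltstack, then installs the SIGSEGV handler with the SA_ONSTACK flag, so the handler runs on that stack even when the faulting thread's own stack is exhausted. Alternate signal stacks are per thread, so the stack here only covers the calling thread. The handler only calls async-signal-safe functions (write and _exit).

```cpp
#include <signal.h>
#include <unistd.h>

#include <cstddef>
#include <cstdlib>
#include <vector>

namespace {
// Only async-signal-safe calls are allowed in a signal handler.
void on_segv(int)
{
    const char msg[] = "SIGSEGV caught on alternate signal stack\n";
    ssize_t const written = ::write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void) written;
    ::_exit(EXIT_FAILURE);
}
} // namespace

// Hypothetical helper: installs a SIGSEGV handler that runs on an alternate
// stack for the calling thread. Returns false if either call fails.
bool install_segv_handler_on_altstack()
{
    // One alternate stack per thread; it must outlive the registration.
    thread_local std::vector<char> alt_stack(
        static_cast<std::size_t>(SIGSTKSZ) * 4);

    stack_t ss{};
    ss.ss_sp = alt_stack.data();
    ss.ss_size = alt_stack.size();
    ss.ss_flags = 0;
    if (sigaltstack(&ss, nullptr) != 0) return false;

    struct sigaction sa{};
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK; // run the handler on the alternate stack
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```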

The signal handlers are especially useful in conjunction with PIKA_USE_GUARD_PAGES, as without the latter option a stack overflow may simply end up writing e.g. into another task’s stack, which can be very hard to detect as a stack overflow. Inspecting the core dump of a segmentation fault can be helpful in identifying whether a segmentation fault was likely caused by a stack overflow. Comparing stack pointers (see e.g. the GDB documentation) can tell you how much stack space the current task is using (if in a task).

If you’ve identified a stack overflow in your program you can do one or more of the following to avoid the stack overflow:

Use less stack space
Avoid deep recursion, e.g. by creating new tasks at some point in the computation which will get a new stack
Use a bigger stack size for the task triggering a stack overflow (this can be changed by using a different pika::execution::thread_stacksize for the task; this is currently undocumented, though pika’s examples or unit tests may help you)
Set bigger stack sizes with PIKA_SMALL_STACK_SIZE, PIKA_MEDIUM_STACK_SIZE, PIKA_LARGE_STACK_SIZE, or PIKA_HUGE_STACK_SIZE. Tasks use the small stack size by default, which is 64 KiB, or 0x10000 bytes.
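Stack sizes are byte counts, and the default small stack size is given above as 64 KiB, or 0x10000 bytes. The following sketch converts a size in KiB to a hexadecimal byte count. The helper names are hypothetical. Whether the stack size variables accept hexadecimal, decimal, or both is an assumption here and should be checked against pika's configuration handling.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

constexpr std::uint64_t kib_to_bytes(std::uint64_t kib) { return kib * 1024; }

// The documented default small stack size: 64 KiB == 0x10000 bytes.
static_assert(kib_to_bytes(64) == 0x10000);

// Hypothetical helper: format a stack size given in KiB as a hex byte count,
// e.g. as a candidate value for PIKA_SMALL_STACK_SIZE.
std::string stack_size_hex(std::uint64_t kib)
{
    assert(kib > 0);
    std::ostringstream os;
    os << "0x" << std::hex << kib_to_bytes(kib);
    return os.str();
}
```

For example, 256 KiB corresponds to 0x40000 bytes, four times the default small stack size.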

Sanitizers¶

Address, thread, and other sanitizers can be invaluable when debugging concurrent programs. pika can be instrumented with sanitizers like most programs. To reduce the chances of false positives, make sure to build pika with the CMake option PIKA_WITH_SANITIZERS=ON. This does not enable any sanitizers directly, but disables certain functionalities internally to work better with sanitizers. To actually enable sanitizers, set the -fsanitize= flags explicitly, as for any other CMake project. It’s highly recommended to use -fno-omit-frame-pointer with sanitizers.

There are known issues that may prevent you from using sanitizers with pika. Under the tools subdirectory of the pika repository you can find the most recent suppression files that are used for CI runs with sanitizers. Similarly, under .github/workflows you can find the most recent build configurations (including sanitizer options) that work with sanitizers, along with blacklists of tests that are currently known to fail with sanitizers.

Using custom allocators with pika¶

Typical use of pika can often lead to many small allocations from many different threads, potentially leading to suboptimal performance with the system allocator. By default, pika uses mimalloc as the memory allocator because it usually performs significantly better than the system allocator. In some cases, the system allocator or other custom allocators might perform better.

Setting the following environment variables usually further improves performance with mimalloc:

MIMALLOC_EAGER_COMMIT_DELAY=0
MIMALLOC_ALLOW_LARGE_OS_PAGES=1

On some systems we have observed mimalloc performing worse with the above options than with its defaults, as well as performing worse than the system allocator. Always benchmark to find the most suitable allocator for your workload and system.

To ease testing of different allocators, you may also configure pika with the system allocator and instead use LD_PRELOAD to replace the default allocator at runtime. This allows you to choose the allocator without rebuilding pika. To do so, export the LD_PRELOAD environment variable to point to the shared library of the allocator. For example, to use jemalloc, set LD_PRELOAD to the full path of libjemalloc.so:

export LD_PRELOAD=/path/to/libjemalloc.so
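To compare allocators, a small allocation-heavy workload can be timed once per allocator, for example once without LD_PRELOAD and once with it. The following sketch is a hypothetical micro-benchmark, not part of pika. Each thread performs many small allocations and then frees them. This only roughly resembles the allocation pattern of a task-based application, so measurements of the actual application remain the deciding factor.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical micro-benchmark: `num_threads` threads each allocate
// `allocs_per_thread` small blocks and then free them.
// Returns the elapsed wall-clock time in seconds.
double small_allocation_benchmark(unsigned num_threads,
    std::size_t allocs_per_thread)
{
    assert(num_threads > 0);
    auto const start = std::chrono::steady_clock::now();

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t)
    {
        threads.emplace_back([allocs_per_thread] {
            std::vector<std::unique_ptr<char[]>> blocks;
            blocks.reserve(allocs_per_thread);
            for (std::size_t i = 0; i < allocs_per_thread; ++i)
            {
                // Sizes between 16 and 256 bytes. Keeping the pointers alive
                // makes it unlikely that the compiler elides the allocations.
                blocks.emplace_back(new char[16 + (i % 16) * 16]);
            }
            // All blocks are freed when `blocks` goes out of scope.
        });
    }
    for (auto& th : threads) th.join();

    std::chrono::duration<double> const elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Call this function from a small main program, then run that program with and without LD_PRELOAD set, for example pointing to libjemalloc.so. This gives a first, rough comparison without rebuilding anything.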

Relation to std::execution and stdexec¶

When pika was first created as a fork of HPX in 2022, stdexec was in its infancy. Because of this, pika contains an implementation of a subset of the earlier revisions of P2300. The main differences from stdexec and the proposed facilities are:

The pika implementation uses C++17 and thus does not make use of concepts or coroutines. This allows compatibility with slightly older compiler versions and e.g. nvcc.
The pika implementation uses value_types, error_types, and sends_done instead of completion_signatures in sender types, as in the first 3 revisions of P2300.
pika::this_thread::experimental::sync_wait differs from std::this_thread::sync_wait in that the former expects the sender to send a single value which is returned directly by sync_wait. If no value is sent by the sender, sync_wait returns void. Errors in set_error are thrown and set_stopped is not supported.
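The difference in return types can be illustrated without either library. In P2300, std::this_thread::sync_wait returns a std::optional of a std::tuple of the sent values, and an empty optional means the operation was stopped. The hypothetical helper below converts such a result into what pika's sync_wait returns directly for a single-value sender. pika does not support set_stopped, so this sketch simply throws when the optional is empty. That choice is an assumption of the sketch, not pika's behaviour.

```cpp
#include <optional>
#include <stdexcept>
#include <tuple>
#include <utility>

// Hypothetical helper: turn a P2300-style sync_wait result for a sender that
// sends exactly one value into the bare value, as pika's sync_wait returns it.
template <typename T>
T unwrap_single_value(std::optional<std::tuple<T>>&& result)
{
    if (!result)
    {
        // P2300 reports set_stopped as an empty optional; this sketch throws.
        throw std::runtime_error("operation was stopped");
    }
    return std::get<0>(std::move(*result));
}
```

For example, unwrap_single_value(std::optional{std::tuple{42}}) returns the int 42.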

pika has an experimental CMake option PIKA_WITH_STDEXEC which can be enabled to use stdexec for the P2300 facilities. pika brings the stdexec namespace into pika::execution::experimental, but provides otherwise no guarantees of interchangeable functionality. pika only implements a subset of the proposed sender algorithms which is why we recommend that you enable PIKA_WITH_STDEXEC whenever possible. We plan to deprecate and remove the P2300 implementation in pika in favour of stdexec and/or standard library implementations.

More resources¶

The C++ standard is the source of truth for std::execution. The P2300 proposal also contains both the wording for the majority of std::execution functionality as well as the motivation for it. The reference implementation of P2300, stdexec, maintains a list of presentations, blog posts etc. about the std::execution model. In addition to the above, other implementations of the std::execution model exist, with useful documentation and examples:

Even though the implementations differ, the concepts are transferable between implementations and useful for learning. cppreference.com also contains early documentation about std::execution.

pika has been presented at the following events and slides of the presentations are public:

CERN Computing seminar in 2022: introduction to pika and DLA-Future (slides)
The SOS-25 workshop in 2023: an overview of use of std::execution at the Swiss National Supercomputing Centre, covering uses of pika and HPX in DLA-Future, Octo-Tiger, and Kokkos (slides)
