Usage

Getting started

The recommended way to install pika is through spack:

spack install pika

See

spack info pika

for available options.

pika is currently available in the following repositories.

Packaging status

Manual installation

If you’d like to build pika manually you will need CMake 3.22.0 or greater and a recent C++ compiler supporting C++17:

Additionally, pika depends on:

pika optionally depends on:

  • gperftools/tcmalloc, jemalloc, or mimalloc. It is highly recommended to use one of these allocators as they perform significantly better than the system allocators. You can set the allocator through the CMake variable PIKA_WITH_MALLOC. If you want to use the system allocator (e.g. for debugging) you can do so by setting PIKA_WITH_MALLOC=system.

  • CUDA 11.0 or greater. CUDA support can be enabled with PIKA_WITH_CUDA=ON. pika can also be built with nvc++ from the NVIDIA HPC SDK. In the latter case, set CMAKE_CXX_COMPILER to nvc++.

  • HIP 5.2.0 or greater. HIP support can be enabled with PIKA_WITH_HIP=ON.

  • whip when CUDA or HIP support is enabled.

  • MPI. MPI support can be enabled with PIKA_WITH_MPI=ON.

  • Boost.Context on macOS or exotic platforms which are not supported by the default user-level thread implementations in pika. This can be enabled with PIKA_WITH_BOOST_CONTEXT=ON.

  • stdexec. stdexec support can be enabled with PIKA_WITH_STDEXEC=ON (currently tested with the tag nvhpc-24.09). The integration is experimental. See Relation to std::execution and stdexec for more information about the integration.
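
As a rough sketch of how these options combine, a manual configure step might look like the following. The source and build directories are hypothetical placeholders, the selected options are only an illustration, and CMAKE_BUILD_TYPE is a standard CMake variable rather than something specific to pika. The configure and build steps only run if CMake and a pika checkout are actually found.

```shell
# Hypothetical locations; adjust to your checkout and preferred build directory.
PIKA_SOURCE_DIR="${PIKA_SOURCE_DIR:-./pika}"
PIKA_BUILD_DIR="${PIKA_BUILD_DIR:-./pika-build}"

# Collect the configure options as positional parameters so that they can be
# printed and passed to cmake unchanged.
set -- \
    -S "$PIKA_SOURCE_DIR" \
    -B "$PIKA_BUILD_DIR" \
    -DCMAKE_BUILD_TYPE=Release \
    -DPIKA_WITH_MALLOC=system \
    -DPIKA_WITH_MPI=ON

configure_cmd="cmake $*"
echo "$configure_cmd"

# Only configure and build when CMake and the checkout are actually present.
if command -v cmake >/dev/null 2>&1 && [ -f "$PIKA_SOURCE_DIR/CMakeLists.txt" ]; then
    cmake "$@" && cmake --build "$PIKA_BUILD_DIR"
fi
```

The example uses the system allocator only to avoid an extra dependency; as noted above, one of the dedicated allocators is usually the better choice outside of debugging.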

If you are using nix you can also use the shell.nix file provided at the root of the repository to quickly enter a development environment:

nix-shell <pika-root>/shell.nix

The nixpkgs version is not pinned and may break occasionally.

Including in CMake projects

Once installed, pika can be used in a CMake project through the pika::pika target:

find_package(pika REQUIRED)
add_executable(app main.cpp)
target_link_libraries(app PRIVATE pika::pika)

Other ways of depending on pika are likely to work but are not officially supported.
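
If pika is installed in a non-default location, CMake needs to be told where to look for it. A minimal sketch, assuming a hypothetical installation prefix and using the standard CMAKE_PREFIX_PATH variable of CMake (not something specific to pika):

```shell
# Hypothetical installation prefix of pika; adjust to your system.
PIKA_INSTALL_PREFIX="${PIKA_INSTALL_PREFIX:-/opt/pika}"

# CMAKE_PREFIX_PATH is a standard CMake variable searched by find_package.
configure_cmd="cmake -S . -B build -DCMAKE_PREFIX_PATH=$PIKA_INSTALL_PREFIX"
echo "$configure_cmd"

# Only configure and build when CMake, a project, and the prefix are present.
if command -v cmake >/dev/null 2>&1 && [ -f CMakeLists.txt ] && [ -d "$PIKA_INSTALL_PREFIX" ]; then
    cmake -S . -B build -DCMAKE_PREFIX_PATH="$PIKA_INSTALL_PREFIX" && cmake --build build
fi
```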

Customizing the pika installation

The most important CMake options are listed in Getting started. Below is a more complete list of CMake options you can use to customize the installation.

  • PIKA_WITH_MALLOC: This defaults to mimalloc which requires mimalloc to be installed. Can be set to tcmalloc, jemalloc, mimalloc, or system. Setting it to system can be useful in debug builds.

  • PIKA_WITH_CUDA: Enable CUDA support.

  • PIKA_WITH_HIP: Enable HIP support.

  • PIKA_WITH_MPI: Enable MPI support.

  • PIKA_WITH_STDEXEC: Enable stdexec support.

  • PIKA_WITH_APEX: Enable APEX support.

  • PIKA_WITH_TRACY: Enable Tracy support.

  • PIKA_WITH_BOOST_CONTEXT: Use Boost.Context for user-level thread context switching.

  • PIKA_WITH_TESTS: Enable tests. Tests can be built with cmake --build . --target tests and run with ctest --output-on-failure.

  • PIKA_WITH_EXAMPLES: Enable examples. Binaries will be placed under bin in the build directory.

Testing

Tests and examples are disabled by default and can be enabled with PIKA_WITH_TESTS, PIKA_WITH_TESTS_{BENCHMARKS,REGRESSIONS,UNIT}, and PIKA_WITH_EXAMPLES. The tests must be explicitly built before running them, e.g. with cmake --build . --target tests && ctest --output-on-failure.

Controlling the number of threads and thread bindings

The thread pool created by the pika runtime will by default be created with a number of threads equal to the number of cores on the system. The number of threads can explicitly be controlled by a few environment variables or command line options. The most straightforward ways of changing the number of threads are with the environment variable PIKA_THREADS or the --pika:threads command line option. Both take an explicit number of threads. They also support the special values cores (the default, use one thread per core) or all (use one thread per hyperthread).

Note

Command line options always take precedence over environment variables.
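
A short sketch of both mechanisms, assuming a hypothetical pika application ./app (replace it with your own binary). The application is only started if it exists.

```shell
# Hypothetical pika application; replace with the path to your own binary.
APP="${APP:-./app}"

# Use 4 worker threads for every pika application started from this shell.
export PIKA_THREADS=4

if [ -x "$APP" ]; then
    # Uses PIKA_THREADS, i.e. 4 worker threads.
    "$APP"
    # The command line option takes precedence: one thread per hyperthread.
    "$APP" --pika:threads=all
fi
```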

Process masks

Many batch systems and e.g. MPI can set a process mask on the application to restrict which cores the application can run on. By default, pika takes this process mask into account when determining how many threads to use for the runtime. hwloc-bind can also be used to manually set a process mask on the application. When a process mask is set, the default behaviour is to use only one thread per core in the process mask. Setting the number of threads to a number higher than the number of cores available in the mask is not allowed. Using all as the number of threads will use all the hyperthreads in the process mask.

The process mask can explicitly be ignored with the environment variable PIKA_IGNORE_PROCESS_MASK=1 or the command line option --pika:ignore-process-mask. When the process mask is ignored, pika behaves as if no process mask is set and all cores or hyperthreads can be used by the runtime. A process mask set on the process can also be overridden explicitly with the environment variable PIKA_PROCESS_MASK or the command line option --pika:process-mask. Both take an explicit hexadecimal string (beginning with 0x) representing the process mask to use. --pika:print-bind can be used to verify that the bindings used by pika are correct. Exporting the environment variable PIKA_PRINT_BIND (any value) is equivalent to using the --pika:print-bind option.
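
To make the mask format concrete, the following sketch builds a hexadecimal mask by hand. It assumes that bit i of the mask corresponds to processing unit (hardware thread) i as numbered by the operating system, as in the taskset format. mask_from_cpus is a hypothetical helper for illustration only and is limited to indices below 63.

```shell
# Print a hexadecimal process mask with one bit set per given index.
# Hypothetical helper; limited to indices below 63 because it relies on the
# shell's signed 64-bit arithmetic.
mask_from_cpus() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
}

mask_from_cpus 0 1 2 3    # prints 0xf
mask_from_cpus 4 5 6 7    # prints 0xf0

# Restrict pika to the first four processing units.
export PIKA_PROCESS_MASK="$(mask_from_cpus 0 1 2 3)"
```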

Note

If you find yourself in a situation where you need to explicitly generate a process mask, we recommend the use of hwloc-calc. hwloc-calc produces the format expected by pika with the --taskset command line option. The man page of hwloc-calc contains useful examples of generating different process masks.

In addition to hwloc-calc, hwloc-distrib (man page) can be useful if you need to generate multiple process masks that e.g. don’t overlap.

pika binds (or pins) worker threads to cores by default (except on macOS where thread binding is not supported) to avoid threads being scheduled on different cores, generally improving performance. Thread binding can be disabled by setting the environment variable PIKA_BIND or the command line option --pika:bind to the value none. Threads will in this case not be bound to any particular core and are free to migrate between cores. This is not recommended for most use cases, but can be beneficial e.g. if the system is oversubscribed and threads from different processes would otherwise be competing for time on the same core. The default value for the binding option is balanced, which will bind threads to cores in a “balanced” way, placing threads on consecutive cores, avoiding the use of hyperthreads (if available). A value of compact will fill all hyperthreads on a core with worker threads before filling the next core.
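
The binding options can be combined with the thread count options. A small sketch, again assuming a hypothetical application ./app that is only started if it exists:

```shell
# Hypothetical pika application; replace with the path to your own binary.
APP="${APP:-./app}"

# Fill all hyperthreads of a core before moving on to the next core, and use
# one worker thread per hyperthread.
export PIKA_BIND=compact
export PIKA_THREADS=all
# Any value enables printing of the bindings.
export PIKA_PRINT_BIND=1

if [ -x "$APP" ]; then
    "$APP"
    # Disable binding for a single run; the command line option overrides PIKA_BIND.
    "$APP" --pika:bind=none
fi
```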

Note

The PIKA_THREADS, PIKA_IGNORE_PROCESS_MASK, and PIKA_BIND environment variables were added in 0.32.0.

Interaction with OpenMP

When pika is used together with OpenMP, extra care may be needed to ensure pika uses the correct process mask. This is because with OpenMP the main thread participates in parallel regions, and if OpenMP binds threads to cores, the main thread may have a mask set to a single core before pika can read the mask. Typically, OpenMP will bind threads to cores if the OMP_PROC_BIND or OMP_PLACES environment variables are set. Some implementations of OpenMP (e.g. LLVM) set the binding of the main thread only at the first parallel region, which means that if pika is initialized before the first parallel region, the mask will most likely be read correctly. Other implementations (e.g. GNU) set the binding of the main thread in global constructors, which may run before pika can read the process mask. In that case you may need to either use PIKA_IGNORE_PROCESS_MASK/--pika:ignore-process-mask to use all cores on the system or explicitly set a mask with --pika:process-mask. If a process mask is already set in the environment that launches the application (e.g. in a SLURM job), you can read the mask before the application runs with hwloc (see pika-bind helper script for a more convenient option):

./app --pika:process-mask=$(hwloc-bind --get --taskset)

pika-bind helper script

Since version 0.20.0, the pika-bind helper script is bundled with pika. pika-bind sets the PIKA_PROCESS_MASK environment variable based on process mask information found before the pika runtime is started, and then runs the given command. pika-bind is a more convenient alternative to manually setting PIKA_PROCESS_MASK when pika is used together with a runtime that may reset the process mask of the main thread, like OpenMP.
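
Conceptually, such a wrapper captures the process mask before the application starts and passes it on through PIKA_PROCESS_MASK. The following is only a sketch of that idea and not the pika-bind script itself. run_with_inherited_mask is a hypothetical name, and the mask is only captured when hwloc-bind is available.

```shell
# Run a command with PIKA_PROCESS_MASK set to the process mask of the calling
# shell. Hypothetical helper; it falls back to running the command unchanged
# when hwloc-bind is not installed or reports nothing.
run_with_inherited_mask() {
    inherited_mask=""
    if command -v hwloc-bind >/dev/null 2>&1; then
        inherited_mask="$(hwloc-bind --get --taskset 2>/dev/null)" || inherited_mask=""
    fi
    if [ -n "$inherited_mask" ]; then
        # Set the variable for this command only.
        PIKA_PROCESS_MASK="$inherited_mask" "$@"
    else
        "$@"
    fi
}

# Example usage with a hypothetical application:
#   run_with_inherited_mask ./app
```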

Command line options

pika’s behaviour can be controlled with command line options or environment variables. Not all command line options are exposed as environment variables. When both are present, command line options take precedence over environment variables.

If a command line option is not exposed as an environment variable, but it is necessary to set it, it is possible to use the PIKA_COMMANDLINE_OPTIONS environment variable.

For example, the following disables thread binding and explicitly sets the number of threads:

export PIKA_COMMANDLINE_OPTIONS="--pika:bind=none --pika:threads=4"

Logging

The pika runtime uses spdlog for logging. Warnings and more severe messages are logged by default. To change the logging level, set the PIKA_LOG_LEVEL environment variable to a value between 0 (trace) and 6 (off) (the values correspond to levels in spdlog). The log messages are sent to stderr by default. The destination can be changed by setting the PIKA_LOG_DESTINATION environment variable. Supported values are:

  • cerr

  • cout

  • any other value is interpreted as a path to a file
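
As an illustration, the following sketch enables debug logging to a file. The mapping from names to numbers follows the spdlog level enumeration. pika_log_level is a hypothetical helper and the log file location is an arbitrary choice.

```shell
# Map a severity name to the numeric value used by spdlog (0 = trace, 6 = off).
# pika_log_level is a hypothetical helper for illustration.
pika_log_level() {
    case "$1" in
        trace)    echo 0 ;;
        debug)    echo 1 ;;
        info)     echo 2 ;;
        warn)     echo 3 ;;
        err)      echo 4 ;;
        critical) echo 5 ;;
        off)      echo 6 ;;
        *)        echo "unknown log level: $1" >&2; return 1 ;;
    esac
}

# Log debug messages and above to a file instead of stderr.
export PIKA_LOG_LEVEL="$(pika_log_level debug)"
export PIKA_LOG_DESTINATION="${TMPDIR:-/tmp}/pika.log"
```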

pika will by default print messages in the following format:

[2024-04-18 13:45:07.095279283] [pika] [info] [host:machine/----] [pid:2786603] [tid:2786607] [pool:0000/0003/0003] [parent:----/----] [task:0x7fa6a4077cf0/pika_main] [set_thread_state.cpp:205:set_thread_state] set_thread_state: thread(0x7fa6a802c8d0), description(<unknown>), new state(pending), old state(suspended)

The fields are as follows:

  • [2024-04-18 13:45:07.095279283]: The timestamp of the message.

  • [pika]: An identifier present in all pika’s logs.

  • [info]: The severity level of the message.

  • [host:machine/----]: The hostname and the MPI rank of the process (---- if MPI is disabled).

  • [pid:2786603]: The process id as reported by the operating system.

  • [tid:2786607]: The thread id as reported by the operating system.

  • [pool:0000/0003/0003]: The pika thread pool and worker thread ids: the first component is the thread pool id, the second is the global worker thread id (unique across all thread pools), and the third is the local worker thread id (unique only within the current thread pool).

  • [parent:----/----]: The id and description of the parent task that spawned the current task.

  • [task:0x7fa6a4077cf0/pika_main]: The id and description of the current task.

  • [set_thread_state.cpp:205:set_thread_state]: The file, line number, and function where the message was logged.

  • The logged message is printed last.

The pool field is [pool:----/----/----] when a message is logged from a thread that does not belong to the pika runtime. The main thread will only have the global thread id set, e.g. [pool:----/0004/----].

Task ids and descriptions are logged as ----/---- when there is no current or parent task. Task descriptions are only printed when enabled with APEX or Tracy support, or with the CMake option PIKA_WITH_THREAD_DEBUG_INFO.
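
Because every field is tagged with its name, individual fields are easy to pull out of a log with standard tools. A small sketch using sed on the sample line from above, where log_field is a hypothetical helper that expects a plain field name such as pool or task:

```shell
# Sample line in pika's default format (the example message shown earlier).
line='[2024-04-18 13:45:07.095279283] [pika] [info] [host:machine/----] [pid:2786603] [tid:2786607] [pool:0000/0003/0003] [parent:----/----] [task:0x7fa6a4077cf0/pika_main] [set_thread_state.cpp:205:set_thread_state] set_thread_state: thread(0x7fa6a802c8d0), description(<unknown>), new state(pending), old state(suspended)'

# Print the content of the field with the given name.
# Hypothetical helper; the field name must not contain special characters.
log_field() {
    printf '%s\n' "$2" | sed -n "s/.*\[$1:\([^]]*\)].*/\1/p"
}

log_field pool "$line"    # prints 0000/0003/0003
log_field task "$line"    # prints 0x7fa6a4077cf0/pika_main
```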

The log message format can be changed by setting the environment variable PIKA_LOG_FORMAT to a format string supported by spdlog. The custom fields defined by pika can be accessed with the following:

  • %j: The hostname and MPI rank.

  • %w: The thread pool and worker thread ids.

  • %q: The parent task id and description.

  • %k: The current task id and description.
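
For example, a shorter format could be set as follows. This is only a sketch: %H, %M, %S, %e (milliseconds), %l (level), and %v (message) are spdlog pattern flags, while %w and %k are the pika-specific fields listed above. The exact layout is an arbitrary choice.

```shell
# Shorter log lines: time with milliseconds, severity, worker thread ids,
# current task, and the message itself.
export PIKA_LOG_FORMAT='[%H:%M:%S.%e] [%l] [%w] [%k] %v'
```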

Using custom allocators with pika

Typical use of pika can often lead to many small allocations from many different threads, potentially leading to suboptimal performance with the system allocator. By default, pika uses mimalloc as the memory allocator because it usually performs significantly better than the system allocator. In some cases, the system allocator or other custom allocators might perform better.

Setting the following environment variables usually further improves performance with mimalloc:

  • MIMALLOC_EAGER_COMMIT_DELAY=0

  • MIMALLOC_ALLOW_LARGE_OS_PAGES=1

On some systems we have observed mimalloc performing worse with the above options than with its default settings, and in some cases worse than the system allocator. Always benchmark to find the most suitable allocator for your workload and system.

To ease testing of different allocators, you may also configure pika with the system allocator and instead use LD_PRELOAD to replace the default allocator at runtime. This allows you to choose the allocator without rebuilding pika. To do so, export the LD_PRELOAD environment variable to point to the shared library of the allocator. For example, to use jemalloc, set LD_PRELOAD to the full path of libjemalloc.so:

export LD_PRELOAD=/path/to/libjemalloc.so
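
When comparing several allocators, it can be convenient to set LD_PRELOAD for a single command rather than exporting it for the whole shell. A minimal sketch, assuming pika was configured with the system allocator. run_with_allocator is a hypothetical helper, and the library paths and ./app are placeholders.

```shell
# Run a command with the given allocator library preloaded for that command
# only. Hypothetical helper; skips the run if the library does not exist.
run_with_allocator() {
    allocator_lib="$1"
    shift
    if [ ! -f "$allocator_lib" ]; then
        echo "skipping $allocator_lib: not found" >&2
        return 1
    fi
    LD_PRELOAD="$allocator_lib" "$@"
}

# Placeholder paths and application; adjust to your system.
for lib in /path/to/libjemalloc.so /path/to/libmimalloc.so; do
    run_with_allocator "$lib" ./app || true
done
```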

Relation to std::execution and stdexec

When pika was first created as a fork of HPX in 2022, stdexec was in its infancy. Because of this, pika contains an implementation of a subset of the earlier revisions of P2300. The main differences to stdexec and the proposed facilities are:

  • The pika implementation uses C++17 and thus does not make use of concepts or coroutines. This allows compatibility with slightly older compiler versions and e.g. nvcc.

  • The pika implementation uses value_types, error_types, and sends_done instead of completion_signatures in sender types, as in the first 3 revisions of P2300.

  • pika::this_thread::experimental::sync_wait differs from std::this_thread::sync_wait in that the former expects the sender to send a single value which is returned directly by sync_wait. If no value is sent by the sender, sync_wait returns void. Errors in set_error are thrown and set_stopped is not supported.

pika has an experimental CMake option PIKA_WITH_STDEXEC which can be enabled to use stdexec for the P2300 facilities. pika brings the stdexec namespace into pika::execution::experimental, but otherwise provides no guarantees of interchangeable functionality. pika only implements a subset of the proposed sender algorithms, which is why we recommend that you enable PIKA_WITH_STDEXEC whenever possible. We plan to deprecate and remove the P2300 implementation in pika in favour of stdexec and/or standard library implementations.

More resources

The C++ standard is the source of truth for std::execution. The P2300 proposal contains both the wording for the majority of std::execution functionality and the motivation for it. The reference implementation of P2300, stdexec, maintains a list of presentations, blog posts etc. about the std::execution model. In addition to the above, other implementations of the std::execution model exist, with useful documentation and examples:

Even though the implementations differ, the concepts are transferable between implementations and useful for learning. cppreference.com also contains early documentation about std::execution.

pika has been presented at the following events and slides of the presentations are public: