Changelog

0.29.0 (2024-10-04)

New features

  • The CUDA/HIP pool now uses fewer cuBLAS/cuSOLVER and rocBLAS/rocSOLVER handles, and the number of handles is user-controllable. This can significantly reduce GPU memory usage, especially on high core count systems. (#1234)

  • The sender adaptor transfer has been renamed to continues_on, with a deprecation warning, to match the latest P2300 proposal. transfer_just and transfer_when_all have been deprecated. (#1235)

Breaking changes

Bugfixes

  • An initialization issue when setting an explicit process mask on a process that has a limited number of cores available due to cgroups restrictions has been fixed. (#1251)

0.28.0 (2024-09-04)

New features

Breaking changes

  • When using stdexec, pika now requires and is tested with stdexec commit 8bc7c7f06fe39831dea6852407ebe7f6be8fa9fd. (#1205)

  • The Boost dependency is now found using CMake's config mode instead of CMake's deprecated Boost module. This requires Boost to be installed with a BoostConfig.cmake file. (#1229)

Bugfixes

  • NUMA nodes and sockets are now correctly handled on Grace-Hopper systems. This allows use of hwloc 2.11 and newer on such systems. (#1224)

0.27.0 (2024-08-15)

New features

  • sync_wait now internally uses a binary_semaphore instead of a condition_variable for synchronization and may be slightly faster due to removing one lock. (#1206)

Breaking changes

  • All channel implementations have been removed. (#1209)

Bugfixes

  • Compilation of copyable senders together with when_all_vector has been fixed. (#1200)

  • The pika/execution.hpp header has been fixed to include headers that should have been included by it. (#1206)

0.26.1 (2024-07-31)

New features

Breaking changes

Bugfixes

  • The PIKA_MPI_COMPLETION_MODE environment variable not being taken into account has been fixed. (#1215)

0.26.0 (2024-07-08)

New features

  • Handling of receivers has been slightly optimized internally to avoid allocations. (#1126, #1139, #1192)

  • The MPI integration now has experimental support for the MPI continuations proposal. This requires an experimental build of OpenMPI. (#1128)

  • pika can now be compiled as a static library by disabling the CMake option BUILD_SHARED_LIBS. (#1179)

Breaking changes

  • The shared_mutex utility has been removed. (#1155)

  • The prefix module has been cleaned up with many internal functionalities being moved to a detail namespace. (#1177)

  • The depleted thread state has been removed as it is no longer used. (#1184)

Bugfixes

  • The hostname output in logs has been fixed. (#1127)

  • The internal append_t type pack helper has been fixed to append rather than prepend. This may affect the order of completion signatures for senders. However, there are still no guarantees on the order of completion signatures. (#1137)

  • Fix potential use-after-free in MPI integration. (#1151)

  • Undefined behaviour on FreeBSD in the get_executable_prefix helper has been fixed. (#1171)

  • Compilation with PIKA_WITH_STACKTRACES=OFF has been fixed. (#1178, #1196)
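
To make the append_t fix above concrete, the following is a minimal sketch of a type pack helper that appends rather than prepends. The names pack, append and append_t are hypothetical stand-ins written for this illustration and are not pika's internal definitions; the only point is that the new type ends up last.

```cpp
#include <type_traits>

// Hypothetical stand-in for a type pack; not pika's internal type.
template <typename... Ts>
struct pack
{
};

template <typename Pack, typename T>
struct append;

// Appending places T after the existing elements. A prepending helper
// would instead produce pack<T, Ts...>.
template <typename... Ts, typename T>
struct append<pack<Ts...>, T>
{
    using type = pack<Ts..., T>;
};

template <typename Pack, typename T>
using append_t = typename append<Pack, T>::type;

static_assert(
    std::is_same_v<append_t<pack<int, float>, char>, pack<int, float, char>>,
    "append_t adds the new type at the end of the pack");
```

Code that inspects the resulting pack by position would see a different order after such a fix, which is why relying on the order is discouraged.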

0.25.0 (2024-05-10)

New features

  • The MPI polling functionality has been significantly refactored and optimized. (#1102)

  • pika's senders and receivers now use the newer sender and receiver concepts when defined by stdexec. (#1105)

Breaking changes

  • The pika logging facilities have been refactored to use spdlog. The new behaviour is described in the pull request and documentation. (#1093)

  • The runtime module has been cleaned up with many functionalities being removed or moved to the detail namespace. (#826, #1091)

Bugfixes

  • Builds with the CMake option PIKA_WITH_THREAD_DEBUG_INFO enabled have been fixed. (#1101)

0.24.0 (2024-04-12)

New features

  • Avoid unnecessary copies of CUDA streams and handles to improve profiling appearance. (#1056)

  • Avoid use of std::invoke_result in tag_invoke_result variants to improve compilation times. (#1058, #1060)

Breaking changes

Bugfixes

  • Fix use of --pika:print-bind with --pika:bind=none. (#1082, #1087)

  • Work around compilation issue with CUDA 12.4. (#1084)

  • Make sure main is never defined in libpika.so. (#1088)

0.23.0 (2024-03-07)

New features

  • Further improved performance, particularly on ARM64 systems. (#1023, #1033, #1035, #1041)

  • Allow compilation on ARM systems with address sanitizer enabled. (#1045)

Breaking changes

Bugfixes

  • Allow the use of the require_started sender adaptor with unique_any_sender and any_sender. (#1044)

  • Fix a data race in CUDA/HIP event polling. (#1051)

0.22.2 (2024-02-09)

Bugfixes

  • Fix incorrect worker thread numbers when using more than one thread pool. (#1016)

  • Unrevert #872 with an indexing error fixed. (#1017)

0.22.1 (2024-01-29)

Bugfixes

  • Revert #872 as it was found to cause issues in applications. (#1009)

0.22.0 (2024-01-24)

New features

  • A new function pika::is_runtime_initialized has been added. (#808)

  • CUDA and HIP handles are now guaranteed to be released together with cuda_pool instead of on program exit. (#872)

  • Spinloop performance has been significantly improved on ARM64 systems. (#923, #927)

  • The pika::barrier now scales significantly better with the number of cores. (#940)

  • Exceptions thrown in the main entry point, e.g. pika_main, are now reported with the message of the exception, if available. (#959)
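
The exception reporting change above follows a common pattern, sketched below under stated assumptions. The helper run_and_report is hypothetical and is not pika's implementation; it only shows how a message can be reported when the exception derives from std::exception and why no message is available otherwise.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs an entry point and describes how it finished.
// This is not pika's implementation, only the general pattern.
std::string run_and_report(std::function<int()> const& entry)
{
    try
    {
        return "returned " + std::to_string(entry());
    }
    catch (std::exception const& e)
    {
        // The message is available through what().
        return std::string("exception: ") + e.what();
    }
    catch (...)
    {
        // Exceptions not derived from std::exception carry no message.
        return "unknown exception";
    }
}

// Usage: the message of a std::exception is included in the report.
inline void report_example()
{
    assert(run_and_report([]() -> int { throw std::runtime_error("boom"); }) ==
        "exception: boom");
}
```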

Breaking changes

Bugfixes

  • The CMake configuration now sets the policy CMP0144 to silence warnings about CMake package root directory variables. (#885)

  • The permissions on the installed pika-bind helper script have been fixed. (#915)

  • A missing include causing compilation failures with PIKA_WITH_UNITY_BUILD=OFF has been added. (#955)

  • A use-after-free has been fixed in when_all_vector. (#966)

  • A use-after-free has been fixed in sync_wait. (#976)

  • A use-after-free has been fixed in default_agent. (#979)

  • An initialization order issue has been fixed in debug printing facilities. (#983)

  • A potential cause for dangling references has been fixed in thread_pool_init_parameters. (#984)

  • A few data races have been fixed in the schedulers. (#985, #986)

  • Forwarding of callable values in execution::then has been fixed. (#994)

  • A data race in condition_variable::notify_all has been fixed. (#998)

0.21.0 (2023-12-06)

New features

  • A new sender adaptor require_started allows detecting unstarted senders. (#869)

  • The conversion from any_sender to unique_any_sender has been optimized, reusing the same storage. (#844)

  • The number of streams created by the cuda_pool is now proportional to the number of threads used by the runtime instead of hardware_concurrency. (#864)

Breaking changes

  • pika::start and pika::finalize now return void. Most runtime management functions no longer take an error_code and always throw an exception on failure. (#825)

Bugfixes

  • A lifetime bug in split has been fixed. (#839)

  • yield_while is now able to warn about potential deadlocks when suspension is disabled. (#856)

0.20.0 (2023-11-01)

New features

  • The MPI rank is now printed with --pika:print-bind and when handling exceptions, if MPI support is enabled. (#805, #822)

  • A warning message is now printed on macOS when using --pika:process-mask since thread bindings are unsupported. (#806)

  • Thread bindings can now be printed using the environment variable PIKA_PRINT_BIND in addition to the command line option --pika:print-bind. (#828)

  • The pika-bind helper script has been added to more conveniently set PIKA_PROCESS_MASK based on the environment. (#834)

Breaking changes

  • All remaining locality-related functions and files have been removed. (#823)

Bugfixes

  • Handling of explicitly specified process masks with --pika:process-mask or PIKA_PROCESS_MASK has been fixed. (#807)

0.19.1 (2023-10-09)

Bugfixes

  • Fix a bug in drop_operation_state when the predecessor sender is sending a tuple. (#801)

0.19.0 (2023-10-04)

New features

  • A transfer_when_all sender adaptor has been introduced. (#792)

  • A drop_operation_state sender adaptor has been introduced. (#772)

Breaking changes

  • The PIKA_WITH_DISABLED_SIGNAL_EXCEPTION_HANDLERS CMake option has been removed. This option can be controlled at runtime as before. (#773)

  • The PIKA_WITH_THREAD_GUARD_PAGE CMake option has been removed. This option can be controlled at runtime as before. (#782)

  • thread_priority::critical has been removed as it is an alias to high_recursive and is unused. (#783)

Bugfixes

  • Fix a few instances of the wrong type being forwarded in split_tuple and when_all sender adaptors. (#781)

  • Fix a hang introduced by the global activity count when using MPI polling. (#778)

  • Fix a use-after-free in ensure_started. (#795)

  • Fix lifetime bug in ensure_started when the sender is dropped without being connected. (#797)

0.18.0 (2023-09-06)

New features

  • A documentation site has been created on pikacpp.org. (#723)

  • A new command line option --pika:process-mask has been added to allow overriding the mask detected at startup. The process mask is also now read before the pika runtime is started to avoid problems with e.g. OpenMP resetting the mask before pika can read it. (#738, #739)

  • An overload of pika::start which takes no callable and is equivalent to passing nullptr_t or an empty callable as the entry point has been added. (#761)

Breaking changes

  • The any_receiver set_value_t overload now accepts types which may throw in their move and copy constructors. (#702)

  • The PIKA_WITH_GENERIC_CONTEXT_COROUTINES CMake option has been renamed to PIKA_WITH_BOOST_CONTEXT. (#729)

  • The then and unpack sender adaptors now correctly have noexcept get_env_t customizations. (#732)

  • mimalloc is now the default allocator. (#730)

  • The pika::this_thread::experimental::sync_wait receiver now correctly advertises itself as a receiver when using stdexec. (#735)

  • Various outdated and unused utilities and configuration options have been removed. (#744, #753, #758, #759)

Bugfixes

  • The small buffer optimization in any_sender and its companions has been disabled due to issues with the implementation. (#760)

0.17.0 (2023-08-02)

New features

  • Improve MPI polling: continuations are not triggered under lock anymore and can be explicitly transferred to a new task/pool, throttling is possible on a per stream basis, the number of completions to handle per poll iteration may be controlled. (#593)

  • Add pika::wait to wait for the runtime to be idle. (#704)

  • Failure information is now printed before attaching a debugger. (#712)

  • --pika:print-bind now also prints the thread pool of a thread when thread binding is disabled to be consistent with the output when thread binding is enabled. (#710)

  • Allow explicitly resetting {unique_,}any_sender. (#719)

  • Add execution::unpack sender adaptor to unpack tuple-like types sent by predecessor senders. (#721)
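
As a rough analogy for the unpack adaptor above, std::apply does for plain function calls what unpack does for values sent by a predecessor sender: a tuple-like value is expanded into separate arguments. The sketch below uses only the standard library; call_unpacked and add are hypothetical names for this illustration and no pika API is shown.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Hypothetical helper mirroring the idea behind unpack: a tuple-like value
// is expanded into separate arguments before f is called.
template <typename F, typename Tuple>
decltype(auto) call_unpacked(F&& f, Tuple&& t)
{
    return std::apply(std::forward<F>(f), std::forward<Tuple>(t));
}

inline int add(int a, int b)
{
    return a + b;
}

// Usage: the callable takes two ints, not a std::tuple<int, int>.
inline void unpack_example()
{
    assert(call_unpacked(add, std::make_tuple(1, 2)) == 3);
}
```

Without the expansion step the callable would have to accept the tuple itself and take it apart by hand.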

Bugfixes

  • Fix warnings when CMake unity build is disabled. (#697)

  • Fix bogus error when --pika:print-bind and --pika:bind=none are used together. (#710)

  • Fix memory leak with stack overflow detection enabled. (#714)

  • Fix freeing stack when guard pages are disabled. (#716)

0.16.0 (2023-05-31)

New features

  • pika can now be compiled with CUDA 12 and C++20 when PIKA_WITH_STDEXEC is disabled. (#684)

  • pika::barrier can now optionally do a timed blocking wait. The default behaviour is unchanged. (#685)

Breaking changes

  • pika::spinlock has been removed and replaced with a private implementation. (#672)

Bugfixes

  • Compilation with the CMake option PIKA_WITH_VERIFY_LOCKS_BACKTRACE has been fixed. (#680)

  • Compilation with fmt 10 and CUDA/HIP has been fixed. (#691)

0.15.1 (2023-05-12)

Bugfixes

  • Eagerly reset shared state in async_rw_mutex. This prevents deadlocks in certain use cases of async_rw_mutex. (#677)

  • Use pika::spinlock instead of pika::mutex in async_rw_mutex. This allows use of async_rw_mutex from non-pika threads. (#679)

0.15.0 (2023-05-03)

New features

  • async_rw_mutex has been moved to a public header: pika/async_rw_mutex.hpp. The functionality is still experimental in the pika::execution::experimental namespace. async_rw_mutex_access_type and async_rw_mutex_access_wrapper have also been moved out of the detail namespace. (#655)

Breaking changes

  • The any_sender and unique_any_sender operator bool(), which can be used to check whether the sender contains a valid sender, is now explicit to avoid accidental conversions. (#653)

  • Scheduler idling was disabled by default. This typically improves performance. If resource usage is more important than performance, it may be beneficial to enable idling explicitly. (#661)

  • The CMake option PIKA_WITH_THREAD_CUMULATIVE_COUNTS was disabled by default. This often improves performance. (#662)

  • Thread guard pages were disabled by default. This often improves performance. They can still be enabled at runtime with the configuration option pika.stacks.use_guard_pages=1 to debug e.g. stack overflows. (#663)

  • The fast_idle and delay_exit scheduler modes were completely removed as they added overhead and were not used in any meaningful way in the scheduler. (#664)

  • The ability to run background threads in the scheduler was completely removed. (#665, #668)
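
The effect of making operator bool() explicit, as in the any_sender change above, can be sketched with a small stand-in type. The type maybe_empty and the function holds_value are hypothetical and are not pika's any_sender; the sketch assumes only standard C++17.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a type-erased sender; not pika's any_sender.
struct maybe_empty
{
    bool valid = false;

    // explicit: usable in conditions, but no silent conversion to bool or int.
    explicit operator bool() const noexcept
    {
        return valid;
    }
};

// Contextual conversion in an if statement still works.
inline bool holds_value(maybe_empty const& s)
{
    if (s) { return true; }
    return false;
}

// Implicit conversion is rejected, explicit construction is allowed.
static_assert(!std::is_convertible_v<maybe_empty, bool>,
    "no implicit conversion to bool");
static_assert(std::is_constructible_v<bool, maybe_empty>,
    "explicit conversion to bool is still possible");

inline void explicit_bool_example()
{
    assert(!holds_value(maybe_empty{}));
}
```

With a non-explicit operator, expressions such as int n = s; or s == 1 would compile and silently compare a bool, which is the kind of accidental conversion the change rules out.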

Bugfixes

  • Fixed an inconsistent preprocessor guard that affected Apple M1 and M2 systems. (#657)

  • Fixed preprocessor guards to enable deadlock detection in debug builds. The deadlock detection was never enabled previously. (#658)

  • Thread deadlock detection will now correctly print potentially deadlocked threads. (#659)

0.14.0 (2023-04-05)

New features

  • pika can now be compiled with NVHPC. The support is experimental. (#606)

  • CUDA polling was improved. Among other changes polling continuations are no longer called under a lock. (#609)

  • Improved the error message when pika is configured with multiple thread pools but there are not enough resources for all thread pools. (#619)

Breaking changes

  • Cleaned up modules and moved internal functionality into detail namespaces. (#625, #631, #632, #633, #634)

  • Renamed the CMake option PIKA_WITH_P2300_REFERENCE_IMPLEMENTATION to PIKA_WITH_STDEXEC to better reflect what it does. (#641)

Bugfixes

0.13.0 (2023-03-08)

New features

  • Add better compile-time error messages to diagnose one-shot senders being used as multi-shot senders. (#586)

Breaking changes

  • Remove the PIKA_WITH_BACKGROUND_THREAD_COUNTERS CMake option. These counters are no longer available. (#588)

  • Update required stdexec commit. pika is now tested with 6510f5bd69cc03b24668f26eda3dd3cca7e81bb2. (#597)

  • Cleaned up modules and moved minor functionality into detail namespaces. (#594, #595, #596, #599, #607)

Bugfixes

  • Initialize HIP early to avoid concurrent initialization. (#591)

0.12.0 (2023-02-01)

New features

  • Make read-only access senders of async_rw_mutex connectable with l-value references. (#548)

  • Add split_tuple sender adaptor which allows transforming a sender of tuples into a tuple of senders. (#549)

  • Add bool conversion operator and empty member function to unique_any_sender and any_sender. (#551)

Breaking changes

  • Remove the conversion operators from wrapper types in async_rw_mutex. Wrappers must explicitly be unwrapped using get. (#548)

  • Require whip 0.1.0. (#565)

Bugfixes

  • Make the ensure_started sender noncopyable. (#539)

  • Fix compilation failure on macOS with C++20 enabled. (#541)

  • Fix deadlocks in certain use cases of async_rw_mutex. (#548)

  • Fix certain use cases of any_sender and when_all. (#555)

0.11.0 (2022-12-07)

New features

Breaking changes

  • All parallel algorithms have been moved to a new repository that depends on pika: https://github.com/pika-org/pika-algorithms. (#505)

  • fmt is now a required dependency. (#487)

  • The default allocator has been changed from tcmalloc to mimalloc. (#501)

  • Cleaned up various modules and moved minor functionality into detail namespaces. (#483, #508, #509)

Bugfixes

0.10.0 (2022-11-02)

New features

Breaking changes

  • More functionality in the algorithms module has been moved into detail namespaces. (#475)

Bugfixes

  • Many sender adaptors have been updated to correctly handle reference types. (#472, #484, #492)

  • then_with_stream now correctly stores the values sent by the predecessor sender for the duration of the CUDA operation launched by it. (#485)

0.9.0 (2022-10-05)

New features

  • Signal handlers are now optional. They can be enabled with --pika:install_signal_handlers=1 and are enabled by default when --pika:attach-debugger=exception is set. (#458)

Breaking changes

  • The P2300 reference implementation is now found through a find_package instead of a fetch_content in CMake and is required when PIKA_WITH_P2300_REFERENCE_IMPLEMENTATION is ON. (#436)

  • whip is now a dependency to replace the GPU abstraction layer we previously used. (#423)

  • Use rocBLAS directly instead of hipBLAS. (#391)

  • Move more internal functionality into the detail namespace. (#445, #446, #449, #461, #462)

Bugfixes

  • Add set_stopped_t() to (unique_)any_sender completion signatures. (#464)

  • Fix compilation on Arm64 with PIKA_WITH_GENERIC_CONTEXT_COROUTINES=OFF. (#439)

  • Add a missing default entry for pika.diagnostics_on_terminate. (#458)

0.8.0 (2022-09-07)

New features

  • The PIKA_WITH_P2300_REFERENCE_IMPLEMENTATION option can now be enabled together with PIKA_WITH_CUDA (with clang as the device compiler) and PIKA_WITH_HIP. (#330)

  • CMake options related to tests and examples now use cmake_dependent_option where appropriate. This means that options like PIKA_WITH_TESTS_UNIT will correctly be enabled when reconfiguring with PIKA_WITH_TESTS=ON even if pika was initially configured with PIKA_WITH_TESTS=OFF. (#356)

  • pika::finalize no longer has to be called on a pika thread. (#366)

Breaking changes

  • Removed operator| overloads for sync_wait and start_detached to align the implementation with P2300. (#346)

  • Removed parallel_executor_aggregated. (#372)

  • Moved more internal functionality into the detail namespace. (#374, #377, #379, #386, #400, #411, #420, #428, #429)

  • Allow compiling only device code with hipcc when PIKA_WITH_HIP is enabled instead of requiring hipcc to be used for host code as well. The PIKA_WITH_HIP option now has to be enabled explicitly like CUDA support instead of being automatically detected with hipcc set as the C++ compiler. (#402)

Bugfixes

  • Fixed handling of reference types in ensure_started and let_error. (#338)

  • Fixed compilation for powerpc. (#341)

  • Correctly set the stream in cusolver_handle::set_stream. (#344)

  • Fix the --pika:ignore-process-mask command line option. It was previously being ignored. (#355)

  • Fix a visibility issue in the program_options implementation. (#359)

  • Change detection of builtins to be more robust against mixing compilers. (#390)

  • Fixed compilation for arm64. (#393)

  • Only check for CMAKE_CUDA_STANDARD and PIKA_WITH_CXX_STANDARD when building pika itself. This could previously lead to false positive configuration errors. (#396)

  • Fix compilation on macOS with PIKA_WITH_MPI enabled. (#405)

0.7.0 (2022-08-03)

New features

  • The CUDA polling now uses both normal and high priority queues based on the flags passed to the cuda_stream. (#286)

  • Eagerly check completion of the MPI requests and add throttling of MPI traffic to help prevent excessive message queues. (#277)

  • Eagerly check completion of CUDA kernels. (#306)

Breaking changes

  • Remove static and thread local storage emulation. (#321)

  • Moved internal functionality into the detail namespace. (#209, #276, #324)

  • Remove specialization for Intel KNL. (#309)

Bugfixes

  • Fix a compilation error with posix coroutines implementation. (#314)

  • Fix handling of reference values and error types sent by predecessors to when_all, split and sync_wait. (#285, #307, #320)

0.6.0 (2022-07-06)

New features

  • Added basic support for Tracy. The Tracy integration can currently only be used with threads that do not yield. (#252)

  • Added make_any_sender and make_unique_any_sender helpers for deducing the template parameters of any_sender and unique_any_sender. (#259)

  • Added a drop_value sender adaptor which ignores values sent from the predecessor. (#262)

  • Allow passing flags to the cuda_stream and cuda_pool constructor. (#270)

  • Allow using any version of mimalloc. The version was previously unnecessarily constrained to 1. (#273)

  • Further relax the requirements for constness on argc and argv passed to pika::init and pika::start. (#275)

Breaking changes

  • If a process mask is set the pika runtime now uses the mask by default to restrict the number of threads. The command-line option --pika:use-process-mask which was previously used to enable this behaviour has been removed along with the corresponding configuration option. The process mask can be explicitly ignored with the command-line option --pika:ignore-process-mask or the configuration option pika.ignore_process_mask. (#242)

  • Moved internal functionality into the detail namespace. (#246, #248, #257)

Bugfixes

  • Fix handling of reference types sent by predecessors to ensure_started and schedule_from. (#282)

0.5.0 (2022-06-02)

New features

  • The then_with_cublas and then_with_cusolver sender adaptors can now also be used with hipBLAS and hipSOLVER. (#220)

  • There is now experimental support for using the P2300 reference implementation in place of pika's own implementation. This can be enabled with the PIKA_WITH_P2300_REFERENCE_IMPLEMENTATION CMake option. (#215)

Breaking changes

  • The --pika:help command-line option no longer takes any arguments. (#219)

  • Vc support has been removed. (#223)

  • Cleaned up the command_line_handling module and moved minor functionality into the detail namespace. (#216)

  • Removed the then_with_any_cuda sender adaptor. (#243)

Bugfixes

  • Scheduler properties can now be used with prefer. (#214)

0.4.0 (2022-05-04)

New features

  • Annotations are now inherited from the scheduling task instead of the spawning task for transfer. (#188)

  • Annotations for bulk regions are now lifted up around the work loop on each worker thread using a scoped annotation. (#197)

  • It is now allowed to pass a lambda with auto parameters to then. (#182)

  • A scheduler that spawns std::threads is now available. (#200)

Breaking changes

  • Cleaned up various modules and moved minor functionality into detail namespaces. (#179, #196)

Bugfixes

  • The sender returned by ensure_started can now be discarded without being connected to its corresponding receiver. (#180)

  • The lifetime issue in the split sender adaptor is now fixed. (#203)

  • The scheduler forwarding in schedule_from is now properly handled. (#186)

  • Missing includes have been added to transform_mpi.hpp. (#176)

  • Remove unnecessary Boost dependencies when PIKA_WITH_GENERIC_CONTEXT_COROUTINES=ON. (#185)

  • The segmentation fault in the shared priority queue scheduler has now been fixed. (#210)

0.3.0 (2022-04-06)

New features

  • Using pika::mpi::experimental::transform_mpi in debug mode now checks that polling has been enabled. (#142)

  • pika_main no longer requires non-const argc and argv. (#146)

Breaking changes

  • pika::execution::experimental::ensure_started no longer sends const references to receivers, like pika::execution::experimental::split. It now sends values by rvalue reference. (#143)

  • All serialization support has been removed. (#145, #150)

  • pika::bind_front no longer unwraps std::reference_wrappers to match the behaviour of std::bind_front. (#140)

  • Cleaned up various modules and moved minor functionality into detail namespaces. (#152, #153, #155, #158, #160)

  • Move pika::execution::experimental::sync_wait to pika::this_thread::experimental::sync_wait to match the namespace of sync_wait in P2300. (#159)

Bugfixes

  • pika::execution::experimental::make_future releases its operation state as soon as the wrapped sender has signaled completion. (#139)

  • pika::cuda::experimental::then_with_stream now correctly checks for invocability of the given callable with lvalue references. (#144)

  • pika::mpi::experimental::transform_mpi now stores the values sent from the predecessor in the operation state to ensure stable references. (#156)

  • pika::cuda::experimental::then_with_stream now stores the values sent from the predecessor in the operation state to ensure stable references. (#162)

  • Tasks scheduled with pika::execution::experimental::thread_pool_scheduler now use the annotation of the spawning task as a fallback if no explicit annotation has been given. (#173)

0.2.0 (2022-03-08)

New features

  • Added a P2300 cuda_scheduler along with various helper functionalities. (#37, #128)

  • Re-enabled support for APEX. (#104)

  • Added P2300 scheduler queries. (#102)

  • Added top-level API headers for CUDA and MPI, and added thread manager and resource partitioner functionality to the pika/runtime.hpp header. (#117, #123)

  • Added when_all_vector, a variant of when_all that works with vectors of senders. (#109, #132)

Breaking changes

  • Bumped the minimum required compiler versions to GCC 9 and clang 9. (#70)

  • Removed the filesystem compatibility layer based on Boost.Filesystem. std::filesystem support is now required from the standard library. (#70)

  • Changed the default value of the configuration option pika.exception_verbosity to 1 (previously 2). Exceptions will now by default not print the pika configuration and environment variables. (#99)

  • Yielding of pika threads is now disallowed with uncaught exceptions (with an assertion) to prevent hard-to-debug errors when thread stealing is enabled. (#112)

Bugfixes

  • pika threads are now again rescheduled on the worker thread where they are suspended. (#110)

  • Fixed a bug in the reference counting of the shared state in ensure_started and split that prevented it from being freed. (#111)

  • Fixed deadlocks in stop_token. (#113)

0.1.0 (2022-01-31)

This is the first release of pika.