Parallel API#
oneDPL provides the set of algorithms with execution policies as defined by the C++ Standard. All those algorithms work with C++ Standard aligned execution policies and with DPC++ execution policies.
Additionally, oneDPL provides wrapper functions for SYCL buffers, special iterators, and a set of non-standard parallel algortithms.
C++ Standard aligned execution policies#
oneDPL has the set of execution policies and related utilities that are semantically aligned with the C++ Standard:
// Defined in <oneapi/dpl/execution>
namespace oneapi {
namespace dpl {
namespace execution {
class sequenced_policy { /*unspecified*/ };
class parallel_policy { /*unspecified*/ };
class parallel_unsequenced_policy { /*unspecified*/ };
class unsequenced_policy { /*unspecified*/ };
inline constexpr sequenced_policy seq { /*unspecified*/ };
inline constexpr parallel_policy par { /*unspecified*/ };
inline constexpr parallel_unsequenced_policy par_unseq { /*unspecified*/ };
inline constexpr unsequenced_policy unseq { /*unspecified*/ };
template <class T>
struct is_execution_policy;
template <class T>
inline constexpr bool is_execution_policy_v = oneapi::dpl::execution::is_execution_policy<T>::value;
}
}
}
See “Execution policies” in the C++ Standard for more information.
DPC++ Execution Policy#
A DPC++ execution policy class oneapi::dpl::execution::device_policy
specifies
where and how an algorithm runs.
// Defined in <oneapi/dpl/execution>
namespace oneapi {
namespace dpl {
namespace execution {
template <typename KernelName = /*unspecified*/>
class device_policy;
device_policy<> dpcpp_default;
template <typename KernelName = /*unspecified*/>
device_policy<KernelName>
make_device_policy( sycl::queue );
template <typename KernelName = /*unspecified*/>
device_policy<KernelName>
make_device_policy( sycl::device );
template <typename NewKernelName, typename OldKernelName>
device_policy<NewKernelName>
make_device_policy( const device_policy<OldKernelName>& = dpcpp_default );
}
}
}
dpcpp_default
is a predefined execution policy object to run algorithms on the default DPC++ device.
device_policy class#
template <typename KernelName = /*unspecified*/>
class device_policy
{
public:
using kernel_name = KernelName;
device_policy();
template <typename OtherName>
device_policy( const device_policy<OtherName>& );
explicit device_policy( sycl::queue );
explicit device_policy( sycl::device );
sycl::queue queue() const;
operator sycl::queue() const;
};
An object of the device_policy
type is associated with a sycl::queue
that is used
to run algorithms on a DPC++ compliant device. When an algorithm runs with device_policy
it is capable of processing SYCL buffers (passed via oneapi::dpl::begin/end
),
data in the host memory and data in Unified Shared Memory (USM), including USM device memory.
Data placed in the host memory and USM can only be passed to oneDPL algorithms
as pointers and random access iterators. The way to transfer data from the host memory
to a device and back is unspecified; per-element data movement to/from a temporary storage
is a possible valid implementation.
The KernelName
template parameter, also aliased as kernel_name
within the class template,
is to explicitly provide a name for DPC++ kernels executed by an algorithm the policy is passed to.
device_policy()
Construct a policy object associated with a queue created with the default device selector.
template <typename OtherName>
device_policy( const device_policy<OtherName>& policy )
Construct a policy object associated with the same queue as policy
, by changing
the kernel name of the given policy to kernel_name
defined for the new policy.
explicit device_policy( sycl::queue queue )
Construct a policy object associated with the given queue.
explicit device_policy( sycl::device device )
Construct a policy object associated with a queue created for the given device.
sycl::queue queue() const
Return the queue the policy is associated with.
operator sycl::queue() const
Allow implicit conversion of the policy to a sycl::queue
object.
make_device_policy function#
The make_device_policy
function templates simplify device_policy
creation.
template <typename KernelName = /*unspecified*/>
device_policy<KernelName>
make_device_policy( sycl::queue queue )
Return a policy object associated with queue
, with a kernel name possibly provided
as the template argument, otherwise unspecified.
template <typename KernelName = /*unspecified*/>
device_policy<KernelName>
make_device_policy( sycl::device device )
Return a policy object to run algorithms on device
, with a kernel name possibly provided
as the template argument, otherwise unspecified.
template <typename NewKernelName, typename OldKernelName>
device_policy<NewKernelName>
make_device_policy( const device_policy<OldKernelName>& policy = dpcpp_default )
Return a policy object constructed from policy
, with a new kernel name provided as the template
argument. If no policy object is provided, the new policy is constructed from dpcpp_default
.
Buffer wrappers#
// Defined in <oneapi/dpl/iterator>
namespace oneapi {
namespace dpl {
template < typename T, typename AllocatorT, sycl::access::mode Mode >
/*unspecified*/ begin( sycl::buffer<T, /*dim=*/1, AllocatorT> buf,
sycl::mode_tag_t<Mode> tag = sycl::read_write );
template < typename T, typename AllocatorT, sycl::access::mode Mode >
/*unspecified*/ begin( sycl::buffer<T, /*dim=*/1, AllocatorT> buf,
sycl::mode_tag_t<Mode> tag, sycl::property::noinit );
template < typename T, typename AllocatorT >
/*unspecified*/ begin( sycl::buffer<T, /*dim=*/1, AllocatorT> buf,
sycl::property::noinit );
template < typename T, typename AllocatorT, sycl::access::mode Mode >
/*unspecified*/ end( sycl::buffer<T, /*dim=*/1, AllocatorT> buf,
sycl::mode_tag_t<Mode> tag = sycl::read_write );
template < typename T, typename AllocatorT, sycl::access::mode Mode >
/*unspecified*/ end( sycl::buffer<T, /*dim=*/1, AllocatorT> buf,
sycl::mode_tag_t<Mode> tag, sycl::property::noinit );
template < typename T, typename AllocatorT >
/*unspecified*/ end( sycl::buffer<T, /*dim=*/1, AllocatorT> buf,
sycl::property::noinit );
}
}
oneapi::dpl::begin
and oneapi::dpl::end
are helper functions
for passing DPC++ buffers to oneDPL algorithms.
These functions accept a buffer and return an object
of an unspecified type that satisfies the following requirements:
it is
CopyConstructible
,CopyAssignable
, and comparable with operators==
and!=
;the following expressions are valid:
a + n
,a - n
,a - b
, wherea
andb
are objects of the type, andn
is an integer value;it provides the
get_buffer()
method that returns the buffer passed to thebegin
orend
function.
When invoking an algorithm, the buffer passed to begin
should be the same
as the buffer passed to end
. Otherwise, the behavior is undefined.
sycl::mode_tag_t
and sycl::property::noinit
parameters allow to specify
an access mode to be used for accessing the buffer by algorithms.
The mode serves as a hint, and can be overridden depending on semantics of the algorithm.
When invoking an algorithm, the same access mode arguments should be used
for begin
and end
. Otherwise, the behavior is undefined.
using namespace oneapi;
auto buf_begin = dpl::begin(buf, sycl::write_only);
auto buf_end_1 = dpl::end(buf, sycl::write_only);
auto buf_end_2 = dpl::end(buf, sycl::write_only, sycl::noinit);
dpl::fill(dpl::dpcpp_default, buf_begin, buf_end_1, 42); // allowed
dpl::fill(dpl::dpcpp_default, buf_begin, buf_end_2, 42); // not allowed
Iterators#
The oneDPL iterators are defined in the <oneapi/dpl/iterator>
header,
in namespace oneapi::dpl
.
template <typename Integral>
class counting_iterator
{
public:
using difference_type = /* a signed integer type of the same size as Integral */;
using value_type = Integral;
using reference = Integral;
counting_iterator();
explicit counting_iterator(Integral init);
reference operator*() const;
reference operator[](difference_type i) const;
difference_type operator-(const counting_iterator& it) const;
counting_iterator operator+(difference_type forward) const;
counting_iterator operator-(difference_type backward) const;
counting_iterator& operator+=(difference_type forward);
counting_iterator& operator-=(difference_type backward);
counting_iterator& operator++();
counting_iterator& operator--();
counting_iterator& operator++(int);
counting_iterator& operator--(int);
bool operator==(const counting_iterator& it) const;
bool operator!=(const counting_iterator& it) const;
bool operator<(const counting_iterator& it) const;
bool operator>(const counting_iterator& it) const;
bool operator<=(const counting_iterator& it) const;
bool operator>=(const counting_iterator& it) const;
};
counting_iterator
is a random access iterator-like type that represents an integer counter.
When dereferenced, counting_iterator
provides an Integral rvalue equal to the value of the
counter; dereference operations cannot be used to modify the counter. The arithmetic and comparison
operators of counting_iterator
behave as if applied to the values of Integral type
representing the counters of the iterator instances passed to the operators.
class discard_iterator
{
public:
using difference_type = std::ptrdiff_t;
using value_type = /* unspecified */;
using reference = /* unspecified */;
discard_iterator();
explicit discard_iterator(difference_type init);
reference operator*() const;
reference operator[](difference_type) const;
difference_type operator-(const discard_iterator& it) const;
discard_iterator operator+(difference_type forward) const;
discard_iterator operator-(difference_type backward) const;
discard_iterator& operator+=(difference_type forward);
discard_iterator& operator-=(difference_type backward);
discard_iterator& operator++();
discard_iterator& operator--();
discard_iterator operator++(int);
discard_iterator operator--(int);
bool operator==(const discard_iterator& it) const;
bool operator!=(const discard_iterator& it) const;
bool operator<(const discard_iterator& it) const;
bool operator>(const discard_iterator& it) const;
};
discard_iterator
is a random access iterator-like type that, when dereferenced, provides an
lvalue that may be assigned an arbitrary value. The assignment has no effect on the
discard_iterator
instance; the write is discarded. The arithmetic and comparison operators
of discard_iterator
behave as if applied to integer counter values maintained by the
iterator instances to determine their position relative to each other.
template <typename SourceIterator, typename IndexMap>
class permutation_iterator
{
public:
using difference_type =
typename std::iterator_traits<SourceIterator>::difference_type;
using value_type = typename std::iterator_traits<SourceIterator>::value_type;
using pointer = typename std::iterator_traits<SourceIterator>::pointer;
using reference = typename std::iterator_traits<SourceIterator>::reference;
permutation_iterator(const SourceIterator& input1, const IndexMap& input2,
std::size_t index = 0);
SourceIterator base() const;
reference operator*() const;
reference operator[](difference_type i) const;
difference_type operator-(const permutation_iterator& it) const;
permutation_iterator operator+(difference_type forward) const;
permutation_iterator operator-(difference_type backward) const;
permutation_iterator& operator+=(difference_type forward);
permutation_iterator& operator-=(difference_type forward);
permutation_iterator& operator++();
permutation_iterator& operator--();
permutation_iterator operator++(int);
permutation_iterator operator--(int);
bool operator==(const permutation_iterator& it) const;
bool operator!=(const permutation_iterator& it) const;
bool operator<(const permutation_iterator& it) const;
bool operator>(const permutation_iterator& it) const;
bool operator<=(const permutation_iterator& it) const;
bool operator>=(const permutation_iterator& it) const;
};
permutation_iterator
is a random access iterator-like type whose dereferenced value set is
defined by the source iterator provided, and whose iteration order over the dereferenced value set
is defined by either another iterator or a functor that maps the permutation_iterator
index
to the index of the source iterator. The arithmetic and comparison operators of
permutation_iterator
behave as if applied to integer counter values maintained by the
iterator instances to determine their position in the index map.
permutation_iterator::operator*
uses the counter value of the instance on which
it is invoked to index into the index map. The corresponding value in the map is then used
to index into the value set defined by the source iterator. The resulting lvalue is returned
as the result of the operator.
permutation_iterator::operator[]
uses the parameter i
to index into the index map. The corresponding value in the map is then used
to index into the value set defined by the source iterator. The resulting lvalue is returned
as the result of the operator.
template <typename SourceIterator, typename IndexMap>
permutation_iterator<SourceIterator, IndexMap>
make_permutation_iterator(SourceIterator source, IndexMap map);
make_permutation_iterator
constructs and returns an instance of permutation_iterator
using the source iterator and index map provided.
template <typename Iterator, typename UnaryFunc>
class transform_iterator
{
public:
using difference_type = typename std::iterator_traits<Iterator>::difference_type;
using reference = typename std::invoke_result<UnaryFunc,
typename std::iterator_traits<Iterator>::reference>::type;
using value_type = typename std::remove_reference<reference>::type;
using pointer = typename std::iterator_traits<Iterator>::pointer;
Iterator base() const;
transform_iterator(Iterator it, UnaryFunc unary_func);
transform_iterator(const transform_iterator& input);
transform_iterator& operator=(const transform_iterator& input);
reference operator*() const;
reference operator[](difference_type i) const;
difference_type operator-(const transform_iterator& it) const
transform_iterator operator+(difference_type forward) const;
transform_iterator operator-(difference_type backward) const;
transform_iterator& operator+=(difference_type forward);
transform_iterator& operator-=(difference_type backward);
transform_iterator& operator++();
transform_iterator& operator--();
transform_iterator operator++(int);
transform_iterator operator--(int);
bool operator==(const transform_iterator& it) const;
bool operator!=(const transform_iterator& it) const;
bool operator<(const transform_iterator& it) const;
bool operator>(const transform_iterator& it) const;
bool operator<=(const transform_iterator& it) const;
bool operator>=(const transform_iterator& it) const;
};
transform_iterator
is a random access iterator-like type whose dereferenced value set is
defined by the unary function and source iterator provided. When dereferenced,
transform_iterator
provides the result of the unary function applied to the corresponding
element of the source iterator; dereference operations cannot be used to modify the elements of
the source iterator unless the unary function result includes a reference to the element. The
arithmetic and comparison operators of transform_iterator
behave as if applied to the
source iterator itself.
template <typename UnaryFunc, typename Iterator>
transform_iterator<UnaryFunc, Iterator>
make_transform_iterator(Iterator, UnaryFunc);
make_transform_iterator
constructs and returns an instance of transform_iterator
using the source iterator and unary function object provided.
template <typename... Iterators>
class zip_iterator
{
public:
using difference_type = typename std::make_signed<std::size_t>::type;
using value_type =
std::tuple<typename std::iterator_traits<Iterators>::value_type...>;
using reference = /* unspecified tuple of reference types */;
using pointer =
std::tuple<typename std::iterator_traits<Iterators>::pointer...>;
std::tuple<Iterators...> base() const;
zip_iterator();
explicit zip_iterator(Iterators... args);
zip_iterator(const zip_iterator& input);
zip_iterator& operator=(const zip_iterator& input);
reference operator*() const;
reference operator[](difference_type i) const;
difference_type operator-(const zip_iterator& it) const;
zip_iterator operator-(difference_type backward) const;
zip_iterator operator+(difference_type forward) const;
zip_iterator& operator+=(difference_type forward);
zip_iterator& operator-=(difference_type backward);
zip_iterator& operator++();
zip_iterator& operator--();
zip_iterator operator++(int);
zip_iterator operator--(int);
bool operator==(const zip_iterator& it) const;
bool operator!=(const zip_iterator& it) const;
bool operator<(const zip_iterator& it) const;
bool operator>(const zip_iterator& it) const;
bool operator<=(const zip_iterator& it) const;
bool operator>=(const zip_iterator& it) const;
};
zip_iterator
is an iterator-like type defined over one or more iterators. When dereferenced,
the value returned from zip_iterator
is a tuple of the values returned by dereferencing the
source iterators over which the zip_iterator
is defined. The arithmetic operators of
zip_iterator
update the source iterators of a zip_iterator
instance as though the
operation were applied to each of these iterators.
template <typename... Iterators>
zip_iterator<Iterators...>
make_zip_iterator(Iterators...);
make_zip_iterator
constructs and returns an instance of zip_iterator
using the set of source iterators provided.
Parallel Algorithms#
The parallel algorithms are defined in the <oneapi/dpl/algorithm>
header,
in namespace oneapi::dpl
.
template<typename Policy, typename InputKeyIt, typename InputValueIt,
typename OutputValueIt,
typename T = typename std::iterator_traits<InputValueIt>::value_type,
typename BinaryPred =
std::equal_to<typename std::iterator_traits<InputKeyIt>::value_type>,
typename BinaryOp =
std::plus<typename std::iterator_traits<InputValueIt>::value_type>>
OutputValueIt
exclusive_scan_by_segment(Policy&& policy, InputKeyIt keys_first,
InputKeyIt keys_last, InputValueIt values_first, OutputValueIt values_result,
T initial_value = 0,
BinaryPred binary_pred =
std::equal_to<typename std::iterator_traits<InputKeyIt>::value_type>(),
BinaryOp binary_op =
std::plus<typename std::iterator_traits<InputValueIt>::value_type>());
oneapi::dpl::exclusive_scan_by_segment
performs partial prefix scans by applying the
binary_op
operation to a sequence of values. Each partial scan applies to a contiguous
subsequence determined by the keys associated with the values being equal according to the
binary_pred
predicate, and the first element of each scan is the initial value provided.
The return value is an iterator targeting the end of the result sequence.
The initial value used if one is not provided is an instance of the value_type
of the
InputValueIt
iterator type initialized to 0. If no binary predicate is provided for the
comparison of keys an instance of std::equal_to
with the value_type
of the
InputKeyIt
iterator type is used. Finally, an instance of std::plus
with the
value_type
of the InputValueIt
iterator type is used if no binary operator is
provided to combine the elements of the value subsequences.
template<typename Policy, typename InputKeyIt, typename InputValueIt,
typename OutputValueIt,
typename BinaryPredcate =
std::equal_to<typename std::iterator_traits<InputKeyIt>::value_type,
typename BinaryOp =
std::plus<typename std::iterator_traits<InputValueIt>::value_type>>
OutputValueIt
inclusive_scan_by_segment(Policy&& policy, InputKeyIt keys_first,
InputKeyIt keys_last, InputValueIt values_first, OutputValueIt values_result
BinaryPred binary_pred =
std::equal_to<typename std::iterator_traits<InputKeyIt>::value_type>(),
BinaryOp binary_op =
std::plus<typename std::iterator_traits<InputValueIt>::value_type>());
oneapi::dpl::inclusive_scan_by_segment
performs partial prefix scans by applying the
binary_op
operation to a sequence of values. Each partial scan applies to a contiguous
subsequence determined by the keys associated with the values being equal according to the
binary_pred
predicate. The return value is an iterator targeting the end of the result
sequence.
If no binary predicate is provided for the comparison of keys an instance of std::equal_to
with the value_type
of the InputKeyIt
iterator type is used. An instance of
std::plus
with the value_type
of the InputValueIt
iterator type is used if
no binary operator is provided to combine the elements of the value subsequences.
template<typename Policy, typename InputKeyIt, typename InputValueIt,
typename OutputKeyIt, typename OutputValueIt,
typename BinaryPredcate =
std::equal_to<typename std::iterator_traits<InputKeyIt>::value_type,
typename BinaryOp =
std::plus<typename std::iterator_traits<InputValueIt>::value_type>>
std::pair<OutputKeyIt,OutputValueIt>
reduce_by_segment(Policy&& policy, InputKeyIt keys_first, InputKeyIt keys_last,
InputValueIt values_first, OutputKeyIt keys_result,
OutputValueIt values_result,
BinaryPred binary_pred =
std::equal_to<typename std::iterator_traits<InputKeyIt>::value_type>(),
BinaryOp binary_op =
std::plus<typename std::iterator_traits<InputValueIt>::value_type>());
oneapi::dpl::reduce_by_segment
performs partial reductions on a sequence of values. Each
reduction is computed with the binary_op
operation for a contiguous subsequence of values
determined by the associated keys being equal according to the binary_pred
predicate.
For each subsequence the first of the equal keys is stored into keys_result
and the computed
reduction is stored into values_result
. The return value is a pair of
iterators holding the end of the resulting sequences.
If no binary predicate is provided for the comparison of keys an instance of std::equal_to
with the value_type
of the InputKeyIt
iterator type is used. An instance of
std::plus
with the value_type
of the InputValueIt
iterator type is used to
combine the values in each subsequence identified if a binary operator is not provided.
template<typename Policy, typename InputIt1, typename InputIt2, typename OutputIt,
typename Comparator =
std::less<typename std::iterator_traits<InputIt>::value_type>>
OutputIt
binary_search(Policy&& policy, InputIt1 start, InputIt1 end,
InputIt2 value_first, InputIt2 value_last, OutputIterator result,
Comparator comp =
std::less<typename std::iterator_traits<InputIt1>::value_type>());
oneapi::dpl::binary_search
performs a binary search over the data in [start, end)
for each value in [value_first, value_last)
. If the value exists in the data searched then
the corresponding element in [result, result + distance(value_first, value_last))
is set to
true, otherwise it is set to false.
If no comparator is provided, operator<
is used to determine when the search value is less
than an element in the range being searched.
The elements e of [start, end) must be partitioned with respect to the comparator used. For all elements e in [start, end) and a given search value v in [value_first, value_last) comp(e, v) implies !comp(v, e).
template<typename Policy, typename InputIt1, typename InputIt2, typename OutputIt,
typename Comparator =
std::less<typename std::iterator_traits<InputIt>::value_type>>
OutputIt
lower_bound(Policy&& policy, InputIt1 start, InputIt1 end,
InputIt2 value_first, InputIt2 value_last, OutputIterator result,
Comparator comp =
std::less<typename std::iterator_traits<InputIt1>::value_type>());
oneapi::dpl::lower_bound
performs a binary search over the data in [start, end)
for
each value in [value_first, value_last)
to find the lowest index at which the search value
could be inserted in [start, end)
without violating the ordering defined by the comparator
provided. That lowest index is then assigned to the corresponding element in
[result, result + distance(value_first, value_last))
.
If no comparator is provided, operator<
is used to determine when the search value is less
than an element in the range being searched.
The elements e of [start, end) must be partitioned with respect to the comparator used.
template<typename Policy, typename InputIt1, typename InputIt2, typename OutputIt,
typename Comparator =
std::less<typename std::iterator_traits<InputIt>::value_type>>
OutputIt
upper_bound(Policy&& policy, InputIt1 start, InputIt1 end,
InputIt2 value_first, InputIt2 value_last, OutputIterator result,
Comparator comp =
std::less<typename std::iterator_traits<InputIt1>::value_type>());
oneapi::dpl::upper_bound
performs a binary search over the data in [start, end)
for each value in [value_first, value_last)
to find the highest index at which the search
value could be inserted in [start, end)
without violating the ordering defined by the
comparator provided. That highest index is then assigned to the corresponding element in
[result, result + distance(value_first, value_last))
.
If no comparator is provided, operator<
is used to determine when the search value is less
than an element in the range being searched.
The elements e of [start, end) must be partitioned with respect to the comparator used.