oneCCL Concepts#
oneCCL specification defines the following list of concepts:
Device#
Note
Here and below, a native device/context/stream/event are defined in the scope of SYCL device runtime
using native_device_type = sycl::device;
using native_context_type = sycl::context;
using native_stream_type = sycl::queue;
using native_event_type = sycl::event;
oneCCL specification defines device
as an abstraction of a computational device: a CPU, a specific GPU card in the system, or any other device participating in a communication operation. device
corresponds to the communicator’s rank (addressable entity in a communication operation).
oneCCL specification defines the way to create an instance of the device
class with a native object (native_device_type
) and without a native object (corresponds to the host).
Creating a new device object:
device ccl::create_device(native_device_type& native_device);
device ccl::create_device();
- native_device
the existing native device object
- return
device
a device object
device
class shall provide ability to retrieve a native object.
Retrieving a native device object:
native_device_type device::get_native();
- return
native_device_type
- a native device objectshall throw exception if a
device
object does not wrap the native object
Context#
oneCCL specification defines context
as an abstraction of a computational devices context that is responsible for managing resources and for executing of communication operations on one or more devices specified in the context.
oneCCL specification defines the way to create an instance of the context
class with a native object (native_context_type
) and without a native object.
Creating a new context object:
context ccl::create_context(native_context_type& native_context);
context ccl::create_context();
- native_context
the existing native context object
- return
context
a context object
context
class shall provide ability to retrieve a native object.
Retrieving a native context object:
native_context_type context::get_native();
- return
native_context_type
- a native context objectshall throw exception if a
context
object does not wrap the native object
Key-Value Store#
kvs_interface
defines the key-value store (KVS) interface to be used to establish connection between ranks during the creation of oneCCL communicator. The interface shall include blocking get
and set
methods.
Getting a record from the key-value store:
virtual vector_class<char> kvs_interface::get(
const string_class& key) = 0;
- key
the key of value to be retrieved
- return
vector_class<char>
the value associated with the given key
Note
get
operation with a non-existing key shall return empty result
Saving a record in the key-value store:
void kvs_interface::set(
const string_class& key,
const vector_class<char>& data) = 0;
- key
the key at which the value should be stored
- data
the value that should be associated with the given key
Note
set
operation with empty data
shall remove a record from the key-value store
oneCCL specification defines kvs
class as a built-in KVS provided by oneCCL.
class kvs : public kvs_interface {
public:
static constexpr size_t address_max_size = 256;
using address_type = array_class<char, address_max_size>;
~kvs() override;
address_type get_address() const;
vector_class<char> get(
const string_class& key) override;
void set(
const string_class& key,
const vector_class<char>& data) override;
}
Retrieving an address of built-in key-value store:
kvs::address_type kvs::get_address() const;
- return
kvs::address_type
- the address of the key-value storeshould be retrieved from the main built-in KVS and distributed to other processes for the built-in KVS creation
Creating a main built-in key-value store. Its address should be distributed using an out-of-band communication mechanism and be used to create key-value stores on other ranks:
shared_ptr_class<kvs> ccl::create_main_kvs();
- return
shared_ptr_class<kvs>
the main key-value store object
Creating a new key-value store from main kvs address:
shared_ptr_class<kvs> ccl::create_kvs(const kvs::address_type& addr);
- addr
the address of the main kvs
- return
shared_ptr_class<kvs>
key-value store object
Communicator#
oneCCL specification defines communicator
class that describes a group of communicating ranks, where a rank is an addressable entity in a communication operation and corresponds to single oneCCL device.
communicator
defines communication operations on memory buffers between homogeneous oneCCL devices, that is, all oneCCL devices either wrap native device objects of the same type (for example CPUs only or GPUs only) or do not wrap native objects.
Each process may correspond to multiple ranks.
Note
Support for multiple ranks per process is optional
Creating a new communicator(s) with user-supplied communicator size, rank-to-device mapping/rank, context and kvs:
Note
If device
and context
objects are omitted, then they are created with ccl::create_device()
and ccl::create_context()
functions without native objects
vector_class<communicator> ccl::create_communicators(
int size,
const map_class<int, device>& rank_device_map,
const context& context,
shared_ptr_class<kvs_interface> kvs);
communicator ccl::create_communicator(
int size,
int rank,
shared_ptr_class<kvs_interface> kvs);
- size
user-supplied total number of ranks
- rank_device_map
user-supplied mapping of local ranks on devices
- rank
user-supplied local rank
- context
device context
- kvs
key-value store for ranks wire-up
- return
vector_class<communicator>
/communicator
a vector of communicator objects / a communicator object
communicator
shall provide methods to retrieve the rank, the device, and the context that correspond to the communicator object as well as the total number of ranks in the communicator.
Retrieving the rank in a communicator:
int communicator::rank() const;
- return
int
the rank that corresponds to the communicator object
Retrieving the total number of ranks in a communicator:
int communicator::size() const;
- return
int
the total number of the ranks
Retrieving an underlying device, which was used as communicator construction argument:
device communicator::get_device() const;
- return
device
the device that corresponds to the communicator object
Retrieving an underlying context, which was used as communicator construction argument:
context communicator::get_context() const;
- return
context
the context that corresponds to the communicator object
Note
See also: Collective Operations
Stream#
oneCCL specification defines stream
as an abstraction that encapsulates execution context for communicator
communication operations.
Stream shall be passed to communicator
communication operation.
oneCCL specification defines the way to create an instance of the stream
class with a native object (native_stream_type
) and without a native object.
Creating a new stream object:
stream ccl::create_stream(native_stream_type& native_stream);
stream ccl::create_stream();
- native_stream
the existing native stream object
- return
stream
a stream object
stream
class shall provide ability to retrieve a native object.
Retrieving a native stream object:
native_stream_type stream::get_native();
- return
native_stream_type
- a native stream objectshall throw exception if a
stream
object does not wrap the native object
Event#
oneCCL specification defines event
as an abstraction that encapsulates synchronization context for communicator
communication operations.
Each communication operation of oneCCL shall return an event object for tracking the operation’s progress. A vector of events may be passed to the communicator
communication operation to designate input dependencies for the operation.
Note
Support for handling of input events is optional
oneCCL specification defines the way to create an instance of the event
class with a native object (native_event_type
).
Creating a new event object:
event ccl::create_event(native_event_type& native_event);
- native_event
the existing native event object
- return
event
an event object
event
class shall provide ability to retrieve a native object.
Retrieving a native event object:
native_event_type event::get_native();
- return
native_event_type
- a native event objectshall throw exception if an
event
object does not wrap the native object
Note
See also: Operation Progress Tracking
Operation Attributes#
Communication operation behavior may be controlled through operation attributes.