imatcopy_batch
Computes a group of in-place scaled matrix transpose or copy operations using general dense matrices.
Description
The imatcopy_batch routines perform a series of in-place scaled matrix
copies or transpositions. They are batched versions of imatcopy,
but the imatcopy_batch routines perform their operations with
groups of matrices. Each group contains matrices with the same parameters.
There is a strided API, in which the matrices in a batch are a set
distance away from each other in memory and in which all matrices
share the same parameters (for example matrix size), and a more
flexible group API where each group of matrices has the same
parameters but the user may provide multiple groups that have
different parameters. The group API argument structure is better
suited to USM pointers than to sycl::buffer arguments, so we
only specify it for USM inputs. The strided API works with both USM
and buffer memory.
                 strided API    group API
Buffer memory    supported      not supported
USM pointers     supported      supported
imatcopy_batch supports the following precisions:
T
float
double
std::complex<float>
std::complex<double>
imatcopy_batch (Buffer Version)
Description
The buffer version of imatcopy_batch supports only the strided API.
The operation for the strided API is defined as:
for i = 0 … batch_size – 1
    C is a matrix at offset i * stride in matrix_array_in_out
    C := alpha * op(C)
end for
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha is a scalar,
C is a matrix to be transformed in place,
and C is m x n.
The matrix_array_in_out buffer contains all the input matrices. The stride between matrices is given by the stride parameter. The total number of matrices in matrix_array_in_out is given by the batch_size parameter.
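The strided loop above can be checked against a small host-side reference. The following is a minimal sketch for the column major case with float data, not the oneMath implementation: the enum Op and the function imatcopy_batch_ref are hypothetical names introduced only for this illustration, and a temporary copy is used so that the in-place update never reads an element it has already overwritten.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for oneapi::math::transpose (illustration only).
enum class Op { nontrans, trans };

// Host-side reference for the column major strided operation C := alpha * op(C).
void imatcopy_batch_ref(Op op, std::int64_t m, std::int64_t n, float alpha,
                        std::vector<float>& matrix_array_in_out,
                        std::int64_t ld_in, std::int64_t ld_out,
                        std::int64_t stride, std::int64_t batch_size) {
    std::vector<float> tmp(static_cast<std::size_t>(m * n));
    for (std::int64_t b = 0; b < batch_size; ++b) {
        // C is the matrix at offset b * stride.
        float* C = matrix_array_in_out.data() + b * stride;
        // Snapshot the logical m x n input, read with leading dimension ld_in.
        for (std::int64_t j = 0; j < n; ++j)
            for (std::int64_t i = 0; i < m; ++i)
                tmp[i + j * m] = C[i + j * ld_in];
        // Write the scaled result with leading dimension ld_out.
        for (std::int64_t j = 0; j < n; ++j)
            for (std::int64_t i = 0; i < m; ++i) {
                const float v = alpha * tmp[i + j * m];
                if (op == Op::nontrans)
                    C[i + j * ld_out] = v;  // output stays m x n
                else
                    C[j + i * ld_out] = v;  // output is n x m
            }
    }
}
```

For example, with m = 2, n = 3, ld_in = 2, ld_out = 3 and Op::trans, each 2 x 3 input is replaced by its scaled 3 x 2 transpose in the same storage.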
Strided API
Syntax
namespace oneapi::math::blas::column_major {
    void imatcopy_batch(sycl::queue &queue,
                        oneapi::math::transpose trans,
                        std::int64_t m,
                        std::int64_t n,
                        T alpha,
                        sycl::buffer<T, 1> &matrix_array_in_out,
                        std::int64_t ld_in,
                        std::int64_t ld_out,
                        std::int64_t stride,
                        std::int64_t batch_size);
}
namespace oneapi::math::blas::row_major {
    void imatcopy_batch(sycl::queue &queue,
                        oneapi::math::transpose trans,
                        std::int64_t m,
                        std::int64_t n,
                        T alpha,
                        sycl::buffer<T, 1> &matrix_array_in_out,
                        std::int64_t ld_in,
                        std::int64_t ld_out,
                        std::int64_t stride,
                        std::int64_t batch_size);
}
Input Parameters
- queue
The queue where the routine should be executed.
- trans
Specifies op(C), the transposition operation applied to the matrices C. See oneMath defined datatypes for more details.
- m
Number of rows of each matrix C on input. Must be at least zero.
- n
Number of columns of each matrix C on input. Must be at least zero.
- alpha
Scaling factor for the matrix transpositions or copies.
- matrix_array_in_out
Buffer holding the input matrices C, with size stride * batch_size.
- ld_in
The leading dimension of the matrices C on input. It must be positive, and must be at least m if column major layout is used, and at least n if row major layout is used.
- ld_out
The leading dimension of the matrices C on output. It must be positive.
                 C not transposed              C transposed
Column major     ld_out must be at least m.    ld_out must be at least n.
Row major        ld_out must be at least n.    ld_out must be at least m.
- stride
Stride between different C matrices.
                 C not transposed                                   C transposed
Column major     stride must be at least max(ld_in*m, ld_out*m).    stride must be at least max(ld_in*m, ld_out*n).
Row major        stride must be at least max(ld_in*n, ld_out*n).    stride must be at least max(ld_in*n, ld_out*m).
- batch_size
Specifies the number of matrix transposition or copy operations to perform.
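The leading-dimension rules for ld_in and ld_out listed above can be encoded as a small argument check to run before calling the routine. This is a hedged sketch only: Layout, Op, min_ld_in, min_ld_out and leading_dims_ok are hypothetical helpers, not part of oneMath, and the sketch assumes that a conjugate transpose counts as "transposed" for the purpose of the ld_out table.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-ins for the two namespaces and for oneapi::math::transpose.
enum class Layout { column_major, row_major };
enum class Op { nontrans, trans, conjtrans };

// Smallest valid ld_in: at least m (column major) or n (row major), and positive.
std::int64_t min_ld_in(Layout layout, std::int64_t m, std::int64_t n) {
    return std::max<std::int64_t>(1, layout == Layout::column_major ? m : n);
}

// Smallest valid ld_out according to the ld_out table, and positive.
std::int64_t min_ld_out(Layout layout, Op op, std::int64_t m, std::int64_t n) {
    const bool transposed = (op != Op::nontrans);  // assumption: conjtrans is "transposed"
    // m applies to (column major, not transposed) and (row major, transposed).
    const bool use_m = (layout == Layout::column_major) != transposed;
    return std::max<std::int64_t>(1, use_m ? m : n);
}

// True when m, n, ld_in and ld_out satisfy the documented constraints.
bool leading_dims_ok(Layout layout, Op op, std::int64_t m, std::int64_t n,
                     std::int64_t ld_in, std::int64_t ld_out) {
    return m >= 0 && n >= 0 &&
           ld_in >= min_ld_in(layout, m, n) &&
           ld_out >= min_ld_out(layout, op, m, n);
}
```

A failed check of this kind corresponds to the situations in which an implementation would be expected to report oneapi::math::invalid_argument.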
Output Parameters
- matrix_array_in_out
Output buffer, overwritten by batch_size matrix copy or transposition operations of the form alpha * op(C).
Throws
This routine shall throw the following exceptions if the associated condition is detected. An implementation may throw additional implementation-specific exception(s) in case of error conditions not covered here.
oneapi::math::invalid_argument
oneapi::math::unsupported_device
imatcopy_batch (USM Version)
Description
The USM version of imatcopy_batch supports the group API and the strided API.
The operation for the group API is defined as:
idx = 0
for i = 0 … group_count – 1
    trans, m, n, alpha, ld_in, ld_out and group_size at position i in their respective arrays
    for j = 0 … group_size – 1
        C is a matrix at position idx in matrix_array_in_out
        C := alpha * op(C)
        idx := idx + 1
    end for
end for
The operation for the strided API is defined as:
for i = 0 … batch_size – 1
    C is a matrix at offset i * stride in matrix_array_in_out
    C := alpha * op(C)
end for
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha is a scalar,
C is a matrix to be transformed in place,
and C is m x n.
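For the group API, the matrices are given by arrays of pointers. C represents a matrix stored at the address pointed to by matrix_array_in_out.
The number of entries in matrix_array_in_out is total_batch_count, given by:
    total_batch_count = group_size[0] + group_size[1] + … + group_size[group_count – 1]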
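As a small illustration of how the pointer array is laid out, the sketch below computes the number of entries in matrix_array_in_out and the index of the first pointer of each group. It assumes, as the group loop implies, that the entry count is the sum of the group_size entries and that groups are stored back to back; the helper names total_batch_count and group_offsets are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Number of pointers that matrix_array_in_out must hold: the sum of all group sizes.
std::int64_t total_batch_count(const std::vector<std::int64_t>& group_size) {
    return std::accumulate(group_size.begin(), group_size.end(), std::int64_t{0});
}

// Index in matrix_array_in_out of the first matrix of each group
// (the value of idx when the outer loop reaches that group).
std::vector<std::int64_t> group_offsets(const std::vector<std::int64_t>& group_size) {
    std::vector<std::int64_t> offsets(group_size.size());
    std::int64_t idx = 0;
    for (std::size_t g = 0; g < group_size.size(); ++g) {
        offsets[g] = idx;
        idx += group_size[g];
    }
    return offsets;
}
```

An empty group (group_size[i] equal to 0) contributes no pointers, so it shares its offset with the next group.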
For the strided API, the single array matrix_array_in_out contains all the matrices
to be transformed in place. The locations of the individual matrices within
the array are given by the stride, while the number of
matrices is given by the batch_size parameter.
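The group API loop can also be mirrored on the host to check expected results. The following is a minimal sketch for column major storage with std::complex<float> data, including the conjugate transpose case; it is not the oneMath implementation. The enum Op, the alias T and the function imatcopy_batch_group_ref are hypothetical names, and plain host pointers stand in for USM pointers.

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for oneapi::math::transpose (illustration only).
enum class Op { nontrans, trans, conjtrans };

using T = std::complex<float>;

// Host-side reference for the column major group operation C := alpha * op(C).
void imatcopy_batch_group_ref(const Op* trans_array, const std::int64_t* m_array,
                              const std::int64_t* n_array, const T* alpha_array,
                              T** matrix_array_in_out, const std::int64_t* ld_in_array,
                              const std::int64_t* ld_out_array, std::int64_t group_count,
                              const std::int64_t* group_size) {
    std::int64_t idx = 0;  // running position in matrix_array_in_out
    std::vector<T> tmp;
    for (std::int64_t g = 0; g < group_count; ++g) {
        // All matrices of group g share these parameters.
        const Op op = trans_array[g];
        const std::int64_t m = m_array[g], n = n_array[g];
        const std::int64_t ld_in = ld_in_array[g], ld_out = ld_out_array[g];
        tmp.resize(static_cast<std::size_t>(m * n));
        for (std::int64_t k = 0; k < group_size[g]; ++k, ++idx) {
            T* C = matrix_array_in_out[idx];
            // Snapshot the input so the in-place write cannot clobber unread data.
            for (std::int64_t j = 0; j < n; ++j)
                for (std::int64_t i = 0; i < m; ++i)
                    tmp[i + j * m] = C[i + j * ld_in];
            for (std::int64_t j = 0; j < n; ++j)
                for (std::int64_t i = 0; i < m; ++i) {
                    T v = tmp[i + j * m];
                    if (op == Op::conjtrans) v = std::conj(v);
                    v = alpha_array[g] * v;
                    if (op == Op::nontrans)
                        C[i + j * ld_out] = v;  // output stays m x n
                    else
                        C[j + i * ld_out] = v;  // output is n x m
                }
        }
    }
}
```

Keeping a single idx counter across groups, rather than restarting it per group, is what ties the flat pointer array to the per-group parameter arrays.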
Group API
Syntax
namespace oneapi::math::blas::column_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               const oneapi::math::transpose *trans_array,
                               const std::int64_t *m_array,
                               const std::int64_t *n_array,
                               const T *alpha_array,
                               T **matrix_array_in_out,
                               const std::int64_t *ld_in_array,
                               const std::int64_t *ld_out_array,
                               std::int64_t group_count,
                               const std::int64_t *group_size,
                               const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::math::blas::row_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               const oneapi::math::transpose *trans_array,
                               const std::int64_t *m_array,
                               const std::int64_t *n_array,
                               const T *alpha_array,
                               T **matrix_array_in_out,
                               const std::int64_t *ld_in_array,
                               const std::int64_t *ld_out_array,
                               std::int64_t group_count,
                               const std::int64_t *group_size,
                               const std::vector<sycl::event> &dependencies = {});
}
Input Parameters
- queue
The queue where the routine should be executed.
- trans_array
Array of size group_count. Each element i in the array specifies op(C), the transposition operation applied to the matrices C.
- m_array
Array of size group_count of number of rows of C on input. Each must be at least 0.
- n_array
Array of size group_count of number of columns of C on input. Each must be at least 0.
- alpha_array
Array of size group_count containing scaling factors for the matrix transpositions or copies.
- matrix_array_in_out
Array of size total_batch_count, holding pointers to arrays used to store C matrices.
- ld_in_array
Array of size group_count. The leading dimension of the input matrix C. If matrices are stored using column major layout, ld_in_array[i] must be at least m_array[i]. If matrices are stored using row major layout, ld_in_array[i] must be at least n_array[i]. Must be positive.
- ld_out_array
Array of size group_count. The leading dimension of the output matrix C. Each entry ld_out_array[i] must be positive and at least:
    m_array[i] if column major layout is used and C is not transposed
    m_array[i] if row major layout is used and C is transposed
    n_array[i] otherwise
- group_count
Number of groups. Must be at least 0.
- group_size
Array of size group_count. The element group_size[i] is the number of matrices in the group i. Each element in group_size must be at least 0.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters
- matrix_array_in_out
Output array of pointers to C matrices, overwritten by total_batch_count matrix transpose or copy operations of the form alpha*op(C).
Return Values
Output event to wait on to ensure computation is complete.
Strided API
Syntax
namespace oneapi::math::blas::column_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               oneapi::math::transpose trans,
                               std::int64_t m,
                               std::int64_t n,
                               value_or_pointer<T> alpha,
                               T *matrix_array_in_out,
                               std::int64_t ld_in,
                               std::int64_t ld_out,
                               std::int64_t stride,
                               std::int64_t batch_size,
                               const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::math::blas::row_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               oneapi::math::transpose trans,
                               std::int64_t m,
                               std::int64_t n,
                               value_or_pointer<T> alpha,
                               T *matrix_array_in_out,
                               std::int64_t ld_in,
                               std::int64_t ld_out,
                               std::int64_t stride,
                               std::int64_t batch_size,
                               const std::vector<sycl::event> &dependencies = {});
}
Input Parameters
- queue
The queue where the routine should be executed.
- trans
Specifies op(C), the transposition operation applied to the matrices C.
- m
Number of rows for each matrix C on input. Must be at least 0.
- n
Number of columns for each matrix C on input. Must be at least 0.
- alpha
Scaling factor for the matrix transpose or copy operation. See Scalar Arguments in BLAS for more details.
- matrix_array_in_out
Array holding the matrices C. Must have size at least stride*batch_size.
- ld_in
Leading dimension of the C matrices on input. If matrices are stored using column major layout, ld_in must be at least m. If matrices are stored using row major layout, ld_in must be at least n. Must be positive.
- ld_out
Leading dimension of the C matrices on output. If matrices are stored using column major layout, ld_out must be at least m if C is not transposed or at least n if C is transposed. If matrices are stored using row major layout, ld_out must be at least n if C is not transposed or at least m if C is transposed. Must be positive.
- stride
Stride between different C matrices within matrix_array_in_out.
                 C not transposed                                   C transposed
Column major     stride must be at least max(ld_in*m, ld_out*m).    stride must be at least max(ld_in*m, ld_out*n).
Row major        stride must be at least max(ld_in*n, ld_out*n).    stride must be at least max(ld_in*n, ld_out*m).
- batch_size
Specifies the number of matrices to transpose or copy.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
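To make the roles of stride and the leading dimensions concrete, the sketch below computes where a single element lives inside matrix_array_in_out. It assumes the usual BLAS addressing convention for column major and row major storage; Layout and element_offset are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the column_major and row_major namespaces.
enum class Layout { column_major, row_major };

// Offset, in elements from the start of matrix_array_in_out, of element
// (row, col) of matrix number `batch`, for matrices placed `stride` elements
// apart and stored with leading dimension `ld`.
std::int64_t element_offset(Layout layout, std::int64_t batch, std::int64_t stride,
                            std::int64_t row, std::int64_t col, std::int64_t ld) {
    const std::int64_t within = (layout == Layout::column_major)
                                    ? row + col * ld   // columns are contiguous
                                    : row * ld + col;  // rows are contiguous
    return batch * stride + within;
}
```

On input, pass ld_in with 0 <= row < m and 0 <= col < n. On output, pass ld_out; when C is transposed the result has n rows and m columns, so the index ranges swap.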
Output Parameters
- matrix_array_in_out
Output array, overwritten by batch_size matrix transposition or copy operations of the form alpha*op(C).
Return Values
Output event to wait on to ensure computation is complete.
Throws
This routine shall throw the following exceptions if the associated condition is detected. An implementation may throw additional implementation-specific exception(s) in case of error conditions not covered here.
oneapi::math::invalid_argument
oneapi::math::unsupported_device
oneapi::math::device_bad_alloc