gemv_batch#

Computes a group of gemv operations.

Description

The gemv_batch routines are batched versions of gemv, performing multiple gemv operations in a single call. Each gemv operations perform a scalar-matrix-vector product and add the result to a scalar-vector product.

gemv_batch supports the following precisions.

T

float

double

std::complex<float>

std::complex<double>

gemv_batch (Buffer Version)#

Description

The buffer version of gemv_batch supports only the strided API.

The strided API operation is defined as:

for i = 0 … batch_size – 1
    A is a matrix at offset i * stridea in a.
    X and Y are matrices at offset i * stridex, i * stridey, in x and y.
    Y := alpha * op(A) * X + beta * Y
end for

where:

op(A) is one of op(A) = A, or op(A) = AT, or op(A) = AH,

alpha and beta are scalars,

A is a matrix and X and Y are vectors,

The x and y buffers contain all the input matrices. The stride between vectors is given by the stride parameter. The total number of vectors in x and y buffers is given by the batch_size parameter.

Strided API

Syntax

namespace oneapi::mkl::blas::column_major {
    void gemv_batch(sycl::queue &queue,
                    onemkl::transpose trans,
                    std::int64_t m,
                    std::int64_t n,
                    T alpha,
                    sycl::buffer<T,1> &a,
                    std::int64_t lda,
                    std::int64_t stridea,
                    sycl::buffer<T,1> &x,
                    std::int64_t incx,
                    std::int64_t stridex,
                    T beta,
                    sycl::buffer<T,1> &y,
                    std::int64_t incy,
                    std::int64_t stridey,
                    std::int64_t batch_size)
}
namespace oneapi::mkl::blas::row_major {
    void gemv_batch(sycl::queue &queue,
                    onemkl::transpose trans,
                    std::int64_t m,
                    std::int64_t n,
                    T alpha,
                    sycl::buffer<T,1> &a,
                    std::int64_t lda,
                    std::int64_t stridea,
                    sycl::buffer<T,1> &x,
                    std::int64_t incx,
                    std::int64_t stridex,
                    T beta,
                    sycl::buffer<T,1> &y,
                    std::int64_t incy,
                    std::int64_t stridey,
                    std::int64_t batch_size)
}

Input Parameters

queue

The queue where the routine should be executed.

trans

Specifies op(A) the transposition operation applied to the matrices A. See oneMKL defined datatypes for more details.

m

Number of rows of op(A). Must be at least zero.

n

Number of columns of op(A). Must be at least zero.

alpha

Scaling factor for the matrix-vector products.

a

Buffer holding the input matrices A with size stridea * batch_size.

lda

The leading dimension of the matrices A. It must be positive and at least m if column major layout is used or at least n if row major layout is used.

stridea

Stride between different A matrices. Must be at least zero.

x

Buffer holding the input vectors X with size stridex * batch_size.

incx

The stride of the vector X. Must not be zero.

stridex

Stride between different consecutive X vectors, must be at least 0.

beta

Scaling factor for the vector Y.

y

Buffer holding input/output vectors Y with size stridey * batch_size.

incy

Stride between two consecutive elements of the Y vectors. Must not be zero.

stridey

Stride between two consecutive Y vectors. Must be at least (1 + (m - 1)*abs(incy)) if layout is column major or (1 + (n - 1)*abs(incy)) if row major layout is used.

batch_size

Specifies the number of matrix-vector operations to perform.

Output Parameters

y

Output overwritten by batch_size matrix-vector product operations of the form alpha * op(A) * X + beta * Y.

Throws

This routine shall throw the following exceptions if the associated condition is detected. An implementation may throw additional implementation-specific exception(s) in case of error conditions not covered here.

oneapi::mkl::invalid_argument

oneapi::mkl::unsupported_device

oneapi::mkl::host_bad_alloc

oneapi::mkl::device_bad_alloc

oneapi::mkl::unimplemented

gemv_batch (USM Version)#

Description

The USM version of gemv_batch supports the group API and strided API.

The group API operation is defined as:

idx = 0
for i = 0 … group_count – 1
    for j = 0 … group_size – 1
        A is an m x n matrix in a[idx]
        X and Y are vectors in x[idx] and y[idx]
        Y := alpha[i] * op(A) * X + beta[i] * Y
        idx = idx + 1
    end for
end for

The strided API operation is defined as

for i = 0 … batch_size – 1
    A is a matrix at offset i * stridea in a.
    X and Y are vectors at offset i * stridex, i * stridey in x and y.
    Y := alpha * op(A) * X + beta * Y
end for

where:

op(A) is one of op(A) = A, or op(A) = AT, or op(A) = AH,

alpha and beta are scalars,

A is a matrix and X and Y are vectors,

For group API, x and y arrays contain the pointers for all the input vectors. A array contains the pointers to all input matrices. The total number of vectors in x and y and matrices in A are given by:

\[total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i]\]

For strided API, x and y arrays contain all the input vectors. A array contains the pointers to all input matrices. The total number of vectors in x and y and matrices in A are given by the batch_size parameter.

Group API

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event gemv_batch(sycl::queue &queue,
                           const onemkl::transpose *trans,
                           const std::int64_t *m,
                           const std::int64_t *n,
                           const T *alpha,
                           const T **a,
                           const std::int64_t *lda,
                           const T **x,
                           const std::int64_t *incx,
                           const T *beta,
                           T **y,
                           const std::int64_t *incy,
                           std::int64_t group_count,
                           const std::int64_t *group_size,
                           const std::vector<sycl::event> &dependencies = {})
}
namespace oneapi::mkl::blas::row_major {
    sycl::event gemv_batch(sycl::queue &queue,
                           const onemkl::transpose *trans,
                           const std::int64_t *m,
                           const std::int64_t *n,
                           const T *alpha,
                           const T **a,
                           const std::int64_t *lda,
                           const T **x,
                           const std::int64_t *incx,
                           const T *beta,
                           T **y,
                           const std::int64_t *incy,
                           std::int64_t group_count,
                           const std::int64_t *group_size,
                           const std::vector<sycl::event> &dependencies = {})
}

Input Parameters

queue

The queue where the routine should be executed.

trans

Array of group_count onemkl::transpose values. trans[i] specifies the form of op(A) used in the matrix-vector product in group i. See oneMKL defined datatypes for more details.

m

Array of group_count integers. m[i] specifies the number of rows of op(A) for every matrix in group i. All entries must be at least zero.

n

Array of group_count integers. n[i] specifies the number of columns of op(A) for every matrix in group i. All entries must be at least zero.

alpha

Array of group_count scalar elements. alpha[i] specifies the scaling factor for every matrix-vector product in group i.

a

Array of pointers to input matrices A with size total_batch_count.

See Matrix Storage for more details.

lda

Array of group_count integers. lda[i] specifies the leading dimension of A for every matrix in group i. All entries must be positive and at least m if column major layout is used or at least n if row major layout is used.

x

Array of pointers to input vectors X with size total_batch_count.

See Matrix Storage for more details.

incx

Array of group_count integers. incx[i] specifies the stride of X for every vector in group i. Must not be zero.

beta

Array of group_count scalar elements. beta[i] specifies the scaling factor for vector Y for every vector in group i.

y

Array of pointers to input/output vectors Y with size total_batch_count.

See Matrix Storage for more details.

incy

Array of group_count integers. incy[i] specifies the leading dimension of Y for every vector in group i. Must not be zero.

group_count

Specifies the number of groups. Must be at least 0.

group_size

Array of group_count integers. group_size[i] specifies the number of matrix-vector products in group i. All entries must be at least 0.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

y

Overwritten by vector calculated by (alpha[i] * op(A) * X + beta[i] * Y) for group i.

Return Values

Output event to wait on to ensure computation is complete.

Strided API

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event gemv_batch(sycl::queue &queue,
                           onemkl::transpose trans,
                           std::int64_t m,
                           std::int64_t n,
                           T alpha,
                           const T *a,
                           std::int64_t lda,
                           std::int64_t stridea,
                           const T *x,
                           std::int64_t incx,
                           std::int64_t stridex,
                           T beta,
                           T *y,
                           std::int64_t incy,
                           std::int64_t stridey,
                           std::int64_t batch_size,
                           const std::vector<sycl::event> &dependencies = {})
}
namespace oneapi::mkl::blas::row_major {
    sycl::event gemv_batch(sycl::queue &queue,
                           onemkl::transpose trans,
                           std::int64_t m,
                           std::int64_t n,
                           T alpha,
                           const T *a,
                           std::int64_t lda,
                           std::int64_t stridea,
                           const T *x,
                           std::int64_t incx,
                           std::int64_t stridex,
                           T beta,
                           T *y,
                           std::int64_t incy,
                           std::int64_t stridey,
                           std::int64_t batch_size,
                           const std::vector<sycl::event> &dependencies = {})
}

Input Parameters

queue

The queue where the routine should be executed.

trans

Specifies op(A) the transposition operation applied to the matrices A. See oneMKL defined datatypes for more details.

m

Number of rows of op(A). Must be at least zero.

n

Number of columns of op(A). Must be at least zero.

alpha

Scaling factor for the matrix-vector products.

a

Pointer to the input matrices A with size stridea * batch_size.

lda

The leading dimension of the matrices A. It must be positive and at least m if column major layout is used or at least n if row major layout is used.

stridea

Stride between different A matrices. Must be at least zero.

x

Pointer to the input vectors X with size stridex * batch_size.

incx

Stride of the vector X. Must not be zero.

stridex

Stride between different consecutive X vectors, must be at least 0.

beta

Scaling factor for the vector Y.

y

Pointer to the input/output vectors Y with size stridey * batch_size.

incy

Stride between two consecutive elements of the y vectors. Must not be zero.

stridey

Stride between two consecutive Y vectors. Must be at least (1 + (m - 1)*abs(incy)) if layout is column major or (1 + (n - 1)*abs(incy)) if row major layout is used.

batch_size

Specifies the number of matrix-vector operations to perform.

Output Parameters

y

Output overwritten by batch_size matrix-vector product operations of the form alpha * op(A) * X + beta * Y.

Return Values

Output event to wait on to ensure computation is complete.

Throws

This routine shall throw the following exceptions if the associated condition is detected. An implementation may throw additional implementation-specific exception(s) in case of error conditions not covered here.

oneapi::mkl::invalid_argument

oneapi::mkl::unsupported_device

oneapi::mkl::host_bad_alloc

oneapi::mkl::device_bad_alloc

oneapi::mkl::unimplemented

Parent topic: BLAS-like Extensions