Conventions

The oneDNN specification relies on a set of standard naming conventions for variables. This section describes these conventions.

Variable (Tensor) Names

Neural network models consist of operations of the following form:

\[
\mathrm{dst} = f(\mathrm{src}, \mathrm{weights}),
\]

where dst and src are activation tensors, and weights are learnable tensors.

The backward propagation therefore consists of computing the gradients with respect to src and weights, respectively:

\[
\mathrm{diff\_src} = df_{\mathrm{src}}(\mathrm{diff\_dst}, \mathrm{src}, \mathrm{weights}, \mathrm{dst}),
\]

and

\[
\mathrm{diff\_weights} = df_{\mathrm{weights}}(\mathrm{diff\_dst}, \mathrm{src}, \mathrm{weights}, \mathrm{dst}).
\]

While oneDNN uses src, dst, and weights as generic names for the activations and learnable tensors, for a specific operation there might be commonly used and widely known specific names for these tensors. For instance, the convolution operation has a learnable tensor called bias. For usability reasons, oneDNN primitives use such names in initialization and other functions.
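For example, when a primitive is executed, these tensor names appear directly as execution-argument tags. The following is a minimal sketch assuming the oneDNN C++ API (dnnl.hpp); the helper function run_conv_forward and the primitive and memory objects it receives are hypothetical and assumed to have been created elsewhere:

```cpp
#include <unordered_map>
#include "dnnl.hpp"

// Hypothetical helper: feeds an already-created convolution forward primitive.
// The generic tensor names (src, weights, dst) and the convolution-specific
// bias map one-to-one to DNNL_ARG_* execution-argument tags.
void run_conv_forward(dnnl::primitive &conv, dnnl::stream &strm,
                      dnnl::memory &src_mem, dnnl::memory &weights_mem,
                      dnnl::memory &bias_mem, dnnl::memory &dst_mem) {
    std::unordered_map<int, dnnl::memory> args{
            {DNNL_ARG_SRC, src_mem},         // src: source activations
            {DNNL_ARG_WEIGHTS, weights_mem}, // weights: learnable tensor
            {DNNL_ARG_BIAS, bias_mem},       // bias: learnable tensor specific to convolution
            {DNNL_ARG_DST, dst_mem}};        // dst: destination activations
    conv.execute(strm, args);
    strm.wait();
}
```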

oneDNN uses the following commonly used notations for tensors:

Name            Meaning
--------------  -------------------------------------------------------------
src             Source tensor
dst             Destination tensor
weights         Weights tensor
bias            Bias tensor (used in convolution, inner product, and other primitives)
scale_shift     Scale and shift tensors (used in batch normalization and layer normalization primitives)
workspace       Workspace tensor that carries additional information from the forward propagation to the backward propagation
scratchpad      Temporary tensor that is required to store intermediate results
diff_src        Gradient tensor with respect to the source
diff_dst        Gradient tensor with respect to the destination
diff_weights    Gradient tensor with respect to the weights
diff_bias       Gradient tensor with respect to the bias
diff_scale      Gradient tensor with respect to the scale
diff_shift      Gradient tensor with respect to the shift
*_layer         RNN layer data or weights tensors
*_iter          RNN recurrent data or weights tensors
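The diff_* and auxiliary names follow the same pattern on the backward pass. As a second sketch under the same assumptions (a hypothetical helper around an already-created convolution backward-weights primitive):

```cpp
#include <unordered_map>
#include "dnnl.hpp"

// Hypothetical helper: feeds an already-created convolution backward-weights
// primitive. The diff_* gradients and the scratchpad temporary from the table
// above again appear as DNNL_ARG_* execution-argument tags.
void run_conv_backward_weights(dnnl::primitive &conv_bwd, dnnl::stream &strm,
                               dnnl::memory &src_mem, dnnl::memory &diff_dst_mem,
                               dnnl::memory &diff_weights_mem,
                               dnnl::memory &diff_bias_mem,
                               dnnl::memory &scratchpad_mem) {
    std::unordered_map<int, dnnl::memory> args{
            {DNNL_ARG_SRC, src_mem},                   // src saved from the forward pass
            {DNNL_ARG_DIFF_DST, diff_dst_mem},         // gradient with respect to dst
            {DNNL_ARG_DIFF_WEIGHTS, diff_weights_mem}, // gradient with respect to weights
            {DNNL_ARG_DIFF_BIAS, diff_bias_mem},       // gradient with respect to bias
            {DNNL_ARG_SCRATCHPAD, scratchpad_mem}};    // temporary storage (user-managed scratchpad)
    conv_bwd.execute(strm, args);
    strm.wait();
}
```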

RNN-Specific Notation

The following notations are used when describing RNN primitives.

Name           Semantics
-------------  ------------------------------------
·              matrix multiply operator
*              elementwise multiplication operator
W              input weights
U              recurrent weights
T              transposition
B              bias
h              hidden state
a              intermediate value
x              input
t              timestamp index
l              layer index
activation     tanh, relu, logistic
c              cell state
c~             candidate state
i              input gate
f              forget gate
o              output gate
u              update gate
r              reset gate
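As an illustration of how this notation composes, the update performed by a vanilla RNN cell at timestamp t can be sketched as follows (a simplified form that omits layer indices and layout details):

\[
a_t = W \cdot x_t + U \cdot h_{t-1} + B, \qquad h_t = \mathrm{activation}(a_t),
\]

where W multiplies the layer input, U multiplies the previous hidden state, and activation is one of tanh, relu, or logistic.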