Conventions¶
The oneDNN specification relies on a set of standard naming conventions for variables. This section describes these conventions.
Variable (Tensor) Names¶
Neural network models consist of operations of the following form:

    dst = f(src, weights),

where `dst` and `src` are activation tensors, and `weights` are learnable tensors.

The backward propagation therefore consists in computing the gradients with respect to `src` and `weights` respectively:

    diff_src = df_src(diff_dst, src, weights, dst),

and

    diff_weights = df_weights(diff_dst, src, weights, dst).
While oneDNN uses `src`, `dst`, and `weights` as generic names for the activations and learnable tensors, a specific operation may have commonly used, widely known names for some of these tensors. For instance, the convolution operation has a learnable tensor called `bias`. For usability reasons, oneDNN primitives use such names in initialization and other functions.
oneDNN uses the following commonly used notations for tensors:
| Name | Meaning |
|---|---|
| `src` | Source tensor |
| `dst` | Destination tensor |
| `weights` | Weights tensor |
| `bias` | Bias tensor (used in convolution, inner product, and other primitives) |
| `scale_shift` | Scale and shift tensors (used in Batch Normalization and Layer Normalization primitives) |
| `workspace` | Workspace tensor that carries additional information from the forward propagation to the backward propagation |
| `scratchpad` | Temporary tensor that is required to store the intermediate results |
| `diff_src` | Gradient tensor with respect to the source |
| `diff_dst` | Gradient tensor with respect to the destination |
| `diff_weights` | Gradient tensor with respect to the weights |
| `diff_bias` | Gradient tensor with respect to the bias |
| `diff_scale_shift` | Gradient tensor with respect to the scale and shift |
| `*_layer` | RNN layer data or weights tensors |
| `*_iter` | RNN recurrent data or weights tensors |
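These names also surface directly in the programming model: primitive descriptors expose queries such as `src_desc()` and `weights_desc()`, and execution arguments are tagged with macros such as `DNNL_ARG_SRC`. The snippet below is a minimal sketch, assuming the oneDNN v3.x C++ API; the tensor shapes and the choice of a direct convolution are illustrative only.

```cpp
#include <oneapi/dnnl/dnnl.hpp>

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Tensor descriptors follow the naming conventions: src, weights, bias, dst.
    memory::desc src_md({2, 16, 13, 13}, memory::data_type::f32, memory::format_tag::nchw);
    memory::desc weights_md({32, 16, 3, 3}, memory::data_type::f32, memory::format_tag::oihw);
    memory::desc bias_md({32}, memory::data_type::f32, memory::format_tag::x);
    memory::desc dst_md({2, 32, 11, 11}, memory::data_type::f32, memory::format_tag::nchw);

    // Forward convolution: in addition to the generic src/weights/dst, this
    // primitive has a learnable tensor with the operation-specific name `bias`.
    convolution_forward::primitive_desc conv_pd(eng, prop_kind::forward_inference,
            algorithm::convolution_direct, src_md, weights_md, bias_md, dst_md,
            /*strides=*/{1, 1}, /*padding_l=*/{0, 0}, /*padding_r=*/{0, 0});

    memory src_mem(conv_pd.src_desc(), eng);
    memory weights_mem(conv_pd.weights_desc(), eng);
    memory bias_mem(conv_pd.bias_desc(), eng);
    memory dst_mem(conv_pd.dst_desc(), eng);

    // Execution arguments are tagged with the same names (DNNL_ARG_*).
    convolution_forward(conv_pd).execute(strm,
            {{DNNL_ARG_SRC, src_mem}, {DNNL_ARG_WEIGHTS, weights_mem},
             {DNNL_ARG_BIAS, bias_mem}, {DNNL_ARG_DST, dst_mem}});
    strm.wait();
    return 0;
}
```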
RNN-Specific Notation¶
The following notations are used when describing RNN primitives.
| Name | Semantics |
|---|---|
| ⋅ | matrix multiply operator |
| ∗ | elementwise multiplication operator |
| W | input weights |
| U | recurrent weights |
| ◻T | transposition |
| B | bias |
| h | hidden state |
| a | intermediate value |
| x | input |
| ◻t | timestamp index |
| ◻l | layer index |
| activation | tanh, relu, logistic |
| c | cell state |
| c̃ | candidate state |
| i | input gate |
| f | forget gate |
| o | output gate |
| u | update gate |
| r | reset gate |
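As an illustration of how this notation composes (using the textbook Vanilla RNN cell, not the exact definition of any particular primitive), a single-layer cell at timestamp t reads:

    a_t = W ⋅ x_t + U ⋅ h_(t-1) + B
    h_t = activation(a_t)

Here a_t is the intermediate value, h_(t-1) is the hidden state from the previous timestamp, and W, U, and B are the input weights, recurrent weights, and bias from the table above.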