Memory Formats¶
In oneDNN, a memory format is the way a multidimensional tensor is stored in a 1-dimensional linear memory address space. oneDNN specifies two kinds of memory formats: plain formats, which correspond to traditional multidimensional arrays, and optimized formats, which are completely opaque.
Plain Memory Formats¶
Plain memory formats describe how multidimensional tensors are laid out in memory using an array of \(\operatorname{dimensions}\) and an array of \(\operatorname{strides}\), both of which have length equal to the rank of the tensor. In oneDNN the order of dimensions is fixed, and different dimensions can have a certain canonical interpretation depending on the primitive. For example, for CNN primitives the order for activation tensors is \(\{N, C, ..., D, H, W\}\), where \(N\) stands for minibatch (or batch size), \(C\) stands for channels, and \(D\), \(H\), and \(W\) stand for the image spatial dimensions: depth, height, and width respectively. Spatial dimensions may be omitted only in the order from outermost to innermost; for example, it is not possible to omit \(H\) when \(D\) is present, and it is never possible to omit \(W\). The canonical interpretation is documented for each primitive. This means that the \(\operatorname{strides}\) array plays an important role in defining the order in which different dimensions are laid out in memory. Moreover, the \(\operatorname{strides}\) need to agree with the \(\operatorname{dimensions}\).
More precisely, let \(T\) be a tensor of rank \(n\) and let \(\sigma\) be the permutation of the \(\operatorname{strides}\) array that sorts it, i.e. \(\operatorname{strides}[i] \geq \operatorname{strides}[j]\) if \(\sigma(i) < \sigma(j)\) for all \(0 \leq i, j < n\). Then the following must hold:

\[\operatorname{strides}[\sigma^{-1}(i)] \geq \operatorname{dimensions}[\sigma^{-1}(i+1)] \cdot \operatorname{strides}[\sigma^{-1}(i+1)] \quad \text{for } 0 \leq i < n - 1.\]

For an element with coordinates \((i_0, \ldots, i_{n-1})\) such that \(0 \leq i_j < \operatorname{dimensions}[j]\) for \(0 \leq j < n\), its offset in memory is computed as:

\[\operatorname{offset}(i_0, \ldots, i_{n-1}) = \operatorname{offset}_0 + \sum_{j=0}^{n-1} i_j \cdot \operatorname{strides}[j].\]
Here \(\operatorname{offset}_0\) is the offset from the parent memory, and it is non-zero only for submemory memory descriptors created using dnnl::memory::desc::submemory_desc(). Submemory memory descriptors inherit strides from the parent memory descriptor. Their main purpose is to express in-place concat operations.
As an example, consider an \(M \times N\) matrix \(A\) (\(M\) rows times \(N\) columns). Regardless of whether \(A\) is stored transposed or not, \(\operatorname{dimensions}_A = \{M, N\}\). However, \(\operatorname{strides}_A = \{LDA, 1\}\) if it is not transposed and \(\operatorname{strides}_A = \{1, LDA\}\) if it is, where \(LDA\) is such that \(LDA \geq N\) if \(A\) is not transposed, and \(LDA \geq M\) if it is. This also shows that \(A\) does not have to be stored densely in memory.
Note
The example above shows that oneDNN assumes data to be stored in row-major order.
Code example:
int M, N;
dnnl::memory::dims dims {M, N}; // Dimensions always stay the same

// Non-transposed matrix: row-major, dense (LDA == N)
dnnl::memory::dims strides_non_transposed {N, 1};
dnnl::memory::desc A_non_transposed {dims, dnnl::memory::data_type::f32,
        strides_non_transposed};

// Transposed matrix: column-major, dense (LDA == M)
dnnl::memory::dims strides_transposed {1, M};
dnnl::memory::desc A_transposed {dims, dnnl::memory::data_type::f32,
        strides_transposed};
Optimized Format ‘any’¶
Another kind of format that oneDNN supports is an opaque, optimized memory format that cannot be created directly from \(\operatorname{strides}\) and \(\operatorname{dimensions}\) arrays. A memory descriptor for an optimized memory format can only be created by passing any when creating certain operation descriptors, using those to create the corresponding primitive descriptors, and then querying the primitive descriptors for memory descriptors. Data in a plain memory format should then be reordered into the optimized format before computations. Since reorders are expensive, the optimized memory format needs to be propagated through the computation graph.
Optimized formats can employ padding, blocking, and other data transformations to keep data in a layout optimal for a certain architecture. This means that, in general, operations like dnnl::memory::desc::permute_axes() or dnnl::memory::desc::submemory_desc() may fail. It is in general incorrect to use the product of dimension sizes to calculate the amount of memory required to store data: dnnl::memory::desc::get_size() must be used instead.
Memory Format Propagation¶
Memory format propagation is one of the central notions that needs to be well understood to use oneDNN correctly.
Convolution and inner product primitives choose a memory format when you create them with the placeholder memory format any for input or output. The memory format chosen depends on factors such as the hardware and the convolution parameters. Using the placeholder any memory format is the recommended practice for convolutions, since they are the most compute-intensive operations in most topologies where they are present.
Other primitives, such as Elementwise, LRN, batch normalization, and others, should on forward propagation use the same memory format as the preceding layer, thus propagating the memory format through multiple oneDNN primitives. This avoids unnecessary reorders, which may be expensive and should be avoided unless a compute-intensive primitive requires a different format. For performance reasons, backward computations of such primitives require a memory format consistent with that of the corresponding forward computations. Hence, when initializing these primitives for backward computations you should use the dnnl::memory::format_tag::any memory format tag as well.
Below is a short summary of when to use and not to use the memory format any during operation descriptor initialization:
Primitive Kinds | Forward Propagation | Backward Propagation | No Propagation
---|---|---|---
Compute intensive: (De-)convolution, Inner product, RNN | Use any | Use any | N/A
Memory-bandwidth limited: Pooling, Layer and Batch Normalization, Local Response Normalization, Elementwise, Shuffle, Softmax | Use memory format from preceding layer for source tensors, and any for destination tensors | Use any | N/A
Memory-bandwidth limited: Reorder, Concat, Sum, Binary | N/A | N/A | Use memory format from preceding layer for source tensors, and any for destination tensors
Additional format synchronization is required between forward and backward propagation when running training workloads. This is achieved via the hint_pd arguments of primitive descriptor constructors for primitives that implement backward propagation.
API¶
enum dnnl::memory::format_tag¶
Memory format tag specification.
Memory format tags can be further divided into two categories:

- Domain-agnostic names, i.e. names that do not depend on the tensor usage in the specific primitive. These names use letters from a to f to denote logical dimensions and form the order in which the dimensions are laid out in memory. For example, dnnl::memory::format_tag::ab is used to denote a 2D tensor where the second logical dimension (denoted as b) is the innermost, i.e. has stride = 1, and the first logical dimension (a) is laid out in memory with stride equal to the size of the second dimension. On the other hand, dnnl::memory::format_tag::ba is the transposed version of the same tensor: the outermost dimension (a) becomes the innermost one.
- Domain-specific names, i.e. names that make sense only in the context of a certain domain, such as CNN. These names are aliases to the corresponding domain-agnostic tags and are used mostly for convenience. For example, dnnl::memory::format_tag::nc is used to denote a 2D CNN activations tensor memory format, where the channels dimension is the innermost one and the batch dimension is the outermost one. Moreover, dnnl::memory::format_tag::nc is an alias for dnnl::memory::format_tag::ab, because for CNN primitives the logical dimensions of activations tensors come in the order: batch, channels, spatial. In other words, batch corresponds to the first logical dimension (a), and channels correspond to the second one (b).
The following domain-specific notation applies to memory format tags:

- 'n' denotes the mini-batch dimension
- 'c' denotes a channels dimension
- When there are multiple channel dimensions (for example, in convolution weights tensors), 'i' and 'o' denote dimensions of input and output channels
- 'g' denotes a groups dimension for convolution weights
- 'd', 'h', and 'w' denote spatial depth, height, and width respectively
Values:
- enumerator undef: Undefined memory format tag.
- enumerator any: Placeholder memory format tag. Used to instruct the primitive to select a format automatically.
- enumerator a: plain 1D tensor
- enumerator ab: plain 2D tensor
- enumerator ba: permuted 2D tensor
- enumerator abc: plain 3D tensor
- enumerator acb: permuted 3D tensor
- enumerator bac: permuted 3D tensor
- enumerator bca: permuted 3D tensor
- enumerator cba: permuted 3D tensor
- enumerator abcd: plain 4D tensor
- enumerator abdc: permuted 4D tensor
- enumerator acdb: permuted 4D tensor
- enumerator bacd: permuted 4D tensor
- enumerator bcda: permuted 4D tensor
- enumerator cdba: permuted 4D tensor
- enumerator dcab: permuted 4D tensor
- enumerator abcde: plain 5D tensor
- enumerator abdec: permuted 5D tensor
- enumerator acbde: permuted 5D tensor
- enumerator acdeb: permuted 5D tensor
- enumerator bacde: permuted 5D tensor
- enumerator bcdea: permuted 5D tensor
- enumerator cdeba: permuted 5D tensor
- enumerator decab: permuted 5D tensor
- enumerator abcdef: plain 6D tensor
- enumerator acbdef: permuted 6D tensor
- enumerator defcab: permuted 6D tensor
- enumerator x: 1D tensor; an alias for dnnl::memory::format_tag::a
- enumerator nc: 2D CNN activations tensor; an alias for dnnl::memory::format_tag::ab
- enumerator cn: 2D CNN activations tensor; an alias for dnnl::memory::format_tag::ba
- enumerator tn: 2D RNN statistics tensor; an alias for dnnl::memory::format_tag::ab
- enumerator nt: 2D RNN statistics tensor; an alias for dnnl::memory::format_tag::ba
- enumerator ncw: 3D CNN activations tensor; an alias for dnnl::memory::format_tag::abc
- enumerator nwc: 3D CNN activations tensor; an alias for dnnl::memory::format_tag::acb
- enumerator nchw: 4D CNN activations tensor; an alias for dnnl::memory::format_tag::abcd
- enumerator nhwc: 4D CNN activations tensor; an alias for dnnl::memory::format_tag::acdb
- enumerator chwn: 4D CNN activations tensor; an alias for dnnl::memory::format_tag::bcda
- enumerator ncdhw: 5D CNN activations tensor; an alias for dnnl::memory::format_tag::abcde
- enumerator ndhwc: 5D CNN activations tensor; an alias for dnnl::memory::format_tag::acdeb
- enumerator oi: 2D CNN weights tensor; an alias for dnnl::memory::format_tag::ab
- enumerator io: 2D CNN weights tensor; an alias for dnnl::memory::format_tag::ba
- enumerator oiw: 3D CNN weights tensor; an alias for dnnl::memory::format_tag::abc
- enumerator owi: 3D CNN weights tensor; an alias for dnnl::memory::format_tag::acb
- enumerator wio: 3D CNN weights tensor; an alias for dnnl::memory::format_tag::cba
- enumerator iwo: 3D CNN weights tensor; an alias for dnnl::memory::format_tag::bca
- enumerator oihw: 4D CNN weights tensor; an alias for dnnl::memory::format_tag::abcd
- enumerator hwio: 4D CNN weights tensor; an alias for dnnl::memory::format_tag::cdba
- enumerator ohwi: 4D CNN weights tensor; an alias for dnnl::memory::format_tag::acdb
- enumerator ihwo: 4D CNN weights tensor; an alias for dnnl::memory::format_tag::bcda
- enumerator iohw: 4D CNN weights tensor; an alias for dnnl::memory::format_tag::bacd
- enumerator oidhw: 5D CNN weights tensor; an alias for dnnl::memory::format_tag::abcde
- enumerator dhwio: 5D CNN weights tensor; an alias for dnnl::memory::format_tag::cdeba
- enumerator odhwi: 5D CNN weights tensor; an alias for dnnl::memory::format_tag::acdeb
- enumerator iodhw: 5D CNN weights tensor; an alias for dnnl::memory::format_tag::bacde
- enumerator idhwo: 5D CNN weights tensor; an alias for dnnl::memory::format_tag::bcdea
- enumerator goiw: 4D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::abcd
- enumerator wigo: 4D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::dcab
- enumerator goihw: 5D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::abcde
- enumerator hwigo: 5D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::decab
- enumerator giohw: 5D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::acbde
- enumerator goidhw: 6D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::abcdef
- enumerator giodhw: 6D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::acbdef
- enumerator dhwigo: 6D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::defcab
- enumerator tnc: 3D RNN data tensor in the format (seq_length, batch, input channels).
- enumerator ntc: 3D RNN data tensor in the format (batch, seq_length, input channels).
- enumerator ldnc: 4D RNN states tensor in the format (num_layers, num_directions, batch, state channels).
- enumerator ldigo: 5D RNN weights tensor in the format (num_layers, num_directions, input_channels, num_gates, output_channels). For LSTM cells, the gate order is input, forget, candidate, and output gate. For GRU cells, the gate order is update, reset, and output gate.
- enumerator ldgoi: 5D RNN weights tensor in the format (num_layers, num_directions, num_gates, output_channels, input_channels). For LSTM cells, the gate order is input, forget, candidate, and output gate. For GRU cells, the gate order is update, reset, and output gate.
- enumerator ldio: 4D LSTM projection tensor in the format (num_layers, num_directions, num_channels_in_hidden_state, num_channels_in_recurrent_projection).
- enumerator ldoi: 4D LSTM projection tensor in the format (num_layers, num_directions, num_channels_in_recurrent_projection, num_channels_in_hidden_state).
- enumerator ldgo: 4D RNN bias tensor in the format (num_layers, num_directions, num_gates, output_channels). For LSTM cells, the gate order is input, forget, candidate, and output gate. For GRU cells, the gate order is update, reset, and output gate.