LayerNorm
LayerNorm performs a layer normalization operation on the src tensor.

The LayerNorm operation performs normalization from begin_norm_axis to the last dimension of the data tensor. It is defined by the following formula, which is the same as in Layer Normalization:

\(\dst(t, n, c) = \gamma(c) \cdot \frac{\src(t, n, c) - \mu(t, n)}{\sqrt{\sigma^2(t, n) + \epsilon}} + \beta(c)\)

where
\(\gamma(c), \beta(c)\) are optional scale and shift for a channel (see the use_affine attribute),
\(\mu(t, n), \sigma^2(t, n)\) are mean and variance (see the keep_stats attribute),
\(\epsilon\) is a constant to improve numerical stability.
Mean and variance are computed at runtime or provided by a user. When mean and variance are computed at runtime, the following formulas are used:
\(\mu(t, n) = \frac{1}{C} \sum\limits_{c} \src(t, n, c)\),
\(\sigma^2(t, n) = \frac{1}{C} \sum\limits_{c} (\src(t, n, c) - \mu(t, n))^2\).
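To make the formulas concrete, here is a minimal scalar C++ reference of the runtime-statistics path (a plain sketch, not the oneDNN implementation; the function name layernorm_ref and the flattened rows × C layout are assumptions for this example):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Plain reference of the LayerNorm formulas above. Each of the `rows`
// (t, n) pairs is normalized over its C channels, with statistics
// computed at runtime and the optional per-channel affine applied.
void layernorm_ref(const std::vector<float> &src, std::vector<float> &dst,
                   const std::vector<float> &gamma, const std::vector<float> &beta,
                   std::size_t rows, std::size_t C, float epsilon = 1e-5f) {
    for (std::size_t r = 0; r < rows; ++r) {
        const float *x = src.data() + r * C;
        float *y = dst.data() + r * C;

        // mu(t, n) = 1/C * sum_c src(t, n, c)
        float mu = 0.f;
        for (std::size_t c = 0; c < C; ++c) mu += x[c];
        mu /= static_cast<float>(C);

        // sigma^2(t, n) = 1/C * sum_c (src(t, n, c) - mu(t, n))^2
        float var = 0.f;
        for (std::size_t c = 0; c < C; ++c) var += (x[c] - mu) * (x[c] - mu);
        var /= static_cast<float>(C);

        // dst(t, n, c) = gamma(c) * (src - mu) / sqrt(sigma^2 + eps) + beta(c)
        const float inv = 1.f / std::sqrt(var + epsilon);
        for (std::size_t c = 0; c < C; ++c)
            y[c] = gamma[c] * (x[c] - mu) * inv + beta[c];
    }
}
```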
Operation Attributes

Attribute Name | Description | Value Type | Supported Values | Required or Optional
---|---|---|---|---
keep_stats | Indicate whether to output mean and variance, which can later be passed to the backward op | bool | false, true (default) | Optional
begin_norm_axis | The axis at which to begin layer normalization; normalization spans from begin_norm_axis to the last dimension | s64 | [-r, r-1], where r = rank(src); -1 is the default | Optional
use_affine | When set to True, this module has learnable per-element affine parameters | bool | false, true (default) | Optional
epsilon | The constant to improve numerical stability | f32 | Arbitrary positive f32 value; 1e-5 (default) | Optional
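As an illustration of how these attributes map onto code, the following is a minimal sketch assuming the oneDNN Graph C++ API (dnnl::graph); the op id, debug name, and attribute values are arbitrary example choices:

```cpp
#include <oneapi/dnnl/dnnl_graph.hpp>
using namespace dnnl::graph;

// Create a LayerNorm op and set the attributes from the table above.
op make_layernorm() {
    op lnorm(0, op::kind::LayerNorm, "lnorm");
    lnorm.set_attr<bool>(op::attr::keep_stats, true);       // also produce mean/variance
    lnorm.set_attr<int64_t>(op::attr::begin_norm_axis, -1); // normalize over the last axis
    lnorm.set_attr<bool>(op::attr::use_affine, true);       // gamma/beta inputs expected
    lnorm.set_attr<float>(op::attr::epsilon, 1e-5f);        // numerical-stability constant
    return lnorm;
}
```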
Execution Arguments

The inputs and outputs must be provided according to the index order below when constructing an operation.
Inputs

Index | Argument Name | Required or Optional
---|---|---
0 | src | Required
1 | gamma | Optional
2 | beta | Optional
@note gamma is the scaling for the normalized value, and beta is the bias added to the scaled normalized value. Both are 1D tensors with the same span as src’s channel axis, and both are required if the attribute use_affine is set to True.
Outputs

Index | Argument Name | Required or Optional
---|---|---
0 | dst | Required
1 | mean | Optional
2 | variance | Optional
@note Both mean and variance are required if the attribute keep_stats is set to True.
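Tying the tables together, below is a hedged end-to-end sketch of constructing the op with its input and output arguments in the required index order, again assuming the oneDNN Graph C++ API; all tensor ids, shapes, and the CPU engine kind are arbitrary example choices:

```cpp
#include <oneapi/dnnl/dnnl_graph.hpp>
using namespace dnnl::graph;
using dt = logical_tensor::data_type;
using lt = logical_tensor::layout_type;

int main() {
    // Example src of shape (T, N, C) = (4, 2, 8); gamma/beta span the channel
    // axis, and mean/variance hold one statistic per (t, n) pair.
    logical_tensor src  {0, dt::f32, {4, 2, 8}, lt::strided};
    logical_tensor gamma{1, dt::f32, {8},       lt::strided};
    logical_tensor beta {2, dt::f32, {8},       lt::strided};
    logical_tensor dst  {3, dt::f32, {4, 2, 8}, lt::strided};
    logical_tensor mean {4, dt::f32, {4, 2},    lt::strided};
    logical_tensor var  {5, dt::f32, {4, 2},    lt::strided};

    op lnorm(0, op::kind::LayerNorm, "lnorm");
    lnorm.set_attr<bool>(op::attr::keep_stats, true); // mean/variance are produced
    lnorm.set_attr<bool>(op::attr::use_affine, true); // gamma/beta are consumed

    lnorm.add_input(src);    // index 0: src (required)
    lnorm.add_input(gamma);  // index 1: gamma (optional)
    lnorm.add_input(beta);   // index 2: beta (optional)
    lnorm.add_output(dst);   // index 0: dst (required)
    lnorm.add_output(mean);  // index 1: mean (optional)
    lnorm.add_output(var);   // index 2: variance (optional)

    graph g(engine::kind::cpu);
    g.add_op(lnorm);
    g.finalize();
    auto partitions = g.get_partitions(); // ready for compilation and execution
    return 0;
}
```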
Supported Data Types

The LayerNorm operation supports the following data type combinations.

Src / Dst | Gamma / Beta / Mean / Variance
---|---
f32 | f32
bf16 | f32, bf16
f16 | f32
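For instance, per the second row, a bf16 LayerNorm may keep its affine parameters and statistics in f32. A brief sketch of the corresponding logical tensors, reusing the dt/lt aliases and the arbitrary example ids and shapes from the sketch above:

```cpp
// bf16 src/dst paired with f32 gamma/beta/mean/variance (second row above).
logical_tensor src_bf16 {10, dt::bf16, {4, 2, 8}, lt::strided};
logical_tensor dst_bf16 {11, dt::bf16, {4, 2, 8}, lt::strided};
logical_tensor gamma_f32{12, dt::f32,  {8},       lt::strided};
logical_tensor mean_f32 {13, dt::f32,  {4, 2},    lt::strided};
```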