LayerNormBackward#
LayerNormBackward performs the backward propagation of the LayerNorm operation.
The backward propagation computes \(\mathrm{diff\_src}(t, n, c)\), \(\mathrm{diff\_}\gamma(c)^*\), and \(\mathrm{diff\_}\beta(c)^*\) based on \(\mathrm{diff\_dst}(t, n, c)\), \(src(t, n, c)\), \(\mu(t, n)\), \(\sigma^2(t, n)\), \(\gamma(c)^*\), and \(\beta(c)^*\).
The tensors marked with an asterisk are used only when the operation is configured to use \(\gamma(c)\) and \(\beta(c)\).
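For reference, here is a sketch of the standard layer-normalization gradients, assuming \(\epsilon\) is the numerical-stability constant, \(C\) is the size of the normalized span, and \(\hat{s}\) denotes the normalized input; the library's internal arrangement of this computation may differ:

\[
\hat{s}(t, n, c) = \frac{src(t, n, c) - \mu(t, n)}{\sqrt{\sigma^2(t, n) + \epsilon}}
\]

\[
\mathrm{diff\_}\gamma(c) = \sum_{t, n} \mathrm{diff\_dst}(t, n, c) \, \hat{s}(t, n, c), \qquad \mathrm{diff\_}\beta(c) = \sum_{t, n} \mathrm{diff\_dst}(t, n, c)
\]

\[
\mathrm{diff\_src}(t, n, c) = \frac{1}{\sqrt{\sigma^2(t, n) + \epsilon}} \Bigl( \gamma(c) \, \mathrm{diff\_dst}(t, n, c) - \frac{1}{C} \sum_{c'} \gamma(c') \, \mathrm{diff\_dst}(t, n, c') - \frac{\hat{s}(t, n, c)}{C} \sum_{c'} \gamma(c') \, \mathrm{diff\_dst}(t, n, c') \, \hat{s}(t, n, c') \Bigr)
\]

with \(\gamma(c) = 1\) when the operation is not configured with affine parameters.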
Operation Attributes#
| Attribute Name | Description | Value Type | Supported Values | Required or Optional |
|---|---|---|---|---|
| begin_norm_axis | The axis at which to begin layer normalization; normalization spans from this axis to the last dimension. | s64 | [-r, r-1] where r = rank(src). -1 is the default. | Optional |
| use_affine | When set to True, this module has learnable per-element affine parameters. | bool | false, true (true is the default) | Optional |
| epsilon | The constant to improve numerical stability. | f32 | Arbitrary positive f32 value, 1e-5 (default) | Optional |
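As an illustration, a minimal sketch of setting these attributes through the oneDNN Graph C++ API; the op id, name, and attribute values below are arbitrary examples:

```cpp
#include <cstdint>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    // Create a LayerNormBackward op; id 0 and the verbose name are arbitrary.
    op ln_bwd(0, op::kind::LayerNormBackward, "ln_bwd");

    // All three attributes are optional; example values shown.
    ln_bwd.set_attr<int64_t>(op::attr::begin_norm_axis, -1); // normalize over the last axis
    ln_bwd.set_attr<bool>(op::attr::use_affine, true);       // gamma and beta will be supplied
    ln_bwd.set_attr<float>(op::attr::epsilon, 1e-5f);        // numerical-stability constant
    return 0;
}
```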
Execution Arguments#
The inputs and outputs must be provided in the index order below when constructing the operation.
Inputs#
| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | src | Required |
| 1 | diff_dst | Required |
| 2 | mean | Required |
| 3 | variance | Required |
| 4 | gamma | Optional |
| 5 | beta | Optional |
Note: gamma is the scale applied to the normalized value, and beta is the bias added to the scaled normalized value. Both are 1D tensors with the same span as src's channel axis, and both are required if the attribute use_affine is set to True.
Outputs#
| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | diff_src | Required |
| 1 | diff_gamma | Optional |
| 2 | diff_beta | Optional |
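A sketch of wiring the inputs and outputs in the documented index order via logical tensors; the tensor ids and shapes are illustrative, and error handling is omitted:

```cpp
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;
using lt = logical_tensor;

int main() {
    op ln_bwd(0, op::kind::LayerNormBackward, "ln_bwd");
    ln_bwd.set_attr<bool>(op::attr::use_affine, true);

    // Illustrative shapes: src is (T, N, C) = (4, 16, 64), normalized over C.
    lt::dims src_dims {4, 16, 64}, stat_dims {4, 16}, ch_dims {64};

    lt src      {1, lt::data_type::f32, src_dims,  lt::layout_type::strided};
    lt diff_dst {2, lt::data_type::f32, src_dims,  lt::layout_type::strided};
    lt mean     {3, lt::data_type::f32, stat_dims, lt::layout_type::strided};
    lt variance {4, lt::data_type::f32, stat_dims, lt::layout_type::strided};
    lt gamma    {5, lt::data_type::f32, ch_dims,   lt::layout_type::strided};
    lt beta     {6, lt::data_type::f32, ch_dims,   lt::layout_type::strided};

    lt diff_src   {7, lt::data_type::f32, src_dims, lt::layout_type::strided};
    lt diff_gamma {8, lt::data_type::f32, ch_dims,  lt::layout_type::strided};
    lt diff_beta  {9, lt::data_type::f32, ch_dims,  lt::layout_type::strided};

    // The order here must match the index tables above.
    ln_bwd.add_inputs({src, diff_dst, mean, variance, gamma, beta});
    ln_bwd.add_outputs({diff_src, diff_gamma, diff_beta});
    return 0;
}
```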
Supported Data Types#
The LayerNormBackward operation supports the following data type combinations.
| Src / Diff_dst / Diff_src | Gamma / Beta / Mean / Variance / Diff_gamma / Diff_beta |
|---|---|
| f32 | f32 |
| bf16 | f32, bf16 |
| f16 | f32 |
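For instance, following the bf16 row above, the data tensors can be bf16 while the statistics and parameters stay in f32, a common mixed-precision arrangement; a sketch with illustrative ids and shapes:

```cpp
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;
using lt = logical_tensor;

int main() {
    lt::dims src_dims {4, 16, 64}, stat_dims {4, 16};

    // bf16 for src/diff_dst/diff_src; f32 for mean/variance/gamma/beta.
    lt src  {1, lt::data_type::bf16, src_dims,  lt::layout_type::strided};
    lt mean {3, lt::data_type::f32,  stat_dims, lt::layout_type::strided};
    return 0;
}
```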