DynamicQuantize#
DynamicQuantize operation converts a f32 tensor to a quantized (s8 or u8) tensor. It supports both per-tensor and per-channel asymmetric linear quantization. The target quantized data type is specified via the data type of dst logical tensor. Rounding mode is library-implementation defined.
For per-tensor quantization:
For per-channel quantization, taking channel axis = 1 as an example:
where \(ic\) is the number of channels.
Operation Attributes#
|
Description |
Value Type |
|
|
---|---|---|---|---|
Specifies which de-quantization type is used |
string |
|
Optional |
|
Specifies dimension on which per-channel de-quantization is applied |
s64 |
A s64 value
in the
range of
[-r, r-1]
where r =
rank(src),
|
Optional |
Execution Arguments#
The inputs and outputs must be provided according to the below index order when constructing an operation.
Inputs#
Index |
Argument Name |
Required or Optional |
---|---|---|
0 |
|
Required |
1 |
|
Required |
2 |
|
Optional |
@note scales
is a f32 1D tensor to be applied to the quantization
formula. For qtype
= per-tensor
, there should be only one
element in the scales tensor. For qtype
= per-channel
, the
element number should be equal to the element number of src tensor along
the dimension axis.
@note zps
is a 1D tensor with offset values that map to zero. For
qtype
= per-tensor
, there should be only one element in the zps
tensor. For qtype
= per-channel
, the element number should be
equal to the element number of input tensor along the dimension axis. If
not specified, the library can assume the operator is symmetric
quantization and perform kernel optimization accordingly.
Outputs#
Index |
Argument Name |
Required or Optional |
---|---|---|
0 |
|
Required |
Supported Data Types#
DynamicQuantize operation supports the following data type combinations.
Src |
Scales |
Zps |
Dst |
---|---|---|---|
f32 |
f32 |
s8, u8, s32 |
s8 |
f32 |
f32 |
s8, u8, s32 |
u8 |