6.2. Activation quantization
Blueoil can quantize an activation function by passing a callable activation quantizer (activation_quantizer) and its keyword arguments (activation_quantizer_kwargs) to the network class.
# End of the network's base graph definition: a 1x1 convolution followed by
# global average pooling produces the per-class output.
self.block_last = conv2d("block_last", self.block_1, filters=self.num_classes, kernel_size=1,
                         activation=None, use_bias=True, is_debug=self.is_debug,
                         kernel_initializer=tf.compat.v1.random_normal_initializer(mean=0.0, stddev=0.01),
                         data_format=channel_data_format)

h = self.block_last.get_shape()[1].value
w = self.block_last.get_shape()[2].value
self.pool = tf.compat.v1.layers.average_pooling2d(name='global_average_pool', inputs=self.block_last,
                                                  pool_size=[h, w], padding='VALID', strides=1,
                                                  data_format=channel_data_format)
self.base_output = tf.reshape(self.pool, [-1, self.num_classes], name="pool_reshape")
return self.base_output
The quantized network class accepts the quantizers and their keyword arguments in its constructor:

def __init__(
        self,
        quantize_first_convolution=True,
        quantize_last_convolution=True,
        activation_quantizer=None,
        activation_quantizer_kwargs={},
        weight_quantizer=None,
        weight_quantizer_kwargs={},
        *args,
        **kwargs
):
    """
    Args:
        quantize_first_convolution(bool): use quantization in first conv.
        quantize_last_convolution(bool): use quantization in last conv.
        activation_quantizer (callable): activation quantizer.
        activation_quantizer_kwargs(dict): initialize kwargs for activation quantizer.
        weight_quantizer (callable): weight quantizer.
        weight_quantizer_kwargs(dict): initialize kwargs for weight quantizer.
    """
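In practice, these arguments usually come from the network configuration. A minimal sketch of such a config is shown below; the import path of linear_mid_tread_half_quantizer is an assumption and may differ between Blueoil versions:

from easydict import EasyDict

# Import path is an assumption; the quantizer function lives in Blueoil's quantizations module.
from blueoil.nn.quantizations import linear_mid_tread_half_quantizer

NETWORK = EasyDict()
NETWORK.ACTIVATION_QUANTIZER = linear_mid_tread_half_quantizer
NETWORK.ACTIVATION_QUANTIZER_KWARGS = {"bit": 2, "max_value": 2.0}

The network class then typically builds the quantizer as activation_quantizer(**activation_quantizer_kwargs) and applies it to activations inside the quantized network.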
6.2.1. Activation quantizer
Currently, Blueoil has only one activation function quantizer.
6.2.1.1. Linear mid tread half quantizer (LinearMidTreadHalfQuantizer)
This quantization creates a linear mid tread half quantizer.
If backward is provided, it will be used in backpropagation instead of the default backward below.
This quantization method is a variant of the DoReFa-Net [1] activation quantization; the difference from DoReFa-Net is that max_value can be changed.
Forward is:
\[\begin{split}\mathbf{X} & = \text{clip}\big(\mathbf{X}, 0, max\_value\big)\\
\mathbf{Y} & =
\begin{cases}
\mathbf{X}, & \text{if $bit$ is 32} \\
\frac{\text{round}\big(\frac{\mathbf{X}}{max\_value}
\cdot (2^{bit}-1)\big)}{2^{bit}-1} \cdot max\_value, & otherwise
\end{cases}\end{split}\]
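As a worked example of the forward rule, the following NumPy sketch (the helper name forward is made up for illustration) quantizes a few values with bit = 2 and max_value = 2.0, so the only representable outputs are 0, 2/3, 4/3, and 2:

import numpy as np

def forward(x, bit=2, max_value=2.0):
    # Clip to [0, max_value], then round onto 2^bit - 1 uniform steps.
    n = 2 ** bit - 1
    x = np.clip(x, 0.0, max_value)
    return np.round(x / max_value * n) / n * max_value

print(forward(np.array([-0.5, 0.4, 1.2, 3.0])))
# -> [0. 0.66666667 1.33333333 2.] (values snap to {0, 2/3, 4/3, 2})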
Default backward is:
\[\begin{split}\frac{\partial Loss}{\partial \mathbf{X}} =
\begin{cases}
\frac{\partial Loss}{\partial y}, & \text{if $0 < x < max\_value$}\\
0, & otherwise
\end{cases}\end{split}\]
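Putting the forward rule and the default backward rule together, a straight-through-estimator style re-implementation could look like the sketch below. This is an illustration only, not Blueoil's actual code; the function name linear_mid_tread_half_sketch is made up, and the bit = 32 pass-through branch is omitted:

import tensorflow as tf

def linear_mid_tread_half_sketch(bit=2, max_value=2.0):
    n = float(2 ** bit - 1)

    @tf.custom_gradient
    def quantize(x):
        # Forward: clip to [0, max_value], then snap onto 2^bit - 1 uniform levels.
        y = tf.round(tf.clip_by_value(x, 0.0, max_value) / max_value * n) / n * max_value

        def grad(dy):
            # Default backward: pass the incoming gradient only where 0 < x < max_value.
            inside = tf.cast(tf.logical_and(x > 0.0, x < max_value), dy.dtype)
            return dy * inside

        return y, grad

    return quantize

The returned callable plays the role of the activation quantizer described above: it quantizes activations in the forward pass, while the custom gradient implements the default backward shown above.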
Reference