Semantic Segmentation - DeepLab V3+
Semantic Segmentation
Semantic segmentation assigns a class label to every pixel of an image, partitioning
it into regions that belong to different objects or classes. Deep learning methods have
improved results in this field remarkably over the past few years. This short article summarises
DeepLab V3+, an elegant extension of DeepLab V3 proposed by the same authors (Chen et al.).
Intuition
ASPP (Atrous Spatial Pyramid Pooling) was previously used to extract rich multi-scale features from images by applying atrous convolutions at several dilation rates in parallel. The authors of DeepLab V3+ combine the ASPP module with the good old encoder-decoder architecture with skip connections, which recovers finer detail in the predictions, particularly along object boundaries.
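To make the multi-scale idea concrete, here is a minimal pure-Python sketch of atrous convolution in one dimension. It is an illustration under simplifying assumptions, not the authors' implementation: the function names (`effective_kernel_size`, `atrous_conv1d`, `aspp_1d`) are hypothetical, the same fixed kernel is reused for every rate (a real ASPP module learns separate 2-D kernels per branch and also includes a 1x1 branch and image-level pooling), and zero padding is assumed at the borders.

```python
def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel whose taps are spaced `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """1-D atrous convolution with zero padding (output length == input length)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - half) * rate      # taps are `rate` samples apart
            if 0 <= idx < len(signal):       # out-of-range taps read zero padding
                acc += weight * signal[idx]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates):
    """Apply the kernel at several rates in parallel: one output row per rate."""
    return [atrous_conv1d(signal, kernel, rate) for rate in rates]


signal = [1, 2, 3, 4, 5]
branches = aspp_1d(signal, kernel=[1, 1, 1], rates=(1, 2))
for rate, row in zip((1, 2), branches):
    print(f"rate={rate}: {row}")

# A 3-tap kernel covers a wider span as the rate grows, with no extra weights.
for rate in (6, 12, 18):
    print(f"rate={rate}: effective size {effective_kernel_size(3, rate)}")
```

Each branch has the same number of weights but sees a different amount of context, which is what lets the pyramid capture objects at several scales. The rates 6, 12 and 18 in the last loop are the ones reported for the ASPP module in the DeepLab V3 paper at output stride 16.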
Architecture
[Figure: DeepLab V3+ architecture]
Here are the key features of this architecture:
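- Atrous Depthwise Convolution: The depthwise convolution is given a dilation rate, which makes it atrous. Followed by a 1x1 pointwise convolution, it forms the atrous separable convolution named in the paper's title.
- ASPP-style encoder from DeepLab V3 combined with a UNet-style decoder with skip connections: the encoder output is upsampled and concatenated with low-level backbone features to refine the prediction.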
- Modified Xception network as the backbone: This can be replaced by any backbone; HRNet seems to be widely used these days.
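The atrous separable convolution from the first bullet can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions rather than the paper's implementation: the function names are hypothetical, tensors are nested lists in `[channel][row][col]` order, stride is 1, borders are zero padded, and batch normalisation and activations are left out.

```python
def atrous_depthwise_conv2d(x, kernels, rate):
    """Filter each channel with its own k x k kernel, taps `rate` pixels apart.

    x: [C][H][W], kernels: [C][k][k]. Zero padding keeps the H x W size.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        h, w, k = len(channel), len(channel[0]), len(kernel)
        half = k // 2
        result = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for a in range(k):
                    for b in range(k):
                        r = i + (a - half) * rate
                        c = j + (b - half) * rate
                        if 0 <= r < h and 0 <= c < w:  # outside the image reads zero
                            acc += kernel[a][b] * channel[r][c]
                result[i][j] = acc
        out.append(result)
    return out


def pointwise_conv2d(x, weights):
    """1x1 convolution: mix channels at every pixel. weights: [C_out][C_in]."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wt * x[c][i][j] for c, wt in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def atrous_separable_conv2d(x, depthwise_kernels, pointwise_weights, rate):
    """Atrous depthwise convolution followed by a pointwise convolution."""
    depthwise = atrous_depthwise_conv2d(x, depthwise_kernels, rate)
    return pointwise_conv2d(depthwise, pointwise_weights)


def param_counts(k, c_in, c_out):
    """Weights (no biases) of a standard conv versus a separable conv."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable


# Two 5x5 channels of ones, all-ones 3x3 kernels, dilation rate 2.
x = [[[1.0] * 5 for _ in range(5)] for _ in range(2)]
kernels = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
y = atrous_separable_conv2d(x, kernels, pointwise_weights=[[1.0, 1.0]], rate=2)
print(y[0][2][2], y[0][0][0])
print(param_counts(3, 256, 256))
```

The dilation rate changes only where the depthwise kernel samples its input, so the weight count is the same at any rate. `param_counts` shows why the separable form is attractive: for a 3x3 layer with 256 input and 256 output channels (sizes chosen here purely as an example), it needs 67,840 weights where a standard convolution needs 589,824.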
Results
[Figure: results]
References
- (https://arxiv.org/pdf/1802.02611.pdf) Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
- (https://arxiv.org/abs/1610.02357) Xception: Deep Learning with Depthwise Separable Convolutions
- (https://arxiv.org/pdf/1606.00915v2.pdf) DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
- (https://arxiv.org/abs/1801.04381) MobileNetV2: Inverted Residuals and Linear Bottlenecks