Video smoke recognition based on random patch shift and deformable attention

  • Abstract: Recognizing smoke emissions in industrial environments is vital for regulating and supervising enterprises in real time and for environmental protection. The task is highly challenging, however: industrial smoke is highly transparent and highly dynamic, and its shape and size vary with the environment, lighting, and other factors. Current mainstream smoke recognition methods are deep learning models based on either images or videos. Image-based models cannot effectively model the temporal dynamics of smoke in a video, while video-based models do not account for the variable shape of smoke. This work introduces random patch shift (RPS) and deformable attention (DA) into the Swin Transformer. RPS turns conventional 2D spatial attention into spatio-temporal attention, so that dynamic smoke can be modeled using 2D self-attention computation. DA uses adaptive deformation to let the network adapt to different smoke shapes and appearance changes, improving its robustness and generalization ability. Experimental results on the RISE dataset show that the proposed method achieves F1 scores of 0.85, 0.86, and 0.84 on the three subsets, respectively, an improvement of 0.01 to 0.06 over other methods.
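The idea behind RPS can be illustrated with a toy sketch: if some patches in each frame are swapped with patches from neighboring frames before ordinary 2D self-attention is applied, the attention within one frame already mixes information across time. The abstract does not give the shift rule, so everything below is an assumption made for illustration only: the function name `random_patch_shift`, the `max_shift` parameter, the use of one random offset per spatial position shared by all frames, and the wrap-around at the ends of the clip are hypothetical and are not the authors' implementation.

```python
import random

def random_patch_shift(frames, max_shift=1, seed=None):
    """Return (shifted, offsets) for a clip of patch tokens.

    frames: list of T frames, each a list of H rows of W patch tokens.
    max_shift: largest temporal offset (in frames) a patch may be moved.
    In `shifted`, the patch at position (i, j) of frame t is read from
    frame (t + offsets[i][j]) mod T of the input.
    """
    rng = random.Random(seed)
    num_frames = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    # Assumption: one offset per spatial position, shared by every frame.
    offsets = [[rng.randint(-max_shift, max_shift) for _ in range(width)]
               for _ in range(height)]
    shifted = []
    for t in range(num_frames):
        # Assumption: frame indices wrap around at the ends of the clip.
        frame = [[frames[(t + offsets[i][j]) % num_frames][i][j]
                  for j in range(width)]
                 for i in range(height)]
        shifted.append(frame)
    return shifted, offsets

# Toy clip: 4 frames of 2x3 patches, each token labeled (frame, row, col).
clip = [[[(t, i, j) for j in range(3)] for i in range(2)] for t in range(4)]
shifted, offsets = random_patch_shift(clip, max_shift=1, seed=0)
```

After the shift, each frame in `shifted` keeps its spatial layout but contains patches from up to three different time steps, so a purely spatial attention layer applied to it would see temporal context at no extra attention cost. A real model would apply the shift to feature tensors and shift the patches back after attention; that step is omitted here.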

     
