Abstract:
Recognizing smoke emissions in industrial environments is vital for real-time regulation and monitoring of companies, as well as for environmental protection. However, recognizing industrial smoke emissions is highly challenging: the smoke is highly transparent and highly dynamic, and its shape and size vary with the environment, lighting, and other factors. Current mainstream smoke recognition methods are image- or video-based deep learning models. Image-based models cannot capture the temporal dynamics of smoke in video, while video-based models do not account for the variable shape of smoke. In this paper, we introduce Random Patch Shift (RPS) and Deformable Attention (DA) into the Swin Transformer. RPS turns conventional 2D spatial attention into spatio-temporal attention, so that dynamic smoke can be modeled using only 2D self-attention computation; DA lets the network adapt to changes in smoke morphology and appearance through adaptive deformation, improving its robustness and generalization ability. Experimental results on the RISE dataset show that the proposed method achieves F1 scores of 0.85, 0.86, and 0.84 on the three subsets, respectively, an improvement of 0.01-0.06 over other methods.
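To illustrate how shifting patches along time can let plain 2D attention see several frames at once, the following is a minimal sketch, not the paper's implementation. The abstract does not say how RPS draws its shifts, so the function name `random_patch_shift`, the offset set `(-1, 0, 1)`, the wrap-around at clip boundaries, and the choice to share one offset per grid cell across all frames are all assumptions made for illustration.

```python
import random


def random_patch_shift(video, offsets=(-1, 0, 1), seed=None):
    """Shift patches along the time axis with a random per-cell offset.

    video[t][i][j] is the patch token at frame t, grid cell (i, j).
    Returns the shifted clip and the offset grid that was drawn.
    """
    rng = random.Random(seed)
    num_frames = len(video)
    rows, cols = len(video[0]), len(video[0][0])

    # Assumption: one temporal offset per grid cell, shared by every frame,
    # so each cell's tokens are only permuted in time and none are lost.
    shift = [[rng.choice(offsets) for _ in range(cols)] for _ in range(rows)]

    # Frame t, cell (i, j) receives the token from frame t + offset
    # (wrapping around at the clip boundaries, also an assumption).
    shifted = [
        [
            [video[(t + shift[i][j]) % num_frames][i][j] for j in range(cols)]
            for i in range(rows)
        ]
        for t in range(num_frames)
    ]
    return shifted, shift


# Toy clip: 4 frames on a 2x3 patch grid; each token records its origin.
clip = [[[(t, i, j) for j in range(3)] for i in range(2)] for t in range(4)]
shifted, shift = random_patch_shift(clip, seed=0)
```

After the shift, each "frame" holds patches taken from neighboring frames, so an unmodified 2D self-attention layer applied per frame mixes information across time without any extra attention cost.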