Abstract:
Recognizing smoke emission behavior in industrial environments is vital for real-time regulation and monitoring of companies, as well as for environmental protection. However, the task is highly challenging: industrial smoke is highly transparent and highly dynamic, and its shape and size change with the environment, lighting, and other factors. Current mainstream smoke recognition methods are deep learning models based on images or videos. Image-based models cannot effectively model the temporal dynamics of smoke in a video, while video-based models do not account for the variable shape of smoke. To address both limitations, random patch shift (RPS) and deformable attention (DA) are introduced into the Swin Transformer. RPS transforms the traditional 2D spatial attention into spatio-temporal attention, so that dynamic smoke is modeled using only 2D self-attention computations. Through adaptive deformation, DA enables the network to adapt to different smoke shapes and appearance changes, improving its robustness and generalization ability. Experimental results on the RISE dataset show that the proposed method achieves F1 scores of 0.85, 0.86, and 0.84 on the three subsets, respectively, an improvement of 0.01 to 0.06 over other methods.
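
The way a patch shift lets 2D attention see temporal context can be illustrated with a minimal NumPy sketch. It assumes patch features arranged as (T, H, W, C) and draws one random temporal offset per spatial position. The function name `random_patch_shift`, the `max_shift` parameter, and the wrap-around at clip boundaries are illustrative assumptions, not details taken from the method summarized above.

```python
import numpy as np

def random_patch_shift(x, max_shift=1, rng=None):
    """Illustrative random patch shift (hypothetical helper).

    x: array of shape (T, H, W, C) holding per-frame patch features.
    Each spatial position (h, w) gets one random temporal offset in
    [-max_shift, max_shift]; the patch shown at frame t is then taken
    from frame (t + offset) mod T. After the shift, every frame holds
    patches from neighboring frames, so ordinary 2D self-attention
    within a frame mixes information across time.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, H, W, _ = x.shape
    # One offset per spatial position, shared by all frames (assumption).
    offsets = rng.integers(-max_shift, max_shift + 1, size=(H, W))
    # Source frame index for every (t, h, w), wrapping at clip boundaries.
    t_idx = (np.arange(T)[:, None, None] + offsets[None, :, :]) % T
    h_idx = np.arange(H)[None, :, None]
    w_idx = np.arange(W)[None, None, :]
    return x[t_idx, h_idx, w_idx], offsets

# Usage: 4 frames, a 3x3 grid of patches, 2 channels per patch.
video = np.arange(4 * 3 * 3 * 2, dtype=np.float32).reshape(4, 3, 3, 2)
shifted, offsets = random_patch_shift(video, rng=np.random.default_rng(0))
```

The shift only permutes patches along the time axis at each spatial position, so it adds no parameters and leaves the attention computation itself unchanged.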