Recently, a research team led by Professor Shan Caifeng and Associate Professor Zhao Fang from the School of Intelligence Science and Technology at Nanjing University, together with the China Mobile Cloud Capability Center, Nanjing University of Science and Technology, and other institutions, proposed a method for generating anomalous images that are closely aligned with their masks, which can be widely applied to industrial anomaly detection tasks. The method targets a core problem in industrial settings: anomalous samples are scarce, which limits model performance. It also overcomes key bottlenecks of existing generation methods in image realism, diversity, and mask alignment.


Fig. 1. Flowchart of the proposed method and a comparison of generated images
The research team designed a novel controllable diffusion model, AnomalyControl, which integrates four innovative modules. First, a CLIP-guided anomaly prompt generator identifies the text descriptions that best match real anomalous images semantically. Second, an anomaly appearance and shape disentanglement mechanism keeps the semantics of an anomaly's appearance consistent even as its shape varies. Third, a training-free local control enhancement strategy improves the alignment between generated anomalous regions and their masks. Finally, a hard sample generation module strengthens the training of downstream models without requiring additional real samples. Through this fine-grained control and disentanglement, AnomalyControl can generate complex and realistic industrial anomaly images. The method not only reduces data collection costs but also significantly improves generalization in industrial environments. It also enables downstream detection models to adopt a coarse-to-fine learning strategy, markedly improving their ability to identify low-saliency defects.
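The article does not say how the CLIP-guided prompt generator scores candidate descriptions. A common way to match text to an image with CLIP is to rank candidate prompts by the cosine similarity between their embeddings and the image embedding. The sketch below illustrates only that ranking step, assuming the embeddings have already been produced by some CLIP encoder; the function name `select_best_prompt`, the candidate prompts, and the toy 3-D vectors are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def select_best_prompt(image_emb, prompt_embs, prompts):
    """Pick the candidate prompt most similar to an image (hypothetical helper).

    image_emb:   (d,) array, assumed to be a precomputed CLIP image embedding.
    prompt_embs: (n, d) array of precomputed text embeddings, one per prompt.
    prompts:     list of n candidate anomaly descriptions.
    Returns the best prompt and the cosine-similarity score of every candidate.
    """
    # L2-normalize both sides so that a dot product equals cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    scores = txt @ img  # shape (n,)
    best = int(np.argmax(scores))
    return prompts[best], scores


# Toy 3-D vectors standing in for real CLIP features (illustrative values only)
prompts = ["a scratch on metal", "a crack in the surface", "a missing screw"]
prompt_embs = np.array([[1.0, 0.0, 0.0],
                        [0.6, 0.8, 0.0],
                        [0.0, 0.0, 1.0]])
image_emb = np.array([0.5, 0.9, 0.1])

best, scores = select_best_prompt(image_emb, prompt_embs, prompts)
print(best)  # prints "a crack in the surface"
```

Normalizing before the dot product keeps the ranking independent of embedding magnitude, which is how CLIP similarities are usually compared.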

Fig. 2. Segmentation comparison between the proposed method and state-of-the-art (SOTA) methods
Experimental results show that the proposed method achieves promising performance and generalization on public datasets such as MVTec-AD, offering effective technical support for the "data scarcity" challenge in industrial quality inspection. The research team also notes that the method still has room for improvement in real-time generation and in generalization to non-industrial domains (e.g., medical imaging). Future work will focus on improving the model's generation efficiency and cross-domain adaptability, to support the reliable deployment of controllable generation technology in more complex intelligent perception systems. This work has been accepted to ACM MM 2025.
