Achievement 1: Eye Movement-Based Lesion Detection Method
Recently, Professor Shan Caifeng and Assistant Professor Fang Yuqi from the School of Artificial Intelligence Science and Technology at Nanjing University, in collaboration with ShanghaiTech University and other institutions, proposed GAA-DETR, an end-to-end lesion detection framework guided by eye-movement information, which significantly improves the accuracy and interpretability of lesion recognition in medical images. By incorporating clinicians' gaze data, this study addresses a limitation of traditional detection models, which rely heavily on bounding-box annotations while ignoring semantic features within lesions, and offers a visual perception paradigm more aligned with clinical cognition for computer-aided diagnostic systems.
The research team designed a Query-Level Attention Alignment mechanism comprising three core modules:
(1) Adaptive Gaze Kernel: Integrating clinicians' viewing behavior at different magnification levels to dynamically adjust the generation of gaze heatmaps, so that the heatmaps more accurately reflect clinically relevant regions of interest;
(2) Gaze-Guided Matching Module: Establishing a one-to-one correspondence between model attention and clinician gaze at the bounding box level, achieving deep alignment between model features and clinically focused areas;
(3) Query Consistency Loss: A novel loss function that effectively encourages spatial consistency between the model’s attention distribution and clinician gaze, thereby enhancing the clinical rationality of model interpretations.
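The query consistency loss described above can be illustrated as a divergence between two normalized spatial distributions. The sketch below is a minimal illustration of this idea, assuming a KL-divergence formulation; the function names and normalization choices are our assumptions, not the paper's actual implementation.

```python
import numpy as np

def normalize_to_distribution(x, eps=1e-8):
    """Flatten a 2-D map and normalize it into a probability distribution.
    A small epsilon avoids division by zero and log(0) for empty regions."""
    x = x.reshape(-1).astype(np.float64)
    x = np.clip(x, 0.0, None) + eps
    return x / x.sum()

def query_consistency_loss(attention_map, gaze_heatmap):
    """KL divergence from the model's attention map to the clinician gaze
    heatmap; lower values indicate better spatial agreement (illustrative,
    not the paper's exact loss)."""
    p = normalize_to_distribution(gaze_heatmap)   # target: clinician gaze
    q = normalize_to_distribution(attention_map)  # model attention
    return float(np.sum(p * np.log(p / q)))
```

When the attention map matches the gaze heatmap exactly the loss is zero, and it grows as the model attends to regions the clinician did not fixate, which is the spatial-consistency behavior the loss is meant to encourage.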
This is the first gaze-guided detection method that requires no gaze input at the inference stage: gaze data is used only during training. The method can be integrated in a plug-and-play fashion into mainstream detection architectures (e.g., DETR, DINO, RT-DETR), demonstrating strong compatibility and scalability.

Fig. 1. Schematic diagram of the eye movement-based lesion detection method
Meanwhile, the research team has constructed and open-sourced the first medical gaze dataset for lesion detection, comprising 1,669 high-quality gaze trajectories covering mammography and cervical TCT (ThinPrep cytology test) images. The dataset provides comprehensive annotations, including bounding boxes and gaze heatmaps, offering experimental data to support further research on gaze-based detection methods.
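Gaze heatmaps such as those provided in the dataset are commonly rendered by placing a Gaussian at each fixation point, weighted by fixation duration. The sketch below shows this standard construction under assumed inputs; the `(x, y, duration)` fixation format and the fixed `sigma` are illustrative, not the dataset's exact recipe.

```python
import numpy as np

def gaze_heatmap(fixations, shape, sigma=15.0):
    """Render fixation points (x, y, duration) into a duration-weighted
    Gaussian heatmap scaled to [0, 1]. A standard construction; the real
    dataset pipeline (and the adaptive kernel above) may differ."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape, dtype=np.float64)
    for x, y, dur in fixations:
        # Isotropic Gaussian centered on the fixation, scaled by dwell time
        heat += dur * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    if heat.max() > 0:
        heat /= heat.max()
    return heat
```

Longer fixations produce hotter regions, so the resulting map approximates where a clinician's attention dwelled during reading.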

Fig. 2. Visualization of results from the proposed method and comparative methods
Experimental results demonstrate that the proposed method significantly outperforms existing mainstream approaches across multiple detection tasks, including breast tumor and cervical Candida detection, validating its clinical utility. This study has received early acceptance at MICCAI 2025, a top-tier conference in the field of medical image analysis, and has been selected as a Spotlight paper. The co-first authors include Peng Zhixiang, an undergraduate student (Class of 2022) from our school.
Achievement 2: Uncertainty-Driven Multimodal MRI Fusion Method
Recently, Professor Shan Caifeng and Assistant Professor Fang Yuqi from the School of Artificial Intelligence Science and Technology at Nanjing University, in collaboration with Capital Medical University, proposed an Uncertainty-aware Multimodal MRI Fusion (UMMF) method, which has been successfully applied to the prediction of HIV-associated asymptomatic neurocognitive impairment. Targeting the "modality laziness" problem prevalent in existing multimodal fusion approaches, this study introduces an innovative solution that significantly improves the model's efficiency in exploiting multimodal information and its prediction accuracy.

Fig. 3. Schematic diagram of the uncertainty-driven multimodal MRI fusion method
The UMMF framework integrates structural MRI (sMRI), functional MRI (fMRI), and diffusion tensor imaging (DTI). By employing an uncertainty-driven alternating unimodal training strategy, it mitigates the dominance of individual modalities and strengthens the extraction and fusion of multimodal features. In addition, the research team introduced a stochastic network prediction scheme that quantifies each modality's uncertainty and weights its contribution accordingly, achieving more robust and reliable predictive performance.
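The uncertainty-weighting idea can be sketched as follows: each modality is run through several stochastic forward passes (for example, dropout kept active at test time), the spread of its predictions is taken as its uncertainty, and modalities are fused with inverse-uncertainty weights. This is a minimal illustration under our own assumptions, not the UMMF implementation; the noise-perturbed `stochastic_predictions` stands in for a real stochastic network.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_predictions(logits, n_samples=20, noise=0.1):
    """Simulate stochastic forward passes by perturbing a modality's logits.
    Placeholder for a genuine stochastic network (e.g., MC dropout)."""
    return logits[None, :] + rng.normal(0.0, noise, size=(n_samples, logits.size))

def uncertainty_weighted_fusion(modality_logits):
    """Fuse per-modality logits (e.g., sMRI, fMRI, DTI) with weights inversely
    proportional to each modality's predictive variance, so less certain
    modalities contribute less (illustrative, not the paper's scheme)."""
    weights = []
    for logits in modality_logits:
        samples = stochastic_predictions(logits)
        uncertainty = samples.var(axis=0).mean()      # scalar uncertainty
        weights.append(1.0 / (uncertainty + 1e-8))    # inverse-variance weight
    weights = np.array(weights) / np.sum(weights)     # normalize to sum to 1
    fused = sum(w * logits for w, logits in zip(weights, modality_logits))
    return fused, weights
```

Down-weighting high-variance modalities in this way also counteracts modality laziness: no single modality can dominate the fused prediction unless its own predictions are consistently stable.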
Experimental results demonstrate that UMMF surpasses existing state-of-the-art methods across multiple evaluation metrics, achieving significant breakthroughs in predicting HIV-associated asymptomatic neurocognitive impairment. Notably, the method not only improves predictive accuracy but also identifies key brain regions associated with the disease, providing potential biomarkers for early clinical intervention.

Fig. 4. The proposed method demonstrates clinical interpretability
This work has been accepted by MICCAI 2025, a top-tier conference in the field of medical image analysis, providing new technical support for intelligent analysis and clinical applications of multimodal medical imaging. In the future, the research team will further optimize the method, explore disease-specific encoder designs, and extend its application to other brain disease prediction tasks.
