Research Update (XVII): HYPNOS System Architecture

Posted by: 汤靖玲 | Published: 2025-10-20

  

    Recently, Associate Professor Ke Xu from the School of Artificial Intelligence and Technology at Nanjing University, in collaboration with Zhejiang University, Huawei Cloud, and other institutions, addressed the problem of data validation in data analysis workflows by proposing HYPNOS, a visualization system that supports interactive data lineage tracking. HYPNOS parses and adapts code through its lineage module, extracting both schema-level and instance-level data lineage information from data transformation scripts. It provides a lineage view for an overall understanding of the data transformation process, along with a detail view for instance-level tracking and granular inspection. By revealing multi-level data relationships, HYPNOS helps users understand and track data lineage more efficiently.



Fig. 1. HYPNOS Architecture


  

    The HYPNOS system architecture (see Fig. 1) consists of two main components: the lineage module and the visualization interface. The lineage module takes scripts and one or more tables as input, uses a program adaptor to parse each line of code, and extracts data transformation (DT) semantics. The module then employs a lineage tracker to capture both schema-level and instance-level data lineage (DL) and provides data lineage tracking (DLT) services. In this study, DL is derived by analyzing the input-output relationships across sequential data transformation operations. Based on the extracted DT semantics, the system's lineage view constructs a lineage graph that represents the data transformation process of the script. Users can double-click a table in the lineage graph to expand it in the detail view. The lineage view supports interactive tracking at the column level, and the detail view supports it at the row level.
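
To make the distinction between schema-level and instance-level lineage concrete, the following is a minimal Python sketch, not the HYPNOS implementation. It assumes lineage is recorded while each operation runs, whereas HYPNOS obtains DT semantics by parsing the script with its program adaptor. The names `Table`, `filter_rows`, `group_sum`, `trace_rows`, and `trace_column` are hypothetical, and the rows are toy values rather than data from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    rows: list                      # list of dicts, one per row
    parent: "Table | None" = None   # table this one was derived from
    col_lineage: dict = field(default_factory=dict)  # output column -> parent columns
    row_lineage: dict = field(default_factory=dict)  # output row index -> parent row indices


def filter_rows(src, name, column, predicate):
    """Keep rows whose `column` value satisfies `predicate`."""
    kept = [(i, r) for i, r in enumerate(src.rows) if predicate(r[column])]
    cols = list(src.rows[0].keys()) if src.rows else []
    return Table(
        name=name,
        rows=[dict(r) for _, r in kept],
        parent=src,
        col_lineage={c: [c] for c in cols},  # columns pass through unchanged
        row_lineage={out: [i] for out, (i, _) in enumerate(kept)},
    )


def group_sum(src, name, key, value):
    """Sum `value` per distinct `key`, remembering which rows fed each group."""
    groups = {}  # key value -> source row indices, in first-seen order
    for i, r in enumerate(src.rows):
        groups.setdefault(r[key], []).append(i)
    rows = [{key: k, value: sum(src.rows[i][value] for i in idx)}
            for k, idx in groups.items()]
    return Table(
        name=name,
        rows=rows,
        parent=src,
        # the aggregated column depends on the value column and the grouping key
        col_lineage={key: [key], value: [key, value]},
        row_lineage={out: idx for out, idx in enumerate(groups.values())},
    )


def trace_rows(table, row_index):
    """Instance-level tracking: follow one row back to the source table."""
    indices = [row_index]
    while table.parent is not None:
        indices = sorted({p for i in indices for p in table.row_lineage[i]})
        table = table.parent
    return table.name, indices


def trace_column(table, column):
    """Schema-level tracking: follow one column back to the source table."""
    cols = [column]
    while table.parent is not None:
        cols = sorted({p for c in cols for p in table.col_lineage[c]})
        table = table.parent
    return table.name, cols


# Toy input table (illustrative values only).
raw = Table("raw", [
    {"country": "A", "new_deaths": 3},
    {"country": "B", "new_deaths": 0},
    {"country": "A", "new_deaths": 5},
    {"country": "B", "new_deaths": 2},
])
nonzero = filter_rows(raw, "nonzero", "new_deaths", lambda v: v > 0)
totals = group_sum(nonzero, "totals", "country", "new_deaths")

print(trace_rows(totals, 0))                 # rows of `raw` behind the first total
print(trace_column(totals, "new_deaths"))    # columns of `raw` behind the sum
```

In this sketch, each derived table keeps a link to its parent, so the chain of tables plays the role of a (linear) lineage graph; a script with joins would need several parents per table, which is omitted here for brevity.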

 


  

Fig. 2a. (a) Rapidly Locating the Anomalous Country: Sorting the result table reveals Honduras' growth rate as a significant outlier (B1→C1). (b) Step-by-step Traceback: Row-level tracking drills down to the aggregated data of the two adjacent weeks, confirming new_deaths_x=79 and new_deaths_y=204 (B2→C2). (c) Historical Context Comparison: Expanding Honduras' daily death data reveals spikes on 2020/08/26–08/27, which fall within normal fluctuations of the long-term trend (C5→C7).


Fig. 2b. Column-level Formula Validation: The growth_rate is derived from new_deaths_x / new_deaths_y - 1. Tracing confirms the correctness of the filtering and aggregation logic (A→C).
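
As a quick arithmetic check, the formula in the caption can be evaluated on the two aggregated values reported in Fig. 2a. This is a sketch that takes the caption's formula and values at face value; the function name `growth_rate` is hypothetical, and which week each value belongs to is not stated here.

```python
def growth_rate(new_deaths_x, new_deaths_y):
    """growth_rate = new_deaths_x / new_deaths_y - 1, as given in the caption."""
    if new_deaths_y == 0:
        raise ZeroDivisionError("growth rate is undefined when new_deaths_y is 0")
    return new_deaths_x / new_deaths_y - 1


# Values reported for Honduras in Fig. 2a (b).
rate = growth_rate(79, 204)
print(f"{rate:.4f}")
```

With these inputs the formula gives roughly -0.61.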




    We demonstrate the usability and effectiveness of HYPNOS through a case study (Fig. 2), expert interviews, and user studies. The paper has been accepted and published in IEEE Transactions on Visualization and Computer Graphics (TVCG).