Preserving Logical and Functional Dependencies in Synthetic Clinical Datasets
DFG Project Number: 576429337
This project develops dependency-aware methods for generating synthetic clinical tabular data, focusing on preserving both logical and functional relationships among attributes while maintaining data utility and fidelity.
Dependencies among attributes are a fundamental characteristic of clinical tabular data, yet their preservation in synthetic data generation remains largely underexplored. While functional dependencies have received some attention in prior work, logical dependencies lack formal definitions and practical extraction methods.
This project aims to formalize logical dependencies, develop efficient techniques for extracting them, and propose quantitative metrics for assessing them. It also evaluates existing synthetic data generation models for their ability to preserve both logical and functional dependencies. By incorporating dependency-aware constraints into data generation frameworks, the project seeks to enable reliable synthetic data that maintains utility and privacy while faithfully preserving inter-attribute relationships. This is particularly relevant to healthcare and data-driven research.
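To make "preserving a functional dependency" concrete, the sketch below measures how strongly a functional dependency X → Y holds in a table. X → Y holds exactly when every pair of rows that agree on the attributes in X also agree on Y. The error is the smallest fraction of rows that would need to be removed for the dependency to hold, in the spirit of the g3 measure from approximate dependency discovery. This is an illustrative sketch only and not the project's metric: the function name `fd_error`, the column names `icd_code` and `chapter`, and the toy rows are all hypothetical. Logical dependencies are not covered here because, as noted above, they still lack a formal definition.

```python
from collections import Counter, defaultdict
from typing import Mapping, Sequence

# Hypothetical row type: one record of a tabular dataset, keyed by column name.
Row = Mapping[str, object]


def fd_error(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> float:
    """Smallest fraction of rows to delete so that lhs -> rhs holds exactly.

    0.0 means the functional dependency holds; larger values mean more violations.
    """
    if not rows:
        return 0.0
    # For each distinct lhs value combination, count how often each rhs value occurs.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1
    # Within each group, keep only the rows carrying the most frequent rhs value.
    kept = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return 1.0 - kept / len(rows)


# Hypothetical toy data: in the real table each code maps to exactly one chapter.
real = [
    {"icd_code": "E11", "chapter": "Endocrine"},
    {"icd_code": "E11", "chapter": "Endocrine"},
    {"icd_code": "I10", "chapter": "Circulatory"},
]
# A hypothetical synthetic table in which one E11 row breaks the dependency.
synthetic = [
    {"icd_code": "E11", "chapter": "Endocrine"},
    {"icd_code": "E11", "chapter": "Circulatory"},
    {"icd_code": "I10", "chapter": "Circulatory"},
    {"icd_code": "I10", "chapter": "Circulatory"},
]

print(fd_error(real, ["icd_code"], "chapter"))       # 0.0
print(fd_error(synthetic, ["icd_code"], "chapter"))  # 0.25
```

Comparing the error on the real table with the error on a synthetic table gives one simple, assumption-laden way to quantify whether a generator has preserved a dependency that holds in the source data.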