SBI – Department of Systems Biology and Bioinformatics
Faculty of Computer Science and Electrical Engineering
University of Rostock
Ulmenstrasse 69 | 18057 Rostock
Germany
+49 381 498-7571
olaf.wolkenhauer@uni-rostock.de
Convex space learning for synthetic data generation in clinical research
This project develops NextConvGeN, a deep learning framework for generating realistic and privacy-preserving synthetic clinical data. By extending the principle of convex space learning beyond imbalanced datasets, NextConvGeN enables the creation of representative tabular data that can improve machine learning applications in clinical decision support and patient stratification.
DFG Project Number: FK 515800538
Machine learning in clinical research often suffers from small, imbalanced, and privacy-restricted datasets. These limitations make it difficult to train robust and generalizable models, especially when sensitive patient data cannot be shared across institutions. Synthetic data generation offers a promising solution by producing artificial yet statistically representative datasets that preserve the structure and relationships of the original data while maintaining patient privacy.
Building on our previous work on convex space learning, we have developed several state-of-the-art algorithms, including LoRAS, ProWRAS, and ConvGeN, for generating synthetic data from imbalanced tabular datasets. The newly funded DFG project extends this research through the development of the Convex-space Generative Network for Tabular Data (NextConvGeN). NextConvGeN will generalize convex space learning beyond imbalanced classification, enabling the generation of synthetic data with both continuous and categorical features, even in the presence of missing values.
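As a rough illustration of the idea behind convex space learning, the following sketch draws each synthetic point as a convex combination of a real sample and its nearest neighbours. It is a simplified stand-in and not the LoRAS, ProWRAS, ConvGeN, or NextConvGeN implementation: the function name `convex_synthetic_samples`, the parameter `k`, and the random Dirichlet weights are assumptions made for this example. The sketch handles only complete, continuous features, without categorical values or missing entries.

```python
import numpy as np


def convex_synthetic_samples(X, n_samples, k=5, seed=0):
    """Draw synthetic rows as convex combinations of local neighbourhoods.

    Hypothetical helper for illustration only; assumes X is a complete,
    purely numeric array of shape (n_rows, n_features).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(k, n)

    # Pairwise squared Euclidean distances between all rows.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

    # For each row, the indices of its k closest rows (the row itself included).
    neighbours = np.argsort(d2, axis=1)[:, :k]

    # Pick a random anchor row for every synthetic sample.
    anchors = rng.integers(0, n, size=n_samples)

    # Dirichlet draws are non-negative and sum to 1, so each weighted sum
    # lies inside the convex hull of the chosen neighbourhood.
    weights = rng.dirichlet(np.ones(k), size=n_samples)

    # Weighted sum over the k neighbours of each anchor.
    return np.einsum("sk,skd->sd", weights, X[neighbours[anchors]])


if __name__ == "__main__":
    real = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
    synthetic = convex_synthetic_samples(real, n_samples=10, k=3)
    print(synthetic.shape)
```

Because every synthetic row is a convex combination of real rows, it cannot leave the range spanned by the original data in any feature. That makes the samples plausible by construction, but it also means this naive version cannot extrapolate beyond the observed data.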
The project investigates how convex space learning can be tailored for specific machine learning tasks, including regression, classification, and unsupervised patient stratification. Through collaborations with clinical partners at Universitätsmedizin Rostock, NextConvGeN will be validated on real-world clinical datasets involving osteoporosis and kidney transplantation.
All methods and results will be released as open source to ensure transparency and reproducibility. By enabling realistic, privacy-preserving synthetic data generation, NextConvGeN aims to advance the use of artificial intelligence in medicine and contribute to more inclusive, data-driven clinical research.
Related publications
Convex space learning for tabular synthetic data generation
Manjunath Mahendra, Chaithra Umesh, Kristian Schultz, Olaf Wolkenhauer, Saptarshi Bej
Neurocomputing, Volume 659, 2026, 131722, ISSN 0925-2312
Preserving logical and functional dependencies in synthetic tabular data
Chaithra Umesh, Kristian Schultz, Manjunath Mahendra, Saptarshi Bej, Olaf Wolkenhauer
Challenges and applications in generative AI for clinical tabular data in physiology
Chaithra Umesh, Manjunath Mahendra, Saptarshi Bej, Olaf Wolkenhauer, Markus Wolfien