Abstract: I introduce a new NoGAN alternative to standard tabular data synthetization. It is designed to run several orders of magnitude faster than training generative adversarial networks (GANs). In addition, the quality of the generated data is superior to that of almost all other products available on the market. The hyperparameters are intuitive, leading to explainable AI.
Many evaluation metrics that measure faithfulness have critical flaws: because they rely on low-dimensional indicators, they sometimes rate generated data as excellent when it is actually a failure. I fix this problem by using the full multivariate empirical cumulative distribution function (ECDF). As an additional benefit, both for synthetization and evaluation, all features (categorical, ordinal, or continuous) are processed with a single formula, even in the presence of missing values.
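To make the idea of a multivariate ECDF comparison concrete, here is a minimal sketch, not the NoGAN evaluation formula itself. It assumes purely numeric features with no missing values, and the function names (`multivariate_ecdf`, `ecdf_distance`), the choice of evaluation points, and the toy data are all hypothetical. It contrasts a synthetic set drawn from the same joint distribution with one that preserves each marginal but destroys the correlation, which is the kind of failure that one-dimensional indicators cannot detect.

```python
import numpy as np

def multivariate_ecdf(data, points):
    """Fraction of rows in `data` that are <= each point in every coordinate."""
    data = np.asarray(data, dtype=float)
    points = np.asarray(points, dtype=float)
    # Shape (n_points, n_rows, n_features): compare every point with every row.
    below = data[None, :, :] <= points[:, None, :]
    return below.all(axis=2).mean(axis=1)

def ecdf_distance(real, synth, n_points=500, seed=0):
    """Largest joint-ECDF gap between real and synthetic data over sampled points."""
    rng = np.random.default_rng(seed)
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    pooled = np.vstack([real, synth])
    # Evaluation points: rows drawn from the pooled data (an illustrative choice).
    points = pooled[rng.integers(0, len(pooled), size=n_points)]
    gap = np.abs(multivariate_ecdf(real, points) - multivariate_ecdf(synth, points))
    return float(gap.max())

# Toy data: two strongly correlated features (assumed for illustration only).
rng = np.random.default_rng(42)
cov = [[1.0, 0.9], [0.9, 1.0]]
real = rng.multivariate_normal([0.0, 0.0], cov, size=2000)

# "Good" synthetic data: a fresh sample from the same joint distribution.
good = rng.multivariate_normal([0.0, 0.0], cov, size=2000)

# "Bad" synthetic data: each column shuffled independently, so the marginals
# match the real data exactly while the dependence between features is lost.
bad = np.column_stack([rng.permutation(real[:, 0]), rng.permutation(real[:, 1])])

d_good = ecdf_distance(real, good)
d_bad = ecdf_distance(real, bad)
print(f"ECDF distance, good synthetic data: {d_good:.3f}")
print(f"ECDF distance, bad synthetic data:  {d_bad:.3f}")
```

In this sketch the shuffled data would pass any per-feature comparison, since its marginals are identical to those of the real data, yet the joint ECDF distance flags it. Categorical features would first need a numeric encoding, which this example does not attempt.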
In real-life case studies, the synthetic data was generated in less than 5 seconds, versus 10 minutes with a GAN. It was also of higher quality, as verified via cross-validation. Because the implementation is very fast, the hyperparameters can be fine-tuned automatically and efficiently. I also discuss next steps: further improving the speed and the faithfulness of the generated data, auto-tuning, Gaussian NoGAN, and applications other than synthetization.
Bio: Vincent Granville is a pioneering GenAI scientist and machine learning expert, co-founder of Data Science Central (acquired by a publicly traded company in 2020), Chief AI Scientist at MLTechniques.com, former VC-funded executive, author, and patent owner (one patent relates to LLMs). Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET.
Vincent is also a former post-doc at Cambridge University and the National Institute of Statistical Sciences (NISS). He has published in the Journal of Number Theory, the Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is the author of multiple books, including “Synthetic Data and Generative AI” (Elsevier, 2024). Vincent lives in Washington state and enjoys doing research on stochastic processes, dynamical systems, experimental math, and probabilistic number theory. He recently launched a GenAI certification program, offering state-of-the-art, enterprise-grade projects to participants.