Blog

Elen Vardanyan, Sona Hunanyan, Tigran Galstyan, Arnak Dalalyan, Arshak Minasyan

Guaranteed Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution

Bios:

Elen Vardanyan is currently teaching Machine Learning and Deep Learning courses and supervising bachelor’s and master’s theses in those fields at the American University of Armenia and Yerevan State University. She received her bachelor’s degree in Computer Science from the American University of Armenia in 2018. In parallel with her studies, she worked as a data scientist at PicsArt on projects in the User Behavior and Computer Vision groups. After gaining industry experience, she continued her education and received her master’s degree in Mathematics in Data Science from the Technical University of Munich in 2022, where she completed several projects as well as her master’s thesis in cooperation with BMW. Ms. Vardanyan’s research interests are probabilistic machine learning and deep learning, generative modeling, robustness, and optimization.


Sona Hunanyan received her bachelor’s degree in Informatics and Applied Mathematics from Yerevan State University in 2011 and her master’s degree in Applied Mathematics from EPFL (École Polytechnique Fédérale de Lausanne) in 2014. Alongside her studies, she gained practical experience working as a quantitative analyst at the Goesgen Nuclear Power Plant in Daniken. In 2022, she earned a Ph.D. from the University of Zurich, Switzerland, with a thesis on the quantification of sensitivity and empirical determinacy in Bayesian hierarchical models. Her research focuses on Bayesian statistics and model diagnostics.


Tigran Galstyan is currently pursuing his Ph.D. in statistics and machine learning theory at Russian-Armenian University. He studied applied mathematics (B.Sc. and Master’s) at Yerevan State University. Since 2016, he has been working as a researcher at the YerevaNN Machine Learning Research Lab in the fields of machine learning, neural networks, and natural language processing. During that period, Mr. Galstyan also worked as a machine learning intern at Google, X – the moonshot factory, and Teamable. His research interests are applied statistics, machine learning theory, and the robustness of neural networks.


Arnak Dalalyan is a Professor of Statistics at ENSAE, Institut Polytechnique de Paris, and the Director of the Center for Research in Economics and Statistics (CREST). Prof. Dalalyan received an M.Sc. from Paris 6 University in 1999, earned a Ph.D. from the University of Le Mans in 2001, and defended his habilitation at Paris 6 University in 2007. His research focuses on high-dimensional statistics, statistical machine learning, nonparametric function estimation, and computer vision. He currently serves as an Associate Editor for several statistics journals, including the Annals of Statistics, Bernoulli, the Electronic Journal of Statistics, and the Journal of Machine Learning Research.


Arshak Minasyan is a postdoctoral fellow at ENSAE (École nationale de la statistique et de l’administration économique) in Paris, France. He received his Ph.D. in mathematics from Yerevan State University in 2021. Prior to that, he received his master’s and bachelor’s degrees in mathematics from the Skolkovo Institute of Science and Technology and the Higher School of Economics in 2018 and 2016, respectively. From 2018 to 2021, Dr. Minasyan worked as a machine learning researcher at the YerevaNN Research Lab. His research is mostly theoretical and focuses on designing estimators for datasets corrupted by outliers, and on studying the theoretical properties of the risk of the proposed estimators in terms of the contamination level, the sample size, and the dimension of the data points.


Description of the Talk:

Generative modeling is a powerful machine learning technique widely used across scientific and industrial domains. Its primary goal is to generate new examples that mimic an unknown distribution based on available training data, ensuring diversity and avoiding duplication of examples from the training set. In this talk, we present our paper, which offers theoretical insights into training generative models with two essential properties: (i) the error of replacing the true data-generating distribution with the trained one should converge to zero at the optimal rate as the sample size approaches infinity, and (ii) the trained data-generating distribution should be far enough from any distribution that merely replicates examples in the training data. We provide non-asymptotic results in the form of finite-sample risk bounds that quantify these properties and depend on relevant parameters such as the sample size, the dimension of the ambient space, and the dimension of the latent space. Our results apply to general integral probability metrics used to quantify errors in spaces of probability distributions, with the Wasserstein-$p$ distance being the central example. We also include numerical examples to illustrate our theoretical findings.
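To make the two requirements more concrete, here is a schematic formulation in illustrative notation of our own (not the exact statements or constants from the paper). Write $d$ for an integral probability metric such as the Wasserstein-$p$ distance, $P^*$ for the true data-generating distribution, $\hat{P}_n$ for the trained distribution obtained from $n$ training examples, and $\bar{P}_n$ for any distribution supported on the training examples. Then, roughly,

(i) $\mathbb{E}\, d(\hat{P}_n, P^*) \le C\, r_n$, where $r_n \to 0$ at the optimal rate as $n \to \infty$, and

(ii) $d(\hat{P}_n, \bar{P}_n) \ge \delta_n > 0$ uniformly over all $\bar{P}_n$ that merely replicate training examples,

so the trained model must approximate the truth while staying a quantifiable distance away from memorizing the training set.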