Publications | Ricardo Baptista

2025

arXiv
A Mathematical Perspective On Contrastive Learning

Ricardo Baptista, Andrew M Stuart, and Son Tran

arXiv:2505.24134, 2025

Multimodal contrastive learning is a methodology for linking different data modalities; the canonical example is linking image and text data. The methodology is typically framed as the identification of a set of encoders, one for each modality, that align representations within a common latent space. In this work, we focus on the bimodal setting and interpret contrastive learning as the optimization of (parameterized) encoders that define conditional probability distributions, for each modality conditioned on the other, consistent with the available data. This provides a framework for multimodal algorithms such as crossmodal retrieval, which identifies the mode of one of these conditional distributions, and crossmodal classification, which is similar to retrieval but includes a fine-tuning step to make it task specific. The framework we adopt also gives rise to crossmodal generative models. This probabilistic perspective suggests two natural generalizations of contrastive learning: the introduction of novel probabilistic loss functions, and the use of alternative metrics for measuring alignment in the common latent space. We study these generalizations of the classical approach in the multivariate Gaussian setting. In this context we view the latent space identification as a low-rank matrix approximation problem. This allows us to characterize the capabilities of loss functions and alignment metrics to approximate natural statistics, such as conditional means and covariances; doing so yields novel variants on contrastive learning algorithms for specific mode-seeking and for generative tasks. The framework we introduce is also studied through numerical experiments on multivariate Gaussians, the labeled MNIST dataset, and on a data assimilation application arising in oceanography.
@article{baptista2025mathematical, title = {A Mathematical Perspective On Contrastive Learning}, author = {Baptista, Ricardo and Stuart, Andrew M and Tran, Son}, journal = {arXiv:2505.24134}, year = {2025}, }
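The abstract reads contrastive learning as fitting encoders whose similarity scores define a conditional distribution for each modality given the other, with crossmodal retrieval taking the mode of one of these conditionals. A minimal sketch of the classical symmetric InfoNCE (CLIP-style) loss follows. It is not one of the generalized losses the paper proposes. It assumes unit-normalized embeddings, cosine similarity as the alignment metric, a hypothetical temperature `tau`, and synthetic paired embeddings.

```python
import numpy as np


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_infonce(z_img, z_txt, tau=0.1):
    """Classical two-sided contrastive loss for N paired embeddings.

    Row i of each array embeds the i-th image/text pair. Returns the loss and
    the matrix of conditional probabilities p(text j | image i).
    """
    # Normalize so the alignment metric is cosine similarity.
    z_img = z_img / np.linalg.norm(z_img, axis=1, keepdims=True)
    z_txt = z_txt / np.linalg.norm(z_txt, axis=1, keepdims=True)
    logits = z_img @ z_txt.T / tau
    # Row-wise softmax: p(text | image); column-wise softmax: p(image | text).
    log_p_txt_given_img = log_softmax(logits, axis=1)
    log_p_img_given_txt = log_softmax(logits, axis=0)
    idx = np.arange(len(z_img))
    loss = -0.5 * (log_p_txt_given_img[idx, idx].mean()
                   + log_p_img_given_txt[idx, idx].mean())
    return loss, np.exp(log_p_txt_given_img)


# Hypothetical paired data: both modalities are noisy views of a shared latent.
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))
z_img = shared + 0.01 * rng.normal(size=shared.shape)
z_txt = shared + 0.01 * rng.normal(size=shared.shape)
loss, p = symmetric_infonce(z_img, z_txt)
# Crossmodal retrieval: the mode of p(text | image) for each image.
retrieved = p.argmax(axis=1)
print(loss, retrieved)
```

Each row of `p` is a conditional distribution over texts given one image. Replacing cosine similarity with a different alignment metric is the kind of generalization the abstract describes.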
arXiv
Learning Enhanced Ensemble Filters

Eviatar Bach, Ricardo Baptista, Edoardo Calvello, Bohan Chen, and Andrew Stuart

arXiv:2504.17836, 2025

The filtering distribution in hidden Markov models evolves according to the law of a mean-field model in state-observation space. The ensemble Kalman filter (EnKF) approximates this mean-field model with an ensemble of interacting particles, employing a Gaussian ansatz for the joint distribution of the state and observation at each observation time. These methods are robust, but the Gaussian ansatz limits accuracy. This shortcoming is addressed by approximating the mean-field evolution using a novel form of neural operator taking probability distributions as input: a measure neural mapping (MNM). An MNM is used to design a novel approach to filtering, the MNM-enhanced ensemble filter (MNMEF), which is defined both in the mean-field limit and for interacting ensemble particle approximations. The ensemble approach uses empirical measures as input to the MNM and is implemented using the set transformer, which is invariant to ensemble permutation and allows for different ensemble sizes. The derivation of methods from a mean-field formulation allows a single parameterization of the algorithm to be deployed at different ensemble sizes. In practice, fine-tuning a small number of parameters for specific ensemble sizes further enhances the accuracy of the scheme. The promise of the approach is demonstrated by its superior root-mean-square error performance relative to leading methods in filtering the Lorenz 96 and Kuramoto-Sivashinsky models.
@article{bach2025learning, title = {Learning Enhanced Ensemble Filters}, author = {Bach, Eviatar and Baptista, Ricardo and Calvello, Edoardo and Chen, Bohan and Stuart, Andrew}, journal = {arXiv:2504.17836}, year = {2025}, }
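For reference, the Gaussian-ansatz baseline the abstract describes can be sketched as the stochastic (perturbed-observation) EnKF analysis step. This is the standard EnKF update, not the MNMEF. The linear observation operator `H`, the noise covariance `R`, and the scalar test problem are assumptions for illustration.

```python
import numpy as np


def enkf_analysis(ensemble, y_obs, H, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    ensemble: (N, d) forecast particles; H: (m, d) linear observation operator;
    R: (m, m) observation-noise covariance; y_obs: (m,) observed data.
    """
    n = ensemble.shape[0]
    # Simulated observations for each particle, including observation noise.
    noise = rng.multivariate_normal(np.zeros(len(y_obs)), R, size=n)
    y_pred = ensemble @ H.T + noise
    # Empirical covariances: the Gaussian ansatz on the joint (state, observation).
    x_anom = ensemble - ensemble.mean(axis=0)
    y_anom = y_pred - y_pred.mean(axis=0)
    C_xy = x_anom.T @ y_anom / (n - 1)
    C_yy = y_anom.T @ y_anom / (n - 1)
    # Transposed Kalman gain K^T, where K = C_xy C_yy^{-1}.
    gain_T = np.linalg.solve(C_yy, C_xy.T)
    # Move each particle toward the observed data.
    return ensemble + (y_obs - y_pred) @ gain_T


# Assumed scalar test: prior N(0, 1), y = x + noise with variance 0.5, y_obs = 1.
rng = np.random.default_rng(1)
forecast = rng.normal(size=(20000, 1))
H = np.array([[1.0]])
R = np.array([[0.5]])
analysis = enkf_analysis(forecast, np.array([1.0]), H, R, rng)
print(analysis.mean(), analysis.var())
```

In this linear-Gaussian case the exact posterior has mean 2/3 and variance 1/3, so the ensemble statistics can be checked directly. For nonlinear models the same update is only approximate, which is the limitation the MNM is designed to address.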
arXiv
Memorization and Regularization in Generative Diffusion Models

Ricardo Baptista, Agnimitra Dasgupta, Nikola B Kovachki, Assad Oberai, and Andrew M Stuart

arXiv:2501.15785, 2025

Diffusion models have emerged as a powerful framework for generative modeling. At the heart of the methodology is score matching: learning gradients of families of log-densities for noisy versions of the data distribution at different scales. When the loss function adopted in score matching is evaluated using empirical data, rather than the population loss, the minimizer corresponds to the score of a time-dependent Gaussian mixture. However, use of this analytically tractable minimizer leads to data memorization: in both unconditioned and conditioned settings, the generative model returns the training samples. This paper contains an analysis of the dynamical mechanism underlying memorization. The analysis highlights the need for regularization to avoid reproducing the analytically tractable minimizer; and, in so doing, lays the foundations for a principled understanding of how to regularize. Numerical experiments investigate the properties of: (i) Tikhonov regularization; (ii) regularization designed to promote asymptotic consistency; and (iii) regularizations induced by under-parameterization of a neural network or by early stopping when training a neural network. These experiments are evaluated in the context of memorization, and directions for future development of regularization are highlighted.
@article{baptista2025memorization, title = {Memorization and Regularization in Generative Diffusion Models}, author = {Baptista, Ricardo and Dasgupta, Agnimitra and Kovachki, Nikola B and Oberai, Assad and Stuart, Andrew M}, journal = {arXiv:2501.15785}, year = {2025}, }
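The analytically tractable minimizer mentioned in the abstract is the score of a Gaussian mixture centred at the training points. The minimal sketch below assumes additive noise N(0, sigma^2 I) and a hypothetical three-point dataset. It shows how the corresponding denoiser, obtained through Tweedie's formula, collapses onto the nearest training sample as `sigma` shrinks.

```python
import numpy as np


def empirical_score(x, data, sigma):
    """Score of p_sigma = (1/n) sum_i N(x_i, sigma^2 I), the minimizer of the
    denoising score-matching loss evaluated on the empirical data."""
    sq_dist = ((x[None, :] - data) ** 2).sum(axis=1)
    logw = -sq_dist / (2 * sigma**2)
    w = np.exp(logw - logw.max())
    w /= w.sum()  # posterior weights over the training points
    return (w[:, None] * (data - x[None, :])).sum(axis=0) / sigma**2


def denoise(x, data, sigma):
    # Tweedie's formula: E[x_0 | x_sigma = x] = x + sigma^2 * score(x).
    return x + sigma**2 * empirical_score(x, data, sigma)


data = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, -1.0]])  # hypothetical training set
x = np.array([0.8, 0.9])
for sigma in [2.0, 0.5, 0.05]:
    print(sigma, denoise(x, data, sigma))
```

At small `sigma` the posterior weights concentrate on one training point, so denoising returns that point. This is the memorization mechanism that regularization is meant to prevent.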
AISTATS
Conditional simulation via entropic optimal transport: Toward non-parametric estimation of conditional Brenier maps

Ricardo Baptista, Aram-Alexandre Pooladian, Michael Brennan, Youssef Marzouk, and Jonathan Niles-Weed

In The 28th International Conference on Artificial Intelligence and Statistics, 2025

Conditional simulation is a fundamental task in statistical modeling: generate samples from the conditionals given finitely many data points from a joint distribution. One promising approach is to construct conditional Brenier maps, where the components of the map push forward a reference distribution to conditionals of the target. While many estimators exist, few, if any, come with statistical or algorithmic guarantees. To this end, we propose a non-parametric estimator for conditional Brenier maps that builds on the computational scalability of entropic optimal transport. Our estimator leverages a result of Carlier et al. (2010), which shows that optimal transport maps under a rescaled quadratic cost asymptotically converge to conditional Brenier maps; our estimator is precisely the entropic analogue of these converging maps. We provide heuristic justifications for how to choose the scaling parameter in the cost as a function of the number of samples by fully characterizing the Gaussian setting. We conclude by comparing the performance of the estimator to other machine learning and non-parametric approaches on benchmark datasets and Bayesian inference problems.
@inproceedings{baptista2025conditional, title = {Conditional simulation via entropic optimal transport: Toward non-parametric estimation of conditional Brenier maps}, author = {Baptista, Ricardo and Pooladian, Aram-Alexandre and Brennan, Michael and Marzouk, Youssef and Niles-Weed, Jonathan}, booktitle = {The 28th International Conference on Artificial Intelligence and Statistics}, year = {2025} }
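A minimal sketch of the idea, under stated assumptions: two scalar blocks with the conditioning variable first, a rescaled quadratic cost with weight 1 on the first block and a small hypothetical weight `t` on the second, log-domain Sinkhorn with uniform weights, and an entropic map obtained by extending the barycentric projection out of sample with the dual potential. The values of `t` and `eps` and the synthetic joint distribution are illustrative. The paper's rules for choosing the scaling as a function of sample size are not reproduced here.

```python
import numpy as np


def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis=axis)


def rescaled_cost(src, tgt, t):
    # Weight 1 on the conditioning coordinate and t << 1 on the other one.
    d = src[:, None, :] - tgt[None, :, :]
    return d[..., 0] ** 2 + t * d[..., 1] ** 2


def sinkhorn_potential(src, tgt, t, eps, iters=300):
    """Log-domain Sinkhorn with uniform weights; returns the target-side potential g."""
    n, m = len(src), len(tgt)
    C = rescaled_cost(src, tgt, t)
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        f = -eps * logsumexp((g[None, :] - C) / eps, axis=1) + eps * np.log(m)
        g = -eps * logsumexp((f[:, None] - C) / eps, axis=0) + eps * np.log(n)
    return g


def entropic_map(points, tgt, g, t, eps):
    # Barycentric projection of the entropic plan, extended via the potential g.
    logits = (g[None, :] - rescaled_cost(points, tgt, t)) / eps
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ tgt


rng = np.random.default_rng(0)
n = 400
y = rng.normal(size=n)
target = np.column_stack([y, y + 0.5 * rng.normal(size=n)])  # X | Y=y ~ N(y, 0.25)
source = np.column_stack([rng.normal(size=n), rng.normal(size=n)])  # (Y', Z) reference
t, eps = 0.1, 0.05
g = sinkhorn_potential(source, target, t, eps)


def conditional_samples(y0, z):
    # Fix the conditioning input at y0 and push reference draws z through the map.
    pts = np.column_stack([np.full_like(z, y0), z])
    return entropic_map(pts, target, g, t, eps)[:, 1]


z = np.sort(rng.normal(size=200))
cond_pos, cond_neg = conditional_samples(1.0, z), conditional_samples(-1.0, z)
print(cond_pos.mean(), cond_neg.mean())  # rough estimates of E[X | Y = 1] and E[X | Y = -1]
```

With this cost, the second component of the map is nondecreasing in `z` for any fixed conditioning value, since its derivative in `z` equals `2 * t / eps` times a weighted variance.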

2024

arXiv
Expected information gain estimation via density approximations: Sample allocation and dimension reduction

Fengyi Li, Ricardo Baptista, and Youssef Marzouk

arXiv:2411.08390, 2024

Computing expected information gain (EIG) from prior to posterior (equivalently, mutual information between candidate observations and model parameters or other quantities of interest) is a fundamental challenge in Bayesian optimal experimental design. We formulate flexible transport-based schemes for EIG estimation in general nonlinear/non-Gaussian settings, compatible with both standard and implicit Bayesian models. These schemes are representative of two-stage methods for estimating or bounding EIG using marginal and conditional density estimates. In this setting, we analyze the optimal allocation of samples between training (density estimation) and approximation of the outer prior expectation. We show that with this optimal sample allocation, the MSE of the resulting EIG estimator converges more quickly than that of a standard nested Monte Carlo scheme. We then address the estimation of EIG in high dimensions, by deriving gradient-based upper bounds on the mutual information lost by projecting the parameters and/or observations to lower-dimensional subspaces. Minimizing these upper bounds yields projectors and hence low-dimensional EIG approximations that outperform approximations obtained via other linear dimension reduction schemes. Numerical experiments on a PDE-constrained Bayesian inverse problem also illustrate a favorable trade-off between dimension truncation and the modeling of non-Gaussianity, when estimating EIG from finite samples in high dimensions.
@article{li2024expected, title = {Expected information gain estimation via density approximations: Sample allocation and dimension reduction}, author = {Li, Fengyi and Baptista, Ricardo and Marzouk, Youssef}, journal = {arXiv:2411.08390}, year = {2024} }
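The abstract's baseline is a standard nested Monte Carlo EIG estimator. The minimal sketch below uses an assumed scalar linear-Gaussian model, theta ~ N(0, 1) and y = theta + sigma * eps, chosen because its EIG has the closed form 0.5 * log(1 + 1/sigma^2). The transport-based estimators of the paper are not shown.

```python
import numpy as np


def log_lik(y, theta, sigma):
    # Gaussian log-likelihood log p(y | theta) for y = theta + sigma * noise.
    return -0.5 * ((y - theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))


def nested_mc_eig(n_outer, n_inner, sigma, rng):
    """Nested Monte Carlo: mean of log p(y|theta) - log p(y), with p(y) estimated
    by an inner average over fresh prior draws."""
    theta = rng.normal(size=n_outer)
    y = theta + sigma * rng.normal(size=n_outer)
    inner = rng.normal(size=(n_outer, n_inner))  # fresh prior draws for the evidence
    ll_inner = log_lik(y[:, None], inner, sigma)
    m = ll_inner.max(axis=1)
    log_evidence = m + np.log(np.exp(ll_inner - m[:, None]).mean(axis=1))
    return np.mean(log_lik(y, theta, sigma) - log_evidence)


sigma = 0.5
exact = 0.5 * np.log(1 + 1 / sigma**2)
rng = np.random.default_rng(0)
est = nested_mc_eig(2000, 1000, sigma, rng)
print(exact, est)
```

The inner sample size controls the bias of the log-evidence term and the outer sample size controls the variance; balancing these against density-estimation cost is the allocation question the abstract analyzes.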
arXiv
Inverse Problems and Data Assimilation: A Machine Learning Approach

Eviatar Bach, Ricardo Baptista, Daniel Sanz-Alonso, and Andrew Stuart

arXiv:2410.10523, 2024

The aim of these notes is to demonstrate the potential for ideas in machine learning to impact on the fields of inverse problems and data assimilation. The perspective is one that is primarily aimed at researchers from inverse problems and/or data assimilation who wish to see a mathematical presentation of machine learning as it pertains to their fields. As a by-product, we include a succinct mathematical treatment of various topics in machine learning.
@article{bach2024inverse, title = {Inverse Problems and Data Assimilation: A Machine Learning Approach}, author = {Bach, Eviatar and Baptista, Ricardo and Sanz-Alonso, Daniel and Stuart, Andrew}, journal = {arXiv:2410.10523}, year = {2024} }
ICML ESM
Learning Optimal Filters Using Variational Inference

Enoch Luk, Eviatar Bach, Ricardo Baptista, and Andrew Stuart

In ICML Machine Learning for Earth System Modeling Workshop, 2024

Filtering (the task of estimating the conditional distribution of states of a dynamical system given partial, noisy observations) is important in many areas of science and engineering, including weather and climate prediction. However, the filtering distribution is generally intractable to obtain for high-dimensional, nonlinear systems. Filters used in practice, such as the ensemble Kalman filter (EnKF), are biased for nonlinear systems and have numerous tuning parameters. Here, we present a framework for learning a parameterized analysis map (the map that takes a forecast distribution and observations to the filtering distribution) using variational inference. We show that this methodology can be used to learn gain matrices for filtering linear and nonlinear dynamical systems, as well as inflation and localization parameters for an EnKF. Future work will apply this framework to learn new filtering algorithms.
@inproceedings{luk2024learning, title = {Learning Optimal Filters Using Variational Inference}, author = {Luk, Enoch and Bach, Eviatar and Baptista, Ricardo and Stuart, Andrew}, booktitle = {ICML Machine Learning for Earth System Modeling Workshop}, year = {2024}, }
PNAS
Codiscovering graphical structure and functional relationships within data: A Gaussian Process framework for connecting the dots

Théo Bourdais, Pau Batlle, Xianjin Yang, Ricardo Baptista, Nicolas Rouquette, and Houman Owhadi

Proceedings of the National Academy of Sciences, 2024

Many complex data analysis problems within and beyond the scientific domain involve discovering graphical structures and functional relationships within data. Nonlinear variance decomposition with Gaussian Processes simplifies and automates this process. Other methods, such as artificial neural networks, lack this variance decomposition feature. Information-theoretic and causal inference methods suffer from super-exponential complexity with respect to the number of variables. The proposed technique performs this task in polynomial complexity. This unlocks the potential for applications involving the identification of a network of hidden relationships between variables without a parameterized model at a remarkable scale, scope, and complexity. Most problems within and beyond the scientific domain can be framed as one of the following three levels of complexity of function approximation. Type 1: Approximate an unknown function given input/output data. Type 2: Consider a collection of variables and functions, some of which are unknown, indexed by the nodes and hyperedges of a hypergraph (a generalized graph where edges can connect more than two vertices). Given partial observations of the variables of the hypergraph (satisfying the functional dependencies imposed by its structure), approximate all the unobserved variables and unknown functions. Type 3: Expanding on Type 2, if the hypergraph structure itself is unknown, use partial observations of the variables of the hypergraph to discover its structure and approximate its unknown functions. These hypergraphs offer a natural platform for organizing, communicating, and processing computational knowledge. While most scientific problems can be framed as the data-driven discovery of unknown functions in a computational hypergraph whose structure is known (Type 2), many require the data-driven discovery of the structure (connectivity) of the hypergraph itself (Type 3).
We introduce an interpretable Gaussian Process (GP) framework for such (Type 3) problems that does not require randomization of the data, access to or control over its sampling, or sparsity of the unknown functions in a known or learned basis. Its polynomial complexity, which contrasts sharply with the super-exponential complexity of causal inference methods, is enabled by the nonlinear ANOVA capabilities of GPs used as a sensing mechanism.
@article{bourdais2023computational, author = {Bourdais, Théo and Batlle, Pau and Yang, Xianjin and Baptista, Ricardo and Rouquette, Nicolas and Owhadi, Houman}, title = {Codiscovering graphical structure and functional relationships within data: A Gaussian Process framework for connecting the dots}, journal = {Proceedings of the National Academy of Sciences}, volume = {121}, number = {32}, pages = {e2403449121}, year = {2024}, doi = {10.1073/pnas.2403449121}, url = {https://www.pnas.org/doi/abs/10.1073/pnas.2403449121}, eprint = {https://www.pnas.org/doi/pdf/10.1073/pnas.2403449121}, }
AMS
An approximation theory framework for measure-transport sampling algorithms

Ricardo Baptista, Bamdad Hosseini, Nikola B Kovachki, Youssef M Marzouk, and Amir Sagiv

Mathematics of Computation, 2024

This article presents a general approximation-theoretic framework to analyze measure transport algorithms for probabilistic modeling. A primary motivating application for such algorithms is sampling – a central task in statistical inference and generative modeling. We provide a priori error estimates in the continuum limit, i.e., when the measures (or their densities) are given, but when the transport map is discretized or approximated using a finite-dimensional function space. Our analysis relies on the regularity theory of transport maps and on classical approximation theory for high-dimensional functions. A third element of our analysis, which is of independent interest, is the development of new stability estimates that relate the distance between two maps to the distance (or divergence) between the pushforward measures they define. We present a series of applications of our framework, where quantitative convergence rates are obtained for practical problems using Wasserstein metrics, maximum mean discrepancy, and Kullback–Leibler divergence. Specialized rates for approximations of the popular triangular Knothe–Rosenblatt maps are obtained, followed by numerical experiments that demonstrate and extend our theory.
@article{baptista2023approximation, title = {An approximation theory framework for measure-transport sampling algorithms}, author = {Baptista, Ricardo and Hosseini, Bamdad and Kovachki, Nikola B and Marzouk, Youssef M and Sagiv, Amir}, journal = {Mathematics of Computation}, year = {2024}, }
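As a concrete instance of a triangular Knothe–Rosenblatt map: for a Gaussian target N(0, Sigma) and a standard normal reference, the KR map is linear and given by the lower-triangular Cholesky factor of Sigma. The minimal sketch below uses an assumed 2x2 covariance. The paper's rates concern approximating nonlinear maps, which this example does not exercise.

```python
import numpy as np

Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])  # assumed target covariance
L = np.linalg.cholesky(Sigma)  # lower triangular with positive diagonal


def kr_map(z):
    # Component k depends only on z_1..z_k and is increasing in z_k (L[k, k] > 0),
    # so this linear map is the unique monotone triangular (KR) map to N(0, Sigma).
    return z @ L.T


rng = np.random.default_rng(0)
z = rng.standard_normal((100_000, 2))
x = kr_map(z)
print(np.cov(x, rowvar=False))  # close to Sigma
```

For non-Gaussian targets the KR map is nonlinear, and approximating it in a finite-dimensional function space is where the paper's error estimates apply.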
JCP
Bayesian model calibration for block copolymer self-assembly: Likelihood-free inference and expected information gain computation via measure transport

Ricardo Baptista, Lianghao Cao, Joshua Chen, Omar Ghattas, Fengyi Li, Youssef M Marzouk, and J Tinsley Oden

Journal of Computational Physics, 2024

We consider the Bayesian calibration of models describing the phenomenon of block copolymer (BCP) self-assembly using image data produced by microscopy or X-ray scattering techniques. To account for the random long-range disorder in BCP equilibrium structures, we introduce auxiliary variables to represent this aleatory uncertainty. These variables, however, result in an integrated likelihood for high-dimensional image data that is generally intractable to evaluate. We tackle this challenging Bayesian inference problem using a likelihood-free approach based on measure transport together with the construction of summary statistics for the image data. We also show that expected information gains (EIGs) from the observed data about the model parameters can be computed with no significant additional cost. Lastly, we present a numerical case study based on the Ohta–Kawasaki model for diblock copolymer thin film self-assembly and top-down microscopy characterization. For calibration, we introduce several domain-specific energy- and Fourier-based summary statistics, and quantify their informativeness using EIG. We demonstrate the power of the proposed approach to study the effect of data corruptions and experimental designs on the calibration results.
@article{baptista2022bayesian, title = {Bayesian model calibration for block copolymer self-assembly: Likelihood-free inference and expected information gain computation via measure transport}, author = {Baptista, Ricardo and Cao, Lianghao and Chen, Joshua and Ghattas, Omar and Li, Fengyi and Marzouk, Youssef M and Oden, J Tinsley}, journal = {Journal of Computational Physics}, volume = {503}, pages = {112844}, year = {2024}, publisher = {Elsevier}, }
arXiv
Neural Approximate Mirror Maps for Constrained Diffusion Models

Berthy T Feng, Ricardo Baptista, and Katherine L Bouman

arXiv:2406.12816, 2024

Diffusion models excel at creating visually convincing images, but they often struggle to meet subtle constraints inherent in the training data. Such constraints could be physics-based (e.g., satisfying a PDE), geometric (e.g., respecting symmetry), or semantic (e.g., including a particular number of objects). When the training data all satisfy a certain constraint, enforcing this constraint on a diffusion model not only improves its distribution-matching accuracy but also makes it more reliable for generating valid synthetic data and solving constrained inverse problems. However, existing methods for constrained diffusion models are not flexible enough to handle different types of constraints. Recent work proposed to learn mirror diffusion models (MDMs) in an unconstrained space defined by a mirror map and to impose the constraint with an inverse mirror map, but analytical mirror maps are challenging to derive for complex constraints. We propose neural approximate mirror maps (NAMMs) for general constraints. Our approach only requires a differentiable distance function from the constraint set. We learn an approximate mirror map that pushes data into an unconstrained space and a corresponding approximate inverse that maps data back to the constraint set. A generative model, such as an MDM, can then be trained in the learned mirror space and its samples restored to the constraint set by the inverse map. We validate our approach on a variety of constraints, showing that compared to an unconstrained diffusion model, a NAMM-based MDM substantially improves constraint satisfaction. We also demonstrate how existing diffusion-based inverse-problem solvers can be easily applied in the learned mirror space to solve constrained inverse problems.
@article{feng2024neural, title = {Neural Approximate Mirror Maps for Constrained Diffusion Models}, author = {Feng, Berthy T and Baptista, Ricardo and Bouman, Katherine L}, journal = {arXiv:2406.12816}, year = {2024}, }
arXiv
TrIM: Transformed Iterative Mondrian Forests for Gradient-based Dimension Reduction and High-Dimensional Regression

Ricardo Baptista, Eliza O’Reilly, and Yangxinyu Xie

arXiv:2407.09964, 2024

We propose a computationally efficient algorithm for gradient-based linear dimension reduction and high-dimensional regression. The algorithm initially computes a Mondrian forest and uses this estimator to identify a relevant feature subspace of the inputs from an estimate of the expected gradient outer product (EGOP) of the regression function. In addition, we introduce an iterative approach known as Transformed Iterative Mondrian (TrIM) forest to improve the Mondrian forest estimator by using the EGOP estimate to update the set of features and weights used by the Mondrian partitioning mechanism. We obtain consistency guarantees and convergence rates for the estimation of the EGOP matrix and the random forest estimator obtained from one iteration of the TrIM algorithm. Lastly, we demonstrate the effectiveness of our proposed algorithm for learning the relevant feature subspace across a variety of settings with both simulated and real data.
@article{baptista2024trim, title = {TrIM: Transformed Iterative Mondrian Forests for Gradient-based Dimension Reduction and High-Dimensional Regression}, author = {Baptista, Ricardo and O'Reilly, Eliza and Xie, Yangxinyu}, journal = {arXiv:2407.09964}, year = {2024}, }
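The EGOP is E[grad f(X) grad f(X)^T], and its leading eigenvectors span the directions along which the regression function varies. The paper estimates it from a Mondrian forest. The sketch below instead uses central finite differences of a known, hypothetical test function f(x) = sin(a . x), so it only illustrates how the EGOP reveals the relevant subspace.

```python
import numpy as np


def egop(f, X, h=1e-5):
    """Monte Carlo EGOP estimate using central finite-difference gradients."""
    n, d = X.shape
    grads = np.empty((n, d))
    for k in range(d):
        e = np.zeros(d)
        e[k] = h
        grads[:, k] = (f(X + e) - f(X - e)) / (2 * h)
    return grads.T @ grads / n


rng = np.random.default_rng(0)
d = 10
a = np.zeros(d)
a[:2] = [3.0, 4.0]
a /= np.linalg.norm(a)  # hypothetical relevant direction


def f(X):
    # Depends on x only through a . x, so the EGOP is rank one along a.
    return np.sin(X @ a)


X = rng.standard_normal((2000, d))
G = egop(f, X)
eigvals, eigvecs = np.linalg.eigh(G)  # eigenvalues in ascending order
direction = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
print(abs(direction @ a))  # close to 1
```

In TrIM, an EGOP estimate of this kind is used to update the features and weights of the Mondrian partitioning; here the gradient is available in closed form only because the test function is known.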
JMLR
Learning non-Gaussian graphical models via Hessian scores and triangular transport

Ricardo Baptista, Youssef Marzouk, Rebecca E Morrison, and Olivier Zahm

Journal of Machine Learning Research, 2024

Undirected probabilistic graphical models represent the conditional dependencies, or Markov properties, of a collection of random variables. Knowing the sparsity of such a graphical model is valuable for modeling multivariate distributions and for efficiently performing inference. While the problem of learning graph structure from data has been studied extensively for certain parametric families of distributions, most existing methods fail to consistently recover the graph structure for non-Gaussian data. Here we propose an algorithm for learning the Markov structure of continuous and non-Gaussian distributions. To characterize conditional independence, we introduce a score based on integrated Hessian information from the joint log-density, and we prove that this score upper bounds the conditional mutual information for a general class of distributions. To compute the score, our algorithm SING estimates the density using a deterministic coupling, induced by a triangular transport map, and iteratively exploits sparse structure in the map to reveal sparsity in the graph. For certain non-Gaussian datasets, we show that our algorithm recovers the graph structure even with a biased approximation to the density. Among other examples, we apply SING to learn the dependencies between the states of a chaotic dynamical system with local interactions.
@article{baptista2021learning, title = {Learning non-{G}aussian graphical models via {H}essian scores and triangular transport}, author = {Baptista, Ricardo and Marzouk, Youssef and Morrison, Rebecca E and Zahm, Olivier}, journal = {Journal of Machine Learning Research}, volume = {25}, number = {85}, pages = {1--46}, year = {2024}, }
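For i != j, the score in the abstract is Omega_ij = E[(d^2 log pi(X) / dx_i dx_j)^2]. For smooth, positive densities it vanishes exactly when x_i and x_j are conditionally independent given the remaining variables. The minimal sketch below assumes a known, hypothetical chain-structured log-density and uses finite-difference Hessians in place of the transport-map density estimate that SING uses.

```python
import numpy as np


def log_density(X):
    # Hypothetical non-Gaussian chain x1 -> x2 -> x3 (unnormalized log-density).
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    return -0.5 * x1**2 - 0.5 * (x2 - x1**2) ** 2 - 0.5 * (x3 - np.sin(x2)) ** 2


def hessian_score(logp, X, h=1e-3):
    """Off-diagonal Omega[i, j] = E[(mixed second derivative of log p)^2],
    estimated with central differences averaged over samples X."""
    d = X.shape[1]
    omega = np.zeros((d, d))
    steps = np.eye(d) * h
    for i in range(d):
        for j in range(i + 1, d):
            mixed = (logp(X + steps[i] + steps[j]) - logp(X + steps[i] - steps[j])
                     - logp(X - steps[i] + steps[j]) + logp(X - steps[i] - steps[j])) / (4 * h**2)
            omega[i, j] = omega[j, i] = np.mean(mixed**2)
    return omega


# Ancestral sampling from the same chain.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.standard_normal(n)
x2 = x1**2 + rng.standard_normal(n)
x3 = np.sin(x2) + rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])
omega = hessian_score(log_density, X)
edges = omega > 1e-6  # nonzero score indicates an edge in the graph
print(np.round(omega, 3))
```

Here the mixed derivative in (x1, x3) is identically zero, so that entry of the score vanishes and the missing edge x1 - x3 is recovered even though the density is far from Gaussian.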
arXiv
Coupled Input-Output Dimension Reduction: Application to Goal-oriented Bayesian Experimental Design and Global Sensitivity Analysis

Qiao Chen, Elise Arnaud, Ricardo Baptista, and Olivier Zahm

arXiv:2406.13425, 2024

We introduce a new method to jointly reduce the dimension of the input and output space of a high-dimensional function. Choosing a reduced input subspace influences which output subspace is relevant and vice versa. Conventional methods focus on reducing either the input or output space, even though both are often reduced simultaneously in practice. Our coupled approach naturally supports goal-oriented dimension reduction, where either an input or output quantity of interest is prescribed. We consider, in particular, goal-oriented sensor placement and goal-oriented sensitivity analysis, which can be viewed as dimension reduction where the most important output or, respectively, input components are chosen. Both applications present difficult combinatorial optimization problems with expensive objectives such as the expected information gain and Sobol indices. By optimizing gradient-based bounds, we can determine the most informative sensors and most sensitive parameters as the largest diagonal entries of some diagnostic matrices, thus bypassing the combinatorial optimization and objective evaluation.
@article{chen2024coupled, title = {Coupled Input-Output Dimension Reduction: Application to Goal-oriented Bayesian Experimental Design and Global Sensitivity Analysis}, author = {Chen, Qiao and Arnaud, Elise and Baptista, Ricardo and Zahm, Olivier}, journal = {arXiv:2406.13425}, year = {2024}, }
JUQ
Conditional Sampling with Monotone GANs: from Generative Models to Likelihood-Free Inference

Ricardo Baptista, Bamdad Hosseini, Nikola B Kovachki, and Youssef Marzouk

SIAM/ASA Journal on Uncertainty Quantification, 2024

We present a novel framework for conditional sampling of probability measures, using block triangular transport maps. We develop the theoretical foundations of block triangular transport in a Banach space setting, establishing general conditions under which conditional sampling can be achieved and drawing connections between monotone block triangular maps and optimal transport. Based on this theory, we then introduce a computational approach, called monotone generative adversarial networks (M-GANs), to learn suitable block triangular maps. Our algorithm uses only samples from the underlying joint probability measure and is hence likelihood-free. Numerical experiments with M-GAN demonstrate accurate sampling of conditional measures in synthetic examples, Bayesian inverse problems involving ordinary and partial differential equations, and probabilistic image in-painting.
@article{kovachki2020conditional, title = {Conditional Sampling with Monotone {GAN}s: from Generative Models to Likelihood-Free Inference}, author = {Baptista, Ricardo and Hosseini, Bamdad and Kovachki, Nikola B and Marzouk, Youssef}, journal = {SIAM/ASA Journal on Uncertainty Quantification}, volume = {12}, number = {3}, pages = {868--900}, year = {2024}, doi = {10.1137/23M1581546}, }

2023

arXiv
Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference

Zheyu Oliver Wang, Ricardo Baptista, Youssef Marzouk, Lars Ruthotto, and Deepanshu Verma

arXiv:2310.16975, 2023

We present two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport (COT) problems. Both approaches enable conditional sampling and conditional density estimation, which are core tasks in Bayesian inference, particularly in the simulation-based ("likelihood-free") setting. Our methods represent the target conditional distributions as transformations of a tractable reference distribution and, therefore, fall into the framework of measure transport. Although many measure transport approaches model the transformation as COT maps, obtaining the map is computationally challenging, even in moderate dimensions. To improve scalability, our numerical algorithms use neural networks to parameterize COT maps and further exploit the structure of the COT problem. Our static approach approximates the map as the gradient of a partially input-convex neural network. It uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. Our dynamic approach approximates the conditional optimal transport via the flow map of a regularized neural ODE; compared to the static approach, it is slower to train but offers more modeling choices and can lead to faster sampling. We demonstrate both algorithms numerically, comparing them with competing state-of-the-art approaches, using benchmark datasets and simulation-based Bayesian inverse problems.
@article{wang2023efficient, title = {Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference}, author = {Wang, Zheyu Oliver and Baptista, Ricardo and Marzouk, Youssef and Ruthotto, Lars and Verma, Deepanshu}, journal = {arXiv:2310.16975}, year = {2023}, }
FoCM
On the representation and learning of monotone triangular transport maps

Ricardo Baptista, Youssef Marzouk, and Olivier Zahm

Foundations of Computational Mathematics, 2023

Transportation of measure provides a versatile approach for modeling complex probability distributions, with applications in density estimation, Bayesian inference, generative modeling, and beyond. Monotone triangular transport maps, which approximate the Knothe–Rosenblatt (KR) rearrangement, are a canonical choice for these tasks. Yet the representation and parameterization of such maps have a significant impact on their generality and expressiveness, and on properties of the optimization problem that arises in learning a map from data (e.g., via maximum likelihood estimation). We present a general framework for representing monotone triangular maps via invertible transformations of smooth functions. We establish conditions on the transformation such that the associated infinite-dimensional minimization problem has no spurious local minima, i.e., all local minima are global minima; and we show for target distributions satisfying certain tail conditions that the unique global minimizer corresponds to the KR map. Given a sample from the target, we then propose an adaptive algorithm that estimates a sparse semi-parametric approximation of the underlying KR map. We demonstrate how this framework can be applied to joint and conditional density estimation, likelihood-free inference, and structure learning of directed graphical models, with stable generalization performance across a range of sample sizes.
@article{baptista2020adaptive, title = {On the representation and learning of monotone triangular transport maps}, author = {Baptista, Ricardo and Marzouk, Youssef and Zahm, Olivier}, journal = {Foundations of Computational Mathematics}, pages = {1--46}, year = {2023}, publisher = {Springer}, }
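One way to build a monotone triangular component from an arbitrary smooth function f is to integrate a positive transformation of its partial derivative in the last variable. The minimal sketch below treats a single two-variable component. It assumes the representation S(x_1, x_k) = f(x_1, 0) + integral from 0 to x_k of g(d f(x_1, t)/dt) dt with g = softplus, a hypothetical f, and Gauss–Legendre quadrature. The paper's basis choices and adaptive algorithm are not reproduced.

```python
import numpy as np


def softplus(s):
    return np.logaddexp(0.0, s)  # log(1 + e^s), always positive


def f(x_prev, t):
    # Hypothetical smooth function that is NOT monotone in t.
    return np.sin(3 * t) * x_prev + t**2 - 2 * t


def df_dt(x_prev, t):
    # Analytic partial derivative of f with respect to its last argument.
    return 3 * np.cos(3 * t) * x_prev + 2 * t - 2


def monotone_component(x_prev, x_k, order=40):
    """S(x_prev, x_k) = f(x_prev, 0) + int_0^{x_k} softplus(df/dt(x_prev, t)) dt."""
    nodes, weights = np.polynomial.legendre.leggauss(order)
    # Map Gauss-Legendre nodes from [-1, 1] to the interval between 0 and x_k.
    t = 0.5 * x_k * (nodes + 1.0)
    integral = 0.5 * x_k * np.sum(weights * softplus(df_dt(x_prev, t)))
    return f(x_prev, 0.0) + integral


xs = np.linspace(-3, 3, 61)
vals = np.array([monotone_component(0.7, xk) for xk in xs])
print(np.all(np.diff(vals) > 0))  # increasing in x_k although f itself is not
```

Because the integrand is positive for every choice of f, monotonicity holds by construction, so the parameters of f can be optimized without constraints.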
NeurIPS
Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models

Zhong Yi Wan, Ricardo Baptista, Yi-fan Chen, John Anderson, Anudhyan Boral, Fei Sha, and Leonardo Zepeda-Núñez

In Advances in Neural Information Processing Systems, 2023

We introduce a two-stage probabilistic framework for statistical downscaling using unpaired data. Statistical downscaling seeks a probabilistic map that transforms low-resolution data from a biased coarse-grained numerical scheme into high-resolution data consistent with a high-fidelity scheme. Our framework tackles the problem by composing two transformations: (i) a debiasing step via an optimal transport map, and (ii) an upsampling step achieved by a probabilistic diffusion model with a posteriori conditional sampling. This approach characterizes a conditional distribution without needing paired data, and faithfully recovers relevant physical statistics from biased samples. We demonstrate the utility of the proposed approach on one- and two-dimensional fluid flow problems, which are representative of the core difficulties present in numerical simulations of weather and climate. Our method produces realistic high-resolution outputs from low-resolution inputs at upsampling factors of 8× and 16×. Moreover, our procedure correctly matches the statistics of physical quantities even when the low-frequency content of the inputs and outputs does not match. Current state-of-the-art alternatives require this content to match, a crucial but difficult-to-satisfy assumption.
@inproceedings{wan2023debias, title = {Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models}, author = {Wan, Zhong Yi and Baptista, Ricardo and Chen, Yi-fan and Anderson, John and Boral, Anudhyan and Sha, Fei and Zepeda-N{\'u}{\~n}ez, Leonardo}, booktitle = {Advances in Neural Information Processing Systems}, year = {2023}, }
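In one dimension, the quadratic-cost optimal transport map is the monotone rearrangement T(x) = F_target^(-1)(F_source(x)). This gives a compact picture of what a debiasing step does. The sketch below estimates T from samples with empirical quantiles. It is a 1D simplification, not the paper's map between high-dimensional coarse fields. The function name and the Gaussian test samples are hypothetical.

```python
import numpy as np

def empirical_ot_map_1d(source, target):
    """Monotone map T(x) = F_target^{-1}(F_source(x)) between two 1D sample sets,
    built from piecewise-linear empirical CDFs and quantile functions."""
    src = np.sort(np.asarray(source, dtype=float))
    tgt = np.sort(np.asarray(target, dtype=float))
    src_levels = (np.arange(src.size) + 0.5) / src.size   # CDF levels at sorted source points
    tgt_levels = (np.arange(tgt.size) + 0.5) / tgt.size   # CDF levels at sorted target points

    def T(x):
        u = np.interp(x, src, src_levels)        # approximate F_source(x)
        return np.interp(u, tgt_levels, tgt)     # approximate F_target^{-1}(u)

    return T

rng = np.random.default_rng(0)
biased = rng.normal(1.0, 2.0, 5000)      # hypothetical biased coarse-model samples
reference = rng.normal(0.0, 1.0, 5000)   # hypothetical reference samples
T = empirical_ot_map_1d(biased, reference)
debiased = T(biased)
```

Suppose T is applied to the biased samples themselves, with equal sample sizes. It then returns a reordering of the reference samples. The debiased set therefore inherits the reference statistics while preserving the rank order of the biased inputs.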
NeurIPS OTML
A generative flow model for conditional sampling via optimal transport

Jason Alfonso, Ricardo Baptista, Anupam Bhakta, Noam Gal, Alfin Hou, Isa Lyubimova, Daniel Pocklington, Josef Sajonz, Giulio Trigila, and Ryan Tsai

In NeurIPS Optimal Transport and Machine Learning Workshop, 2023

Sampling conditional distributions is a fundamental task for Bayesian inference and density estimation. Generative models characterize conditionals by learning a transport map that pushes forward a reference (e.g., a standard Gaussian) to the target distribution. While these approaches can successfully describe many non-Gaussian problems, their performance is often limited by parametric bias and by the reliability of the gradient-based (adversarial) optimizers used to learn the map. This work proposes a non-parametric generative model that adaptively maps reference samples to the target. The model uses block-triangular transport maps, whose components characterize conditionals of the target distribution. These maps arise from solving an optimal transport problem with a weighted L² cost function, thereby extending the data-driven approach of Trigila and Tabak to conditional sampling. The proposed approach is demonstrated on a low-dimensional example and on a parameter inference problem involving nonlinear ODEs.
@inproceedings{alfonso2023generative, title = {A generative flow model for conditional sampling via optimal transport}, author = {Alfonso, Jason and Baptista, Ricardo and Bhakta, Anupam and Gal, Noam and Hou, Alfin and Lyubimova, Isa and Pocklington, Daniel and Sajonz, Josef and Trigila, Giulio and Tsai, Ryan}, booktitle = {NeurIPS Optimal Transport and Machine Learning Workshop}, year = {2023}, }
JCP
Ensemble transport smoothing. Part 1: Unified framework

Maximilian Ramgraber, Ricardo Baptista, Dennis McLaughlin, and Youssef Marzouk

Journal of Computational Physics: X, 2023

Smoothers are algorithms for Bayesian time series re-analysis. Most operational smoothers rely either on affine Kalman-type transformations or on sequential importance sampling. These strategies occupy opposite ends of a spectrum that trades computational efficiency and scalability for statistical generality and consistency: non-Gaussianity renders affine Kalman updates inconsistent with the true Bayesian solution, while the ensemble size required for successful importance sampling can be prohibitive. This paper revisits the smoothing problem from the perspective of measure transport, which offers the prospect of consistent prior-to-posterior transformations for Bayesian inference. We leverage this capacity by proposing a general ensemble framework for transport-based smoothing. Within this framework, we derive a comprehensive set of smoothing recursions based on nonlinear transport maps and detail how they exploit the structure of state-space models in fully non-Gaussian settings. We also describe how many standard Kalman-type smoothing algorithms emerge as special cases of our framework. A companion paper explores the implementation of nonlinear ensemble transport smoothers in greater depth.
@article{ramgraber2022ensemble, title = {Ensemble transport smoothing.--{P}art 1: {U}nified framework}, author = {Ramgraber, Maximilian and Baptista, Ricardo and McLaughlin, Dennis and Marzouk, Youssef}, journal = {Journal of Computational Physics: X}, volume = {17}, pages = {100134}, year = {2023}, issn = {2590-0552}, doi = {10.1016/j.jcpx.2023.100134}, }
JCP
Ensemble transport smoothing. Part 2: Nonlinear updates

Maximilian Ramgraber, Ricardo Baptista, Dennis McLaughlin, and Youssef Marzouk

Journal of Computational Physics: X, 2023

Smoothing is a specialized form of Bayesian inference for state-space models that characterizes the posterior distribution of a collection of states given an associated sequence of observations. Ramgraber et al. (Part 1 of this series) propose a general framework for transport-based ensemble smoothing, which includes linear Kalman-type smoothers as special cases. Here, we build on this foundation to realize and demonstrate nonlinear backward ensemble transport smoothers. We discuss parameterization and regularization of the associated transport maps, and then examine the performance of these smoothers for nonlinear and chaotic dynamical systems that exhibit non-Gaussian behavior. In these settings, our nonlinear transport smoothers yield lower estimation error than conventional linear smoothers and state-of-the-art iterative ensemble Kalman smoothers, for comparable numbers of model evaluations.
@article{ramgraber2022ensemblePart2, title = {Ensemble transport smoothing.--{P}art 2: {N}onlinear updates}, author = {Ramgraber, Maximilian and Baptista, Ricardo and McLaughlin, Dennis and Marzouk, Youssef}, journal = {Journal of Computational Physics: X}, volume = {17}, pages = {100133}, year = {2023}, issn = {2590-0552}, doi = {10.1016/j.jcpx.2023.100133}, }
arXiv
An adaptive ensemble filter for heavy-tailed distributions: tuning-free inflation and localization

Mathieu Le Provost, Ricardo Baptista, Youssef Marzouk, and Jeff D Eldredge

arXiv:2310.08741, 2023

Heavy tails are a common feature of filtering distributions that results from nonlinear dynamical and observation processes, as well as from the uncertainty of physical sensors. In these settings, the Kalman filter and its ensemble version, the ensemble Kalman filter (EnKF), which were designed under Gaussian assumptions, exhibit degraded performance. The t-distributions form a parametric family whose tail-heaviness is modulated by a degrees-of-freedom parameter ν. Interestingly, the Cauchy and Gaussian distributions correspond to the extreme cases of a t-distribution with ν = 1 and ν = ∞, respectively. Leveraging tools from measure transport (Spantini et al., SIAM Review, 2022), we present a generalization of the EnKF whose prior-to-posterior update leads to exact inference for t-distributions. We demonstrate that, for small ν, this filter is less sensitive to outlying synthetic observations generated by the observation model. Moreover, it recovers the Kalman filter for ν = ∞. For nonlinear state-space models with heavy-tailed noise, we propose an algorithm to estimate the prior-to-posterior update from samples of the joint forecast distribution of the states and observations. We rely on a regularized expectation-maximization (EM) algorithm to estimate the mean, scale matrix, and degrees of freedom of heavy-tailed t-distributions from limited samples (Finegold and Drton, arXiv preprint, 2014). Leveraging the conditional independence structure of the joint forecast distribution, we regularize the scale matrix with an ℓ1 sparsity-promoting penalization of the log-likelihood at each iteration of the EM algorithm. This ℓ1 regularization draws upon the graphical lasso algorithm (Friedman et al., Biostatistics, 2008), which estimates sparse inverse covariance matrices in the Gaussian setting. By sequentially estimating the degrees of freedom at each analysis step, our filter adapts the prior-to-posterior update to the tail-heaviness of the data.
This new filter intrinsically embeds an adaptive, data-dependent multiplicative inflation mechanism. It is complemented by adaptive localization through the ℓ1 penalization of the estimated scale matrix. We demonstrate the benefits of this new ensemble filter on challenging filtering problems with heavy-tailed noise.
@article{leprovost2023, title = {An adaptive ensemble filter for heavy-tailed distributions: tuning-free inflation and localization}, author = {Le Provost, Mathieu and Baptista, Ricardo and Marzouk, Youssef and Eldredge, Jeff D}, journal = {arXiv:2310.08741}, year = {2023}, }
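The exact-inference property for t-distributions rests on a standard closed-form result. Suppose (x, y) is jointly t-distributed with ν degrees of freedom, and let δ be the squared Mahalanobis distance of the innovation. Then x given y is again t-distributed, with:

- the Kalman-type location,
- ν + d_y degrees of freedom,
- a scale inflated by the factor (ν + δ)/(ν + d_y).

The sketch below evaluates this formula. It illustrates the data-dependent multiplicative inflation described above, not the paper's EM estimator or its ℓ1-regularized scale matrix. The function name is hypothetical.

```python
import numpy as np

def t_condition(mu, scale, nu, n_x, y):
    """Condition a joint multivariate t-distribution t_nu(mu, scale) on its last
    entries y; x is the first n_x entries. Returns (location, scale, dof) of x | y."""
    mu_x, mu_y = mu[:n_x], mu[n_x:]
    S_xx, S_xy, S_yy = scale[:n_x, :n_x], scale[:n_x, n_x:], scale[n_x:, n_x:]
    innov = y - mu_y
    # Kalman-type gain S_xy S_yy^{-1} (S_yy is symmetric).
    gain = np.linalg.solve(S_yy, S_xy.T).T
    loc = mu_x + gain @ innov
    # Squared Mahalanobis distance of the innovation under the prior scale.
    delta = innov @ np.linalg.solve(S_yy, innov)
    d_y = innov.size
    # Data-dependent inflation; it equals 1 in the Gaussian limit nu -> infinity.
    inflation = 1.0 if np.isinf(nu) else (nu + delta) / (nu + d_y)
    post_scale = inflation * (S_xx - gain @ S_xy.T)
    return loc, post_scale, nu + d_y
```

As an example, take unit prior scales with correlation 0.5 and an observation three scale units from its mean. The location is 1.5 in both cases. The posterior scale is 0.75 in the Gaussian limit but 3.75 for ν = 1, so the spread grows with the size of the innovation.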
IEEE CSS
Computational Optimal Transport and Filtering on Riemannian manifolds

Daniel Grange, Mohammad Al-Jarrah, Ricardo Baptista, Amirhossein Taghvaei, Tryphon T Georgiou, and Allen Tannenbaum

IEEE Control Systems Letters, 2023

In this paper, we extend recent developments in computational optimal transport to the setting of Riemannian manifolds. In particular, we show how to learn, from samples, optimal transport maps between probability distributions defined on manifolds. Specializing these maps to sample conditional probability distributions provides an ensemble approach for solving nonlinear filtering problems on such geometries. The proposed computational methodology is illustrated with examples of transport and nonlinear filtering on Lie groups, including the circle S¹, the special Euclidean group SE(2), and the special orthogonal group SO(3).
@article{grange2023computational, title = {Computational Optimal Transport and Filtering on {R}iemannian manifolds}, author = {Grange, Daniel and Al-Jarrah, Mohammad and Baptista, Ricardo and Taghvaei, Amirhossein and Georgiou, Tryphon T and Tannenbaum, Allen}, journal = {IEEE Control Systems Letters}, year = {2023}, }
NeurIPS
Structured Neural Networks for Density Estimation and Causal Inference

Asic Q Chen, Ruian Shi, Xiang Gao, Ricardo Baptista, and Rahul G Krishnan

In Advances in Neural Information Processing Systems, 2023

Injecting structure into neural networks enables learning functions that satisfy invariances with respect to subsets of inputs. For instance, when learning generative models using neural networks, it is advantageous to encode the conditional independence structure of observed variables, often in the form of Bayesian networks. We propose the Structured Neural Network (StrNN), which injects structure through masking pathways in a neural network. The masks are designed via a novel relationship we explore between neural network architectures and binary matrix factorization, to ensure that the desired independencies are respected. We devise and study practical algorithms for this otherwise NP-hard design problem based on novel objectives that control the model architecture. We demonstrate the utility of StrNN in three applications: (1) binary and Gaussian density estimation with StrNN, (2) real-valued density estimation with Structured Autoregressive Flows (StrAFs) and Structured Continuous Normalizing Flows (StrCNF), and (3) interventional and counterfactual analysis with StrAFs for causal inference. Our work opens up new avenues for learning neural networks that enable data-efficient generative modeling and the use of normalizing flows for causal effect estimation.
@inproceedings{chen2023structured, title = {Structured Neural Networks for Density Estimation and Causal Inference}, author = {Chen, Asic Q and Shi, Ruian and Gao, Xiang and Baptista, Ricardo and Krishnan, Rahul G}, booktitle = {Advances in Neural Information Processing Systems}, year = {2023}, }
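You can check whether a set of masks respects a given independence structure by composing their binary connectivity. In a masked network, output i can depend on input j only if some path of unmasked weights connects them, that is, only if the product of the masks has a nonzero (i, j) entry. The sketch below implements that check and a masked forward pass. All names are hypothetical. The identity-based factorization in the usage note is simply the most obvious valid choice, not the optimized factorization the paper proposes.

```python
import numpy as np

def respects_structure(masks, adjacency):
    """True if masks (ordered input -> output, each of shape (units_out, units_in))
    only connect output i to input j where adjacency[i, j] == 1."""
    reach = np.eye(adjacency.shape[1], dtype=int)
    for m in masks:
        # reach[u, j] > 0 iff unit u of the current layer is reachable from input j.
        reach = (m @ reach > 0).astype(int)
    return bool(np.all(reach <= adjacency))

def masked_mlp(x, weights, masks):
    """Forward pass of an MLP whose weight matrices are multiplied elementwise by masks."""
    h = x
    for layer, (W, M) in enumerate(zip(weights, masks)):
        h = (W * M) @ h
        if layer < len(weights) - 1:
            h = np.tanh(h)  # hidden nonlinearity; the output layer stays linear
    return h
```

Take the chain x1 → x2 → x3 with adjacency A = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]. Here the masks [A, I] pass the check, because hidden unit i sees only the parents of variable i. A dense first mask fails the check.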
arXiv
Score-based diffusion models in function space

Jae Hyun Lim, Nikola B Kovachki, Ricardo Baptista, Christopher Beckham, Kamyar Azizzadenesheli, Jean Kossaifi, Vikram Voleti, Jiaming Song, Karsten Kreis, Jan Kautz, and one additional author

arXiv:2302.07400, 2023

Diffusion models have recently emerged as a powerful framework for generative modeling. They consist of a forward process that perturbs input data with Gaussian white noise and a reverse process that learns a score function to generate samples by denoising. Despite their tremendous success, they are mostly formulated on finite-dimensional (e.g., Euclidean) spaces. This limits their application in many domains where the data have a functional form, such as scientific computing and 3D geometric data analysis. In this work, we introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space. In DDOs, the forward process gradually perturbs input functions using a Gaussian process. The generative process is formulated by integrating function-valued Langevin dynamics. Our approach requires an appropriate notion of the score for the perturbed data distribution, which we obtain by generalizing denoising score matching to function spaces that can be infinite-dimensional. We show that the corresponding discretized algorithm generates accurate samples at a fixed cost that is independent of the data resolution. We theoretically and numerically verify the applicability of our approach on a set of problems, including generating solutions to the Navier-Stokes equation viewed as the push-forward distribution of forcings from a Gaussian random field (GRF).
@article{lim2023score, title = {Score-based diffusion models in function space}, author = {Lim, Jae Hyun and Kovachki, Nikola B and Baptista, Ricardo and Beckham, Christopher and Azizzadenesheli, Kamyar and Kossaifi, Jean and Voleti, Vikram and Song, Jiaming and Kreis, Karsten and Kautz, Jan and others}, journal = {arXiv:2302.07400}, year = {2023}, }

2022

NeurIPS SBM
Dimension reduction via score ratio matching

Michael Brennan, Ricardo Baptista, and Youssef Marzouk

In NeurIPS 2022 Workshop on Score-Based Methods, 2022

We propose a method to detect a low-dimensional subspace where a non-Gaussian target distribution departs from a known reference distribution (e.g., a standard Gaussian). We identify this subspace from gradients of the log-ratio between the target and reference densities, which we call the score ratio. Given only samples from the target distribution, we estimate these gradients via score ratio matching, with a tailored parameterization and a regularization method that expose the low-dimensional structure we seek. We show that our approach outperforms standard score matching for dimension reduction of distributions in this class, and that several benchmark UCI datasets in fact exhibit this type of low dimensionality.
@inproceedings{brennan2022dimension, title = {Dimension reduction via score ratio matching}, author = {Brennan, Michael and Baptista, Ricardo and Marzouk, Youssef}, booktitle = {NeurIPS 2022 Workshop on Score-Based Methods}, year = {2022}, }
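Once the score ratio g(x) = ∇ log(π/ρ)(x) is available, a common gradient-based diagnostic reads the subspace off the leading eigenvectors of H = E_π[g gᵀ]. This choice of diagnostic is an assumption here. The sketch below uses a Gaussian target whose score ratio is known analytically, in place of the score ratio matching estimator the paper proposes. The target, the direction u, and the function name are hypothetical.

```python
import numpy as np

def score_ratio_subspace(score_ratio_samples, r):
    """Leading r eigenpairs of H = mean of g g^T over score-ratio samples g (rows)."""
    G = np.asarray(score_ratio_samples)
    H = G.T @ G / G.shape[0]
    eigvals, eigvecs = np.linalg.eigh(H)          # ascending order
    return eigvals[::-1][:r], eigvecs[:, ::-1][:, :r]

# Hypothetical target N(0, I + 3 u u^T): it departs from rho = N(0, I) only along u.
rng = np.random.default_rng(0)
d = 10
u = np.zeros(d)
u[:2] = 1.0 / np.sqrt(2.0)
cov = np.eye(d) + 3.0 * np.outer(u, u)
x = rng.multivariate_normal(np.zeros(d), cov, size=5000)
# Analytic score ratio: grad log pi(x) - grad log rho(x) = (I - cov^{-1}) x.
# The matrix is symmetric, so each row of g is (I - cov^{-1}) applied to a sample.
g = x @ (np.eye(d) - np.linalg.inv(cov))
vals, vecs = score_ratio_subspace(g, r=2)
```

Here I − cov⁻¹ = (3/4) u uᵀ, so H has rank one. The first eigenvector aligns with u (up to sign), the first eigenvalue is close to 9/4, and the second eigenvalue is zero up to rounding.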

JOSS

MParT: Monotone Parameterization Toolkit

Matthew Parno, Paul-Baptiste Rubio, Daniel Sharp, Michael Brennan, Ricardo Baptista, Henning Bonart, and Youssef Marzouk

Journal of Open Source Software, 2022

@article{parno2022mpart,
  title = {MParT: Monotone Parameterization Toolkit},
  author = {Parno, Matthew and Rubio, Paul-Baptiste and Sharp, Daniel and Brennan, Michael and Baptista, Ricardo and Bonart, Henning and Marzouk, Youssef},
  journal = {Journal of Open Source Software},
  volume = {7},
  number = {80},
  pages = {4843},
  year = {2022},
}

JMA
Diagonal nonlinear transformations preserve structure in covariance and precision matrices

Rebecca Morrison, Ricardo Baptista, and Estelle Basor

Journal of Multivariate Analysis, 2022

For a multivariate normal distribution, the sparsity of the covariance and precision matrices encodes complete information about independence and conditional independence properties. For general distributions, the covariance and precision matrices reveal correlations and so-called partial correlations between variables, but these do not, in general, correspond to independence properties. In this paper, we prove that, for a certain class of non-Gaussian distributions, these correspondences still hold, exactly for the covariance and approximately for the precision. These distributions, sometimes referred to as “nonparanormal,” are given by diagonal transformations of multivariate normal random variables. We provide several analytic and numerical examples illustrating these results.
@article{morrison2022diagonal, title = {Diagonal nonlinear transformations preserve structure in covariance and precision matrices}, author = {Morrison, Rebecca and Baptista, Ricardo and Basor, Estelle}, journal = {Journal of Multivariate Analysis}, volume = {190}, pages = {104983}, year = {2022}, publisher = {Elsevier}, }
arXiv
Gradient-based data and parameter dimension reduction for Bayesian models: an information theoretic perspective

Ricardo Baptista, Youssef Marzouk, and Olivier Zahm

arXiv:2207.08670, 2022

We consider the problem of reducing the dimensions of parameters and data in non-Gaussian Bayesian inference problems. Our goal is to identify an “informed” subspace of the parameters and an “informative” subspace of the data so that a high-dimensional inference problem can be approximately reformulated in low-to-moderate dimensions, thereby improving the computational efficiency of many inference techniques. To do so, we exploit gradient evaluations of the log-likelihood function. Furthermore, we use an information-theoretic analysis to derive a bound on the posterior error due to parameter and data dimension reduction. This bound relies on logarithmic Sobolev inequalities, and it reveals the appropriate dimensions of the reduced variables. We compare our method with classical dimension reduction techniques, such as principal component analysis and canonical correlation analysis, on applications ranging from mechanics to image processing.
@article{baptista2022gradient, title = {Gradient-based data and parameter dimension reduction for {B}ayesian models: an information theoretic perspective}, author = {Baptista, Ricardo and Marzouk, Youssef and Zahm, Olivier}, journal = {arXiv:2207.08670}, year = {2022}, }
PRSA
A low-rank ensemble Kalman filter for elliptic observations

Mathieu Le Provost, Ricardo Baptista, Youssef Marzouk, and Jeff D Eldredge

Proceedings of the Royal Society A, 2022

We propose a regularization method for ensemble Kalman filtering (EnKF) with elliptic observation operators. Commonly used EnKF regularization methods suppress state correlations at long distances. For observations described by elliptic partial differential equations, such as the pressure Poisson equation (PPE) in incompressible fluid flows, distance localization cannot be applied, as we cannot disentangle slowly decaying physical interactions from spurious long-range correlations. This is particularly true for the PPE, in which distant vortex elements couple nonlinearly to induce pressure. Instead, these inverse problems have a low effective dimension: low-dimensional projections of the observations strongly inform a low-dimensional subspace of the state space. We derive a low-rank factorization of the Kalman gain based on the spectrum of the Jacobian of the observation operator. The identified eigenvectors generalize the source and target modes of the multipole expansion, independently of the underlying spatial distribution of the problem. Given rapid spectral decay, inference can be performed in the low-dimensional subspace spanned by the dominant eigenvectors. This low-rank EnKF is assessed on dynamical systems with Poisson observation operators, where we seek to estimate the positions and strengths of point singularities over time from potential or pressure observations. We also comment on the broader applicability of this approach to elliptic inverse problems outside the context of filtering.
@article{le2022low, title = {A low-rank ensemble Kalman filter for elliptic observations}, author = {Le Provost, Mathieu and Baptista, Ricardo and Marzouk, Youssef and Eldredge, Jeff D}, journal = {Proceedings of the Royal Society A}, volume = {478}, number = {2266}, pages = {20220182}, year = {2022}, publisher = {The Royal Society}, }
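The sketch below shows the simplest setting behind a low-rank factorization of the Kalman gain. It assumes a linear observation model with whitened prior and noise: x ~ N(0, I), e ~ N(0, I), and y = Hx + e. In this setting the Jacobian of the observation operator is H itself. Write the SVD as H = U diag(s) Vᵀ. The exact gain is then K = V diag(s/(s² + 1)) Uᵀ, and keeping the r leading singular triplets gives a rank-r gain. This linear, whitened setup is an assumption for illustration; the paper's construction handles nonlinear observation operators. The function name is hypothetical.

```python
import numpy as np

def low_rank_kalman_gain(H, r):
    """Rank-r Kalman gain for y = H x + e with x ~ N(0, I) and e ~ N(0, I).

    With the thin SVD H = U diag(s) V^T, the exact gain H^T (H H^T + I)^{-1}
    equals V diag(s / (s^2 + 1)) U^T; keeping the r leading terms truncates it.
    """
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    U_r, s_r, V_r = U[:, :r], s[:r], Vt[:r].T
    return V_r @ np.diag(s_r / (s_r**2 + 1.0)) @ U_r.T

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))                      # hypothetical whitened observation Jacobian
K_exact = H.T @ np.linalg.inv(H @ H.T + np.eye(5))
K_2 = low_rank_kalman_gain(H, r=2)               # update confined to a 2D state subspace
```

Ordering by s ranks the directions by how much the data reduce the posterior variance along them, which is s²/(s² + 1) in these whitened coordinates.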
SIAM Review
Coupling techniques for nonlinear ensemble filtering

Alessio Spantini, Ricardo Baptista, and Youssef Marzouk

SIAM Review, 2022

We consider filtering in high-dimensional non-Gaussian state-space models with intractable transition kernels, nonlinear and possibly chaotic dynamics, and sparse observations in space and time. We propose a novel filtering methodology that harnesses transportation of measures, convex optimization, and ideas from probabilistic graphical models to yield robust ensemble approximations of the filtering distribution in high dimensions. Our approach can be understood as the natural generalization of the ensemble Kalman filter (EnKF) to nonlinear updates, using stochastic or deterministic couplings. The use of nonlinear updates can reduce the intrinsic bias of the EnKF at a marginal increase in computational cost. We avoid any form of importance sampling and introduce non-Gaussian localization approaches for dimension scalability. Our framework achieves state-of-the-art tracking performance on challenging configurations of the Lorenz-96 model in the chaotic regime.
@article{spantini2019coupling, title = {Coupling techniques for nonlinear ensemble filtering}, author = {Spantini, Alessio and Baptista, Ricardo and Marzouk, Youssef}, journal = {SIAM Review}, volume = {64}, number = {4}, pages = {921--953}, year = {2022}, publisher = {SIAM}, }
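The EnKF that this work generalizes can be written as an affine prior-to-posterior map applied to each ensemble member: x_i ↦ x_i + K(y* − y_i). Here y_i is a simulated observation for member i, and K is estimated from the ensemble. The sketch below is a standard perturbed-observation EnKF with a linear observation operator, an assumed setup chosen for illustration. The paper's filters replace this affine map with nonlinear transport maps, which are not shown.

```python
import numpy as np

def enkf_update(X, H, R, y, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    X: (n_state, N) forecast ensemble; H: (n_obs, n_state) linear observation
    operator; R: (n_obs, n_obs) noise covariance; y: (n_obs,) observed data.
    """
    N = X.shape[1]
    # Simulated observation for each member, perturbed with noise drawn from N(0, R).
    Y = H @ X + rng.multivariate_normal(np.zeros(y.size), R, size=N).T
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    C_xy = Xc @ Yc.T / (N - 1)                # state-observation cross-covariance
    C_yy = Yc @ Yc.T / (N - 1)                # observation covariance (includes noise)
    K = np.linalg.solve(C_yy, C_xy.T).T       # ensemble Kalman gain C_xy C_yy^{-1}
    # The same affine map moves every member toward the observed data.
    return X + K @ (y[:, None] - Y)
```

Consider a scalar prior N(0, 1) with H = 1, R = 1, and y* = 2. With a large ensemble, the analysis ensemble should have mean near 1 and variance near 0.5, matching the exact Gaussian posterior.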

2021

AIAA
A low-rank nonlinear ensemble filter for vortex models of aerodynamic flows

Mathieu Le Provost, Ricardo Baptista, Youssef Marzouk, and Jeff Eldredge

In AIAA Scitech 2021 Forum, 2021

Robustly estimating the separated flow about an airfoil is critical to the design of any closed-loop controller. Darakananda et al. (Phys. Rev. Fluids, 2018) successfully used an ensemble Kalman filter (EnKF) to sequentially estimate the flow using an inviscid vortex model and distributed surface pressure readings. To tackle challenging inference problems with limited observations, classical localization schemes suppress correlations at long distances. However, these techniques would be harmful in our case because of physical long-range interactions between vortices and pressure readings. Instead, these interactions are best described as interactions between clusters of variables. This work proposes a systematic procedure to identify these clusters of variables from a nonlinear observation model. By projecting the states and observations onto these new sets of variables, the inference is performed in a low-dimensional subspace of the state and the observations. To perform consistent inference with the nonlinear model, we use the stochastic map filter (SMF): a natural generalization of the EnKF that relies on interpretable nonlinear prior-to-posterior transformations (Spantini et al., arXiv, 2019). We combine the identification of these clusters of variables with the SMF to derive a low-rank nonlinear ensemble filter. This filter is assessed on the response of a translating plate at 20 degrees that undergoes strong and overlapping pulses applied near the leading edge. Our framework outperforms the EnKF at estimating the surface pressure distribution along the entire plate, with only two pressure sensors (placed at the edges of the plate) for collecting measurements.
Keywords: inviscid vortex model, disturbed separated flow, data assimilation, nonlinear ensemble filter, measure transport, low-rank projections
@inproceedings{le2021low, title = {A low-rank nonlinear ensemble filter for vortex models of aerodynamic flows}, author = {Le Provost, Mathieu and Baptista, Ricardo and Marzouk, Youssef and Eldredge, Jeff}, booktitle = {AIAA Scitech 2021 Forum}, pages = {1937}, year = {2021}, }

2020

SEG
Bayesian seismic inversion: Measuring Langevin MCMC sample quality with kernels

Muhammad Izzatullah, Ricardo Baptista, Lester Mackey, Youssef Marzouk, and Daniel Peter

In SEG International Exposition and Annual Meeting, 2020

The Bayesian framework is commonly used to quantify uncertainty in seismic inversion. To perform Bayesian inference, Markov chain Monte Carlo (MCMC) algorithms are regarded as the gold-standard technique for sampling from the posterior probability distribution. Consistent MCMC methods struggle with complex, high-dimensional models, and most methods scale poorly to large datasets, such as those arising in seismic inversion. As an alternative, approximate MCMC methods based on unadjusted Langevin dynamics offer scalability and more rapid sampling at the cost of biased inference. However, when assessing the quality of approximate MCMC samples for characterizing the posterior distribution, most diagnostics fail to account for these biases. In this work, we introduce the kernel Stein discrepancy (KSD) as a diagnostic tool to determine the convergence of MCMC samples for Bayesian seismic inversion. We demonstrate the use of the KSD for measuring sample quality and selecting the optimal Langevin MCMC algorithm for two Gaussian Bayesian inference problems.
@inproceedings{izzatullah2020bayesian, title = {Bayesian seismic inversion: Measuring Langevin MCMC sample quality with kernels}, author = {Izzatullah, Muhammad and Baptista, Ricardo and Mackey, Lester and Marzouk, Youssef and Peter, Daniel}, booktitle = {SEG International Exposition and Annual Meeting}, pages = {D031S034R003}, year = {2020}, organization = {SEG}, }
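The KSD measures how well a set of samples matches a target using only the target's score ∇ log p. It therefore needs neither a normalizing constant nor exact samples. The sketch below computes the standard V-statistic estimate with an inverse multiquadric (IMQ) kernel. The kernel choice, its parameters c and β, and the standard Gaussian target used in the usage note are assumptions for illustration. The function name is hypothetical.

```python
import numpy as np

def ksd_imq(samples, score, c=1.0, beta=-0.5):
    """V-statistic estimate of the squared kernel Stein discrepancy between the
    empirical distribution of `samples` (n, d) and a target with score function
    `score`, using the IMQ kernel k(x, y) = (c^2 + |x - y|^2)^beta."""
    X = np.atleast_2d(samples)
    d = X.shape[1]
    S = score(X)                                   # target score at each sample, (n, d)
    diff = X[:, None, :] - X[None, :, :]           # r_ij = x_i - x_j, (n, n, d)
    sqdist = np.sum(diff**2, axis=-1)
    u = c**2 + sqdist
    k = u**beta
    grad_x_k = 2.0 * beta * (u ** (beta - 1.0))[..., None] * diff   # gradient in x_i
    grad_y_k = -grad_x_k                                            # gradient in x_j
    # Sum over l of the mixed second derivatives d^2 k / (dx_l dy_l).
    trace_term = (-4.0 * beta * (beta - 1.0) * u ** (beta - 2.0) * sqdist
                  - 2.0 * beta * d * u ** (beta - 1.0))
    stein_kernel = (trace_term
                    + np.einsum("ijk,jk->ij", grad_x_k, S)          # grad_x k . s(x_j)
                    + np.einsum("ijk,ik->ij", grad_y_k, S)          # grad_y k . s(x_i)
                    + k * (S @ S.T))                                # k s(x_i) . s(x_j)
    return float(stein_kernel.mean())
```

For example, score with the standard Gaussian score `lambda X: -X`. Then 300 draws from N(0, I) in two dimensions should give a much smaller value than the same draws shifted by one unit in each coordinate.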

2019

JCP
Some greedy algorithms for sparse polynomial chaos expansions

Ricardo Baptista, Valentin Stolbunov, and Prasanth B Nair

Journal of Computational Physics, 2019

Compressed sensing algorithms approximate functions using limited evaluations by searching for a sparse representation among a dictionary of basis functions. Orthogonal matching pursuit (OMP) is a greedy algorithm for selecting basis functions whose computational cost scales with the size of the dictionary. For polynomial chaos (PC) approximations, the size of the dictionary grows quickly with the number of model inputs and the maximum polynomial degree, often making greedy methods prohibitively expensive. We propose two new algorithms for efficiently constructing sparse PC expansions for problems with high-dimensional inputs. The first algorithm is a parallel OMP method coupled with an incremental QR factorization scheme, wherein the model construction step is interleaved with a ν-fold cross-validation procedure. The second approach is a randomized greedy algorithm that leverages a probabilistic argument to evaluate only a subset of basis functions from the dictionary at each iteration of the incremental algorithm. The randomized algorithm is demonstrated to recover model outputs with a similar level of sparsity and accuracy as OMP, but with a cost that is independent of the dictionary size. Both algorithms are validated with a numerical comparison of their performance on a series of algebraic test problems and PDEs with high-dimensional inputs.
@article{baptista2019some, title = {Some greedy algorithms for sparse polynomial chaos expansions}, author = {Baptista, Ricardo and Stolbunov, Valentin and Nair, Prasanth B}, journal = {Journal of Computational Physics}, volume = {387}, pages = {303--325}, year = {2019}, publisher = {Elsevier}, }
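For reference, the plain OMP loop that both proposed algorithms accelerate is short. At each step it picks the dictionary column most correlated with the residual, then refits all selected coefficients by least squares. The sketch below is that plain version only. It omits the paper's parallelization, incremental QR updates, cross-validation, and randomized column subsets. The function name is hypothetical.

```python
import numpy as np

def omp(A, y, n_nonzero):
    """Orthogonal matching pursuit: approximate y with n_nonzero columns of A."""
    if n_nonzero < 1:
        raise ValueError("n_nonzero must be at least 1")
    y = np.asarray(y, dtype=float)
    col_norms = np.linalg.norm(A, axis=0)
    residual = y.copy()
    support = []
    for _ in range(n_nonzero):
        # Score every column against the residual: this is the step whose cost
        # grows with the dictionary size.
        scores = np.abs(A.T @ residual) / col_norms
        if support:
            scores[support] = -np.inf            # never reselect a chosen column
        support.append(int(np.argmax(scores)))
        # Refit all selected coefficients by least squares (the "orthogonal" step).
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ sol
    coef = np.zeros(A.shape[1])
    coef[support] = sol
    return coef
```

With an orthonormal dictionary, OMP recovers a k-sparse coefficient vector exactly in k steps. A randomized variant in the spirit of the second algorithm would score only a random subset of columns in each iteration.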

2018

ICML
Bayesian optimization of combinatorial structures

Ricardo Baptista and Matthias Poloczek

In International Conference on Machine Learning, 2018

The optimization of expensive-to-evaluate black-box functions over combinatorial structures is a ubiquitous task in machine learning, engineering, and the natural sciences. The combinatorial explosion of the search space and costly evaluations pose challenges for current techniques in discrete optimization and machine learning, and critically require new algorithmic ideas. This article proposes, to the best of our knowledge, the first algorithm to overcome these challenges, based on an adaptive, scalable model that identifies useful combinatorial structure even when data is scarce. Our acquisition function pioneers the use of semidefinite programming to achieve efficiency and scalability. Experimental evaluations demonstrate that this algorithm consistently outperforms other methods from combinatorial and Bayesian optimization.
@inproceedings{baptista2018bayesian, title = {Bayesian optimization of combinatorial structures}, author = {Baptista, Ricardo and Poloczek, Matthias}, booktitle = {International Conference on Machine Learning}, pages = {462--471}, year = {2018}, organization = {PMLR}, }
AIAA
Optimal approximations of coupling in multidisciplinary models

Ricardo Baptista, Youssef Marzouk, Karen Willcox, and Benjamin Peherstorfer

AIAA Journal, 2018

This paper presents a methodology for identifying important discipline couplings in multicomponent engineering systems. Coupling among disciplines contributes significantly to the computational cost of analyzing a system and can become particularly burdensome when coupled analyses are embedded within a design or optimization loop. In many cases, disciplines may be weakly coupled, so that some of the coupling or interaction terms can be neglected without significantly impacting the accuracy of the system output. Typical practice derives such approximations in an ad hoc manner using expert opinion and domain experience. This work proposes a new approach that formulates an optimization problem to find a model that optimally balances accuracy of the model outputs with the sparsity of the discipline couplings. An adaptive sequential Monte Carlo sampling-based technique is used to efficiently search the combinatorial model space of different discipline couplings. An algorithm for selecting an optimal model is presented and illustrated on a fire-detection satellite model and a turbine engine cycle analysis model.
@article{baptista2018optimal, title = {Optimal approximations of coupling in multidisciplinary models}, author = {Baptista, Ricardo and Marzouk, Youssef and Willcox, Karen and Peherstorfer, Benjamin}, journal = {AIAA Journal}, volume = {56}, number = {6}, pages = {2412--2428}, year = {2018}, publisher = {American Institute of Aeronautics and Astronautics}, }

2017

NeurIPS
Beyond normality: learning sparse probabilistic graphical models in the non-Gaussian setting

Rebecca E Morrison, Ricardo Baptista, and Youssef Marzouk

In Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017

We present an algorithm to identify sparse dependence structure in continuous and non-Gaussian probability distributions, given a corresponding set of data. The conditional independence structure of an arbitrary distribution can be represented as an undirected graph (or Markov random field), but most algorithms for learning this structure are restricted to the discrete or Gaussian cases. Our new approach allows for more realistic and accurate descriptions of the distribution in question, and in turn better estimates of its sparse Markov structure. Sparsity in the graph is of interest as it can accelerate inference, improve sampling methods, and reveal important dependencies between variables. The algorithm relies on exploiting the connection between the sparsity of the graph and the sparsity of transport maps, which deterministically couple one probability measure to another.
@inproceedings{morrison2017beyond, title = {Beyond normality: learning sparse probabilistic graphical models in the non-{G}aussian setting}, author = {Morrison, Rebecca E and Baptista, Ricardo and Marzouk, Youssef}, booktitle = {Proceedings of the 31st International Conference on Neural Information Processing Systems}, pages = {2356--2366}, year = {2017}, }
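For a smooth, strictly positive density π, variables x_j and x_k are conditionally independent given all the others exactly when ∂²log π/∂x_j∂x_k vanishes everywhere. Averaging the magnitude of this cross partial over samples therefore gives a simple score for missing edges. The sketch below assumes the log-density is known up to a constant and uses central finite differences. In the paper, the density is instead estimated from data through a transport map. The function names and the banana-shaped example are hypothetical.

```python
import numpy as np

def cross_partial(log_density, x, j, k, h=1e-3):
    """Central finite-difference estimate of d^2 log_density / (dx_j dx_k) at x."""
    ej = np.zeros_like(x)
    ej[j] = h
    ek = np.zeros_like(x)
    ek[k] = h
    return (log_density(x + ej + ek) - log_density(x + ej - ek)
            - log_density(x - ej + ek) + log_density(x - ej - ek)) / (4.0 * h * h)

def hessian_score_matrix(log_density, samples):
    """Omega[j, k] = sample mean of |d^2 log pi / (dx_j dx_k)| for j != k."""
    d = samples.shape[1]
    omega = np.zeros((d, d))
    for j in range(d):
        for k in range(j + 1, d):
            vals = [abs(cross_partial(log_density, x, j, k)) for x in samples]
            omega[j, k] = omega[k, j] = np.mean(vals)
    return omega

# Hypothetical example: x1 depends on x0 through x0^2, while x2 is independent of both.
def log_pi(x):
    return -0.5 * x[0]**2 - 0.5 * (x[1] - x[0]**2)**2 - 0.5 * x[2]**2

rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
samples = np.column_stack([x0, x0**2 + rng.normal(size=500), rng.normal(size=500)])
omega = hessian_score_matrix(log_pi, samples)
```

Here ∂²log π/∂x0∂x1 = 2x0, so Ω[0, 1] is close to E|2x0| = 2√(2/π) ≈ 1.6. The entries involving x2 vanish up to rounding, which recovers the missing edges.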