Research in Algorithms

Regression with small data sets: Surrogate modeling is often used to create a fast, but approximate, alternative to simulations. By running the simulation at a few carefully-chosen sample points in the input parameter space, we can use the corresponding inputs-outputs as a training data set to build a machine learning model that acts as a surrogate for the simulation. However, for expensive simulations, when we can generate only a small training set, it is unclear if some machine learning models perform better than others. We compared several popular models, evaluating them not just on prediction quality, but also on their applicability to practical problems, such as, identifying the viable region of a process, solving inverse problems, and identifying parameter values for use in experiments.

Interpreting the solution to inverse problems: In earlier work, we have combined sampling and code surrogates to solve inverse problems, where we want to find input parameters that map to target output values, often specified with associated uncertainties. However, interpreting the solution is a challenge, especially when the input space is high-dimensional. The solution is often difficult to visualize as it can span a large range of values in each input dimension, even though it occupies a small fraction of the total hyper-volume spanned by this range of values. We have explored the use of self-organizing maps to map the solution to a lower dimensional space so we can understand where the solution lies in the input space of the problem, enabling us to use the solution in practice.

Independent-block Gaussian process: Gaussian process is a popular model for regression, as it provides not just a prediction, but the uncertainty as well. However, it can be expensive, requiring the solution of a linear system of equations. We investigated the use of tapering, where small elements in the covariance matrix are dropped, and combined it with reordering schemes to create a banded covariance matrix for a faster solution. However, we found the idea does not work in general. Instead, motivated by the concept of block tapering, we proposed the independent-block GP method, which is a simple way to reduce the cost of solution without sacrificing the accuracy of the predictions. The method is also embarrassingly parallel, leading to furthur reduction in computational cost.