Regression with small data sets: Surrogate modeling is often used to create a fast, but approximate, alternative to a simulation. By running the simulation at a few carefully chosen sample points in the input parameter space, we can use the corresponding input-output pairs as a training data set to build a machine learning model that acts as a surrogate for the simulation. However, when the simulation is expensive and we can generate only a small training set, it is unclear whether some machine learning models perform better than others. We compared several popular models, evaluating them not just on prediction quality, but also on their applicability to practical problems, such as identifying the viable region of a process, solving inverse problems, and identifying parameter values for use in experiments.
Select publications (available from Google Scholar):
- C. Kamath and Y.J. Fan, “Regression with Small Data Sets: A Case Study using Code Surrogates in Additive Manufacturing,” Knowledge and Information Systems, Volume 57, Number 2, November 2018, pp. 475-493.
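The surrogate-building loop described above can be sketched with a small Gaussian process regressor written directly in NumPy. Everything here is a hypothetical stand-in: the `simulation` function is a toy substitute for an expensive code, and the eight-point design and kernel length scale are illustrative choices, not the settings used in the study.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between two sets of row vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def fit_predict_gp(X_train, y_train, X_test, length_scale=1.0, noise=1e-8):
    """Fit a zero-mean GP to the training data; return the predictive mean."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(X_test, X_train, length_scale) @ alpha

# Hypothetical stand-in for an expensive simulation.
def simulation(x):
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 0] ** 2

# Only eight "simulation runs" form the small training set.
X_train = np.linspace(-2, 2, 8).reshape(-1, 1)
y_train = simulation(X_train)

# The surrogate then predicts cheaply anywhere in the input space.
X_test = np.linspace(-2, 2, 50).reshape(-1, 1)
y_pred = fit_predict_gp(X_train, y_train, X_test, length_scale=0.7)
```

Other model families (polynomial regression, tree ensembles, neural networks) slot into the same train-once, predict-cheaply pattern; the comparison in the paper is over which of them remains accurate when the training set is this small.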
Interpreting the solution to inverse problems: In earlier work, we combined sampling and code surrogates to solve inverse problems, where we want to find the input parameters that map to target output values, often specified with associated uncertainties. However, interpreting the solution is a challenge, especially when the input space is high-dimensional. The solution is often difficult to visualize, as it can span a large range of values in each input dimension, even though it occupies a small fraction of the total hyper-volume spanned by these ranges. We have explored the use of self-organizing maps to map the solution to a lower-dimensional space so we can understand where it lies in the input space of the problem, enabling us to use the solution in practice.
Select publications (available from Google Scholar):
- Chandrika Kamath, Juliette Franzman, and Ravi Ponmalai, “Data mining for faster, interpretable solutions to inverse problems: A case study using additive manufacturing,” Machine Learning with Applications, Volume 6, 15 December 2021, https://doi.org/10.1016/j.mlwa.2021.100122.
- Ravi Ponmalai and Chandrika Kamath, “Self-Organizing Maps and Their Applications to Data Analysis,” LLNL Technical report LLNL-TR-791165, 20 September 2019. Available at: https://www.osti.gov/biblio/1566795-self-organizing-maps-applications-data-analysis
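A minimal sketch of how a self-organizing map projects a high-dimensional solution set onto a 2-D grid, assuming a toy two-cluster "solution" in a 4-D input space. The 5x5 grid, decay schedules, and synthetic data are illustrative assumptions, not the configurations used in the publications above.

```python
import numpy as np

def train_som(data, grid_w=5, grid_h=5, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 2-D self-organizing map; returns unit weights, shape (grid_h*grid_w, dim)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # Grid coordinates of each map unit, used by the neighborhood function.
    coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)], float)
    weights = rng.normal(size=(grid_h * grid_w, dim))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)              # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5  # shrinking neighborhood radius
        x = data[rng.integers(n)]                # pick one sample
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        # Gaussian neighborhood on the grid, centered at the BMU.
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma**2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def project(data, weights):
    """Map each high-dimensional point to the index of its best-matching unit."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# Hypothetical "inverse-problem solution": two clusters of points in 4-D.
rng = np.random.default_rng(1)
solution = np.vstack([rng.normal(0, 0.3, (40, 4)), rng.normal(2, 0.3, (40, 4))])
W = train_som(solution)
bmus = project(solution, W)   # 2-D grid cell for each 4-D solution point
```

Once each solution point is assigned a grid cell, the occupied cells can be plotted or colored by an output quantity, giving a 2-D picture of where the solution lies in the original input space.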
Independent-block Gaussian process: The Gaussian process (GP) is a popular model for regression, as it provides not just a prediction, but also the associated uncertainty. However, it can be expensive, requiring the solution of a dense linear system of equations. We investigated the use of tapering, where small elements in the covariance matrix are dropped, and combined it with reordering schemes to create a banded covariance matrix that can be solved faster. However, we found that the idea does not work in general. Instead, motivated by the concept of block tapering, we proposed the independent-block GP method, a simple way to reduce the cost of the solution without sacrificing the accuracy of the predictions. The method is also embarrassingly parallel, leading to a further reduction in computational cost.
Select publications (available from Google Scholar):
- Chandrika Kamath, Juliette Franzman, and Ravi Ponmalai, “Data mining for faster, interpretable solutions to inverse problems: A case study using additive manufacturing,” Machine Learning with Applications, Volume 6, 15 December 2021, https://doi.org/10.1016/j.mlwa.2021.100122.
- Juliette Franzman and Chandrika Kamath, “Understanding the Effects of Tapering on Gaussian Process Regression,” LLNL-TR-787826. 19 August 2019. Available at https://www.osti.gov/biblio/1558874-understanding-effects-tapering-gaussian-process-regression
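The cost reduction comes from replacing one large dense solve with several small independent ones: solving an n x n system costs O(n^3), so b blocks of size n/b cost O(n^3/b^2), and each block can be processed on a separate core. The sketch below illustrates the idea; the partitioning scheme (sorting on the first coordinate) and the nearest-centroid prediction rule are simplifying assumptions for this example, not necessarily the scheme in the paper.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel between two sets of row vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / ls**2)

def block_gp_fit(X, y, n_blocks=4, ls=1.0, noise=1e-6):
    """Partition the data spatially and fit one small, independent GP per block.
    Each block's linear solve is independent, so the loop is embarrassingly parallel."""
    order = np.argsort(X[:, 0])          # simple spatial partition on one coordinate
    blocks = np.array_split(order, n_blocks)
    models = []
    for idx in blocks:
        K = rbf(X[idx], X[idx], ls) + noise * np.eye(len(idx))
        models.append((X[idx], np.linalg.solve(K, y[idx])))
    return models

def block_gp_predict(models, X_test, ls=1.0):
    """Predict each test point with the block whose centroid is closest to it."""
    centers = np.array([Xb.mean(axis=0) for Xb, _ in models])
    nearest = ((X_test[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    y_pred = np.empty(len(X_test))
    for b, (Xb, alpha) in enumerate(models):
        mask = nearest == b
        if mask.any():
            y_pred[mask] = rbf(X_test[mask], Xb, ls) @ alpha
    return y_pred

# Toy 1-D example: four blocks of 16 points instead of one 64-point solve.
X = np.linspace(0, 8, 64).reshape(-1, 1)
y = np.sin(X[:, 0])
models = block_gp_fit(X, y, n_blocks=4, ls=0.5)
y_pred = block_gp_predict(models, X, ls=0.5)
```

Because the blocks are independent, predictions can degrade near block boundaries; keeping the blocks aligned with the spatial structure of the data, as above, limits that effect.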