Surrogates and sampling for simulations: Analysis of data from computer simulations presents unique opportunities for data analysis. Simulations of complex physical phenomena can be very expensive to run, but at the same time, we have greater control over the data we choose to generate. For the last fifteen years, I have been interested in how we can combine the generation and the analysis of the data to reduce the cost of gaining insight into the phenomenon being simulated. Two ideas play a role: sampling, where we carefully select the values of the input parameters at which to run the simulation, and surrogate models, which are fast, but approximate, alternatives to the simulation. By combining these ideas suitably, we can identify viable regions in the input space; solve inverse problems, where we seek the input parameters that result in specific output values, with associated uncertainties; design experiments; and progressively refine our understanding of the phenomenon being simulated.
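The interplay between expensive simulations and cheap surrogates can be illustrated with a minimal sketch. Here, an inexpensive analytic function stands in for the simulation, and a radial-basis-function interpolant (one of many possible surrogate models, chosen here purely for illustration) is trained on a small set of sampled runs; all function names and sample sizes below are hypothetical.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Stand-in for an expensive simulation: a cheap analytic function.
# A real simulation would be far costlier to evaluate, which is
# what motivates replacing it with a surrogate.
def simulation(x):
    return np.sin(3 * x[:, 0]) * np.cos(2 * x[:, 1])

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(40, 2))   # sampled input parameters
y_train = simulation(X_train)               # expensive runs, done once

# Radial-basis-function surrogate: a fast approximate replacement
# that can be queried thousands of times at negligible cost.
surrogate = RBFInterpolator(X_train, y_train)

# Cheap predictions anywhere in the input space, e.g. to search
# for inputs that produce a target output (an inverse problem).
X_query = rng.uniform(0, 1, size=(200, 2))
y_pred = surrogate(X_query)
```

The surrogate reproduces the training runs exactly (RBF interpolation passes through its data) and approximates the simulation elsewhere, so tasks such as mapping viable regions or inverting for target outputs can be carried out against the surrogate rather than the simulation itself.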
Intelligent sampling: The quality of a surrogate model depends not only on the model used, but also on the sample points at which the training data are generated. Often, the focus is on the initial set of sample points used to create the model. But as the practical use of these models increases, there is a need for algorithms that not only generate space-filling samples, but also support progressive and incremental sampling. We explore sampling algorithms used in various disciplines and evaluate them in the context of how well they support the many needs of modern surrogate modeling, the closely related task of hyperparameter optimization, and data analysis in general.
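One family of samplers with exactly this progressive property is the low-discrepancy (quasi-Monte Carlo) sequences: drawing further points refines the design without discarding earlier runs. A minimal sketch using SciPy's Sobol' sampler (the parameter bounds are hypothetical):

```python
from scipy.stats import qmc

# A Sobol' sequence is space-filling and extensible: an initial
# batch of points can later be augmented, and the combined set
# remains well distributed over the input space.
sampler = qmc.Sobol(d=2, scramble=True, seed=1)
initial = sampler.random_base2(m=4)    # first 16 points (2**4)
refined = sampler.random(16)           # 16 more, filling in the gaps

# Map from the unit square to illustrative parameter ranges,
# e.g. [0, 1] x [10, 20].
scaled = qmc.scale(refined, l_bounds=[0.0, 10.0], u_bounds=[1.0, 20.0])
```

By contrast, a fixed-size design such as a classical Latin hypercube must typically be regenerated from scratch when the budget grows, which is why extensible sequences are attractive for incremental surrogate refinement.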