Compressing unstructured simulation data: Computer simulations can generate vast quantities of floating-point data that are written out for later analysis. This is a challenge for exascale systems, whose I/O bandwidth is limited. In-situ analysis, an often-proposed solution, assumes that we know the analysis algorithms and their parameters in advance, and thus does not support scientific discovery. Compressing the data is an alternative, but for unstructured data it is difficult to identify neighboring data points whose similarity could be exploited to improve compression. For both lossy and lossless compression of unstructured simulation data, we investigated compressive sensing, sampling combined with regression, and clustering techniques to address this difficulty. This project was a collaboration with Prof. George Karypis from the University of Minnesota.
Intelligent exploration of large-scale data: A challenge in data analysis is the selection of algorithms, and their parameters, for each step of the analysis. These choices are often made by examining the data and through trial and error. When the data set is too large to permit easy visualization and exploration, we typically select a sample of data points and examine them to understand the characteristics of the data. We consider alternatives to simple random sampling to see how we can learn more about the data set, especially when we are restricted to a small number of passes through the data.
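For reference, the simple random baseline mentioned above can itself be drawn in a single pass over the data. The sketch below uses reservoir sampling (Algorithm R) to do so. It illustrates only that baseline, not the alternative sampling strategies the project considered, and the function name `reservoir_sample` and its parameters are hypothetical, chosen for this illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw a uniform random sample of up to k items in one pass over `stream`.

    Hypothetical helper for illustration: the stream length does not need to
    be known in advance, and only k items are held in memory at any time.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Example: sample 10 values from a stream of 1000 integers.
sample = reservoir_sample(range(1000), k=10, seed=0)
```

Because every item ends up in the reservoir with equal probability, this gives the same kind of sample as an ordinary random selection while reading the data only once.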
Analysis of time series data from sensors: Time series data from sensors can be analyzed to gain insight into the quantities being measured. Using data mainly from wind-energy applications, we show how we can identify diurnal motifs or recurring patterns, predict imminent changes in wind energy, and identify important sensor streams. These ideas could provide energy operators with additional information that they could exploit in scheduling wind energy on the power grid.
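As a rough illustration of what a diurnal pattern looks like in sensor data, one simple starting point is to average a stream by hour of day. The sketch below is not the motif-discovery method used in the project. The function name `hourly_profile` is hypothetical, and the readings are synthetic values invented for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def hourly_profile(readings: Iterable[Tuple[datetime, float]]) -> Dict[int, float]:
    """Return the mean reading for each hour of day (0-23) present in the data.

    Hypothetical helper for illustration: a pronounced bump in the profile
    suggests a pattern that recurs at the same time every day.
    """
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for timestamp, value in readings:
        totals[timestamp.hour] += value
        counts[timestamp.hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


# Synthetic data (an assumption for this example): two days of hourly
# readings that are higher between 12:00 and 18:00 on each day.
start = datetime(2020, 1, 1)
readings = []
for step in range(48):
    timestamp = start + timedelta(hours=step)
    value = 5.0 + (3.0 if 12 <= timestamp.hour < 18 else 0.0)
    readings.append((timestamp, value))

profile = hourly_profile(readings)
```

On real sensor streams the profile would be noisier, and an averaged profile can hide patterns that shift from day to day, which is one reason to look for motifs directly.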