Compressing simulation data: Computer simulations can generate vast quantities of floating-point data, making compression a key aspect of the I/O and storage of the data. However, when the data are unstructured, it is a challenge to identify neighboring data points so that we can exploit the similarity among them to aid the compression. For both lossy and lossless compression of unstructured simulation data, we explored the use of compressive sensing, sampling combined with regression, and clustering techniques (in collaboration with Prof. George Karypis of the University of Minnesota).
Select publications (available from Google Scholar):
- C. Kamath, “Compressing unstructured mesh data from simulations using machine learning,” International Journal of Data Science and Analytics, Volume 9, pp 113-130, (2020) https://doi.org/10.1007/s41060-019-00180-6
- C. Kamath, Y.-J. Fan, “Compressing Unstructured Mesh Data Using Spline Fits, Compressed Sensing, and Regression Methods,” IEEE GlobalSIP, November 2018, Anaheim, CA, pp. 316-320.
- C. Kamath, “Learning to compress unstructured mesh data from simulations,” IEEE/ACM/ASA International Conference on Data Science and Advanced Analytics (DSAA 2017), Tokyo, Japan, October 19-21, 2017.
- Y. J. Fan and C. Kamath, “A comparison of compressed sensing and sparse recovery algorithms applied to simulation data,” Statistics, Optimization, and Information Computing, Vol. 4, Issue 3, September 2016, pp 194-213. DOI: http://dx.doi.org/10.19139/soic.v4i3.207
- J. Iverson, C. Kamath, and G. Karypis, “Evaluation of connected-component labeling algorithms for distributed-memory systems,” Parallel Computing, Vol. 44, May 2015, pp 53-68. doi:10.1016/j.parco.2015.02.005
- J. Iverson, C. Kamath, G. Karypis, “Fast and effective lossy compression algorithms for scientific datasets,” Euro-Par Conference, Rhodes Island, Greece, August 27-31, 2012.
Intelligent exploration of large-scale data: A challenge in data analysis is the selection of algorithms, and their parameters, for use in each step of the analysis. These choices are often made by examining the data and through trial and error. When the data set is too large to permit easy visualization and exploration, we typically select a sample of data points and examine them to understand the characteristics of the data. We consider alternatives to a simple random selection of samples to understand how we can learn more about the data set, especially when we are restricted to a small number of passes through the data.
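As a point of reference for the simple random selection mentioned above, the sketch below draws a uniform random sample in a single pass using the standard reservoir sampling technique. It illustrates only the baseline, not the alternatives studied in this work; the function name `reservoir_sample` and its parameters are hypothetical and chosen for this example.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable.

    The stream is read exactly once and only k items are held in memory,
    so the total number of items need not be known in advance.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # replacing a uniformly chosen current member.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Illustrative use on a synthetic stream of one million integers.
sample = reservoir_sample(range(1_000_000), k=10, seed=0)
```

Every item in the stream ends up in the sample with equal probability, which is what makes this a useful baseline when only one or two passes through the data are affordable.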
Select publications (available from Google Scholar):
- Chandrika Kamath, “Intelligent Exploration of Large-Scale Data: What Can We Learn in Two Passes?,” IEEE International Conference on Big Data, Los Angeles, CA, December 2019.
Analysis of time series data from sensors: Time series data from sensors can be analyzed to understand and gain insight into the quantities being measured. Using data mainly from wind-energy applications, we show how we can identify diurnal motifs or recurring patterns, predict imminent changes in the wind energy, and identify important sensor streams. These ideas could provide energy operators additional information they could exploit in scheduling wind energy on the power grid.
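To make the idea of a large change in wind generation concrete, the sketch below flags "ramp events" with a simple rule: a window is a ramp if the net change in power across it is at least a given threshold. This rule, the function name `find_ramps`, and the sample values are assumptions made for illustration; they are not the affinity-based method used in the publications below, and the code only labels ramps in recorded data rather than predicting them.

```python
def find_ramps(power, window, threshold):
    """Return (start, end, change) for each window with a large net change.

    power     : sequence of power readings at equally spaced times
    window    : number of time steps between the two readings compared
    threshold : minimum magnitude of change that counts as a ramp
    """
    ramps = []
    for start in range(len(power) - window):
        end = start + window
        change = power[end] - power[start]
        if abs(change) >= threshold:
            # Positive change is an up-ramp, negative a down-ramp.
            ramps.append((start, end, change))
    return ramps


# Made-up readings, for illustration only.
power = [10, 12, 11, 30, 45, 44, 43, 20, 18]
ramps = find_ramps(power, window=2, threshold=20)
# ramps == [(2, 4, 34), (5, 7, -24), (6, 8, -25)]
```

In practice the window length and threshold depend on the capacity of the wind farm and on how far ahead operators need to plan, so both are left as parameters here.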
Select publications (available from Google Scholar):
- Ravi Ponmalai and Chandrika Kamath, “Self-Organizing Maps and Their Applications to Data Analysis,” LLNL Technical report LLNL-TR-791165, 20 September 2019. Available at: https://www.osti.gov/biblio/1566795-self-organizing-maps-applications-data-analysis
- Ya Ju Fan and Chandrika Kamath, “Detecting ramp events in wind energy generation using affinity evaluation on weather data,” Statistical Analysis and Data Mining, Volume 9, Issue 3, June 2016, pp 155–173. DOI: http://dx.doi.org/10.1002/sam.11308
- Y. J. Fan and C. Kamath, “Identifying and Exploiting Diurnal Motifs in Wind Generation Time Series Data,” International Journal of Pattern Recognition and Artificial Intelligence, Vol. 29, Number 2, 1550012-1 – 1550012-25, March 2015. Available at http://dx.doi.org/10.1142/S0218001415500123
- C. Kamath and Y. J. Fan, “Incremental SVD for Insight into Wind Generation,” 13th International Conference on Machine Learning and Applications (ICMLA), Detroit, Dec 3-6, 2014.
- C. Kamath, “Dimension reduction for streaming data,” book chapter in Data Intensive Computing: Architectures, Algorithms, and Applications, Ian Gorton and Deb Gracio, editors, Cambridge University Press, 2012, pp 124-156.
- C. Kamath and Y. J. Fan, “Finding motifs in wind generation time series data,” International Conference on Machine Learning and Applications, Boca Raton, December 12-15, 2012.