{"id":197,"date":"2022-03-05T21:55:28","date_gmt":"2022-03-05T21:55:28","guid":{"rendered":"http:\/\/ckamath.org\/?page_id=197"},"modified":"2023-03-22T19:41:53","modified_gmt":"2023-03-22T19:41:53","slug":"research-in-algorithms","status":"publish","type":"page","link":"https:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/","title":{"rendered":"Research in Algorithms"},"content":{"rendered":"\n<p>This page describes my research in data mining algorithms, that is, solution techniques for specific tasks in data analysis. Unlike the  area of applications, where multiple challenges often have to be  addressed simultaneously, a focus on the algorithms enables me to  consider, in isolation, each of the many challenges encountered in real  applications. I can create computationally efficient solutions that are  appropriate to the size of the data, can process the variation in the  data, and are robust to the settings of parameters of the algorithms. It  is also an opportunity to advance the state of the art in analysis  algorithms. 
<\/p>\n\n\n\n<p>More details on my research in algorithms is available on the following pages:<\/p>\n\n\n\n<ul>\n<li><strong>&nbsp;<a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/#algo-surrogates\" target=\"_blank\" rel=\"noreferrer noopener\">Surrogates and sampling for simulations<\/a><\/strong><\/li>\n\n\n\n<li><strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/#algo-sampling\" target=\"_blank\" rel=\"noreferrer noopener\">&nbsp;Intelligent sampling<\/a><\/strong><\/li>\n\n\n\n<li><a rel=\"noreferrer noopener\" aria-label=\" Regression with small data sets (opens in a new tab)\" href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/3\/\" target=\"_blank\"> <\/a><strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/2\/#algo-small\" target=\"_blank\" rel=\"noreferrer noopener\">Regression with small data sets<\/a><\/strong><\/li>\n\n\n\n<li><a rel=\"noreferrer noopener\" aria-label=\" Regression with small data sets (opens in a new tab)\" href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/3\/\" target=\"_blank\"> <\/a><strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/2\/#algo-inverse\" target=\"_blank\" rel=\"noreferrer noopener\">Interpreting the solution to inverse problems<\/a><\/strong><\/li>\n\n\n\n<li> <strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/2\/#algo-gp\" target=\"_blank\" rel=\"noreferrer noopener\">Independent-block Gaussian process<\/a><\/strong><a rel=\"noreferrer noopener\" aria-label=\" Regression with small data sets (opens in a new tab)\" href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/3\/\" target=\"_blank\"> <\/a><\/li>\n\n\n\n<li> <strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/3\/#algo-compress\" target=\"_blank\" rel=\"noreferrer noopener\">Compressing unstructured 
simulation data<\/a><\/strong><\/li>\n\n\n\n<li><strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/3\/#algo-explore\" target=\"_blank\" rel=\"noreferrer noopener\">Intelligent exploration of large-scale data<\/a><\/strong><\/li>\n\n\n\n<li><strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/3\/#algo-timeseries\" target=\"_blank\" rel=\"noreferrer noopener\">Analysis of time series data from sensors<\/a><\/strong><\/li>\n\n\n\n<li><strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/4\/#algo-dimred\" target=\"_blank\" rel=\"noreferrer noopener\">Dimension reduction for scientific applications<\/a><\/strong><\/li>\n\n\n\n<li><strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/4\/#algo-aspen\" target=\"_blank\" rel=\"noreferrer noopener\">ASPEN &#8211; Approximate SPlitting for ENsembles<\/a><\/strong><\/li>\n\n\n\n<li><strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/5\/#algo-tracking\" target=\"_blank\" rel=\"noreferrer noopener\">Tracking moving objects in 
simulations and video<\/a><\/strong><\/li>\n\n\n\n<li><strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/5\/#algo-pdeip\" target=\"_blank\" rel=\"noreferrer noopener\">PDEs in image processing<\/a><\/strong><\/li>\n\n\n\n<li><strong><a href=\"http:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/5\/#algo-sapphire\" target=\"_blank\" rel=\"noreferrer noopener\">Sapphire scientific data mining software<\/a><\/strong><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:36% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"838\" height=\"896\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/cvt_new.png\" alt=\"\" class=\"wp-image-325 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/cvt_new.png 838w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/cvt_new-281x300.png 281w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/cvt_new-768x821.png 768w\" sizes=\"(max-width: 838px) 100vw, 838px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p id=\"algo-surrogates\"><strong>Sampling and surrogates for simulations: <\/strong>Analysis of data from computer simulations presents unique opportunities for data analysis. Simulations of complex physical phenomena can be very expensive to run, but at the same time, we have greater control over the data we choose to generate. For the last fifteen years, I have been interested in how we can combine the generation and the analysis of the data to reduce the cost of gaining insight into the phenomenon being simulated. 
Two ideas play a role &#8211; sampling, where we carefully select the values of the input parameters at which to run the simulation, and surrogate models, which are fast, but approximate, alternatives to the simulation. By combining these ideas suitably, we can identify viable regions in the input space; solve inverse problems, where we seek the input parameters that result in specific output values with associated uncertainties; design experiments; and progressively refine our understanding of the phenomenon being simulated.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>Chandrika Kamath, Intelligent sampling for surrogate modeling, hyperparameter optimization, and data analysis, Machine Learning with Applications, Volume 9, 15 September 2022, 100373, https:\/\/doi.org\/10.1016\/j.mlwa.2022.100373<\/li>\n\n\n\n<li>Chandrika Kamath, &#8220;Intelligent Sampling for Surrogate Modeling, Hyperparameter Optimization, and Data Analysis, &#8221; LLNL Technical Report LLNL-TR-829837, December 2021.<\/li>\n\n\n\n<li>Chandrika Kamath, Juliette Franzman, and Ravi Ponmalai, &#8220;Data mining for faster, interpretable solutions to inverse problems: A case study using additive manufacturing,&#8221; Machine Learning with Applications, Volume 6, 15 December 2021, https:\/\/doi.org\/10.1016\/j.mlwa.2021.100122.<\/li>\n\n\n\n<li>C. Kamath and Y.J. Fan, &#8220;Regression with Small Data Sets: A Case Study using Code Surrogates in Additive Manufacturing,&#8221; Knowledge and Information Systems Journal, Volume 57, Number 2, November 2018, pp. 475-493.<\/li>\n\n\n\n<li>C. Kamath, \u201cOn the use of data mining to build high-density, additively-manufactured parts,\u201d invited book chapter, Information Science for Materials Discovery and Design, T. 
Lookman, F. Alexander, and K. Rajan (eds.) in Springer Series in Materials Science, 225, pp 141-155, 2016.<\/li>\n\n\n\n<li>C. Kamath, &#8220;Data Mining and Statistical Inference in Selective Laser Melting,&#8221; International Journal of Advanced Manufacturing Technology, Volume 86, Issue 5, pp 1659\u20131677, September 2016. Appeared online 11 January, 2016. http:\/\/dx.doi.org\/10.1007\/s00170-015-8289-2 <\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:44% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"451\" height=\"658\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/sampling_new.png\" alt=\"\" class=\"wp-image-327 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/sampling_new.png 451w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/sampling_new-206x300.png 206w\" sizes=\"(max-width: 451px) 100vw, 451px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p id=\"algo-sampling\"><strong>Sampling for surrogate models, hyperparameter optimization, and data analysis: <\/strong>The quality of a surrogate model depends not only on the model used, but also on the sample points at which the training data are generated. Often, the focus is on the initial set of sample points used to create the model. But, as the practical use of these models increases, there is a need for algorithms that not only generate space-filling samples, but also support progressive and incremental sampling. 
We explore algorithms used in various disciplines and evaluate them in the context of how well they support the many needs of modern surrogate modeling, the closely related task of hyperparameter optimization, and data analysis in general.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>Chandrika Kamath, Intelligent sampling for surrogate modeling, hyperparameter optimization, and data analysis, Machine Learning with Applications, Volume 9, 15 September 2022, 100373, https:\/\/doi.org\/10.1016\/j.mlwa.2022.100373<\/li>\n\n\n\n<li>Chandrika Kamath, &#8220;Intelligent Sampling for Surrogate Modeling, Hyperparameter Optimization, and Data Analysis, &#8221; LLNL Technical Report LLNL-TR-829837, December 2021.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<!--nextpage-->\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:45% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"922\" height=\"896\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/process_map_new.png\" alt=\"\" class=\"wp-image-270 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/process_map_new.png 922w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/process_map_new-300x292.png 300w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/process_map_new-768x746.png 768w\" sizes=\"(max-width: 922px) 100vw, 922px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p 
id=\"algo-small\"><strong>Regression with small data sets: <\/strong>Surrogate modeling is often used to create a fast, but approximate, alternative to simulations. By running the simulation at a few carefully-chosen sample points in the input parameter space, we can use the corresponding inputs-outputs as a training data set to build a machine learning model that acts as a surrogate for the simulation. However, for expensive simulations, when we can generate only a small training set, it is unclear if some machine learning models perform better than others. We compared several popular models, evaluating them not just on prediction quality, but also on their applicability to practical problems, such as, identifying the viable region of a process, solving inverse problems, and identifying parameter values for use in experiments.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>C. Kamath and Y.J. Fan, &#8220;Regression with Small Data Sets: A Case Study using Code Surrogates in Additive Manufacturing,&#8221; Knowledge and Information Systems Journal, Volume 57, Number 2, November 2018, pp. 
475-493.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:37% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"1015\" height=\"889\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/som_inverse_new.png\" alt=\"\" class=\"wp-image-271 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/som_inverse_new.png 1015w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/som_inverse_new-300x263.png 300w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/som_inverse_new-768x673.png 768w\" sizes=\"(max-width: 1015px) 100vw, 1015px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p id=\"algo-inverse\"><strong>Interpreting the solution to inverse problems: <\/strong>In earlier work, we have combined sampling and code surrogates to solve inverse problems, where we want to find input parameters that map to target output values, often specified with associated uncertainties. However, interpreting the solution is a challenge, especially when the input space is high-dimensional. The solution is often difficult to visualize as it can span a large range of values in each input dimension, even though it occupies a small fraction of the total hyper-volume spanned by this range of values. 
We have explored the use of self-organizing maps to map the solution to a lower dimensional space so we can understand where the solution lies in the input space of the problem, enabling us to use the solution in practice.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>Chandrika Kamath, Juliette Franzman, and Ravi Ponmalai, &#8220;Data mining for faster, interpretable solutions to inverse problems: A case study using additive manufacturing,&#8221; Machine Learning with Applications, Volume 6, 15 December 2021, https:\/\/doi.org\/10.1016\/j.mlwa.2021.100122.<\/li>\n\n\n\n<li>Ravi Ponmalai and Chandrika Kamath, &#8220;Self-Organizing Maps and Their Applications to Data Analysis,&#8221; LLNL Technical report LLNL-TR-791165, 20 September 2019. Available at: https:\/\/www.osti.gov\/biblio\/1566795-self-organizing-maps-applications-data-analysis<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:44% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"445\" height=\"709\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/BlockGP_a_new.png\" alt=\"\" class=\"wp-image-289 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/BlockGP_a_new.png 445w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/BlockGP_a_new-188x300.png 188w\" sizes=\"(max-width: 445px) 100vw, 445px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p 
id=\"algo-gp\"><strong>Independent-block Gaussian process:<\/strong> Gaussian process is a popular model for regression, as it provides not just a prediction, but the uncertainty as well. However, it can be expensive, requiring the solution of a linear system of equations. We investigated the use of tapering, where small elements in the covariance matrix are dropped, and combined it with reordering schemes to create a banded covariance matrix for a faster solution. However, we found the idea does not work in general. Instead, motivated by the concept of block tapering, we proposed the independent-block GP method, which is a simple way to reduce the cost of solution without sacrificing the accuracy of the predictions. The method is also embarrassingly parallel, leading to further reduction in computational cost.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>Chandrika Kamath, Juliette Franzman, and Ravi Ponmalai, &#8220;Data mining for faster, interpretable solutions to inverse problems: A case study using additive manufacturing,&#8221; Machine Learning with Applications, Volume 6, 15 December 2021, https:\/\/doi.org\/10.1016\/j.mlwa.2021.100122.<\/li>\n\n\n\n<li>Juliette Franzman and Chandrika Kamath, &#8220;Understanding the Effects of Tapering on Gaussian Process Regression,&#8221; LLNL-TR-787826. 19 August 2019. 
Available at https:\/\/www.osti.gov\/biblio\/1558874-understanding-effects-tapering-gaussian-process-regression<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<!--nextpage-->\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:42% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"596\" height=\"713\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/Compression_new.jpg\" alt=\"\" class=\"wp-image-293 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/Compression_new.jpg 596w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/Compression_new-251x300.jpg 251w\" sizes=\"(max-width: 596px) 100vw, 596px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p id=\"algo-compress\"><strong>Compressing simulation data: <\/strong>Computer simulations can generate vast quantities of floating point data, making compression a key aspect of the I\/O and storage of the data. However, when the data are unstructured, it becomes a challenge to identify neighboring data points so we can exploit the similarity among them to aid in the compression. For both lossy and lossless compression of unstructured simulation data, we explored the use of compressive sensing, sampling combined with regression, and clustering techniques (collaboration with Prof. George Karypis, from the University of Minnesota).<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>C. 
Kamath, &#8220;Compressing unstructured mesh data from simulations using machine learning,&#8221; International Journal of Data Science and Analytics, Volume 9, pp 113-130, (2020) https:\/\/doi.org\/10.1007\/s41060-019-00180-6<\/li>\n\n\n\n<li>C. Kamath, Y.-J. Fan, &#8220;Compressing Unstructured Mesh Data Using Spline Fits, Compressed Sensing, and Regression Methods,&#8221; IEEE GlobalSIP, November 2018, Anaheim, CA, pp. 316-320.<\/li>\n\n\n\n<li>C. Kamath, &#8220;Learning to compress unstructured mesh data from simulations,&#8221; IEEE\/ACM\/ASA International Conference on Data Science and Advanced Analytics (DSAA 2017), Tokyo, Japan, October 19-21, 2017.<\/li>\n\n\n\n<li>Y. J. Fan and C. Kamath, &#8220;A comparison of compressed sensing and sparse recovery algorithms applied to simulation data,&#8221; Statistics, Optimization, and Information Computing, Vol. 4, Issue 3, September 2016, pp 194-213. DOI: http:\/\/dx.doi.org\/10.19139\/soic.v4i3.207<\/li>\n\n\n\n<li>J. Iverson, C. Kamath, and G. Karypis, Evaluation of connected-component labeling algorithms for distributed-memory systems, Parallel Computing, Vol. 44, May 2015, Pages 53-68. doi:10.1016\/j.parco.2015.02.005<\/li>\n\n\n\n<li>J. Iverson, C. Kamath, G. 
Karypis, &#8220;Fast and effective lossy compression algorithms for scientific datasets,&#8221; Euro-Par Conference, Rhodes Island, Greece, August 27-31, 2012.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:46% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"692\" height=\"710\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/IntExpl_new.png\" alt=\"\" class=\"wp-image-296 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/IntExpl_new.png 692w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/IntExpl_new-292x300.png 292w\" sizes=\"(max-width: 692px) 100vw, 692px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p id=\"algo-explore\"><strong>Intelligent exploration of large-scale data. <\/strong>A challenge in data analysis is the selection of algorithms, and their parameters, for use in each step of the analysis. These choices are often made by examining the data and through trial and error. When the data set is too large to permit easy visualization and exploration, we typically select a sample of data points and examine them to understand the characteristics of the data. 
We consider alternatives to a simple random selection of samples to understand how we can learn more about the data set, especially when we are restricted to a small number of passes through the data.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>Chandrika Kamath, &#8220;Intelligent Exploration of Large-Scale Data: What Can We Learn in Two Passes?,&#8221; IEEE International Conference on Big Data, Los Angeles, CA, December 2019.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:40% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"762\" height=\"373\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/WindSOMS.png\" alt=\"\" class=\"wp-image-299 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/WindSOMS.png 762w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/WindSOMS-300x147.png 300w\" sizes=\"(max-width: 762px) 100vw, 762px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p id=\"algo-timeseries\"><strong>Analysis of time series data from sensors: <\/strong>Time series data from sensors can be analyzed to understand and gain insight into the quantities being measured. Using data mainly from wind-energy applications, we show how we can identify diurnal motifs or recurring patterns, predict imminent changes in the wind energy, and identify important sensor streams. 
These ideas could provide energy operators with additional information they could exploit in scheduling wind energy on the power grid.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>Ravi Ponmalai and Chandrika Kamath, &#8220;Self-Organizing Maps and Their Applications to Data Analysis,&#8221; LLNL Technical report LLNL-TR-791165, 20 September 2019. Available at: https:\/\/www.osti.gov\/biblio\/1566795-self-organizing-maps-applications-data-analysis<\/li>\n\n\n\n<li>Ya Ju Fan and Chandrika Kamath, &#8220;Detecting ramp events in wind energy generation using affinity evaluation on weather data,&#8221; Statistical Analysis and Data Mining, Volume 9, Issue 3, June 2016, pages 155\u2013173. DOI: http:\/\/dx.doi.org\/10.1002\/sam.11308<\/li>\n\n\n\n<li>Y. J. Fan and C. Kamath, &#8220;Identifying and Exploiting Diurnal Motifs in Wind Generation Time Series Data,&#8221; International Journal of Pattern Recognition and Artificial Intelligence, Vol 29, Number 2, 1550012-1 &#8211; 1550012-25, March 2015. Available at http:\/\/dx.doi.org\/10.1142\/S0218001415500123<\/li>\n\n\n\n<li>C. Kamath and Y. J. Fan, &#8220;Incremental SVD for Insight into Wind Generation,&#8221; 13th International Conference on Machine Learning and Applications (ICMLA), Detroit, Dec 3-6, 2014.<\/li>\n\n\n\n<li>C. Kamath, &#8220;Dimension reduction for streaming data,&#8221; book chapter in Data Intensive Computing: Architectures, Algorithms, and Applications, Ian Gorton and Deb Gracio, editors, Cambridge University Press, 2012, pp 124-156.<\/li>\n\n\n\n<li>C. Kamath and Y. J. 
Fan, &#8220;Finding motifs in wind generation time series data,&#8221; International Conference on Machine Learning and Applications, Boca Raton, December 12-15, 2012.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<!--nextpage-->\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:43% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"382\" height=\"418\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/featsel_image_new.jpg\" alt=\"\" class=\"wp-image-303 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/featsel_image_new.jpg 382w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/featsel_image_new-274x300.jpg 274w\" sizes=\"(max-width: 382px) 100vw, 382px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p id=\"algo-dimred\"><strong>Dimension reduction for scientific applications:<\/strong> Reducing the number of dimensions, that is, the number of features representing a data point, is important in scientific applications to minimize the effect of irrelevant or redundant features in any subsequent analysis. Often many different types of features are extracted for each data point using a range of techniques, and domain information alone may not be sufficient to prune the features to keep only the relevant ones. 
We investigated filters, wrappers, and several non-linear dimension reduction techniques for their effectiveness in scientific applications ranging from remote sensing to astronomy and plasma physics.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>Y. J. Fan and C. Kamath, &#8220;On the Selection of Dimension Reduction Techniques for Scientific Applications,&#8221; in Real World Data Mining Applications, Springer Annals of Information Systems, Volume 17, pp 91-122, 2015.<\/li>\n\n\n\n<li>Cantu-Paz, E., Newsam, S., Kamath, C., \u201cFeature Selection in Scientific Applications,\u201d Proceedings, ACM International Conference on Knowledge Discovery and Data Mining, pp 788-793, August 22-25, 2004, Seattle, WA. UCRL-CONF-202657.<\/li>\n\n\n\n<li>Fodor, I. K., and C. Kamath, \u201cDimension reduction techniques and the Classification of Bent Double Galaxies,\u201d Computational Statistics and Data Analysis journal, Volume 41, pp. 
91-122, 2002.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:44% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"557\" height=\"712\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/aspen_new.jpg\" alt=\"\" class=\"wp-image-305 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/aspen_new.jpg 557w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/aspen_new-235x300.jpg 235w\" sizes=\"(max-width: 557px) 100vw, 557px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p id=\"algo-aspen\"><strong>ASPEN &#8211; Approximate splitting for ensembles: <\/strong>Ensembles of classifiers, where different classifiers are created from the same data set through randomization, can improve the classification accuracy. To reduce the cost of creating multiple classifiers, we considered two ways to randomize the split decision at each node of the tree &#8211; use a sub-sample of instances at the node to identify the best split, or create a histogram, evaluate splits at the mid-point of each bin, and select the split randomly in the bin that contains the best split. A combination of both ideas can further reduce the cost of building the ensemble.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>Kamath, C., E. Cant\u00fa-Paz, and D. 
Littau, \u201cApproximate Splitting for Ensembles of Trees using Histograms,\u201d Proceedings, Second SIAM International Conference on Data Mining, pp. 370-383, April 2002.<\/li>\n\n\n\n<li>Kamath, C., and E. Cantu-Paz, \u201cCreating ensembles of decision trees through sampling,\u201d Proceedings, 33rd Symposium on the Interface of Computing Science and Statistics, Costa Mesa, CA, June 2001. Also available as Lawrence Livermore National Laboratory technical report, UCRL-JC-14226.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<!--nextpage-->\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:46% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"823\" height=\"627\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/Tracking_a_new.png\" alt=\"\" class=\"wp-image-320 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/Tracking_a_new.png 823w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/Tracking_a_new-300x229.png 300w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/Tracking_a_new-768x585.png 768w\" sizes=\"(max-width: 823px) 100vw, 823px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p id=\"algo-tracking\"><strong>Tracking moving objects in simulations and video:<\/strong> Detection and tracking of moving objects are important tasks in problems such as activity detection and identification in video sequences. 
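The histogram-based split randomization used in ASPEN can be sketched in a few lines. This is an illustrative reconstruction under my own assumptions, not the ASPEN code itself: the function names are invented here, the split criterion is taken to be Gini impurity, and the bins are equal-width.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p * p)

def split_impurity(x, y, threshold):
    """Impurity of the split x <= threshold, weighted by child sizes."""
    left, right = y[x <= threshold], y[x > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

def histogram_split(x, y, n_bins=8, rng=None):
    """Evaluate candidate splits only at bin mid-points, then draw the
    final threshold at random from the bin holding the best candidate."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.histogram_bin_edges(x, bins=n_bins)
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    scores = [split_impurity(x, y, m) for m in midpoints]
    best = int(np.argmin(scores))
    # randomizing within the winning bin decorrelates the trees of the ensemble
    return rng.uniform(edges[best], edges[best + 1])
```

Evaluating only one candidate per bin is what cuts the cost relative to exhaustive split search, while the random draw inside the winning bin supplies the per-tree variation the ensemble needs.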
We explored a range of techniques, focusing on how to make them more robust and computationally efficient so we could detect and track a moderate number of vehicles in video from traffic sequences, as well as a large number of non-rigid, coherent structures in spatio-temporal data from simulations.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>A. Gezahegne and C. Kamath, \u201cTracking non-rigid structures in computer simulations,\u201d IEEE International Conference on Image Processing, San Diego, October 2008, pp. 1548-1551.<\/li>\n\n\n\n<li>Samson S.-C. Cheung and C. Kamath, \u201cRobust background subtraction with foreground validation for urban traffic video,\u201d EURASIP Journal on Applied Signal Processing, Volume 14, pp 2330-2340, 2005.<\/li>\n\n\n\n<li>C. Kamath, A. Gezahegne, S. Newsam, G.M. Roberts, \u201cSalient Points for Tracking Moving Objects in Video,\u201d Proceedings, Image and Video Communications and Processing, pp 442-453, SPIE Volume 5685, Electronic Imaging, San Jose, January 2005.<\/li>\n\n\n\n<li>Cheung, S.-C., and C. Kamath, \u201cRobust techniques for background subtraction in urban traffic video,\u201d Video Communications and Image Processing, Volume 5308, pp 881-892, SPIE Electronic Imaging, San Jose, January 2004.<\/li>\n\n\n\n<li>Gyaourova, A., C. Kamath, and S.-C. Cheung, \u201cBlock matching for object tracking,\u201d LLNL Technical report, October 2003. 
UCRL-TR-200271.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:31% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"290\" height=\"552\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/pde_new.jpg\" alt=\"\" class=\"wp-image-329 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/pde_new.jpg 290w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/pde_new-158x300.jpg 158w\" sizes=\"(max-width: 290px) 100vw, 290px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p id=\"algo-pdeip\"><strong>PDEs for image processing<\/strong>: Partial differential equations have been used for image processing tasks such as denoising and segmentation. In some of our early work, we evaluated the performance of these methods on real images so we could better understand their pros and cons and compare them with more traditional methods of image processing. We were particularly interested in the computational cost of the PDE-based methods and the choice of various parameters and options in their implementation.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>Weeratunga S. and C. Kamath, \u201cAn investigation of implicit active contours for scientific image segmentation,\u201d Video Communications and Image Processing, SPIE Volume 5308, pp. 
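The background-subtraction idea behind the traffic-video work above can be illustrated with a minimal sketch. This is a deliberately simple running-average background model with a fixed threshold, my own simplification rather than the robust methods of the papers; the function name and parameter values are assumptions:

```python
import numpy as np

def background_subtract(frames, alpha=0.05, thresh=30.0):
    """Flag moving pixels by comparing each frame against a background
    estimate maintained as a running average of past frames."""
    bg = frames[0].astype(float)          # initial background estimate
    masks = []
    for frame in frames[1:]:
        frame = frame.astype(float)
        masks.append(np.abs(frame - bg) > thresh)   # foreground mask
        bg = (1.0 - alpha) * bg + alpha * frame     # slow background update
    return masks
```

The small update rate `alpha` lets the model absorb gradual illumination changes while a fast-moving vehicle still stands out against the slowly varying background; making this robust to shadows and camera noise is where the real work lies.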
210-221, SPIE Electronic Imaging, San Jose, January 2004.<\/li>\n\n\n\n<li>Weeratunga S.K., and C. Kamath, \u201cA comparison of PDE-based non-linear anisotropic diffusion techniques for image denoising,\u201d Proceedings, Image Processing: Algorithms and Systems II, SPIE Electronic Imaging, San Jose, January 2003.<\/li>\n\n\n\n<li>Weeratunga S.K. and C. Kamath, \u201cPDE-based non-linear diffusion techniques for denoising scientific\/industrial images: An empirical study,\u201d Proceedings, Image Processing: Algorithms and Systems, SPIE Electronic Imaging, pp. 279-290, San Jose, January 2002.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:42% auto\"><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" loading=\"lazy\" width=\"940\" height=\"570\" src=\"http:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/Sapphire_new.png\" alt=\"\" class=\"wp-image-331 size-full\" srcset=\"https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/Sapphire_new.png 940w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/Sapphire_new-300x182.png 300w, https:\/\/ckamath.org\/wp-content\/uploads\/2022\/03\/Sapphire_new-768x466.png 768w\" sizes=\"(max-width: 940px) 100vw, 940px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p id=\"algo-sapphire\"><strong>Sapphire scientific data mining software<\/strong> (R&amp;D 100 award, 2006): When I started the Sapphire scientific data mining project in the late 1990s, I put together an object-oriented, modular design for the software. 
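To make the PDE-based denoising concrete, here is a rough sketch of Perona-Malik nonlinear anisotropic diffusion, representative of the family of schemes evaluated in this work. The parameter values, the explicit time-stepping, and the periodic boundary handling are simplifications of my own, not taken from the papers:

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=20.0, dt=0.2):
    """Smooth an image while preserving edges: the conductance
    g = exp(-(|grad u| / kappa)^2) is near 1 in flat regions (strong
    diffusion) and near 0 across strong gradients (edges are kept)."""
    u = img.astype(float)
    for _ in range(n_iter):
        # differences to the four neighbours
        # (periodic boundary via np.roll, kept simple for illustration)
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # edge-stopping conductance in each direction
        cn = np.exp(-(dn / kappa) ** 2)
        cs = np.exp(-(ds / kappa) ** 2)
        ce = np.exp(-(de / kappa) ** 2)
        cw = np.exp(-(dw / kappa) ** 2)
        # explicit update; dt <= 0.25 keeps the scheme stable
        u += dt * (cn * dn + cs * ds + ce * de + cw * dw)
    return u
```

The choices highlighted in the publications above, such as the diffusivity function, the edge-stopping parameter `kappa`, and the number of iterations, all change both the quality of the result and the computational cost.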
A common interface for each class of algorithms allowed us to easily try different algorithms and create new ones, a common data store supported many different data formats used in different domains, and implementing the compute-intensive parts in C++, glued together using Python, enabled us to quickly create efficient solutions to specific problems. Nearly twenty-five years later, with lots more practical experience in data mining, I find that the approach I took provided both the flexibility and efficiency required to meet the diverse needs of data analysis in scientific simulations, observations and experiments.<\/p>\n<\/div><\/div>\n\n\n\n<p style=\"font-size:15px\">Select publications (available from <a rel=\"noreferrer noopener\" href=\"https:\/\/scholar.google.com\/citations?user=PB82ll0AAAAJ&amp;hl=en\" target=\"_blank\">Google Scholar<\/a>):<\/p>\n\n\n\n<ul style=\"font-size:15px\">\n<li>C. Kamath, Scientific Data Mining: A Practical Perspective, SIAM, Philadelphia, May 2009.<\/li>\n\n\n\n<li>Kamath, C., \u201cSapphire System Architecture,\u201d IPAM short program on Mathematical Challenges in Scientific Data Mining, UCLA, January 14-18, 2002. <a rel=\"noreferrer noopener\" href=\"http:\/\/www.ipam.ucla.edu\/abstract\/?tid=4011&amp;pcode=SDM2002\" target=\"_blank\">UCRL-PRES-146654.<\/a><\/li>\n\n\n\n<li>Kamath, Chandrika, and Erick Cant\u00fa-Paz, \u201cOn the design of a parallel object-oriented data mining toolkit,&#8221; Workshop on Distributed and Parallel Knowledge Discovery, Knowledge Discovery and Data Mining Conference, Boston, August 20-23, 2000.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>This page describes my research in data mining algorithms, that is, solution techniques for specific tasks in data analysis. 
Unlike the area of applications, where multiple challenges often have to be addressed simultaneously, a focus on the algorithms enables me to consider, in isolation, each of the many challenges encountered in real applications. I can<a class=\"more-link\" href=\"https:\/\/ckamath.org\/index.php\/projects\/research-in-algorithms\/\">Continue reading <span class=\"screen-reader-text\">&#8220;Research in Algorithms&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":192,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/ckamath.org\/index.php\/wp-json\/wp\/v2\/pages\/197"}],"collection":[{"href":"https:\/\/ckamath.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ckamath.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ckamath.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ckamath.org\/index.php\/wp-json\/wp\/v2\/comments?post=197"}],"version-history":[{"count":43,"href":"https:\/\/ckamath.org\/index.php\/wp-json\/wp\/v2\/pages\/197\/revisions"}],"predecessor-version":[{"id":525,"href":"https:\/\/ckamath.org\/index.php\/wp-json\/wp\/v2\/pages\/197\/revisions\/525"}],"up":[{"embeddable":true,"href":"https:\/\/ckamath.org\/index.php\/wp-json\/wp\/v2\/pages\/192"}],"wp:attachment":[{"href":"https:\/\/ckamath.org\/index.php\/wp-json\/wp\/v2\/media?parent=197"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}