There are two aspects of my work in data science:

  • Applications – gaining insight into scientific data and addressing specific questions posed by domain scientists. Many factors, occurring individually or in combination, make this endeavor challenging, including the size of the data set, its quality, its complexity, and how clearly the questions being addressed have been articulated by the domain scientists. Solving a real problem is far from trivial and often involves modifying and combining ideas and algorithms from different fields.
  • Research in algorithms – solution techniques for specific tasks in data analysis. Unlike the area of applications, where multiple challenges often have to be addressed simultaneously, a focus on the algorithms enables me to consider, in isolation, each of the many challenges encountered in real applications. I can create computationally efficient solutions that are appropriate to the size of the data, can handle the variation in the data, and are robust to the algorithms' parameter settings. It is also an opportunity to advance the state of the art in analysis algorithms.

My work has been possible thanks to support from:

  • My colleagues and fellow researchers who participated in my projects
  • Our collaborators in various scientific domains who generously shared their data, their time, and their domain expertise
  • Those who funded my work in applications and algorithms over the years. In particular, I would like to acknowledge the support for my research in algorithms through the following projects:
    • Principal Investigator, “IDEALS: Improving Data Exploration and Analysis at Large Scale,” DOE ASCR program on Scientific Data Management, Analysis, and Visualization, 2016-2021
    • Lead, Data Mining and Uncertainty Quantification, “ACAMM: Accelerated Certification of Additively Manufactured Metals,” (PI-Wayne King), LLNL Laboratory Directed Research and Development project, 2013-2015
    • Co-Principal Investigator (joint with Prof. George Karypis, U. Minnesota), “ExaDM: Intelligent Reduction of Data from Exascale Simulations,” DOE ASCR Exascale program, 2010-2013
    • Principal Investigator, “SensorStreams: Analysis of Streaming Data from Sensors,” DOE ASCR Applied Math program, 2009-2012
    • Principal Investigator, “MINDES: Data Mining for Inverse Design in Materials,” DOE ASCR ARRA program, 2010-2012
    • Principal Investigator, “WINDSENSE: Integrating Wind Energy on the Power Grid,” DOE EERE Wind Energy program, FY 2009-2012
    • Co-Principal Investigator, “Scientific Data Management Center,” (PI-Arie Shoshani), DOE Office of Science SciDAC-2 program, FY 2007-2011
    • Principal Investigator, “Robust Real-Time Techniques for Detection and Tracking in Video,” LLNL Laboratory Directed Research and Development project, FY 2003-2005
    • Co-Investigator, “Scientific Data Management Center,” (PI-Arie Shoshani), DOE Office of Science SciDAC program, FY 2002-2006
    • Principal Investigator, “Scientific Data Mining,” DOE NNSA ASC Initiative, 1998-2007
    • Principal Investigator, “Sapphire: Scalable Pattern Recognition for Large-Scale Scientific Data Mining,” LLNL Laboratory Directed Research and Development project, FY 1999-2001