Identification of bent-double galaxies: Our first data set was from the FIRST (Faint Images of the Radio Sky at Twenty Centimeters) astronomy survey, where we considered the task of identifying galaxies with a bent-double morphology. Working with the FIRST catalog, which had been created by fitting elliptic Gaussians to the brighter image ‘blobs’, we extracted representative features for each galaxy for use in machine learning algorithms. This data set also formed a key test bed for our research in classification algorithms.

Analysis of coherent structures: A coherent structure is a collection of neighboring points (grid points in a simulation or pixels in an image) that behave as a coherent whole. Analyzing how these structures behave over time can shed light on the phenomenon being simulated or observed. Our work in this area has focused on the definition and evolution of these structures in both experimental data from the NSTX and simulation data of the Rayleigh-Taylor instability. The latter comprised two of the largest data sets we analyzed, at 30 TB and 80 TB.
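
As a rough illustration of what "a collection of neighboring points" can mean in practice, the sketch below groups grid cells into structures using connected-component labeling. It is not the method used in this work: the thresholding criterion, the 4-connected neighborhood, and the names `label_structures`, `field`, and `threshold` are all assumptions made for the example.

```python
from collections import deque


def label_structures(grid, threshold):
    """Label 4-connected groups of cells whose value exceeds `threshold`.

    Returns (labels, count). `labels` has the same shape as `grid`, with 0
    for background cells and 1..count identifying each structure.
    Thresholding is an assumed, illustrative definition of a structure.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background cells and cells already assigned to a structure.
            if grid[r][c] <= threshold or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the up/down/left/right neighbors.
            while queue:
                i, j = queue.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and grid[ni][nj] > threshold
                            and not labels[ni][nj]):
                        labels[ni][nj] = count
                        queue.append((ni, nj))
    return labels, count


# Hypothetical 3x4 field containing two separate high-valued regions.
field = [
    [0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.9],
]
labels, count = label_structures(field, threshold=0.5)
```

Applying such a labeling to each time step, and then matching labels between consecutive steps, is one possible starting point for following how structures evolve. At the data sizes mentioned above, a simple in-memory approach like this one would not be practical without modification.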

Classification of orbits in a Poincaré plot: One of the analysis tasks in magnetic fusion is the classification of orbits in a Poincaré plot into one of four types – quasiperiodic, island chain, separatrix, and stochastic – based on the shape of the orbit. Each orbit is represented by the (x,y) coordinates of its points, and consists of a few thousand points, making this the smallest data set we analyzed. It was also the most challenging: it made us realize that while our eyes can easily discern a pattern created by a few points, automating the identification of this pattern in code is far from trivial.