ITC Colloquium - Aleksandra Ćiprijanović (FNAL)


Thursday, November 4, 2021, 11:10am to 12:00pm


"Bridging the gap between simulations and survey data: domain adaptation for deep learning in astronomy"

Astronomical surveys are already producing very large datasets, and machine learning will play a crucial role in enabling us to fully utilize all of the available data. Machine learning models are often trained on simulated data and then applied to observations, which can lead to a substantial drop in model accuracy on the new target dataset. Simulated and telescope data represent different data domains, and for a machine learning model to work in both domains, domain-invariant learning is necessary. The talk will address this problem through the task of distinguishing between merging and non-merging galaxies in simulated data (the Illustris-1 cosmological simulation) and observational data (the Sloan Digital Sky Survey). Galaxy mergers are key to understanding how matter in the universe evolves. Because mergers unfold over very long timescales, no single observation captures the whole process, so our ability to combine knowledge from different data domains is essential to these efforts. To enable deep learning algorithms to work across multiple domains, we test two domain adaptation techniques: Maximum Mean Discrepancy (MMD) and Domain Adversarial Neural Networks (DANNs). These techniques are particularly important when one of the domains consists of new, unlabeled data, which is often the case with new survey data. We show that adding domain adaptation substantially improves the model's performance in the new target domain. With further development, these techniques will allow scientists in many fields to build machine learning models that successfully combine knowledge from simulated and instrument data, or from data originating from multiple instruments.
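For readers unfamiliar with MMD, here is a minimal sketch of the quantity itself. It assumes a Gaussian (RBF) kernel with a fixed, hand-picked bandwidth `sigma` and computes the biased estimate of the squared MMD between two sets of feature vectors. The function names `rbf_kernel` and `mmd_squared` and the synthetic two-dimensional "features" are hypothetical illustrations, not code from the talk. In MMD-based domain adaptation, a term like this is typically computed on a network's internal features for a batch of source and target images and added to the classification loss, so that training pulls the two feature distributions together.

```python
import math
import random


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors of equal length."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y),
    where each mean runs over all pairs (including i == j).
    """
    def mean_kernel(p, q):
        total = sum(rbf_kernel(a, b, sigma) for a in p for b in q)
        return total / (len(p) * len(q))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical 2-D features standing in for a source (simulated) domain
    # and two target (observed) domains: one matching, one shifted.
    source = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(100)]
    target_same = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(100)]
    target_shifted = [[random.gauss(2.0, 1.0), random.gauss(2.0, 1.0)] for _ in range(100)]
    print("same distribution:   ", mmd_squared(source, target_same))
    print("shifted distribution:", mmd_squared(source, target_shifted))
```

The shifted target sample should give a clearly larger value than the sample drawn from the same distribution. A DANN pursues the same goal differently: instead of minimizing an explicit distance, it trains a domain classifier adversarially against the feature extractor so that the learned features stop revealing which domain an image came from.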
