High-contrast imaging: data processing
While coronagraphic instruments equipped with extreme adaptive optics have revealed and characterized a handful of self-luminous planets, the more numerous circumstellar disks they have revealed (about a hundred) are still sitting on the bench. This is because existing methods are optimized for planet imaging (Pueyo 2016), not for extended structures such as circumstellar disks.
In comparison, the explosion of data reduction methods in applied mathematics, statistics, and computer science has shed light on ways to detect and characterize disks. To adapt these methods to high-contrast imaging datasets, it is necessary to understand their advantages and limitations, not just empirically but also mathematically, through derivation.
1. Foundation: non-negative matrix factorization
Non-negative matrix factorization (NMF) entered astronomical data processing with Blanton & Roweis (2007), and its non-negativity requirement is an ideal match for astronomical data, which are recorded on detectors. For high-contrast imaging datasets, Pueyo et al. (2012) have shown that such a requirement can lead to significant improvement in data reduction with another method. Together, these results suggest that NMF could be an optimal method for the post-processing step in high-contrast imaging.
From mathematical derivation, to simulation, to on-sky data reduction, we are able to show that NMF excels in extracting faint signals while preserving signals for characterization when there are reference star images to remove stellar light (ApJ, arXiv, ADS).
NMF Python code: GitHub link.
Technical highlight: we mathematically proved that the NMF iteration results in a stable construction of the components.
Scientific highlight: our simulation demonstrates that NMF is able to better detect and preserve the light scattered by circumstellar disks.
Project timescale: 2016 May to 2017 December
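To make the NMF iteration concrete, here is a minimal sketch of the textbook multiplicative update rules for the factorization X ≈ W H. It is not the code from the GitHub repository above: the function name `nmf_multiplicative`, the array layout (one flattened image per row), and the default parameters are assumptions chosen for illustration. The returned error history lets you check that the squared reconstruction error does not increase from one iteration to the next.

```python
import numpy as np


def nmf_multiplicative(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix X (n_images, n_pixels) as W @ H.

    Hypothetical helper for illustration: plain multiplicative updates
    for the squared-error objective, with no weighting or masking.
    """
    rng = np.random.default_rng(seed)
    n_images, n_pixels = X.shape
    # Strictly positive starting point so no entry is stuck at zero.
    W = rng.random((n_images, n_components)) + 0.1
    H = rng.random((n_components, n_pixels)) + 0.1
    errors = []
    for _ in range(n_iter):
        # Each factor is multiplied by a ratio of non-negative terms,
        # so W and H can never become negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.sum((X - W @ H) ** 2)))
    return W, H, errors
```

In this layout the rows of H play the role of the components and the rows of W hold the coefficients of each image on those components.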
2. Moving forward: data imputation using non-negative matrix factorization
Ground-based high-contrast imaging instruments opened the gate to disk imaging (e.g., Esposito et al. 2020). However, the most difficult challenge in ground-based observations is that the non-repeatability of weather and instrument performance makes it hard to obtain a perfect template for a star. By treating the disk-hosting regions as missing data, and noting that the sky rotates during an observation sequence, we can impute the "missing data" regions using NMF.
From mathematical derivation, to simulation, to on-sky data reduction, we are able to show that NMF excels in ground-based data imputation strategies when we use a star itself to reveal its surroundings (ApJ, arXiv, ADS).
Technical highlight: we used sequential Non-negative Matrix Factorization for the data imputation procedure (DI-sNMF). Mathematically, this reaches a least-biased capture of star light, and thus enables robust circumstellar structure recovery for both planets and disks.
Scientific highlight: for the HR 4796A debris disk, the first marginal evidence that dust scatters light more strongly in the forward direction as wavelength increases.
DI-sNMF Python code: GitHub link.
Project timescale: 2018 August to 2020 January
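The idea of treating disk-hosting regions as missing data can be sketched with a masked variant of the multiplicative updates, in which masked pixels contribute nothing to the fit and are afterwards filled in from the model. This is a simplified illustration under stated assumptions, not DI-sNMF itself: it fits all components at once rather than sequentially, and the names `masked_nmf`, `impute`, and `observed` are hypothetical.

```python
import numpy as np


def masked_nmf(X, observed, n_components, n_iter=500, seed=0, eps=1e-12):
    """Fit X (n_images, n_pixels) as W @ H using only observed entries.

    `observed` is a boolean array of the same shape as X; False marks
    the "missing data" (here: disk-hosting) pixels of each image.
    """
    rng = np.random.default_rng(seed)
    M = observed.astype(float)
    XM = np.where(observed, X, 0.0)  # masked values never enter the fit
    W = rng.random((X.shape[0], n_components)) + 0.1
    H = rng.random((n_components, X.shape[1])) + 0.1
    errors = []
    for _ in range(n_iter):
        # Weighted multiplicative updates: the mask acts as 0/1 weights.
        H *= (W.T @ XM) / (W.T @ (M * (W @ H)) + eps)
        W *= (XM @ H.T) / ((M * (W @ H)) @ H.T + eps)
        errors.append(float(np.sum((XM - M * (W @ H)) ** 2)))
    return W, H, errors


def impute(X, observed, W, H):
    """Keep observed pixels; replace masked pixels with the model W @ H."""
    return np.where(observed, X, W @ H)
```

Because the sky rotates during a sequence, the masked region falls on different detector pixels in different images, so each pixel is still observed in some rows of X. Subtracting the imputed stellar model from the data in the masked region then leaves an estimate of the circumstellar signal.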
3. Application: data imputation on VLT/SPHERE star-hopping data
Quasi-static residual signals behind adaptive optics systems have been a limiting factor for high-contrast imaging (Pueyo 2016). On the instrumentation and telescope operation side, Wahhaj et al. (2021) introduced the star-hopping approach to quasi-seamlessly observe a disk-hosting star and its point spread function reference star. This opens a gate to unprecedented post-processing data quality.
Technical highlight: data imputation finds an ideal application in SPHERE star-hopping observations.
Scientific highlight: (1) Planet-forming disks in total intensity. (2) Polarization fraction. (3) Trends, etc.
Highlighted as a Cover Photo for A&A's Volume 680: volume link, or A&A's tweet, or the photo below.
Fun facts: (1) This is the first time that data processing methods have preceded high-quality data in time. In fact, I originally introduced DI-sNMF to the field just out of mathematical curiosity. (2) When I showed researchers in the field the results above before publication, their comments included "this is polarized light?", "hmm the resolution looks bad", "what this is in total intensity?!"