Learning the galaxy-halo connection in cosmological simulations
The application of machine learning (ML) models to zoom simulations presents a promising avenue for generating large mock galaxy catalogues at low computational expense. However, deterministic ML models have proven unable to fully replicate the scatter in galaxy-halo relationships. In my work, I’ve shown that this is because a substantial portion of the scatter is driven by stochasticity built into the subgrid models. Subgrid models, which control physics at unresolved scales, use random numbers to determine the timing of events such as star formation. Changes to the random number seed create a chaos-like effect, resulting in substantial differences in individual galaxies in otherwise identical simulations. In this talk, I will discuss attempts to overcome this issue - using the FLARES suite of zoom simulations - by 1) modelling the differences in like-for-like galaxies that arise when the random number seed is changed; and 2) using probabilistic ML methods.
Extra-galactic Byte: Exploring Galaxies with Machine Learning
Machine learning's prowess in pattern recognition and data analysis is revolutionizing how we sift through astronomical data, including radio continuum observations. These radio observations are widely used as an extinction-free tracer for star formation rates (SFR) in galaxies, and recent studies have found compelling evidence that the radio luminosity-SFR relation depends on stellar mass. We revisit this topic using the RandomForest decision-tree regression algorithm, allowing us to investigate the importance of galaxy parameters in a non-linear and non-parametric manner. We analyze two models: the first uses the LOFAR Two-metre Sky Survey (LoTSS) second data release data with photometric observations alone (analyzed using SED fits covering UV, optical as well as mid-IR flux), while the second combines LoTSS data with both photometric and spectroscopic observations (from the MPA-JHU SDSS catalogue DR8) for the same sample of emission-line-classified star-forming galaxies. Results obtained using non-parametric ML methods and parametric models show strong agreement, confirming the reliability of decision-tree algorithms in handling biased radio-selected galaxy samples, even without non-detections. This approach proves valuable for analyzing large radio-selected multi-wavelength datasets, including those from the upcoming WEAVE-LOFAR survey.
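As an illustration of how a random forest can rank galaxy parameters by importance, the following is a minimal sketch using scikit-learn's `RandomForestRegressor`. The data are synthetic stand-ins rather than LoTSS or SDSS measurements, and the toy relation, the column names and the `nuisance` column are assumptions made purely for this example; it is not the analysis pipeline used in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for galaxy parameters (NOT LoTSS or SDSS data).
log_sfr = rng.normal(0.0, 0.5, n)
log_mass = rng.normal(10.5, 0.5, n)
nuisance = rng.normal(0.0, 1.0, n)  # unrelated column, kept for comparison

# Assumed toy relation: radio luminosity depends strongly on SFR
# and more weakly on stellar mass, plus a little scatter.
log_lum = 22.0 + 1.0 * log_sfr + 0.3 * (log_mass - 10.5) + rng.normal(0.0, 0.05, n)

X = np.column_stack([log_sfr, log_mass, nuisance])
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, log_lum)

# Impurity-based importances, one per input column (they sum to 1).
names = ["log_sfr", "log_mass", "nuisance"]
importances = dict(zip(names, forest.feature_importances_))
for name in names:
    print(f"{name}: {importances[name]:.3f}")
```

In this toy setup the forest recovers the ordering built into the synthetic relation without being told its functional form, which is the sense in which the approach is non-parametric.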
Tracing Galaxy Morphology and Evolution with Autoencoders
Galaxy morphology is strongly correlated with the properties and evolutionary history of galaxies. This poster presents a method that allows us to take a galaxy (or group of galaxies) at low redshift and find pseudo-progenitors at higher redshifts. These pseudo-progenitors are galaxies that might evolve into galaxies with similar properties and morphologies to the selected galaxies at low-redshift. This method utilises an autoencoder (a type of unsupervised machine learning model). Autoencoders take high dimension data (e.g. a galaxy image) and compress the data into a lower dimension latent space. This latent space learns the physical appearance of galaxies, with galaxies of a similar appearance occupying a similar region in the latent space. As the appearance of galaxies is strongly affected by morphology, we can use the latent space to find galaxies with a similar appearance, and by extension, morphology. If we take a low-redshift galaxy and incrementally find similar galaxies at increasing redshift, we are able to find the pseudo-progenitors of low-redshift galaxies at high-redshift. When applying this method to simulated data from the IllustrisTNG simulation, we are able to accurately find the progenitors of low-redshift galaxies, or find galaxies with similar properties to the progenitors of low-redshift galaxies.
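The incremental search described above can be sketched as a chain of nearest-neighbour lookups in the latent space. This is a minimal sketch under stated assumptions: the 2-D latent vectors, galaxy names and redshift bins are hypothetical, and Euclidean distance is assumed as the similarity measure. In practice the vectors would come from a trained encoder.

```python
import math


def nearest(target, candidates):
    """Return the (name, vector) pair closest to target in Euclidean distance."""
    return min(candidates, key=lambda item: math.dist(target, item[1]))


def trace_pseudo_progenitors(start_vector, bins):
    """Step through redshift bins from low to high, each time picking the
    galaxy whose latent vector is nearest to the previous pick."""
    chain = []
    current = start_vector
    for redshift, galaxies in bins:
        name, vector = nearest(current, galaxies)
        chain.append((redshift, name))
        current = vector  # the next search starts from this galaxy
    return chain


# Hypothetical latent vectors, grouped by increasing redshift.
bins = [
    (0.5, [("a1", (1.0, 1.0)), ("a2", (4.0, 4.0))]),
    (1.0, [("b1", (1.5, 1.2)), ("b2", (5.0, 3.0))]),
    (1.5, [("c1", (6.0, 6.0)), ("c2", (2.0, 1.5))]),
]

chain = trace_pseudo_progenitors((0.8, 0.9), bins)
print(chain)
```

Restarting each search from the most recent pick, rather than from the original galaxy, lets the chain follow gradual changes in appearance across redshift.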
Morphologically Informed Neural Networks
The new age of astronomy is to be defined by large galaxy surveys, such as DESI, Euclid and LSST, and analytical methods will no longer be sufficient when it comes to processing these large volumes of data. Deep learning offers a key set of solutions which excel when working with unstructured data. With imaging being the most dominant data type, it’s important to leverage such information when trying to conduct inference of galactic properties which often require high quality spectra to obtain redshift estimates, star formation rates, etc. However, it is not enough to simply insert the data into a network and accept the values presented by the model. Our proposed method seeks to use deep learning to obtain clear morphological features and trace how these features inform photometric estimates for galaxy properties. Further, by utilising normalising flows, it is also possible to generate probability densities of these measurements so that they may also be used in a statistical cosmological setting.
Finding T-Dwarf Companions to Gaia Primary Stars
Since their discovery 30 years ago, T-dwarfs have been an area of interest due to their low masses and cool atmospheres; but the properties of lone brown dwarfs are notoriously hard to determine. By finding T-dwarfs that are part of wide binary systems with Gaia-detected primary stars, we can infer characteristics (such as age) from the primary star, helping us determine other properties and building a better understanding of the physics of T-dwarfs. In this work, we have employed the VISTA Hemisphere Survey and DES DR2 to perform a deep search for T-dwarfs, reaching down to J=20. In this search, we have found ten T-dwarfs with spectral types of T4–T9 and proper motions ranging from 50 mas/yr to 1 arcsecond/yr, including four T-dwarfs in binary systems with white dwarf primaries. We have obtained FourStar observations to confirm the binarity of our T-dwarfs, and we have also obtained FIRE spectra of some of our confirmed companions to further constrain the T-dwarf properties.
Understanding Predictions made by Machine Learning for Spectroscopic Atmospheric Characterisation
Recent spectral observatories stand to revolutionise our ability to study exoplanets on a larger population scale than ever before. Analysing this data requires extracting information about the planetary atmosphere from the spectra. For anything beyond a small number of targets this is very computationally intensive, which is a large barrier to entry as we move into larger-scale planetary surveys. ML has proven to be a powerful tool for tackling this, reducing the computational resources required. However, the scale of these models means this advantage comes at the cost of understanding. We go beyond existing approaches, presenting a novel method of interpretability based on physically motivated forward modelling, bridging the gap between ML and traditional approaches. We trained a convolutional network architecture to predict the atmospheric abundances of 5 molecules across 40,000 simulated Ariel spectra, then compared a selection of existing techniques for interpreting predictions. Based on this analysis we propose a novel application of the perturbation sensitivity technique for interpreting ML predictions. This method has potential for use outside of Ariel data, and we believe the opportunity to share it here would help lower barriers to entry in the use of ML for planetary spectral analysis.
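For readers unfamiliar with perturbation sensitivity, the generic idea can be sketched as follows: nudge one spectral bin at a time and record how much the prediction moves. This is a minimal sketch of the generic technique, not the novel application proposed in this work. `toy_model` is a hypothetical stand-in for a trained network, and the spectrum values and step size `delta` are assumptions.

```python
def perturbation_sensitivity(model, spectrum, delta=0.01):
    """Return, for each bin, the absolute change in the model output per unit
    change in that bin, estimated by perturbing the bin by delta."""
    baseline = model(spectrum)
    scores = []
    for i in range(len(spectrum)):
        perturbed = list(spectrum)
        perturbed[i] += delta
        scores.append(abs(model(perturbed) - baseline) / delta)
    return scores


def toy_model(spectrum):
    # Hypothetical stand-in for a trained network: the "abundance"
    # depends only on bins 2 and 3 of the input spectrum.
    return 2.0 * spectrum[2] - 0.5 * spectrum[3]


spectrum = [1.0, 1.0, 0.8, 0.9, 1.0]
scores = perturbation_sensitivity(toy_model, spectrum)
print(scores)
```

Bins with a score of zero have no influence on the prediction, so the scores act as a map of which wavelength channels the model relies on.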
Investigating the Influence of Asymmetric Error Bars on Retrievals of Exoplanet Transmission Spectra
In a Bayesian retrieval framework, the goodness-of-fit of a model to the observed data is quantified by computing the value of a likelihood function. However, current pipelines assume a Gaussian form for the likelihood, which imposes limitations on our data analysis. Many published datasets exhibit different upper and lower error bars. These emerge from the asymmetric form of the posterior distributions when fitting the transit depths across wavelength channels in lightcurve data. The Gaussian likelihood can only accept one value for the width of the distribution and, as such, we are forced to average the two errors. The extent to which this approximation influences parameter predictions has been given little attention. Given that many JWST datasets are now available to the community, we test this assumption and seek to reaffirm our confidence in our predictions. We incorporate an asymmetric likelihood function in a retrieval and test its ability to accurately retrieve the parameters of a simulation scattered by an asymmetric distribution. Results are compared to the currently accepted Gaussian case in several scattering regimes and, in doing so, we are able to probe the retrieval’s sensitivity to this effect at different scales.
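To make the contrast concrete, here is a minimal sketch of one possible asymmetric likelihood, the split-normal (two-piece Gaussian), alongside the usual Gaussian with averaged error bars. The abstract does not specify which asymmetric form is used, so the split-normal choice, the function names and the convention that the upper error applies when the model lies above the data are all assumptions.

```python
import math


def split_normal_loglike(model, data, sigma_lo, sigma_up):
    """Log-likelihood with a two-piece Gaussian per data point: the upper
    error is used when the model lies above the data, the lower otherwise."""
    total = 0.0
    for m, d, lo, up in zip(model, data, sigma_lo, sigma_up):
        sigma = up if m >= d else lo
        log_norm = math.log(math.sqrt(2.0 / math.pi) / (lo + up))
        total += log_norm - 0.5 * ((m - d) / sigma) ** 2
    return total


def gaussian_loglike(model, data, sigma_lo, sigma_up):
    """Standard Gaussian log-likelihood using the average of the two errors."""
    total = 0.0
    for m, d, lo, up in zip(model, data, sigma_lo, sigma_up):
        sigma = 0.5 * (lo + up)
        total += -0.5 * math.log(2.0 * math.pi * sigma**2) - 0.5 * ((m - d) / sigma) ** 2
    return total


# Hypothetical single transit depth with error bars of +0.3 / -0.1.
data, lo, up = [1.0], [0.1], [0.3]
above = split_normal_loglike([1.2], data, lo, up)  # model 0.2 above the data
below = split_normal_loglike([0.8], data, lo, up)  # model 0.2 below the data
print(above, below, gaussian_loglike([1.2], data, lo, up))
```

The averaged Gaussian assigns the same likelihood to both offsets, whereas the split-normal penalises the offset on the side with the smaller error bar more heavily; the two forms coincide when the upper and lower errors are equal.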
Understanding Acoustic Scale Observations: the One-sided Fight against Λ
The cosmic microwave background (CMB) and baryon acoustic oscillations (BAO) provide precise benchmarks for measuring the expansion history of the universe. In particular, the CMB angular scale measurement θ⁎, which measures the ratio of the sound horizon to the angular diameter distance to the last scattering surface, offers a robust constraint on cosmological models independent of late-time physics. We show that fundamental energy conditions in general relativity impose strict limits on the BAO observables used by DESI. We also identify which regions of parameter space in the time-varying parameterization ω(a) = ω₀ + ωₐ (1 - a) remain viable while satisfying these conditions.
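As a simple illustration of how an energy condition carves out a region of the (ω₀, ωₐ) plane, the sketch below checks one condition only. The abstract does not state which energy conditions are imposed or how, so the choice of the null energy condition, applied to a dark energy component with positive density (equivalent to ω(a) ≥ -1 for 0 < a ≤ 1), is an assumption made for this example, as are the function names.

```python
def w_cpl(a, w0, wa):
    """Equation of state w(a) = w0 + wa * (1 - a) at scale factor a."""
    return w0 + wa * (1.0 - a)


def satisfies_nec(w0, wa):
    """Check w(a) >= -1 over 0 < a <= 1 (assumed null energy condition).

    w(a) is linear in a, so its extremes on that interval sit at the
    endpoints: w0 at a = 1 (today) and w0 + wa as a -> 0 (early times).
    """
    return w0 >= -1.0 and w0 + wa >= -1.0


print(satisfies_nec(-1.0, 0.0))   # cosmological constant, on the boundary
print(satisfies_nec(-0.9, -0.5))  # crosses below -1 at early times
print(satisfies_nec(-0.8, 0.1))   # stays above -1 throughout
```

Because the bound is an inequality rather than a best-fit value, it excludes parameter space on one side of ω = -1 only.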
Simulation Based Inference for Gravitational Wave Parameter Estimation
The recent detections of gravitational waves have expanded our knowledge of compact objects and the evolution of the Universe. This was only possible thanks to the sophisticated combination of an exceptional experimental set-up and state-of-the-art computational techniques. The latter span from the production of accurate waveform models to the inference of gravitational-wave source parameters. We are developing a new method to simultaneously determine the properties of the noise and signal embedded in the detector data using simulation-based inference. We will present the idea behind this method, the first results obtained with it, and comparisons with traditional data analysis techniques.
Progress on Superfluid Helium Gravitational Wave Detectors
Current gravitational wave detectors are incredibly sensitive, but this sensitivity comes at a high cost and is limited in frequency range. Our research continues the investigation into superfluid helium resonant detectors and into increasing their readout sensitivity with new geometries. New takes on resonant-bar-style detectors have the potential to increase sensitivity to gravitational waves at non-LIGO frequencies. Using a re-entrant cavity design, we have increased the direct readout sensitivity to pressure variations in helium, characterised by g₀, by four orders of magnitude over previous work, and have shown this experimentally. A greater readout sensitivity allows optomechanical techniques, such as dynamical back-action, to be used to reduce the noise floor of the detector. Progress in this field will allow more gravitational wave detectors to be built, resulting in more data and better sky localisation.
Machine Learning to Obtain Cosmological Constraints from Large-Scale Surveys
In this talk, I will discuss the development of neural emulators designed to streamline the exploration and inference of different cosmological models. I will cover the methodology behind constructing the emulators (what neural emulators are and how they enable efficient likelihood evaluations and parameter inference for large-scale surveys), and touch upon the choice of cosmological models, sampling strategies, and validation techniques. I think this talk could be interesting for a wide range of young scientists, as it combines machine learning (currently a very popular topic) with its use in modern cosmological surveys. For those who have never used emulators, it will shed light on the contemporary methods used in the analysis of large datasets, while for those who are proficient in machine learning it will show yet another use for it and open possible avenues for future collaboration.