Abstract. Evo, a 7-billion-parameter genomic foundation model, learns biological complexity from individual nucleotides to whole genomes. Trained on 2.7 million raw prokaryotic and phage genome sequences, Evo is naturally multimodal, enabling the codesign of DNA, RNA, and protein molecules that form higher-order functional systems. Evo is also inherently multiscale, supporting prediction and generation tasks at the level of molecules, systems, and genomes. A new version, Evo-2, is currently in development; I work on its safety and ethical evaluations.
Evo-1 Paper: https://www.science.org/doi/10.1126/science.ado9336
Presentation: https://drive.google.com/file/d/1RV2usIwcm-zEleSmOBqrMLt5F1rEOFkV/view?usp=sharing
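Evo is trained on DNA alone; its reach into RNA and protein follows from transcription and the genetic code, which fix the RNA and protein that a coding sequence produces. The sketch below is not part of Evo. It uses hypothetical helper names (transcribe, translate) and assumes an in-frame coding sequence read with the standard genetic code.

```python
# Standard genetic code: codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA (thymine replaced by uracil)."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence up to the first stop codon ('*')."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCTAAAGGTTGA"  # toy open reading frame, not taken from Evo's training data
print(transcribe(gene))  # AUGGCUAAAGGUUGA
print(translate(gene))   # MAKG
```

Because the protein is fully determined by its coding DNA, a model that generates DNA implicitly generates the encoded RNA and protein as well.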
Abstract. Recent serendipitous discoveries in X-ray astronomy, such as extragalactic fast X-ray transients as counterparts to gravitational wave signals from binary neutron star mergers, extragalactic planetary transit candidates, and other rare short-duration phenomena in the X-ray sky, highlight the importance of a systematic search for such events in X-ray archives. We present the first representation-learning-based anomaly detection approach for the discovery of high-energy transients. We introduce novel equal-length event file representations that capture both time and energy information. Applying our unsupervised X-ray transient detection pipeline to these event file representations enables the efficient identification of new transients. The pipeline extracts features from the representations using principal component analysis or an autoencoder, followed by dimensionality reduction and clustering. By associating these clusters with previously identified transients and performing nearest-neighbor searches, we create a catalog of 3539 X-ray transient candidates in the Chandra archive. As part of our search, we report the discovery of a new extragalactic fast X-ray transient, XRT 200515, most likely associated with a rare giant magnetar flare in the Large Magellanic Cloud or, alternatively, a distant gamma-ray burst in the background. This transient detection method for time-domain high-energy astrophysics is applicable to data from other high-energy observatories such as XMM-Newton, Swift-XRT, eROSITA, the Einstein Probe, and the upcoming AXIS observatory.
Paper: https://doi.org/10.1093/mnras/stae2808
Code: https://github.com/StevenDillmann/ml-xraytransients-mnras
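Event files hold a different number of photons for every observation, so they must be mapped to a fixed-size representation before PCA or an autoencoder can be applied. The sketch below bins (arrival time, energy) pairs into a normalized time-energy histogram. It is an illustration under stated assumptions, not the paper's exact representation: the function name event_file_to_et_map, the 24 × 16 grid, the 0.5 to 7 keV band, min-max time normalization, and log-spaced energy bins are all assumptions.

```python
import math


def event_file_to_et_map(events, n_time_bins=24, n_energy_bins=16,
                         e_min_kev=0.5, e_max_kev=7.0):
    """Map a variable-length list of (time_s, energy_kev) photon events to a
    flattened n_energy_bins x n_time_bins histogram normalized to unit sum."""
    times = [t for t, _ in events]
    t0, t1 = min(times), max(times)
    span = (t1 - t0) or 1.0  # avoid division by zero if all photons share one time
    log_lo, log_hi = math.log10(e_min_kev), math.log10(e_max_kev)
    grid = [[0.0] * n_time_bins for _ in range(n_energy_bins)]
    kept = 0
    for t, e in events:
        if not (e_min_kev <= e <= e_max_kev):
            continue  # drop photons outside the assumed energy band
        # Time normalized to [0, 1] so short and long observations share one axis
        ti = min(int((t - t0) / span * n_time_bins), n_time_bins - 1)
        # Log-spaced energy bins
        ei = min(int((math.log10(e) - log_lo) / (log_hi - log_lo) * n_energy_bins),
                 n_energy_bins - 1)
        grid[ei][ti] += 1.0
        kept += 1
    if kept:
        grid = [[count / kept for count in row] for row in grid]
    # Flatten into an equal-length feature vector
    return [value for row in grid for value in row]


short_file = [(0.0, 1.2), (4.0, 2.5), (10.0, 6.0)]
long_file = [(float(i), 0.5 + (i % 60) * 0.1) for i in range(1000)]
v_short = event_file_to_et_map(short_file)
v_long = event_file_to_et_map(long_file)
print(len(v_short), len(v_long))  # 384 384
```

Each vector has the same length regardless of how many photons were recorded, so the vectors can be stacked into a matrix and passed to PCA or an autoencoder for feature extraction.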
Abstract. The recent launch of low Earth orbit satellite constellations poses a growing threat to astronomical observations with ground-based telescopes, which has alarmed the astronomical community. Observations affected by artificial satellites can become unusable for scientific research, wasting a growing fraction of the research budget on costly infrastructure and mitigation efforts. Here we report the first measurements of artificial satellite contamination in observations from low Earth orbit, made with the Hubble Space Telescope. With the help of volunteers on a citizen science project (www.asteroidhunter.org) and a deep learning algorithm, we scanned the archive of Hubble Space Telescope images taken between 2002 and 2021. We find that 2.7% of individual exposures, which have a typical exposure time of 11 minutes, are crossed by satellites, and that the fraction of images with satellite trails increases with time. This fraction depends on the size of the field of view, the exposure time, the filter used, and the pointing. With the growing number of artificial satellites currently planned, the fraction of Hubble Space Telescope images crossed by satellites will increase over the next decade and will require close study and monitoring.
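Once every exposure has been classified, the crossed fraction and its trend with time are simple bookkeeping. A minimal sketch, assuming each exposure is reduced to a hypothetical (year, crossed) record produced by the volunteer and deep-learning classification. The function names and the toy records are illustrative only; they are not the study's data or code.

```python
from collections import defaultdict


def crossed_fraction_by_year(exposures):
    """exposures: iterable of (year, crossed) pairs, where crossed is True if a
    satellite trail was found. Returns {year: fraction of exposures crossed}."""
    totals = defaultdict(int)
    crossed = defaultdict(int)
    for year, hit in exposures:
        totals[year] += 1
        crossed[year] += bool(hit)
    return {year: crossed[year] / totals[year] for year in sorted(totals)}


def linear_trend(fractions):
    """Least-squares slope of crossed fraction versus year (needs >= 2 years)."""
    years = list(fractions)
    mean_x = sum(years) / len(years)
    mean_y = sum(fractions.values()) / len(years)
    num = sum((x - mean_x) * (fractions[x] - mean_y) for x in years)
    den = sum((x - mean_x) ** 2 for x in years)
    return num / den


# Toy records, not real Hubble statistics
records = [(2002, False)] * 99 + [(2002, True)] + [(2021, False)] * 95 + [(2021, True)] * 5
fractions = crossed_fraction_by_year(records)
print(fractions)                    # {2002: 0.01, 2021: 0.05}
print(linear_trend(fractions) > 0)  # True
```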
Abstract. As tracers of the major volatile cycles of Mars, CO2, H2O, and dust clouds are important for understanding the circulation of the martian atmosphere and hence the martian climate. We present the spatial and seasonal distribution of laterally confined clouds in the middle atmosphere of Mars during one Mars Year, as identified in limb radiance measurements by the Mars Climate Sounder. Cloud identifications were made by citizen scientists through the “Cloudspotting on Mars” citizen science project (www.cloudspotting.org), hosted on the citizen science platform Zooniverse. We develop a method to aggregate the crowdsourced data using a novel clustering algorithm. We present the derived cloud catalog and discuss the seasonal and spatial distribution of clouds in terms of key populations.
Paper: https://www.sciencedirect.com/science/article/pii/S0019103523003548
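Aggregating crowdsourced marks means merging nearby marks from different volunteers into one consensus feature and discarding marks too few people agree on. The novel clustering algorithm from the paper is not described here, so the sketch below swaps in a generic friends-of-friends (single-linkage) grouping. It assumes each volunteer marks a cloud as a point (x, y) in a limb-profile image; the function name cluster_marks and the eps and min_volunteers thresholds are hypothetical.

```python
import math


def cluster_marks(marks, eps=1.0, min_volunteers=3):
    """Group volunteer marks (volunteer_id, x, y) into consensus features.

    Marks closer than `eps` are linked transitively (single linkage). A group is
    kept only if at least `min_volunteers` distinct volunteers contributed to it.
    Returns a list of (x_centroid, y_centroid, n_volunteers).
    """
    n = len(marks)
    labels = [-1] * n
    n_clusters = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = n_clusters
        stack = [start]
        while stack:  # flood-fill every mark reachable within eps
            j = stack.pop()
            _, xj, yj = marks[j]
            for k in range(n):
                if labels[k] == -1 and math.dist((xj, yj), marks[k][1:]) <= eps:
                    labels[k] = n_clusters
                    stack.append(k)
        n_clusters += 1

    consensus = []
    for c in range(n_clusters):
        members = [marks[i] for i in range(n) if labels[i] == c]
        volunteers = {m[0] for m in members}
        if len(volunteers) >= min_volunteers:
            cx = sum(m[1] for m in members) / len(members)
            cy = sum(m[2] for m in members) / len(members)
            consensus.append((cx, cy, len(volunteers)))
    return consensus


marks = [("a", 10.0, 50.0), ("b", 10.4, 50.2), ("c", 9.8, 49.9), ("a", 30.0, 60.0)]
print(cluster_marks(marks))  # one consensus cloud; the lone mark at (30, 60) is dropped
```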
Abstract. The groundbreaking detection of gravitational waves from the merger of two black holes, GW150914, confirmed a key prediction of Einstein’s general theory of relativity and led to the 2017 Nobel Prize in Physics. Soon thereafter, the gravitational wave (GW) detectors LIGO and Virgo detected a signal from the binary neutron star (BNS) merger GW170817, which marked the beginning of the multimessenger era in astronomy. GW170817 was observed across the electromagnetic (EM) spectrum, including gamma rays, X-rays, optical/infrared, and radio wavelengths. The joint detection of GWs and EM signals provides an unprecedented opportunity to study the physics behind these intriguing phenomena. Upon a GW detection, an automated alert notifies telescopes worldwide to search for EM counterparts, so the success of EM follow-up studies depends heavily on the latency of the response. Current GW search pipelines use a method called matched filtering, which provides state-of-the-art sensitivity but comes at a high computational cost that limits how quickly candidates can be reported. We present a low-latency deep learning search algorithm and test it on a simulated dataset.
Executive Summary
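Matched filtering, the baseline mentioned above, slides a template waveform along the detector data and looks for a peak in the correlation. The toy sketch below shows the idea under simplifying assumptions: white, unit-variance Gaussian noise, so no whitening by a noise power spectrum, and a single made-up chirp template. Production pipelines instead filter whitened data against large template banks, which is where much of the computational cost arises. The function name matched_filter_snr and all signal parameters are hypothetical.

```python
import math
import random


def matched_filter_snr(data, template):
    """Correlate `template` with `data` at every lag.

    Dividing by the template norm turns the correlation into a signal-to-noise
    ratio when the noise is white with unit variance (a simplifying assumption).
    """
    m = len(template)
    norm = math.sqrt(sum(h * h for h in template))
    return [
        sum(data[lag + i] * template[i] for i in range(m)) / norm
        for lag in range(len(data) - m + 1)
    ]


# Toy chirp template: frequency rises linearly from 0.02 to about 0.15 cycles/sample.
template = [math.sin(2 * math.pi * (0.02 * i + 0.001 * i * i)) for i in range(64)]

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]  # white, unit-variance noise
for i, h in enumerate(template):
    data[300 + i] += 5.0 * h  # inject a scaled copy of the template at sample 300

snr = matched_filter_snr(data, template)
best = max(range(len(snr)), key=snr.__getitem__)
print(best, round(snr[best], 1))  # the peak should sit at or next to lag 300
```

Every lag costs one full dot product with the template, and a real search repeats this for every template in the bank; that cost is what a trained network aims to replace with a single forward pass.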
Abstract. Galactic archaeology aims to reconstruct the structure and formation history of our Galaxy by discovering and studying the properties of stars within nearby, often disrupted, sub-galactic structures. It involves analysing the ages, chemical abundances, and motions of ancient, low-mass, very metal-poor stars and the remnants of stellar populations. Here, we present an RGB composite density map that reveals prominent halo substructures, including satellites such as globular clusters and dwarf galaxies, and stellar streams such as the Sagittarius stream. This Gaia Map of Streams and Satellites extends the original SDSS Field of Streams, which covered only part of the sky, to an all-sky map.
Code: https://github.com/StevenDillmann/galactic-archaeology-gaia
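An RGB composite assigns a separate stellar density map to each color channel, so that structures dominated by one subset of stars stand out in that color. The sketch below assumes three hypothetical star subsets (for example, stars in three distance or magnitude slices), given as Galactic (l, b) coordinates in degrees. The coarse 10-degree grid, the asinh stretch, and all function names are illustrative choices, not the method behind the Gaia map.

```python
import math


def density_map(points, lon_bins=36, lat_bins=18):
    """Count (l, b) positions in degrees on a grid with l in [0, 360) and b in [-90, 90]."""
    grid = [[0] * lon_bins for _ in range(lat_bins)]
    for l, b in points:
        i = min(int((b + 90.0) / 180.0 * lat_bins), lat_bins - 1)
        j = min(int((l % 360.0) / 360.0 * lon_bins), lon_bins - 1)
        grid[i][j] += 1
    return grid


def stretch(grid):
    """asinh stretch scaled to 0-255 so faint substructure stays visible."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    scale = math.asinh(peak)
    return [[round(255 * math.asinh(c) / scale) for c in row] for row in grid]


def rgb_composite(red_points, green_points, blue_points, **bins):
    """Stack three stretched density maps into rows of (R, G, B) pixels."""
    r, g, b = (stretch(density_map(p, **bins))
               for p in (red_points, green_points, blue_points))
    return [list(zip(rr, gg, bb)) for rr, gg, bb in zip(r, g, b)]


red_stars = [(15.0, 5.0)] * 4     # toy positions, not Gaia data
green_stars = [(205.0, -25.0)]
image = rgb_composite(red_stars, green_stars, [])
print(len(image), len(image[0]))  # 18 36
print(image[9][1], image[6][20])  # (255, 0, 0) (0, 255, 0)
```

The asinh stretch behaves linearly for sparse pixels and logarithmically for dense ones, so faint streams remain visible next to bright satellites.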
Abstract. A recent study on Cold Diffusion suggests that the performance of diffusion models is largely insensitive to the choice of degradation. The current understanding of these models, however, relies heavily on Langevin dynamics, variational inference, and Gaussian noise for training and sampling. Instead of Gaussian noise, we present experimental results using impulse (salt-and-pepper) noise as the degradation applied to the MNIST dataset; this introduces sparse, discrete disturbances by setting randomly chosen pixels of an image to white or black.
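In the Cold Diffusion framing, training needs a degradation operator D(x, t) whose severity grows with the step t. The sketch below is one way to write a salt-and-pepper version. It assumes pixel intensities scaled to [0, 1], a linear schedule in which a fraction t / num_steps of pixels is corrupted, and a fixed random seed so that the corrupted pixels at step t are a subset of those at every later step. The schedule and the function name salt_and_pepper are assumptions, not necessarily the setup used in the experiments.

```python
import random


def salt_and_pepper(image, t, num_steps, seed=0):
    """Degradation D(x, t): set a fraction t / num_steps of pixels to white (1.0)
    or black (0.0).

    Both random draws are made for every pixel regardless of t, so a given seed
    yields nested corruption: pixels hit at step t stay hit, with the same value,
    at all later steps.
    """
    rng = random.Random(seed)
    level = t / num_steps
    degraded = []
    for row in image:
        new_row = []
        for value in row:
            u = rng.random()           # decides whether this pixel is corrupted
            salt = rng.random() < 0.5  # decides white (salt) versus black (pepper)
            if u < level:
                new_row.append(1.0 if salt else 0.0)
            else:
                new_row.append(value)
        degraded.append(new_row)
    return degraded


image = [[0.5] * 28 for _ in range(28)]  # placeholder 28x28 image, not real MNIST data
half = salt_and_pepper(image, t=5, num_steps=10)
full = salt_and_pepper(image, t=10, num_steps=10)
```

At t = 0 the image is unchanged, and at t = num_steps every pixel is pure white or black, the fully degraded endpoint from which a Cold Diffusion sampler would start restoring.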
Abstract. We present the development and functionality of a Python-based Sudoku solver. The solver accepts a 9×9 Sudoku puzzle in a text file, where zeros represent unknown values and the characters ‘|’, ‘+’, and ‘-’ serve as separators. It prints the solved Sudoku and the solving time to the terminal, with an option to save the solution to a file. It can be run from the command line with:
$ python src/solve_sudoku.py input.txt [solver] [save_file]
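A minimal sketch of reading that input format, assuming rows of digits in which ‘|’ separates the 3×3 boxes horizontally and lines such as ---+---+--- separate them vertically. The function name parse_sudoku is hypothetical and is not necessarily what src/solve_sudoku.py uses; the sample grid only illustrates the file layout.

```python
def parse_sudoku(text):
    """Parse a 9x9 grid where digits are cells (0 = unknown) and the characters
    '|', '+' and '-' only draw separators."""
    grid = []
    for line in text.splitlines():
        digits = [int(ch) for ch in line if ch in "0123456789"]
        if not digits:
            continue  # pure separator line such as ---+---+---
        if len(digits) != 9:
            raise ValueError(f"expected 9 cells per row, got {len(digits)}: {line!r}")
        grid.append(digits)
    if len(grid) != 9:
        raise ValueError(f"expected 9 rows, got {len(grid)}")
    return grid


SAMPLE = """\
000|007|000
000|009|504
000|050|169
---+---+---
080|000|305
075|000|290
406|000|080
---+---+---
762|080|000
103|900|000
000|600|000
"""
grid = parse_sudoku(SAMPLE)
print(grid[0])  # [0, 0, 0, 0, 0, 7, 0, 0, 0]
```

Skipping every non-digit character means the parser does not depend on exactly where the separators appear, only on each row holding nine digits.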
The solver offers several algorithms, including backtracking, constraint satisfaction (backtracking combined with constraint-based elimination of candidates), and linear programming, so users can choose the strategy best suited to a puzzle's difficulty.
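The backtracking strategy can be sketched as a depth-first search: fill an empty cell with a digit allowed by its row, column, and box, recurse, and undo the choice if the search gets stuck. The sketch below always picks the empty cell with the fewest remaining candidates, a cheap form of constraint-based elimination. It is an illustration, not the repository's implementation; the names candidates and solve are hypothetical.

```python
def candidates(grid, r, c):
    """Digits not already used in row r, column c, or the 3x3 box containing (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(grid):
    """Fill `grid` in place by backtracking; return True if a solution exists."""
    best = None
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                cands = candidates(grid, r, c)
                if best is None or len(cands) < len(best[2]):
                    best = (r, c, cands)  # most constrained empty cell so far
    if best is None:
        return True  # no empty cells left: solved
    r, c, cands = best
    for d in cands:
        grid[r][c] = d
        if solve(grid):
            return True
    grid[r][c] = 0  # every candidate failed: undo and backtrack
    return False


puzzle = [[int(ch) for ch in row] for row in (
    "530070000", "600195000", "098000060",
    "800060003", "400803001", "700020006",
    "060000280", "000419005", "000080079",
)]
print(solve(puzzle))  # True
for row in puzzle:
    print("".join(map(str, row)))
```

A cell with no candidates makes the loop body run zero times, so the search backtracks immediately instead of exploring a dead branch.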
TBD
TBD