Extended human presence beyond low-Earth orbit (BLEO) during missions to the Moon and Mars will pose significant challenges in the near future. A primary health risk associated with these missions is radiation exposure, primarily from galactic cosmic rays (GCRs) and solar proton events (SPEs). While GCRs present a more consistent, albeit modulated, threat, SPEs are harder to predict and can deliver acute doses over short periods. Currently, NASA utilizes analytical tools for monitoring the space radiation environment in order to make decisions on immediate action to shelter astronauts. However, this reactive approach could be significantly enhanced by predictive models that can forecast radiation exposure in advance, ideally hours ahead of major events, while providing estimates of prediction uncertainty to improve decision-making. In this work we present a machine learning approach for forecasting radiation exposure in BLEO using multimodal time-series data including direct solar imagery from the Solar Dynamics Observatory, X-ray flux measurements from GOES missions, and radiation dose measurements from the BioSentinel satellite that was launched as part of the Artemis 1 mission. To our knowledge, this is the first time full-disk solar imagery has been used to forecast radiation exposure. We demonstrate that our model can predict the onset of increased radiation due to an SPE, as well as the radiation decay profile after an event has occurred.
@article{gurav-2024-radiation, title = {Probabilistic Forecasting of Radiation Exposure for Spaceflight}, author = {Gurav, Rutuja and Massara, Elena and Song, Xiaomei and Sinclair, Kimberly and Brown, Edward and Kusner, Matt and Poduval, Bala and Baydin, {Atılım Güneş}}, journal = {arXiv preprint arXiv:2411.17703}, year = {2024} }
SDO-FM is a foundation model using data from NASA’s Solar Dynamics Observatory (SDO) spacecraft, integrating three separate instruments to encapsulate the Sun’s complex physical interactions into a multi-modal embedding space. This model can be used to streamline scientific investigations involving SDO by making the enormous datasets more computationally accessible for heliophysics research and to enable investigations that require instrument fusion. We discuss four key components: an ingestion pipeline to create machine learning ready datasets, the model architecture and training approach, resultant embeddings and fine-tunable models, and finally downstream fine-tuned applications. A key component of this effort has been to include subject matter specialists at each stage of development, reviewing the scientific value and providing guidance for model architecture, dataset, and training paradigm decisions. This paper marks the release of our pretrained models and embedding datasets, available to the community on Hugging Face and sdofm.org.
@article{walsh-2024-foundation, title = {A Foundation Model for the {Solar Dynamics Observatory}}, author = {Walsh, James and Gass, Daniel G. and Pollan, Raul Ramos and Wright, Paul J. and Galvez, Richard and Kasmanoff, Noah and Naradowsky, Jason and Spalding, Anne and Parr, James and Baydin, {Atılım Güneş}}, journal = {arXiv preprint arXiv:2410.02530}, year = {2024} }
This paper introduces a second-order hyperplane search, a novel optimization step that generalizes a second-order line search from a line to a k-dimensional hyperplane. This, combined with the forward-mode stochastic gradient method, yields a second-order optimization algorithm that consists of forward passes only, completely avoiding the storage overhead of backpropagation. Unlike recent work that relies on directional derivatives (or Jacobian–Vector Products, JVPs), we use hyper-dual numbers to jointly evaluate both directional derivatives and their second-order quadratic terms. As a result, we introduce forward-mode weight perturbation with Hessian information (FoMoH). We then use FoMoH to develop a novel generalization of line search by extending it to a hyperplane search. We illustrate the utility of this extension and how it might be used to overcome some of the recent challenges of optimizing machine learning models without backpropagation. Our code is open-sourced at https://github.com/SRI-CSL/fomoh
@article{cobb-2024-forwardmode, title = {Second-Order Forward-Mode Automatic Differentiation for Optimization}, author = {Cobb, Adam D. and Baydin, {Atılım Güneş} and Pearlmutter, Barak A. and Jha, Susmit}, journal = {arXiv preprint arXiv:2408.10419}, year = {2024} }
Cloud computing offers an opportunity to run compute-resource intensive climate models at scale by parallelising model runs such that datasets useful to the exoplanet community can be produced efficiently. To better understand the statistical distributions and properties of potentially habitable planetary atmospheres we implemented a parallelised climate modelling tool to scan a range of hypothetical atmospheres. Starting with a modern day Earth atmosphere, we iteratively and incrementally simulated a range of atmospheres to infer the landscape of the multi-parameter space, such as the abundances of biologically mediated gases (O2, CO2, H2O, CH4, H2, and N2) that would yield "steady state" planetary atmospheres on Earth-like planets around solar-type stars. Our current dataset comprises 124,314 simulated models of exoplanet atmospheres and is available publicly on the NASA Exoplanet Archive. Our scalable approach of analysing atmospheres could also help interpret future observations of planetary atmospheres by providing estimates of atmospheric gas fluxes and temperatures as a function of altitude. Such data could enable high-throughput first-order assessment of the potential habitability of exoplanetary surfaces and can be a learning dataset for machine learning applications in the atmospheric and exoplanet science domain.
@article{chopra-2023-planetary, title = {{PyATMOS}: A Scalable Grid of Hypothetical Planetary Atmospheres}, author = {Chopra, Aditya and Bell, Aaron C. and Fawcett, William and Talebi, Rodd and Angerhausen, Daniel and Baydin, {Atılım Güneş} and Berea, Anamaria and Cabrol, Nathalie A. and Kempes, Christopher and Mascaro, Massimo}, journal = {arXiv preprint arXiv:2308.10624}, year = {2023} }
Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic differentiation algorithms that also includes the forward mode. We present a method to compute gradients based solely on the directional derivative that one can compute exactly and efficiently via the forward mode. We call this formulation the forward gradient, an unbiased estimate of the gradient that can be evaluated in a single forward run of the function, entirely eliminating the need for backpropagation in gradient descent. We demonstrate forward gradient descent in a range of problems, showing substantial savings in computation and enabling training up to twice as fast in some cases.
@article{baydin-2022-gradients, title = {Gradients without Backpropagation}, author = {Baydin, {Atılım Güneş} and Pearlmutter, Barak A. and Syme, Don and Wood, Frank and Torr, Philip}, journal = {arXiv preprint arXiv:2202.08587}, year = {2022} }
The original “Seven Motifs” set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the “Nine Motifs of Simulation Intelligence”, a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offer immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings. We elaborate on each layer of the SI-stack, detailing the state-of-the-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science.
@article{lavin-2021-simulation, title = {Simulation Intelligence: Towards a New Generation of Scientific Methods}, author = {Lavin, Alexander and Zenil, Hector and Paige, Brooks and Krakauer, David and Gottschlich, Justin and Mattson, Tim and Anandkumar, Anima and Choudry, Sanjay and Rocki, Kamil and Baydin, {Atılım Güneş} and Prunkl, Carina and Isayev, Olexandr and Peterson, Erik and McMahon, Peter L. and Macke, Jakob and Cranmer, Kyle and Zhang, Jiaxin and Wainwright, Haruko and Hanuka, Adi and Veloso, Manuela and Assefa, Samuel and Zheng, Stephan and Pfeffer, Avi}, journal = {arXiv preprint arXiv:2112.03235}, year = {2021} }
We introduce a recent symplectic integration scheme derived for solving physically motivated systems with non-separable Hamiltonians. We show its relevance to Riemannian manifold Hamiltonian Monte Carlo (RMHMC) and provide an alternative to the currently used generalised leapfrog symplectic integrator, which relies on solving multiple fixed point iterations to convergence. Via this approach, we are able to reduce the number of higher-order derivative calculations per leapfrog step. We explore the implications of this integrator and demonstrate its efficacy in reducing the computational burden of RMHMC. Our code is provided in a new open-source Python package, hamiltorch.
@article{cobb-2019-symplectic, title = {Introducing an Explicit Symplectic Integration Scheme for Riemannian Manifold Hamiltonian Monte Carlo}, author = {Cobb, Adam D. and Baydin, Atılım Güneş and Markham, Andrew and Roberts, Stephen J.}, journal = {arXiv preprint arXiv:1910.06243}, year = {2019} }
It is well known that deep generative models have a rich latent space, and that it is possible to smoothly manipulate their outputs by traversing this latent space. Recently, architectures have emerged that allow for more complex manipulations, such as making an image look as though it were from a different class, or painted in a certain style. These methods typically require large amounts of training in order to learn a single class of manipulations. We present Transflow Learning, a method for transforming a pre-trained generative model so that its outputs more closely resemble data that we provide afterwards. In contrast to previous methods, Transflow Learning does not require any training at all, and instead warps the probability distribution from which we sample latent vectors using Bayesian inference. Transflow Learning can be used to solve a wide variety of tasks, such as neural style transfer and few-shot classification.
@article{gambardella-2019-transflow, title = {Transflow Learning: Repurposing Flow Models Without Retraining}, author = {Gambardella, Andrew and Baydin, Atılım Güneş and Torr, Philip H. S.}, journal = {arXiv preprint arXiv:1911.13270}, year = {2019} }
Atmospheric retrieval is a modelling technique used to determine a planetary atmosphere’s temperature and composition from spectral data. The retrieved atmospheric composition can provide insight into the surface fluxes of gaseous species necessary to maintain the stability of that atmosphere, and in turn into the geological as well as biological processes active on the planet. Among exoplanets, rocky, terrestrial ones are of particular interest because of their theoretical habitability. Atmospheric retrieval is both time-consuming and compute-intensive. Traditional retrieval methods involve the use of complex algorithms that generate numerous atmospheric models. These models are then compared to observational data, and a posterior distribution is constructed to determine the most likely value and associated uncertainty for each model parameter. Runtimes scale with the number of model parameters, and when many molecular species are considered, become prohibitively long. The issue will become especially acute as the number of detected exoplanets grows tremendously in the near future. Machine learning (ML) offers a way to reduce the time to perform a retrieval by orders of magnitude, given a sufficient dataset to train with. Here we present a large dataset of 3,112,620 synthetic planetary systems generated with our Intelligent exoplaNet Atmospheric RetrievAl (INARA) framework based on the NASA Planetary Spectrum Generator. The dataset contains the parameters defining each planetary system and the simulated spectra of stellar, planetary and noise components. The dataset was designed to enable the first ML retrieval model for rocky, terrestrial exoplanets, and it is publicly available through the NASA Exoplanet Archive.
@article{zorzan-2025-dataset, title = {A Machine-Learning-Ready Dataset for Exoplanet Atmospheric Retrieval}, author = {Zorzan, Simone and Soboczenski, Frank and O'Beirne, Molly D. and Himes, Michael D. and Lund, Michael B. and {van Eyken}, Julian C. and Arney, Giada N. and Villanueva, Geronimo L. and Mascaro, Massimo and {Domagal-Goldman}, Shawn D. and Baydin, {Atılım Güneş}}, journal = {The Astrophysical Journal Supplement Series}, year = {2025 (accepted)}, publisher = {American Astronomical Society} }
The simplified general perturbations 4 (SGP4) orbital propagation model is one of the most widely used methods for rapidly and reliably predicting the positions and velocities of objects orbiting Earth. Over time, SGP models have undergone refinement to enhance their efficiency and accuracy. Nevertheless, they still do not match the precision offered by high-precision numerical propagators, which can predict the positions and velocities of space objects in low-Earth orbit with significantly smaller errors. In this study, we introduce a novel differentiable version of SGP4, named ∂SGP4. By porting the source code of SGP4 into a differentiable program based on PyTorch, we unlock a whole new class of techniques enabled by differentiable orbit propagation, including spacecraft orbit determination, state conversion, covariance similarity transformation, state transition matrix computation, and covariance propagation. Besides differentiability, our ∂SGP4 supports parallel propagation of a batch of two-line elements (TLEs) in a single execution and it can harness modern hardware accelerators like GPUs or XLA devices (e.g. TPUs) thanks to running on the PyTorch backend. Furthermore, the design of ∂SGP4 makes it possible to use it as a differentiable component in larger machine learning (ML) pipelines, where the propagator can be an element of a larger neural network that is trained or fine-tuned with data. Consequently, we propose a novel orbital propagation paradigm, ML-∂SGP4. In this paradigm, the orbital propagator is enhanced with neural networks attached to its input and output. Through gradient-based optimization, the parameters of this combined model can be iteratively refined to achieve precision surpassing that of SGP4. Fundamentally, the neural networks function as identity operators when the propagator adheres to its default behavior as defined by SGP4. However, owing to the differentiability ingrained within ∂SGP4, the model can be fine-tuned with ephemeris data to learn corrections to both inputs and outputs of SGP4. This augmentation enhances precision while maintaining the same computational speed of SGP4 at inference time. This paradigm empowers satellite operators and researchers, equipping them with the ability to train the model using their specific ephemeris or high-precision numerical propagation data.
@article{acciarini-2024-closing, author = {Acciarini, Giacomo and Baydin, {Atılım Güneş} and Izzo, Dario}, title = {Closing the gap between {SGP4} and high-precision propagation via differentiable programming}, journal = {Acta Astronautica}, year = {2024}, issn = {0094-5765}, doi = {10.1016/j.actaastro.2024.10.063}, url = {https://www.sciencedirect.com/science/article/pii/S0094576524006374} }
Preparation requires technical research and development, as well as adaptive, proactive governance. Artificial intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI’s impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI (1), there is a lack of consensus about how to manage them. Society’s response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness and barely address autonomous systems. Drawing on lessons learned from other safety-critical technologies, we outline a comprehensive plan that combines technical research and development (R&D) with proactive, adaptive governance mechanisms for a more commensurate preparation.
@article{bengio-2024-risks, author = {Bengio, Yoshua and Hinton, Geoffrey and Yao, Andrew and Song, Dawn and Abbeel, Pieter and Darrell, Trevor and Harari, Yuval Noah and Zhang, Ya-Qin and Xue, Lan and Shalev-Shwartz, Shai and Hadfield, Gillian and Clune, Jeff and Maharaj, Tegan and Hutter, Frank and Baydin, {Atılım Güneş} and McIlraith, Sheila and Gao, Qiqi and Acharya, Ashwin and Krueger, David and Dragan, Anca and Torr, Philip and Russell, Stuart and Kahneman, Daniel and Brauner, Jan and Mindermann, Sören}, title = {Managing extreme {AI} risks amid rapid progress}, journal = {Science}, volume = {384}, number = {6698}, pages = {842-845}, year = {2024}, doi = {10.1126/science.adn0117}, url = {https://www.science.org/doi/abs/10.1126/science.adn0117}, eprint = {https://www.science.org/doi/pdf/10.1126/science.adn0117} }
Thermospheric density is one of the main sources of uncertainty in the estimation of satellites’ position and velocity in low-Earth orbit. This has negative consequences in several space domains, including space traffic management, collision avoidance, re-entry predictions, orbital lifetime analysis, and space object cataloging. In this paper, we investigate the prediction accuracy of empirical density models (e.g., NRLMSISE-00 and JB-08) against black-box machine learning (ML) models trained on precise orbit determination-derived thermospheric density data (from CHAMP, GOCE, GRACE, SWARM-A/B satellites). We show that by using the same inputs, the ML models we designed are capable of consistently improving the predictions with respect to state-of-the-art empirical models by reducing the mean absolute percentage error (MAPE) in the thermospheric density estimation from the range of 40%–60% to approximately 20%. As a result of this work, we introduce Karman: an open-source Python software package developed during this study. Karman provides functionalities to ingest and preprocess thermospheric density, solar irradiance, and geomagnetic input data for ML readiness. Additionally, it facilitates developing and training ML models on the aforementioned data and benchmarking their performance at different altitudes, geographic locations, times, and solar activity conditions. Through this contribution, we offer the scientific community a comprehensive tool for comparing and enhancing thermospheric density models using ML techniques.
@article{acciarini-2024-thermospheric, title = {Improving Thermospheric Density Predictions in Low-Earth Orbit with Machine Learning}, author = {Acciarini, Giacomo and Brown, Edward and Berger, Thomas and Guhathakurta, Madhulika and Parr, James and Bridges, Christopher and Baydin, {Atılım Güneş}}, journal = {Space Weather}, year = {2024}, volume = {22}, number = {2}, pages = {e2023SW003652}, url = {https://doi.org/10.1029/2023SW003652}, doi = {10.1029/2023SW003652}, publisher = {American Geophysical Union} }
Superresolution (SR) aims to increase the resolution of images by recovering detail. Compared to standard interpolation, deep learning-based approaches learn features and their relationships to leverage prior knowledge of what low-resolution patterns look like in higher resolution. Deep neural networks can also perform image cross-calibration by learning the systematic properties of the target images. While SR for natural images aims to create perceptually convincing results, SR of scientific data requires careful quantitative evaluation. In this work, we demonstrate that deep learning can increase the resolution and calibrate solar imagers belonging to different instrumental generations. We convert solar magnetic field images taken by the Michelson Doppler Imager and the Global Oscillation Network Group to the characteristics of the Helioseismic and Magnetic Imager. We also establish a set of performance measurements to benchmark deep-learning-based SR and calibration for scientific applications.
@article{munozjaramillo-2024-super, title = {Physically Motivated Deep Learning to Superresolve and Cross Calibrate Solar Magnetograms}, author = {{Muñoz-Jaramillo}, Andrés and Jungbluth, Anna and Gitiaux, Xavier and Wright, Paul J. and Shneider, Carl and Maloney, Shane A. and Baydin, {Atılım Güneş} and Gal, Yarin and Deudon, Michel and Kalaitzis, Freddie}, journal = {The Astrophysical Journal Supplement Series}, year = {2024}, volume = {271}, number = {2}, url = {http://dx.doi.org/10.3847/1538-4365/ad12c2}, doi = {10.3847/1538-4365/ad12c2}, publisher = {American Astronomical Society} }
The high-energy particles originating from the Sun, known as solar energetic particles (SEPs), contribute significantly to the space radiation environment, posing serious threats to astronauts and scientific instruments on board spacecraft, and form a major topic of space weather studies. The mechanism that accelerates the SEPs to the observed energy ranges, their transport in the inner heliosphere, and the influence of the suprathermal seed particle spectrum are open questions in heliophysics. Accurate predictions of the occurrences of SEP events well in advance are necessary to mitigate their adverse effects, but predictions based on first-principles models still remain a challenge. In this scenario, adopting a machine learning approach to SEP modeling and prediction is desirable. However, the lack of a balanced database of SEP events restrains this approach. We addressed this limitation by generating large datasets of synthetic SEP events sampled from the physics-based model, Energetic Particle Radiation Environment Module. Using this data, we developed neural network-based surrogate models to study the seed population parameter space. Our models run thousands to millions of times faster (depending on computer hardware), making simulation-based inference workflows practicable in SEP studies while providing predictive uncertainty estimates using a deep ensemble approach.
@article{baydin-2023-solar, title = {A Surrogate Model For Studying Solar Energetic Particle Transport and the Seed Population}, author = {Baydin, {Atılım Güneş} and Poduval, Bala and Schwadron, Nathan A.}, journal = {Space Weather}, year = {2023}, volume = {21}, number = {12}, pages = {e2023SW003593}, url = {http://dx.doi.org/10.1029/2023SW003593}, doi = {10.1029/2023SW003593}, publisher = {American Geophysical Union} }
The full optimization of the design and operation of instruments whose functioning relies on the interaction of radiation with matter is a super-human task, due to the large dimensionality of the space of possible choices for geometry, detection technology, materials, data-acquisition, and information-extraction techniques, and the interdependence of the related parameters. On the other hand, massive potential gains in performance over standard, “experience-driven” layouts are in principle within our reach if an objective function fully aligned with the final goals of the instrument is maximized through a systematic search of the configuration space. The stochastic nature of the involved quantum processes makes the modeling of these systems an intractable problem from a classical statistics point of view, yet the construction of a fully differentiable pipeline and the use of deep learning techniques may allow the simultaneous optimization of all design parameters. In this white paper, we lay down our plans for the design of a modular and versatile modeling tool for the end-to-end optimization of complex instruments for particle physics experiments as well as industrial and medical applications that share the detection of radiation as their basic ingredient. We consider a selected set of use cases to highlight the specific needs of different applications.
@article{dorigo-2023-differentiable, title = {Toward the End-to-End Optimization of Particle Physics Instruments with Differentiable Programming}, author = {Dorigo, Tommaso and Giammanco, Andrea and Vischia, Pietro and Aehle, Max and Bawaj, Mateusz and Boldyrev, Alexey and {de Castro Manzano}, Pablo and Derkach, Denis and Donini, Julien and Edelen, Auralee and Fanzago, Federica and Gauger, Nicolas R. and Glaser, Christian and Baydin, {Atılım Güneş} and Heinrich, Lukas and Keidel, Ralf and Kieseler, Jan and Krause, Claudius and Lagrange, Maxime and Lamparth, Max and Layer, Lukas and Maier, Gernot and Nardi, Federico and Pettersen, Helge E.S. and Ramos, Alberto and Ratnikov, Fedor and Röhrich, Dieter and {de Austri}, Roberto Ruiz and {del Árbol}, Pablo Martínez Ruiz and Savchenko, Oleg and Simpson, Nathan and Strong, Giles C. and Taliercio, Angela and Tosi, Mia and Ustyuzhanin, Andrey and Zaraket, Haitham}, journal = {Reviews in Physics}, url = {https://doi.org/10.1016/j.revip.2023.100085}, doi = {10.1016/j.revip.2023.100085}, year = {2023}, pages = {100085}, issn = {2405-4283} }
Characterizing exoplanetary atmospheres via Bayesian retrievals requires assuming some chemistry model, such as thermochemical equilibrium or parameterized abundances. The higher-resolution data offered by upcoming telescopes enable more complex chemistry models within retrieval frameworks. Yet many chemistry codes that model more complex processes like photochemistry and vertical transport are computationally expensive, and directly incorporating them into a 1D retrieval model can result in prohibitively long execution times. Additionally, phase-curve observations with upcoming telescopes motivate 2D and 3D retrieval models, further exacerbating the lengthy runtime for retrieval frameworks with complex chemistry models. Here we compare thermochemical equilibrium approximation methods based on their speed and accuracy with respect to a Gibbs energy-minimization code. We find that, while all methods offer orders-of-magnitude reductions in computational cost, neural network surrogate models perform more accurately than the other approaches considered, achieving a median absolute dex error of <0.03 for the phase space considered. While our results are based on a 1D chemistry model, our study suggests that higher-dimensional chemistry models could be incorporated into retrieval models via this surrogate modeling approach.
@article{himes-2023-3dretrieval, title = {Toward 3D Retrieval of Exoplanet Atmospheres: Assessing Thermochemical Equilibrium Estimation Methods}, author = {Himes, Michael D. and Harrington, Joseph and Baydin, Atılım Güneş}, journal = {The Planetary Science Journal}, publisher = {American Astronomical Society}, url = {https://doi.org/10.3847/PSJ/acc939}, doi = {10.3847/PSJ/acc939}, volume = {4}, number = {74}, year = {2023} }
The development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. Lack of diligence can lead to technical debt, scope creep and misaligned objectives, model misuse and failures, and expensive consequences. Engineering systems, on the other hand, follow well-defined processes and testing standards to streamline development for high-quality, reliable results. The extreme is spacecraft systems, with mission critical measures and robustness throughout the process. Drawing on experience in both spacecraft engineering and machine learning (research through product across domain areas), we’ve developed a proven systems engineering approach for machine learning and artificial intelligence: the Machine Learning Technology Readiness Levels framework defines a principled process to ensure robust, reliable, and responsible systems while being streamlined for machine learning workflows, including key distinctions from traditional software engineering, and a lingua franca for people across teams and organizations to work collaboratively on machine learning and artificial intelligence technologies. Here we describe the framework and elucidate with use-cases from physics research to computer vision apps to medical diagnostics.
@article{lavin-2022-technology, title = {Technology Readiness Levels for Machine Learning Systems}, author = {Lavin, Alexander and Gilligan-Lee, Ciaran M. and Visnjic, Alessya and Ganju, Siddha and Newman, Dava and Ganguly, Sujoy and Lange, Danny and Baydin, Atılım Güneş and Sharma, Amit and Gibson, Adam and Zheng, Stephan and Gal, Yarin and Xing, Eric P. and Mattmann, Chris and Parr, James}, journal = {Nature Communications}, publisher = {Nature Publishing Group}, year = {2022}, volume = {13}, number = {6039}, url = {https://doi.org/10.1038/s41467-022-33128-9}, doi = {10.1038/s41467-022-33128-9} }
The Solar Dynamics Observatory (SDO), a NASA multi-spectral decade-long mission that has been producing terabytes of observational data from the Sun daily, has recently been used as a use-case to demonstrate the potential of machine learning methodologies and to pave the way for future deep-space mission planning. In particular, the idea of using image-to-image translation to virtually produce extreme ultra-violet channels has been proposed in several recent studies, as a way to both enhance missions with fewer available channels and to alleviate the challenges due to the low downlink rate in deep space. This paper investigates the potential and the limitations of such a deep learning approach by focusing on the permutation of four channels and an encoder–decoder based architecture, with particular attention to how morphological traits and brightness of the solar surface affect the neural network predictions. In this work we want to answer the question: can synthetic images of the solar corona produced via image-to-image translation be used for scientific studies of the Sun? The analysis highlights that the neural network produces high-quality images over three orders of magnitude in count rate (pixel intensity) and can generally reproduce the covariance across channels within a 1% error. However, the model performance drastically diminishes for extremely high-energy events like flares, and we argue that the reason is related to the rareness of such events posing a challenge to model training.
@article{salvatelli-2022-synthetic, title = {Exploring the Limits of Synthetic Creation of Solar {EUV} Images via Image-to-Image Translation}, author = {Salvatelli, Valentina and {Guedes dos Santos}, Luiz Fernando and Bose, Souvik and Neuberg, Brad and Cheung, Mark and Janvier, Miho and Jin, Meng and Gal, Yarin and Baydin, Atılım Güneş}, journal = {The Astrophysical Journal}, year = {2022}, volume = {937}, number = {2}, url = {https://doi.org/10.3847/1538-4357/ac867b}, doi = {10.3847/1538-4357/ac867b} }
Atmospheric retrieval determines the properties of an atmosphere based on its measured spectrum. The low signal-to-noise ratios of exoplanet observations require a Bayesian approach to determine posterior probability distributions of each model parameter, given observed spectra. This inference is computationally expensive, as it requires many executions of a costly radiative transfer (RT) simulation for each set of sampled model parameters. Machine learning (ML) has recently been shown to provide a significant reduction in runtime for retrievals, mainly by training inverse ML models that predict parameter distributions, given observed spectra, albeit with reduced posterior accuracy. Here we present a novel approach to retrieval by training a forward ML surrogate model that predicts spectra given model parameters, providing a fast approximate RT simulation that can be used in a conventional Bayesian retrieval framework without significant loss of accuracy. We demonstrate our method on the emission spectrum of HD 189733 b and find good agreement with a traditional retrieval from the Bayesian Atmospheric Radiative Transfer (BART) code (Bhattacharyya coefficients of 0.9843 – 0.9972, with a mean of 0.9925, between 1D marginalized posteriors). This accuracy comes while still offering significant speed enhancements over traditional RT, albeit not as much as ML methods with lower posterior accuracy. Our method is 9× faster per parallel chain than BART when run on an AMD EPYC 7402P central processing unit (CPU). Neural-network computation using an NVIDIA Titan Xp graphics processing unit is 90× – 180× faster per chain than BART on that CPU.
@article{himes-2022-margehomer, title = {Accurate Machine-learning Atmospheric Retrieval via a Neural-network Surrogate Model for Radiative Transfer}, author = {Himes, Michael D. and Harrington, Joseph and Cobb, Adam D. and Baydin, {Atılım Güneş} and Soboczenski, Frank and {O'Beirne}, Molly D. and Zorzan, Simone and Wright, David C. and Scheffer, Zacchaeus and {Domagal-Goldman}, Shawn D. and Arney, Giada N.}, journal = {The Planetary Science Journal}, publisher = {American Astronomical Society}, year = {2022}, volume = {3}, number = {4}, pages = {236--250}, url = {https://doi.org/10.3847/PSJ/abe3fd}, doi = {10.3847/PSJ/abe3fd} }
We develop a machine learning approach to detect and discriminate elephants from other species, and to recognise important behaviours such as running and rumbling, based only on seismic data generated by the animals. We demonstrate our approach using data acquired in the Kenyan savanna, consisting of 8000 hours of seismic recordings and 250k camera trap pictures. Our classifiers, different convolutional neural networks trained on seismograms and spectrograms, achieved 80–90% balanced accuracy in detecting elephants up to 100 meters away, and over 90% balanced accuracy in recognising running and rumbling behaviours from the seismic data. We release the dataset used in this study: SeisSavanna represents a unique collection of seismic signals with the associated wildlife species and behaviour. Our results suggest that seismic data offer substantial benefits for monitoring wildlife, and we propose to further develop our methods using dense arrays that could result in a seismic shift for wildlife monitoring.
@article{szenicer-2021-seismic, title = {Seismic savanna: Machine learning for classifying wildlife and behaviours using ground-based vibration field recordings}, author = {Szenicer, Alexandre and Reinwald, Michael and Moseley, Ben and {Nissen-Meyer}, Tarje and Muteti, Zacharia Mutinda and Oduor, Sandy and {McDermott-Roberts}, Alex and Baydin, Atılım Güneş and Mortimer, Beth}, journal = {Remote Sensing in Ecology and Conservation}, publisher = {John Wiley \& Sons and Zoological Society of London}, year = {2021}, volume = {8}, number = {2}, pages = {236--250}, url = {https://doi.org/10.1002/rse2.242}, doi = {10.1002/rse2.242} }
Spaceborne Earth observation is a key technology for flood response, offering valuable information to decision makers on the ground. Very large constellations of small nanosatellites (“CubeSats”) are a promising solution to reduce revisit time in disaster areas from days to hours. However, data transmission to ground receivers is limited by constraints on the power and bandwidth of CubeSats. Onboard processing offers a solution to decrease the amount of data to transmit by reducing large sensor images to smaller data products. ESA’s recent PhiSat-1 mission aims to facilitate the demonstration of this concept, providing the hardware capability to perform onboard processing by including a power-constrained machine learning accelerator and the software to run custom applications. This work demonstrates a flood segmentation algorithm that produces flood masks to be transmitted instead of the raw images, while running efficiently on the accelerator aboard PhiSat-1. Our models are trained on WorldFloods: a newly compiled dataset of 119 globally verified flooding events from disaster response organizations, which we make available in a common format. We test the system on independent locations, demonstrating that it produces fast and accurate segmentation masks on the hardware accelerator, acting as a proof of concept for this approach.
@article{mateogarcia-2021-global, title = {Towards Global Flood Mapping Onboard Low Cost Satellites with Machine Learning}, author = {Mateo-Garcia, Gonzalo and Veitch-Michaelis, Joshua and Smith, Lewis and Oprea, Silviu and Schumann, Guy and Gal, Yarin and Baydin, Atılım Güneş and Backes, Dietmar}, journal = {Scientific Reports}, publisher = {Nature Publishing Group}, year = {2021}, volume = {11}, number = {7249}, doi = {10.1038/s41598-021-86650-z}, url = {https://doi.org/10.1038/s41598-021-86650-z} }
The design of instruments that rely on the interaction of radiation with matter for their operation is a quite complex task if our goal is to achieve near optimality on some well-defined utility function U, such as the expected precision of a set of planned measurements achievable with a given amount of collected data. This complexity stems from the interplay between physical processes that are intrinsically stochastic in nature—the quantum phenomena that take place at the subnuclear level—and the vast space of possible choices for the physical characteristics of the instrument and its detection elements, as defined in its design phase. The precision of pattern recognition of detected signals and the power of information-extraction procedures that directly affect the value of U both depend on these characteristics. In the majority of realistic cases, U may be represented as a combination of performance and cost considerations that should be balanced within reasonable limitations. Neural networks are naturally suitable for the task mentioned above. They can also be effectively used as surrogates for simulators to enable gradient-based optimization in cases where a simulator is nondifferentiable. In addition, automatic differentiation (AD) techniques developed in the 1980s [2] and now commonly available in the most popular machine learning (ML) frameworks make it possible to rely on efficient implementations of the back-propagation algorithm. The MODE Collaboration (an acronym for Machine-learning Optimized Design of Experiments) aims at developing tools based on deep neural networks and modern AD techniques to implement a full modeling of all the elements of experimental design, achieving end-to-end optimization of the design of instruments via a fully differentiable pipeline capable of exploring the Pareto-optimal frontier of U. Exploratory studies have shown that very large gains in performance are potentially achievable, even for very simple apparatus. 
MODE aims to show how those techniques may be adapted to the complexity of modern and future particle detectors and experiments, while remaining applicable to a number of applications outside of that domain. Below we succinctly describe the research program of the MODE Collaboration.
@article{baydin-2021-experimental, title = {Toward Machine Learning Optimization of Experimental Design}, author = {Baydin, Atılım Güneş and Cranmer, Kyle and {de Castro Manzano}, Pablo and Delaere, Christophe and Derkach, Denis and Donini, Julien and Dorigo, Tommaso and Giammanco, Andrea and Kieseler, Jan and Layer, Lukas and Louppe, Gilles and Ratnikov, Fedor and Strong, Giles C. and Tosi, Mia and Ustyuzhanin, Andrey and Vischia, Pietro and Yarar, Hevjin}, journal = {Nuclear Physics News}, year = {2021}, volume = {31}, number = {1}, pages = {25--28}, publisher = {Taylor \& Francis}, doi = {10.1080/10619127.2021.1881364}, url = {https://doi.org/10.1080/10619127.2021.1881364}, eprint = {https://doi.org/10.1080/10619127.2021.1881364} }
Context. Solar activity plays a quintessential role in influencing the interplanetary medium and space weather around Earth. Remote sensing instruments on-board heliophysics space missions provide a pool of information about the Sun’s activity, via the measurement of its magnetic field and the emission of light from the multi-layered, multi-thermal, and dynamic solar atmosphere. Extreme UV (EUV) wavelength observations from space help in understanding the subtleties of the outer layers of the Sun, namely the chromosphere and the corona. Unfortunately, such instruments, like the Atmospheric Imaging Assembly (AIA) on-board NASA’s Solar Dynamics Observatory (SDO), suffer from time-dependent degradation that reduces their sensitivity. Current state-of-the-art calibration techniques rely on sounding rocket flights, which are infrequent, complex, and limited to a single vantage point, to maintain absolute calibration. Aims. We aim to develop a novel method based on machine learning (ML) that exploits spatial patterns on the solar surface across multi-wavelength observations to auto-calibrate the instrument degradation. Methods. We establish two convolutional neural network (CNN) architectures that take either single-channel or multi-channel input and train the models using the SDOML dataset. The dataset is further augmented by randomly degrading images at each epoch, with the training and test datasets spanning non-overlapping months. We also develop a non-ML baseline model to assess the gain of the CNN models. With the best trained models, we reconstruct the AIA multi-channel degradation curves of 2010–2020 and compare them with the sounding-rocket based degradation curves. Results. Our results indicate that the CNN-based models significantly outperform the non-ML baseline model in calibrating instrument degradation.
Moreover, the multi-channel CNN outperforms the single-channel CNN, which suggests the importance of cross-channel relations between different EUV channels for recovering the degradation profiles. The CNN-based models reproduce the degradation corrections derived from the sounding rocket cross-calibration measurements within the experimental measurement uncertainty, indicating that they perform equally well when compared with the current techniques. Conclusions. Our approach establishes the framework for a novel technique based on CNNs to calibrate EUV instruments. We envision that this technique can be adapted to other imaging or spectral instruments operating at other wavelengths.
@article{dossantos-2021-multi, title = {Multi-Channel Auto-Calibration for the Atmospheric Imaging Assembly using Machine Learning}, author = {{Guedes dos Santos}, Luiz Fernando and Bose, Souvik and Salvatelli, Valentina and Neuberg, Brad and Cheung, Mark and Janvier, Miho and Jin, Meng and Gal, Yarin and Boerner, Paul and Baydin, Atılım Güneş}, journal = {Astronomy \& Astrophysics}, year = {2021}, volume = {648}, pages = {A53}, doi = {10.1051/0004-6361/202040051}, url = {https://doi.org/10.1051/0004-6361/202040051} }
Machine learning is now used in many areas of astrophysics, from detecting exoplanets in Kepler transit signals to removing telescope systematics. Recent work demonstrated the potential of using machine learning algorithms for atmospheric retrieval by implementing a random forest to perform retrievals in seconds that are consistent with the traditional, computationally-expensive nested-sampling retrieval method. We expand upon their approach by presenting a new machine learning model, plan-net, based on an ensemble of Bayesian neural networks that yields more accurate inferences than the random forest for the same data set of synthetic transmission spectra. We demonstrate that an ensemble provides greater accuracy and more robust uncertainties than a single model. In addition to being the first to use Bayesian neural networks for atmospheric retrieval, we also introduce a new loss function for Bayesian neural networks that learns correlations between the model outputs. Importantly, we show that designing machine learning models to explicitly incorporate domain-specific knowledge both improves performance and provides additional insight by inferring the covariance of the retrieved atmospheric parameters. We apply plan-net to the Hubble Space Telescope Wide Field Camera 3 transmission spectrum for WASP-12b and retrieve an isothermal temperature and water abundance consistent with the literature. We highlight that our method is flexible and can be expanded to higher-resolution spectra and a larger number of atmospheric parameters.
@article{cobb-2019-ensemble, title = {An Ensemble of Bayesian Neural Networks for Exoplanetary Atmospheric Retrieval}, author = {Cobb, Adam D. and Himes, Michael D. and Soboczenski, Frank and Zorzan, Simone and {O'Beirne}, Molly D. and Baydin, Atılım Güneş and Gal, Yarin and Domagal-Goldman, Shawn D. and Arney, Giada N. and Angerhausen, Daniel}, journal = {The Astronomical Journal}, volume = {158}, number = {1}, year = {2019}, doi = {10.3847/1538-3881/ab2390}, url = {https://doi.org/10.3847/1538-3881/ab2390} }
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply “autodiff”, is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other’s results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names “dynamic computational graphs” and “differentiable programming”. We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms “autodiff”, “automatic differentiation”, and “symbolic differentiation” as these are encountered more and more in machine learning settings.
@article{baydin-2018-ad-machinelearning, title = {Automatic differentiation in machine learning: a survey}, author = {Baydin, Atılım Güneş and Pearlmutter, Barak A. and Radul, Alexey Andreyevich and Siskind, Jeffrey Mark}, journal = {Journal of Machine Learning Research (JMLR)}, year = {2018}, volume = {18}, number = {153}, pages = {1--43}, url = {http://jmlr.org/papers/v18/17-468.html} }
We introduce a novel evolutionary algorithm (EA) with a semantic network-based representation. For enabling this, we establish new formulations of EA variation operators, crossover and mutation, that we adapt to work on semantic networks. The algorithm employs commonsense reasoning to ensure all operations preserve the meaningfulness of the networks, using ConceptNet and WordNet knowledge bases. The algorithm can be interpreted as a novel memetic algorithm (MA), given that (1) individuals represent pieces of information that undergo evolution, as in the original sense of memetics as it was introduced by Dawkins; and (2) this is different from existing MA, where the word “memetic” has been used as a synonym for local refinement after global optimization. For evaluating the approach, we introduce an analogical similarity-based fitness measure that is computed through structure mapping. This setup enables the open-ended generation of networks analogous to a given base network.
@article{baydin-2015-semanticnetwork-evolutionary, title = {A semantic network-based evolutionary algorithm for computational creativity}, author = {Baydin, Atılım Güneş and {López de Mántaras}, Ramon and Ontañón, Santiago}, journal = {Evolutionary Intelligence}, volume = {8}, number = {1}, pages = {3--21}, doi = {10.1007/s12065-014-0119-1}, publisher = {Springer}, year = {2015} }
Central pattern generators (CPGs), with a basis in neurophysiological studies, are a type of neural network for the generation of rhythmic motion. While CPGs are being increasingly used in robot control, most applications are hand-tuned for a specific task and it is acknowledged in the field that generic methods and design principles for creating individual networks for a given task are lacking. This study presents an approach where the connectivity and oscillatory parameters of a CPG network are determined by an evolutionary algorithm with fitness evaluations in a realistic simulation with accurate physics. We apply this technique to a five-link planar walking mechanism to demonstrate its feasibility and performance. In addition, to see whether results from simulation can be acceptably transferred to real robot hardware, the best evolved CPG network is also tested on a real mechanism. Our results also confirm that the biologically inspired CPG model is well suited for legged locomotion, since a diverse range of networks has been observed to succeed in fitness simulations during evolution.
@article{baydin-2012-centralpatterngenerator, title = {Evolution of central pattern generators for the control of a five-link bipedal walking mechanism}, author = {Baydin, Atılım Güneş}, journal = {Paladyn, Journal of Behavioral Robotics}, volume = {3}, number = {1}, pages = {45--53}, doi = {10.2478/s13230-012-0019-y}, year = {2012} }
Recent events, such as the loss of 38 satellites by SpaceX due to a geomagnetic storm, have highlighted the importance of having more accurate estimation and prediction of thermospheric density. Solar and geomagnetic activities wield significant influence over the behavior of the thermospheric density, exerting an important impact on spacecraft motion in low-Earth orbit (LEO). The impending Solar Cycle 25’s peak arrives at a time when the number of operational satellites in LEO is surging, driven by the proliferation of mega-constellations. This escalating satellite presence, spanning sectors from defense to commercial applications, increases the intricacy of the operational environment. The accuracy of thermospheric neutral density models, which underpin crucial safety-oriented tasks like satellite collision avoidance and space traffic management, is therefore pivotal. While the importance of solar events on thermospheric density is apparent, the influence of the Sun is currently included in thermospheric density models only in the form of solar proxies (such as F10.7). This can be insufficient, leading to mispredictions of thermospheric density values. A shared framework that supports the ingestion of inputs from various sources to devise thermospheric density models, and where thermospheric density models can be compared, is currently lacking. Furthermore, the recent advancements in machine learning (ML) offer a unique opportunity to construct thermospheric density models that use ML to describe the relationship between the Sun and the Earth’s thermosphere. For this reason, this study introduces an open-source software package, called Karman, to help solve this problem. Three steps are essential for this. First, the preparation and ingestion of input data from several sources in an ML-ready form. Then, the construction of ML models that can be trained on these datasets.
Finally, the creation of a benchmarking platform to compare ML models against state-of-the-art empirical models, evaluating their performances under varying conditions, such as geomagnetic storm strength, altitude, and solar irradiance levels. The utility of this framework is demonstrated through various experiments, showcasing its effectiveness in both benchmarking density models and discerning factors driving thermospheric density variations. The study compares the performance of traditional empirical models (NRLMSISE-00 and JB-08) with machine learning models trained on identical inputs. The results reveal a consistent 20-40% improvement in accuracy, highlighting the potential of machine learning techniques. One particularly significant area addressed by this research involves the incorporation of additional inputs to refine density estimations. Current approaches rely on solar proxies for estimating the Sun’s impact on the thermosphere. However, it is suggested that direct Extreme Ultraviolet (EUV) irradiance data could enhance accuracy. The framework outlined in this paper enables the integration of such inputs, facilitating the validation of hypotheses and supporting the evolution of thermospheric density models. In conclusion, this study presents a comprehensive framework for advancing thermospheric density modeling in the context of LEO satellites. Through the development of neural network models, an extensive dataset, and a benchmarking platform, the paper contributes significantly to the improvement of satellite trajectory predictions. As the space environment becomes increasingly intricate, tools such as the presented framework are crucial for maintaining the safety and effectiveness of satellite operations in LEO.
@inproceedings{acciarini-2023-karman, title = {Karman -- a Machine Learning Software Package for Benchmarking Thermospheric Density Models}, author = {Acciarini, Giacomo and Brown, Edward and Bridges, Christopher and Baydin, Atılım Güneş and Berger, Thomas E. and Guhathakurta, Madhulika}, booktitle = {Advanced Maui Optical and Space Surveillance Technologies (AMOS) Conference, 19--22 September 2023}, year = {2023} }
The risk of collisions in Earth’s orbit is growing markedly. In January 2021, SpaceX and OneWeb released an operator-to-operator fact sheet that highlights the critical reliance on conjunction data messages (CDMs) and observations, demonstrating the need for a diverse sensing environment for orbital objects. Recently, the University of Oxford and the University of Surrey developed, in collaboration with Trillium Technologies and the European Space Operations Center, an open-source Python package for modeling the spacecraft collision avoidance process, called Kessler. Such tools can be used for importing/exporting CDMs in their standard format, modeling the current low-Earth orbit (LEO) population and its short-term propagation from a given catalog file, as well as modeling the evolution of conjunction events based on the current population and observation scenarios, hence emulating the CDMs generation process of the Combined Space Operations Center (CSpOC). The model also provides probabilistic programming and ML tools to predict future collision events and to perform Bayesian inference (i.e., optimal use of all available observations). In the framework of a United Kingdom Space Agency-funded project and with Cranfield University, we analyze and study the impact of megaconstellations and observation models in the collision avoidance process. First, we developed realistic radar and optical observation models to add to the Kessler tools. We then monitor and report how the number of CDMs varies, according to different observation models and different observation schedules (i.e., more and less frequent). The observation models will emulate radar observation strategies. Then, we analyze the impact of future megaconstellations (7 future systems, with 24,787 more satellites in varying orbits) on the number of warnings generated from the increase in the probability of collision leading to an increased burden on space operators. 
FCC licenses were used to identify credible megaconstellation sources to understand how a potential five-fold increase in active satellites will impact LEO situational safety. We finally present how our simulations help understand the impact of these future megaconstellations on the current population, and how we can devise better ground observation strategies to quantify future observation needs and reduce the burden on operators.
@inproceedings{acciarini-2022-observation, title = {Observation Strategies and Megaconstellations Impact on Current LEO Population}, author = {Acciarini, Giacomo and Baresi, Nicola and Bridges, Christopher and Felicetti, Leonard and Hobbs, Stephen and Baydin, Atılım Güneş}, booktitle = {2nd {ESA} Near-Earth Object and Debris Detection Conference, 24--26 January 2023}, year = {2022} }
We present a framework for automatically structuring and training fast, approximate, deep neural surrogates of stochastic simulators. Unlike traditional approaches to surrogate modeling, our surrogates retain the interpretable structure and control flow of the reference simulator. Our surrogates target stochastic simulators where the number of random variables itself can be stochastic and potentially unbounded. Our framework further enables an automatic replacement of the reference simulator with the surrogate when undertaking amortized inference. The fidelity and speed of our surrogates allow for both faster stochastic simulation and accurate and substantially faster posterior inference. Using an illustrative yet non-trivial example we show our surrogates’ ability to accurately model a probabilistic program with an unbounded number of random variables. We then proceed with an example that shows our surrogates are able to accurately model a complex structure like an unbounded stack in a program synthesis example. We further demonstrate how our surrogate modeling technique makes amortized inference in complex black-box simulators an order of magnitude faster. Specifically, we do simulator-based materials quality testing, inferring safety-critical latent internal temperature profiles of composite materials undergoing curing.
@inproceedings{munk-2022-probabilistic, title = {Probabilistic Surrogate Networks for Simulators with Unbounded Randomness}, author = {Munk, Andreas and Zwartsenberg, Berend and Ścibior, Adam and Baydin, Atılım Güneş and Stewart, Andrew and Fernlund, Goran and Poursartip, Anoush and Wood, Frank}, booktitle = {38th Conference on Uncertainty in Artificial Intelligence (UAI)}, year = {2022} }
We present a new approach to automatic amortized inference in universal probabilistic programs which improves performance compared to current methods. Our approach is a variation of inference compilation (IC) which leverages deep neural networks to approximate a posterior distribution over latent variables in a probabilistic program. A challenge with existing IC network architectures is that they can fail to model long-range dependencies between latent variables. To address this, we introduce an attention mechanism that attends to the most salient variables previously sampled in the execution of a probabilistic program. We demonstrate that the addition of attention allows the proposal distributions to better match the true posterior, enhancing inference about latent variables in simulators.
@inproceedings{harvey-2022-attention, title = {Attention for Inference Compilation}, author = {Harvey, William and Munk, Andreas and Bergholm, Alexander and Baydin, Atılım Güneş and Wood, Frank}, booktitle = {12th International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH)}, year = {2022} }
Domain adaptation is an important problem and often needed for real-world applications. In this problem, instead of i.i.d. training and testing datapoints, we assume that the source (training) data and the target (testing) data have different distributions. With that setting, the empirical risk minimization training procedure often does not perform well, since it does not account for the change in the distribution. A common approach in the domain adaptation literature is to learn a representation of the input that has the same (marginal) distribution over the source and the target domain. However, these approaches often require additional networks and/or optimizing an adversarial (minimax) objective, which can be very expensive or unstable in practice. To improve upon these marginal alignment techniques, in this paper, we first derive a generalization bound for the target loss based on the training loss and the reverse Kullback-Leibler (KL) divergence between the source and the target representation distributions. Based on this bound, we derive an algorithm that minimizes the KL term to obtain a better generalization to the target domain. We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples without any additional network or a minimax objective. This leads to a theoretically sound alignment method which is also very efficient and stable in practice. Experimental results also suggest that our method outperforms other representation-alignment approaches.
@inproceedings{nguyen-2022-kl, title = {{KL} Guided Domain Adaptation}, author = {Nguyen, Tuan and Tran, Toan and Gal, Yarin and Torr, Philip H.S. and Baydin, Atılım Güneş}, booktitle = {Tenth International Conference on Learning Representations (ICLR)}, year = {2022} }
Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove finite variance of our estimator and empirically demonstrate our method’s correctness and efficiency compared to existing alternatives on generative programs containing rejection sampling loops and discuss how to implement our method in a generic probabilistic programming framework.
@inproceedings{naderiparizi-2022-amortized, title = {Amortized Rejection Sampling in Universal Probabilistic Programming}, author = {Naderiparizi, Saeid and Ścibior, Adam and Munk, Andreas and Ghadiri, Mehrdad and Baydin, Atılım Güneş and Gram-Hansen, Bradley and {Schroeder de Witt}, Christian and Zinkov, Robert and Torr, Philip H.S. and Rainforth, Tom and Teh, Yee Whye and Wood, Frank}, booktitle = {Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS)}, year = {2022} }
Domain generalization is the problem where we aim to train a model with data from a set of source domains so that the model can generalize to unseen target domains. Naively training a model on the aggregate set of data (pooled from all source domains) has been shown to perform poorly, since the information learned by that model might be domain-specific and cannot generalize well to target domains. To tackle this problem, a predominant approach is to find and learn some domain-invariant information and use that for the prediction problem. In this paper, we propose a theoretically grounded method to learn a domain-invariant representation by enforcing the representation network to be invariant under all transformation functions among domains. We also show how to use Generative Adversarial Networks to learn such domain transformations to implement our method in practice. We illustrate the effectiveness of our method on several widely used datasets for the domain generalization problem, on all of which we achieve competitive results with state-of-the-art models.
@inproceedings{nguyen-2021-domain, title = {Domain Invariant Representation Learning with Domain Density Transformations}, author = {Nguyen, Tuan and Tran, Toan and Gal, Yarin and Baydin, Atılım Güneş}, booktitle = {Advances in Neural Information Processing Systems 35 (NeurIPS)}, year = {2021} }
@inproceedings{acciarini-2020-automated, title = {Kessler: a Machine Learning Library for Space Collision Avoidance}, author = {Acciarini, Giacomo and Pinto, Francesco and Metz, Sascha and Boufelja, Sarah and Kaczmarek, Sylvester and Merz, Klaus and Martinez-Heras, José A. and Letizia, Francesca and Bridges, Christopher and Baydin, Atılım Güneş}, booktitle = {8th European Conference on Space Debris}, year = {2021} }
We propose a novel method for gradient-based optimization of black-box simulators using differentiable local surrogate models. In fields such as physics and engineering, many processes are modeled with non-differentiable simulators with intractable likelihoods. Optimization of these forward models is particularly challenging, especially when the simulator is stochastic. To address such cases, we introduce the use of deep generative models to iteratively approximate the simulator in local neighborhoods of the parameter space. We demonstrate that these local surrogates can be used to approximate the gradient of the simulator, and thus enable gradient-based optimization of simulator parameters. In cases where the dependence of the simulator on the parameter space is constrained to a low dimensional submanifold, we observe that our method attains minima faster than baseline methods, including Bayesian optimization, numerical optimization, and approaches using score function gradient estimators.
@inproceedings{shirobokov-2020-blackbox, title = {Black-Box Optimization with Local Generative Surrogates}, author = {Shirobokov, Sergey and Belavin, Vladislav and Kagan, Michael and Ustyuzhanin, Andrey and Baydin, Atılım Güneş}, booktitle = {Advances in Neural Information Processing Systems 34 (NeurIPS)}, year = {2020} }
Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually relying on REINFORCE-like gradient estimators. However, these approaches are very expensive as they treat the entire data generation, model training, and validation pipeline as a black-box and require multiple costly objective evaluations at each iteration. We propose an efficient alternative for optimal synthetic data generation, based on a novel differentiable approximation of the objective. This allows us to optimize the simulator, which may be non-differentiable, requiring only one objective evaluation at each iteration with little overhead. We demonstrate on a state-of-the-art photorealistic renderer that the proposed method finds the optimal data distribution faster (up to 50x), with significantly reduced training data generation and better accuracy on real-world test datasets than previous methods.
@inproceedings{behl-2020-autosimulate, title = {AutoSimulate: (Quickly) Learning Synthetic Data Generation}, author = {Behl, Harkirat Singh and Baydin, Atılım Güneş and Gal, Ran and Torr, Philip H. S. and Vineet, Vibhav}, booktitle = {16th European Conference on Computer Vision (ECCV)}, year = {2020} }
We present a novel probabilistic programming framework that couples directly to existing large-scale simulators through a cross-platform probabilistic execution protocol, which allows general-purpose inference engines to record and control random number draws within simulators in a language-agnostic way. The execution of existing simulators as probabilistic programs enables highly interpretable posterior inference in the structured model defined by the simulator code base. We demonstrate the technique in particle physics, on a scientifically accurate simulation of the tau lepton decay, which is a key ingredient in establishing the properties of the Higgs boson. Inference efficiency is achieved via inference compilation where a deep recurrent neural network is trained to parameterize proposal distributions and control the stochastic simulator in a sequential importance sampling scheme, at a fraction of the computational cost of a Markov chain Monte Carlo baseline.
@inproceedings{baydin-2019-quest-for-physics, title = {Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model}, author = {Baydin, Atılım Güneş and Heinrich, Lukas and Bhimji, Wahid and Shao, Lei and Naderiparizi, Saeid and Munk, Andreas and Liu, Jialin and Gram-Hansen, Bradley and Louppe, Gilles and Meadows, Lawrence and Torr, Philip and Lee, Victor and Prabhat and Cranmer, Kyle and Wood, Frank}, booktitle = {Advances in Neural Information Processing Systems 33 (NeurIPS)}, year = {2019} }
Probabilistic programming languages (PPLs) are receiving widespread attention for performing Bayesian inference in complex generative models. However, applications to science remain limited because of the impracticability of rewriting complex scientific simulators in a PPL, the computational cost of inference, and the lack of scalable implementations. To address these, we present a novel PPL framework that couples directly to existing scientific simulators through a cross-platform probabilistic execution protocol and provides Markov chain Monte Carlo (MCMC) and deep-learning-based inference compilation (IC) engines for tractable inference. To guide IC inference, we perform distributed training of a dynamic 3DCNN–LSTM architecture with a PyTorch-MPI-based framework on 1,024 32-core CPU nodes of the Cori supercomputer with a global minibatch size of 128k, achieving a performance of 450 Tflop/s through enhancements to PyTorch. We demonstrate a Large Hadron Collider (LHC) use-case with the C++ Sherpa simulator and achieve the largest-scale posterior inference in a Turing-complete PPL.
@inproceedings{baydin-2019-etalumis, author = {Baydin, Atılım Güneş and Shao, Lei and Bhimji, Wahid and Heinrich, Lukas and Meadows, Lawrence F. and Liu, Jialin and Munk, Andreas and Naderiparizi, Saeid and Gram-Hansen, Bradley and Louppe, Gilles and Ma, Mingfei and Zhao, Xiaohui and Torr, Philip and Lee, Victor and Cranmer, Kyle and Prabhat and Wood, Frank}, title = {Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale}, year = {2019}, isbn = {9781450362290}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3295500.3356180}, doi = {10.1145/3295500.3356180}, booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis}, articleno = {Article 29}, numpages = {24}, keywords = {inference, probabilistic programming, deep learning, simulation}, location = {Denver, Colorado}, series = {SC ’19} }
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manual tuning of the initial learning rate for these commonly used algorithms. Our method works by dynamically updating the learning rate during optimization using the gradient with respect to the learning rate of the update rule itself. Computing this "hypergradient" needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation.
@inproceedings{baydin-2018-hypergradient, title = {Online Learning Rate Adaptation with Hypergradient Descent}, author = {Baydin, Atılım Güneş and Cornish, Robert and Rubio, David Martínez and Schmidt, Mark and Wood, Frank}, booktitle = {Sixth International Conference on Learning Representations (ICLR), Vancouver, Canada, April 30 -- May 3, 2018}, year = {2018} }
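The update described in the hypergradient descent abstract above can be illustrated with a minimal sketch for plain gradient descent. It assumes the learning rate is adjusted by the dot product of the current and previous gradients; the function name `hypergradient_sgd`, its arguments, and the hyperparameter values are hypothetical choices for illustration and are not taken from the paper's code.

```python
def hypergradient_sgd(grad, theta0, alpha0=0.01, beta=1e-3, steps=200):
    """Gradient descent whose learning rate is itself updated by gradient descent.

    grad   : function mapping a parameter list to its gradient list (assumed)
    theta0 : initial parameters
    alpha0 : initial learning rate
    beta   : hypergradient learning rate (illustrative value)
    """
    theta = list(theta0)
    alpha = alpha0
    prev_grad = [0.0] * len(theta)  # no previous gradient before the first step
    for _ in range(steps):
        g = grad(theta)
        # The derivative of the loss w.r.t. alpha is -(g . prev_grad),
        # so a descent step on alpha adds beta * (g . prev_grad).
        alpha += beta * sum(gi * pi for gi, pi in zip(g, prev_grad))
        # Ordinary descent step using the freshly updated learning rate.
        theta = [ti - alpha * gi for ti, gi in zip(theta, g)]
        prev_grad = g  # the single extra gradient copy kept in memory
    return theta, alpha


# Usage on f(theta) = sum(theta_i ** 2), whose gradient is 2 * theta.
theta, alpha = hypergradient_sgd(lambda th: [2.0 * t for t in th], [1.0, -2.0])
```

In this sketch the only extra state is `prev_grad`, which matches the abstract's remark that one additional copy of the gradient is stored. While successive gradients point in similar directions the learning rate grows, and it shrinks when they oppose each other.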
We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do "compilation of inference" because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.
@inproceedings{le-2016-inference-compilation, author = {Le, Tuan Anh and Baydin, Atılım Güneş and Wood, Frank}, booktitle = {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS)}, title = {Inference Compilation and Universal Probabilistic Programming}, year = {2017}, volume = {54}, pages = {1338--1348}, series = {Proceedings of Machine Learning Research}, address = {Fort Lauderdale, FL, USA}, publisher = {PMLR} }
We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data.
@inproceedings{le-2016-synthetic-data, author = {Le, Tuan Anh and Baydin, Atılım Güneş and Zinkov, Robert and Wood, Frank}, booktitle = {30th International Joint Conference on Neural Networks, Anchorage, AK, USA, May 14--19, 2017}, title = {Using Synthetic Data to Train Neural Networks is Model-Based Reasoning}, year = {2017} }
The deep learning community has devised a diverse set of methods to make gradient optimization, using large datasets, of large and highly complex models with deeply cascaded nonlinearities, practical. Taken as a whole, these methods constitute a breakthrough, allowing computational structures which are quite wide, very deep, and with an enormous number and variety of free parameters to be effectively optimized. The result now dominates much of practical machine learning, with applications in machine translation, computer vision, and speech recognition. Many of these methods, viewed through the lens of algorithmic differentiation (AD), can be seen as either addressing issues with the gradient itself, or finding ways of achieving increased efficiency using tricks that are AD-related, but not provided by current AD systems. The goal of this paper is to explain not just those methods of most relevance to AD, but also the technical constraints and mindset which led to their discovery. After explaining this context, we present a “laundry list” of methods developed by the deep learning community. Two of these are discussed in further mathematical detail: a way to dramatically reduce the size of the tape when performing reverse-mode AD on a (theoretically) time-reversible process like an ODE integrator; and a new mathematical insight that allows for the implementation of a stochastic Newton’s method.
@inproceedings{baydin-2016-tricks-from-deep-learning, author = {Baydin, Atılım Güneş and Pearlmutter, Barak A. and Siskind, Jeffrey Mark}, booktitle = {7th International Conference on Algorithmic Differentiation, Christ Church Oxford, UK, September 12--15, 2016}, title = {Tricks from Deep Learning}, year = {2016} }
DiffSharp is an algorithmic differentiation (AD) library for the .NET ecosystem, which is targeted by the C# and F# languages, among others. The library has been designed with machine learning applications in mind \citep{Baydin2015b}, allowing very succinct implementations of models and optimization routines. DiffSharp is implemented in F# and exposes forward and reverse AD operators as general nestable higher-order functions, usable by any .NET language. It provides high-performance linear algebra primitives—scalars, vectors, and matrices, with a generalization to tensors underway—that are fully supported by all the AD operators, and which use a BLAS/LAPACK backend via the highly optimized OpenBLAS library. DiffSharp currently uses operator overloading, but we are developing a transformation-based version of the library using F#’s “code quotation” metaprogramming facility \citep{Syme2006}. Work on a CUDA-based GPU backend is also underway.
@inproceedings{baydin-2016-diffsharp-an-ad-library, author = {Baydin, Atılım Güneş and Pearlmutter, Barak A. and Siskind, Jeffrey Mark}, booktitle = {7th International Conference on Algorithmic Differentiation, Christ Church Oxford, UK, September 12--15, 2016}, title = {DiffSharp: An AD Library for .NET Languages}, year = {2016} }
This paper presents a new type of evolutionary algorithm (EA) based on the concept of “meme”, where the individuals forming the population are represented by semantic networks and the fitness measure is defined as a function of the represented knowledge. Our work can be classified as a novel memetic algorithm (MA), given that (1) it is the units of culture, or information, that are undergoing variation, transmission, and selection, very close to the original sense of memetics as it was introduced by Dawkins; and (2) this is different from existing MA, where the idea of memetics has been utilized as a means of local refinement by individual learning after classical global sampling of EA. The individual pieces of information are represented as simple semantic networks that are directed graphs of concepts and binary relations, going through variation by memetic versions of operators such as crossover and mutation, which utilize knowledge from commonsense knowledge bases. In evaluating this introductory work, as an interesting fitness measure, we focus on using the structure mapping theory of analogical reasoning from psychology to evolve pieces of information that are analogous to a given base information. Considering other possible fitness measures, the proposed representation and algorithm can serve as a computational tool for modeling memetic theories of knowledge, such as evolutionary epistemology and cultural selection theory.
@inproceedings{baydin-2012-evolution-of-ideas, author = {Baydin, Atılım Güneş and López de Mántaras, Ramon}, booktitle = {Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2012, IEEE World Congress On Computational Intelligence, WCCI 2012, Brisbane, Australia, June 10--15, 2012}, title = {Evolution of ideas: A novel memetic algorithm based on semantic networks}, year = {2012}, doi = {10.1109/CEC.2012.6252886}, pages = {1--8} }
Analogy plays an important role in creativity, and is extensively used in science as well as art. In this paper we introduce a technique for the automated generation of cross-domain analogies based on a novel evolutionary algorithm (EA). Unlike existing work in computational analogy-making restricted to creating analogies between two given cases, our approach, for a given case, is capable of creating an analogy along with the novel analogous case itself. Our algorithm is based on the concept of "memes", which are units of culture, or knowledge, undergoing variation and selection under a fitness measure, and represents evolving pieces of knowledge as semantic networks. Using a fitness function based on Gentner’s structure mapping theory of analogies, we demonstrate the feasibility of spontaneously generating semantic networks that are analogous to a given base network.
@inproceedings{baydin-2012-crossdomain-analogies, author = {Baydin, Atılım Güneş and López de Mántaras, Ramon and Ontañón, Santiago}, booktitle = {Proceedings of the International Conference on Computational Creativity (ICCC 2012), Dublin, Ireland, May 30--June 1, 2012}, title = {Automated generation of cross-domain analogies via evolutionary computation}, year = {2012}, pages = {25--32} }
Mediation is an important method in dispute resolution. We implement a case based reasoning approach to mediation integrating analogical and commonsense reasoning components that allow an artificial mediation agent to satisfy requirements expected from a human mediator, in particular: utilizing experience with cases in different domains; and structurally transforming the set of issues for a better solution. We utilize a case structure based on ontologies reflecting the perceptions of the parties in dispute. The analogical reasoning component, employing the Structure Mapping Theory from psychology, provides a flexibility to respond innovatively in unusual circumstances, in contrast with conventional approaches confined into specialized problem domains. We aim to build a mediation case base incorporating real world instances ranging from interpersonal or intergroup disputes to international conflicts.
@incollection{baydin-2011-cbr-commonsense-structuremapping, year = {2011}, isbn = {978-3-642-23290-9}, booktitle = {Case-Based Reasoning Research and Development}, volume = {6880}, series = {Lecture Notes in Computer Science}, editor = {Ram, Ashwin and Wiratunga, Nirmalie}, doi = {10.1007/978-3-642-23291-6_28}, title = {CBR with Commonsense Reasoning and Structure Mapping: An Application to Mediation}, url = {http://dx.doi.org/10.1007/978-3-642-23291-6_28}, publisher = {Springer Berlin Heidelberg}, author = {Baydin, Atılım Güneş and López de Mántaras, Ramon and Simoff, Simeon and Sierra, Carles}, pages = {378--392} }
This paper introduces a second-order hyperplane search, a novel optimization step that generalizes a second-order line search from a line to a K-dimensional hyperplane. This, combined with the forward-mode stochastic gradient method, yields a second-order optimization algorithm that consists of forward passes only, completely avoiding the storage overhead of backpropagation. Unlike recent work that relies on directional derivatives (or Jacobian-Vector Products, JVPs), we use hyper-dual numbers to jointly evaluate both directional derivatives and their second-order quadratic terms. As a result, we introduce forward-mode weight perturbation with Hessian information for K-dimensional hyperplane search (FoMoH-KD). We derive the convergence properties of FoMoH-KD and show how it generalizes to Newton’s method for K = D. We also compare its convergence rate to forward gradient descent (FGD) and show FoMoH-KD has an exponential convergence rate compared to FGD’s linear convergence for positive definite quadratic functions. We illustrate the utility of this extension and how it might be used to overcome some of the recent challenges of optimizing machine learning models without backpropagation.
@inproceedings{cobb-2024-optimization, title = {Second-Order Forward-Mode Automatic Differentiation for Optimization}, author = {Cobb, Adam D. and Baydin, {Atılım Güneş} and Pearlmutter, Barak A. and Jha, Susmit}, booktitle = {16th Annual Workshop on Optimization for Machine Learning, NeurIPS 2024}, year = {2024} }
Estimates of seismic wave speeds in the Earth (seismic velocity models) are key input parameters to earthquake simulations for ground motion prediction. Owing to the non-uniqueness of the seismic inverse problem, typically many velocity models exist for any given region. The arbitrary choice of which velocity model to use in earthquake simulations impacts ground motion predictions. However, current hazard analysis methods do not account for these sources of uncertainty. We present a proof-of-concept ground motion prediction workflow for incorporating uncertainties arising from inconsistencies between existing seismic velocity models. Our analysis is based on the probabilistic fusion of overlapping seismic velocity models using scalable Gaussian process (GP) regression. Specifically, we fit a GP to two synthetic 1-D velocity profiles simultaneously, and show that the predictive uncertainty accounts for the differences between the models. We subsequently draw velocity model samples from the predictive distribution and estimate peak ground displacement using acoustic wave propagation through the velocity models. The resulting distribution of possible ground motion amplitudes is much wider than would be predicted by simulating shaking using only the two input velocity models. This proof-of-concept illustrates the importance of probabilistic methods for physics-based seismic hazard analysis.
@inproceedings{scivier-2024-gaussian, title = {Gaussian Processes for Probabilistic Estimates of Earthquake Ground Shaking: A 1-D Proof-of-Concept}, author = {Scivier, Sam A. and {Nissen-Meyer}, Tarje and Koelemeijer, Paula and Baydin, {Atılım Güneş}}, booktitle = {Machine Learning and the Physical Sciences workshop, NeurIPS 2024}, year = {2024} }
Accurate estimation of thermospheric density is critical for precise modeling of satellite drag forces in low Earth orbit (LEO). Improving this estimation is crucial to tasks such as state estimation, collision avoidance, and re-entry calculations. The largest source of uncertainty in determining thermospheric density is modeling the effects of space weather driven by solar and geomagnetic activity. Current operational models rely on ground-based proxy indices which imperfectly correlate with the complexity of solar outputs and geomagnetic responses. In this work, we directly incorporate NASA’s Solar Dynamics Observatory (SDO) extreme ultraviolet (EUV) spectral images into a neural thermospheric density model to determine whether the predictive performance of the model is increased by using space-based EUV imagery data instead of, or in addition to, the ground-based proxy indices. We demonstrate that EUV imagery can enable predictions with much higher temporal resolution and replace ground-based proxies while significantly increasing performance relative to current operational models. Our method paves the way for assimilating EUV image data into operational thermospheric density forecasting models for use in LEO satellite navigation processes.
@inproceedings{malik-2023-thermospheric, title = {High-Cadence Thermospheric Density Estimation enabled by Machine Learning on Solar Imagery}, author = {Malik, Shreshth A. and Walsh, James and Acciarini, Giacomo and Berger, Thomas E. and Baydin, {Atılım Güneş}}, booktitle = {Machine Learning and the Physical Sciences workshop, NeurIPS 2023}, year = {2023} }
Molecular complexity has been proposed as a potential agnostic biosignature – in other words: a way to search for signs of life beyond Earth without relying on “life as we know it.” More than one way to compute molecular complexity has been proposed, so comparing their performance in evaluating experimental data collected in situ, such as on board a probe or rover exploring another planet, is imperative. Here, we report the results of an attempt to deploy multiple machine learning (ML) techniques to predict molecular complexity scores directly from mass spectrometry data. Our initial results are encouraging and may provide fruitful guidance toward determining which complexity measures are best suited for use with experimental data. Beyond the search for signs of life, this approach is likewise valuable for studying the chemical composition of samples to assist decisions made by the rover or probe, and may thus contribute toward supporting the need for greater autonomy.
@inproceedings{gebhard-2022-molecular, title = {Inferring molecular complexity from mass spectrometry data using machine learning}, author = {Gebhard, Timothy D. and Bell, Aaron and Gong, Jian and Hastings, Jaden J.A. and Fricke, George M. and Cabrol, Nathalie and Sandford, Scott and Phillips, Michael and Warren-Rhodes, Kimberley and Baydin, {Atılım Güneş}}, booktitle = {Machine Learning and the Physical Sciences workshop, NeurIPS 2022}, year = {2022} }
Online disinformation is a dynamic and pervasive problem on social networks as evidenced by a spate of public disasters in light of active efforts to combat it. Since the massive amount of content generated each day on these platforms is impossible to curate manually, ranking and recommendation algorithms are a key apparatus that drives user interactions. However, the vulnerability of ranking and recommendation algorithms to attack from coordinated campaigns spreading misleading information has been established both theoretically and anecdotally. Unfortunately, it is unclear how effective countermeasures to disinformation are in practice due to the limited view we have into the operation of such platforms. In such settings, simulations have emerged as a popular technique to study the long-term effects of content ranking and recommendation systems. We develop a multiagent simulation of a popular social network, Reddit, that aligns with the state–action space available to real users based on the platform’s affordances. We collect millions of real-world interactions from Reddit to estimate the network for each user in our dataset and utilise Reddit’s self-described content ranking strategies to compare the impact of coordinated activity on content spread by each strategy. We expect that this will inform the design of robust content distribution systems that are resilient against targeted attacks by groups of malicious actors.
@inproceedings{mehta-2022-inauthentic, title = {Estimating the Impact of Coordinated Inauthentic Behavior on Content Recommendations in Social Networks}, author = {Mehta, Swapneel and State, Bogdan and Bonneau, Richard and Nagler, Jonathan and Torr, Philip and Baydin, {Atılım Güneş}}, booktitle = {AI for Agent-Based Modelling Workshop (AI4ABM) at the International Conference on Machine Learning (ICML) 2022}, year = {2022} }
Accurately estimating spacecraft location is of crucial importance for a variety of safety-critical tasks in low-Earth orbit (LEO), including satellite collision avoidance and re-entry. The solar activity largely impacts the physical characteristics of the thermosphere, consequently affecting the trajectories of spacecraft in LEO. State-of-the-art models for estimating thermospheric density are either computationally expensive or under-perform during extreme solar activity. Moreover, these models provide single-point solutions, neglecting critical information on the associated uncertainty. In this work we use and compare two methods, Monte Carlo dropout and deep ensembles, to estimate thermospheric total mass density and associated uncertainty. The networks are trained using ground-truth density data from five well-calibrated satellites, using orbital data information, solar and geomagnetic indices as input. The trained models improve for a subset of satellites upon operational solutions, also providing a measure of uncertainty in the density estimation.
@inproceedings{bonasera-2021-ensemble, title = {Dropout and Ensemble Networks for Thermospheric Density Uncertainty Estimation}, author = {Bonasera, Stefano and Acciarini, Giacomo and {Pérez-Hernández}, Jorge A. and Benson, Bernard and Brown, Edward and Sutton, Eric and Jah, Moriba K. and Bridges, Christopher and Baydin, {Atılım Güneş}}, booktitle = {Bayesian Deep Learning workshop, {NeurIPS} 2021}, year = {2021} }
This study uses a sigma-variational autoencoder to learn a latent space of solar images using the 12 channels taken by the Atmospheric Imaging Assembly (AIA) and the Helioseismic and Magnetic Imager (HMI) instruments on-board the NASA Solar Dynamics Observatory. The model is able to significantly compress the large image dataset to 0.19% of its original size while still proficiently reconstructing the original images. As a downstream task making use of the learned representation, this study demonstrates the use of the learned latent space as an input to improve the forecasts of the F30 solar radio flux index, compared to an off-the-shelf pretrained ResNet feature extractor. Finally, the developed models can be used to generate realistic synthetic solar images by sampling from the learned latent space.
@inproceedings{brown-2021-learning, title = {Learning the solar latent space: sigma-variational autoencoders for multiple channel solar imaging}, author = {Brown, Edward and Bonasera, Stefano and Benson, Bernard and {Pérez-Hernández}, Jorge A. and Acciarini, Giacomo and Baydin, {Atılım Güneş} and Bridges, Christopher and Jin, Meng and Sutton, Eric and Jah, Moriba K.}, booktitle = {Fourth Workshop on Machine Learning and the Physical Sciences ({NeurIPS} 2021)}, year = {2021} }
Solar radio flux along with geomagnetic indices are important indicators of solar activity and its effects. Extreme solar events such as flares and geomagnetic storms can negatively affect the space environment including satellites in low-Earth orbit. Therefore, forecasting these space weather indices is of great importance in space operations and science. In this study, we propose a model based on long short-term memory neural networks to learn the distribution of time series data with the capability to provide a simultaneous multivariate 27-day forecast of the space weather indices using time series as well as solar image data. We show a 30–40% improvement of the root mean-square error while including solar image data with time series data compared to using time series data alone. Simple baselines such as persistence and running average forecasts are also compared with the trained deep neural network models. We also quantify the uncertainty in our prediction using a model ensemble.
@inproceedings{benson-2021-simultaneous, title = {Simultaneous Multivariate Forecast of Space Weather Indices using Deep Neural Network Ensembles}, author = {Benson, Bernard and Brown, Edward and Bonasera, Stefano and Acciarini, Giacomo and {Pérez-Hernández}, Jorge A. and Sutton, Eric and Jah, Moriba K. and Bridges, Christopher and Jin, Meng and Baydin, {Atılım Güneş}}, booktitle = {Fourth Workshop on Machine Learning and the Physical Sciences ({NeurIPS} 2021)}, year = {2021} }
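The persistence and running average baselines mentioned in the abstract above can be sketched as follows. This is a generic illustration, not the paper's code: the abstract does not give the window length or the exact baseline definitions, so the function names and the `window` parameter are assumptions.

```python
def persistence_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def running_average_forecast(history, horizon, window=3):
    """Repeat the mean of the last `window` observations for every future step."""
    recent = history[-window:]
    mean = sum(recent) / len(recent)
    return [mean] * horizon


def rmse(pred, truth):
    """Root mean-square error between two equal-length sequences."""
    return (sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)) ** 0.5


# Usage with a toy index series (values are made up for illustration).
history = [1.0, 2.0, 3.0, 6.0]
persist = persistence_forecast(history, horizon=2)
average = running_average_forecast(history, horizon=2, window=2)
```

A learned forecaster is informative only if its error is lower than that of both baselines over the same forecast horizon, which is why such simple references are usually reported alongside it.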
We propose the use of probabilistic programming techniques to tackle the malicious user identification problem in a recommendation algorithm. Probabilistic programming provides numerous advantages over other techniques, including but not limited to providing a disentangled representation of how malicious users acted under a structured model, as well as allowing for the quantification of damage caused by malicious users. We show experiments in malicious user identification using a model of regular and malicious users interacting with a simple recommendation algorithm, and provide a novel simulation-based measure for quantifying the effects of a user or group of users on its dynamics.
@inproceedings{gambardella-2021-detecting, title = {Detecting and Quantifying Malicious Activity with Simulation-based Inference}, author = {Gambardella, Andrew and State, Bogdan and Khan, Naeemullah and Tsourides, Kleovoulos and Torr, Philip and Baydin, Atılım Güneş}, booktitle = {{ICML} Workshop on Socially Responsible Machine Learning}, year = {2021} }
@inproceedings{poduval-2021-studying, title = {Studying Solar Energetic Particles and Their Seed Population Using Surrogate Models}, author = {Poduval, Bala and Baydin, Atılım Güneş and Schwadron, Nathan}, booktitle = {Machine Learning for Space Sciences workshop, 43rd Committee on Space Research (COSPAR) Scientific Assembly, Sydney, Australia}, year = {2021} }
After decades of space travel, low Earth orbit is a junkyard of discarded rocket bodies, dead satellites, and millions of pieces of debris from collisions and explosions. Objects in high enough altitudes do not re-enter and burn up in the atmosphere, but stay in orbit around Earth for a long time. With a speed of 28,000 km/h, collisions in these orbits can generate fragments and potentially trigger a cascade of more collisions known as the Kessler syndrome. This could pose a planetary challenge, because the phenomenon could escalate to the point of hindering future space operations and damaging satellite infrastructure critical for space and Earth science applications. As commercial entities place mega-constellations of satellites in orbit, the burden on operators conducting collision avoidance manoeuvres will increase. For this reason, development of automated tools that predict potential collision events (conjunctions) is critical. We introduce a Bayesian deep learning approach to this problem, and develop recurrent neural network architectures (LSTMs) that work with time series of conjunction data messages (CDMs), a standard data format used by the space community. We show that our method can be used to model all CDM features simultaneously, including the time of arrival of future CDMs, providing predictions of conjunction event evolution with associated uncertainties.
@inproceedings{pinto-2020-automated, title = {Towards Automated Satellite Conjunction Management with Bayesian Deep Learning}, author = {Pinto, Francesco and Acciarini, Giacomo and Metz, Sascha and Boufelja, Sarah and Kaczmarek, Sylvester and Merz, Klaus and Martinez-Heras, José A. and Letizia, Francesca and Bridges, Christopher and Baydin, Atılım Güneş}, booktitle = {AI for Earth Sciences Workshop at NeurIPS 2020, Vancouver, Canada}, year = {2020} }
Over 34,000 objects bigger than 10 cm in length are known to orbit Earth. Among them, only a small percentage are active satellites, while the rest of the population is made of dead satellites, rocket bodies, and debris that pose a collision threat to operational spacecraft. Furthermore, the predicted growth of the space sector and the planned launch of megaconstellations will add even more complexity, therefore causing the collision risk and the burden on space operators to increase. Managing this complex framework with internationally agreed methods is pivotal and urgent. In this context, we build a novel physics-based probabilistic generative model for synthetically generating conjunction data messages, calibrated using real data. By conditioning on observations, we use the model to obtain posterior distributions via Bayesian inference. We show that the probabilistic programming approach to conjunction assessment can help in making predictions and in finding the parameters that explain the observed data in conjunction data messages, thus shedding more light on key variables and orbital characteristics that more likely lead to conjunction events. Moreover, our technique enables the generation of physically accurate synthetic datasets of collisions, answering a fundamental need of the space and machine learning communities working in this area.
@inproceedings{acciarini-2020-spacecraft, title = {Spacecraft Collision Risk Assessment with Probabilistic Programming}, author = {Acciarini, Giacomo and Pinto, Francesco and Metz, Sascha and Boufelja, Sarah and Kaczmarek, Sylvester and Merz, Klaus and Martinez-Heras, José A. and Letizia, Francesca and Bridges, Christopher and Baydin, Atılım Güneş}, booktitle = {Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020), Vancouver, Canada}, year = {2020} }
A key component to the success of deep learning is the use of gradient-based optimization. Deep learning practitioners compose a variety of modules together to build a complex computational pipeline that may depend on millions or billions of parameters. Differentiating such functions is enabled through a computational technique known as automatic differentiation. The success of deep learning has led to an abstraction known as differentiable programming, which is being promoted to a first-class citizen in many programming languages and data analysis frameworks. This often involves replacing some common non-differentiable operations (e.g., binning, sorting) with relaxed, differentiable analogues. The result is a system that can be optimized end-to-end using efficient gradient-based optimization algorithms. A differentiable analysis could be optimized in this way, from basic cuts to final fits, all taking into account full systematic errors and automatically analyzed. This Snowmass LOI outlines the potential advantages and challenges of adopting a differentiable programming paradigm in high-energy physics.
@inproceedings{baydin-2020-differentiable, title = {Differentiable Programming in High-Energy Physics}, author = {Baydin, Atılım Güneş and Cranmer, Kyle and Feickert, Matthew and Gray, Lindsey and Heinrich, Lukas and Held, Alexander and Melo, Andrew and Neubauer, Mark and Pearkes, Jannicke and Simpson, Nathan and Smith, Nick and Stark, Giordon and Thais, Savannah and Vassilev, Vassil and Watts, Gordon}, booktitle = {Snowmass 2021 Letters of Interest (LOI), Division of Particles and Fields (DPF), American Physical Society}, year = {2020}, url = {https://snowmass21.org/loi} }
The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever-increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel avenue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models (CovidSim and OpenMalaria) into probabilistic programs, enabling efficient interpretable Bayesian inference within those simulators.
@inproceedings{schroederdewitt-2020-simulation, title = {Simulation-Based Inference for Global Health Decisions}, author = {{Schroeder de Witt}, Christian and Gram-Hansen, Bradley and Nardelli, Nantas and Gambardella, Andrew and Zinkov, Rob and Dokania, Puneet and Siddharth, N. and Espinosa-Gonzalez, Ana Belen and Darzi, Ara and Torr, Philip and Baydin, Atılım Güneş}, booktitle = {ICML Workshop on Machine Learning for Global Health, Thirty-seventh International Conference on Machine Learning (ICML 2020)}, year = {2020} }
@inproceedings{naderiparizi-2020-amortized, title = {Amortized Rejection Sampling in Universal Probabilistic Programming}, author = {Naderiparizi, Saeid and {\'S}cibior, Adam and Munk, Andreas and Ghadiri, Mehrdad and Baydin, At{\i}l{\i}m G{\"u}ne{\c{s}} and Gram-Hansen, Bradley and de Witt, Christian Schroeder and Zinkov, Robert and Torr, Philip H.S. and Rainforth, Tom and Teh, Yee Whye and Wood, Frank}, booktitle = {International Conference on Probabilistic Programming (PROBPROG 2020), Cambridge, MA, United States}, year = {2020}, url = {https://probprog.cc/} }
@inproceedings{munk-2020-deep, title = {Deep Probabilistic Surrogate Networks for Universal Simulator Approximation}, author = {Munk, Andreas and Ścibior, Adam and Baydin, Atılım Güneş and Stewart, Andrew and Fernlund, Goran and Poursartip, Anoush and Wood, Frank}, booktitle = {International Conference on Probabilistic Programming (PROBPROG 2020), Cambridge, MA, United States}, year = {2020}, url = {https://probprog.cc/} }
@inproceedings{harvey-2020-attention, title = {Attention for Inference Compilation}, author = {Harvey, William and Munk, Andreas and Baydin, Atılım Güneş and Bergholm, Alexander and Wood, Frank}, booktitle = {International Conference on Probabilistic Programming (PROBPROG 2020), Cambridge, MA, United States}, year = {2020}, url = {https://probprog.cc/} }
Satellite imaging is a critical technology for monitoring and responding to natural disasters such as flooding. Despite the capabilities of modern satellites, there is still much to be desired from the perspective of first response organisations like UNICEF. Two main challenges are rapid access to data, and the ability to automatically identify flooded regions in images. We describe a prototypical flood segmentation system that could be deployed on a constellation of small satellites, performing processing on board to reduce downlink bandwidth by 2 orders of magnitude. We target PhiSat-1, part of the FSSCAT mission, which is planned to be launched by the European Space Agency (ESA) near the start of 2020 as a proof of concept for this new technology.
@inproceedings{mateogarcia-2019-orbital, title = {Flood Detection On Low Cost Orbital Hardware}, author = {{Mateo-Garcia}, Gonzalo and Oprea, Silviu and Smith, Lewis and {Veitch-Michaelis}, Joshua and Baydin, Atılım Güneş and Backes, Dietmar}, booktitle = {Artificial Intelligence for Humanitarian Assistance and Disaster Response Workshop, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada}, year = {2019} }
We introduce a new method for performing inference in Bayesian neural networks (BNNs) using Hamiltonian Monte Carlo (HMC). We show how the previously introduced semi-separable HMC sampling scheme can be adapted to BNNs, which allows us to integrate over both the parameters and hyperparameters. We derive a suitable Riemannian metric for the BNN hyperparameters and show that it is positive definite. Our work is compared to both Monte Carlo dropout and a deterministic neural network, where our inference technique displays better calibrated uncertainties with comparable performance to current baselines. Our code is provided in a new open-source Python package, hamiltorch, which enables our method to scale to CNNs with over 400,000 parameters and take advantage of GPUs.
@inproceedings{cobb-2019-hamiltonian, title = {Semi-separable Hamiltonian Monte Carlo for inference in Bayesian neural networks}, author = {Cobb, Adam D and Baydin, Atılım Güneş and Kiskin, Ivan and Markham, Andrew and Roberts, Stephen}, booktitle = {Fourth workshop on Bayesian Deep Learning (NeurIPS 2019), Vancouver, Canada}, year = {2019} }
We introduce two approaches for conducting efficient Bayesian inference in stochastic simulators containing nested stochastic sub-procedures, i.e., internal procedures such as rejection sampling loops for which the density cannot be calculated directly. Such simulators are standard throughout the sciences and can be interpreted as probabilistic generative models. However, drawing inferences from them poses a substantial challenge due to the inability to evaluate even their unnormalised density. To address this, we introduce inference algorithms based on a two-step procedure where one first tackles the sub-procedures as amortised inference problems, then uses the learned artefacts to construct an approximation of the original unnormalised density that can be used as a target for Markov chain Monte Carlo methods. Because the sub-procedures can be dealt with separately and are lower-dimensional than the overall problem, this two-step process allows them to be isolated and thus be tractably dealt with, without placing restrictions on the overall dimensionality of the problem. We demonstrate the utility of our methods on a simple, artificially constructed simulator.
@inproceedings{gramhansen-2019-efficient, author = {Gram-Hansen, Bradley and de Witt, Christian Schroeder and Zinkov, Robert and Naderiparizi, Saeid and Scibior, Adam and Munk, Andreas and Wood, Frank and Ghadiri, Mehrdad and Torr, Philip and Teh, Yee Whye and Baydin, Atılım Güneş and Rainforth, Tom}, booktitle = {Second Symposium on Advances in Approximate Bayesian Inference (AABI), Vancouver, Canada, 8 December 2019}, title = {Efficient Bayesian Inference for Nested Simulators}, year = {2019} }
This discussion paper presents a conversation between researchers having active interests in the usability of probabilistic programming languages (PPLs), but coming from a wide range of technical and research perspectives. Although PPL development is currently a vigorous and active research field, there has been very little attention to date to basic questions in the psychology of programming. Relevant issues include mental models associated with Bayesian probability, end-user applications of PPLs, the potential for data-first interaction styles, visualisation of model structure and solver behaviour, and many others. We look forward to further discussion with delegates at the PPIG workshop.
@inproceedings{blackwell-2019-usability, author = {Blackwell, Alan and Kohn, Tobias and Erwig, Martin and Baydin, Atılım Güneş and Church, Luke and Geddes, James and Gordon, Andy and Gorinova, Maria and Gram-Hansen, Bradley and Lawrence, Neil and Mansinghka, Vikash and Paige, Brooks and Petricek, Tomas and Robinson, Diana and Sarkar, Advait and Strickson, Oliver}, booktitle = {Psychology of Programming Interest Group Annual Workshop (PPIG 2019), Newcastle, UK, 28--30 August 2019}, title = {Usability of Probabilistic Programming Languages}, year = {2019} }
Epidemiology simulations have become a fundamental tool in the fight against the epidemics of various infectious diseases like AIDS and malaria. However, the complicated and stochastic nature of these simulators can mean their output is difficult to interpret, which reduces their usefulness to policymakers. In this paper, we introduce an approach that allows one to treat a large class of population-based epidemiology simulators as probabilistic generative models. This is achieved by hijacking the internal random number generator calls, through the use of a universal probabilistic programming system (PPS). In contrast to other methods, our approach can be easily retrofitted to simulators written in popular industrial programming frameworks. We demonstrate that our method can be used for interpretable introspection and inference, thus shedding light on black-box simulators. This reinstates much-needed trust between policymakers and evidence-based methods.
@inproceedings{gramhansen-2019-hijacking, author = {{Gram-Hansen}, Bradley and Schroeder, Christian and Torr, Philip H.S. and Teh, Yee Whye and Rainforth, Tom and Baydin, Atılım Güneş}, booktitle = {ICML Workshop on AI for Social Good, Thirty-sixth International Conference on Machine Learning (ICML 2019), Long Beach, CA, US}, title = {Hijacking Malaria Simulators with Probabilistic Programming}, year = {2019} }
Model-agnostic meta-learning (MAML) is a meta-learning technique to train a model on a multitude of learning tasks in a way that primes the model for few-shot learning of new tasks. The MAML algorithm performs well on few-shot learning problems in classification, regression, and fine-tuning of policy gradients in reinforcement learning, but comes with the need for costly hyperparameter tuning for training stability. We address this shortcoming by introducing an extension to MAML, called Alpha MAML, to incorporate an online hyperparameter adaptation scheme that eliminates the need to tune meta-learning and learning rates. Our results with the Omniglot database demonstrate a substantial reduction in the need to tune MAML training hyperparameters and improvement to training stability with less sensitivity to hyperparameter choice.
@inproceedings{behl-2019-alphamaml, author = {Behl, Harkirat and Baydin, Atılım Güneş and Torr, Philip H.S.}, booktitle = {6th ICML Workshop on Automated Machine Learning, Thirty-sixth International Conference on Machine Learning (ICML 2019), Long Beach, CA, US}, title = {Alpha MAML: Adaptive Model-Agnostic Meta-Learning}, year = {2019} }
Breakthroughs in our understanding of physical phenomena have traditionally followed improvements in instrumentation. Studies of the magnetic field of the Sun, and its influence on the solar dynamo and space weather events, have benefited from improvements in resolution and measurement frequency of new instruments. However, in order to fully understand the solar cycle, high-quality data across time-scales longer than the typical lifespan of a solar instrument are required. At the moment, discrepancies between measurement surveys prevent the combined use of all available data. In this work, we show that machine learning can help bridge the gap between measurement surveys by learning to super-resolve low-resolution magnetic field images and translate between characteristics of contemporary instruments in orbit. We also introduce the notion of physics-based metrics and losses for super-resolution to preserve underlying physics and constrain the solution space of possible super-resolution outputs.
@inproceedings{jungbluth-2019-super, title = {Single-Frame Super-Resolution of Solar Magnetograms: Investigating Physics-Based Metrics \& Losses}, author = {Jungbluth, Anna and Gitiaux, Xavier and Maloney, Shane and Shneider, Carl and Wright, Paul and Baydin, Atılım Güneş and Deudon, Michel and Kalaitzis, Alfredo and Gal, Yarin and Munoz-Jaramillo, Andres}, booktitle = {Second Workshop on Machine Learning and the Physical Sciences (NeurIPS 2019), Vancouver, Canada}, year = {2019} }
Machine learning techniques have been successfully applied to super-resolution tasks on natural images where visually pleasing results are sufficient. However, in many scientific domains this is not adequate, and estimations of errors and uncertainties are crucial. To address this issue, we propose a Bayesian framework that decomposes uncertainties into epistemic and aleatoric uncertainties. We test the validity of our approach by super-resolving images of the Sun’s magnetic field and by generating maps measuring the range of possible high-resolution explanations compatible with a given low-resolution magnetogram.
@inproceedings{gitiaux-2019-probabilistic, title = {Probabilistic Super-Resolution of Solar Magnetograms: Generating Many Explanations and Measuring Uncertainties}, author = {Gitiaux, Xavier and Maloney, Shane and Jungbluth, Anna and Shneider, Carl and Baydin, Atılım Güneş and Wright, Paul J. and Gal, Yarin and Deudon, Michel and Kalaitzis, Alfredo and {Munoz-Jaramillo}, Andres}, booktitle = {Fourth workshop on Bayesian Deep Learning (NeurIPS 2019), Vancouver, Canada}, year = {2019} }
Understanding and monitoring the complex and dynamic processes of the Sun is important for a number of human activities on Earth and in space. For this reason, NASA’s Solar Dynamics Observatory (SDO) has been continuously monitoring the Sun’s multi-layered atmosphere in high resolution since its launch in 2010, generating terabytes of observational data every day. The synergy between machine learning and this enormous amount of data has the potential, still largely unexploited, to advance our understanding of the Sun and extend the capabilities of heliophysics missions. In the present work, we show that deep learning applied to SDO data can be successfully used to create a high-fidelity “virtual telescope” that generates synthetic observations of the solar corona by image translation. Towards this end we developed a deep neural network, structured as an encoder-decoder with skip connections (U-Net), that reconstructs the Sun’s image of one instrument channel given temporally aligned images in three other channels. The approach we present has the potential to reduce the telemetry needs of SDO, enhance the capabilities of missions that have fewer observing channels, and transform the concept development of future missions.
@inproceedings{salvatelli-2019-virtual, title = {Using U-Nets to create high-fidelity virtual observations of the solar corona}, author = {Salvatelli, Valentina and Bose, Souvik and Neuberg, Brad and {Guedes dos Santos}, Luiz F. and Cheung, Mark and Janvier, Miho and Baydin, Atılım Güneş and Gal, Yarin and Jin, Meng}, booktitle = {Second Workshop on Machine Learning and the Physical Sciences (NeurIPS 2019), Vancouver, Canada}, year = {2019} }
As a part of NASA’s Heliophysics System Observatory (HSO) fleet of satellites, the Solar Dynamics Observatory (SDO) has continuously monitored the Sun since 2010. Ultraviolet (UV) and Extreme UV (EUV) instruments in orbit, such as SDO’s Atmospheric Imaging Assembly (AIA) instrument, suffer time-dependent degradation which reduces instrument sensitivity. Accurate calibration for (E)UV instruments currently depends on periodic sounding rockets, which are infrequent and not practical for heliophysics missions in deep space. In the present work, we develop a Convolutional Neural Network (CNN) that auto-calibrates SDO/AIA channels and corrects sensitivity degradation by exploiting spatial patterns in multi-wavelength observations to arrive at a self-calibration of (E)UV imaging instruments. Our results remove a major impediment to developing future HSO missions of the same scientific caliber as SDO but in deep space, able to observe the Sun from more vantage points than just SDO’s current geosynchronous orbit. This approach can be adopted to perform autocalibration of other imaging systems exhibiting similar forms of degradation.
@inproceedings{neuberg-2019-autocalibration, title = {Auto-Calibration of Remote Sensing Solar Telescopes with Deep Learning}, author = {Neuberg, Brad and Bose, Souvik and Salvatelli, Valentina and {Guedes dos Santos}, Luiz F. and Cheung, Mark and Janvier, Miho and Baydin, Atılım Güneş and Gal, Yarin and Jin, Meng}, booktitle = {Second Workshop on Machine Learning and the Physical Sciences (NeurIPS 2019), Vancouver, Canada}, year = {2019} }
High-energy particles originating from solar activity travel along the Earth’s magnetic field and interact with the atmosphere around the higher latitudes. These interactions often manifest as aurora in the form of visible light in the Earth’s ionosphere. These interactions also result in irregularities in the electron density, which cause disruptions in the amplitude and phase of the radio signals from the Global Navigation Satellite Systems (GNSS), known as “scintillation”. In this paper we use a multi-scale residual autoencoder (Res-AE) to show the correlation between specific dynamic structures of the aurora and the magnitude of the GNSS phase scintillations (σφ). Auroral images are encoded in a lower-dimensional feature space using the Res-AE, and the resulting features are clustered with t-SNE and UMAP. Both methods produce similar clusters, and specific clusters demonstrate greater correlations with observed phase scintillations. Our results suggest that specific dynamic structures of auroras are highly correlated with GNSS phase scintillations.
@inproceedings{lamb-2019-gnss, title = {Correlation of Auroral Dynamics and GNSS Scintillation with an Autoencoder}, author = {Lamb, Kara and Malhotra, Garima and Vlontzos, Athanasios and Wagstaff, Edward and Baydin, Atılım Güneş and Bhiwandiwalla, Anahita and Gal, Yarin and Kalaitzis, Alfredo and Reina, Anthony and Bhatt, Asti}, booktitle = {Second Workshop on Machine Learning and the Physical Sciences (NeurIPS 2019), Vancouver, Canada}, year = {2019} }
A Global Navigation Satellite System (GNSS) uses a constellation of satellites around the Earth for accurate navigation, timing, and positioning. Natural phenomena like space weather introduce irregularities in the Earth’s ionosphere, disrupting the propagation of the radio signals that GNSS relies upon. Such disruptions affect both the amplitude and the phase of the propagated waves. No physics-based model currently exists to predict the time and location of these disruptions with sufficient accuracy and at relevant scales. In this paper, we focus on predicting the phase fluctuations of GNSS radio waves, known as phase scintillations. We propose a novel architecture and loss function to predict 1 hour in advance the magnitude of phase scintillations within a time window of ±5 minutes with state-of-the-art performance.
@inproceedings{lamb-2019-prediction, title = {Prediction of GNSS Phase Scintillations: A Machine Learning Approach}, author = {Lamb, Kara and Malhotra, Garima and Vlontzos, Athanasios and Wagstaff, Edward and Baydin, Atılım Güneş and Bhiwandiwalla, Anahita and Gal, Yarin and Kalaitzis, Alfredo and Reina, Anthony and Bhatt, Asti}, booktitle = {Second Workshop on Machine Learning and the Physical Sciences (NeurIPS 2019), Vancouver, Canada}, year = {2019} }
Over the past decade, the study of extrasolar planets has evolved rapidly from plain detection and identification to comprehensive categorization and characterization of exoplanet systems and their atmospheres. Atmospheric retrieval, the inverse modeling technique used to determine an exoplanetary atmosphere’s temperature structure and composition from an observed spectrum, is both time-consuming and compute-intensive, requiring complex algorithms that compare thousands to millions of atmospheric models to the observational data to find the most probable values and associated uncertainties for each model parameter. For rocky, terrestrial planets, the retrieved atmospheric composition can give insight into the surface fluxes of gaseous species necessary to maintain the stability of that atmosphere, which may in turn provide insight into the geological and/or biological processes active on the planet. These atmospheres contain many molecules, some of them biosignatures, spectral fingerprints indicative of biological activity, which will become observable with the next generation of telescopes. Runtimes of traditional retrieval models scale with the number of model parameters, so as more molecular species are considered, runtimes can become prohibitively long. Recent advances in machine learning (ML) and computer vision offer new ways to reduce the time to perform a retrieval by orders of magnitude, given a sufficient data set to train with. Here we present an ML-based retrieval framework called Intelligent exoplaNet Atmospheric RetrievAl (INARA) that consists of a Bayesian deep learning model for retrieval and a data set of 3,000,000 synthetic rocky exoplanetary spectra generated using the NASA Planetary Spectrum Generator. Our work represents the first ML retrieval model for rocky, terrestrial exoplanets and the first synthetic data set of terrestrial spectra generated at this scale.
@inproceedings{soboczenski-2018-bayesian-exoplanet, title = {Bayesian Deep Learning for Exoplanet Atmospheric Retrieval}, author = {Soboczenski, Frank and Himes, Michael D. and O'Beirne, Molly D. and Zorzan, Simone and Baydin, Atılım Güneş and Cobb, Adam D. and Gal, Yarin and Angerhausen, Daniel and Mascaro, Massimo and Arney, Giada N. and Domagal-Goldman, Shawn D.}, booktitle = {Third workshop on Bayesian Deep Learning (NeurIPS 2018), Montreal, Canada}, year = {2018} }
In this work we present a unified interface and methodology for performing end-to-end gradient-based refinement of pipelines of differentiable machine-learning primitives. This is distinguished from recent interoperability efforts such as the Open Neural Network Exchange (ONNX) format and other language-centric cross-compilation approaches in that the final pipeline does not need to be implemented nor trained in the same language nor cross-compiled into any single language; in other words, primitives may be written and pre-trained in PyTorch, TensorFlow, Caffe, scikit-learn or any of the other popular machine learning frameworks and fine-tuned end-to-end while being executed directly in their host frameworks. Provided primitives expose our proposed interface, it is possible to automatically compose all such primitives and refine them based on an end-to-end loss.
@inproceedings{milutinovic-2017-end-to-end, author = {Milutinovic, Mitar and Baydin, Atılım Güneş and Zinkov, Robert and Harvey, William and Song, Dawn and Wood, Frank and Shen, Wade}, booktitle = {Neural Information Processing Systems (NIPS) 2017 Autodiff Workshop: The Future of Gradient-based Machine Learning Software and Techniques, Long Beach, CA, US, December 9, 2017}, title = {End-to-end Training of Differentiable Pipelines Across Machine Learning Frameworks}, year = {2017} }
We consider the problem of Bayesian inference in the family of probabilistic models implicitly defined by stochastic generative models of data. In scientific fields ranging from population biology to cosmology, low-level mechanistic components are composed to create complex generative models. These models lead to intractable likelihoods and are typically non-differentiable, which poses challenges for traditional approaches to inference. We extend previous work in “inference compilation”, which combines universal probabilistic programming and deep learning methods, to large-scale scientific simulators, and introduce a C++ based probabilistic programming library called CPProb. We successfully use CPProb to interface with SHERPA, a large code-base used in particle physics. Here we describe the technical innovations realized and planned for this library.
@inproceedings{lezcano-2017-improvements-to-inference-compilation, author = {{Lezcano Casado}, Mario and Baydin, Atılım Güneş and {Martinez Rubio}, David and Le, Tuan Anh and Wood, Frank and Heinrich, Lukas and Louppe, Gilles and Cranmer, Kyle and Bhimji, Wahid and Ng, Karen and Prabhat}, booktitle = {Neural Information Processing Systems (NIPS) 2017 workshop on Deep Learning for Physical Sciences (DLPS), Long Beach, CA, US, December 8, 2017}, title = {Improvements to Inference Compilation for Probabilistic Programming in Large-Scale Scientific Simulators}, year = {2017} }
@inproceedings{le-2016-nested-compiled-inference, author = {Le, Tuan Anh and Baydin, Atılım Güneş and Wood, Frank}, booktitle = {Neural Information Processing Systems (NIPS) 2016 Workshop on Bayesian Deep Learning, Barcelona, Spain, December 10, 2016}, title = {Nested Compiled Inference for Hierarchical Reinforcement Learning}, year = {2016} }
Automatic differentiation—the mechanical transformation of numeric computer programs to calculate derivatives efficiently and accurately—dates to the origin of the computer age. Reverse mode automatic differentiation both antedates and generalizes the method of backwards propagation of errors used in machine learning. Despite this, practitioners in a variety of fields, including machine learning, have been little influenced by automatic differentiation, and make scant use of available tools. Here we review the technique of automatic differentiation, describe its two main modes, and explain how it can benefit machine learning practitioners. To reach the widest possible audience our treatment assumes only elementary differential calculus, and does not assume any knowledge of linear algebra.
@inproceedings{baydin-2014-ad-machinelearning, author = {Baydin, Atılım Güneş and Pearlmutter, Barak A.}, booktitle = {AutoML Workshop, International Conference on Machine Learning (ICML), Beijing, China, June 21--26, 2014}, title = {Automatic differentiation of algorithms for machine learning}, year = {2014} }
Galactic cosmic rays (GCRs) and solar energetic particles (SEPs) constitute the space radiation environment. Earth-orbiting spacecraft in low Earth orbit (LEO) are protected from these harsh environments by the Earth’s magnetosphere (except during the brief time they pass through the Van Allen Belts). However, deep space explorations such as manned lunar and Mars missions pose significant health risks due to the biological effects of these ionizing radiations in regions beyond low Earth orbit (BLEO). While there exist many physics-based, empirical and machine learning (ML) models (e.g. the SEP scoreboard at NASA CCMC), accurate predictions of radiation levels with sufficient lead time remain a challenge. The “Forecasting Radiation Exposure for Human Space Flight” team of the 2024 FDL-X Heliolab AI research program has developed ML models for predicting the radiation dose rates using various solar data, including full-disk images, and the in situ measurements of absorbed radiation doses. In this presentation, we discuss the ML methods and the results obtained using these models.
@inproceedings{poduval-2024-simulation, title = {Machine Learning Models for Radiation Exposure Prediction Using Solar Data for Space Exploration Beyond Low Earth Orbit}, author = {Poduval, Bala and Massara, Elena and Gurav, Rutuja and Song, Xiaomei and Sinclair, Kimberly and Brown, Edward and Kusner, Matt and Baydin, {Atılım Güneş}}, booktitle = {American Geophysical Union (AGU) Annual Meeting, December 9--13, 2024}, year = {2024}, url = {https://agu.confex.com/agu/agu24/meetingapp.cgi/Paper/1705954} }
The acceleration and transport of solar energetic particles (SEPs) in the heliosphere, and the role of the suprathermal seed particle spectrum, still remain open questions in heliophysics. These high-energy particles are one of the major components of the space radiation environment, the other being galactic cosmic rays, and therefore accurate predictions of their occurrence, intensity and duration are critical in the mitigation of their adverse effects. However, SEP prediction based on first-principles models still remains a challenge. While machine learning approaches to SEP modeling and prediction seem promising, the lack of a balanced database of SEP events restrains them. Though this limitation could be overcome to a certain extent by simulated SEP events, generating data sets large enough for training and validation (tens to hundreds of thousands of events) using physics-based models such as the Energetic Particle Radiation Environment Module (EPREM) is practically impossible because of the large computational overheads of such models. In this scenario, we developed neural network-based surrogate models, EPREM-S, that reproduce the output of EPREM with great accuracy while being hundreds of thousands of times faster. These models, in addition to enabling fast simulations of synthetic SEP events, make simulation-based inference workflows practicable in SEP studies while providing predictive uncertainty estimates using a deep ensemble approach. In event analyses of several synthetic events unseen during the training of EPREM-S, we found that the ground truth values of all input parameters were recovered as the mode of the posterior. Encouraged by these results, we carried out similar analyses on several observed SEP events. Significant results of these analyses are presented here.
@inproceedings{poduval-2024-simulatioo, title = {Simulation Based Inference to Study Solar Energetic Particle Acceleration and Transport}, author = {Poduval, Bala and Baydin, {Atılım Güneş} and Schwadron, Nathan}, booktitle = {American Geophysical Union (AGU) Annual Meeting, December 9--13, 2024}, year = {2024}, url = {https://agu.confex.com/agu/agu24/meetingapp.cgi/Paper/1721987} }
Accurately estimating spacecraft location is of crucial importance for a variety of safety-critical tasks in low-Earth orbit (LEO), including satellite collision avoidance and re-entry. The major source of uncertainty in LEO trajectory calculations is the variable drag force imposed by changes in thermospheric density in response to space weather. Current empirical and physics-based models, as well as many machine learning (ML) approaches, rely on daily solar irradiance and geophysical activity proxy indices as inputs, limiting their ability to capture the dynamic complexity of the system response to transitory solar flares and geomagnetic storms. NASA’s Solar Dynamics Observatory (SDO) has been continuously capturing data since 2010, providing high-resolution extreme ultraviolet (EUV) and magnetic field images that have recently been pre-processed into an ML-ready dataset (SDOML). In this work, based on a previously developed ML thermospheric density model (Karman), we process the SDOML images via a sigma-variational autoencoder to include embeddings of 12 EUV and magnetic field channels at a nominal 6-minute cadence. The model uses these as base-level irradiance drivers instead of the proxy indices, greatly improving temporal resolution and enabling accurate nowcasting of the short-term density response to solar flares. We validate the model against CHAMP, GRACE, and GOCE thermospheric density measurements to show that it achieves mean absolute percentage error values comparable to or better than existing empirical models such as JB08 and MSIS.
@inproceedings{berger-2023-solar, title = {Incorporating Direct EUV Irradiance from Solar Images into Thermospheric Density Modelling with Machine Learning}, author = {Berger, Thomas E. and Malik, Shreshth and Walsh, James and Acciarini, Giacomo and Baydin, {Atılım Güneş}}, booktitle = {American Geophysical Union (AGU) Annual Meeting, December 11--15, 2023}, year = {2023}, url = {https://agu.confex.com/agu/fm23/meetingapp.cgi/Paper/1403802} }
@inproceedings{mateogarcia-2023-onboardcloud, title = {Onboard cloud detection and atmospheric correction with deep learning emulators}, author = {{Mateo-García}, Gonzalo and Aybar, Cesar and Růžička, Vít and Acciarini, Giacomo and Baydin, Atılım Güneş and Meoni, Gabriele and Longépé, Nicolas and Parr, James and {Gómez-Chova}, Luis}, booktitle = {International Geoscience and Remote Sensing Symposium, July 16--21, 2023, Pasadena, CA}, year = {2023}, url = {https://2023.ieeeigarss.org/} }
@inproceedings{mehta-2022-simulation, title = {Simulating Social Networks and Disinformation}, author = {Mehta, Swapneel and State, Bogdan and Bonneau, Richard and Nagler, Jonathan and Torr, Philip and Baydin, Atılım Güneş}, booktitle = {Misinformation Village co-hosted by {MisinfoCon}, {DEFCON} 30, August 12--13, Las Vegas, NV, USA}, year = {2022}, url = {https://defcon.misinfocon.com/} }
The search for life beyond Earth is complicated by the lack of a consensus on what life is – especially when considering potential forms of life not resembling anything known on Earth. Agnostic means of assessing samples for evidence of life are needed to address this challenge. Information encoded within the atoms and bonds of a molecule can be used to generate agnostic metrics of complexity. The distributions of complexity metrics for chemical mixtures involving biological processes have been hypothesized to be different from those produced by abiotic or prebiotic chemical reactions (Marshall et al. 2021). Complexity metrics, rooted in Shannon Entropy (Bertz 1981; Böttcher 2016) and Assembly Theory (Marshall et al. 2017, 2021), rely on knowledge of the precise structures of molecules and time-consuming human-expert-based analysis decoupled from real-time instrumental measurements onboard robotic missions. In addition, leveraging these metrics requires intensive – often intractable – computations that are infeasible for real-time, on-probe investigations. We propose light-weight, flexible neural network models, trainable from publicly available datasets, that can be employed to predict molecular structures and their complexity metrics from mass spectra. We show that with careful selection of datasets, the ML-based approach can learn characteristics of experimental data and digital representations of molecules. This enables rapid, accurate prediction of molecular complexity from mass spectra. Such data pipelines may open new doors for critical robotic missions where autonomous decision-making is required, empowering rapid biosignature screening tasks and in situ fingerprinting of prebiotic molecular reaction networks.
@inproceedings{gong-2022-molecular, title = {Molecular Complexity to Biosignatures: A Machine Learning Pipeline that Connects Mass Spectrometry to Molecular Synthesis and Reaction Networks}, author = {Gong, Jian and Bell, Aaron C. and Gebhard, Timothy and Hastings, Jaden J.A. and Baydin, Atılım Güneş and Warren-Rhodes, Kimberly and Phillips, Michael and Fricke, Matthew and Cabrol, Nathalie A. and Sandford, Scott A. and Mascaro, Massimo}, booktitle = {American Geophysical Union (AGU) Fall Meeting, December 12--16, 2022}, year = {2022}, url = {https://agu.confex.com/agu/fm22/meetingapp.cgi/Paper/1186669} }
The ability to analyze and compare the structure of every known molecule, let alone molecules not yet encountered, and to predict all the possible synthesis pathways for building ever more complex molecules at the atomic scale is a bottleneck spanning multiple disciplines. These span the fundamental and applied sciences – from organic synthesis of novel pharmaceuticals to detecting biosignatures on distant planets. Fundamental to this effort is the identification and standardization of key features of complexity and the generation of datasets optimized for machine learning methods. Connecting molecules and their complexity measures within vast chemical synthesis and reaction networks is similarly promising. The Molecular Complexity Consortium (MCC) – a working group of subject matter experts across academic, government, and commercial sectors – advances both applied and theoretical research in molecular complexity. We argue for key shared objectives for unlocking the vast potential of ML-driven modeling of molecular complexity: the requisite standardization of features, generation of well-curated training datasets, and optimization of computation by ML method selection. Here we offer an overview of the field of molecular complexity, from methods of mathematical modeling to forming a notion of molecular signatures, and pose a call to action as we seek out new avenues for collaboration in this exciting emergent field.
@inproceedings{hastings-2022-molecular, title = {Modeling Molecular Complexity: Building a Novel Multidisciplinary Machine Learning Framework to Understand Molecular Synthesis and Signatures}, author = {Hastings, Jaden J.A. and Bell, Aaron C. and Gebhard, Timothy and Gong, Jian and Baydin, Atılım Güneş and Fricke, Matthew and Mascaro, Massimo and Phillips, Michael and Warren-Rhodes, Kimberly and Cabrol, Nathalie A.}, booktitle = {American Geophysical Union (AGU) Fall Meeting, December 12--16, 2022}, year = {2022}, url = {https://agu.confex.com/agu/fm22/meetingapp.cgi/Paper/1200601} }
The Solar Dynamics Observatory (SDO), a NASA mission that has been producing terabytes of observational data every day for more than ten years, has been used as a use case to demonstrate the potential of particular methodologies and pave the way for future deep-space mission planning. In deep space, multispectral high-resolution missions like SDO would face two major challenges: (1) a low rate of telemetry and (2) constrained hardware (i.e., a limited number of observational channels). This project investigates the potential, and the limitations, of using a deep learning approach to reduce data transmission needs and data latency of a multi-wavelength satellite instrument. Namely, we use multi-channel data from the SDO’s Atmospheric Imaging Assembly (AIA) to show how self-supervised deep learning models can be used to synthetically produce, via image-to-image translation, images of the solar corona, and how this can be leveraged to reduce the downlink requirements of similar space missions. In this regard, we focus on encoder-decoder based architectures and we study how morphological traits and brightness of the solar surface affect the neural network predictions. We also investigate the limitations that these virtual observations might have and the impact on science. Finally, we discuss how the method we propose can be used to create a data transmission schema that is both efficient and automated.
@inproceedings{salvatelli-2021-selfsupervised, title = {Self-supervised Deep Learning for Reducing Data Transmission Needs in Multi-Wavelength Space Instruments: a case study based on the Solar Dynamics Observatory}, author = {Salvatelli, Valentina and {Guedes dos Santos}, Luiz Fernando and Cheung, Mark and Bose, Souvik and Neuberg, Brad and Janvier, Miho and Jin, Meng and Gal, Yarin and Baydin, Atılım Güneş}, booktitle = {American Geophysical Union (AGU) Fall Meeting, December 13--17, 2021}, year = {2021}, url = {https://agu.confex.com/agu/fm21/meetingapp.cgi/Paper/984065} }
Over the past 50 years, a variety of instruments have obtained images of the Sun’s magnetic field (magnetograms) to study its origin and evolution. While improvements in instrumentation have led to breakthroughs in our understanding of physical phenomena, differences between subsequent instruments such as resolution, noise, and saturation levels all introduce inhomogeneities into long-term data sets. This has proven to be an insurmountable obstacle for research applications that require high-resolution and homogeneous data spanning time frames longer than the lifetime of a single instrument. Here we show that deep-learning-based super-resolution techniques can successfully up-sample and homogenize solar magnetic field images obtained both by space and ground-based instruments. In particular, we show the results of cross-calibrating and super-resolving MDI and GONG magnetograms to the characteristics of HMI. We also discuss the importance of agreeing on a standardized set of training, validation, and test data, as well as metrics that enable the community to benchmark different approaches to collectively and quantitatively identify the best practices. This includes distributing test data within the broad heliophysics community. Finally, we discuss our approach for making an empirical estimation of uncertainty and the importance that uncertainty estimation plays in the credibility and usefulness of deep learning applications in heliophysics.
@article{munozjaramillo-2021-crosscalibration, title = {Cross-calibration, super-resolution, and uncertainty estimation of the conversion of {MDI} and {GONG} to {HMI} full-disk magnetograms using deep learning}, author = {{Muñoz-Jaramillo}, Andrés and Jungbluth, Anna and Gitiaux, Xavier and Wright, Paul J. and Shneider, Carl and Maloney, Shane A. and Kalaitzis, Freddie and Baydin, {Atılım Güneş} and Gal, Yarin and Deudon, Michel}, journal = {Bulletin of the AAS}, number = {6}, volume = {53}, year = {2021}, month = jun, url = {https://baas.aas.org/pub/2021n6i123p03} }
Exoplanet atmospheres are characterized via retrieval, the inverse modeling method where atmospheric properties are determined based on the exoplanet’s observed spectrum. To determine the posterior probabilities of model parameters consistent with the data, a Bayesian framework proposes atmospheric models, calculates the theoretical spectra corresponding to the models via radiative transfer (RT), and compares the spectra with the observed spectrum. This typically requires thousands to millions of evaluated models, with each taking on the order of a second for RT. While recent machine-learning approaches to retrieval reduce the compute cost to minutes or less, they do so at the cost of reduced posterior accuracy. Here we present a novel machine-learning assisted retrieval approach which replaces the RT code with a neural network surrogate model to significantly reduce the compute cost of RT simulations, while retaining the Bayesian framework. Using emission data of HD 189733 b, we demonstrate close agreement between this method and that of the Bayesian Atmospheric Radiative Transfer (BART) code (mean Bhattacharyya coefficient of 0.9925 between 1D marginalized posteriors). This approach is 9x faster per parallel evaluation than BART when using an AMD EPYC 7402P central processing unit (CPU), and it is 90–180x faster per parallel evaluation when using an NVIDIA Titan Xp graphics processing unit than BART on that CPU.
@inproceedings{himes-2021-surrogate, title = {Neural Network Surrogate Models for Fast Bayesian Inference: Application to Exoplanet Atmospheric Retrieval}, author = {Himes, Michael D. and Harrington, Joseph and Cobb, Adam D. and Soboczenski, Frank and O'Beirne, Molly D. and Zorzan, Simone and Wright, David C. and Scheffer, Zacchaeus and Domagal-Goldman, Shawn D. and Arney, Giada N. and Baydin, Atılım Güneş}, booktitle = {Applications of Statistical Methods and Machine Learning in the Space Sciences, 17--21 May 2021}, year = {2021} }
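The abstract above reports agreement between two retrieval methods as a mean Bhattacharyya coefficient over 1D marginalized posteriors. As a hedged sketch of what such a comparison could look like, the snippet below estimates the coefficient from two sets of posterior samples using shared histogram bins. The function name `bhattacharyya_coefficient`, the binning scheme, and the default of 20 bins are assumptions made for illustration, not details taken from the paper.

```python
import math


def bhattacharyya_coefficient(samples_a, samples_b, bins=20):
    """Histogram estimate of BC = sum_i sqrt(p_i * q_i) for two 1D sample sets."""
    lo = min(min(samples_a), min(samples_b))
    hi = max(max(samples_a), max(samples_b))
    # Fall back to a unit bin width if all samples are identical.
    width = (hi - lo) / bins or 1.0

    def histogram(samples):
        counts = [0] * bins
        for s in samples:
            # Clamp so that the maximum value lands in the last bin.
            idx = min(int((s - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(samples) for c in counts]

    p = histogram(samples_a)
    q = histogram(samples_b)
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


# Hypothetical posterior samples for one parameter from two methods.
method_1 = [0.10, 0.12, 0.15, 0.11, 0.13, 0.14]
method_2 = [0.11, 0.12, 0.14, 0.10, 0.13, 0.15]
print(bhattacharyya_coefficient(method_1, method_2))
```

A value of 1 indicates identical binned distributions and 0 indicates no overlap; averaging the coefficient over all model parameters would give a single summary number of the kind quoted in the abstract.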
Over the past 50 years, a variety of instruments have obtained images of the Sun’s magnetic field (magnetograms) to study its origin and evolution. While improvements in instrumentation have led to breakthroughs in our understanding of physical phenomena, differences between subsequent instruments such as resolution, noise, and saturation levels all introduce inhomogeneities into long-term data sets. This poses a significant issue for research applications that require high-resolution and homogeneous data spanning time frames longer than the lifetime of a single instrument. As super-resolution is an ill-posed problem, multiple super-resolution outputs can explain a low-resolution input. Classical methods, such as bicubic upsampling, use only the information contained in the low-resolution image. However, in recent years it has been shown that a learning-based approach can constrain the non-trivial solution space by exploiting regularities within a specific distribution of images. In this work, we cross-calibrate and super-resolve magnetic field data obtained by the Michelson Doppler Imager (MDI; 1024 x 1024 px) and the Helioseismic and Magnetic Imager (HMI; 4096 x 4096 px). These instruments overlap from 2010 to 2011, resulting in approximately 9000 co-temporal observations of the same physical structures. Our deep learning model is trained on a subset of the overlapping data after initial pre-processing to correct for temporal and orbital differences between the instruments. We evaluate the quality of the predictive output of the model with a series of performance metrics. These metrics include the distribution of the magnetic field and physical properties captured by the signed/unsigned field. Our approach also needs to quantify the certainty of predictions to be valuable to scientists. To address this, we estimate the posterior distribution of the super-resolved magnetic field by introducing Monte Carlo dropouts on each convolutional layer.
@inproceedings{wright-2020-super2, title = {Super-resolution of Solar Magnetograms}, author = {Wright, Paul James and Gitiaux, Xavier and Jungbluth, Anna and Maloney, Shane and Shneider, Carl and Kalaitzis, Alfredo and Baydin, Atılım Güneş and Deudon, Michel and Gal, Yarin and Munoz-Jaramillo, Andres}, booktitle = {American Geophysical Union (AGU) Fall Meeting, December 1--17, 2020}, year = {2020}, url = {https://agu.confex.com/agu/fm20/webprogram/Paper707966.html} }
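The abstract above estimates predictive uncertainty by keeping dropout active at inference time (Monte Carlo dropout). The toy sketch below shows the general idea on a tiny fully connected network in plain Python: run many stochastic forward passes and summarize them by their mean and standard deviation. The weights, the network shape, and the function names are all hypothetical; the model described in the abstract is convolutional and would normally be built in a deep learning framework.

```python
import random
import statistics


def forward_with_dropout(x, weights_hidden, weights_out, drop_prob, rng):
    """One stochastic forward pass of a 1-input, 1-output toy network."""
    hidden = []
    for w in weights_hidden:
        h = max(0.0, w * x)  # ReLU activation
        keep = rng.random() >= drop_prob
        # Inverted dropout: rescale kept units so the expected output is unchanged.
        hidden.append(h / (1.0 - drop_prob) if keep else 0.0)
    return sum(wo * h for wo, h in zip(weights_out, hidden))


def mc_dropout_predict(x, weights_hidden, weights_out,
                       drop_prob=0.2, passes=200, seed=0):
    """Return the mean and standard deviation over repeated dropout passes."""
    rng = random.Random(seed)
    outputs = [
        forward_with_dropout(x, weights_hidden, weights_out, drop_prob, rng)
        for _ in range(passes)
    ]
    return statistics.mean(outputs), statistics.stdev(outputs)


# Hypothetical fixed weights standing in for a trained model.
weights_hidden = [0.5, -0.3, 0.8, 1.1]
weights_out = [1.0, 0.7, -0.4, 0.9]

mean, std = mc_dropout_predict(2.0, weights_hidden, weights_out)
print(f"prediction = {mean:.3f} +/- {std:.3f}")
```

The spread across passes serves as an approximate, per-prediction uncertainty estimate; with the dropout probability set to zero, every pass returns the same value and the spread collapses to zero.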
Solar activity plays a major role in influencing the interplanetary medium and space weather around us. Understanding the complex mechanisms that govern such a dynamic phenomenon is important and challenging. Remote-sensing instruments onboard heliophysics missions can provide a wealth of information on the Sun’s activity, especially via the measurement of magnetic fields and the emission of light from the multi-layered solar atmosphere. NASA currently operates the Heliophysics System Observatory (HSO) that consists of a fleet of satellites constantly monitoring the Sun, its extended atmosphere, and space environments around the Earth and other planets of the solar system. One of the flagship missions of the HSO is NASA’s Solar Dynamics Observatory (SDO). Launched in 2010, it consists of three instruments: the Atmospheric Imaging Assembly (AIA), the Helioseismic & Magnetic Imager (HMI), and the EUV Variability Experiment (EVE). The SDO has been generating terabytes of observational data every day and has constantly monitored the Sun with the highest temporal and spatial resolution for full-disk observations. Unfortunately, the (E)UV instruments in orbit suffer time-dependent degradation, which reduces instrument sensitivity. Accurate calibration for EUV instruments currently depends on infrequent sounding rocket flights (e.g., for SDO/EVE and SDO/AIA). Since SDO is in a geosynchronous orbit, sounding rockets can be used for calibration, but calibration experiments may not be practical for deep space missions (e.g., STEREO satellites). In the present work, we develop a neural network that auto-calibrates the SDO/AIA channels, correcting sensitivity degradation, by exploiting spatial patterns in multi-wavelength observations to arrive at a self-calibration of (E)UV imaging instruments. This removes a major impediment to developing future HSO missions that can deliver solar observations from different vantage points beyond Earth orbit.
@inproceedings{dossantos-2020-multi, title = {Multi-Channel Auto-Calibration for the Atmospheric Imaging Assembly instrument with Deep Learning}, author = {{Guedes dos Santos}, Luiz Fernando and Bose, Souvik and Salvatelli, Valentina and Neuberg, Brad and Cheung, Mark and Janvier, Miho and Jin, Meng and Gal, Yarin and Boerner, Paul and Baydin, Atılım Güneş}, booktitle = {American Geophysical Union (AGU) Fall Meeting, December 1--17, 2020}, year = {2020}, url = {https://agu2020fallmeeting-agu.ipostersessions.com/Default.aspx?s=58-34-12-15-E8-F1-7E-63-04-54-FB-78-A5-C9-FF-B4&pdfprint=true&guestview} }
@inproceedings{belavin-2020-blackbox, title = {Black-Box Optimization with Local Generative Surrogates}, author = {Belavin, Vladislav and Shirobokov, Sergey and Kagan, Michael Aaron and Ustyuzhanin, Andrey and Baydin, Atılım Güneş}, booktitle = {4th IML Machine Learning Workshop, 19--22 October 2020, Inter-experimental Machine Learning (IML) Working Group, CERN}, year = {2020}, url = {https://indico.cern.ch/event/852553/} }
As part of the NASA Frontier Development Lab, we implemented a parallelized cloud-based exploration strategy to better understand the statistical distributions and properties of potential planetary atmospheres. Starting with a modern-day Earth atmosphere, we iteratively and incrementally simulated a range of atmospheres to infer the landscape of the multi-parameter space, such as the abundances of biologically mediated gases that would yield stable (non-runaway) planetary atmospheres on Earth-like planets around solar-type stars. Our current dataset comprises 124,314 simulated models of Earth-like exoplanet atmospheres and is available publicly on the NASA Exoplanet Archive. Our scalable approach of analysing atmospheres could also help interpret future observations of planetary atmospheres by providing estimates of atmospheric gas fluxes and temperatures as a function of altitude, and thereby enable high-throughput first-order assessment of the potential habitability of exoplanetary surfaces.
@inproceedings{chopra-2020-exoatmosgrid, title = {{EXO-ATMOS}: A scalable grid of hypothetical planetary atmospheres}, author = {Chopra, Aditya and Bell, Aaron and Fawcett, William and Talebi, Rodd and Angerhausen, Daniel and Baydin, Atılım Güneş and Berea, Anamaria and Cabrol, Nathalie A. and Kempes, Chris and Mascaro, Massimo}, booktitle = {Europlanet Science Congress 2020}, year = {2020}, volume = {14}, pages = {EPSC2020-664}, url = {https://meetingorganizer.copernicus.org/EPSC2020/EPSC2020-664.html} }
Over the past 50 years, a variety of instruments have obtained images of the Sun’s magnetic field (magnetograms) to study its origin and evolution. While improvements in instrumentation have led to breakthroughs in our understanding of physical phenomena, differences between subsequent instruments such as resolution, noise, and saturation levels all introduce inhomogeneities into long-term data sets. This poses a significant issue for research applications that require high-resolution and homogeneous data spanning time frames longer than the lifetime of a single instrument. As super-resolution is an ill-posed problem, multiple super-resolution outputs can explain a low-resolution input. Classical methods, such as bicubic upsampling, use only the information contained in the low-resolution image. However, in recent years it has been shown that a learning-based approach can constrain the non-trivial solution space by exploiting regularities within a specific distribution of images. In this work, we cross-calibrate and super-resolve magnetic field data obtained by the Michelson Doppler Imager (MDI; 1024 x 1024 px) and the Helioseismic and Magnetic Imager (HMI; 4096 x 4096 px). These instruments overlap from 2010 to 2011, resulting in approximately 9000 co-temporal observations of the same physical structures. Our deep learning model is trained on a subset of the overlapping data after initial pre-processing to correct for temporal and orbital differences between the instruments. We evaluate the quality of the predictive output of the model with a series of performance metrics. These metrics include the distribution of the magnetic field and physical properties captured by the signed/unsigned field. Our approach also needs to quantify the certainty of predictions to be valuable to scientists. To address this, we estimate the posterior distribution of the super-resolved magnetic field by introducing Monte Carlo dropouts on each convolutional layer.
@inproceedings{wright-2020-super1, title = {Super-resolution of {MDI} (and {GONG}) Magnetograms}, author = {Wright, Paul and Gitiaux, Xavier and Jungbluth, Anna and Maloney, Shane and Shneider, Carl and Kalaitzis, Alfredo and Deudon, Michel and Baydin, Atılım Güneş and Gal, Yarin and Munoz-Jaramillo, Andres}, booktitle = {50th Anniversary Meeting of the Solar Physics Division (SPD) of the American Astronomical Society (AAS)}, year = {2020}, url = {https://aas.org/meetings/spd51} }
Determining an exoplanet’s atmospheric properties from an observed spectrum (atmospheric retrieval) is a time-consuming and compute-intensive inverse modeling technique. It requires complex algorithms that generate many atmospheric models and compare their simulated spectra to the observational data to find the most probable values and associated uncertainties for each model parameter. Retrieval may be the first method to find extraterrestrial life by remotely detecting biosignatures, atmospheric species indicative of biological activity. The work presented here is a result of the NASA Frontier Development Lab Astrobiology Team II. We present an ML-based retrieval framework called Intelligent exoplaNet Atmospheric RetrievAl (INARA) that consists of a Bayesian deep learning model for retrieval and a data set of 3,000,000 synthetic rocky exoplanetary spectra generated using approximately 2,000 high-end VMs and instances of the NASA Planetary Spectrum Generator (PSG). The generated dataset encompasses spectra based on a given planetary system model, where we consider F-, G-, K-, and M-type main sequence stars. Observations are simulated using an instrument model of the Large UltraViolet/Optical/InfraRed Surveyor (LUVOIR). Our work represents the first ML retrieval framework for rocky, terrestrial exoplanets and the first synthetic data set of terrestrial spectra generated at this scale.
@inproceedings{soboczenski-2020-inara, title = {{INARA}: A {Bayesian} Deep Learning Framework for Exoplanet Atmospheric Retrieval}, author = {Soboczenski, Frank and Himes, Michael D. and O'Beirne, Molly D. and Zorzan, Simone and Baydin, Atılım Güneş and Cobb, Adam D. and Gal, Yarin and Angerhausen, Daniel and Mascaro, Massimo and Villanueva, Geronimo and Domagal-Goldman, Shawn D. and Arney, Giada N.}, booktitle = {Second AI and Data Science Workshop for Earth and Space Sciences, Jet Propulsion Laboratory (NASA JPL), Pasadena, CA, United States, March 24--26, 2020}, year = {2020}, url = {https://datascience.jpl.nasa.gov/aiworkshop} }
@inproceedings{shirobokov-2020-differentiating, title = {Differentiating the Black-Box: Optimization with Local Generative Surrogates}, author = {Shirobokov, Sergey and Belavin, Vladislav and Kagan, Michael and Ustyuzhanin, Andrey and Baydin, Atılım Güneş}, booktitle = {Applied Machine Learning Days (AMLD) EPFL, Lausanne, Switzerland, January 25--29, 2020}, year = {2020} }
Machine learning approaches to atmospheric retrieval offer results comparable to traditional numerical approaches in just seconds, compared to hundreds of compute hours. This opens the possibility for fully-3D retrievals to execute in times comparable to traditional approaches. Recently, we developed plan-net, an ensemble of Bayesian neural networks for atmospheric retrieval; we trained plan-net on synthetic Wide Field Camera 3 (WFC3) hot-Jupiter transmission spectra, applied it to the WFC3 spectrum of WASP-12b, and found results consistent with the literature. Here, we present updates to plan-net and expand its application to our 28-parameter data set of simulated LUVOIR spectra of terrestrial exoplanets generated using the NASA Planetary Spectrum Generator. By including both dense dropout and convolutional layers, we find a significant improvement in accuracy. MH and FS acknowledge the support of NVIDIA Corporation for the donation of the Titan Xp GPUs used for this research. AC is sponsored by the AIMS-CDT and EPSRC. AGB is funded by Lawrence Berkeley National Lab and EPSRC/MURI grant EP/N019474/1.
@inproceedings{himes-2020-machine, title = {Machine Learning Retrieval of Jovian and Terrestrial Atmospheres}, author = {Himes, Michael D. and Cobb, Adam D. and Soboczenski, Frank and Zorzan, Simone and O'Beirne, Molly D. and Baydin, Atılım Güneş and Gal, Yarin and Angerhausen, Daniel and Domagal-Goldman, Shawn D. and Arney, Giada N.}, booktitle = {American Astronomical Society meeting \#235, id. 343.01. Bulletin of the American Astronomical Society, Vol. 52, No. 1}, year = {2020}, url = {https://ui.adsabs.harvard.edu/abs/2020AAS...23534301H/abstract} }
Solar activity has a major role in influencing space weather and the interplanetary medium. Understanding the complex mechanisms that govern such a dynamic phenomenon is important and challenging. Remote-sensing instruments on board heliophysics missions can provide a wealth of information on the Sun’s activity, especially via the measurement of magnetic fields and the emission of light from the Sun’s multi-layered atmosphere. Ever since its launch in 2010, NASA’s Solar Dynamics Observatory (SDO) has generated terabytes of observational data every day and has constantly monitored the Sun around the clock with the highest time cadence and spatial resolution for full-disk observations. Using the enormous amount of data SDO provides, this project, developed at NASA’s Frontier Development Lab (FDL 2019), focuses on algorithms that enhance our understanding of the Sun, as well as enhance the observation potential of present and future heliophysics missions with the aid of machine learning. In the present work, we use deep learning to increase the capabilities of NASA’s SDO and focus primarily on two aspects: (1) develop a neural network that auto-calibrates the SDO-AIA channels, which suffer from steady degradation over time; and (2) develop a “virtual telescope” that enlarges the mission’s possibilities by synthetically generating desired EUV channels derived from actual physical equipment flown on other missions. Towards this end, we use a deep neural network structured as an encoder-decoder to artificially generate images in different wavelengths from a limited number of observations. This approach can also benefit other existing missions, as well as the concept development of future missions, that do not have as many observing instruments as SDO.
@inproceedings{cheung-2019-auto, title = {Auto-calibration and reconstruction of {SDO}’s Atmospheric Imaging Assembly channels with Deep Learning}, author = {Cheung, Mark and {Guedes dos Santos}, Luiz Fernando and Bose, Souvik and Neuberg, Brad and Salvatelli, Valentina and Baydin, Atılım Güneş and Janvier, Miho and Jin, Meng}, booktitle = {American Geophysical Union (AGU) Fall Meeting, San Francisco, CA, United States, December 9--13, 2019}, year = {2019}, url = {https://agu.confex.com/agu/fm19/meetingapp.cgi/Paper/628427} }
NASA’s Frontier Development Lab (FDL) is a research accelerator supported by NASA, the SETI Institute and industry partners. Each summer, FDL brings together teams of domain experts and machine learning scientists / engineers to work intensively for eight weeks to tackle some of the biggest challenges in space science, space exploration, and planetary protection. FDL solutions often require the training and deployment of deep neural networks, which are typically carried out on commercially available cloud compute infrastructure contributed by industry partners such as Google Cloud, Intel, IBM and NVIDIA. While FDL teams are co-located during the summer, collaborations persist for many more months, resulting in refereed journal, conference, and workshop publications and/or presentations. In this talk, the mentors of teams at NASA FDL and FDL Europe* will present case studies of how FDL teams use cloud storage and compute technologies for data preparation, rapid prototyping, and for scaling scientific and machine learning workflows to hundreds and thousands of machines. We also discuss how FDL teams use online tools (e.g., GitLab, Slack, Google Docs, Dropbox Papers) to facilitate effective remote collaboration. The domain areas covered in our case studies include astrobiology, exoplanet detection, space weather, lunar exploration and astronaut health monitoring.
@inproceedings{cheung-2019-cloud, title = {Cloud Computing at NASA's Frontier Development Lab}, author = {Cheung, Mark and Munoz-Jaramillo, Andrés and Wright, Paul and Bhatt, Asti and López-Francos, Ignacio and Baydin, Atılım Güneş and Bilinski, Piotr and Angerhausen, Daniel and Janvier, Miho}, booktitle = {Next Generation Cloud Research Infrastructure, Princeton, NJ, United States, November 11--12, 2019}, year = {2019}, url = {https://sites.google.com/view/workshop-on-cloud-cri} }
Atmospheric retrieval, the inverse modeling technique whereby atmospheric properties are inferred from observations, is computationally expensive and time consuming. Recently, machine learning (ML) approaches to atmospheric retrieval have been shown to provide results consistent with traditional approaches in just seconds to minutes. We introduce plan-net, the first ensemble of Bayesian neural networks for atmospheric retrieval. Our novel likelihood function captures parameter correlations, improving uncertainty estimations over standard likelihood functions common in ML. We replicate the results of Marquez-Neila et al. (2018), and we demonstrate plan-net’s improvement in accuracy over their random forest regression tree when applied to their synthetic data set of hot Jupiter WFC3 transmission spectra. We apply a trained plan-net ensemble to the transmission spectrum of WASP-12b and find results generally consistent with the literature. We also apply plan-net to our data set of over 3 million synthetic terrestrial exoplanet spectra generated using the NASA Planetary Spectrum Generator.
@inproceedings{himes-2019-exoplanetary, title = {Exoplanetary Atmospheric Retrieval via Bayesian Machine Learning}, author = {Himes, M. and Cobb, A. and Baydin, A. and Soboczenski, F. and Zorzan, S. and O'Beirne, M. and Arney, G.N. and Domagal-Goldman, S. and Angerhausen, D. and Gal, Y.}, booktitle = {American Astronomical Society Meeting on Extreme Solar Systems IV, Reykjavik, Iceland, August 19--23, 2019}, year = {2019}, url = {https://sites.northwestern.edu/iceland2019/} }
Traditional approaches for determining the atmospheres of exoplanets from telescopic spectral data (i.e., atmospheric retrievals) involve time-consuming and compute-intensive Bayesian sampling methods, requiring a compromise between physical and chemical realism and overall computational feasibility. For rocky, terrestrial exoplanets, the retrieved atmospheric composition can give insight into the surface fluxes of gaseous species necessary to maintain the stability of that atmosphere, which may in turn provide insight into the geological and/or biological processes active on the planet. Machine learning (ML) offers a feasible and reliable approach to expedite the process of atmospheric retrievals; however, ML models require a large data set to train on. Here we present a data set of 3,000,000 simulated atmospheric spectra of rocky, terrestrial exoplanets generated across a broad parameter space of stellar and planetary properties, including 12 molecular species relevant for determining extant life. We then introduce INARA (Intelligent exoplaNet Atmospheric RetrievAl), our ML-based atmospheric retrieval framework. In a matter of seconds, INARA is capable of retrieving accurate concentrations of 12 molecular atmospheric constituents when given an observed spectrum. Our work represents the first large-scale simulated spectral data set and first atmospheric retrieval ML model for rocky, terrestrial exoplanets.
@inproceedings{obeirne-2019-inara, title = {INARA: A Machine Learning Retrieval Framework with a Data Set of 3 Million Simulated Exoplanet Atmospheric Spectra}, author = {O’Beirne, Molly D. and Himes, Michael D. and Soboczenski, Frank and Zorzan, Simone and Cobb, Adam and Baydin, Atılım Güneş and Gal, Yarin and Angerhausen, Daniel and Mascaro, Massimo and Arney, Giada N. and Domagal-Goldman, Shawn D.}, booktitle = {Astrobiology Science Conference (AbSciCon 2019), Bellevue, Washington, June 24--28, 2019}, year = {2019}, url = {https://agu.confex.com/agu/abscicon19/meetingapp.cgi/Paper/481266} }
The NASA Frontier Development Laboratory (FDL) is an annual science accelerator that focuses on applying machine learning and large-scale computing to challenges in space science and exploration. During the 2018 FDL program, we implemented a cloud-based strategy to better understand the statistical distributions of habitable planets and life in the universe and lay out an avenue to characterize the potential role of biological regulation of planetary atmospheres. We simulated a range of atmospheres to infer the landscape of the multi-parameter space, such as the abundances of biologically mediated gases that would yield stable (non-runaway) planetary atmospheres on Earth-like planets around solar-type stars. The dataset of planetary atmospheres we have generated can be used for training machine learning models to bootstrap the ATMOS code. It is an open-source dataset available for the community to understand distributions of habitability parameters such as surface temperatures and free energy available to life on different classes of atmosphere-bearing planets. Our scalable tool, once coupled to a generalized ecosystem model, could help derive estimates of the biologically mediated atmospheric gas fluxes and help constrain the type and the extent of exobiology on exoplanets based on the remotely detected atmospheric compositions.
@inproceedings{chopra-2019-exoatmos, title = {{EXO-ATMOS}: A Scalable Grid of Hypothetical Planetary Atmospheres}, author = {Chopra, Aditya and Bell, Aaron and Fawcett, William and Talebi, Rodd and Angerhausen, Daniel and Baydin, Atılım Güneş and Berea, Anamaria and Cabrol, Nathalie A. and Kempes, Chris and Mascaro, Massimo}, booktitle = {Astrobiology Science Conference (AbSciCon 2019), Bellevue, Washington, June 24--28, 2019}, year = {2019}, url = {https://agu.confex.com/agu/abscicon19/prelim.cgi/Paper/480996} }
@inproceedings{baydin-2015-mloss, author = {Baydin, Atılım Güneş and Pearlmutter, Barak A.}, booktitle = {International Conference on Machine Learning (ICML) Workshop on Machine Learning Open Source Software 2015: Open Ecosystems, Lille, France, July 10, 2015}, title = {DiffSharp: Automatic Differentiation Library}, year = {2015} }
The search for life beyond Earth is complicated by the lack of a consensus as to what "life" actually is, especially if we also want to consider potential forms of life that do not resemble anything that we have encountered thus far on Earth. Various so-called agnostic biosignatures have been proposed already, but one that has received particular attention lately is the concept of molecular complexity. The key hypothesis is that life creates not only more complex molecules, but also greater abundances of complex molecules, than purely abiotic processes. Known challenges with this approach are, for example, that the calculation of molecular complexity metrics can be computationally expensive (depending on the chosen definition of complexity), and that ultimately, we need to be able to actually measure molecular complexity in situ, for example on board a spacecraft probing the surface of another planet. In this FDL 2022 challenge on Astrobiology, we seek to tackle these challenges through the use of machine learning. As a first step, we generate a large dataset of molecules with corresponding complexity scores, which we plan to make publicly available to the community. Using this dataset, we then illustrate the potential benefits of machine learning on two tasks: First, we show that we can learn models that predict the complexity of a molecule from a suitable representation (e.g., a SMILES string) with low relative errors (less than 5% on average) and at a significantly greater speed than existing baselines. Second, we demonstrate that machine learning models can infer the complexity of a molecule directly from its mass spectrum, with a significantly lower error than the existing proof-of-concept from the literature. This is a first step towards measuring molecular complexity in the field, and may help open new doors for critical robotic missions where autonomous decision-making is required.
After all, even if we do not find life beyond Earth, being able to determine the molecular complexity of samples in situ can help inform decisions such as which areas to prioritize for exploration, or which data to send back to Earth for detailed analysis.
@techreport{bell-2022-molecules, title = {Signatures of Life: Learning Features of Prebiotic and Biotic Molecules}, author = {Bell, Aaron C. and Gebhard, Timothy D. and Gong, Jian and Hastings, Jaden J. A. and Baydin, {Atılım Güneş} and Fricke, G. Matthew and Phillips, Michael and {Warren-Rhodes}, Kimberley and Cabrol, Nathalie A. and Mascaro, Massimo and Sanford, Scott}, institution = {{NASA} Frontier Development Lab Technical Memorandum}, year = {2022} }
@techreport{benson-2021-heliophysics, title = {Heliophysics -- Solar Drag: Learning how the Sun affects spacecraft orbits}, author = {Benson, Bernard and Bonasera, Stefano and Brown, Edward and {Pérez-Hernández}, Jorge A. and Jah, Moriba K. and Sutton, Eric and Acciarini, Giacomo and Bridges, Christopher P. and Baydin, Atılım Güneş}, institution = {{NASA} Frontier Development Lab Technical Memorandum}, year = {2021} }
@techreport{mateogarcia-2019-flood, title = {Flood Detection On Low Cost Orbital Hardware}, author = {{Mateo-Garcia}, Gonzalo and Oprea, Silviu and Smith, Lewis and {Veitch-Michaelis}, Joshua and Baydin, Atılım Güneş and Backes, Dietmar}, institution = {{ESA} Frontier Development Lab Technical Memorandum}, year = {2019} }
@techreport{gitiaux-2019-super, title = {Super-resolution Maps of Solar Magnetic Field Covering 40 Years of Space Weather Events}, author = {Gitiaux, Xavier and Jungbluth, Anna and Maloney, Shane and Shneider, Carl and Baydin, Atılım Güneş and {Muñoz-Jaramillo}, Andrés and Wright, Paul}, institution = {{NASA} Frontier Development Lab Technical Memorandum}, year = {2019} }
@techreport{lamb-2019-living, title = {Living With Our Star: Enhanced Predictability of GNSS Disturbances}, author = {Lamb, Kara and Malhotra, Garima and Vlontzos, Athanasios and Wagstaff, Edward and Bhatt, Asti and Baydin, Atılım Güneş and Bhiwandiwalla, Anahita and Gal, Yarin and Kalaitzis, Alfredo and Reina, Tony}, institution = {{NASA} Frontier Development Lab Technical Memorandum}, year = {2019} }
@techreport{bose-2019-expanding, title = {Expanding the capabilities of {NASA}'s Solar Dynamics Observatory}, author = {Bose, Souvik and Neuberg, Brad and Salvatelli, Valentina and {Guedes dos Santos}, Luiz F. and Cheung, Mark and Janvier, Miho and Baydin, Atılım Güneş and Jin, Meng}, institution = {{NASA} Frontier Development Lab Technical Memorandum}, year = {2019} }
@techreport{himes-2018-biohints, title = {From Biohints to Confirmed Evidence of Life: Possible Metabolisms Within Extraterrestrial Environmental Substrates}, author = {Himes, Michael D. and O’Beirne, Molly D. and Soboczenski, Frank and Zorzan, Simone and Baydin, Atılım Güneş and Cobb, Adam and Angerhausen, Daniel and Arney, Giada N. and Domagal-Goldman, Shawn D.}, institution = {{NASA} Frontier Development Lab Technical Memorandum}, year = {2018} }
Analogy plays a fundamental role in problem solving and it lies behind many processes central to human cognitive capacity, to the point that it has been considered "the core of cognition". Analogical reasoning functions through the process of transfer: the use of knowledge learned in one situation in another for which it was not targeted. The case-based reasoning (CBR) paradigm presents a highly related, but slightly different model of reasoning mainly used in artificial intelligence, different in part because analogical reasoning commonly focuses on cross-domain structural similarity whereas CBR is concerned with transfer of solutions between semantically similar cases within one specific domain. In this dissertation, we join these interrelated approaches from cognitive science, psychology, and artificial intelligence, in a CBR system where case retrieval and adaptation are accomplished by the Structure Mapping Engine (SME) and are supported by commonsense reasoning integrating information from several knowledge bases. To enable this, we use a case representation structure that is based on semantic networks. This gives us a CBR model capable of recalling and adapting solutions from seemingly different, but structurally very similar domains, forming one of our contributions in this study. A traditional weakness of CBR systems has been adaptation, where most applications settle for a very simple "reuse" of the solution from the retrieved case, mostly through null adaptation or substitutional adaptation. The difficulty of adaptation is even more obvious for our case of cross-domain CBR using semantic networks. Solving this difficulty paves the way to another contribution of this dissertation, where we introduce a novel generative adaptation technique based on evolutionary computation that enables the spontaneous creation or modification of semantic networks according to the needs of CBR adaptation.
For the evaluation of this work, we apply our CBR system to the problem of mediation, an important method in conflict resolution. The mediation problem is non-trivial and presents a very good real-world example where we can spot structurally similar problems from domains seemingly as far apart as international relations, family disputes, and intellectual rights.
@phdthesis{baydin-2013-phd-thesis, author = {Baydin, Atılım Güneş}, title = {Evolutionary Adaptation in Case-Based Reasoning: An Application to Inter-Domain Analogies for Mediation}, school = {Universitat Autònoma de Barcelona}, year = {2013}, address = {Barcelona, Spain}, doi = {doi:10803/129294} }
This thesis provides a review of the dissipative particle dynamics (DPD) technique, a commonly used mesoscopic simulation tool in computational physics; and an investigation of the feasibility of using evolutionary optimization techniques for the determination of interactions in the DPD model from measurements in atomistic simulations. The text starts with a brief overview of the historical development of particle models to provide a foundation for the discussion of coarse-graining, i.e. the description of a system at a less detailed level by smoothing out fine details that are not relevant for a particular study. Detailed introductions of fundamental computational physics methods are presented, such as molecular dynamics and Monte Carlo simulations, together with their application areas. The DPD technique is introduced, with detailed information about its historical development, interpretation as a mesoscopic model, and application areas. The two parts of the DPD coarse-graining process, i.e. the determination of conservative and dissipative interactions, are discussed. Major existing techniques for DPD coarse-graining are presented, such as the inverse Monte Carlo (IMC) procedure specialized for the determination of conservative interactions from structural observables. The thesis continues with an investigation of the feasibility of using evolutionary computation, a generic optimization approach with its roots in the biological process of evolution, for the determination of interactions in the DPD model, based on fitness measures comparing equilibrium and transport properties of the system with those measured in atomistic simulations. 
Taking the simple point charge water model as a case study, the technique is first used for the determination of conservative interactions from the radial distribution function (with the aim of validating the approach by results from the IMC technique) and after that, for the determination of dissipative interactions based on escape time distributions. The practicality of having relatively long DPD simulations within fitness evaluations of such a procedure is confirmed, also establishing a general framework for applying evolutionary optimization techniques for the determination of functional forms in possibly other models within the field of computational physics.
@mastersthesis{baydin-2008-dissipative-particle-dynamics, author = {Baydin, Atılım Güneş}, title = {Dissipative Particle Dynamics and Coarse-Graining: Review of Existing Techniques, Trials with Evolutionary Computation}, school = {Department of Applied Physics, Chalmers University of Technology}, year = {2008}, address = {Göteborg, Sweden} }
We present the results of our analysis of publication venues for papers on automatic differentiation (AD), covering academic journals and conference proceedings. Our data are collected from the AD publications database maintained by the autodiff.org community website. The database is purpose-built for the AD field and is expanding via submissions by AD researchers. Therefore, it provides a relatively noise-free list of publications relating to the field. However, it does include noise in the form of variant spellings of journal and conference names. We handle this by manually correcting and merging these variants under the official names of corresponding venues. We also share the raw data we get after these corrections.
@article{baydin-2014-ad-venues, title = {An Analysis of Publication Venues for Automatic Differentiation Research}, author = {Baydin, Atılım Güneş and Pearlmutter, Barak A.}, journal = {arXiv preprint arXiv:1409.7316}, year = {2014} }
There are many studies dealing with the protection or restoration of wetlands and the sustainable economic growth of cities as separate subjects. This study investigates the conflict between the two in an area where city growth is threatening a protected wetland area. We develop a stochastic cellular automaton model for urban growth and apply it to the Vecht area surrounding the city of Hilversum in the Netherlands, using topographic maps covering the past 150 years. We investigate the dependence of the urban growth pattern on the values associated with the protected wetland and other types of landscape surrounding the city. The conflict between city growth and wetland protection is projected to occur before 2035, assuming full protection of the wetland. Our results also show that a milder protection policy, allowing some of the wetland to be sacrificed, could be beneficial for maintaining other valuable landscapes. This insight would be difficult to achieve by other analytical means. We conclude that even slight changes in usage priorities of landscapes can significantly affect the landscape distribution in the near future. Our results also point to the importance of a protection policy that takes the value of surrounding landscapes and the dynamic nature of urban areas into account.
@article{tendurus-2013-urbangrowth-cellularautomaton, title = {City versus wetland: Predicting urban growth in the Vecht area with a cellular automaton model}, author = {Tendürüs, Melek and Baydin, Atılım Güneş and Eleveld, Marieke A. and Gilbert, Alison J.}, journal = {arXiv preprint arXiv:1304.1609}, year = {2013} }