Search | arXiv e-print repository

Self-Supervised Spatial Correspondence Across Modalities

Authors: Ayush Shrivastava, Andrew Owens

Abstract: We present a method for finding cross-modal space-time correspondences. Given two images from different visual modalities, such as an RGB image and a depth map, our model identifies which pairs of pixels correspond to the same physical points in the scene. To solve this problem, we extend the contrastive random walk framework to simultaneously learn cycle-consistent feature representations for both cross-modal and intra-modal matching. The resulting model is simple and has no explicit photo-consistency assumptions. It can be trained entirely using unlabeled data, without the need for any spatially aligned multimodal image pairs. We evaluate our method on both geometric and semantic correspondence tasks. For geometric matching, we consider challenging tasks such as RGB-to-depth and RGB-to-thermal matching (and vice versa); for semantic matching, we evaluate on photo-sketch and cross-style image alignment. Our method achieves strong performance across all benchmarks.

Submitted 3 June, 2025; originally announced June 2025.

Comments: CVPR 2025. Project link: https://www.ayshrv.com/cmrw . Code: https://github.com/ayshrv/cmrw

arXiv:2503.11769 [pdf, other]

The Chicago Carnegie Hubble Program: Improving the Calibration of SNe Ia with JWST Measurements of the Tip of the Red Giant Branch

Authors: Taylor J. Hoyt, In Sung Jang, Wendy L. Freedman, Barry F. Madore, Kayla A. Owens, Abigail J. Lee

Abstract: We present distances to ten supernova (SN) host galaxies determined via the red giant branch tip (TRGB) using JWST/NIRCAM and the F115W, F356W, and F444W bandpasses. Our analysis, including photometric catalog cleaning, adoption of disk light profiles, TRGB color slope estimation, and a novel technique for identifying the infrared TRGB, was conducted blinded. The new F115W TRGB distances agree well with our previously derived HST TRGB distances, differing by only 1 percent on average and 4 percent on a per-galaxy basis. The color-corrected F115W TRGB is therefore equally precise a method of distance measurement as, and offers unique advantages over, its color-insensitive, I-band counterpart. Using these distances, we update the absolute calibrations of eleven calibrator SNe, yielding 68.4 < H0 < 69.6 km/s/Mpc depending on which of four sets of SN magnitudes are used. We expand the sample of calibrator SNe to 24 by combining with HST TRGB distances. Doing so increases our H0 estimate based on the Carnegie Supernova Project II (CSP-II) by 0.8 km/s/Mpc (1.4 sigma) demonstrating that our JWST H0 based on 11 SNe is not significantly biased toward lower values. In contrast, the Pantheon+ calibration shifts higher by +2 km/s/Mpc (3.1 sigma), a significantly larger increase than seen in both the CSP and the Pantheon team's own SuperCal analysis. More JWST observations of the TRGB as well as independent analyses of low-redshift SNe are needed to continue unraveling the true nature of the Hubble Tension.

Submitted 14 March, 2025; originally announced March 2025.

Comments: 56 pages; 25 figures; 8 tables

arXiv:2503.10329 [pdf, other]

A Comparison of Calcium Sources for Ion-Trap Loading via Laser Ablation

Authors: Daisy R H Smith, Silpa Muralidharan, Roland Hablutzel, Georgina Croft, Klara Theophilo, Alexander Owens, Yashna N D Lekhai, Scott J Thomas, Cameron Deans

Abstract: Trapped-ion technology is a leading approach for scalable quantum computing. A key element of ion trapping is reliable loading of atomic sources into the trap. While thermal atomic ovens have traditionally been used for this purpose, laser ablation has emerged as a viable alternative in recent years, offering the advantages of faster and more localized loading with lower heat dissipation. Calcium is a well-established ion for qubit applications. Here we examine a range of calcium sources for ablation and provide a comprehensive analysis of each. We consider factors such as ease of use, temperature and yield of the ablation plume, and the lifetime of ablation spots. For each target, we estimate the number of trappable atoms per ablation pulse for a typical surface and 3D ion trap.

Submitted 13 March, 2025; originally announced March 2025.

Comments: 10 pages, 7 figures

arXiv:2502.18705 [pdf, other]

Understanding Children's Avatar Making in Social Online Games

Authors: Yue Fu, Samuel Schwamm, Amanda Baughan, Nicole M Powell, Zoe Kronberg, Alicia Owens, Emily Renee Izenman, Dania Alsabeh, Elizabeth Hunt, Michael Rich, David Bickham, Jenny Radesky, Alexis Hiniker

Abstract: Social online games like Minecraft and Roblox have become increasingly integral to children's daily lives. Our study explores how children aged 8 to 13 create and customize avatars in these virtual environments. Through semi-structured interviews and gameplay observations with 48 participants, we investigate the motivations behind children's avatar-making. Our findings show that children's avatar creation is motivated by self-representation, experimenting with alter ego identities, fulfilling social needs, and improving in-game performance. In addition, designed monetization strategies play a role in shaping children's avatars. We identify the "wardrobe effect," where children create multiple avatars but typically use only one favorite consistently. We discuss the impact of cultural consumerism and how social games can support children's identity exploration while balancing self-expression and social conformity. This work contributes to understanding how avatars shape children's identity growth in social online games.

Submitted 11 March, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

arXiv:2501.12390 [pdf, other]

GPS as a Control Signal for Image Generation

Authors: Chao Feng, Ziyang Chen, Aleksander Holynski, Alexei A. Efros, Andrew Owens

Abstract: We show that the GPS tags contained in photo metadata provide a useful control signal for image generation. We train GPS-to-image models and use them for tasks that require a fine-grained understanding of how images vary within a city. In particular, we train a diffusion model to generate images conditioned on both GPS and text. The learned model generates images that capture the distinctive appearance of different neighborhoods, parks, and landmarks. We also extract 3D models from 2D GPS-to-image models through score distillation sampling, using GPS conditioning to constrain the appearance of the reconstruction from each viewpoint. Our evaluations suggest that our GPS-conditioned models successfully learn to generate images that vary based on location, and that GPS conditioning improves estimated 3D structure.

Submitted 22 January, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

Comments: Project page: https://cfeng16.github.io/gps-gen/

arXiv:2412.02700 [pdf, other]

Motion Prompting: Controlling Video Generation with Motion Trajectories

Authors: Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Carl Doersch, Yusuf Aytar, Michael Rubinstein, Chen Sun, Oliver Wang, Andrew Owens, Deqing Sun

Abstract: Motion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal compositions. To this end, we train a video generation model conditioned on spatio-temporally sparse or dense motion trajectories. In contrast to prior motion conditioning work, this flexible representation can encode any number of trajectories, object-specific or global scene motion, and temporally sparse motion; due to its flexibility we refer to this conditioning as motion prompts. While users may directly specify sparse trajectories, we also show how to translate high-level user requests into detailed, semi-dense motion prompts, a process we term motion prompt expansion. We demonstrate the versatility of our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing. Our results showcase emergent behaviors, such as realistic physics, suggesting the potential of motion prompts for probing video models and interacting with future generative world models. Finally, we evaluate quantitatively, conduct a human study, and demonstrate strong performance. Video results are available on our webpage: https://motion-prompting.github.io/

Submitted 27 March, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

Comments: CVPR 2025 camera ready. Project page: https://motion-prompting.github.io/

arXiv:2411.17698 [pdf, other]

Video-Guided Foley Sound Generation with Multimodal Controls

Authors: Ziyang Chen, Prem Seetharaman, Bryan Russell, Oriol Nieto, David Bourgin, Andrew Owens, Justin Salamon

Abstract: Generating sound effects for videos often requires creating artistic sound effects that diverge significantly from real-life sources and flexible control in the sound design. To address this problem, we introduce MultiFoley, a model designed for video-guided sound generation that supports multimodal conditioning through text, audio, and video. Given a silent video and a text prompt, MultiFoley allows users to create clean sounds (e.g., skateboard wheels spinning without wind noise) or more whimsical sounds (e.g., making a lion's roar sound like a cat's meow). MultiFoley also allows users to choose reference audio from sound effects (SFX) libraries or partial videos for conditioning. A key novelty of our model lies in its joint training on both internet video datasets with low-quality audio and professional SFX recordings, enabling high-quality, full-bandwidth (48kHz) audio generation. Through automated evaluations and human studies, we demonstrate that MultiFoley successfully generates synchronized high-quality sounds across varied conditional inputs and outperforms existing methods. Please see our project page for video results: https://ificl.github.io/MultiFoley/

Submitted 17 March, 2025; v1 submitted 26 November, 2024; originally announced November 2024.

Comments: Accepted at CVPR 2025. Project site: https://ificl.github.io/MultiFoley/

arXiv:2411.04125 [pdf, other]

Community Forensics: Using Thousands of Generators to Train Fake Image Detectors

Authors: Jeongsoo Park, Andrew Owens

Abstract: One of the key challenges of detecting AI-generated images is spotting images that have been created by previously unseen generative models. We argue that the limited diversity of the training data is a major obstacle to addressing this problem, and we propose a new dataset that is significantly larger and more diverse than prior work. As part of creating this dataset, we systematically download thousands of text-to-image latent diffusion models and sample images from them. We also collect images from dozens of popular open source and commercial models. The resulting dataset contains 2.7M images that have been sampled from 4803 different models. These images collectively capture a wide range of scene content, generator architectures, and image processing settings. Using this dataset, we study the generalization abilities of fake image detectors. Our experiments suggest that detection performance improves as the number of models in the training set increases, even when these models have similar architectures. We also find that detection performance improves as the diversity of the models increases, and that our trained detectors generalize better than those trained on other datasets.

Submitted 6 November, 2024; originally announced November 2024.

Comments: 15 pages

arXiv:2410.16575 [pdf, other]

ExoMol line lists -- LXV. Mid-Infrared rovibronic spectroscopy of isotopologues of NiH

Authors: Kirill Batrakov, Sergei N. Yurchenko, Alec Owens, Jonathan Tennyson, Alexander Mitrushchenkov, Amanda J. Ross, Patrick Crozet, Asen Pashov

Abstract: New line lists for four isotopologues of nickel monohydride, $^{58}$NiH, $^{60}$NiH, $^{62}$NiH, and $^{58}$NiD are presented covering the wavenumber range $<10000$ cm$^{-1}$ ($λ> 1$ $μ$m), $J$ up to 37.5 for transitions within and between the three lowest-lying electronic states, ${X}\,^{2}Δ$, ${W}\,^{2}Π$, and ${V}\,^{2}Σ^{+}$. The line lists are applicable for temperatures up to 5000 K. The line list calculations are based on a recent empirical NiH spectroscopic model [Havalyova et al. J. Quant. Spectrosc. Radiat. Transf., 272, 107800, (2021)] which is adapted for the variational nuclear-motion code Duo. The model consists of potential energy curves, spin-orbit coupling curves, electronic angular momentum curves, spin-rotation coupling curves, $Λ$-doubling correction curve for $^2Π$ states and Born-Oppenheimer breakdown (BOB) rotational correction curves. New ab initio dipole moment curves, scaled to match the experimental dipole moment of the ground state, are used to compute Einstein A coefficients. The BYOT line lists are included in the ExoMol database at www.exomol.com.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.11834 [pdf, other]

Contrastive Touch-to-Touch Pretraining

Authors: Samanta Rodriguez, Yiming Dou, William van den Bogert, Miquel Oller, Kevin So, Andrew Owens, Nima Fazeli

Abstract: Today's tactile sensors have a variety of different designs, making it challenging to develop general-purpose methods for processing touch signals. In this paper, we learn a unified representation that captures the shared information between different tactile sensors. Unlike current approaches that focus on reconstruction or task-specific supervision, we leverage contrastive learning to integrate tactile signals from two different sensors into a shared embedding space, using a dataset in which the same objects are probed with multiple sensors. We apply this approach to paired touch signals from GelSlim and Soft Bubble sensors. We show that our learned features provide strong pretraining for downstream pose estimation and classification tasks. We also show that our embedding enables models trained using one touch sensor to be deployed using another without additional training. Project details can be found at https://www.mmintlab.com/research/cttp/.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.04295 [pdf, other]

doi 10.1093/mnras/stae1849

ExoMol line lists -- LX. Molecular line list for the ammonia isotopologue $^{15}$NH$_3$

Authors: Sergei N. Yurchenko, Charles A. Bowesman, Ryan P. Brady, Elizabeth R. Guest, Kyriaki Kefala, Georgi B. Mitev, Alec Owens, Armando N. Perri, Marco Pezzella, Oleksiy Smola, Andrei Sokolov, Jingxin Zhang, Jonathan Tennyson

Abstract: A theoretical line list for $^{15}$NH$_3$ CoYuTe-15 is presented based on the empirical potential energy and ab initio dipole moments surfaces developed and used for the production of the ExoMol line list CoYuTe for $^{14}$NH$_3$. The ro-vibrational energy levels and wavefunctions are computed using the variational program TROVE. The line list ranges up to 10000 cm$^{-1}$ ($λ\geq 1$ $μ$m) and contains 929 795 249 transitions between 1 269 961 states with $J\le 30$. The line list should be applicable for temperatures up to $\sim$1000 K. To improve the accuracy of the line positions, a set of experimentally-derived energy levels of $^{15}$NH$_3$ is produced using the MARVEL procedure. To this end, 37 experimental sources of the line positions of $^{15}$NH$_3$ available in the literature are collected, combined and systematised to produce a self-consistent spectroscopic network of 21095 $^{15}$NH$_3$ transitions covering 40 vibrational bands ranging up to 6818 cm$^{-1}$ and resulting in 2777 energy term values. These MARVEL energies are then used to replace the theoretical values in the CoYuTe-15 line list and also complemented by pseudo-MARVEL energies obtained by an isotopologue extrapolation using the previously reported MARVEL energies of the $^{14}$NH$_3$ parent isotopologue of ammonia. A list of 53856 high resolution transitions between MARVEL states and theoretical intensities is provided in the HITRAN format. Comparison with the recent experimental spectra of $^{15}$NH$_3$ illustrates the potential of the line list for detections and as an efficient assistant in spectroscopic assignments. The line list is available from www.exomol.com.

Submitted 9 October, 2024; v1 submitted 5 October, 2024; originally announced October 2024.

Journal ref: MNRAS, 533, 3442-3456 (2024)

arXiv:2409.16288 [pdf, other]

Self-Supervised Any-Point Tracking by Contrastive Random Walks

Authors: Ayush Shrivastava, Andrew Owens

Abstract: We present a simple, self-supervised approach to the Tracking Any Point (TAP) problem. We train a global matching transformer to find cycle consistent tracks through video via contrastive random walks, using the transformer's attention-based global matching to define the transition matrices for a random walk on a space-time graph. The ability to perform "all pairs" comparisons between points allows the model to obtain high spatial precision and to obtain a strong contrastive learning signal, while avoiding many of the complexities of recent approaches (such as coarse-to-fine matching). To do this, we propose a number of design decisions that allow global matching architectures to be trained through self-supervision using cycle consistency. For example, we identify that transformer-based methods are sensitive to shortcut solutions, and propose a data augmentation scheme to address them. Our method achieves strong performance on the TapVid benchmarks, outperforming previous self-supervised tracking methods, such as DIFT, and is competitive with several supervised methods.

Submitted 24 September, 2024; originally announced September 2024.

Comments: ECCV 2024. Project link: https://ayshrv.com/gmrw . Code: https://github.com/ayshrv/gmrw/

arXiv:2409.14592 [pdf, other]

Tactile Functasets: Neural Implicit Representations of Tactile Datasets

Authors: Sikai Li, Samanta Rodriguez, Yiming Dou, Andrew Owens, Nima Fazeli

Abstract: Modern incarnations of tactile sensors produce high-dimensional raw sensory feedback such as images, making it challenging to efficiently store, process, and generalize across sensors. To address these concerns, we introduce a novel implicit function representation for tactile sensor feedback. Rather than directly using raw tactile images, we propose neural implicit functions trained to reconstruct the tactile dataset, producing compact representations that capture the underlying structure of the sensory inputs. These representations offer several advantages over their raw counterparts: they are compact, enable probabilistically interpretable inference, and facilitate generalization across different sensors. We demonstrate the efficacy of this representation on the downstream task of in-hand object pose estimation, achieving improved performance over image-based methods while simplifying downstream models. We release code, demos and datasets at https://www.mmintlab.com/tactile-functasets.

Submitted 22 September, 2024; originally announced September 2024.

arXiv:2409.14340 [pdf, other]

Self-Supervised Audio-Visual Soundscape Stylization

Authors: Tingle Li, Renhao Wang, Po-Yao Huang, Andrew Owens, Gopala Anumanchipalli

Abstract: Speech sounds convey a great deal of information about the scenes, resulting in a variety of effects ranging from reverberation to additional ambient sounds. In this paper, we manipulate input speech to sound as though it was recorded within a different scene, given an audio-visual conditional example recorded from that scene. Our model learns through self-supervision, taking advantage of the fact that natural video contains recurring sound events and textures. We extract an audio clip from a video and apply speech enhancement. We then train a latent diffusion model to recover the original speech, using another audio-visual clip taken from elsewhere in the video as a conditional hint. Through this process, the model learns to transfer the conditional example's sound properties to the input speech. We show that our model can be successfully trained using unlabeled, in-the-wild videos, and that an additional visual signal can improve its sound prediction abilities. Please see our project webpage for video results: https://tinglok.netlify.app/files/avsoundscape/

Submitted 22 September, 2024; originally announced September 2024.

Comments: ECCV 2024

arXiv:2409.08269 [pdf, other]

Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation

Authors: Samanta Rodriguez, Yiming Dou, Miquel Oller, Andrew Owens, Nima Fazeli

Abstract: Today's touch sensors come in many shapes and sizes. This has made it challenging to develop general-purpose touch processing methods since models are generally tied to one specific sensor design. We address this problem by performing cross-modal prediction between touch sensors: given the tactile signal from one sensor, we use a generative model to estimate how the same physical contact would be perceived by another sensor. This allows us to apply sensor-specific methods to the generated signal. We implement this idea by training a diffusion model to translate between the popular GelSlim and Soft Bubble sensors. As a downstream task, we perform in-hand object pose estimation using GelSlim sensors while using an algorithm that operates only on Soft Bubble signals. The dataset, the code, and additional details can be found at https://www.mmintlab.com/research/touch2touch/.

Submitted 12 September, 2024; originally announced September 2024.

arXiv:2408.06153 [pdf, other]

Status Report on the Chicago-Carnegie Hubble Program (CCHP): Measurement of the Hubble Constant Using the Hubble and James Webb Space Telescopes

Authors: Wendy L. Freedman, Barry F. Madore, In Sung Jang, Taylor J. Hoyt, Abigail J. Lee, Kayla A. Owens

Abstract: We present the latest results from the Chicago-Carnegie Hubble Program (CCHP) to measure the Hubble constant, using data from the James Webb Space Telescope (JWST). The overall program aims to calibrate three independent methods: (1) Tip of the Red Giant Branch (TRGB) stars, (2) JAGB (J-Region Asymptotic Giant Branch) stars, and (3) Cepheids. To date, our program includes 10 nearby galaxies, hosting 11 Type Ia supernovae (SNe Ia) suitable for measuring the Hubble constant ($H_0$). It also includes the galaxy NGC 4258, whose geometric distance provides the zero-point calibration. In this paper we discuss our results from the TRGB and JAGB methods. Our current best (highest precision) estimate is $H_0$ = 70.39 $\pm$ 1.22 (stat) $\pm$ 1.33 (sys) $\pm$ 0.70 ($σ_{SN}$), based on the TRGB method alone, with a total of 24 SN Ia calibrators from both HST and JWST data. Based on our new JWST data only, and tying into SNe Ia, we find values of $H_0$ = 68.81 $\pm$ 1.79 (stat) $\pm$ 1.32 (sys) for the TRGB, and $H_0$ = 67.80 $\pm$ 2.17 (stat) $\pm$ 1.64 (sys) km/s/Mpc for the JAGB method. The distances measured using the TRGB and the JAGB method agree, on average, at a level better than 1%, and with the SH0ES Cepheid distances at just over the 1% level. Our results are consistent with the current standard LambdaCDM model, without the need for the inclusion of additional new physics. Future JWST data will be required to increase the precision and accuracy of the local distance scale.

Submitted 17 March, 2025; v1 submitted 12 August, 2024; originally announced August 2024.

Comments: 70 pages, 21 figures. Major updates from V1 include HST plus JWST calibration of the TRGB increasing the number of calibrators from 10 to 24, and improving the statistical precision in H0. Minor change in fonts from V2

arXiv:2408.03474 [pdf, other]

The Chicago-Carnegie Hubble Program: The JWST J-region Asymptotic Giant Branch (JAGB) Extragalactic Distance Scale

Authors: Abigail J. Lee, Wendy L. Freedman, Barry F. Madore, In Sung Jang, Kayla A. Owens, Taylor J. Hoyt

Abstract: The J-region asymptotic giant branch (JAGB) method is a new standard candle based on the constant luminosities of carbon-rich asymptotic giant branch stars in the J band. The JAGB method is independent of the Cepheid and TRGB distance indicators. Therefore, we can leverage it to both cross-check Cepheid and TRGB distances for systematic errors and use it to measure an independent local Hubble constant. The JAGB method also boasts a number of advantages in measuring distances relative to the TRGB and Cepheids, several of which are especially amplified when combined with JWST's revolutionary resolving power. First, JAGB stars are 1 mag brighter in the NIR than the TRGB, and can be discovered from single-epoch NIR photometry unlike Cepheids which require congruent optical imaging in at least 12 epochs. Thus, JAGB stars can be used to measure significantly farther distances than both the TRGB stars and Cepheids using the same amount of observing time. Further advantages include: JAGB stars are easily identified solely via their colors and magnitudes, dust extinction is reduced in near-infrared observations, and JAGB stars are ubiquitous in all galaxies with intermediate-age populations. In this paper, we present a novel algorithm that identifies the optimal location in a galaxy for applying the JAGB method, so as to minimize effects from crowding. We then deploy this algorithm in JWST NIRCam imaging of seven SN Ia host galaxies to measure their JAGB distances, undertaking a completely blind analysis. The zero-point of this JAGB distance scale is set in the water mega-maser galaxy NGC 4258. In our CCHP overview paper Freedman et al. (2025), we apply the JAGB distances measured in this paper to the Carnegie Supernova Program (CSP) SNe Ia sample, measuring a Hubble constant of H0 = 67.80 +/- 2.17 (stat) +/- 1.64 (sys) km/s/Mpc.

Submitted 26 March, 2025; v1 submitted 6 August, 2024; originally announced August 2024.

Comments: 25 pages, 10 figures, 5 tables, accepted to ApJ

arXiv:2407.07309 [pdf, other]

doi 10.3847/1538-4357/ad7952

Coordinated JWST Imaging of Three Distance Indicators in a SN Host Galaxy and an Estimate of the TRGB Color Dependence

Authors: Taylor J. Hoyt, In Sung Jang, Wendy L. Freedman, Barry F. Madore, Abigail J. Lee, Kayla A. Owens

Abstract: Boasting a 6.5m mirror in space, JWST can increase by several times the number of supernovae (SNe) to which a redshift-independent distance has been measured with a precision distance indicator (e.g., TRGB or Cepheids); the limited number of such SN calibrators currently dominates the uncertainty budget in distance ladder Hubble constant (H0) experiments. JWST/NIRCAM imaging of the Virgo Cluster galaxy NGC4536 is used here to preview JWST program GO-1995, which aims to measure H0 using three stellar distance indicators (Cepheids, TRGB, JAGB/carbon stars). Each population of distance indicator was here successfully detected -- with sufficiently large number statistics, well-measured fluxes, and characteristic distributions consistent with ingoing expectations -- so as to confirm that we can acquire distances from each method precise to about 0.05 mag (statistical uncertainty only). We leverage overlapping HST imaging to identify TRGB stars, cross-match them with the JWST photometry, and present a preliminary constraint on the slope of the TRGB's F115W-(F115W-F444W) relation equal to -0.99 +/- 0.16 mag/mag. This slope is consistent with prior slope measurements in the similar 2MASS J-band, as well as with predictions from the BASTI isochrone suite. We use the new TRGB slope estimate to flatten the two-dimensional TRGB feature and measure a (blinded) TRGB distance relative to a set of fiducial TRGB colors, intended to represent the absolute fiducial calibrations expected from geometric anchors such as NGC4258 and the Magellanic Clouds.
In doing so, we empirically demonstrate that the TRGB can be used as a standardizable candle at the IR wavelengths accessible with JWST.

Submitted 9 July, 2024; originally announced July 2024.

Comments: Revised version after submission to AAS journals; 20 pages, 12 figures; Fig. 1 compressed to reduce file size

arXiv:2406.06347 [pdf, other]

The 2024 release of the ExoMol database: molecular line lists for exoplanet and other hot atmospheres

Authors: Jonathan Tennyson, Sergei N. Yurchenko, Jingxin Zhang, Charles A. Bowesman, Ryan P. Brady, Jeanna Buldyreva, Katy L. Chubb, Robert R. Gamache, Maire N. Gorman, Elizabeth R. Guest, Christian Hill, Kyriaki Kefala, A. E. Lynas-Gray, Thomas M. Mellor, Laura K. McKemmish, Georgi B. Mitev, Irina I. Mizus, Alec Owens, Zhijian Peng, Armando N. Perri, Marco Pezzella, Oleg L. Polyansky, Qianwei Qu, Mikhail Semenov, Oleksiy Smola, et al. (5 additional authors not shown)

Abstract: The ExoMol database (www.exomol.com) provides molecular data for spectroscopic studies of hot atmospheres. These data are widely used to model atmospheres of exoplanets, cool stars and other astronomical objects, as well as a variety of terrestrial applications. The 2024 data release reports the current status of the database which contains recommended line lists for 91 molecules and 224 isotopologues giving a total of almost 10$^{12}$ individual transitions. New features of the database include extensive "MARVELization" of line lists to allow them to be used for high-resolution studies, extension of several line lists to ultraviolet wavelengths, provision of photodissociation cross sections and extended provision of broadening parameters. Some of the in-house data specifications have been rewritten in JSON and moved to conformity with other international standards. Data products, including specific heats, a database of lifetimes for plasma studies, and the ExoMolHR web app which allows exclusively high resolution data to be extracted, are discussed.

Submitted 10 June, 2024; originally announced June 2024.

Report number: JQSRT in press 2024

arXiv:2405.12221 [pdf, other]

Images that Sound: Composing Images and Sounds on a Single Canvas

Authors: Ziyang Chen, Daniel Geng, Andrew Owens

Abstract: Spectrograms are 2D representations of sound that look very different from the images found in our visual world. And natural images, when played as spectrograms, make unnatural sounds. In this paper, we show that it is possible to synthesize spectrograms that simultaneously look like natural images and sound like natural audio. We call these visual spectrograms images that sound. Our approach is simple and zero-shot, and it leverages pre-trained text-to-image and text-to-spectrogram diffusion models that operate in a shared latent space. During the reverse process, we denoise noisy latents with both the audio and image diffusion models in parallel, resulting in a sample that is likely under both models. Through quantitative evaluations and perceptual studies, we find that our method successfully generates spectrograms that align with a desired audio prompt while also taking the visual appearance of a desired image prompt. Please see our project page for video results: https://ificl.github.io/images-that-sound/

Submitted 4 February, 2025; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: Accepted to NeurIPS 2024. Project site: https://ificl.github.io/images-that-sound/

arXiv:2405.08815 [pdf, other]

Efficient Vision-Language Pre-training by Cluster Masking

Authors: Zihao Wei, Zixuan Pan, Andrew Owens

Abstract: We propose a simple strategy for masking image patches during visual-language contrastive learning that improves the quality of the learned representations and the training speed. During each iteration of training, we randomly mask clusters of visually similar image patches, as measured by their raw pixel intensities. This provides an extra learning signal, beyond the contrastive training itself, since it forces a model to predict words for masked visual structures solely from context. It also speeds up training by reducing the amount of data used in each image. We evaluate the effectiveness of our model by pre-training on a number of benchmarks, finding that it outperforms other masking strategies, such as FLIP, on the quality of the learned representation.

Submitted 14 May, 2024; originally announced May 2024.

Comments: CVPR 2024, Project page: https://zxp46.github.io/cluster-masking/ , Code: https://github.com/Zi-hao-Wei/Efficient-Vision-Language-Pre-training-by-Cluster-Masking

arXiv:2405.04534 [pdf, other]

Tactile-Augmented Radiance Fields

Authors: Yiming Dou, Fengyu Yang, Yi Liu, Antonio Loquercio, Andrew Owens

Abstract: We present a scene representation, which we call a tactile-augmented radiance field (TaRF), that brings vision and touch into a shared 3D space. This representation can be used to estimate the visual and tactile signals for a given 3D position within a scene. We capture a scene's TaRF from a collection of photos and sparsely sampled touch probes. Our approach makes use of two insights: (i) common vision-based touch sensors are built on ordinary cameras and thus can be registered to images using methods from multi-view geometry, and (ii) visually and structurally similar regions of a scene share the same tactile features. We use these insights to register touch signals to a captured visual scene, and to train a conditional diffusion model that, provided with an RGB-D image rendered from a neural radiance field, generates its corresponding tactile signal. To evaluate our approach, we collect a dataset of TaRFs. This dataset contains more touch samples than previous real-world datasets, and it provides spatially aligned visual signals for each captured touch signal. We demonstrate the accuracy of our cross-modal generative model and the utility of the captured visual-tactile data on several downstream tasks. Project page: https://dou-yiming.github.io/TaRF

Submitted 7 May, 2024; originally announced May 2024.

Comments: CVPR 2024, Project page: https://dou-yiming.github.io/TaRF, Code: https://github.com/Dou-Yiming/TaRF/

arXiv:2404.11615 [pdf, other]

Factorized Diffusion: Perceptual Illusions by Noise Decomposition

Authors: Daniel Geng, Inbum Park, Andrew Owens

Abstract: Given a factorization of an image into a sum of linear components, we present a zero-shot method to control each individual component through diffusion model sampling. For example, we can decompose an image into low and high spatial frequencies and condition these components on different text prompts. This produces hybrid images, which change appearance depending on viewing distance. By decomposing an image into three frequency subbands, we can generate hybrid images with three prompts. We also use a decomposition into grayscale and color components to produce images whose appearance changes when they are viewed in grayscale, a phenomenon that naturally occurs under dim lighting. And we explore a decomposition by a motion blur kernel, which produces images that change appearance under motion blurring. Our method works by denoising with a composite noise estimate, built from the components of noise estimates conditioned on different prompts. We also show that for certain decompositions, our method recovers prior approaches to compositional generation and spatial control. Finally, we show that we can extend our approach to generate hybrid images from real images. We do this by holding one component fixed and generating the remaining components, effectively solving an inverse problem.

Submitted 10 January, 2025; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: ECCV 2024 camera ready version + more readable size

arXiv:2403.18821 [pdf, other]

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

Authors: Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard

Abstract: We present a new dataset called Real Acoustic Fields (RAF) that captures real acoustic room data from multiple modalities. The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms. We used this dataset to evaluate existing methods for novel-view acoustic synthesis and impulse response generation which previously relied on synthetic data. In our evaluation, we thoroughly assessed existing audio and audio-visual models against multiple criteria and proposed settings to enhance their performance on real-world data. We also conducted experiments to investigate the impact of incorporating visual data (i.e., images and depth) into neural acoustic field models. Additionally, we demonstrated the effectiveness of a simple sim2real approach, where a model is pre-trained with simulated data and fine-tuned with sparse real-world data, resulting in significant improvements in the few-shot learning approach. RAF is the first dataset to provide densely captured room acoustic data, making it an ideal resource for researchers working on audio and audio-visual neural acoustic field modeling techniques. Demos and datasets are available on our project page: https://facebookresearch.github.io/real-acoustic-fields/

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024. Project site: https://facebookresearch.github.io/real-acoustic-fields/

arXiv:2402.18794 [pdf, other]

Resolved Near-infrared Stellar Photometry from the Magellan Telescope for 13 Nearby Galaxies: JAGB Method Distances

Authors: Abigail J. Lee, Andrew J. Monson, Wendy L. Freedman, Barry F. Madore, Kayla A. Owens, Rachael L. Beaton, Coral Espinoza, Tongtian Ren, Yi Ren

Abstract: We present near-infrared JHK photometry for the resolved stellar populations in 13 nearby galaxies: NGC 6822, IC 1613, NGC 3109, Sextans B, Sextans A, NGC 300, NGC 55, NGC 7793, NGC 247, NGC 5253, Cen A, NGC 1313, and M83, acquired from the 6.5m Baade-Magellan telescope. We measure distances to each galaxy using the J-region asymptotic giant branch (JAGB) method, a new standard candle that leverages the constant luminosities of color-selected, carbon-rich AGB stars. While only single-epoch, random-phase photometry is necessary to derive JAGB distances, our photometry is time-averaged over multiple epochs, thereby decreasing the contribution of the JAGB stars' intrinsic variability to the measured dispersions in their observed luminosity functions. To cross-validate these distances, we also measure near-infrared tip of the red giant branch (TRGB) distances to these galaxies. The residuals obtained from subtracting the distance moduli from the two methods yield an RMS scatter of $σ_{JAGB - TRGB}= \pm 0.07$ mag. Therefore, all systematics in either the JAGB method or the TRGB method (e.g., crowding, differential reddening, star formation histories) must be contained within these $\pm0.07$ mag bounds for this sample of galaxies because the JAGB and TRGB distance indicators are drawn from entirely distinct stellar populations, and are thus affected by these systematics independently.
Finally, the composite JAGB star luminosity function formed from this diverse sample of galaxies is well-described by a Gaussian function with a modal value of $M_J = -6.20 \pm 0.003$ mag (stat), indicating the underlying JAGB star luminosity function of a well-sampled full star formation history is highly symmetric and Gaussian, based on over 6,700 JAGB stars in the composite sample.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 31 pages, 11 figures, 6 tables, accepted to ApJ. Photometry catalogs for 13 galaxies available at https://zenodo.org/records/10606945

arXiv:2401.18085 [pdf, other]

Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators

Authors: Daniel Geng, Andrew Owens

Abstract: Diffusion models are capable of generating impressive images conditioned on text descriptions, and extensions of these models allow users to edit images at a relatively coarse scale. However, the ability to precisely edit the layout, position, pose, and shape of objects in images with diffusion models is still difficult. To this end, we propose motion guidance, a zero-shot technique that allows a user to specify dense, complex motion fields that indicate where each pixel in an image should move. Motion guidance works by steering the diffusion sampling process with the gradients through an off-the-shelf optical flow network. Specifically, we design a guidance loss that encourages the sample to have the desired motion, as estimated by a flow network, while also being visually similar to the source image. By simultaneously sampling from a diffusion model and guiding the sample to have low guidance loss, we can obtain a motion-edited image. We demonstrate that our technique works on complex motions and produces high quality edits of real and generated images.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.18084 [pdf, other]

Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

Authors: Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong

Abstract: The ability to associate touch with other modalities has huge implications for humans and computational systems. However, multimodal learning with touch remains challenging due to the expensive data collection process and non-standardized sensor outputs. We introduce UniTouch, a unified tactile model for vision-based touch sensors connected to multiple modalities, including vision, language, and sound. We achieve this by aligning our UniTouch embeddings to pretrained image embeddings already associated with a variety of other modalities. We further propose learnable sensor-specific tokens, allowing the model to learn from a set of heterogeneous tactile sensors, all at the same time. UniTouch is capable of conducting various touch sensing tasks in the zero-shot setting, from robot grasping prediction to touch image question answering. To the best of our knowledge, UniTouch is the first to demonstrate such capabilities. Project page: https://cfeng16.github.io/UniTouch/

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2312.02282 [pdf, other]

First JWST Observations of JAGB Stars in the SN Ia Host Galaxies: NGC 7250, NGC 4536, NGC 3972

Authors: Abigail J. Lee, Wendy L. Freedman, In Sung Jang, Barry F. Madore, Kayla A. Owens

Abstract: The J-region Asymptotic Giant Branch (JAGB) method is a standard candle that leverages the constant luminosities of color-selected, carbon-rich AGB stars, measured in the near infrared at 1.2 microns. The Chicago-Carnegie Hubble Program (CCHP) has obtained JWST imaging of the SN Ia host galaxies NGC 7250, NGC 4536, and NGC 3972. With these observations, the JAGB method can be studied for the first time using JWST. Lee et al. 2022 [arXiv:2205.11323] demonstrated the JAGB magnitude is optimally measured in the outer disks of galaxies, because in the inner regions the JAGB magnitude can vary significantly due to a confluence of reddening, blending, and crowding effects. However, determining where the 'outer disk' lies can be subjective. Therefore, we introduce a novel method for systematically selecting the outer disk. In a given galaxy, the JAGB magnitude is first separately measured in concentric regions, and the 'outer disk' is then defined as the first radial bin where the JAGB magnitude stabilizes to a few hundredths of a magnitude. After successfully employing this method in our JWST galaxy sample, we find the JAGB stars are well-segregated from other stellar populations in color-magnitude space, and have observed dispersions about their individual F115W modes of $σ_{N7250}=0.32$ mag, $σ_{N4536}=0.34$ mag, and $σ_{N3972}=0.35$ mag. These measured dispersions are similar to the scatter measured for the JAGB stars in the LMC using 2MASS data ($σ=0.33$ mag, Weinberg & Nikolaev 2001 [arXiv:astro-ph/0003204]).
In conclusion, the JAGB stars as observed with JWST clearly demonstrate their considerable power both as high-precision extragalactic distance indicators and as SN Ia supernova calibrators.
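
The outer-disk rule described in this abstract can be illustrated with a minimal sketch. It assumes one particular reading of "stabilizes": the first radial bin after which every bin-to-bin change in the JAGB magnitude stays within a tolerance. The function name, the 0.03 mag tolerance, and the sample magnitudes are hypothetical and are not taken from the paper.

```python
import numpy as np

def first_stable_bin(jagb_mags, tol=0.03):
    """Return the index of the first radial bin after which the JAGB
    magnitude changes by at most `tol` mag between consecutive bins.
    Returns None if the sequence never settles (assumed convention)."""
    mags = np.asarray(jagb_mags, dtype=float)
    steps = np.abs(np.diff(mags))          # bin-to-bin magnitude changes
    for i in range(len(steps)):
        if np.all(steps[i:] <= tol):       # everything outward is stable
            return i
    return None

# Hypothetical JAGB magnitudes measured in concentric bins, innermost first.
example_mags = [24.70, 24.85, 24.96, 25.02, 25.03, 25.02]
outer_disk_start = first_stable_bin(example_mags)
```

With these made-up values the inner bins still drift by more than 0.03 mag, so the outer disk would start at the fourth bin (index 3).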

Submitted 4 December, 2023; originally announced December 2023.

Comments: 13 pages, 7 figures, accepted to ApJ

arXiv:2311.17919 [pdf, other]

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models

Authors: Daniel Geng, Inbum Park, Andrew Owens

Abstract: We address the problem of synthesizing multi-view optical illusions: images that change appearance upon a transformation, such as a flip or rotation. We propose a simple, zero-shot method for obtaining these illusions from off-the-shelf text-to-image diffusion models. During the reverse diffusion process, we estimate the noise from different views of a noisy image, and then combine these noise estimates together and denoise the image. A theoretical analysis suggests that this method works precisely for views that can be written as orthogonal transformations, of which permutations are a subset. This leads to the idea of a visual anagram--an image that changes appearance under some rearrangement of pixels. This includes rotations and flips, but also more exotic pixel permutations such as a jigsaw rearrangement. Our approach also naturally extends to illusions with more than two views. We provide both qualitative and quantitative results demonstrating the effectiveness and flexibility of our method. Please see our project webpage for additional visualizations and results: https://dangeng.github.io/visual_anagrams/
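
The noise-combination step described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions: the estimates are combined by a plain average after mapping each one back through the inverse view, and `toy_eps_model` is a hypothetical stand-in for a text-conditioned diffusion denoiser. None of these names come from the authors' code.

```python
import numpy as np

# Each view is a (forward, inverse) pair of pixel rearrangements.
identity = (lambda x: x, lambda x: x)
rotate_180 = (lambda x: np.rot90(x, 2), lambda x: np.rot90(x, -2))

def combine_noise_estimates(x_t, views, prompts, eps_model):
    """Average per-view noise estimates in the original image frame."""
    estimates = []
    for (forward, inverse), prompt in zip(views, prompts):
        eps = eps_model(forward(x_t), prompt)  # noise predicted in the view
        estimates.append(inverse(eps))         # undo the view to align it
    return np.mean(estimates, axis=0)

def toy_eps_model(x, prompt):
    # Hypothetical stand-in for a denoiser: ignores the prompt, echoes x.
    return x

rng = np.random.default_rng(0)
x_t = rng.standard_normal((4, 4))
eps = combine_noise_estimates(
    x_t, [identity, rotate_180], ["prompt A", "prompt B"], toy_eps_model
)
```

A 180-degree rotation only permutes pixels, so it preserves the norm of the noise. That is the property the abstract attributes to orthogonal views.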

Submitted 2 April, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: CVPR 2024 camera ready

arXiv:2311.17056 [pdf, other]

Self-Supervised Motion Magnification by Backpropagating Through Optical Flow

Authors: Zhaoying Pan, Daniel Geng, Andrew Owens

Abstract: This paper presents a simple, self-supervised method for magnifying subtle motions in video: given an input video and a magnification factor, we manipulate the video such that its new optical flow is scaled by the desired amount. To train our model, we propose a loss function that estimates the optical flow of the generated video and penalizes how far it deviates from the given magnification factor. Thus, training involves differentiating through a pretrained optical flow network. Since our model is self-supervised, we can further improve its performance through test-time adaptation, by finetuning it on the input video. It can also be easily extended to magnify the motions of only user-selected objects. Our approach avoids the need for synthetic magnification datasets that have been used to train prior learning-based approaches. Instead, it leverages the existing capabilities of off-the-shelf motion estimators. We demonstrate the effectiveness of our method through evaluations of both visual quality and quantitative metrics on a range of real-world and synthetic videos, and we show our method works for both supervised and unsupervised optical flow methods.

Submitted 28 November, 2023; originally announced November 2023.

Journal ref: Thirty-seventh Conference on Neural Information Processing Systems (2023)

arXiv:2310.15238 [pdf, other]

Hyperbolic Conduction: A Fast, Physical Conduction Model Implemented in Smoothed Particle Hydrodynamics

Authors: N. A. Owens, J. Wadsley

Abstract: We present the first implementation of hyperbolic thermal conduction in smoothed particle hydrodynamics (SPH). Hyperbolic conduction is a physically-motivated alternative to traditional, parabolic conduction. It incorporates a relaxation time, which ensures that heat propagates no faster than a physical signal speed. This allows for larger, Courant-like, time steps for explicit schemes. Numerical solutions of the hyperbolic conduction equations require added dissipation to remain stable at discontinuities and we present a novel scheme for this. Test cases include a simple step, the Sod shock tube, the Sedov-Taylor blast, and a super bubble. We demonstrate how longer relaxation times limit conduction, recovering the purely hydrodynamical results, while short relaxation times converge on the parabolic conduction result. We demonstrate that our scheme is stable with explicit Courant-like time steps and can be orders of magnitude faster than explicit parabolic conduction, depending on the application.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2309.15117 [pdf, other]

Generating Visual Scenes from Touch

Authors: Fengyu Yang, Jiacheng Zhang, Andrew Owens

Abstract: An emerging line of work has sought to generate plausible imagery from touch. Existing approaches, however, tackle only narrow aspects of the visuo-tactile synthesis problem, and lag significantly behind the quality of cross-modal synthesis methods in other domains. We draw on recent advances in latent diffusion to create a model for synthesizing images from tactile signals (and vice versa) and apply it to a number of visuo-tactile synthesis tasks. Using this model, we significantly outperform prior work on the tactile-driven stylization problem, i.e., manipulating an image to match a touch signal, and we are the first to successfully generate images from touch without additional sources of information about the scene. We also successfully use our model to address two novel synthesis problems: generating images that do not contain the touch sensor or the hand holding it, and estimating an image's shading from its reflectance and touch.

Submitted 26 September, 2023; originally announced September 2023.

Comments: ICCV 2023; Project site: https://fredfyyang.github.io/vision-from-touch/

arXiv:2308.03941 [pdf, other]

ExoMol line lists -- LI. Molecular line list for lithium hydroxide (LiOH)

Authors: Alec Owens, Sam O. M. Wright, Yakiv Pavlenko, Alexander Mitrushchenkov, Jacek Koput, Sergei N. Yurchenko, Jonathan Tennyson

Abstract: A new molecular line list for lithium hydroxide ($^{7}$Li$^{16}$O$^{1}$H) covering wavelengths $λ> 1 μ$m (the 0-10000 cm$^{-1}$ range) is presented. The OYT7 line list contains over 331 million transitions between rotation-vibration energy levels with total angular momentum up to $J=95$ and is applicable for temperatures up to $T\approx 3500$ K. Line list calculations are based on a previously published, high-level \textit{ab initio} potential energy surface and a newly computed dipole moment surface of the ground $\tilde{X}\,^1Σ^+$ electronic state. Lithium-containing molecules are important in a variety of stellar objects and there is potential for LiOH to be observed in the atmospheres of exoplanets. This work provides the first, comprehensive line list of LiOH and will facilitate its future molecular detection. The OYT7 line list along with the associated temperature- and pressure-dependent opacities can be downloaded from the ExoMol database at www.exomol.com and the CDS astronomical database.

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2305.06195 [pdf, other]

doi 10.3847/1538-3881/acd3f3

Quantifying Uncertainties on the Tip of the Red Giant Branch Method

Authors: Barry F. Madore, Wendy L. Freedman, Kayla A. Owens, In Sung Jang

Abstract: We present an extensive grid of numerical simulations quantifying the uncertainties in measurements of the Tip of the Red Giant Branch (TRGB). These simulations incorporate a luminosity function composed of 2 magnitudes of red giant branch (RGB) stars leading up to the tip, with asymptotic giant branch (AGB) stars contributing exclusively to the luminosity function for at least a magnitude above the RGB tip. We quantify the sensitivity of the TRGB detection and measurement to three important error sources: (1) the sample size of stars near the tip, (2) the photometric measurement uncertainties at the tip, and (3) the degree of self-crowding of the RGB population. The self-crowding creates a population of supra-TRGB stars due to the blending of one or more RGB stars just below the tip. This last population is ultimately difficult, though still possible, to disentangle from true AGB stars. In the analysis given here, the precepts and general methodology as used in the Chicago-Carnegie Hubble Program (CCHP) have been followed. However, in the Appendix, we introduce and test a set of new tip detection kernels which internally incorporate self-consistent smoothing. These are generalizations of the two-step model used by the CCHP (smoothing followed by Sobel-filter tip detection), where the new kernels are based on successive binomial-coefficient approximations to the Derivative-of-a-Gaussian (DoG) edge detector, as is commonly used in modern digital image processing.

Submitted 10 May, 2023; originally announced May 2023.

Comments: Accepted to the Astronomical Journal

arXiv:2304.08490 [pdf, other]

Conditional Generation of Audio from Video via Foley Analogies

Authors: Yuexi Du, Ziyang Chen, Justin Salamon, Bryan Russell, Andrew Owens

Abstract: The sound effects that designers add to videos are designed to convey a particular artistic effect and, thus, may be quite different from a scene's true sound. Inspired by the challenges of creating a soundtrack for a video that differs from its true sound, but that nonetheless matches the actions occurring on screen, we propose the problem of conditional Foley. We present the following contributions to address this problem. First, we propose a pretext task for training our model to predict sound for an input video clip using a conditional audio-visual clip sampled from another time within the same source video. Second, we propose a model for generating a soundtrack for a silent input video, given a user-supplied example that specifies what the video should "sound like". We show through human studies and automated evaluation metrics that our model successfully generates sound from video, while varying its output according to the content of a supplied example. Project site: https://xypb.github.io/CondFoleyGen/

Submitted 17 April, 2023; originally announced April 2023.

Comments: CVPR 2023

arXiv:2304.04869 [pdf, other]

doi 10.1088/1538-3873/acd1b5

The James Webb Space Telescope Mission

Authors: Jonathan P. Gardner, John C. Mather, Randy Abbott, James S. Abell, Mark Abernathy, Faith E. Abney, John G. Abraham, Roberto Abraham, Yasin M. Abul-Huda, Scott Acton, Cynthia K. Adams, Evan Adams, David S. Adler, Maarten Adriaensen, Jonathan Albert Aguilar, Mansoor Ahmed, Nasif S. Ahmed, Tanjira Ahmed, Rüdeger Albat, Loïc Albert, Stacey Alberts, David Aldridge, Mary Marsha Allen, Shaune S. Allen, Martin Altenburg, et al. (983 additional authors not shown)

Abstract: Twenty-six years ago a small committee report, building on earlier studies, expounded a compelling and poetic vision for the future of astronomy, calling for an infrared-optimized space telescope with an aperture of at least $4m$. With the support of their governments in the US, Europe, and Canada, 20,000 people realized that vision as the $6.5m$ James Webb Space Telescope. A generation of astronomers will celebrate their accomplishments for the life of the mission, potentially as long as 20 years, and beyond. This report and the scientific discoveries that follow are extended thank-you notes to the 20,000 team members. The telescope is working perfectly, with much better image quality than expected. In this and accompanying papers, we give a brief history, describe the observatory, outline its objectives and current observing program, and discuss the inventions and people who made it possible. We cite detailed reports on the design and the measured performance on orbit.

Submitted 10 April, 2023; originally announced April 2023.

Comments: Accepted by PASP for the special issue on The James Webb Space Telescope Overview, 29 pages, 4 figures

arXiv:2303.17490 [pdf, other]

Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment

Authors: Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh

Abstract: How does audio describe the world around us? In this paper, we propose a method for generating an image of a scene from sound. Our method addresses the challenges of dealing with the large gaps that often exist between sight and sound. We design a model that works by scheduling the learning procedure of each model component to associate audio-visual modalities despite their information gaps. The key idea is to enrich the audio features with visual information by learning to align audio to visual latent space. We translate the input audio to visual features, then use a pre-trained generator to produce an image. To further improve the quality of our generated images, we use sound source localization to select the audio-visual pairs that have strong cross-modal correlations. We obtain substantially better results on the VEGAS and VGGSound datasets than prior approaches. We also show that we can control our model's predictions by applying simple manipulations to the input waveform, or to the latent space.

Submitted 30 March, 2023; originally announced March 2023.

Comments: CVPR 2023

arXiv:2303.11989 [pdf, other]

Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

Authors: Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner

Abstract: We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input. To this end, we leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses. In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model. The core idea of our approach is a tailored viewpoint selection such that the content of each image can be fused into a seamless, textured 3D mesh. More specifically, we propose a continuous alignment strategy that iteratively fuses scene frames with the existing geometry to create a seamless mesh. Unlike existing works that focus on generating single objects or zoom-out trajectories from text, our method generates complete 3D scenes with multiple objects and explicit 3D geometry. We evaluate our approach using qualitative and quantitative metrics, demonstrating it as the first method to generate room-scale 3D geometry with compelling textures from only text as input.

Submitted 10 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: Accepted to ICCV 2023 (Oral) video: https://youtu.be/fjRnFL91EZc project page: https://lukashoel.github.io/text-to-room/ code: https://github.com/lukasHoel/text2room

arXiv:2303.11329 [pdf, other]

Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

Authors: Ziyang Chen, Shengyi Qian, Andrew Owens

Abstract: The images and sounds that we perceive undergo subtle but geometrically consistent changes as we rotate our heads. In this paper, we use these cues to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources. We learn to solve these tasks solely through self-supervision. A visual model predicts camera rotation from a pair of images, while an audio model predicts the direction of sound sources from binaural sounds. We train these models to generate predictions that agree with one another. At test time, the models can be deployed independently. To obtain a feature representation that is well-suited to solving this challenging problem, we also propose a method for learning an audio-visual representation through cross-view binauralization: estimating binaural sound from one view, given images and sound from another. Our model can successfully estimate accurate rotations on both real and synthetic scenes, and localize sound sources with accuracy competitive with state-of-the-art self-supervised approaches. Project site: https://ificl.github.io/SLfM/

Submitted 21 August, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

Comments: ICCV 2023. Project site: https://ificl.github.io/SLfM/

arXiv:2301.04647 [pdf, other]

EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata

Authors: Chenhao Zheng, Ayush Shrivastava, Andrew Owens

Abstract: We learn a visual representation that captures information about the camera that recorded a given photo. To do this, we train a multimodal embedding between image patches and the EXIF metadata that cameras automatically insert into image files. Our model represents this metadata by simply converting it to text and then processing it with a transformer. The features that we learn significantly outperform other self-supervised and supervised features on downstream image forensics and calibration tasks. In particular, we successfully localize spliced image regions "zero shot" by clustering the visual embeddings for all of the patches within an image.

Submitted 17 June, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: CVPR 2023 (Highlight). Project link: http://hellomuffin.github.io/exif-as-language

arXiv:2301.01767 [pdf, other]

Self-Supervised Video Forensics by Audio-Visual Anomaly Detection

Authors: Chao Feng, Ziyang Chen, Andrew Owens

Abstract: Manipulated videos often contain subtle inconsistencies between their visual and audio signals. We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies, and that can be trained solely using real, unlabeled data. We train an autoregressive model to generate sequences of audio-visual features, using feature sets that capture the temporal synchronization between video frames and sound. At test time, we then flag videos that the model assigns low probability. Despite being trained entirely on real videos, our model obtains strong performance on the task of detecting manipulated speech videos. Project site: https://cfeng16.github.io/audio-visual-forensics

Submitted 27 March, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: CVPR 2023

arXiv:2211.15058 [pdf, other]

Mix and Localize: Localizing Sound Sources in Mixtures

Authors: Xixi Hu, Ziyang Chen, Andrew Owens

Abstract: We present a method for simultaneously localizing multiple sound sources within a visual scene. This task requires a model to both group a sound mixture into individual sources, and to associate them with a visual signal. Our method jointly solves both tasks at once, using a formulation inspired by the contrastive random walk of Jabri et al. We create a graph in which images and separated sounds correspond to nodes, and train a random walker to transition between nodes from different modalities with high return probability. The transition probabilities for this walk are determined by an audio-visual similarity metric that is learned by our model. We show through experiments with musical instruments and human speech that our model can successfully localize multiple sounds, outperforming other self-supervised methods. Project site: https://hxixixh.github.io/mix-and-localize

Submitted 27 November, 2022; originally announced November 2022.

Comments: CVPR 2022

arXiv:2211.12498 [pdf, other]

Touch and Go: Learning from Human-Collected Vision and Touch

Authors: Fengyu Yang, Chenyang Ma, Jiacheng Zhang, Jing Zhu, Wenzhen Yuan, Andrew Owens

Abstract: The ability to associate touch with sight is essential for tasks that require physically interacting with objects in the world. We propose a dataset with paired visual and tactile data called Touch and Go, in which human data collectors probe objects in natural environments using tactile sensors, while simultaneously recording egocentric video. In contrast to previous efforts, which have largely been confined to lab settings or simulated environments, our dataset spans a large number of "in the wild" objects and scenes. To demonstrate our dataset's effectiveness, we successfully apply it to a variety of tasks: 1) self-supervised visuo-tactile feature learning, 2) tactile-driven image stylization, i.e., making the visual appearance of an object more consistent with a given tactile signal, and 3) predicting future frames of a tactile signal from visuo-tactile inputs.

Submitted 29 November, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: Accepted by NeurIPS 2022 Track of Datasets and Benchmarks

arXiv:2211.07728 [pdf, other]

doi 10.3847/1538-3881/aca07a

Hazy with a chance of star spots: constraining the atmosphere of the young planet, K2-33b

Authors: Pa Chia Thao, Andrew W. Mann, Peter Gao, Dylan A. Owens, Andrew Vanderburg, Elisabeth R. Newton, Yao Tang, Matthew J. Fields, Trevor J. David, Jonathan M. Irwin, Tim-Oliver Husser, David Charbonneau, Sarah Ballard

Abstract: Although all-sky surveys have led to the discovery of dozens of young planets, little is known about their atmospheres. Here, we present multi-wavelength transit data for the super Neptune-sized exoplanet, K2-33b -- the youngest (~10 Myr) transiting exoplanet to date. We combined photometric observations of K2-33 covering a total of 33 transits spanning >2 years, taken from K2, MEarth, Hubble, and Spitzer. The transit photometry spanned from the optical to the near-infrared (0.6-4.5$μ$m), enabling us to construct a transmission spectrum of the planet. We find that the optical transit depths are nearly a factor of two deeper than those from the near-infrared. This difference holds across multiple datasets taken over years, ruling out issues of data analysis and unconstrained systematics. Surface inhomogeneities on the young star can reproduce some of the difference, but required spot coverage fractions (>60%) are ruled out by the observed stellar spectrum (<20%). We find a better fit to the transmission spectrum using photochemical hazes, which were predicted to be strong in young, moderate-temperature, and large-radius planets like K2-33b. A tholin haze with CO as the dominant gaseous carbon carrier in the atmosphere can reasonably reproduce the data with small or no stellar surface inhomogeneities, consistent with the stellar spectrum. The HST data quality is insufficient for the detection of any molecular features. More observations would be required to fully characterize the hazes and spot properties and confirm the presence of CO suggested by current data.

Submitted 14 November, 2022; originally announced November 2022.

Comments: Accepted to AJ. 26 pages, 14 figures, 6 tables

arXiv:2210.12480 [pdf, other]

doi 10.1093/mnras/stac2462

ExoMol line lists -- XLVII. Rovibronic molecular line list of the calcium monohydroxide radical (CaOH)

Authors: Alec Owens, Alexander Mitrushchenkov, Sergei N. Yurchenko, Jonathan Tennyson

Abstract: Any future detection of the calcium monohydroxide radical (CaOH) in stellar and exoplanetary atmospheres will rely on accurate molecular opacity data. Here, we present the first comprehensive molecular line list of CaOH covering the \A--\X\ rotation-vibration-electronic and \X--\X\ rotation-vibration bands. The newly computed OYT6 line list contains over 24.2 billion transitions between 3.2 million energy levels with rotational excitation up to $J=175.5$. It is applicable to temperatures up to $T=3000$~K and covers the 0\,--\,35\,000~cm$^{-1}$ range (wavelengths $λ> 0.29$~$μ$m) for rotational, rotation-vibration and the \A--\X\ electronic transition. The strong band around 16\,000~cm$^{-1}$ ($λ= 0.63$~$μ$m) is likely to be of interest in future astronomical observations, particularly in hot rocky exoplanets where temperatures can become extremely high. The OYT6 line list has been generated using empirically-refined \X\ and \A\ state potential energy surfaces, high-level \textit{ab initio} transition dipole moment surfaces and a rigorous treatment of both Renner-Teller and spin-orbit coupling effects, which are necessary for correctly modelling the CaOH spectrum. Post-processing of the CaOH line list has been performed so as to tailor it to high-resolution applications, i.e.\ by replacing calculated energy levels with more accurate empirically-derived values (where available), hence improving the accuracy of the predicted line positions in certain regions. The OYT6 line list is available from the ExoMol database at http://www.exomol.com and the CDS astronomical database.

Submitted 22 October, 2022; originally announced October 2022.

Journal ref: Mon. Not. R. astr. Soc., 516, 3995-4002 (2022)

arXiv:2210.12474 [pdf, other]

doi 10.1093/mnras/stac371

ExoMol line lists -- XLV. Rovibronic molecular line lists of calcium monohydride (CaH) and magnesium monohydride (MgH)

Authors: Alec Owens, Sophie Dooley, Luke McLaughlin, Brandon Tan, Guanming Zhang, Sergei N. Yurchenko, Jonathan Tennyson

Abstract: New molecular line lists for calcium monohydride ($^{40}$Ca$^{1}$H) and magnesium monohydride ($^{24}$Mg$^{1}$H) and its minor isotopologues ($^{25}$Mg$^{1}$H and $^{26}$Mg$^{1}$H) are presented. The rotation-vibration-electronic (rovibronic) line lists, named \texttt{XAB}, consider transitions involving the \X, \A, and \BBp\ electronic states in the 0--30\,000~cm$^{-1}$ region (wavelengths $λ> 0.33$~$μ$m) and are suitable for temperatures up to 5000 K. A comprehensive analysis of the published spectroscopic literature on CaH and MgH is used to obtain new extensive datasets of accurate rovibronic energy levels with measurement uncertainties and consistent quantum number labelling. These datasets are used to produce new spectroscopic models for CaH and MgH, composed of newly empirically-refined potential energy curves and couplings in/between the different electronic states (e.g.\ spin-orbit, electronic angular momentum, Born-Oppenheimer breakdown, spin-rotation, $Λ$-doubling) and previously published \textit{ab initio} transition dipole moment curves. Along with Einstein $A$ coefficients, state lifetimes and Landé $g$-factors are provided, the latter being particularly useful as CaH and MgH can be used to probe stellar magnetic fields. Computed energy levels have been replaced with the more accurate empirical values (if available) when post-processing the line lists, thus tailoring the line lists to high resolution applications. The \texttt{XAB} line lists are available from the ExoMol database at http://www.exomol.com and the CDS astronomical database.

Submitted 22 October, 2022; originally announced October 2022.

Journal ref: Mon. Not. R. astr. Soc., 511, 5448-5461 (2022)

arXiv:2206.08383 [pdf, other]

doi 10.3847/1538-3881/ac7b28

Transit Hunt for Young and Maturing Exoplanets (THYME) VIII: a Pleiades-age association harboring two transiting planetary systems from Kepler

Authors: Madyson G. Barber, Andrew W. Mann, Jonathan L. Bush, Benjamin M. Tofflemire, Adam L. Kraus, Daniel M. Krolikowski, Andrew Vanderburg, Matthew J. Fields, Elisabeth R. Newton, Dylan A. Owens, Pa Chia Thao

Abstract: Young planets provide a window into the early stages and evolution of planetary systems. Ideal planets for such research are in coeval associations, where the parent population can precisely determine their ages. We describe a young association (MELANGE-3) in the Kepler field, which harbors two transiting planetary systems (Kepler-1928 and Kepler-970). We identify MELANGE-3 by searching for kinematic and spatial overdensities around Kepler planet hosts with high levels of lithium. To determine the age and membership of MELANGE-3, we combine new high-resolution spectra with archival light curves, velocities, and astrometry of stars near Kepler-1928 spatially and kinematically. We use the resulting rotation sequence, lithium levels, and color-magnitude diagram of candidate members to confirm the presence of a coeval $105\pm10$ Myr population. MELANGE-3 may be part of the recently identified Theia 316 stream. For the two exoplanet systems, we revise the stellar and planetary parameters, taking into account the newly-determined age. Fitting the 4.5 yr Kepler light curves, we find that Kepler-1928 b is a $2.0\pm0.1R_\oplus$ planet on a 19.58-day orbit, while Kepler-970 b is a $2.8\pm0.2R_\oplus$ planet on a 16.73-day orbit. Kepler-1928 was previously flagged as an eclipsing binary, which we rule out using radial velocities from APOGEE and statistically validate the signal as planetary in origin. Given its overlap with the Kepler field, MELANGE-3 is valuable for studies of spot evolution on year timescales, and both planets contribute to the growing work on transiting planets in young stellar associations.

Submitted 16 June, 2022; originally announced June 2022.

Comments: accepted for publication in AJ

arXiv:2205.11323 [pdf, other]

doi 10.3847/1538-4357/ac7321

The Astrophysical Distance Scale: V. A 2% Distance to the Local Group Spiral M33 via the JAGB Method, Tip of the Red Giant Branch, and Leavitt Law

Authors: Abigail J. Lee, Laurie Rousseau-Nepton, Wendy L. Freedman, Barry F. Madore, Maria-Rosa L. Cioni, Taylor J. Hoyt, In Sung Jang, Atefeh Javadi, Kayla A. Owens

Abstract: The J-region asymptotic giant branch (JAGB) method is a new standard candle that is based on the stable intrinsic J-band magnitude of color-selected carbon stars, and has a precision comparable to other primary distance indicators such as Cepheids and the TRGB. We further test the accuracy of the JAGB method in the Local Group Galaxy M33. M33's moderate inclination, low metallicity, and nearby proximity make it an ideal laboratory for tests of systematics in local distance indicators. Using high-precision optical BVI and near-infrared JHK photometry, we explore the application of three independent distance indicators: the JAGB method, the Cepheid Leavitt Law, and the TRGB. We find: $μ_0$ (TRGB I) = 24.72 +/- 0.02 (stat) +/- 0.07 (sys) mag, $μ_0$ (TRGB NIR) = 24.72 +/- 0.04 (stat) +/- 0.10 (sys) mag, $μ_0$ (JAGB) = 24.67 +/- 0.03 (stat) +/- 0.04 (sys) mag, $μ_0$ (Cepheid) = 24.71 +/- 0.04 (stat) +/- 0.01 (sys) mag. For the first time, we also directly compare a JAGB distance using ground-based and space-based photometry. We measure: $μ_0$ (JAGB F110W) = 24.71 +/- 0.06 (stat) +/- 0.05 (sys) mag using the (F814-F110W) color combination to effectively isolate the JAGB stars. In this paper, we measure a distance to M33 accurate to 2% and provide further evidence that the JAGB method is a powerful extragalactic distance indicator that can effectively probe a local measurement of the Hubble constant using space-based observations. We expect to measure the Hubble constant via the JAGB method in the near future, using observations from JWST.

Submitted 23 June, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: 23 pages, 14 figures, accepted to the ApJ. v2 is exactly the same as v1 except for a fixed minor typo found while looking at the proofs

arXiv:2205.05072 [pdf, other]

Learning Visual Styles from Audio-Visual Associations

Authors: Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao

Abstract: From the patter of rain to the crunch of snow, the sounds we hear often convey the visual textures that appear within a scene. In this paper, we present a method for learning visual styles from unlabeled audio-visual data. Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization. Given a dataset of paired audio-visual data, we learn to modify input images such that, after manipulation, they are more likely to co-occur with a given input sound. In quantitative and qualitative evaluations, our sound-based model outperforms label-based approaches. We also show that audio can be an intuitive representation for manipulating images, as adjusting a sound's volume or mixing two sounds together results in predictable changes to visual style. Project webpage: https://tinglok.netlify.app/files/avstyle

Submitted 10 May, 2022; originally announced May 2022.

arXiv:2204.12489 [pdf, other]

Sound Localization by Self-Supervised Time Delay Estimation

Authors: Ziyang Chen, David F. Fouhey, Andrew Owens

Abstract: Sounds reach one microphone in a stereo pair sooner than the other, resulting in an interaural time delay that conveys their directions. Estimating a sound's time delay requires finding correspondences between the signals recorded by each microphone. We propose to learn these correspondences through self-supervision, drawing on recent techniques from visual tracking. We adapt the contrastive random walk of Jabri et al. to learn a cycle-consistent representation from unlabeled stereo sounds, resulting in a model that performs on par with supervised methods on "in the wild" internet recordings. We also propose a multimodal contrastive learning model that solves a visually-guided localization task: estimating the time delay for a particular person in a multi-speaker mixture, given a visual representation of their face. Project site: https://ificl.github.io/stereocrw/

Submitted 28 January, 2023; v1 submitted 26 April, 2022; originally announced April 2022.

Comments: ECCV 2022

Showing 1–50 of 130 results for author: Owens, A