Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models

Authors: Xuanchi Ren, Yifan Lu, Tianshi Cao, Ruiyuan Gao, Shengyu Huang, Amirmojtaba Sabour, Tianchang Shen, Tobias Pfaff, Jay Zhangjie Wu, Runjian Chen, Seung Wook Kim, Jun Gao, Laura Leal-Taixe, Mike Chen, Sanja Fidler, Huan Ling

Abstract: Collecting and annotating real-world data for safety-critical physical AI systems, such as autonomous vehicles (AVs), is time-consuming and costly. It is especially challenging to capture rare edge cases, which play a critical role in training and testing an AV system. To address this challenge, we introduce Cosmos-Drive-Dreams, a synthetic data generation (SDG) pipeline that aims to generate challenging scenarios to facilitate downstream tasks such as perception and driving policy training. Powering this pipeline is Cosmos-Drive, a suite of models specialized from the NVIDIA Cosmos world foundation model for the driving domain and capable of controllable, high-fidelity, multi-view, and spatiotemporally consistent driving video generation. We showcase the utility of these models by applying Cosmos-Drive-Dreams to scale the quantity and diversity of driving datasets with high-fidelity and challenging scenarios. Experimentally, we demonstrate that our generated data helps mitigate long-tail distribution problems and enhances generalization in downstream tasks such as 3D lane detection, 3D object detection and driving policy learning. We open source our pipeline toolkit, dataset and model weights through NVIDIA's Cosmos platform. Project page: https://research.nvidia.com/labs/toronto-ai/cosmos_drive_dreams

Submitted 18 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

Comments: Only the core contributors are listed. The full list of contributors can be found in Appendix A of this paper

arXiv:2504.08888 [pdf, other]

Measurement-induced phase transitions in quantum inference problems and quantum hidden Markov models

Authors: Sun Woo P. Kim, Curt von Keyserlingk, Austen Lamacraft

Abstract: Recently, there has been interest in coincident 'sharpening' and 'learnability' transitions in monitored quantum systems. In the latter, an outside observer's ability to infer properties of a quantum system from measurements undergoes a phase transition. Such transitions appear to be related to the decodability transition in quantum error correction, but the precise connection is not clear. Here, we study these problems under one framework we call the general quantum inference problem. In cases such as these, where the system has a Markov structure, we say that the inference is on a quantum hidden Markov model. We show a formal connection to classical hidden Markov models and that the two coincide for certain setups; for example, we prove this for setups involving Haar-random unitaries and measurements. We introduce the notion of Bayes non-optimality, where the parameters used for inference differ from the true ones. This allows us to expand the phase diagrams of the above models. At Bayes optimality, we obtain an explicit relation between 'sharpening' and 'learnability' order parameters, explicitly showing that the two transitions coincide. Next, we study concrete examples. We review quantum error correction on the toric and repetition codes and their mapping to the 2D random-bond Ising model (RBIM) through our framework. We study the Haar-random U(1)-symmetric monitored quantum circuit and tree, mapping each to inference models that we call the planted SSEP and planted XOR, respectively, and expanding the phase diagram to Bayes non-optimality.
For the circuit, we deduce the phase boundary numerically and argue analytically that it belongs to a single universality class. For the tree, we present an exact solution of the entire phase boundary, which, like that of the 2D RBIM, displays re-entrance. We discuss these phase diagrams, their interpretations for quantum inference problems, and rigorous arguments on their shapes.
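For the classical side of this connection, inference on a hidden Markov model can be sketched with the standard forward algorithm, and Bayes non-optimality amounts to running the same inference with parameters that differ from the ones that generated the data. The two-state model, its matrices, and the observation sequence below are hypothetical illustrations chosen here; they are not the paper's quantum construction or any of its planted models.

```python
import numpy as np

def forward_log_likelihood(obs, init, trans, emit):
    """Log P(obs) for a classical HMM via the scaled forward algorithm.

    init:  (S,) initial state distribution
    trans: (S, S) with trans[i, j] = P(s_{t+1} = j | s_t = i)
    emit:  (S, O) with emit[i, k] = P(o_t = k | s_t = i)
    """
    alpha = init * emit[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()              # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        norm = alpha.sum()                   # P(o_t | o_1..o_{t-1})
        log_lik += np.log(norm)
        alpha = alpha / norm
    return log_lik

def posterior_last_state(obs, init, trans, emit):
    """Filtering distribution P(s_T | o_1..o_T) under the given parameters."""
    alpha = init * emit[:, obs[0]]
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        alpha = alpha / alpha.sum()
    return alpha

# Hypothetical two-state example.
init = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])        # true dynamics
true_emit = np.array([[0.8, 0.2], [0.2, 0.8]])    # true measurement model
wrong_emit = np.array([[0.6, 0.4], [0.4, 0.6]])   # mismatched (Bayes non-optimal)
obs = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1]

post_opt = posterior_last_state(obs, init, trans, true_emit)
post_non = posterior_last_state(obs, init, trans, wrong_emit)
```

Both posteriors are valid distributions over the hidden state; they differ only because the inference parameters differ, which is the sense in which the observer's conclusions depend on Bayes (non-)optimality.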

Submitted 11 April, 2025; originally announced April 2025.

Comments: 24 pages of main text, 23 pages of appendix, 9 figures

arXiv:2504.02011 [pdf, other]

Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression

Authors: Dohyun Kim, Sehwan Park, Geonhee Han, Seung Wook Kim, Paul Hongsuck Seo

Abstract: Diffusion models generate high-quality images through progressive denoising but are computationally intensive due to large model sizes and repeated sampling. Knowledge distillation, which transfers knowledge from a complex teacher to a simpler student model, has been widely studied in recognition tasks, particularly for transferring concepts unseen during student training. However, its application to diffusion models remains underexplored, especially in enabling student models to generate concepts not covered by the training images. In this work, we propose Random Conditioning, a novel approach that pairs noised images with randomly selected text conditions to enable efficient, image-free knowledge distillation. By leveraging this technique, we show that the student can generate concepts unseen in the training images. When applied to conditional diffusion model distillation, our method allows the student to explore the condition space without generating condition-specific images, resulting in notable improvements in both generation quality and efficiency. This promotes resource-efficient deployment of generative diffusion models, broadening their accessibility for both research and real-world applications. Code, models, and datasets are available at https://dohyun-as.github.io/Random-Conditioning .
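The core idea, pairing a noised image with a text condition drawn at random instead of (or in addition to) its own caption, can be sketched as a batch-construction step for distillation. The function name `build_distillation_batch`, the replacement probability `p_random`, the caption pool, and the DDPM-style noising are hypothetical choices made for illustration; the teacher, the student, and the distillation loss are not shown, and this is not the authors' released code.

```python
import numpy as np

def build_distillation_batch(images, captions, caption_pool, alpha_bar, rng, p_random=0.5):
    """Noise each image and pair it with either its own caption or a random one.

    images:       (B, H, W, C) array of clean images
    captions:     list of B captions paired with the images
    caption_pool: candidate text conditions, possibly describing concepts
                  that no training image depicts
    alpha_bar:    1-D array of cumulative noise-schedule coefficients
    """
    batch = images.shape[0]
    t = rng.integers(0, len(alpha_bar), size=batch)          # random timesteps
    a = alpha_bar[t].reshape(batch, 1, 1, 1)
    noise = rng.standard_normal(images.shape)
    noised = np.sqrt(a) * images + np.sqrt(1.0 - a) * noise  # sample from q(x_t | x_0)

    conds = []
    for caption in captions:
        if rng.random() < p_random:
            # Random condition: the noised image need not match this text.
            conds.append(caption_pool[rng.integers(len(caption_pool))])
        else:
            conds.append(caption)
    # Teacher and student would both be evaluated on (noised, t, conds),
    # with the student trained to match the teacher's prediction.
    return noised, t, conds
```

Because the pairing is random, no image has to be generated for a new condition; the teacher's response to (noised image, random text) supplies the training signal.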

Submitted 2 April, 2025; originally announced April 2025.

Comments: Accepted to CVPR 2025. 8 pages main paper + 4 pages references + 5 pages supplementary, 9 figures in total

arXiv:2503.08788 [pdf, other]

Circuits as a simple platform for the emergence of hydrodynamics in deterministic chaotic many-body systems

Authors: Sun Woo P. Kim, Friedrich Hübner, Juan P. Garrahan, Benjamin Doyon

Abstract: The emergence of hydrodynamics is one of the deepest phenomena in many-body systems. Arguably, the hydrodynamic equations are also the most important tools for predicting large-scale behaviour. Understanding how such equations emerge from microscopic deterministic dynamics is a century-old problem, despite recent progress in fine-tuned integrable systems. Due to the universality of hydrodynamics, the specific microscopic implementation should not matter. Here, we show that classical deterministic circuits provide a minimal, exact, and efficient platform that admits non-trivial hydrodynamic behaviour for deterministic but chaotic systems. By developing new techniques and focusing on 1D circuits as a proof of concept, we obtain the characteristic dynamics, including relaxation to Gibbs states, exact Euler equations, shocks, diffusion, and exact KPZ super-diffusion. Our methods can be easily generalised to higher dimensions or quantum circuits.

Submitted 11 March, 2025; originally announced March 2025.

arXiv:2501.17683 [pdf, other]

Temperature-Free Loss Function for Contrastive Learning

Authors: Bum Jun Kim, Sang Woo Kim

Abstract: As one of the most promising methods in self-supervised learning, contrastive learning has achieved a series of breakthroughs across numerous fields. A predominant approach to implementing contrastive learning is applying InfoNCE loss: by capturing the similarities between pairs, InfoNCE loss enables learning the representation of data. Despite its success, adopting InfoNCE loss requires tuning a temperature, which is a core hyperparameter for calibrating similarity scores. Although several studies have emphasized the temperature's significance and the sensitivity of performance to it, searching for a valid temperature requires extensive trial-and-error experiments, which increases the difficulty of adopting InfoNCE loss. To address this difficulty, we propose a novel method to deploy InfoNCE loss without temperature. Specifically, we replace temperature scaling with the inverse hyperbolic tangent function, resulting in a modified InfoNCE loss. In addition to hyperparameter-free deployment, we observed that the proposed method even yielded a performance gain in contrastive learning. Our detailed theoretical analysis reveals that the current practice of temperature scaling in InfoNCE loss causes serious problems in gradient descent, whereas our method provides desirable gradient properties. The proposed method was validated on five contrastive learning benchmarks, yielding satisfactory results without temperature tuning.
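The change can be sketched by computing InfoNCE logits in two ways: the usual cosine similarity divided by a temperature, and a temperature-free variant that maps cosine similarity through the inverse hyperbolic tangent, which stretches the bounded range $[-1, 1]$ onto the real line. This sketch assumes the logit is exactly $\operatorname{artanh}(\cos)$; any constant scale or other detail of the paper's formulation is not given in the abstract and is omitted here. The clipping constant `eps` is also an assumption, added to keep `arctanh` finite.

```python
import numpy as np

def _log_softmax(x):
    # Row-wise log-softmax with max subtraction for numerical stability.
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def info_nce(z1, z2, temperature=None, eps=1e-6):
    """InfoNCE over a batch in which z1[i] and z2[i] form the positive pair.

    temperature=None uses the arctanh mapping instead of temperature scaling.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    cos = z1 @ z2.T                                            # (B, B) cosine similarities
    if temperature is None:
        logits = np.arctanh(np.clip(cos, -1 + eps, 1 - eps))  # temperature-free
    else:
        logits = cos / temperature                             # standard temperature scaling
    log_probs = _log_softmax(logits)
    return -np.mean(np.diag(log_probs))                       # positives lie on the diagonal
```

With perfectly aligned orthonormal embeddings, the diagonal logits become large while off-diagonal logits are `arctanh(0) = 0`, so the loss is close to zero without choosing any temperature.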

Submitted 29 January, 2025; originally announced January 2025.

Comments: 10 pages, 5 figures

arXiv:2501.15690 [pdf, ps, other]

doi 10.1088/1748-9326/ade4dd

Refined climatologies of future precipitation over High Mountain Asia using probabilistic ensemble learning

Authors: Kenza Tazi, Sun Woo P. Kim, Marc Girona-Mata, Richard E. Turner

Abstract: High Mountain Asia (HMA) holds the highest concentration of frozen water outside the polar regions, serving as a crucial water source for more than 1.9 billion people. Precipitation represents the largest source of uncertainty for future hydrological modelling in this area. In this study, we propose a probabilistic machine learning framework to combine monthly precipitation from 13 regional climate models developed under the Coordinated Regional Downscaling Experiment (CORDEX) over HMA via a mixture of experts (MoE). This approach accounts for seasonal and spatial biases within the models, enabling the prediction of more faithful precipitation distributions. The MoE is trained and validated against gridded historical precipitation data, yielding 32% improvement over an equally-weighted average and 254% improvement over choosing any single ensemble member. This approach is then used to generate precipitation projections for the near future (2036-2065) and far future (2066-2095) under RCP4.5 and RCP8.5 scenarios. Compared to previous estimates, the MoE projects wetter summers but drier winters over the western Himalayas and Karakoram and wetter winters over the Tibetan Plateau, Hengduan Shan, and South East Tibet.
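The mixture-of-experts idea can be sketched as a gate that turns location and season features into softmax weights over the 13 regional climate models, so that each model's influence varies in space and time. The linear gate, the feature encoding, and the weighted-mean output are simplifying assumptions made here; the paper's framework is probabilistic and predicts precipitation distributions, not only a weighted mean.

```python
import numpy as np

N_MODELS = 13  # regional climate models combined over HMA

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_combine(features, model_precip, gate_weights, gate_bias):
    """Combine ensemble members with input-dependent weights.

    features:     (N, F) inputs per sample, e.g. lat, lon, elevation, sin/cos(month)
    model_precip: (N, N_MODELS) monthly precipitation from each model
    gate_weights: (F, N_MODELS) parameters of a hypothetical linear gate
    gate_bias:    (N_MODELS,)
    Returns the combined precipitation (N,) and the gate weights (N, N_MODELS).
    """
    weights = softmax(features @ gate_weights + gate_bias)  # each row sums to 1
    return (weights * model_precip).sum(axis=1), weights
```

With an all-zero gate the weights are uniform and the output reduces to the equally-weighted ensemble average, the baseline the abstract compares against; training the gate lets it down-weight models with seasonal or regional biases.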

Submitted 30 June, 2025; v1 submitted 26 January, 2025; originally announced January 2025.

Comments: 16 pages, 8 figures (main text); 32 pages, 14 figures (total)

arXiv:2501.03575 [pdf, other]

Cosmos World Foundation Model Platform for Physical AI

Authors: NVIDIA: Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, et al. (54 additional authors not shown)

Abstract: Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make Cosmos open-source and our models open-weight with permissive licenses available via https://github.com/nvidia-cosmos/cosmos-predict1.

Submitted 18 March, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

arXiv:2409.16630 [pdf, other]

Stochastic Subsampling With Average Pooling

Authors: Bum Jun Kim, Sang Woo Kim

Abstract: Regularization of deep neural networks has been an important issue for achieving higher generalization performance without overfitting. Although the popular method of Dropout provides a regularization effect, it causes inconsistent properties in the output, which may degrade the performance of deep neural networks. In this study, we propose a new module called stochastic average pooling, which incorporates Dropout-like stochasticity in pooling. We describe the properties of stochastic subsampling and average pooling and leverage them to design a module without any inconsistency problem. Stochastic average pooling achieves a regularization effect without any potential performance degradation due to the inconsistency issue and can easily be plugged into existing deep neural network architectures. Experiments demonstrate that replacing existing average pooling with stochastic average pooling yields consistent improvements across a variety of tasks, datasets, and models.
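One simple way to combine stochastic subsampling with average pooling is sketched below: during training, each pooling window is averaged over a random subset of its elements, and at test time over all of them, so the training-time output is an unbiased estimate of the test-time output. This is an illustration under that assumption, not necessarily the exact module the paper proposes (the abstract does not give its definition); the function name and the `keep` parameter are hypothetical.

```python
import numpy as np

def stochastic_avg_pool2d(x, size=2, keep=2, training=True, rng=None):
    """Non-overlapping size x size average pooling over an (H, W) map.

    Training: each window is averaged over `keep` of its elements chosen
    uniformly without replacement. Eval: ordinary average pooling.
    """
    h, w = x.shape
    assert h % size == 0 and w % size == 0
    # Rearrange to (H/size, W/size, size*size): one row of elements per window.
    windows = (x.reshape(h // size, size, w // size, size)
                .transpose(0, 2, 1, 3)
                .reshape(h // size, w // size, size * size))
    if not training:
        return windows.mean(axis=-1)
    rng = np.random.default_rng() if rng is None else rng
    # Sorting random keys gives an independent permutation per window;
    # the first `keep` indices form the random subset.
    idx = rng.random(windows.shape).argsort(axis=-1)[..., :keep]
    subset = np.take_along_axis(windows, idx, axis=-1)
    return subset.mean(axis=-1)
```

Because the mean of a uniformly chosen subset has the same expectation as the full-window mean, switching from training to evaluation does not shift the output's expected value, which is one way to avoid a train/test mismatch of the kind the abstract attributes to Dropout.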

Submitted 25 September, 2024; originally announced September 2024.

Comments: 17 pages, 8 figures

arXiv:2407.09153 [pdf]

doi 10.1038/s41467-024-49841-6

Topological Fermi-arc surface state covered by floating electrons on a two-dimensional electride

Authors: Chan-young Lim, Min-Seok Kim, Dong Cheol Lim, Sunghun Kim, Yeonghoon Lee, Jaehoon Cha, Gyubin Lee, Sang Yong Song, Dinesh Thapa, Jonathan D. Denlinger, Seong-Gon Kim, Sung Wng Kim, Jungpil Seo, Yeongkwan Kim

Abstract: Two-dimensional electrides can acquire topologically non-trivial phases due to intriguing interplay between the cationic atomic layers and anionic electron layers. However, experimental evidence of topological surface states has yet to be verified. Here, via angle-resolved photoemission spectroscopy (ARPES) and scanning tunnelling microscopy (STM), we probe the magnetic Weyl states of the ferromagnetic electride $[\mathrm{Gd_2C}]^{2+}\cdot 2e^{-}$. In particular, the presence of Weyl cones and Fermi-arc states is demonstrated through photon energy-dependent ARPES measurements, agreeing with theoretical band structure calculations. Notably, the STM measurements reveal that the Fermi-arc states exist underneath a floating quantum electron liquid on the top Gd layer, forming double-stacked surface states in a heterostructure. Our work thus not only unveils the non-trivial topology of the $[\mathrm{Gd_2C}]^{2+}\cdot 2e^{-}$ electride but also realizes a surface heterostructure that can host phenomena distinct from the bulk.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 22 pages, 6 figures

Journal ref: Nat. Commun. 15 (2024) 5615

arXiv:2406.19287 [pdf, other]

Isotropy of cosmic rays beyond $10^{20}$ eV favors their heavy mass composition

Authors: Telescope Array Collaboration, R. U. Abbasi, Y. Abe, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, B. G. Cheon, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon, N. Hayashida, H. He, et al. (118 additional authors not shown)

Abstract: We report an estimation of the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The composition is inferred from an energy-dependent sky distribution of UHECR events observed by the Telescope Array surface detector by comparing it to the Large Scale Structure of the local Universe. In the case of negligible extra-galactic magnetic fields (EGMF), the results are consistent with a relatively heavy injected composition at E ~ 10 EeV that becomes lighter up to E ~ 100 EeV, while the composition at E > 100 EeV is very heavy. The latter is true even in the presence of the highest experimentally allowed extra-galactic magnetic fields, while the composition at lower energies can be light if a strong EGMF is present. The effect of the uncertainty in the galactic magnetic field on these results is subdominant.

Submitted 3 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 8 pages, 3 figures, accepted for publication in PRL

arXiv:2406.19286 [pdf, other]

Mass composition of ultra-high energy cosmic rays from distribution of their arrival directions with the Telescope Array

Authors: Telescope Array Collaboration, R. U. Abbasi, Y. Abe, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, B. G. Cheon, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon, N. Hayashida, H. He, et al. (118 additional authors not shown)

Abstract: We use a new method to estimate the injected mass composition of ultrahigh-energy cosmic rays (UHECRs) at energies higher than 10 EeV. The method is based on comparing the energy-dependent distribution of cosmic ray arrival directions as measured by the Telescope Array experiment (TA) with that calculated in a given putative model of UHECRs under the assumption that sources trace the large-scale structure (LSS) of the Universe. As we report in the companion letter, the TA data show large deflections with respect to the LSS which can be explained, assuming small extra-galactic magnetic fields (EGMF), by an intermediate composition changing to a heavy one (iron) in the highest energy bin. Here we show that these results are robust to uncertainties in UHECR injection spectra, the energy scale of the experiment and galactic magnetic fields (GMF). The assumption of weak EGMF, however, strongly affects this interpretation at all but the highest energies E > 100 EeV, where the remarkable isotropy of the data implies a heavy injected composition even in the case of strong EGMF. This result also holds if UHECR sources are as rare as $2 \times 10^{-5}$ Mpc$^{-3}$, which is the conservative lower limit for the source number density.

Submitted 3 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 18 pages, 11 figures, accepted for publication in PRD

arXiv:2406.12095 [pdf, other]

DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

Authors: Letian Wang, Seung Wook Kim, Jiawei Yang, Cunjun Yu, Boris Ivanovic, Steven L. Waslander, Yue Wang, Sanja Fidler, Marco Pavone, Peter Karkus

Abstract: We propose DistillNeRF, a self-supervised learning framework addressing the challenge of understanding 3D environments from limited 2D observations in outdoor autonomous driving scenes. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs with limited view overlap, and is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images. Our first insight is to exploit per-scene optimized Neural Radiance Fields (NeRFs) by generating dense depth and virtual camera targets from them, which helps our model to learn enhanced 3D geometry from sparse non-overlapping image inputs. Second, to learn a semantically rich 3D representation, we propose distilling features from pre-trained 2D foundation models, such as CLIP or DINOv2, thereby enabling various downstream tasks without the need for costly 3D human annotations. To leverage these two insights, we introduce a novel model architecture with a two-stage lift-splat-shoot encoder and a parameterized sparse hierarchical voxel representation. Experimental results on the NuScenes and Waymo NOTR datasets demonstrate that DistillNeRF significantly outperforms existing comparable state-of-the-art self-supervised methods for scene reconstruction, novel view synthesis, and depth estimation; and it allows for competitive zero-shot 3D semantic occupancy prediction, as well as open-world scene understanding through distilled foundation model features. Demos and code will be available at https://distillnerf.github.io/.

Submitted 30 October, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by Advances in Neural Information Processing Systems (NeurIPS 2024)

arXiv:2406.10324 [pdf, other]

L4GM: Large 4D Gaussian Reconstruction Model

Authors: Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling

Abstract: We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input, in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in 12M videos with a total of 300M frames. We keep L4GM simple for scalability and build directly on top of LGM, a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multiview image input. L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low fps and then upsamples the representation to a higher fps to achieve temporal smoothness. We add temporal self-attention layers to the base LGM to help it learn consistency across time, and utilize a per-timestep multiview rendering loss to train the model. The representation is upsampled to a higher framerate by training an interpolation model which produces intermediate 3D Gaussian representations. We show that L4GM, trained only on synthetic data, generalizes extremely well to in-the-wild videos, producing high-quality animated 3D assets.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Project page: https://research.nvidia.com/labs/toronto-ai/l4gm

arXiv:2406.08612 [pdf, other]

Observation of Declination Dependence in the Cosmic Ray Energy Spectrum

Authors: The Telescope Array Collaboration, R. U. Abbasi, T. Abu-Zayyad, M. Allen, J. W. Belz, D. R. Bergman, I. Buckland, W. Campbell, B. G. Cheon, K. Endo, A. Fedynitch, T. Fujii, K. Fujisue, K. Fujita, M. Fukushima, G. Furlich, Z. Gerber, N. Globus, W. Hanlon, N. Hayashida, H. He, K. Hibino, R. Higuchi, D. Ikeda, T. Ishii, et al. (101 additional authors not shown)

Abstract: We report on an observation of the difference between northern and southern skies of the ultrahigh energy cosmic ray energy spectrum with a significance of ${\sim}8\sigma$. We use measurements from the two largest experiments: the Telescope Array observing the northern hemisphere and the Pierre Auger Observatory viewing the southern hemisphere. Since the comparison of two measurements from different observatories introduces the issue of possible systematic differences between detectors and analyses, we validate the methodology of the comparison by examining the region of the sky where the apertures of the two observatories overlap. Although the spectra differ in this region, we find that there is only a $1.8\sigma$ difference between the spectrum measurements when anisotropic regions are removed and a fiducial cut in the aperture is applied.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 8 pages, 6 figures

arXiv:2406.06650 [pdf, other]

Assessing the risk of recurrence in early-stage breast cancer through H&E stained whole slide images

Authors: Geongyu Lee, Joonho Lee, Tae-Yeong Kwak, Sun Woo Kim, Youngmee Kwon, Chungyeul Kim, Hyeyoon Chang

Abstract: Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. We analyzed 125 hematoxylin and eosin-stained whole slide images (WSIs) from 125 patients across two institutions (National Cancer Center and Korea University Medical Center Guro Hospital) to predict breast cancer recurrence risk using deep learning. Sensitivity reached 0.857, 0.746, and 0.529 for low, intermediate, and high-risk categories, respectively, with specificity of 0.816, 0.803, and 0.972, and a Pearson correlation of 0.61 with histological grade. Class activation maps highlighted features like tubule formation and mitotic rate. These findings suggest that deep learning models trained exclusively on hematoxylin and eosin-stained whole slide images can approximate genomic assay results, offering a cost-effective and scalable tool for breast cancer recurrence risk assessment. However, further validation using larger and more balanced datasets is needed to confirm the clinical applicability of our approach.
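Per-category sensitivity and specificity for a three-class task such as low, intermediate, and high risk are commonly computed one-vs-rest from a confusion matrix. The sketch below assumes that convention, since the abstract does not state how its figures were derived, and the example matrix in the usage note is made up rather than taken from the study.

```python
import numpy as np

def one_vs_rest_metrics(cm):
    """Per-class sensitivity and specificity from a square confusion matrix.

    cm[i, j] = number of cases whose true class is i and predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp       # true class i, predicted as another class
    fp = cm.sum(axis=0) - tp       # predicted as i, true class is another class
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity
```

For a hypothetical matrix `[[8, 2, 0], [1, 6, 1], [0, 1, 4]]`, the function returns sensitivities of 0.8, 0.75, and 0.8, and specificities of 12/13, 0.8, and 17/18.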

Submitted 9 April, 2025; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: 20 pages, 9 figures

arXiv:2405.14126 [pdf, other]

The Disappearance of Timestep Embedding in Modern Time-Dependent Neural Networks

Authors: Bum Jun Kim, Yoshinobu Kawahara, Sang Woo Kim

Abstract: Dynamical systems are often time-varying, and modeling them requires a function that evolves with respect to time. Recent studies such as the neural ordinary differential equation proposed a time-dependent neural network, which provides a neural network varying with respect to time. However, we claim that the architectural choice used to build a time-dependent neural network significantly affects its time-awareness, yet this choice has not been sufficiently validated in current practice. In this study, we conduct an in-depth analysis of the architecture of modern time-dependent neural networks. Here, we report a vulnerability of vanishing timestep embedding, which disables the time-awareness of a time-dependent neural network. Furthermore, we find that this vulnerability can also be observed in diffusion models because they employ a similar architecture that incorporates timestep embedding to discriminate between different timesteps during a diffusion process. Our analysis provides a detailed description of this phenomenon as well as several solutions to address the root cause. Through experiments on neural ordinary differential equations and diffusion models, we observed that preserving time-awareness via the proposed solutions boosted their performance, which implies that their current implementations lack sufficient time-dependency.
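One generic way an additive timestep embedding can vanish is shown below: if the embedding is added as a per-channel bias that is constant over spatial positions and the result is then normalized per channel over those positions (instance normalization), the mean subtraction removes the embedding exactly, so the block's output no longer depends on the timestep. This is an illustrative mechanism chosen for the sketch; the abstract does not say whether it is the specific vulnerability the paper reports, and the tiny "block", feature sizes, and embedding dimension are hypothetical.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of a (C, H, W) map over its spatial positions."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sinusoidal_embedding(t, dim):
    """Sinusoidal timestep embedding of even size `dim`."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 4))    # (C, H, W) feature map

def block(t):
    # Embedding broadcast over H and W as a per-channel bias, then normalized
    # per channel: the per-channel mean subtraction cancels the embedding.
    emb = sinusoidal_embedding(t, 8).reshape(8, 1, 1)
    return instance_norm(features + emb)
```

Calling `block(1.0)` and `block(500.0)` gives the same output even though the two embeddings differ, so this block has lost all time-awareness.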

Submitted 22 May, 2024; originally announced May 2024.

Comments: 14 pages, 7 figures

arXiv:2405.14115 [pdf, other]

Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers

Authors: Bum Jun Kim, Sang Woo Kim

Abstract: Vision transformers (ViTs) have demonstrated remarkable performance in a variety of vision tasks. Despite their promising capabilities, training a ViT requires a large amount of diverse data. Several studies empirically found that using rich data augmentations, such as Mixup, Cutmix, and random erasing, is critical to the successful training of ViTs. The use of rich data augmentations has now become standard practice. However, we report a vulnerability in this practice: Certain data augmentations such as Mixup cause a variance shift in the positional embedding of ViT, which has been a hidden factor that degrades the performance of ViT during the test phase. We claim that achieving a stable effect from positional embedding requires a specific condition on the image, which is often broken for the current data augmentation methods. We provide a detailed analysis of this problem as well as the correct configuration for these data augmentations to remove the side effects of variance shift. Experiments showed that adopting our guidelines improves the performance of ViTs compared with the current configuration of data augmentations.
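For independent inputs with equal variance σ², a Mixup sample λ·x1 + (1 - λ)·x2 has variance (λ² + (1 - λ)²)·σ², which is strictly below σ² for any λ between 0 and 1. The sketch below checks this on synthetic Gaussian "images". It shows only the input-side variance change; the specific condition on positional embedding that the paper derives is not reproduced, and the helper names are hypothetical.

```python
import random

def mixup(x1, x2, lam):
    """Mixup of two flattened images: lam * x1 + (1 - lam) * x2."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]

def variance_factor(lam):
    """Var(mixed) / Var(x) when x1 and x2 are independent with equal variance."""
    return lam ** 2 + (1.0 - lam) ** 2

def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
n = 20_000
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in pixels, unit variance
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
print(variance_factor(0.5))               # 0.5
print(variance(mixup(x1, x2, lam=0.5)))   # close to 0.5
print(variance(x1))                       # close to 1.0
```

If a positional embedding is added at a fixed scale, its share of the total signal is then larger on mixed training images than on unmixed test images, which is one way to read the "variance shift" the abstract describes.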

Submitted 22 May, 2024; originally announced May 2024.

Comments: 16 pages, 4 figures

arXiv:2404.10765 [pdf, other]

RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting

Authors: Ashkan Mirzaei, Riccardo De Lutio, Seung Wook Kim, David Acuna, Jonathan Kelly, Sanja Fidler, Igor Gilitschenski, Zan Gojcic

Abstract: Neural reconstruction approaches are rapidly emerging as the preferred representation for 3D scenes, but their limited editability is still posing a challenge. In this work, we propose an approach for 3D scene inpainting -- the task of coherently replacing parts of the reconstructed scene with desired content. Scene inpainting is an inherently ill-posed task as there exist many solutions that plausibly replace the missing content. A good inpainting method should therefore not only enable high-quality synthesis but also a high degree of control. Based on this observation, we focus on enabling explicit control over the inpainted content and leverage a reference image as an efficient means to achieve this goal. Specifically, we introduce RefFusion, a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view. The personalization effectively adapts the prior distribution to the target scene, resulting in a lower variance of score distillation objective and hence significantly sharper details. Our framework achieves state-of-the-art results for object removal while maintaining high controllability. We further demonstrate the generality of our formulation on other downstream tasks such as object insertion, scene outpainting, and sparse view reconstruction.

Submitted 16 April, 2024; originally announced April 2024.

Comments: Project page: https://reffusion.github.io

arXiv:2404.07263 [pdf, other]

doi 10.1103/PhysRevE.111.024135

The planted directed polymer: inferring a random walk from noisy images

Authors: Sun Woo P. Kim, Austen Lamacraft

Abstract: We introduce and study the planted directed polymer, in which the path of a random walker is inferred from noisy 'images' accumulated at each timestep. Formulated as a nonlinear problem of Bayesian inference for a hidden Markov model, this problem is a generalization of the directed polymer problem of statistical physics, coinciding with it in the limit of zero signal to noise. For a 1D walker we present numerical investigations and analytical arguments that no phase transition is present. When formulated on a Cayley tree, methods developed for the directed polymer are used to show that there is a transition with decreasing signal to noise where effective inference becomes impossible, meaning that the average fractional overlap between the inferred and true paths falls from one to zero.
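To make the hidden-Markov-model formulation concrete, here is a toy 1D version under assumptions of our own: a walker taking ±1 steps on a ring of L sites, each "image" carrying amplitude `amp` at the walker's site plus i.i.d. Gaussian noise, and the posterior over positions computed with the standard forward-backward algorithm. The `overlap` function (fraction of timesteps where the most probable site is the true site) is one simple proxy for the fractional overlap in the abstract, not necessarily the paper's definition. Nothing here addresses the Cayley-tree transition.

```python
import math
import random

def simulate_walk(L, T, amp, noise, seed=0):
    """Toy data: a +/-1 walker on a ring of L sites and one noisy image per step."""
    rng = random.Random(seed)
    pos, path, images = 0, [], []
    for _ in range(T):
        pos = (pos + rng.choice((-1, 1))) % L
        path.append(pos)
        # Image: amplitude `amp` at the walker's site plus i.i.d. Gaussian noise.
        images.append([amp * (x == pos) + rng.gauss(0.0, noise) for x in range(L)])
    return path, images

def _step(p):
    """Apply the symmetric +/-1 transition kernel on the ring."""
    L = len(p)
    return [0.5 * (p[(x - 1) % L] + p[(x + 1) % L]) for x in range(L)]

def _normalize(v):
    z = sum(v)
    return [vi / z for vi in v]

def posterior_marginals(images, amp, sigma):
    """Forward-backward posterior P(x_t = x | all images) for the toy HMM above."""
    T, L = len(images), len(images[0])

    def emission(t):
        # Gaussian likelihood up to a site-independent factor: exp(amp * y / sigma^2).
        logs = [amp * y / sigma ** 2 for y in images[t]]
        m = max(logs)
        return [math.exp(v - m) for v in logs]

    alphas, a = [], [1.0 if x == 0 else 0.0 for x in range(L)]  # walker starts at site 0
    for t in range(T):
        a = _normalize([ai * ei for ai, ei in zip(_step(a), emission(t))])
        alphas.append(a)

    betas = [[1.0] * L for _ in range(T)]
    for t in range(T - 2, -1, -1):
        w = [bi * ei for bi, ei in zip(betas[t + 1], emission(t + 1))]
        betas[t] = _normalize(_step(w))  # kernel is symmetric, so backward reuses _step

    return [_normalize([ai * bi for ai, bi in zip(a, b)]) for a, b in zip(alphas, betas)]

def overlap(marginals, path):
    """Fraction of timesteps where the most probable site equals the true site."""
    guesses = [max(range(len(m)), key=m.__getitem__) for m in marginals]
    return sum(g == p for g, p in zip(guesses, path)) / len(path)

path, images = simulate_walk(L=9, T=30, amp=3.0, noise=0.0)
print(overlap(posterior_marginals(images, amp=3.0, sigma=1.0), path))  # 1.0 with noiseless images
```

Lowering `amp` relative to `noise` reduces the signal-to-noise ratio; in this toy the overlap degrades gradually, which is consistent with, but does not demonstrate, the abstract's statement that a 1D walker shows no phase transition.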

Submitted 20 February, 2025; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: 13 pages, 13 figures

arXiv:2402.01149 [pdf, other]

Scale Equalization for Multi-Level Feature Fusion

Authors: Bum Jun Kim, Sang Woo Kim

Abstract: Deep neural networks have exhibited remarkable performance in a variety of computer vision fields, especially in semantic segmentation tasks. Their success is often attributed to multi-level feature fusion, which enables them to understand both global and local information from an image. However, we found that multi-level features from parallel branches are on different scales. The scale disequilibrium is a universal and unwanted flaw that leads to detrimental gradient descent, thereby degrading performance in semantic segmentation. We discover that scale disequilibrium is caused by bilinear upsampling, which is supported by both theoretical and empirical evidence. Based on this observation, we propose injecting scale equalizers to achieve scale equilibrium across multi-level features after bilinear upsampling. Our proposed scale equalizers are easy to implement, applicable to any architecture, hyperparameter-free, implementable without requiring extra computational cost, and guarantee scale equilibrium for any dataset. Experiments showed that adopting scale equalizers consistently improved the mIoU index across various target datasets, including ADE20K, PASCAL VOC 2012, and Cityscapes, as well as various decoder choices, including UPerHead, PSPHead, ASPPHead, SepASPPHead, and FCNHead.
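Bilinear interpolation writes each new pixel as a convex combination of its neighbours, so for i.i.d. inputs the interpolated pixels have smaller variance than the originals. The sketch below measures that shrinkage for align-corners 2x upsampling of a random map, then applies global standardization (zero mean, unit variance per map). The abstract does not specify the form of its scale equalizer; standardization is an assumed stand-in, and all function names are hypothetical.

```python
import math
import random

def bilinear_upsample_2x(grid):
    """Align-corners bilinear upsampling of an n x n grid to (2n-1) x (2n-1)."""
    n = len(grid)
    m = 2 * n - 1
    out = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            y, x = i / 2, j / 2
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, n - 1), min(x0 + 1, n - 1)
            dy, dx = y - y0, x - x0
            # Weighted average of the four surrounding source pixels (weights sum to 1).
            out[i][j] = ((1 - dy) * (1 - dx) * grid[y0][x0] + (1 - dy) * dx * grid[y0][x1]
                         + dy * (1 - dx) * grid[y1][x0] + dy * dx * grid[y1][x1])
    return out

def variance(grid):
    vals = [v for row in grid for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def standardize(grid, eps=1e-6):
    """Assumed equalizer: shift and scale the whole map to zero mean, unit variance."""
    vals = [v for row in grid for v in row]
    mean = sum(vals) / len(vals)
    std = math.sqrt(variance(grid))
    return [[(v - mean) / (std + eps) for v in row] for row in grid]

rng = random.Random(0)
grid = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(64)]
up = bilinear_upsample_2x(grid)
print(variance(up) / variance(grid))   # roughly 0.57: the upsampled branch is "smaller"
print(variance(standardize(up)))       # close to 1.0 after equalization
```

In a fusion decoder, a branch that passed through upsampling would thus enter concatenation at a smaller scale than one that did not, which is the kind of disequilibrium the abstract attributes to bilinear upsampling.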

Submitted 2 February, 2024; originally announced February 2024.

Comments: 10 pages, 3 figures

arXiv:2401.11739 [pdf, other]

EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models

Authors: Koichi Namekata, Amirmojtaba Sabour, Sanja Fidler, Seung Wook Kim

Abstract: Diffusion models have recently received increasing research attention for their remarkable transfer abilities in semantic segmentation tasks. However, generating fine-grained segmentation masks with diffusion models often requires additional training on annotated datasets, leaving it unclear to what extent pre-trained diffusion models alone understand the semantic relations of their generated images. To address this question, we leverage the semantic knowledge extracted from Stable Diffusion (SD) and aim to develop an image segmentor capable of generating fine-grained segmentation maps without any additional training. The primary difficulty stems from the fact that semantically meaningful feature maps typically exist only in the spatially lower-dimensional layers, which poses a challenge in directly extracting pixel-level semantic relations from these feature maps. To overcome this issue, our framework identifies semantic correspondences between image pixels and spatial locations of low-dimensional feature maps by exploiting SD's generation process and utilizes them for constructing image-resolution segmentation maps. In extensive experiments, the produced segmentation maps are demonstrated to be well delineated and capture detailed parts of the images, indicating the existence of highly accurate pixel-level semantic knowledge in diffusion models.

Submitted 22 January, 2024; originally announced January 2024.

Comments: ICLR 2024. Project page: https://kmcode1.github.io/Projects/EmerDiff/

arXiv:2401.07462 [pdf, other]

doi 10.1140/epjc/s10052-024-12770-1

Nonproportionality of NaI(Tl) Scintillation Detector for Dark Matter Search Experiments

Authors: S. M. Lee, G. Adhikari, N. Carlin, J. Y. Cho, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. França, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, S. W. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim , et al. (37 additional authors not shown)

Abstract: We present a comprehensive study of the nonproportionality of NaI(Tl) scintillation detectors within the context of dark matter search experiments. Our investigation, which integrates COSINE-100 data with supplementary $γ$ spectroscopy, measures light yields across diverse energy levels from full-energy $γ$ peaks produced by the decays of various isotopes. These $γ$ peaks of interest were produced by decays supported by both long and short-lived isotopes. Analyzing peaks from decays supported only by short-lived isotopes presented a unique challenge due to their limited statistics and overlapping energies, which was overcome by long-term data collection and a time-dependent analysis. A key achievement is the direct measurement of the 0.87 keV light yield, resulting from the cascade following electron capture decay of $^{22}$Na from internal contamination. This measurement, previously accessible only indirectly, deepens our understanding of NaI(Tl) scintillator behavior in the region of interest for dark matter searches. This study holds substantial implications for background modeling and the interpretation of dark matter signals in NaI(Tl) experiments.

Submitted 10 May, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

Comments: 12 pages, 7 figures

Journal ref: Eur. Phys. J. C 84 (2024) 484

arXiv:2312.13763 [pdf, other]

Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

Authors: Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis

Abstract: Text-guided diffusion models have revolutionized image and video generation and have also been successfully used for optimization-based 3D object synthesis. Here, we instead focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects using score distillation methods with an additional temporal dimension. Compared to previous work, we pursue a novel compositional generation-based approach, and combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization, thereby simultaneously enforcing temporal consistency, high-quality visual appearance and realistic geometry. Our method, called Align Your Gaussians (AYG), leverages dynamic 3D Gaussian Splatting with deformation fields as 4D representation. Crucial to AYG is a novel method to regularize the distribution of the moving 3D Gaussians and thereby stabilize the optimization and induce motion. We also propose a motion amplification mechanism as well as a new autoregressive synthesis scheme to generate and combine multiple 4D sequences for longer generation. These techniques allow us to synthesize vivid dynamic scenes, outperform previous work qualitatively and quantitatively and achieve state-of-the-art text-to-4D performance. Due to the Gaussian 4D representation, different 4D animations can be seamlessly combined, as we demonstrate. AYG opens up promising avenues for animation, simulation and digital content creation as well as synthetic data generation.

Submitted 3 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Project page: https://research.nvidia.com/labs/toronto-ai/AlignYourGaussians/

arXiv:2312.03206 [pdf]

Seamless monolithic three-dimensional integration of single-crystalline films by growth

Authors: Ki Seok Kim, Seunghwan Seo, Junyoung Kwon, Doyoon Lee, Changhyun Kim, Jung-El Ryu, Jekyung Kim, Min-Kyu Song, Jun Min Suh, Hang-Gyo Jung, Youhwan Jo, Hogeun Ahn, Sangho Lee, Kyeongjae Cho, Jongwook Jeon, Minsu Seol, Jin-Hong Park, Sang Won Kim, Jeehwan Kim

Abstract: The demand for the three-dimensional (3D) integration of electronic components is on a steady rise. The through-silicon-via (TSV) technique emerges as the only viable method for integrating single-crystalline device components in a 3D format, despite encountering significant processing challenges. While monolithic 3D (M3D) integration schemes show promise, the seamless connection of single-crystalline semiconductors without intervening wafers has yet to be demonstrated. This challenge arises from the inherent difficulty of growing single crystals on amorphous or polycrystalline surfaces after the back-end-of-the-line process at low temperatures to preserve the underlying circuitry. Consequently, a practical growth-based solution for M3D of single crystals remains elusive. Here, we present a method for growing single-crystalline channel materials, specifically composed of transition metal dichalcogenides, on amorphous and polycrystalline surfaces at temperatures lower than 400 °C. Building on this developed technique, we demonstrate the seamless monolithic integration of vertical single-crystalline logic transistor arrays. This accomplishment leads to the development of unprecedented vertical CMOS arrays, thereby constructing vertical inverters. Ultimately, this achievement paves the way for M3D integration of various electronic and optoelectronic hardware in the form of single crystals.

Submitted 6 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.13570 [pdf, other]

WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space

Authors: Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis

Abstract: Modern learning-based approaches to 3D-aware image synthesis achieve high photorealism and 3D-consistent viewpoint changes for the generated images. Existing approaches represent instances in a shared canonical space. However, for in-the-wild datasets a shared canonical system can be difficult to define or might not even exist. In this work, we instead model instances in view space, alleviating the need for posed images and learned camera distributions. We find that in this setting, existing GAN-based methods are prone to generating flat geometry and struggle with distribution coverage. We hence propose WildFusion, a new approach to 3D-aware image synthesis based on latent diffusion models (LDMs). We first train an autoencoder that infers a compressed latent representation, which additionally captures the images' underlying 3D structure and enables not only reconstruction but also novel view synthesis. To learn a faithful 3D representation, we leverage cues from monocular depth prediction. Then, we train a diffusion model in the 3D-aware latent space, thereby enabling synthesis of high-quality 3D-consistent image samples, outperforming recent state-of-the-art GAN-based methods. Importantly, our 3D-aware LDM is trained without any direct supervision from multiview images or 3D geometry and does not require posed images or learned pose or camera distributions. It directly learns a 3D representation without relying on canonical camera coordinates.
This opens up promising research avenues for scalable 3D-aware image synthesis and 3D content creation from in-the-wild image data. See https://katjaschwarz.github.io/wildfusion for videos of our 3D results.

Submitted 12 April, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.05010 [pdf, other]

doi 10.1016/j.astropartphys.2024.102945

Alpha backgrounds in NaI(Tl) crystals of COSINE-100

Authors: G. Adhikari, N. Carlin, D. F. F. S. Cavalcante, J. Y. Cho, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. Franca, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, S. W. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim , et al. (38 additional authors not shown)

Abstract: COSINE-100 is a dark matter direct detection experiment with 106 kg NaI(Tl) as the target material. 210Pb and daughter isotopes are a dominant background in the WIMP region of interest and are detected via beta decay and alpha decay. Analysis of the alpha channel complements the background model as observed in the beta/gamma channel. We present the measurement of the quenching factors and Monte Carlo simulation results and activity quantification of the alpha decay components of the COSINE-100 NaI(Tl) crystals. The data strongly indicate that the alpha decays probabilistically undergo two possible quenching factors but require further investigation. The fitted results are consistent with independent measurements and improve the overall understanding of the COSINE-100 backgrounds. Furthermore, the half-life of 216Po has been measured to be 143.4 +/- 1.2 ms, which is consistent with and more precise than recent measurements.
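As background for the quoted half-life, the textbook relation is T½ = ln 2 / λ, and for exponentially distributed delays the maximum-likelihood estimate of λ is 1 / mean(delay), with a relative statistical error of about 1/√N. The sketch below applies this to synthetic delays generated with T½ = 143.4 ms. It assumes, for illustration only, that each delay is the time from a parent decay to the 216Po alpha decay, with no background, dead time, or coincidence-window truncation; it is not the COSINE-100 analysis.

```python
import math
import random

def half_life_mle(delays):
    """Exponential MLE: rate = 1 / mean, so T_1/2 = ln(2) * mean delay."""
    mean = sum(delays) / len(delays)
    return math.log(2) * mean

def half_life_uncertainty(delays):
    """Approximate 1-sigma statistical error of the MLE: T_1/2 / sqrt(N)."""
    return half_life_mle(delays) / math.sqrt(len(delays))

rng = random.Random(1)
true_half_life_ms = 143.4
rate = math.log(2) / true_half_life_ms             # decay constant in 1/ms
delays = [rng.expovariate(rate) for _ in range(20_000)]
estimate = half_life_mle(delays)
error = half_life_uncertainty(delays)
print(f"{estimate:.1f} +/- {error:.1f} ms")        # near 143.4 +/- 1.0 ms
```

With 20,000 synthetic pairs the purely statistical error is about 1 ms, the same order as the uncertainty quoted above; a real measurement also has to model backgrounds and selection windows.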

Submitted 30 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

arXiv:2311.03938 [pdf, other]

Analysis of NaN Divergence in Training Monocular Depth Estimation Model

Authors: Bum Jun Kim, Hyeonah Jang, Sang Woo Kim

Abstract: The latest advances in deep learning have facilitated the development of highly accurate monocular depth estimation models. However, when training a monocular depth estimation network, practitioners and researchers have observed not-a-number (NaN) loss, which disrupts gradient descent optimization. Although several practitioners have reported the stochastic and mysterious occurrence of NaN loss that interferes with training, its root cause is not discussed in the literature. This study conducted an in-depth analysis of NaN loss during the training of a monocular depth estimation network and identified three types of vulnerabilities that cause NaN loss: 1) the use of square root loss, which leads to an unstable gradient; 2) the log-sigmoid function, which exhibits numerical stability issues; and 3) certain variance implementations, which yield incorrect computations. Furthermore, for each vulnerability, the occurrence of NaN loss was demonstrated and practical guidelines to prevent NaN loss were presented. Experiments showed that both optimization stability and performance on monocular depth estimation could be improved by following our guidelines.
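Each of the three failure modes named in the abstract has a well-known numerical mechanism. The sketch below reproduces them in plain Python and pairs each with a common textbook remedy: an epsilon inside the square root, the softplus-based log-sigmoid identity log σ(x) = min(x, 0) - log1p(exp(-|x|)), and two-pass variance. These remedies are standard practice, not necessarily the guidelines the paper proposes, and the function names are hypothetical.

```python
import math

def sqrt_grad(u):
    """d/du sqrt(u) = 1 / (2 sqrt(u)): unbounded as u -> 0."""
    return 1.0 / (2.0 * math.sqrt(u))

def safe_sqrt_grad(u, eps=1e-8):
    """Gradient of sqrt(u + eps): finite at u = 0."""
    return 1.0 / (2.0 * math.sqrt(u + eps))

def naive_log_sigmoid(x):
    """log(1 / (1 + exp(-x))): exp(-x) overflows for large negative x."""
    return math.log(1.0 / (1.0 + math.exp(-x)))

def stable_log_sigmoid(x):
    """log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|)); exp never sees a positive argument."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))

def one_pass_variance(xs):
    """E[x^2] - E[x]^2: catastrophic cancellation when |mean| >> std."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

def two_pass_variance(xs):
    """Subtract the mean first, then average the squared deviations."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

for name, fn, arg in (("sqrt_grad", sqrt_grad, 0.0),
                      ("naive_log_sigmoid", naive_log_sigmoid, -1000.0)):
    try:
        print(name, fn(arg))
    except (ZeroDivisionError, OverflowError) as err:
        # Plain Python raises here; NumPy/PyTorch return inf instead, which can
        # later turn into NaN (for example inf * 0 or inf - inf).
        print(name, "fails with", type(err).__name__)

print(safe_sqrt_grad(0.0), stable_log_sigmoid(-1000.0))   # both finite

data = [1e9 + d for d in (1.0, 2.0, 3.0, 4.0)]            # true variance is 1.25
print(two_pass_variance(data))   # 1.25
print(one_pass_variance(data))   # rounding leaves this far from 1.25; it can be 0 or negative
```

A negative variance passed to a square root (for example to form a standard deviation) yields NaN in NumPy and PyTorch, which is one route by which an "incorrect computation" becomes a NaN loss.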

Submitted 7 November, 2023; originally announced November 2023.

Comments: 10 pages, 3 figures

arXiv:2311.02077 [pdf, other]

EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

Authors: Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang

Abstract: We present EmerNeRF, a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes. Grounded in neural fields, EmerNeRF simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping. EmerNeRF hinges upon two core components: First, it stratifies scenes into static and dynamic fields. This decomposition emerges purely from self-supervision, enabling our model to learn from general, in-the-wild data sources. Second, EmerNeRF parameterizes an induced flow field from the dynamic field and uses this flow field to further aggregate multi-frame features, amplifying the rendering precision of dynamic objects. Coupling these three fields (static, dynamic, and flow) enables EmerNeRF to represent highly-dynamic scenes self-sufficiently, without relying on ground truth object annotations or pre-trained models for dynamic object segmentation or optical flow estimation. Our method achieves state-of-the-art performance in sensor simulation, significantly outperforming previous methods when reconstructing static (+2.93 PSNR) and dynamic (+3.70 PSNR) scenes. In addition, to bolster EmerNeRF's semantic generalization, we lift 2D visual foundation model features into 4D space-time and address a general positional bias in modern Transformers, significantly boosting 3D perception performance (e.g., 37.50% relative improvement in occupancy prediction accuracy on average).
Finally, we construct a diverse and challenging 120-sequence dataset to benchmark neural fields under extreme and highly-dynamic settings.

Submitted 3 November, 2023; originally announced November 2023.

Comments: See the project page for code, data, and request pre-trained models: https://emernerf.github.io

arXiv:2308.03899 [pdf, other]

doi 10.1103/PhysRevB.108.245414

Polarization Charge around Impurities in Two-Dimensional Anisotropic Dirac Systems

Authors: Mohamed M. Elsayed, Sang Wook Kim, Juan M. Vanegas, Valeri N. Kotov

Abstract: Introducing quasiparticle anisotropy in graphene via uniaxial strain has a profound effect on the polarization charge density induced by external impurities, both Coulomb and short-range. In particular, the charge distribution induced by a Coulomb impurity exhibits a power law tail modulated by a strain-dependent admixture of angular harmonics. The appearance of distributed charge is in sharp contrast to the response in pristine/isotropic graphene, where for subcritical impurities the polarization charge is fully localized at the impurity position. It is also interesting to note that our results are obtained strictly at zero chemical potential, and the behavior is distinct from the familiar Friedel oscillations observed at finite chemical potential. We find that over a wide range of strain, the $d$-wave symmetry is dominant. The presence of Dirac cone tilt, relevant to some 2D materials beyond graphene, can also substantially affect the induced charge distribution. Finally we consider impurities with short range potentials, and study the effect of strain on the charge response. Our results were obtained in the continuum via perturbation theory valid for weak (subcritical) potentials, and supported by numerical lattice simulations based on density functional theory.

Submitted 11 November, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: 13 pages, 13 figures. Added new Section VI with new figures, and updated old figures

Journal ref: Phys. Rev. B 108, 245414 (2023)

arXiv:2307.14179 [pdf, other]

Resolution-Aware Design of Atrous Rates for Semantic Segmentation Networks

Authors: Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Sang Woo Kim

Abstract: DeepLab is a widely used deep neural network for semantic segmentation, whose success is attributed to its parallel architecture called atrous spatial pyramid pooling (ASPP). ASPP uses multiple atrous convolutions with different atrous rates to extract both local and global information. However, fixed values of atrous rates are used for the ASPP module, which restricts the size of its field of view. In principle, atrous rate should be a hyperparameter to change the field of view size according to the target task or dataset. However, the manipulation of atrous rate is not governed by any guidelines. This study proposes practical guidelines for obtaining an optimal atrous rate. First, an effective receptive field for semantic segmentation is introduced to analyze the inner behavior of segmentation networks. We observed that the use of ASPP module yielded a specific pattern in the effective receptive field, which was traced to reveal the module's underlying mechanism. Accordingly, we derive practical guidelines for obtaining the optimal atrous rate, which should be controlled based on the size of input image. Compared to other values, using the optimal atrous rate consistently improved the segmentation results across multiple datasets, including the STARE, CHASE_DB1, HRF, Cityscapes, and iSAID datasets.
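Why the best atrous rate should depend on input size can be seen from plain geometry: a k x k kernel with rate r spans r(k - 1) + 1 feature-map pixels, while the feature map entering ASPP has side about input_side / output_stride. The sketch below flags which rates keep the dilated kernel inside the feature map. This is a simple geometric check under our own assumptions, not the effective-receptive-field guideline the paper derives; the rates (6, 12, 18) at output stride 16 are the commonly used DeepLabv3 defaults.

```python
def atrous_extent(kernel_size, rate):
    """Span, in feature-map pixels, covered by a dilated kernel: rate * (k - 1) + 1."""
    return rate * (kernel_size - 1) + 1

def feature_map_side(input_side, output_stride):
    """Side length of the backbone feature map fed into ASPP (ceiling division)."""
    return -(-input_side // output_stride)

def rates_within_view(input_side, output_stride, rates, kernel_size=3):
    """Rates whose dilated kernel still fits inside the feature map."""
    side = feature_map_side(input_side, output_stride)
    return [r for r in rates if atrous_extent(kernel_size, r) <= side]

for side in (256, 513, 1024):
    print(side, feature_map_side(side, 16), rates_within_view(side, 16, [6, 12, 18]))
```

For a 513-pixel crop the feature map is 33 pixels wide, so a 3 x 3 kernel at rate 18 (span 37) places its outer taps in padding for every output position; for 256-pixel inputs even rate 12 overshoots. Larger inputs admit larger rates, in line with the abstract's conclusion that the rate should follow input size.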

Submitted 26 July, 2023; originally announced July 2023.

Comments: 18 pages, 12 figures

arXiv:2307.07487 [pdf, other]

DreamTeacher: Pretraining Image Backbones with Deep Generative Models

Authors: Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler

Abstract: In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones. We propose to distill knowledge from a trained generative model into standard image backbones that have been well engineered for specific perception tasks. We investigate two types of knowledge distillation: 1) distilling learned generative features onto target image backbones as an alternative to pretraining these backbones on large labeled datasets such as ImageNet, and 2) distilling labels obtained from generative networks with task heads onto logits of target backbones. We perform extensive analyses on multiple generative models, dense prediction benchmarks, and several pre-training regimes. We empirically find that our DreamTeacher significantly outperforms existing self-supervised representation learning approaches across the board. Unsupervised ImageNet pre-training with DreamTeacher leads to significant improvements over ImageNet classification pre-training on downstream datasets, showcasing generative models, and diffusion generative models specifically, as a promising approach to representation learning on large, diverse datasets without requiring manual annotation.

Submitted 14 July, 2023; originally announced July 2023.

Comments: Project page: https://research.nvidia.com/labs/toronto-ai/DreamTeacher/

arXiv:2305.04722 [pdf, other]

Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields

Authors: Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Sang Woo Kim

Abstract: Vision transformers (ViTs) that model an image as a sequence of partitioned patches have shown notable performance in diverse vision tasks. Because partitioning patches eliminates the image structure, to reflect the order of patches, ViTs utilize an explicit component called positional embedding. However, we claim that the use of positional embedding does not simply guarantee the order-awareness of ViT. To support this claim, we analyze the actual behavior of ViTs using an effective receptive field. We demonstrate that during training, ViT acquires an understanding of patch order from the positional embedding that is trained to be a specific pattern. Based on this observation, we propose explicitly adding a Gaussian attention bias that guides the positional embedding to have the corresponding pattern from the beginning of training. We evaluated the influence of Gaussian attention bias on the performance of ViTs in several image classification, object detection, and semantic segmentation experiments. The results showed that the proposed method not only helps ViTs understand images but also boosts their performance on various datasets, including ImageNet, COCO 2017, and ADE20K.
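A Gaussian attention bias can be pictured as an additive term on the attention logits that decays with the squared distance between patch positions, so that even before any learning, each patch attends mostly to its neighbourhood. The sketch below assumes an isotropic Gaussian over patch-grid coordinates with a fixed width `sigma`, added before the softmax; the exact parameterization, placement, and learnability used in the paper are not given in the abstract, and the function names are hypothetical.

```python
import math

def gaussian_attention_bias(grid_h, grid_w, sigma):
    """B[i][j] = -||p_i - p_j||^2 / (2 sigma^2) for patches p on a grid_h x grid_w grid."""
    coords = [(r, c) for r in range(grid_h) for c in range(grid_w)]
    return [[-((ri - rj) ** 2 + (ci - cj) ** 2) / (2.0 * sigma ** 2)
             for (rj, cj) in coords] for (ri, ci) in coords]

def softmax(row):
    m = max(row)                          # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    z = sum(exps)
    return [e / z for e in exps]

def biased_attention(logits, bias):
    """Row-wise softmax of (query-key logits + positional bias)."""
    return [softmax([l + b for l, b in zip(lrow, brow)]) for lrow, brow in zip(logits, bias)]

bias = gaussian_attention_bias(4, 4, sigma=1.0)   # 16 patches
zero_logits = [[0.0] * 16 for _ in range(16)]     # content carries no preference yet
attn = biased_attention(zero_logits, bias)
# Patch 5 sits at (1, 1): it attends most to itself, then to adjacent patches.
print([round(w, 3) for w in attn[5]])
```

With content logits at zero, the attention map is purely the Gaussian locality pattern; during training, learned query-key logits are added on top of it.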

Submitted 8 May, 2023; originally announced May 2023.

Comments: 11 pages, 7 Figures
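Sketch for arXiv:2305.04722 above: a minimal pure-Python illustration of adding a distance-based Gaussian bias to attention logits before the softmax. The isotropic form -d²/(2σ²), the `sigma` parameter, and all function names are assumptions made for illustration, not the paper's exact formulation; a real ViT would also have to handle the class token and multiple heads.

```python
import math


def gaussian_attention_bias(grid_h, grid_w, sigma):
    """Hypothetical bias: entry (i, j) is -d^2 / (2 * sigma^2), where d is the
    Euclidean distance between patches i and j on a grid_h x grid_w grid."""
    coords = [(r, c) for r in range(grid_h) for c in range(grid_w)]
    return [
        [-((ri - rj) ** 2 + (ci - cj) ** 2) / (2.0 * sigma ** 2) for (rj, cj) in coords]
        for (ri, ci) in coords
    ]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention_weights(logits, bias):
    """Add the bias to raw attention logits row by row, then softmax each row."""
    return [
        softmax([l + b for l, b in zip(logit_row, bias_row)])
        for logit_row, bias_row in zip(logits, bias)
    ]


# Uniform logits on a 3x3 patch grid: any structure comes from the bias alone.
bias = gaussian_attention_bias(3, 3, sigma=1.0)
uniform_logits = [[0.0] * 9 for _ in range(9)]
weights = biased_attention_weights(uniform_logits, bias)
center = weights[4]  # attention row of the patch at grid position (1, 1)
print([round(w, 3) for w in center])
```

With uniform logits, the bias alone concentrates each query's attention on itself and its nearest neighbours; treating this locality pattern as the target is part of the sketch's assumption.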

arXiv:2304.09787 [pdf, other]

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Authors: Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler

Abstract: Automatically generating high-quality real world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models that have been successfully utilized for efficient high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene. To further compress this representation, we train a latent-autoencoder that maps the voxel grids to a set of latent representations. A hierarchical diffusion model is then fit to the latents to complete the scene generation pipeline. We achieve a substantial improvement over existing state-of-the-art scene generation models. Additionally, we show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.

Submitted 19 April, 2023; originally announced April 2023.

Comments: CVPR 2023

arXiv:2304.08818 [pdf, other]

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

Authors: Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

Abstract: Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos. Similarly, we temporally align diffusion model upsamplers, turning them into temporally consistent video super resolution models. We focus on two relevant real-world applications: Simulation of in-the-wild driving data and creative content creation with text-to-video modeling. In particular, we validate our Video LDM on real driving videos of resolution 512 x 1024, achieving state-of-the-art performance. Furthermore, our approach can easily leverage off-the-shelf pre-trained image LDMs, as we only need to train a temporal alignment model in that case. Doing so, we turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280 x 2048. We show that the temporal layers trained in this way generalize to different fine-tuned text-to-image LDMs. Utilizing this property, we show the first results for personalized text-to-video generation, opening exciting directions for future content creation.
Project page: https://research.nvidia.com/labs/toronto-ai/VideoLDM/

Submitted 27 December, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2023. Project page: https://research.nvidia.com/labs/toronto-ai/VideoLDM/

arXiv:2302.06112 [pdf, other]

How to Use Dropout Correctly on Residual Networks with Batch Normalization

Authors: Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Donggeon Lee, Sang Woo Kim

Abstract: For the stable optimization of deep neural networks, regularization methods such as dropout and batch normalization have been used in various tasks. Nevertheless, the correct position to apply dropout has rarely been discussed, and different positions have been employed depending on the practitioner. In this study, we investigate the correct position to apply dropout. We demonstrate that for a residual network with batch normalization, applying dropout at certain positions increases the performance, whereas applying dropout at other positions decreases the performance. Based on theoretical analysis, we provide the following guideline for the correct position to apply dropout: apply one dropout after the last batch normalization but before the last weight layer in the residual branch. We provide detailed theoretical explanations to support this claim and demonstrate them through module tests. In addition, we investigate the correct position of dropout in the head that produces the final prediction. Although the current consensus is to apply dropout after global average pooling, we prove that applying dropout before global average pooling leads to a more stable output. The proposed guidelines are validated through experiments using different datasets and models.

Submitted 13 February, 2023; originally announced February 2023.

Comments: 10 pages, 4 figures
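Sketch for arXiv:2302.06112 above: the residual-branch guideline read as a placement rule over a list of layer names. `insert_dropout` and the `"bn"`/`"conv"`/`"linear"` labels are hypothetical, not a framework API. The helper places the dropout immediately before the last weight layer, which is one position satisfying "after the last batch normalization but before the last weight layer"; the head guideline (dropout before global average pooling) is not modeled.

```python
# Layers that carry weights in this toy representation (hypothetical labels).
WEIGHT_LAYERS = {"conv", "linear"}


def insert_dropout(branch):
    """Return a copy of `branch` (a list of layer names) with one "dropout"
    inserted immediately before the last weight layer, provided that layer
    comes after the last batch normalization ("bn").

    Raises ValueError when no such position exists."""
    weight_idx = [i for i, name in enumerate(branch) if name in WEIGHT_LAYERS]
    bn_idx = [i for i, name in enumerate(branch) if name == "bn"]
    if not weight_idx or not bn_idx:
        raise ValueError("branch needs at least one weight layer and one bn")
    last_weight, last_bn = weight_idx[-1], bn_idx[-1]
    if last_bn > last_weight:
        raise ValueError("last bn follows the last weight layer; no valid slot")
    return branch[:last_weight] + ["dropout"] + branch[last_weight:]


# Pre-activation residual branch: BN -> ReLU -> conv -> BN -> ReLU -> conv
pre_act = ["bn", "relu", "conv", "bn", "relu", "conv"]
print(insert_dropout(pre_act))
# ['bn', 'relu', 'conv', 'bn', 'relu', 'dropout', 'conv']
```

In a post-activation branch such as `["conv", "bn", "relu", "conv", "bn"]`, the last batch normalization follows the last weight layer, so the helper raises `ValueError` instead of guessing a position.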

arXiv:2302.03193 [pdf, other]

On the Ideal Number of Groups for Isometric Gradient Propagation

Authors: Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Sang Woo Kim

Abstract: Recently, various normalization layers have been proposed to stabilize the training of deep neural networks. Among them, group normalization is a generalization of layer normalization and instance normalization by allowing a degree of freedom in the number of groups it uses. However, to determine the optimal number of groups, trial-and-error-based hyperparameter tuning is required, and such experiments are time-consuming. In this study, we discuss a reasonable method for setting the number of groups. First, we find that the number of groups influences the gradient behavior of the group normalization layer. Based on this observation, we derive the ideal number of groups, which calibrates the gradient scale to facilitate gradient descent optimization. Our proposed number of groups is theoretically grounded, architecture-aware, and can provide a proper value in a layer-wise manner for all layers. The proposed method exhibited improved performance over existing methods in numerous neural network architectures, tasks, and datasets.

Submitted 6 February, 2023; originally announced February 2023.

Comments: 10 pages, 2 figures
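Sketch for arXiv:2302.03193 above: plain group normalization for one sample in pure Python, to show the degree of freedom the abstract refers to. With `num_groups=1` it reduces to layer normalization, and with `num_groups` equal to the channel count it reduces to instance normalization. It does not compute the paper's proposed ideal number of groups, whose derivation is not given in the abstract; the input layout and names are assumptions.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Group-normalize one sample x, given as a list of C channels, each a list
    of spatial values. Channels are split into num_groups consecutive groups and
    each group is normalized to zero mean and unit variance. The learnable
    per-channel scale and shift are omitted for brevity."""
    channels = len(x)
    if channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    size = channels // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * size:(g + 1) * size]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in ch] for ch in group])
    return out


# Four channels with three spatial positions each (arbitrary values).
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [10.0, 20.0, 30.0], [0.0, 0.0, 1.0]]
layer_like = group_norm(x, num_groups=1)     # statistics over all channels
instance_like = group_norm(x, num_groups=4)  # statistics per channel
middle = group_norm(x, num_groups=2)         # two groups of two channels
```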

arXiv:2301.01375 [pdf, other]

Brightest Cluster Galaxy Formation in the z=4.3 Protocluster SPT2349-56: Discovery of a Radio-Loud AGN

Authors: Scott C. Chapman, Ryley Hill, Manuel Aravena, Melanie Archipley, Arif Babul, James Burgoyne, Rebecca E. A. Canning, Carlos De Breuck, Anthony H. Gonzalez, Christopher C. Hayward, Seon Woo Kim, Matt Malkan, Dan P. Marrone, Vincent McIntyre, Eric Murphy, Emily Pass, Ryan W. Perry, Kedar A. Phadke, Douglas Rennehan, Cassie Reuter, Kaja M. Rotermund, Douglas Scott, Nick Seymour, Manuel Solimano, Justin Spilker , et al. (7 additional authors not shown)

Abstract: We have observed the z=4.3 protocluster SPT2349-56 with ATCA with the aim of detecting radio-loud active galactic nuclei (AGN) amongst the ~30 submillimeter galaxies (SMGs) identified in the structure. We detect the central complex of SMGs at 2.2 GHz with a luminosity of L_2.2 = (4.42 ± 0.56) × 10^25 W/Hz. ASKAP also detects the source at 888 MHz, constraining the radio spectral index to α = −1.6 ± 0.3, consistent with ATCA non-detections at 5.5 and 9 GHz, and implying L_1.4(rest) = (2.4 ± 0.3) × 10^26 W/Hz. This radio luminosity is about 100 times higher than expected from star formation, assuming the usual FIR-radio correlation, which is a clear indication of an AGN driven by a forming brightest cluster galaxy (BCG). None of the SMGs in SPT2349-56 show signs of AGN in any other diagnostics available to us (notably 12CO out to J=16, OH 163 μm, [CII]/IR, and optical spectra), highlighting the radio continuum as a powerful probe of obscured AGN in high-z protoclusters. No other significant radio detections are found amongst the cluster members, consistent with the FIR-radio correlation. We compare these results to field samples of radio sources and SMGs, along with the 22 SPT-SMG gravitational lenses also observed in the ATCA program, as well as powerful radio galaxies at high redshifts. Our results allow us to better understand the effects of this gas-rich, overdense environment on early supermassive black hole (SMBH) growth and cluster feedback.
We estimate that (3.3 ± 0.7) × 10^38 W of power are injected into the growing intracluster medium (ICM) by the radio-loud AGN, whose energy over 100 Myr is comparable to the binding energy of the gas mass of the central halo. The AGN power is also comparable to the instantaneous energy injection from supernova feedback from the 23 catalogued SMGs in the core region of 120 kpc projected radius. The SPT2349-56 radio-loud AGN may be providing strong feedback on a nascent ICM.

Submitted 4 January, 2023; v1 submitted 3 January, 2023; originally announced January 2023.

Comments: 31 pages, submitted to ApJ, Dec. 17, 2022

arXiv:2301.00494 [pdf, other]

doi 10.1088/2058-9565/acf1c8

Quantum Atomic Matter Near Two-Dimensional Materials in Microgravity

Authors: Adrian Del Maestro, Sang Wook Kim, Nicholas P. Bigelow, Robert J. Thompson, Valeri N. Kotov

Abstract: Novel two-dimensional (2D) atomically flat materials, such as graphene and transition-metal dichalcogenides, exhibit unconventional Dirac electronic spectra. We propose to effectively engineer their interactions with cold atoms in microgravity, leading to a synergy between complex electronic and atomic collective quantum phases and phenomena. Dirac materials are susceptible to manipulation and quantum engineering via changes in their electronic properties by application of strain, doping with carriers, adjustment of their dielectric environment, etc. Consequently, the interaction of atoms with such materials, namely the van der Waals / Casimir-Polder interaction, can be effectively manipulated, leading to the potential observation of physical effects such as Quantum Reflection off atomically thin materials and confined Bose-Einstein Condensate (BEC) frequency shifts.

Submitted 18 August, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

Comments: 11 pages, 3 figures; discussion and references added

Journal ref: Quantum Science and Technology 8, 044002 (2023)

arXiv:2211.07672 [pdf, other]

doi 10.1103/PhysRevB.109.064512

Strain-induced superfluid transition for atoms on graphene

Authors: Sang Wook Kim, Mohamed Elsayed, Nathan S. Nichols, Taras Lakoba, Juan Vanegas, Carlos Wexler, Valeri N. Kotov, Adrian Del Maestro

Abstract: Bosonic atoms deposited on atomically thin substrates represent a playground for exotic quantum many-body physics due to the highly-tunable, atomic-scale nature of the interaction potentials. The ability to engineer strong interparticle interactions can lead to the emergence of complex collective atomic states of matter, not possible in the context of dilute atomic gases confined in optical lattices. While it is known that the first layer of adsorbed helium on graphene is permanently locked into a solid phase, we show, by a combination of quantum Monte Carlo and mean-field techniques, that simple isotropic graphene lattice expansion effectively unlocks a large variety of two-dimensional ordered commensurate, incommensurate, cluster atomic solid, and superfluid states for adsorbed atoms. It is especially significant that an atomically thin superfluid phase of matter emerges under experimentally feasible strain values, with potentially supersolid phases in close proximity on the phase diagram.

Submitted 14 November, 2022; originally announced November 2022.

Comments: Main text: 12 pages and 5 figures. Supplementary Information: 3 pages and 3 figures. For associated data and code repository see: https://github.com/DelMaestroGroup/papers-code-Superfluid4HeStrainGraphene

Journal ref: Phys. Rev. B 109, 064512 (2024)

arXiv:2209.15297 [pdf]

doi 10.1038/s41563-022-01353-8

Quantum electron liquid and its possible phase transition

Authors: Sunghun Kim, Joonho Bang, Chan-young Lim, Seung Yong Lee, Jounghoon Hyun, Gyubin Lee, Yeonghoon Lee, Jonathan D. Denlinger, Soonsang Huh, Changyoung Kim, Sang Yong Song, Junpil Seo, Dinesh Thapa, Seong-Gon Kim, Young Hee Lee, Yeongkwan Kim, Sung Wng Kim

Abstract: Purely quantum electron systems exhibit intriguing correlated electronic phases by virtue of quantum fluctuations in addition to electron-electron interactions. To realize such quantum electron systems, a key ingredient is dense electrons decoupled from other degrees of freedom. Here, we report the discovery of a pure quantum electron liquid, which spreads up to ~3 Å into the vacuum on the surface of an electride crystal. An extremely high electron density and its weak hybridisation with buried atomic orbitals evidence the quantum and pure nature of the electrons, which exhibit a polarized liquid phase, as demonstrated by our spin-dependent measurement. Further, upon enhancing the electron correlation strength, the dynamics of the quantum electrons change to those of a non-Fermi liquid, along with an anomalous band deformation, suggestive of a transition to a hexatic liquid crystal phase. Our findings cultivate the frontier of quantum electron systems and serve as a platform for exploring correlated electronic phases in a pure fashion.

Submitted 30 September, 2022; originally announced September 2022.

Comments: 29 pages, 4 figures, 10 extended data figures

Journal ref: Nature Materials (2022)

arXiv:2208.12544 [pdf]

doi 10.1016/j.combustflame.2022.112583

Deep learning-based denoising for fast time-resolved flame emission spectroscopy in high-pressure combustion environment

Authors: Taekeun Yoon, Seon Woong Kim, Hosung Byun, Younsik Kim, Campbell D. Carter, Hyungrok Do

Abstract: A deep learning strategy is developed for fast and accurate gas property measurements using flame emission spectroscopy (FES). In particular, short-gated fast FES is essential to resolve fast-evolving combustion behaviors. However, as the exposure time for capturing the flame emission spectrum gets shorter, the signal-to-noise ratio (SNR) decreases, and the characteristic spectral features that indicate the gas properties become relatively weaker, which makes property estimation from the short-gated spectrum difficult and inaccurate. Denoising convolutional neural networks (CNNs) can enhance the SNR of the short-gated spectrum. A new CNN architecture including a reversible down- and up-sampling (DU) operator and a loss function based on proper orthogonal decomposition (POD) coefficients is proposed. For training and testing the CNN, flame chemiluminescence spectra were captured from a stable methane-air flat flame using a portable spectrometer (spectral range: 250–850 nm, resolution: 0.5 nm) with varied equivalence ratio (0.8–1.2), pressure (1–10 bar), and exposure time (0.05, 0.2, 0.4, and 2 s). The long-exposure (2 s) spectra were used as the ground truth when training the denoising CNN. A kriging model with POD is trained on the long-gated spectra for calibration and then predicts the gas properties with the denoised short-gated spectrum as input. The prediction errors for pressure and equivalence ratio were markedly reduced despite the low SNR that accompanies reduced exposure.

Submitted 26 December, 2022; v1 submitted 29 July, 2022; originally announced August 2022.

Comments: 25 pages, 12 figures, accepted to Combustion and Flame

Report number: Combustion and Flame 248 (2023) 112583

arXiv:2206.02903 [pdf, other]

Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps

Authors: Seung Wook Kim, Karsten Kreis, Daiqing Li, Antonio Torralba, Sanja Fidler

Abstract: Modern image generative models show remarkable sample quality when trained on a single domain or class of objects. In this work, we introduce a generative adversarial network that can simultaneously generate aligned image samples from multiple related domains. We leverage the fact that a variety of object classes share common attributes, with certain geometric differences. We propose Polymorphic-GAN which learns shared features across all domains and a per-domain morph layer to morph shared features according to each domain. In contrast to previous works, our framework allows simultaneous modelling of images with highly varying geometries, such as images of human faces, painted and artistic faces, as well as multiple different animal faces. We demonstrate that our model produces aligned samples for all domains and show how it can be used for applications such as segmentation transfer and cross-domain image editing, as well as training in low-data regimes. Additionally, we apply our Polymorphic-GAN on image-to-image translation tasks and show that we can greatly surpass previous approaches in cases where the geometric differences between domains are large.

Submitted 6 June, 2022; originally announced June 2022.

Comments: CVPR 2022 Oral

arXiv:2205.07260 [pdf, other]

Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks

Authors: Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Dong Gu Lee, Wonseok Jeong, Sang Woo Kim

Abstract: L2 regularization for weights in neural networks is widely used as a standard training trick. However, L2 regularization for gamma, a trainable parameter of batch normalization, remains largely undiscussed and is applied in different ways depending on the library and practitioner. In this paper, we study whether L2 regularization for gamma is valid. To explore this issue, we consider two approaches: 1) variance control to make the residual network behave like an identity mapping and 2) stable optimization through the improvement of the effective learning rate. Through these two analyses, we identify the gammas for which L2 regularization is desirable and those for which it is undesirable, and we propose four guidelines for managing them. In several experiments, we observed increases and decreases in performance caused by applying L2 regularization to the gammas in each of the four categories, consistent with our four guidelines. Our proposed guidelines were validated across various tasks and architectures, including variants of residual networks and transformers.

Submitted 15 May, 2022; originally announced May 2022.

Comments: 12 pages, 6 figures

arXiv:2205.05115 [pdf, other]

doi 10.1029/2023GL102958

First High-speed Video Camera Observations of a Lightning Flash Associated with a Downward Terrestrial Gamma-ray Flash

Authors: R. U. Abbasi, M. M. F. Saba, J. W. Belz, P. R. Krehbiel, W. Rison, N. Kieu, D. R. da Silva, Dan Rodeheffer, M. A. Stanley, J. Remington, J. Mazich, R. LeVon, K. Smout, A. Petrizze, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, D. R. Bergman, S. A. Blake, I. Buckland, B. G. Cheon, M. Chikawa, T. Fujii , et al. (127 additional authors not shown)

Abstract: In this paper, we present the first high-speed video observation of a cloud-to-ground lightning flash and its associated downward-directed Terrestrial Gamma-ray Flash (TGF). The optical emission of the event was observed by a high-speed video camera running at 40,000 frames per second in conjunction with the Telescope Array Surface Detector, Lightning Mapping Array, interferometer, electric-field fast antenna, and the National Lightning Detection Network. The cloud-to-ground flash associated with the observed TGF was formed by a fast downward leader followed by a return stroke with a very intense peak current of -154 kA. The TGF occurred while the downward leader was below the cloud base, when it was halfway through its propagation to ground. The suite of gamma-ray and lightning instruments, the timing resolution, and the source proximity give us detailed information and therefore a unique look at the TGF phenomenon.

Submitted 9 August, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

Journal ref: Geophysical Research Letters, 50, e2023GL102958 (2023)

arXiv:2201.07313 [pdf, other]

doi 10.3847/1538-4357/ac6def

Search for Spatial Correlations of Neutrinos with Ultra-High-Energy Cosmic Rays

Authors: The ANTARES collaboration, A. Albert, S. Alves, M. André, M. Anghinolfi, M. Ardid, S. Ardid, J. -J. Aubert, J. Aublin, B. Baret, S. Basa, B. Belhorma, M. Bendahman, V. Bertin, S. Biagi, M. Bissinger, J. Boumaaza, M. Bouta, M. C. Bouwhuis, H. Brânzaş, R. Bruijn, J. Brunner, J. Busto, B. Caiffi, D. Calvo , et al. (1025 additional authors not shown)

Abstract: For several decades, the origin of ultra-high-energy cosmic rays (UHECRs) has been an unsolved question of high-energy astrophysics. One approach for solving this puzzle is to correlate UHECRs with high-energy neutrinos, since neutrinos are a direct probe of hadronic interactions of cosmic rays and are not deflected by magnetic fields. In this paper, we present three different approaches for correlating the arrival directions of neutrinos with the arrival directions of UHECRs. The neutrino data is provided by the IceCube Neutrino Observatory and ANTARES, while the UHECR data with energies above ~50 EeV is provided by the Pierre Auger Observatory and the Telescope Array. All experiments provide increased statistics and improved reconstructions with respect to our previous results reported in 2015. The first analysis uses a high-statistics neutrino sample optimized for point-source searches to search for excesses of neutrinos clustering in the vicinity of UHECR directions. The second analysis searches for an excess of UHECRs in the direction of the highest-energy neutrinos. The third analysis searches for an excess of pairs of UHECRs and highest-energy neutrinos on different angular scales. None of the analyses has found a significant excess, and previously reported over-fluctuations are reduced in significance. Based on these results, we further constrain the neutrino flux spatially correlated with UHECRs.

Submitted 23 August, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

Comments: 39 pages, 7 figures, 4 tables; updated source files including xml authorlist

Report number: FERMILAB-PUB-22-033-AD-PPD-SCD-TD

Journal ref: ApJ 934 164 (2022)
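Sketch for arXiv:2201.07313 above: the pair-counting idea behind the third analysis, counting neutrino/UHECR pairs within several angular scales. This is a toy illustration assuming equatorial coordinates in degrees; the directions are made up, the function names are hypothetical, and a real analysis would also need detector angular resolution, exposure, and a background estimate (for example, from scrambled skies), none of which is modeled here.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle angle in degrees between two directions given as
    (right ascension, declination) in degrees, via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 before asin.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def count_pairs(neutrinos, cosmic_rays, max_angle_deg):
    """Count neutrino/UHECR pairs separated by at most max_angle_deg."""
    return sum(
        1
        for (ra_n, dec_n) in neutrinos
        for (ra_c, dec_c) in cosmic_rays
        if angular_separation_deg(ra_n, dec_n, ra_c, dec_c) <= max_angle_deg
    )


# Toy, made-up directions (not data from the analysis).
nus = [(10.0, 20.0), (200.0, -45.0)]
crs = [(12.0, 21.0), (150.0, 0.0), (201.0, -44.0)]
for scale in (1.0, 3.0, 10.0):
    print(scale, count_pairs(nus, crs, scale))
```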

arXiv:2201.04684 [pdf, other]

BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations

Authors: Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, Antonio Torralba

Abstract: Annotating images with pixel-wise labels is a time-consuming and costly process. Recently, DatasetGAN showcased a promising alternative: synthesizing a large labeled dataset via a generative adversarial network (GAN) by exploiting a small set of manually labeled, GAN-generated images. Here, we scale DatasetGAN to ImageNet-scale class diversity. We take image samples from the class-conditional generative model BigGAN trained on ImageNet, and manually annotate 5 images per class, for all 1k classes. By training an effective feature segmentation architecture on top of BigGAN, we turn BigGAN into a labeled dataset generator. We further show that VQGAN can similarly serve as a dataset generator, leveraging the already annotated data. We create a new ImageNet benchmark by labeling an additional set of 8k real images and evaluate segmentation performance in a variety of settings. Through an extensive ablation study, we show large gains from leveraging a large generated dataset to train different supervised and self-supervised backbone models on pixel-wise tasks. Furthermore, we demonstrate that using our synthesized datasets for pre-training leads to improvements over standard ImageNet pre-training on several downstream datasets, such as PASCAL-VOC, MS-COCO, Cityscapes and chest X-ray, as well as tasks (detection, segmentation). Our benchmark will be made public, and we will maintain a leaderboard for this challenging task. Project Page: https://nv-tlabs.github.io/big-datasetgan/

Submitted 12 January, 2022; originally announced January 2022.

Comments: https://nv-tlabs.github.io/big-datasetgan/

arXiv:2111.09962 [pdf, other]

doi 10.1103/PhysRevD.105.062002

Observation of Variations in Cosmic Ray Single Count Rates During Thunderstorms and Implications for Large-Scale Electric Field Changes

Authors: R. U. Abbasi, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, R. Cady, B. G. Cheon, J. Chiba, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, R. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon, M. Hayashi , et al. (140 additional authors not shown)

Abstract: We present the first observation by the Telescope Array Surface Detector (TASD) of the effect of thunderstorms on the development of cosmic ray single count rate intensity over a 700 km² area. Observations of variations in the secondary low-energy cosmic ray counting rate, using the TASD, allow us to study the electric field inside thunderstorms on a large scale as it progresses over the 700 km² detector, without the narrow temporal and spatial exposure that limits balloon- and aircraft-borne detectors. In this work, variations in the cosmic ray intensity (single count rate) measured with the TASD were studied and found to be on average at the ~0.5–1% level, and up to 2%. These variations were found both in excess and in deficit. They were also found to be correlated with lightning in addition to thunderstorms. These variations lasted for tens of minutes; their footprint on the ground ranged from 6 to 24 km in diameter and moved in the same direction as the thunderstorm. With the use of simple electric field models inside the cloud and between cloud and ground, the observed variations in the cosmic ray single count rate were recreated using CORSIKA simulations. Depending on the electric field model used and the direction of the electric field in that model, the electric field magnitude that reproduces the observed low-energy cosmic ray single count rate variations was found to be approximately 0.2–0.4 GV.
This in turn gives us reasonable insight into the electric field and its effect on cosmic ray air showers inside thunderstorms.

Submitted 18 November, 2021; originally announced November 2021.

arXiv:2111.08413

Improved Robustness of Vision Transformer via PreLayerNorm in Patch Embedding

Authors: Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Dong Gu Lee, Wonseok Jeong, Sang Woo Kim

Abstract: Vision transformers (ViTs) have recently demonstrated state-of-the-art performance in a variety of vision tasks, replacing convolutional neural networks (CNNs). However, because ViT's architecture differs from that of CNNs, it may behave differently. To investigate the reliability of ViT, this paper studies its behavior and robustness. We compared the robustness of CNN and ViT under various image corruptions that may appear in practical vision tasks. For most image transformations, ViT showed robustness comparable to or better than CNN. However, for contrast enhancement, ViT consistently showed severe performance degradation. A detailed analysis identified a potential problem: the positional embedding in ViT's patch embedding may work improperly when the color scale changes. We therefore propose PreLayerNorm, a modified patch embedding structure that ensures scale-invariant behavior of ViT. ViT with PreLayerNorm showed improved robustness to various corruptions, including contrast-varying environments.

Submitted 16 November, 2021; originally announced November 2021.

Comments: 7 pages, 8 figures. Work in Progress
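The abstract's argument is that a change in color scale rescales the patch tokens while the positional embedding stays fixed, so the balance between the two shifts. The NumPy sketch below illustrates that reasoning under explicit assumptions: a contrast change is modeled as multiplying pixel values by a constant, the patch projection is linear without bias, and PreLayerNorm is taken to mean a LayerNorm (without learnable gain or bias) applied to the patch tokens before the positional embedding is added. The helper names (`layer_norm`, `patchify`, `embed`) and all sizes are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token (row) to zero mean and unit variance over its features.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def patchify(img, p):
    # Split an (H, W, C) image into non-overlapping p x p patches, one flattened row per patch.
    h, w, c = img.shape
    patches = img.reshape(h // p, p, w // p, p, c).swapaxes(1, 2)
    return patches.reshape(-1, p * p * c)

def embed(img, proj, pos, pre_layer_norm, p=4):
    tokens = patchify(img, p) @ proj   # linear patch projection (no bias), so it scales with the input
    if pre_layer_norm:
        tokens = layer_norm(tokens)    # assumed PreLayerNorm placement: before the positional embedding
    return tokens + pos                # fixed positional embedding does not scale with the input

rng = np.random.default_rng(0)
img = rng.random((16, 16, 3))             # toy image: 16 patches of 4 x 4 x 3
proj = rng.normal(size=(4 * 4 * 3, 8))    # hypothetical 8-dim patch projection
pos = rng.normal(size=(16, 8))            # hypothetical positional embedding, one row per patch

for use_ln in (False, True):
    same = np.allclose(embed(img, proj, pos, use_ln), embed(2.0 * img, proj, pos, use_ln), atol=1e-5)
    print(f"pre_layer_norm={use_ln}: unchanged under 2x scaling -> {same}")
```

Without the normalization, doubling the pixel values doubles the content part of each token while the positional part stays fixed; with it, the tokens are unchanged up to the small `eps` term. A real contrast transform also shifts intensities around a mean, which this toy does not model.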

arXiv:2111.03186

EditGAN: High-Precision Semantic Image Editing

Authors: Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler

Abstract: Generative adversarial networks (GANs) have recently found applications in image editing. However, most GAN-based image editing methods require large-scale datasets with semantic segmentation annotations for training, provide only high-level control, or merely interpolate between different images. Here, we propose EditGAN, a novel method for high-quality, high-precision semantic image editing that allows users to edit images by modifying their highly detailed part segmentation masks, e.g., drawing a new mask for the headlight of a car. EditGAN builds on a GAN framework that jointly models images and their semantic segmentations and requires only a handful of labeled examples, making it a scalable tool for editing. Specifically, we embed an image into the GAN latent space and perform conditional latent code optimization according to the segmentation edit, which effectively also modifies the image. To amortize optimization, we find editing vectors in latent space that realize the edits. The framework allows us to learn an arbitrary number of editing vectors, which can then be directly applied to other images at interactive rates. We experimentally show that EditGAN can manipulate images with an unprecedented level of detail and freedom while preserving full image quality. We can also easily combine multiple edits and perform plausible edits beyond EditGAN's training data. We demonstrate EditGAN on a wide variety of image types and quantitatively outperform several previous editing methods on standard editing benchmark tasks.

Submitted 4 November, 2021; originally announced November 2021.
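The latent-optimization and editing-vector steps described in the EditGAN abstract can be illustrated with a deliberately simplified stand-in. This sketch assumes a hypothetical linear map `S` from latent code to segmentation logits in place of EditGAN's generator and segmentation branch, and a quadratic objective that matches the edited segmentation while staying close to the original latent, solved in closed form rather than by gradient descent. The sizes and the edit itself are invented. It shows the mechanics only: optimize once, store the difference as an editing vector, then add that vector to another image's latent without re-optimizing.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 8, 32                  # toy latent size and number of segmentation "pixels"
S = rng.normal(size=(m, k))   # hypothetical linear latent -> segmentation map

def optimize_latent(w0, target_seg, lam=0.1):
    # Minimize ||S w - target_seg||^2 + lam * ||w - w0||^2.
    # Setting the gradient to zero gives (S^T S + lam I) w = S^T target_seg + lam w0.
    a = S.T @ S + lam * np.eye(k)
    b = S.T @ target_seg + lam * w0
    return np.linalg.solve(a, b)

w0 = rng.normal(size=k)            # latent of the embedded image
target = S @ w0                    # start from the image's current segmentation
target[:4] += 2.0                  # hypothetical edit: change a small region of the mask
w_star = optimize_latent(w0, target)
delta = w_star - w0                # editing vector that realizes the edit

w1 = rng.normal(size=k)            # latent of a different image
w1_edited = w1 + delta             # reuse the edit without further optimization

print(np.linalg.norm(S @ w0 - target), np.linalg.norm(S @ w_star - target))
```

Because the toy map is linear, `S @ w1_edited - S @ w1` equals `S @ delta` exactly, so the edit transfers perfectly; a real nonlinear generator offers no such guarantee.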

arXiv:2110.14827

Indications of a Cosmic Ray Source in the Perseus-Pisces Supercluster

Authors: Telescope Array Collaboration, R. U. Abbasi, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, R. Cady, B. G. Cheon, J. Chiba, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, R. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon, et al. (135 additional authors not shown)

Abstract: The Telescope Array Collaboration has observed an excess of events with $E \ge 10^{19.4} ~{\rm eV}$ centered at (RA, dec) = ($19^\circ$, $35^\circ$). This is near the center of the Perseus-Pisces supercluster (PPSC). The PPSC is about $70 ~{\rm Mpc}$ distant and is the closest supercluster in the Northern Hemisphere (other than the Virgo supercluster, of which we are a part). A Li-Ma oversampling analysis with $20^\circ$-radius circles indicates an excess in the arrival directions of events with a local significance of about 4 standard deviations. The chance probability of such an excess appearing close to the PPSC is estimated to correspond to 3.5 standard deviations. This result indicates that a cosmic ray source likely exists in that supercluster.

Submitted 27 October, 2021; originally announced October 2021.

Comments: 8 pages, 4 figures, 1 table
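For readers unfamiliar with the Li-Ma statistic named in this abstract, the sketch below implements Eq. 17 of Li & Ma (1983) for a single on/off counting measurement, where `alpha` is the ratio of on-source to off-source exposure. It is a minimal illustration, not the Telescope Array analysis: it does not reproduce the oversampling scan over $20^\circ$ circles or the trial correction behind the quoted chance probability, and the counts in the example are invented, not Telescope Array data.

```python
import math

def li_ma_significance(n_on, n_off, alpha):
    """Li & Ma (1983) Eq. 17 significance for on/off event counts.

    alpha is the on-source / off-source exposure ratio; counts and alpha must be positive.
    """
    if n_on <= 0 or n_off <= 0 or alpha <= 0:
        raise ValueError("counts and alpha must be positive")
    n_tot = n_on + n_off
    term_on = n_on * math.log((1 + alpha) / alpha * n_on / n_tot)
    term_off = n_off * math.log((1 + alpha) * n_off / n_tot)
    s = math.sqrt(2.0 * max(term_on + term_off, 0.0))  # clamp tiny negative rounding errors
    # Report excesses as positive and deficits as negative significances.
    return s if n_on >= alpha * n_off else -s

# Invented example: 100 events in the signal region, 400 in a background
# region with 5x the exposure (expected background 0.2 * 400 = 80).
print(round(li_ma_significance(100, 400, 0.2), 2))  # 1.95
```

In an oversampling search, this calculation is typically repeated at every grid point of the sky map with `alpha` taken from the exposure model, which is why the local significance and the chance probability after accounting for trials are quoted separately.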

Showing 1–50 of 144 results for author: Kim, S W