Search | arXiv e-print repository

GLOS: Sign Language Generation with Temporally Aligned Gloss-Level Conditioning

Authors: Taeryung Lee, Hyeongjin Nam, Gyeongsik Moon, Kyoung Mu Lee

Abstract: Sign language generation (SLG), or text-to-sign generation, bridges the gap between signers and non-signers. Despite recent progress in SLG, existing methods still often suffer from incorrect lexical ordering and low semantic accuracy. This is primarily due to sentence-level condition, which encodes the entire sentence of the input text into a single feature vector as a condition for SLG. This approach fails to capture the temporal structure of sign language and lacks the granularity of word-level semantics, often leading to disordered sign sequences and ambiguous motions. To overcome these limitations, we propose GLOS, a sign language generation framework with temporally aligned gloss-level conditioning. First, we employ gloss-level conditions, which we define as sequences of gloss embeddings temporally aligned with the motion sequence. This enables the model to access both the temporal structure of sign language and word-level semantics at each timestep. As a result, this allows for fine-grained control of signs and better preservation of lexical order. Second, we introduce a condition fusion module, temporal alignment conditioning (TAC), to efficiently deliver the word-level semantic and temporal structure provided by the gloss-level condition to the corresponding motion timesteps. Our method, which is composed of gloss-level conditions and TAC, generates signs with correct lexical order and high semantic accuracy, outperforming prior methods on CSL-Daily and Phoenix-2014T.

Submitted 9 June, 2025; originally announced June 2025.

arXiv:2505.13235 [pdf, ps, other]

WriteViT: Handwritten Text Generation with Vision Transformer

Authors: Dang Hoai Nam, Huynh Tong Dang Khoa, Vo Nguyen Le Duy

Abstract: Humans can quickly generalize handwriting styles from a single example by intuitively separating content from style. Machines, however, struggle with this task, especially in low-data settings, often missing subtle spatial and stylistic cues. Motivated by this gap, we introduce WriteViT, a one-shot handwritten text synthesis framework that incorporates Vision Transformers (ViT), a family of models that have shown strong performance across various computer vision tasks. WriteViT integrates a ViT-based Writer Identifier for extracting style embeddings, a multi-scale generator built with Transformer encoder-decoder blocks enhanced by conditional positional encoding (CPE), and a lightweight ViT-based recognizer. While previous methods typically rely on CNNs or CRNNs, our design leverages transformers in key components to better capture both fine-grained stroke details and higher-level style information. Although handwritten text synthesis has been widely explored, its application to Vietnamese -- a language rich in diacritics and complex typography -- remains limited. Experiments on Vietnamese and English datasets demonstrate that WriteViT produces high-quality, style-consistent handwriting while maintaining strong recognition performance in low-resource scenarios. These results highlight the promise of transformer-based designs for multilingual handwriting generation and efficient style adaptation.

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.11855 [pdf, ps, other]

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research

Authors: Guijin Son, Jiwoo Hong, Honglu Fan, Heejeong Nam, Hyunwoo Ko, Seungwon Lim, Jinyeop Song, Jinha Choi, Gonçalo Paulo, Youngjae Yu, Stella Biderman

Abstract: Recent advances in large language models (LLMs) have fueled the vision of automated scientific discovery, often called AI Co-Scientists. To date, prior work casts these systems as generative co-authors responsible for crafting hypotheses, synthesizing code, or drafting manuscripts. In this work, we explore a complementary application: using LLMs as verifiers to automate the academic verification of scientific manuscripts. To that end, we introduce SPOT, a dataset of 83 published papers paired with 91 errors significant enough to prompt errata or retraction, cross-validated with actual authors and human annotators. Evaluating state-of-the-art LLMs on SPOT, we find that none surpasses 21.1% recall or 6.1% precision (o3 achieves the best scores, with all others near zero). Furthermore, confidence estimates are uniformly low, and across eight independent runs, models rarely rediscover the same errors, undermining their reliability. Finally, qualitative analysis with domain experts reveals that even the strongest models make mistakes resembling student-level misconceptions derived from misunderstandings. These findings highlight the substantial gap between current LLM capabilities and the requirements for dependable AI-assisted academic verification.

Submitted 17 May, 2025; originally announced May 2025.

Comments: work in progress

arXiv:2504.14817 [pdf]

DNN based HRIRs Identification with a Continuously Rotating Speaker Array

Authors: Byeong-Yun Ko, Deokki Min, Hyeonuk Nam, Yong-Hwa Park

Abstract: Conventional static measurement of head-related impulse responses (HRIRs) is time-consuming due to the need for repositioning a speaker array for each azimuth angle. Dynamic approaches using analytical models with a continuously rotating speaker array have been proposed, but their accuracy is significantly reduced at high rotational speeds. To address this limitation, we propose a DNN-based HRIRs identification using sequence-to-sequence learning. The proposed DNN model incorporates fully connected (FC) networks to effectively capture HRIR transitions and includes reset and update gates to identify HRIRs over a whole sequence. The model updates the HRIRs vector coefficients based on the gradient of the instantaneous square error (ISE). Additionally, we introduce a learnable normalization process based on the speaker excitation signals to stabilize the gradient scale of ISE across time. A training scheme, referred to as whole-sequence updating and optimization scheme, is also introduced to prevent overfitting. We evaluated the proposed method through simulations and experiments. Simulation results using the FABIAN database show that the proposed method outperforms previous analytic models, achieving over 7 dB improvement in normalized misalignment (NM) and maintaining log spectral distortion (LSD) below 2 dB at a rotational speed of 45°/s. Experimental results with a custom-built speaker array confirm that the proposed method successfully preserved accurate sound localization cues, consistent with those from static measurement. Source code is available at https://github.com/byko0810/DNN-based-HRIRs-identification

Submitted 20 April, 2025; originally announced April 2025.

arXiv:2504.12670 [pdf, other]

Temporal Attention Pooling for Frequency Dynamic Convolution in Sound Event Detection

Authors: Hyeonuk Nam, Yong-Hwa Park

Abstract: Recent advances in deep learning, particularly frequency dynamic convolution (FDY conv), have significantly improved sound event detection (SED) by enabling frequency-adaptive feature extraction. However, FDY conv relies on temporal average pooling, which treats all temporal frames equally, limiting its ability to capture transient sound events such as alarm bells, door knocks, and speech plosives. To address this limitation, we propose temporal attention pooling frequency dynamic convolution (TFD conv) to replace temporal average pooling with temporal attention pooling (TAP). TAP adaptively weights temporal features through three complementary mechanisms: time attention pooling (TA) for emphasizing salient features, velocity attention pooling (VA) for capturing transient changes, and conventional average pooling for robustness to stationary signals. Ablation studies show that TFD conv improves average PSDS1 by 3.02% over FDY conv with only a 14.8% increase in parameter count. Classwise ANOVA and Tukey HSD analysis further demonstrate that TFD conv significantly enhances detection performance for transient-heavy events, outperforming existing FDY conv models. Notably, TFD conv achieves a maximum PSDS1 score of 0.456, surpassing previous state-of-the-art SED systems. We also explore the compatibility of TAP with other FDY conv variants, including dilated FDY conv (DFD conv), partial FDY conv (PFD conv), and multi-dilated FDY conv (MDFD conv). Among these, the integration of TAP with MDFD conv achieves the best result with a PSDS1 score of 0.459, validating the complementary strengths of temporal attention and multi-scale frequency adaptation. These findings establish TFD conv as a powerful and generalizable framework for enhancing both transient sensitivity and overall feature robustness in SED.

Submitted 17 April, 2025; originally announced April 2025.
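
Illustrative note on the entry above: the contrast between temporal average pooling and attention-style pooling can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' TAP module. Each frame is reduced to a single scalar feature, the attention scores are supplied by hand rather than learned, and the helper names (`average_pool`, `attention_pool`, `velocity_scores`) are hypothetical.

```python
import math


def average_pool(frames):
    """Temporal average pooling: every frame gets the same weight."""
    return sum(frames) / len(frames)


def attention_pool(frames, scores):
    """Softmax-weighted pooling over time.

    Frames with higher scores contribute more to the pooled value.
    In a trained model the scores would come from learned layers;
    here they are passed in directly (an assumption of this sketch).
    """
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * f for w, f in zip(weights, frames))


def velocity_scores(frames):
    """Hypothetical transient score: absolute frame-to-frame change."""
    return [0.0] + [abs(b - a) for a, b in zip(frames, frames[1:])]


# A short transient (one loud frame) inside an otherwise silent clip.
frames = [0.0, 0.0, 4.0, 0.0]

avg = average_pool(frames)                              # transient is diluted
att = attention_pool(frames, frames)                    # transient dominates
vel = attention_pool(frames, velocity_scores(frames))   # weights follow change
```

With uniform scores, `attention_pool` reduces to `average_pool`, which is why keeping an average-pooling branch alongside the attention branches is a natural fallback for stationary signals.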

arXiv:2504.00600 [pdf, other]

De Sitter vacuum on nucleated D-branes with stringy corrections

Authors: Cao H. Nam

Abstract: We find four-dimensional de Sitter (dS) vacuum solutions on probe D-branes which nucleate in an asymptotically $\text{AdS}_5\times T^{1,1}$ background, including stringy corrections. A sufficiently high chemical potential induced by the wrapped D3-brane charge, breaking supersymmetry in the bulk, is essential to lead to the nucleation of the probe D-brane. We show that stringy corrections can yield a dS vacuum on a spherical D3-brane consistent with the observations without being fine-tuned. Motivated by the fact that the matter fields propagating in the compact extra dimensions can provide solutions for problems in particle physics and cosmology, we also construct a dS vacuum on a probe D5-brane which wraps on a two-torus of the internal manifold $T^{1,1}$. This construction requires turning on a worldvolume U(1) gauge field along the wrapped part of the D5-brane and dissolving some D3-branes in it. The stringy corrections play an important role in yielding a dS vacuum and the field strength of the U(1) gauge field must be sufficiently large to produce a tiny cosmological constant.

Submitted 1 April, 2025; originally announced April 2025.

Comments: 20 pages, 1 figure

arXiv:2503.19373 [pdf, other]

DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image

Authors: Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh, Kyoung Mu Lee

Abstract: Most existing methods of 3D clothed human reconstruction from a single image treat the clothed human as a single object without distinguishing between cloth and human body. In this regard, we present DeClotH, which separately reconstructs 3D cloth and human body from a single image. This task remains largely unexplored due to the extreme occlusion between cloth and the human body, making it challenging to infer accurate geometries and textures. Moreover, while recent 3D human reconstruction methods have achieved impressive results using text-to-image diffusion models, directly applying such an approach to this problem often leads to incorrect guidance, particularly in reconstructing 3D cloth. To address these challenges, we propose two core designs in our framework. First, to alleviate the occlusion issue, we leverage 3D template models of cloth and human body as regularizations, which provide strong geometric priors to prevent erroneous reconstruction by the occlusion. Second, we introduce a cloth diffusion model specifically designed to provide contextual information about cloth appearance, thereby enhancing the reconstruction of 3D cloth. Qualitative and quantitative experiments demonstrate that our proposed approach is highly effective in reconstructing both 3D cloth and the human body. More qualitative results are provided at https://hygenie1228.github.io/DeClotH/.

Submitted 25 March, 2025; originally announced March 2025.

Comments: Published at CVPR 2025, 17 pages including the supplementary material

arXiv:2503.15879 [pdf, other]

Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering

Authors: DongGeon Lee, Ahjeong Park, Hyeri Lee, Hyeonseo Nam, Yunho Maeng

Abstract: Non-factoid question-answering (NFQA) poses a significant challenge due to its open-ended nature, diverse intents, and the need for multi-aspect reasoning, which renders conventional factoid QA approaches, including retrieval-augmented generation (RAG), inadequate. Unlike factoid questions, non-factoid questions (NFQs) lack definitive answers and require synthesizing information from multiple sources across various reasoning dimensions. To address these limitations, we introduce Typed-RAG, a type-aware multi-aspect decomposition framework within the RAG paradigm for NFQA. Typed-RAG classifies NFQs into distinct types -- such as debate, experience, and comparison -- and applies aspect-based decomposition to refine retrieval and generation strategies. By decomposing multi-aspect NFQs into single-aspect sub-queries and aggregating the results, Typed-RAG generates more informative and contextually relevant responses. To evaluate Typed-RAG, we introduce Wiki-NFQA, a benchmark dataset covering diverse NFQ types. Experimental results demonstrate that Typed-RAG outperforms baselines, thereby highlighting the importance of type-aware decomposition for effective retrieval and generation in NFQA. Our code and dataset are available at https://github.com/TeamNLP/Typed-RAG.

Submitted 21 March, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

Comments: Accepted to NAACL 2025 SRW

arXiv:2503.15855 [pdf, other]

VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling

Authors: Hyojun Go, Byeongjun Park, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim

Abstract: We propose VideoRFSplat, a direct text-to-3D model leveraging a video generation model to generate realistic 3D Gaussian Splatting (3DGS) for unbounded real-world scenes. To generate diverse camera poses and unbounded spatial extent of real-world scenes, while ensuring generalization to arbitrary text prompts, previous methods fine-tune 2D generative models to jointly model camera poses and multi-view images. However, these methods suffer from instability when extending 2D generative models to joint modeling due to the modality gap, which necessitates additional models to stabilize training and inference. In this work, we propose an architecture and a sampling strategy to jointly model multi-view images and camera poses when fine-tuning a video generation model. Our core idea is a dual-stream architecture that attaches a dedicated pose generation model alongside a pre-trained video generation model via communication blocks, generating multi-view images and camera poses through separate streams. This design reduces interference between the pose and image modalities. Additionally, we propose an asynchronous sampling strategy that denoises camera poses faster than multi-view images, allowing rapidly denoised poses to condition multi-view generation, reducing mutual ambiguity and enhancing cross-modal consistency. Trained on multiple large-scale real-world datasets (RealEstate10K, MVImgNet, DL3DV-10K, ACID), VideoRFSplat outperforms existing text-to-3D direct generation methods that heavily depend on post-hoc refinement via score distillation sampling, achieving superior results without such refinement.

Submitted 20 March, 2025; originally announced March 2025.

Comments: Project page: https://gohyojun15.github.io/VideoRFSplat/

arXiv:2503.12024 [pdf, other]

SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering

Authors: Byeongjun Park, Hyojun Go, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim

Abstract: Recent progress in 3D/4D scene generation emphasizes the importance of physical alignment throughout video generation and scene reconstruction. However, existing methods improve the alignment separately at each stage, making it difficult to manage subtle misalignments arising from another stage. Here, we present SteerX, a zero-shot inference-time steering method that unifies scene reconstruction into the generation process, tilting data distributions toward better geometric alignment. To this end, we introduce two geometric reward functions for 3D/4D scene generation by using pose-free feed-forward scene reconstruction models. Through extensive experiments, we demonstrate the effectiveness of SteerX in improving 3D/4D scene generation.

Submitted 15 March, 2025; originally announced March 2025.

Comments: Project page: https://byeongjun-park.github.io/SteerX/

arXiv:2503.11020 [pdf, other]

Fast and Robust Localization for Humanoid Soccer Robot via Iterative Landmark Matching

Authors: Ruochen Hou, Mingzhang Zhu, Hyunwoo Nam, Gabriel I. Fernandez, Dennis W. Hong

Abstract: Accurate robot localization is essential for effective operation. Monte Carlo Localization (MCL) is commonly used with known maps but is computationally expensive due to landmark matching for each particle. Humanoid robots face additional challenges, including sensor noise from locomotion vibrations and a limited field of view (FOV) due to camera placement. This paper proposes a fast and robust localization method via iterative landmark matching (ILM) for humanoid robots. The iterative matching process improves the accuracy of the landmark association so that it does not need MCL to match landmarks to particles. Pose estimation with the outlier removal process enhances its robustness to measurement noise and faulty detections. Furthermore, an additional filter can be utilized to fuse inertial data from the inertial measurement unit (IMU) and pose data from localization. We compared ILM with Iterative Closest Point (ICP), which shows that ILM method is more robust towards the error in the initial guess and easier to get a correct matching. We also compared ILM with the Augmented Monte Carlo Localization (aMCL), which shows that ILM method is much faster than aMCL and even more accurate. The proposed method's effectiveness is thoroughly evaluated through experiments and validated on the humanoid robot ARTEMIS during RoboCup 2024 adult-sized soccer competition.

Submitted 16 May, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

arXiv:2503.02528 [pdf, ps, other]

Prospects for Pentaquark Baryon Search with the Upgraded LEPS2 Facility

Authors: T. Nakano, S. Ajimura, Y. Asano, S. Daté, T. Hashimoto, A. Higashi, T. Hotta, T. Ishikawa, H. Katsuragawa, R. Kobayakawa, H. Kohri, K. Mizutani, Y. Ohashi, H. Ohkuma, S. Y. Ryu, S. Suzuki, S. Tanaka, K. Watanabe, B. Yan, T. Yorita, M. Yosoi, G. Kojima, M. Miyabe, N. Muramatsu, H. Ohnishi, et al. (11 additional authors not shown)

Abstract: We present prospects for the $Θ^+$ pentaquark baryon search using the newly constructed LEPS2 facility at SPring-8. The LEPS2 detector system features significant improvements in acceptance for multi-particle final states compared to previous experiments. Our search employs two complementary strategies: direct production in the $γn \to K^-Θ^+$ reaction using a liquid deuterium target with a photon beam up to 2.4 GeV, and $\bar{K}^{*0}$-associated $Θ^+$ production using a liquid hydrogen target with a photon beam up to 2.9 GeV. The extended acceptance covers both forward and large angle regions, effectively spanning the kinematic regions explored by previous LEPS and CLAS experiments. The large acceptance and improved resolution of LEPS2, combined with these complementary approaches, provide unprecedented sensitivity for establishing the existence of the $Θ^+$ or placing definitive upper limits on its production.

Submitted 4 March, 2025; originally announced March 2025.

Comments: to be published on Acta Physica Polonica B 56 (2025)

Report number: RCNP-Ex25001

arXiv:2502.20857 [pdf, other]

JiTTER: Jigsaw Temporal Transformer for Event Reconstruction for Self-Supervised Sound Event Detection

Authors: Hyeonuk Nam, Yong-Hwa Park

Abstract: Sound event detection (SED) has significantly benefited from self-supervised learning (SSL) approaches, particularly masked audio transformer for SED (MAT-SED), which leverages masked block prediction to reconstruct missing audio segments. However, while effective in capturing global dependencies, masked block prediction disrupts transient sound events and lacks explicit enforcement of temporal order, making it less suitable for fine-grained event boundary detection. To address these limitations, we propose JiTTER (Jigsaw Temporal Transformer for Event Reconstruction), an SSL framework designed to enhance temporal modeling in transformer-based SED. JiTTER introduces a hierarchical temporal shuffle reconstruction strategy, where audio sequences are randomly shuffled at both the block-level and frame-level, forcing the model to reconstruct the correct temporal order. This pretraining objective encourages the model to learn both global event structures and fine-grained transient details, improving its ability to detect events with sharp onset-offset characteristics. Additionally, we incorporate noise injection during block shuffle, providing a subtle perturbation mechanism that further regularizes feature learning and enhances model robustness. Experimental results on the DESED dataset demonstrate that JiTTER outperforms MAT-SED, achieving a 5.89% improvement in PSDS, highlighting the effectiveness of explicit temporal reasoning in SSL-based SED. Our findings suggest that structured temporal reconstruction tasks, rather than simple masked prediction, offer a more effective pretraining paradigm for sound event representation learning.

Submitted 28 February, 2025; originally announced February 2025.
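
Illustrative note on the entry above: shuffling a sequence at both the block level and the frame level can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It nests the two shuffles (blocks are permuted, then frames inside each block), which is one possible reading of "hierarchical"; noise injection is omitted; and `hierarchical_shuffle` and `block_size` are hypothetical names.

```python
import random


def hierarchical_shuffle(frames, block_size, rng):
    """Shuffle a frame sequence at two temporal scales.

    1. Block level: split the sequence into contiguous blocks and
       permute the order of the blocks.
    2. Frame level: permute the frames inside each block.

    A pretraining objective of this kind would ask a model to recover
    the original order from the shuffled input.
    """
    blocks = [frames[i:i + block_size] for i in range(0, len(frames), block_size)]
    rng.shuffle(blocks)              # block-level shuffle
    shuffled = []
    for block in blocks:
        block = list(block)          # copy so the input is not mutated
        rng.shuffle(block)           # frame-level shuffle within the block
        shuffled.extend(block)
    return shuffled


rng = random.Random(0)               # fixed seed for repeatability
original = list(range(12))           # frame indices stand in for audio frames
shuffled = hierarchical_shuffle(original, block_size=4, rng=rng)
target = original                    # reconstruction target: the true order
```

Because frames never leave their block, each run of `block_size` consecutive outputs still comes from a single original block, so coarse and fine ordering errors stay separable.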

arXiv:2502.07208 [pdf]

Towards Understanding of Frequency Dependence on Sound Event Detection

Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Byeong-Yun Ko, Yong-Hwa Park

Abstract: In this work, various analysis methods are conducted on frequency-dependent methods on SED to further delve into their detailed characteristics and behaviors on SED. While SED has been rapidly advancing through the adoption of various deep learning techniques from other pattern recognition fields, these techniques are often not suitable for SED. To address this issue, two frequency-dependent SED methods were previously proposed: FilterAugment, a data augmentation randomly weighting frequency bands, and frequency dynamic convolution (FDY Conv), an architecture applying frequency adaptive convolution kernels. These methods have demonstrated superior performance in SED, and we aim to further analyze their detailed effectiveness and characteristics in SED. We compare class-wise performance to find out specific pros and cons of FilterAugment and FDY Conv. We apply Gradient-weighted Class Activation Mapping (Grad-CAM), which highlights time-frequency region that is more inferred by the model, on SED models with and without frequency masking and two types of FilterAugment to observe their detailed characteristics. We propose simpler frequency dependent convolution methods and compare them with FDY Conv to further understand which components of FDY Conv affects SED performance. Lastly, we apply PCA to show how FDY Conv adapts dynamic kernel across frequency dimensions on different sound event classes. The results and discussions demonstrate that frequency dependency plays a significant role in sound event detection and further confirms the effectiveness of frequency dependent methods on SED.

Submitted 10 February, 2025; originally announced February 2025.

arXiv:2412.20638 [pdf, other]

Predicting Long Term Sequential Policy Value Using Softer Surrogates

Authors: Hyunji Nam, Allen Nie, Ge Gao, Vasilis Syrgkanis, Emma Brunskill

Abstract: Off-policy policy evaluation (OPE) estimates the outcome of a new policy using historical data collected from a different policy. However, existing OPE methods cannot handle cases when the new policy introduces novel actions. This issue commonly occurs in real-world domains, like healthcare, as new drugs and treatments are continuously developed. Novel actions necessitate on-policy data collection, which can be burdensome and expensive if the outcome of interest takes a substantial amount of time to observe--for example, in multi-year clinical trials. This raises a key question of how to predict the long-term outcome of a policy after only observing its short-term effects? Though in general this problem is intractable, under some surrogacy conditions, the short-term on-policy data can be combined with the long-term historical data to make accurate predictions about the new policy's long-term value. In two simulated healthcare examples--HIV and sepsis management--we show that our estimators can provide accurate predictions about the policy value only after observing 10% of the full horizon data. We also provide finite sample analysis of our doubly robust estimators.

Submitted 2 February, 2025; v1 submitted 29 December, 2024; originally announced December 2024.

Comments: 24 pages, 1 figure

arXiv:2412.13648 [pdf, ps, other]

Model-independent measurement of isospin diffusion in Ni-Ni systems at intermediate energy

Authors: C. Ciampi, J. D. Frankland, D. Gruyer, N. Le Neindre, S. Mallik, R. Bougault, A. Chbihi, L. Baldesi, S. Barlini, E. Bonnet, B. Borderie, A. Camaiani, G. Casini, I. Dekhissi, D. Dell'Aquila, J. A. Dueñas, Q. Fable, F. Gramegna, C. Gouyet, M. Henri, B. Hong, S. Kim, A. Kordyasz, T. Kozik, M. J. Kweon, et al. (16 additional authors not shown)

Abstract: In this work we provide a model-independent experimental evaluation of the degree of isospin equilibration taking place in $^{58,64}$Ni+$^{58,64}$Ni collisions at 32 MeV/nucleon across varying reaction centralities. This result has been obtained by combining the complementary information provided by two different datasets, sharing common characteristics. The first dataset has been acquired with the INDRA setup and has been used to implement a model-independent reconstruction of the impact parameter. The second dataset has been acquired in the first experimental campaign of the coupled INDRA-FAZIA apparatus at GANIL. The neutron-to-proton content of the quasiprojectile remnant measured by FAZIA has been employed as isospin observable. The effect of isospin diffusion has been evidenced by means of the isospin transport ratio, reported as a function of the impact parameter of the collision. The evolution towards isospin equilibration from semiperipheral to more central collisions is clearly extracted. This experimental result, expanding our previous works (Phys. Rev. C 106, 024603 (2022) and Phys. Rev. C 108, 054611 (2023)), can be compared with the predictions of any transport model, and can thus be used to set constraints on the behavior of the symmetry energy term of the nuclear Equation of State at sub- to saturation densities.

Submitted 18 December, 2024; originally announced December 2024.

arXiv:2412.12306 [pdf, ps, other]

Ultra-wideband Double-Directionally Resolved Channel Measurements of Line-of-Sight Microcellular Scenarios in the Upper Mid-band

Authors: Naveed A. Abbasi, Kelvin Arana, Jorge Gomez-Ponce, Tathagat Pal, Vikram Vasudevan, Atulya Bist, Omer Gokalp Serbetci, Young Han Nam, Charlie Zhang, Andreas F. Molisch

Abstract: The growing demand for higher data rates and expanded bandwidth is driving the exploration of new frequency ranges, including the upper mid-band spectrum (6-24 GHz), which is a promising candidate for future Frequency Range 3 (FR3) applications. This paper presents ultra-wideband double-directional channel measurements in line-of-sight microcellular scenarios within the upper mid-band spectrum (6-18 GHz). Conducted in an urban street canyon environment, these measurements explore key channel characteristics such as power delay profiles, angular power spectra, path loss, delay spread, and angular spread to provide insights essential for robust communication system design. Our results reveal that path loss values for both omni-directional and best beam configurations are lower than free-space predictions due to multipath contributions from the environment. Analysis also indicates a high degree of stability in delay spread and angular spread across the entire band, with small variation between sub-bands.

Submitted 16 December, 2024; originally announced December 2024.

arXiv:2412.08662 [pdf, other]

doi 10.1088/1748-0221/19/12/P12008

Performance of the prototype beam drift chamber for LAMPS at RAON with proton and Carbon-12 beams

Authors: H. Kim, Y. Bae, C. Heo, J. Seo, J. Hwang, D. H. Moon, D. S. Ahn, J. K. Ahn, J. Bae, J. Bok, Y. Cheon, S. W. Choi, S. Do, B. Hong, S. -W. Hong, J. Huh, S. Hwang, Y. Jang, B. Kang, A. Kim, B. Kim, C. Kim, E. -J. Kim, G. Kim, G. Kim, et al. (23 additional authors not shown)

Abstract: Beam Drift Chamber (BDC) is designed to reconstruct the trajectories of incident rare isotope beams provided by RAON (Rare isotope Accelerator complex for ON-line experiments) into the experimental target of LAMPS (Large Acceptance Multi-Purpose Spectrometer). To conduct the performance test of the BDC, the prototype BDC (pBDC) is manufactured and evaluated with the high energy ion beams from HIMAC (Heavy Ion Medical Accelerator in Chiba) facility in Japan. Two kinds of ion beams, 100 MeV proton, and 200 MeV/u $^{12}$C, have been utilized for this evaluation, and the track reconstruction efficiency and position resolution have been measured as the function of applied high voltage. This paper introduces the construction details and presents the track reconstruction efficiency and position resolution of pBDC.

Submitted 6 December, 2024; originally announced December 2024.

Comments: 13 pages, 15 figures

Journal ref: JINST 19 (2024) P12008

arXiv:2411.19341 [pdf, other]

An Adversarial Learning Approach to Irregular Time-Series Forecasting

Authors: Heejeong Nam, Jihyun Kim, Jimin Yeom

Abstract: Forecasting irregular time series presents significant challenges due to two key issues: the vulnerability of models to mean regression, driven by the noisy and complex nature of the data, and the limitations of traditional error-based evaluation metrics, which fail to capture meaningful patterns and penalize unrealistic forecasts. These problems result in forecasts that often misalign with human intuition. To tackle these challenges, we propose an adversarial learning framework with a deep analysis of adversarial components. Specifically, we emphasize the importance of balancing the modeling of global distribution (overall patterns) and transition dynamics (localized temporal changes) to better capture the nuances of irregular time series. Overall, this research provides practical insights for improving models and evaluation metrics, and pioneers the application of adversarial learning in the domain of irregular time-series forecasting.

Submitted 28 November, 2024; originally announced November 2024.

Comments: Accepted to AdvML-Frontiers Workshop @ NeurIPS 2024

arXiv:2411.15540 [pdf, other]

Optical-Flow Guided Prompt Optimization for Coherent Video Generation

Authors: Hyelin Nam, Jaemin Kim, Dohun Lee, Jong Chul Ye

Abstract: While text-to-video diffusion models have made significant strides, many still face challenges in generating videos with temporal consistency. Within diffusion frameworks, guidance techniques have proven effective in enhancing output quality during inference; however, applying these methods to video diffusion models introduces additional complexity of handling computations across entire sequences. To address this, we propose a novel framework called MotionPrompt that guides the video generation process via optical flow. Specifically, we train a discriminator to distinguish optical flow between random pairs of frames from real videos and generated ones. Given that prompts can influence the entire video, we optimize learnable token embeddings during reverse sampling steps by using gradients from a trained discriminator applied to random frame pairs. This approach allows our method to generate visually coherent video sequences that closely reflect natural motion dynamics, without compromising the fidelity of the generated content. We demonstrate the effectiveness of our approach across various models.

Submitted 23 March, 2025; v1 submitted 23 November, 2024; originally announced November 2024.

Comments: CVPR 2025 (poster); project page: https://motionprompt.github.io/

arXiv:2411.14137 [pdf, other]

VAGUE: Visual Contexts Clarify Ambiguous Expressions

Authors: Heejeong Nam, Jinwoo Ahn, Keummin Ka, Jiwan Chung, Youngjae Yu

Abstract: Human communication often relies on visual cues to resolve ambiguity. While humans can intuitively integrate these cues, AI systems often find it challenging to engage in sophisticated multimodal reasoning. We introduce VAGUE, a benchmark evaluating multimodal AI systems' ability to integrate visual context for intent disambiguation. VAGUE consists of 1.6K ambiguous textual expressions, each paired with an image and multiple-choice interpretations, where the correct answer is only apparent with visual context. The dataset spans both staged, complex (Visual Commonsense Reasoning) and natural, personal (Ego4D) scenes, ensuring diversity. Our experiments reveal that existing multimodal AI models struggle to infer the speaker's true intent. While performance consistently improves from the introduction of more visual cues, the overall accuracy remains far below human performance, highlighting a critical gap in multimodal reasoning. Analysis of failure cases demonstrates that current models fail to distinguish true intent from superficial correlations in the visual scene, indicating that they perceive images but do not effectively reason with them. We release our code and data at https://github.com/Hazel-Heejeong-Nam/VAGUE.git.

Submitted 11 March, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

Comments: 31 pages

arXiv:2410.14902 [pdf, other]

Modeling and Analysis of Hybrid GEO-LEO Satellite Networks

Authors: Dong-Hyun Jung, Hongjae Nam, Junil Choi, David J. Love

Abstract: As the number of low Earth orbit (LEO) satellites rapidly increases, the consideration of frequency sharing or cooperation between geosynchronous Earth orbit (GEO) and LEO satellites is gaining attention. In this paper, we consider a hybrid GEO-LEO satellite network where GEO and LEO satellites are distributed according to independent Poisson point processes (PPPs) and share the same frequency resources. Based on the properties of PPPs, we first analyze satellite-visible probabilities, distance distributions, and association probabilities. Then, we derive an analytical expression for the network's coverage probability. Through Monte Carlo simulations, we verify the analytical results and demonstrate the impact of system parameters on coverage performance. The analytical results effectively estimate the coverage performance in scenarios where GEO and LEO satellites cooperate or share the same resource.

Submitted 18 October, 2024; originally announced October 2024.

Comments: 5 pages, 4 figures, 1 table, submitted to IEEE Transactions on Vehicular Technology

arXiv:2409.06393 [pdf, other]

doi 10.1140/epjc/s10052-025-13743-8

Scoto-seesaw model implied by flavor-dependent Abelian gauge charge

Authors: Duong Van Loi, N. T. Duy, Cao H. Nam, Phung Van Dong

Abstract: Assuming fundamental fermions possess a new Abelian gauge charge that depends on flavors of both quark and lepton, we obtain a simple extension of the Standard Model, which reveals some new physics insights. The new gauge charge anomaly cancellation not only explains the existence of just three fermion generations as observed but also requires the presence of a unique right-handed neutrino $ν_R$ with a non-zero new gauge charge. Further, the new gauge charge breaking supplies a residual matter parity, under which the fundamental fermions and $ν_R$ are even, whereas a right-handed neutrino $N_R$ without the new charge is odd. Consequently, light neutrino masses in our model are generated from the tree-level type-I seesaw mechanism induced by $ν_R$ and from the one-loop scotogenic contribution accommodated by potential dark matter candidates, $N_R$ and dark scalars, odd under the matter parity. We examine new physics phenomena related to the additional gauge boson, which could be observed at colliders. We analyze the constraints imposed on our model by current experimental limits on neutrino masses, neutral meson oscillations, $B$-meson decays, and charged lepton flavor violating processes. We also investigate the potential dark matter candidates by considering relic density and direct detection.

Submitted 2 February, 2025; v1 submitted 10 September, 2024; originally announced September 2024.

Comments: 41 pages, 10 figures, 5 tables; revised version, published in EPJC

Journal ref: Eur. Phys. J. C 85 (2025) 109

arXiv:2408.01040 [pdf, other]

Privacy-Preserving Split Learning with Vision Transformers using Patch-Wise Random and Noisy CutMix

Authors: Seungeun Oh, Sihun Baek, Jihong Park, Hyelin Nam, Praneeth Vepakomma, Ramesh Raskar, Mehdi Bennis, Seong-Lyun Kim

Abstract: In computer vision, the vision transformer (ViT) has increasingly superseded the convolutional neural network (CNN) for improved accuracy and robustness. However, ViT's large model sizes and high sample complexity make it difficult to train on resource-constrained edge devices. Split learning (SL) emerges as a viable solution, leveraging server-side resources to train ViTs while utilizing private data from distributed devices. However, SL requires additional information exchange for weight updates between the device and the server, which can be exposed to various attacks on private training data. To mitigate the risk of data breaches in classification tasks, inspired from the CutMix regularization, we propose a novel privacy-preserving SL framework that injects Gaussian noise into smashed data and mixes randomly chosen patches of smashed data across clients, coined DP-CutMixSL. Our analysis demonstrates that DP-CutMixSL is a differentially private (DP) mechanism that strengthens privacy protection against membership inference attacks during forward propagation. Through simulations, we show that DP-CutMixSL improves privacy protection against membership inference attacks, reconstruction attacks, and label inference attacks, while also improving accuracy compared to DP-SL and DP-MixSL.

Submitted 2 August, 2024; originally announced August 2024.

Comments: 23 pages, 11 figures, 8 tables, to be published in Transactions on Machine Learning Research (TMLR)

arXiv:2407.21410 [pdf, other]

Brane-vector dark matter and its connection to inflation and primordial gravitational waves

Authors: Cao H. Nam, Tran N. Hung

Abstract: The scalar mode describing the fluctuation of the 3-brane (the observable universe) in a five-dimensional bulk spacetime compactified on a circle is absorbed by the Kaluza-Klein U(1) gauge field, leading to a massive brane-vector living on the 3-brane. The brane-vector can be responsible for dark matter because it is odd under a $\mathrm{Z}_2$ symmetry, neutral under the Standard Model (SM) symmetries, and couples extremely weakly to the SM particles due to its gravitational origin. Interestingly, the brane-vector dark matter could leave particular imprints on the cosmic microwave background (CMB) and the primordial gravitational waves. Hence, the precise measurements of the CMB and the observations of the primordial gravitational waves generated during the inflation can provide a potential way to probe the extra-dimensions and branes which are the main ingredients of string/M theory.

Submitted 31 July, 2024; originally announced July 2024.

Comments: 20 pages, 6 figures

arXiv:2407.09122 [pdf, other]

Topological equivalence and phase transition rate in holographic thermodynamics of regularized Maxwell theory

Authors: Tran N. Hung, Cao H. Nam

Abstract: Utilizing the holographic dictionary from the proposal that treats Newton's constant as a thermodynamic variable, we establish a thermodynamic topological equivalence between the AdS black holes in the bulk and the thermal states in the dual CFT. The findings further reveal that the thermodynamic topological characteristics of the RegMax AdS black holes are strongly influenced by the characteristic parameter of the regularized Maxwell theory. Additionally, we investigate the phase transition between low and high entropy thermal states within a canonical ensemble in the dual CFT. Our observations indicate that the phase transition behavior of the thermal states mirrors that of the black holes. By modeling the phase transition process as a stochastic process, we are able to calculate the rates of phase transition between the thermal states. This result enhances our understanding of the dominant processes involved in the phase transition of the thermal states in the dual CFT.

Submitted 20 August, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

Comments: 18 pages, 7 figures

arXiv:2407.08073 [pdf, other]

NDST: Neural Driving Style Transfer for Human-Like Vision-Based Autonomous Driving

Authors: Donghyun Kim, Aws Khalil, Haewoon Nam, Jaerock Kwon

Abstract: Autonomous Vehicles (AV) and Advanced Driver Assistant Systems (ADAS) prioritize safety over comfort. The intertwining factors of safety and comfort emerge as pivotal elements in ensuring the effectiveness of Autonomous Driving (AD). Users often experience discomfort when AV or ADAS drive the vehicle on their behalf. Providing a personalized human-like AD experience, tailored to match users' unique driving styles while adhering to safety prerequisites, presents a significant opportunity to boost the acceptance of AVs. This paper proposes a novel approach, Neural Driving Style Transfer (NDST), inspired by Neural Style Transfer (NST), to address this issue. NDST integrates a Personalized Block (PB) into the conventional Baseline Driving Model (BDM), allowing for the transfer of a user's unique driving style while adhering to safety parameters. The PB serves as a self-configuring system, learning and adapting to an individual's driving behavior without requiring modifications to the BDM. This approach enables the personalization of AV models, aligning the driving style more closely with user preferences while ensuring baseline safety critical actuation. Two contrasting driving styles (Style A and Style B) were used to validate the proposed NDST methodology, demonstrating its efficacy in transferring personal driving styles to the AV system. Our work highlights the potential of NDST to enhance user comfort in AVs by providing a personalized and familiar driving experience. The findings affirm the feasibility of integrating NDST into existing AV frameworks to bridge the gap between safety and individualized driving styles, promoting wider acceptance and improved user experiences.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 9 pages, 11 figures

arXiv:2407.03674 [pdf, other]

Short-Long Policy Evaluation with Novel Actions

Authors: Hyunji Alex Nam, Yash Chandak, Emma Brunskill

Abstract: From incorporating LLMs in education, to identifying new drugs and improving ways to charge batteries, innovators constantly try new strategies in search of better long-term outcomes for students, patients and consumers. One major bottleneck in this innovation cycle is the amount of time it takes to observe the downstream effects of a decision policy that incorporates new interventions. The key question is whether we can quickly evaluate long-term outcomes of a new decision policy without making long-term observations. Organizations often have access to prior data about past decision policies and their outcomes, evaluated over the full horizon of interest. Motivated by this, we introduce a new setting for short-long policy evaluation for sequential decision making tasks. Our proposed methods significantly outperform prior results on simulators of HIV treatment, kidney dialysis and battery charging. We also demonstrate that our methods can be useful for applications in AI safety by quickly identifying when a new decision policy is likely to have substantially lower performance than past policies.

Submitted 9 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: Added references for related work

arXiv:2406.15725 [pdf, other]

Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes

Authors: Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park

Abstract: To tackle sound event detection (SED), we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of 3 branches: audio teacher-student transformer (ATST) branch, BEATs branch and CNN branch including either partial dilated frequency dynamic convolution (PDFD conv) or squeeze-and-Excitation (SE) with time-frame frequency-wise SE (tfwSE). To train MAESTRO labels with coarse temporal resolution, we applied max pooling on prediction for the MAESTRO dataset. Using best ensemble model, we applied self training to obtain pseudo label from DESED weak set, unlabeled set and AudioSet. AudioSet pseudo labels, filtered to focus on high-confidence labels, are used to train on DESED dataset only. We used change-detection-based sound event bounding boxes (cSEBBs) as post processing for ensemble models on self training and submission models. The resulting FreDNet was ranked 2nd in DCASE 2024 Challenge Task 4.

Submitted 19 September, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

Comments: DCASE 2024 Challenge Task 4 technical report, DCASE 2024 Workshop accepted

arXiv:2406.13312 [pdf, other]

Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution

Authors: Hyeonuk Nam, Yong-Hwa Park

Abstract: Frequency dynamic convolution (FDY conv) has been a milestone in the sound event detection (SED) field, but it involves a substantial increase in model size due to multiple basis kernels. In this work, we propose partial frequency dynamic convolution (PFD conv), which concatenates outputs by conventional 2D convolution and FDY conv as static and dynamic branches respectively. PFD-CRNN with proportion of dynamic branch output as one eighth reduces 51.9% of parameters from FDY-CRNN while retaining the performance. Additionally, we propose multi-dilated frequency dynamic convolution (MDFD conv), which integrates multiple dilated frequency dynamic convolution (DFD conv) branches with different dilation size sets and a static branch within a single convolution layer. Resulting best MDFD-CRNN with five non-dilated FDY Conv branches, three differently dilated DFD Conv branches and a static branch achieved 3.17% improvement in polyphonic sound detection score (PSDS) over FDY conv without class-wise median filter. Application of sound event bounding box as post processing on best MDFD-CRNN achieved true PSDS1 of 0.485, which is the state-of-the-art score in DESED dataset without external dataset or pretrained model. From the results of extensive ablation studies, we discovered that not only multiple dynamic branches but also specific proportion of static branch helps SED. In addition, non-dilated dynamic branches are necessary in addition to dilated dynamic branches in order to obtain optimal SED performance. The results and discussions on ablation studies further enhance understanding and usability of FDY conv variants.

Submitted 19 September, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

Comments: Submitted to ICASSP 2025

arXiv:2406.08070 [pdf, other]

CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

Authors: Hyungjin Chung, Jeongsol Kim, Geon Yeong Park, Hyelin Nam, Jong Chul Ye

Abstract: Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG has notable drawbacks. For instance, DDIM with CFG lacks invertibility, complicating image editing; furthermore, high guidance scales, essential for high-quality outputs, frequently result in issues like mode collapse. Contrary to the widespread belief that these are inherent limitations of diffusion models, this paper reveals that the problems actually stem from the off-manifold phenomenon associated with CFG, rather than the diffusion models themselves. More specifically, inspired by the recent advancements of diffusion model-based inverse problem solvers (DIS), we reformulate text-guidance as an inverse problem with a text-conditioned score matching loss and develop CFG++, a novel approach that tackles the off-manifold challenges inherent in traditional CFG. CFG++ features a surprisingly simple fix to CFG, yet it offers significant improvements, including better sample quality for text-to-image generation, invertibility, smaller guidance scales, reduced mode collapse, etc. Furthermore, CFG++ enables seamless interpolation between unconditional and conditional sampling at lower guidance scales, consistently outperforming traditional CFG at all scales. Moreover, CFG++ can be easily integrated into high-order diffusion solvers and naturally extends to distilled diffusion models. Experimental results confirm that our method significantly enhances performance in text-to-image generation, DDIM inversion, editing, and solving inverse problems, suggesting a wide-ranging impact and potential applications in various fields that utilize text guidance. Project Page: https://cfgpp-diffusion.github.io/.

Submitted 12 September, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 25 pages, 21 figures. Project Page: https://cfgpp-diffusion.github.io/

arXiv:2406.05341 [pdf, other]

Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

Abstract: Frequency dynamic convolution (FDY conv) has shown the state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by frequency-varying combination of basis kernels. However, FDY conv lacks an explicit means to diversify frequency-adaptive kernels, potentially limiting the performance. In addition, size of basis kernels is limited while time-frequency patterns span larger spectro-temporal range. Therefore, we propose dilated frequency dynamic convolution (DFD conv) which diversifies and expands frequency-adaptive kernels by introducing different dilation sizes to basis kernels. Experiments showed advantages of varying dilation sizes along frequency dimension, and analysis on attention weight variance proved dilated basis kernels are effectively diversified. By adapting class-wise median filter with intersection-based F1 score, proposed DFD-CRNN outperforms FDY-CRNN by 3.12% in terms of polyphonic sound detection score (PSDS).

Submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted to INTERSPEECH 2024

arXiv:2406.03494 [pdf, other]

Solving Poisson Equations using Neural Walk-on-Spheres

Authors: Hong Chul Nam, Julius Berner, Anima Anandkumar

Abstract: We propose Neural Walk-on-Spheres (NWoS), a novel neural PDE solver for the efficient solution of high-dimensional Poisson equations. Leveraging stochastic representations and Walk-on-Spheres methods, we develop novel losses for neural networks based on the recursive solution of Poisson equations on spheres inside the domain. The resulting method is highly parallelizable and does not require spatial gradients for the loss. We provide a comprehensive comparison against competing methods based on PINNs, the Deep Ritz method, and (backward) stochastic differential equations. In several challenging, high-dimensional numerical examples, we demonstrate the superiority of NWoS in accuracy, speed, and computational costs. Compared to commonly used PINNs, our approach can reduce memory usage and errors by orders of magnitude. Furthermore, we apply NWoS to problems in PDE-constrained optimization and molecular dynamics to show its efficiency in practical applications.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted at ICML 2024

arXiv:2405.11094 [pdf, other]

YORI: Autonomous Cooking System Utilizing a Modular Robotic Kitchen and a Dual-Arm Proprioceptive Manipulator

Authors: Donghun Noh, Hyunwoo Nam, Kyle Gillespie, Yeting Liu, Dennis Hong

Abstract: This article introduces the development and implementation of the Yummy Operations Robot Initiative (YORI), an innovative, autonomous robotic cooking system. YORI marks a major advancement in culinary automation, adept at handling a diverse range of cooking tasks, capable of preparing multiple dishes simultaneously, and offering the flexibility to adapt to an extensive array of culinary activities. This versatility is achieved through the use of custom tools and appliances operated by a dual arm manipulator utilizing proprioceptive actuators. The use of proprioceptive actuators enables fast yet precise movements, while allowing for accurate force control and effectively mitigating the inevitable impacts encountered in cooking. These factors underscore this technology's boundless potential. A key to YORI's adaptability is its modular kitchen design, which allows for easy adaptations to accommodate a continuously increasing range of culinary tasks. This article provides a comprehensive look at YORI's design process, and highlights its role in revolutionizing the culinary world by enhancing efficiency, consistency, and versatility in food preparation.

Submitted 17 May, 2024; originally announced May 2024.

Comments: This manuscript is 13 pages long, includes 10 figures, and cites 20 references. It is to be submitted

arXiv:2405.02499 [pdf, other]

DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands

Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

Abstract: The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerabilities. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. Previous studies have also attempted to understand the DRAM microarchitectures and associated behaviors, but we have found some of their results to be misled by inaccurate address mapping and internal data swizzling, or by a lack of deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools: AIBs, retention time test, and RowCopy, which can be cross-validated. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on the microscopic view of the DRAM microarchitecture, such as 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.

Submitted 3 May, 2024; originally announced May 2024.

Comments: To appear at the 51st IEEE/ACM International Symposium on Computer Architecture (ISCA)

arXiv:2404.04819 [pdf, other]

Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer

Authors: Hyeongjin Nam, Daniel Sungho Jung, Gyeongsik Moon, Kyoung Mu Lee

Abstract: Human-object contact serves as a strong cue to understand how humans physically interact with objects. Nevertheless, it is not widely explored to utilize human-object contact information for the joint reconstruction of 3D human and object from a single image. In this work, we present a novel joint 3D human-object reconstruction method (CONTHO) that effectively exploits contact information between humans and objects. There are two core designs in our system: 1) 3D-guided contact estimation and 2) contact-based 3D human and object refinement. First, for accurate human-object contact estimation, CONTHO initially reconstructs 3D humans and objects and utilizes them as explicit 3D guidance for contact estimation. Second, to refine the initial reconstructions of 3D human and object, we propose a novel contact-based refinement Transformer that effectively aggregates human features and object features based on the estimated human-object contact. The proposed contact-based refinement prevents the learning of erroneous correlation between human and object, which enables accurate 3D reconstruction. As a result, our CONTHO achieves state-of-the-art performance in both human-object contact estimation and joint reconstruction of 3D human and object. The code is publicly available at https://github.com/dqj5182/CONTHO_RELEASE.

Submitted 7 April, 2024; originally announced April 2024.

Comments: Published at CVPR 2024, 19 pages including the supplementary material

arXiv:2403.16652 [pdf, other]

Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL

Authors: Osama Ahmad, Zawar Hussain, Hammad Naeem

Abstract: This study implements a reinforcement learning algorithm for the trajectory planning of manipulators. We use a 7-DOF robotic arm to pick a randomly placed block and place it at a random target point in an unknown environment. A randomly moving obstacle creates a hurdle in picking the object. The objective of the robot is to avoid the obstacle and pick the block subject to a fixed-timestamp constraint. In this work, we apply a deep deterministic policy gradient (DDPG) algorithm and compare the model's efficiency with dense and sparse rewards.

Submitted 25 March, 2024; originally announced March 2024.

Comments: Accepted in ICIESTR-2024

arXiv:2403.08322 [pdf, other]

Generalized free energy and thermodynamic phases of black holes in the gauged Kaluza-Klein theory

Authors: Tran N. Hung, Cao H. Nam

Abstract: In the context of the generalized (off-shell) free energy, we explore the phase emergence and corresponding phase transitions of charged dilaton $\text{AdS}$ black holes in the gauged Kaluza-Klein (KK) theory where the KK vector field is gauged such that the fermionic fields are charged under the U(1)$_{\text{KK}}$ gauge group. The black hole solutions are asymptotic to the AdS$_D$ geometry and can be realized as the dimensional reduction of the gauged supergravities on the compact internal manifolds, leading to the restriction $4\leq D\leq 7$. By studying the behavior of the generalized free energy under the change of the ensemble temperature, we determine the thermodynamic phases and the corresponding phase transitions of black holes. This is confirmed by investigating the heat capacity at constant pressure and the on-shell free energy. In the canonical ensemble, the thermodynamics of black holes can be classified into three different classes as follows: (i) $D=4$, (ii) $D=5$, and (iii) $D=6,7$. In the grand canonical ensemble, by contrast, the thermodynamics of black holes is independent of the number of spacetime dimensions and the pressure, but depends on the chemical potential $\Phi$. The thermodynamic behavior of black holes can be classified into three different classes as follows: (i) $\Phi<1$, (ii) $\Phi>1$, and (iii) $\Phi=1$.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 25 pages, 15 figures

arXiv:2403.08187 [pdf, other]

Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

Authors: Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang, Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang, Hosung Nam

Abstract: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children with SSDs is impractical. We fine-tuned the wav2vec 2.0 XLS-R model to recognize speech as pronounced rather than as existing words. The model was fine-tuned with a speech dataset from 137 children with inadequate speech production pronouncing 73 Korean words selected for actual clinical diagnosis. The model's predictions of the pronunciations of the words matched the human annotations with about 90% accuracy. While the model still requires improvement in recognizing unclear pronunciation, this study demonstrates that ASR models can streamline complex pronunciation error diagnostic procedures in clinical fields.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 12 pages, 2 figures

ACM Class: I.2.7

arXiv:2402.10595 [pdf, other]

Compact and De-biased Negative Instance Embedding for Multi-Instance Learning on Whole-Slide Image Classification

Authors: Joohyung Lee, Heejeong Nam, Kwanhyung Lee, Sangchul Hahn

Abstract: Whole-slide image (WSI) classification is a challenging task because 1) patches from WSI lack annotation, and 2) WSI possesses unnecessary variability, e.g., stain protocol. Recently, Multiple-Instance Learning (MIL) has made significant progress, allowing for classification based on slide-level, rather than patch-level, annotations. However, existing MIL methods ignore that all patches from normal slides are normal. Using this free annotation, we introduce a semi-supervision signal to de-bias the inter-slide variability and to capture the common factors of variation within normal patches. Because our method is orthogonal to the MIL algorithm, we evaluate our method on top of the recently proposed MIL algorithms and also compare the performance with other semi-supervised approaches. We evaluate our method on two public WSI datasets including Camelyon-16 and TCGA lung cancer and demonstrate that our approach significantly improves the predictive performance of existing MIL algorithms and outperforms other semi-supervised algorithms. We release our code at https://github.com/AITRICS/pathology_mil.

Submitted 16 February, 2024; originally announced February 2024.

Comments: Accepted to ICASSP 2024

arXiv:2402.09289 [pdf, other]

doi 10.1140/epja/s10050-024-01369-5

Study of quasi-projectile properties at Fermi energies in 48Ca projectile systems

Authors: S. Upadhyaya, K. Mazurek, T. Kozik, D. Gruyer, G. Casini, S. Piantelli, L. Baldesi, S. Barlini, B. Borderie, R. Bougault, A. Camaiani, C. Ciampi, M. Cicerchia, M. Ciemala, D. Dell Aquila, J. A. Duenas, Q. Fable, J. D. Frankland, F. Gramegna, M. Henri, B. Hong, A. Kordyasz, M. J. Kweon, N. Le Neindre, I. Lombardo, et al. (10 additional authors not shown)

Abstract: The emission of the pre-equilibrium particles during nuclear collisions at moderate beam energies is still an open question. This influences the properties of the compound nucleus but also changes the interpretation of the quasi-fission process. A systematic analysis of the data obtained by the FAZIA collaboration during a recent experiment with a neutron rich projectile is presented. The full range of charged particles detected in the experiment is within the limit of isotopic resolution of the FAZIA detector. Quasi-projectile (QP) fragments were detected in majority thanks to the forward angular acceptance of the experimental setup which was confirmed by introducing cuts based on the HIPSE event generator calculations. The main goal was to compare the experimental results with the HIPSE simulations after introducing these cuts to investigate the influence of the n-rich entrance channel on the QP fragment properties. More specifically, the lowering of N/Z of QP fragments with beam energy was found to be present since the initial phase of the reaction. Thus, pre-equilibrium emissions might be a possible candidate to explain such an effect.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 10 pages, 10 figures

arXiv:2401.04433 [pdf, other]

Non-singular cosmology from non-supersymmetric AdS instability conjecture

Authors: Cao H. Nam

Abstract: We show that the non-supersymmetric AdS instability conjecture can point to how quantum gravity removes the initial Big Bang singularity, leading to a potential resolution for the past-incomplete inflationary universe. From the constraints on the dynamics of the universe realized as the nucleation of a thin-wall bubble mediating the decay of the non-supersymmetric AdS vacuum, we find the critical temperature $T_c$ and the critical scale factor $a_c$ for which the universe exists. These critical quantities are all finite and determined in terms of the parameters specifying the stringy 10D AdS vacuum solutions. Additionally, we derive the prediction of quantum gravity for $T_c$ and $a_c$ relying on the inflationary observations.

Submitted 12 August, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: 6 pages, 4 figures, new discussions added, Fig.2 modified, references added, version to be published in PRD

arXiv:2401.04143 [pdf, other]

RHOBIN Challenge: Reconstruction of Human Object Interaction

Authors: Xianghui Xie, Xi Wang, Nikos Athanasiou, Bharat Lal Bhatnagar, Chun-Hao P. Huang, Kaichun Mo, Hao Chen, Xia Jia, Zerui Zhang, Liangxian Cui, Xiao Lin, Bingqiao Qian, Jie Xiao, Wenfei Yang, Hyeongjin Nam, Daniel Sungho Jung, Kihoon Kim, Kyoung Mu Lee, Otmar Hilliges, Gerard Pons-Moll

Abstract: Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is, however, a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose and object pose but also the interaction between them. Reconstruction of 3D humans and objects have been two separate research fields in computer vision for a long time. We hence proposed the first RHOBIN challenge: reconstruction of human-object interactions in conjunction with the RHOBIN workshop. It was aimed at bringing the research communities of human and object reconstruction as well as interaction modeling together to discuss techniques and exchange ideas. Our challenge consists of three tracks of 3D reconstruction from monocular RGB images with a focus on dealing with challenging interaction scenarios. Our challenge attracted more than 100 participants with more than 300 submissions, indicating the broad interest in the research communities. This paper describes the settings of our challenge and discusses the winning methods of each track in more detail. We observe that the human reconstruction task is becoming mature even under heavy occlusion settings while object pose estimation and joint reconstruction remain challenging tasks. With the growing interest in interaction modeling, we hope this report can provide useful insights and foster future research in this direction. Our workshop website can be found at https://rhobin-challenge.github.io/.

Submitted 7 January, 2024; originally announced January 2024.

Comments: 14 pages, 5 tables, 7 figures. Technical report of the CVPR'23 workshop: RHOBIN challenge (https://rhobin-challenge.github.io/)

arXiv:2312.15924 [pdf, other]

Modeling and Analysis of GEO Satellite Networks

Authors: Dong-Hyun Jung, Hongjae Nam, Junil Choi, David J. Love

Abstract: The extensive coverage offered by satellites makes them effective in enhancing service continuity for users on dynamic airborne and maritime platforms, such as airplanes and ships. In particular, geosynchronous Earth orbit (GEO) satellites ensure stable connectivity for terrestrial users due to their stationary characteristics when observed from Earth. This paper introduces a novel approach to model and analyze GEO satellite networks using stochastic geometry. We model the distribution of GEO satellites in the geostationary orbit according to a binomial point process (BPP) and examine satellite visibility depending on the terminal's latitude. Then, we identify potential distribution cases for GEO satellites and derive case probabilities based on the properties of the BPP. We also obtain the distance distributions between the terminal and GEO satellites and derive the coverage probability of the network. We further approximate the derived expressions using the Poisson limit theorem. Monte Carlo simulations are performed to validate the analytical findings, demonstrating a strong alignment between the analyses and simulations. The simplified analytical results can be used to estimate the coverage performance of GEO satellite networks by effectively modeling the positions of GEO satellites.

Submitted 26 December, 2023; originally announced December 2023.

Comments: 12 pages, 9 figures, submitted to IEEE Transactions on Wireless Communications

arXiv:2312.01763 [pdf, other]

doi 10.1103/PhysRevC.109.064605

Isospin diffusion from $^{40,48}$Ca$+^{40,48}$Ca experimental data at Fermi energies: Direct comparisons with transport model calculations

Authors: Q. Fable, L. Baldesi, S. Barlini, Eric Bonnet, Bernard Borderie, Remi Bougault, A. Camaiani, G. Casini, A. Chbihi, Caterina Ciampi, J. A. Dueñas, J. D. Frankland, T. Genard, Diego D. Gruyer, Maxime Henri, Byungsik Hong, S. Kim, A. J. Kordyasz, T. Kozik, Arnaud Le Fèvre, Nicolas Le Neindre, Ivano Lombardo, Olivier Lopez, T. Marchi, Paola Marini, et al. (8 additional authors not shown)

Abstract: This article presents an investigation of isospin equilibration in cross-bombarding $^{40,48}$Ca$+^{40,48}$Ca reactions at 35 MeV/nucleon, by comparing experimental data with filtered transport model calculations. Isospin diffusion is studied using the evolution of the isospin transport ratio with centrality. The asymmetry parameter $\delta=(N-Z)/A$ of the quasiprojectile (QP) residue is used as the isospin-sensitive observable, while a recent method for impact parameter reconstruction is used for centrality sorting. A benchmark of global observables is proposed to assess the relevance of the antisymmetrized molecular dynamics (AMD) model, coupled to GEMINI++, in the study of dissipative collisions. Our results demonstrate the importance of considering cluster formation to reproduce observables used for isospin transport and centrality studies. Within the AMD model, we prove the applicability of the impact parameter reconstruction method, enabling a direct comparison to the experimental data for the investigation of isospin diffusion. For both, we evidence a tendency to isospin equilibration with an impact parameter decreasing from 9 to 3 fm, while the full equilibration is not reached. A weak sensitivity to the stiffness of the equation of state employed in the model is also observed, with a better reproduction of the experimental trend for the neutron-rich reactions.

Submitted 6 June, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Journal ref: Physical Review C, 109 (064605)

arXiv:2311.18608 [pdf, other]

Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing

Authors: Hyelin Nam, Gihyun Kwon, Geon Yeong Park, Jong Chul Ye

Abstract: With the remarkable advent of text-to-image diffusion models, image editing methods have become more diverse and continue to evolve. A promising recent approach in this realm is Delta Denoising Score (DDS), an image editing technique based on the Score Distillation Sampling (SDS) framework that leverages the rich generative prior of text-to-image diffusion models. However, relying solely on the difference between scoring functions is insufficient for preserving specific structural elements from the original image, a crucial aspect of image editing. To address this, here we present an embarrassingly simple yet very powerful modification of DDS, called Contrastive Denoising Score (CDS), for latent diffusion models (LDM). Inspired by the similarities and differences between DDS and the contrastive learning for unpaired image-to-image translation (CUT), we introduce a straightforward approach using CUT loss within the DDS framework. Rather than employing auxiliary networks as in the original CUT approach, we leverage the intermediate features of LDM, specifically those from the self-attention layers, which possess rich spatial information. Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output while maintaining content controllability. Qualitative results and comparisons demonstrate the effectiveness of our proposed method. Project page: https://hyelinnam.github.io/CDS/

Submitted 1 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: CVPR 2024 (poster); Project page: https://hyelinnam.github.io/CDS/

arXiv:2311.13384 [pdf, other]

LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Authors: Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee

Abstract: With the widespread usage of VR devices and contents, demand for 3D scene generation techniques is growing. Existing 3D scene generation models, however, limit the target scene to a specific domain, primarily due to their training strategies using 3D scan datasets that are far from the real world. To address this limitation, we propose LucidDreamer, a domain-free scene generation pipeline that fully leverages the power of existing large-scale diffusion-based generative models. Our LucidDreamer has two alternating steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we set the point cloud as a geometrical guideline for each image generation. Specifically, we project a portion of the point cloud to the desired view and provide the projection as guidance for inpainting using the generative model. The inpainted images are lifted to 3D space with estimated depth maps, composing new points. Second, to aggregate the new points into the 3D scene, we propose an aligning algorithm which harmoniously integrates the portions of newly generated 3D scenes. The finally obtained 3D scene serves as initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly detailed compared to the previous 3D scene generation methods, with no constraint on the domain of the target scene. Project page: https://luciddreamer-cvlab.github.io/

Submitted 23 November, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: Project page: https://luciddreamer-cvlab.github.io/

arXiv:2311.06567 [pdf, other]

SCADI: Self-supervised Causal Disentanglement in Latent Variable Models

Authors: Heejeong Nam

Abstract: Causal disentanglement has great potential for capturing complex situations. However, there is a lack of practical and efficient approaches. It is already known that most unsupervised disentangling methods are unable to produce identifiable results without additional information, often leading to randomly disentangled output. Therefore, most existing models for disentangling are weakly supervised, providing information about intrinsic factors, which incurs excessive costs. We therefore propose a novel model, SCADI (SElf-supervised CAusal DIsentanglement), that enables the model to discover semantic factors and learn their causal relationships without any supervision. This model combines a masked structural causal model (SCM) with a pseudo-label generator for causal disentanglement, aiming to provide a new direction for self-supervised causal disentanglement models.

Submitted 11 November, 2023; originally announced November 2023.

Comments: 12 pages, 12 figures

arXiv:2311.02010 [pdf, other]

A cast of thousands: How the IDEAS Productivity project has advanced software productivity and sustainability

Authors: Lois Curfman McInnes, Michael Heroux, David E. Bernholdt, Anshu Dubey, Elsa Gonsiorowski, Rinku Gupta, Osni Marques, J. David Moulton, Hai Ah Nam, Boyana Norris, Elaine M. Raybourn, Jim Willenbring, Ann Almgren, Ross Bartlett, Kita Cranfill, Stephen Fickas, Don Frederick, William Godoy, Patricia Grubel, Rebecca Hartman-Baker, Axel Huebl, Rose Lynch, Addi Malviya Thakur, Reed Milewicz, Mark C. Miller, et al. (9 additional authors not shown)

Abstract: Computational and data-enabled science and engineering are revolutionizing advances throughout science and society, at all scales of computing. For example, teams in the U.S. DOE Exascale Computing Project have been tackling new frontiers in modeling, simulation, and analysis by exploiting unprecedented exascale computing capabilities, building an advanced software ecosystem that supports next-generation applications and addresses disruptive changes in computer architectures. However, concerns are growing about the productivity of the developers of scientific software, its sustainability, and the trustworthiness of the results that it produces. Members of the IDEAS project serve as catalysts to address these challenges through fostering software communities, incubating and curating methodologies and resources, and disseminating knowledge to advance developer productivity and software sustainability. This paper discusses how these synergistic activities are advancing scientific discovery, mitigating technical risks by building a firmer foundation for reproducible, sustainable science at all scales of computing, from laptops to clusters to exascale and beyond.

Submitted 16 February, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: 12 pages, 1 figure

arXiv:2310.04158 [pdf, other]

doi 10.1145/3613424.3614276

Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources

Authors: Konstantinos Kanellopoulos, Hong Chul Nam, F. Nisa Bostanci, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Davide-Basilio Bartolini, Onur Mutlu

Abstract: Address translation is a performance bottleneck in data-intensive workloads due to large datasets and irregular access patterns that lead to frequent high-latency page table walks (PTWs). PTWs can be reduced by using (i) large hardware TLBs or (ii) large software-managed TLBs. Unfortunately, both solutions have significant drawbacks: increased access latency, power and area (for hardware TLBs), and costly memory accesses, the need for large contiguous memory blocks, and complex OS modifications (for software-managed TLBs). We present Victima, a new software-transparent mechanism that drastically increases the translation reach of the processor by leveraging the underutilized resources of the cache hierarchy. The key idea of Victima is to repurpose L2 cache blocks to store clusters of TLB entries, thereby providing an additional low-latency and high-capacity component that backs up the last-level TLB and thus reduces PTWs. Victima has two main components. First, a PTW cost predictor (PTW-CP) identifies costly-to-translate addresses based on the frequency and cost of the PTWs they lead to. Second, a TLB-aware cache replacement policy prioritizes keeping TLB entries in the cache hierarchy by considering (i) the translation pressure (e.g., last-level TLB miss rate) and (ii) the reuse characteristics of the TLB entries.
Our evaluation results show that in native (virtualized) execution environments Victima improves average end-to-end application performance by 7.4% (28.7%) over the baseline four-level radix-tree-based page table design and by 6.2% (20.1%) over a state-of-the-art software-managed TLB, across 11 diverse data-intensive workloads. Victima (i) is effective in both native and virtualized environments, (ii) is completely transparent to application and system software, and (iii) incurs very small area and power overheads on a modern high-end CPU.

Submitted 5 January, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: To appear in 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023

ACM Class: C.0

Showing 1–50 of 253 results for author: Naeem, H