Search | arXiv e-print repository

doi 10.1016/j.compeleceng.2025.110067

A Multilevel Network-assisted Congestion Feedback Mechanism for Network Congestion Control

Authors: Inayat Ali, Seungwoo Hong, Tae Yeon Kim

Abstract: Network-assisted congestion control leveraging Explicit Congestion Notification (ECN) is an effective way to deal with congestion issues on the Internet. However, we believe that the existing ECN mechanism in the TCP/IP protocol stack may require further optimization to effectively address the evolving congestion challenges introduced by emerging technologies like immersive AR/VR applications and the burgeoning field of the Internet of Things (IoT). To that end, we propose a multilevel congestion notification mechanism called Enhanced ECN (EECN) that leverages the existing two ECN bits in the IP header to notify two levels of congestion in the network and uses the corresponding two bits in the TCP header to negotiate EECN during the handshake and echo congestion experienced back to the sender. Additionally, we propose a congestion control mechanism that triggers different congestion control responses based on the average RTT and multilevel congestion feedback received from the network, which yields promising results, highlighting the effectiveness of utilizing multilevel congestion feedback. The proposed EECN mechanism reduces packet drop by 70% compared to ECN, by 95% compared to TCP New Reno without ECN, and by 40% compared to VCP. The packets marked are reduced by 96% compared to ECN and 76% compared to VCP. Furthermore, the proposed approach reduces flow completion time by 61% compared to ECN and enhances the throughput of short-lived network flows, which are particularly pronounced in IoT environments.

Submitted 14 January, 2025; originally announced January 2025.

Comments: 18 pages

Journal ref: Computers and Electrical Engineering 2025
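
For orientation, the abstract above builds on the two ECN bits of the IP header. The following is a minimal sketch of how those two bits are read under standard ECN (RFC 3168), where they occupy the two least-significant bits of the IPv4 TOS / IPv6 Traffic Class byte and carry a single congestion signal (CE). The function names are illustrative assumptions, and the EECN codepoint mapping is not stated in the abstract, so it is deliberately not reproduced here.

```python
# Standard ECN codepoints (RFC 3168). These are NOT the EECN codepoints,
# which the abstract does not specify.
ECN_CODEPOINTS = {
    0b00: "Not-ECT",  # sender does not support ECN
    0b10: "ECT(0)",   # ECN-capable transport
    0b01: "ECT(1)",   # ECN-capable transport
    0b11: "CE",       # congestion experienced
}


def ecn_field(tos_byte: int) -> int:
    """Return the 2-bit ECN field from a TOS / Traffic Class byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("TOS byte must be in the range 0..255")
    return tos_byte & 0b11


def dscp_field(tos_byte: int) -> int:
    """Return the 6-bit DSCP value stored in the upper bits of the byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("TOS byte must be in the range 0..255")
    return tos_byte >> 2


def describe_ecn(tos_byte: int) -> str:
    """Map a TOS / Traffic Class byte to its standard ECN codepoint name."""
    return ECN_CODEPOINTS[ecn_field(tos_byte)]
```

Because standard ECN exposes only one "congestion experienced" state, a multilevel scheme such as the one described has to reassign these four codepoints; how EECN does so is detailed in the paper itself, not in this listing.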

arXiv:2501.08408 [pdf, other]

Leveraging 2D Masked Reconstruction for Domain Adaptation of 3D Pose Estimation

Authors: Hansoo Park, Chanwoo Kim, Jihyeon Kim, Hoseong Cho, Nhat Nguyen Bao Truong, Taehwan Kim, Seungryul Baek

Abstract: RGB-based 3D pose estimation methods have been successful with the development of deep learning and the emergence of high-quality 3D pose datasets. However, most existing methods do not operate well for testing images whose distribution is far from that of training data. This problem might be alleviated by involving diverse data during training; however, it is non-trivial to collect such diverse data with corresponding labels (i.e. 3D pose). In this paper, we introduced an unsupervised domain adaptation framework for 3D pose estimation that utilizes the unlabeled data in addition to labeled data via a masked image modeling (MIM) framework. Foreground-centric reconstruction and attention regularization are further proposed to increase the effectiveness of unlabeled data usage. Experiments are conducted on the various datasets in human and hand pose estimation tasks, especially using the cross-domain scenario. We demonstrated the effectiveness of our framework by achieving the state-of-the-art accuracy on all datasets.

Submitted 25 February, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

Comments: 16 pages, 7 figures

MSC Class: 68T07

arXiv:2501.06936 [pdf, other]

Full C- and L-band tunable erbium-doped integrated lasers via scalable manufacturing

Authors: Xinru Ji, Xuan Yang, Yang Liu, Zheru Qiu, Grigory Lihachev, Simone Bianconi, Jiale Sun, Andrey Voloshin, Taegon Kim, Joseph C. Olson, Tobias J. Kippenberg

Abstract: Erbium (Er) ions are the gain medium of choice for fiber-based amplifiers and lasers, offering a long excited-state lifetime, slow gain relaxation, low amplification nonlinearity and noise, and temperature stability compared to semiconductor-based platforms. Recent advances in ultra-low-loss silicon nitride (Si$_3$N$_4$) photonic integrated circuits, combined with ion implantation, have enabled the realization of high-power on-chip Er amplifiers and lasers with performance comparable to fiber-based counterparts, supporting compact photonic systems. Yet, these results are limited by the high (2 MeV) implantation beam energy required for tightly confined Si$_3$N$_4$ waveguides (700 nm height), preventing volume manufacturing of Er-doped photonic integrated circuits. Here, we overcome these limitations and demonstrate the first fully wafer-scale, foundry-compatible Er-doped photonic integrated circuit-based tunable lasers. Using 200 nm-thick Si$_3$N$_4$ waveguides, we reduce the ion beam energy requirement to below 500 keV, enabling efficient wafer-scale implantation with an industrial 300 mm ion implanter. We demonstrate a laser wavelength tuning range of 91 nm, covering nearly the entire optical C- and L-bands, with fiber-coupled output power reaching 36 mW and an intrinsic linewidth of 95 Hz. The temperature-insensitive properties of erbium ions allowed stable laser operation up to 125$^{\circ}$C and lasing with less than 15 MHz drift for over 6 hours at room temperature using a remote fiber pump. The fully scalable, low-cost fabrication of Er-doped waveguide lasers opens the door for widespread adoption in coherent communications, LiDAR, microwave photonics, optical frequency synthesis, and free-space communications.

Submitted 12 January, 2025; originally announced January 2025.

arXiv:2501.05696 [pdf, ps, other]

Combinatorial identities related to degenerate Stirling numbers of the second kind

Authors: Taekyun Kim, Dae san Kim

Abstract: The study of degenerate versions of certain special polynomials and numbers, which was initiated by Carlitz's work on degenerate Euler and degenerate Bernoulli polynomials, has recently seen renewed interest among mathematicians. The aim of this paper is to study some properties, certain identities, recurrence relations and explicit expressions for degenerate Stirling numbers of the second kind, which are a degenerate version of the Stirling numbers of the second kind. These numbers appear very frequently when we study various degenerate versions of many special polynomials and numbers. Especially, we consider some closely related polynomials and power series in connection with a degenerate version of Euler's formula for the Stirling numbers of the second kind.

Submitted 9 January, 2025; originally announced January 2025.

Comments: 18

MSC Class: 11B68; 11B73; 11B83

arXiv:2501.05262 [pdf, other]

QMDB: Quick Merkle Database

Authors: Isaac Zhang, Ryan Zarick, Daniel Wong, Thomas Kim, Bryan Pellegrino, Mignon Li, Kelvin Wong

Abstract: Quick Merkle Database (QMDB) addresses longstanding bottlenecks in blockchain state management by integrating key-value (KV) and Merkle tree storage into a single unified architecture. QMDB delivers a significant throughput improvement over existing architectures, achieving up to 6X over the widely used RocksDB and 8X over NOMT, a leading verifiable database. Its novel append-only twig-based design enables one SSD read per state access, O(1) IOs for updates, and in-memory Merkleization on a memory footprint as small as 2.3 bytes per entry, enabling it to run on even modest consumer-grade PCs. QMDB scales seamlessly across both commodity and enterprise hardware, achieving up to 2.28 million state updates per second. This performance enables support for 1 million token transfers per second (TPS), marking QMDB as the first solution achieving such a milestone. QMDB has been benchmarked with workloads exceeding 15 billion entries (10X Ethereum's 2024 state) and has proven the capacity to scale to 280 billion entries on a single server. Furthermore, QMDB introduces historical proofs, unlocking the ability to query its blockchain's historical state at the latest block. QMDB not only meets the demands of current blockchains but also provides a robust foundation for building scalable, efficient, and verifiable decentralized applications across diverse use cases.

Submitted 1 February, 2025; v1 submitted 9 January, 2025; originally announced January 2025.

Comments: 11 pages, 3 figures
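
As background for the term "Merkleization" in the abstract above, the sketch below computes the root of a plain binary Merkle tree over a list of byte strings. It is a generic textbook construction, assuming SHA-256 and duplication of the last node on odd-sized levels; it is not QMDB's twig-based design, whose details are not given in this listing. The names `merkle_root` and `_sha256` are illustrative.

```python
import hashlib
from typing import Sequence


def _sha256(data: bytes) -> bytes:
    """Hash helper used for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: Sequence[bytes]) -> bytes:
    """Return the root hash of a binary Merkle tree over `leaves`.

    Assumption: when a level has an odd number of nodes, the last node
    is duplicated. Other conventions exist.
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by duplicating the last node
        # Hash each adjacent pair to form the next level up.
        level = [_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Changing any single leaf changes the root, which is what lets a verifier check a state entry against one short commitment.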

arXiv:2501.05095 [pdf, other]

Advancing ALS Applications with Large-Scale Pre-training: Dataset Development and Downstream Assessment

Authors: Haoyi Xiu, Xin Liu, Taehoon Kim, Kyoung-Sook Kim

Abstract: The pre-training and fine-tuning paradigm has revolutionized satellite remote sensing applications. However, this approach remains largely underexplored for airborne laser scanning (ALS), an important technology for applications such as forest management and urban planning. In this study, we address this gap by constructing a large-scale ALS point cloud dataset and evaluating its impact on downstream applications. Our dataset comprises ALS point clouds collected across the contiguous United States, provided by the United States Geological Survey's 3D Elevation Program. To ensure efficient data collection while capturing diverse land cover and terrain types, we introduce a geospatial sampling method that selects point cloud tiles based on land cover maps and digital elevation models. As a baseline self-supervised learning model, we adopt BEV-MAE, a state-of-the-art masked autoencoder for 3D outdoor point clouds, and pre-train it on the constructed dataset. The pre-trained models are subsequently fine-tuned for downstream tasks, including tree species classification, terrain scene recognition, and point cloud semantic segmentation. Our results show that the pre-trained models significantly outperform their scratch counterparts across all downstream tasks, demonstrating the transferability of the representations learned from the proposed dataset. Furthermore, we observe that scaling the dataset using our geospatial sampling method consistently enhances performance, whereas pre-training on datasets constructed with random sampling fails to achieve similar improvements. These findings highlight the utility of the constructed dataset and the effectiveness of our sampling strategy in the pre-training and fine-tuning paradigm. The source code and pre-trained models will be made publicly available at https://github.com/martianxiu/ALS_pretraining.

Submitted 9 January, 2025; originally announced January 2025.

arXiv:2501.04979 [pdf, ps, other]

Multiple Populations of the Large Magellanic Cloud Globular Cluster NGC 2257: No Major Environmental Effect on the Formation of Multiple Populations of the Old Globular Clusters in Large Magellanic Cloud

Authors: Jae-Woo Lee, Tae-Hyeong Kim, Hak-Sub Kim, Hyun-Il Sung, Hwihyun Kim, Francesco Di Mille

Abstract: How the environment of the host galaxy affects the formation of multiple populations (MPs) in globular clusters (GCs) is one of the outstanding questions in the near-field cosmology. To understand the true nature of the old GC MPs in the Large Magellanic Cloud (LMC), we study the Ca--CN--CH photometry of the old metal-poor LMC GC NGC 2257. We find the predominantly FG-dominated populational number ratio of $n$(FG):$n$(SG) = 61:39($\pm$4), where the FG and SG denote the first and second generations. Both the FG and SG have similar cumulative radial distributions, consistent with the idea that NGC 2257 is dynamically old. We obtain [Fe/H] = $-$1.78$\pm$0.00 dex ($σ$=0.05 dex) and our metallicity is $\sim$0.2 dex larger than that from the high-resolution spectroscopy by others, due to their significantly lower temperatures by $\sim$ $-$200 K. The NGC 2257 FG shows a somewhat larger metallicity variation than the SG, the first detection of such phenomenon in an old LMC GC, similar to Galactic GCs with MPs, strongly suggesting that it is a general characteristic of GCs with MPs. Interestingly, the NGC 2257 SG does not show a helium enhancement compared to the FG. Our results for the Galactic normal GCs exhibit that the degree of carbon and nitrogen variations are tightly correlated with the GC mass, while NGC 2257 exhibits slightly smaller variations for its mass. We show that old LMC GCs follow the same trends as the Galactic normal GCs in the $Δ$W$_{\rm CF336W,F438W,F814W}$, $N_{\rm FG}/N_{\rm tot}$, and $\log M/M_{\rm \odot}$ domains. Our result indicates that the environment of the host galaxy did not play a major role in the formation and evolution of GC MPs.

Submitted 9 January, 2025; originally announced January 2025.

Comments: Accepted for publication in The Astronomical Journal; 19 figures and 1 table

arXiv:2501.04594 [pdf, other]

Understanding Expectations for a Robotic Guide Dog for Visually Impaired People

Authors: J. Taery Kim, Morgan Byrd, Jack L. Crandell, Bruce N. Walker, Greg Turk, Sehoon Ha

Abstract: Robotic guide dogs hold significant potential to enhance the autonomy and mobility of blind or visually impaired (BVI) individuals by offering universal assistance over unstructured terrains at affordable costs. However, the design of robotic guide dogs remains underexplored, particularly in systematic aspects such as gait controllers, navigation behaviors, interaction methods, and verbal explanations. Our study addresses this gap by conducting user studies with 18 BVI participants, comprising 15 cane users and three guide dog users. Participants interacted with a quadrupedal robot and provided both quantitative and qualitative feedback. Our study revealed several design implications, such as a preference for a learning-based controller and a rigid handle, gradual turns with asymmetric speeds, semantic communication methods, and explainability. The study also highlighted the importance of customization to support users with diverse backgrounds and preferences, along with practical concerns such as battery life, maintenance, and weather issues. These findings offer valuable insights and design implications for future research and development of robotic guide dogs.

Submitted 8 January, 2025; originally announced January 2025.

Comments: 12 pages, 4 figures, Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction (HRI'25)

arXiv:2501.03781 [pdf, other]

Quantum Linear Multistep Method for Using a Quantum Oracle with Differential Equations

Authors: Kyoung Keun Park, Kwangyeul Choi, Minwoo Kim, Giwon Song, Taehyun Kim

Abstract: Differential equations are a crucial mathematical tool used in a wide range of applications. If the solution to an initial value problem (IVP) can be transformed into an oracle, it can be utilized in various fields such as search and optimization, achieving quadratic speedup with respect to the number of candidates compared to its classical counterpart. In the past, attempts have been made to implement such an oracle using the Euler method. In this study, we propose a quantum linear multistep method (QLMM) that applies the linear multistep method, commonly used to numerically solve IVPs on classical computers, to generate a numerical solution of the IVP for use in a quantum oracle. We also propose a method to find the optimal form of QLMM for a given IVP. Finally, through computer simulations, we derive the QLMM formulation for an example IVP and show that the solution from the optimized QLMM can be used in an optimization problem.

Submitted 9 January, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

Comments: 16 pages, 10 figures, and 2 tables, Typos have been corrected in this version
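
To make the phrase "linear multistep method" in the abstract above concrete, here is a minimal classical sketch of one well-known member of that family, the two-step Adams-Bashforth method, for a scalar IVP y' = f(t, y), y(t0) = y0. It is purely classical and is not the paper's QLMM; the choice of Adams-Bashforth, the single Euler step used to bootstrap the second starting value, and the function name are all assumptions made for illustration.

```python
import math
from typing import Callable, List


def adams_bashforth2(
    f: Callable[[float, float], float],
    t0: float,
    y0: float,
    h: float,
    n_steps: int,
) -> List[float]:
    """Integrate y' = f(t, y) from t0 with step h; return n_steps + 1 values.

    Update rule: y[k+1] = y[k] + h * (3/2 * f[k] - 1/2 * f[k-1]).
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    ys = [y0]
    # A two-step method needs two starting values: take one Euler step.
    f_prev = f(t0, y0)
    ys.append(y0 + h * f_prev)
    for k in range(1, n_steps):
        t_k = t0 + k * h
        f_curr = f(t_k, ys[k])
        ys.append(ys[k] + h * (1.5 * f_curr - 0.5 * f_prev))
        f_prev = f_curr  # reuse the slope instead of re-evaluating f
    return ys


if __name__ == "__main__":
    # Example: y' = -y, y(0) = 1, whose exact solution is exp(-t).
    approx = adams_bashforth2(lambda t, y: -y, 0.0, 1.0, 0.01, 100)[-1]
    print(approx, math.exp(-1.0))
```

Unlike the Euler method, each step reuses the slope from the previous step, which raises the order of accuracy without extra evaluations of f.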

arXiv:2501.03034 [pdf, other]

Design and Implementation of the Cosmic Ray Tagger System for the ICARUS detector at FNAL

Authors: A. Aduszkiewicz, L. Bagby, B. Behera, P. Bernardini, S. Bertolucci, M. Betancourt, H. Budd, T. Boone, A. Campos, D. Casazza, V. Cicero, D. Cherdack, T. E. Coan, L. Degli Esposti, D. Di Ferdinando, L. Di Noto, C. Guandalini, M. Guerzoni, A. Heggestuen, C. Hilgenberg, R. Howell, M. Iliescu, G. Ingratta, T. Kim, U. Kose, et al. (28 additional authors not shown)

Abstract: The ICARUS-T600 Liquid Argon Time Projection Chamber is operating at Fermilab at shallow depth and thus exposed to a high flux of cosmic rays that can fake neutrino interactions. A cosmic ray tagging (CRT) system ($\sim$1100 m$^2$), surrounding the cryostat with two layers of fiber embedded plastic scintillators, was developed to mitigate the cosmic ray induced background. Using nanosecond-level timing information, the CRT can distinguish incoming cosmic rays from outgoing particles from neutrino interactions in the TPC. In this paper an overview of the CRT system, its installation and commissioning at Fermilab, and its performance are discussed.

Submitted 6 January, 2025; originally announced January 2025.

arXiv:2501.02199 [pdf, other]

doi 10.1002/nag.3956

Can ChatGPT implement finite element models for geotechnical engineering applications?

Authors: Taegu Kim, Tae Sup Yun, Hyoung Suk Suh

Abstract: This study assesses the capability of ChatGPT to generate finite element code for geotechnical engineering applications from a set of prompts. We tested three different initial boundary value problems using a hydro-mechanically coupled formulation for unsaturated soils, including the dissipation of excess pore water pressure through fluid mass diffusion in one-dimensional space, time-dependent differential settlement of a strip footing, and gravity-driven seepage. For each case, initial prompting involved providing ChatGPT with necessary information for finite element implementation, such as balance and constitutive equations, problem geometry, initial and boundary conditions, material properties, and spatiotemporal discretization and solution strategies. Any errors and unexpected results were further addressed through prompt augmentation processes until the ChatGPT-generated finite element code passed the verification/validation test. Our results demonstrate that ChatGPT required minimal code revisions when using the FEniCS finite element library, owing to its high-level interfaces that enable efficient programming. In contrast, the MATLAB code generated by ChatGPT necessitated extensive prompt augmentations and/or direct human intervention, as it involves a significant amount of low-level programming required for finite element analysis, such as constructing shape functions or assembling global matrices. Given that prompt engineering for this task requires an understanding of the mathematical formulation and numerical techniques, this study suggests that while a large language model may not yet replace human programmers, it can greatly assist in the implementation of numerical models.

Submitted 4 January, 2025; originally announced January 2025.

arXiv:2501.01592 [pdf]

Advances in imaging techniques for the study of individual bacteria and their pathophysiology

Authors: Dohyeon Lee, Hyun-Seung Lee, Moosung Lee, Minhee Kang, Geon Kim, Tae Yeul Kim, Nam Yong Lee, YongKeun Park

Abstract: Bacterial heterogeneity is pivotal for adaptation to diverse environments, posing significant challenges in microbial diagnostics and therapeutic interventions. Recent advancements in high-resolution optical microscopy have revolutionized our ability to observe and characterize individual bacteria, offering unprecedented insights into their metabolic states and behaviors at the single-cell level. This review discusses the transformative impact of various high-resolution imaging techniques, including fluorescence and label-free imaging, which have enhanced our understanding of bacterial pathophysiology. These methods provide detailed visualizations that are crucial for developing targeted treatments and improving clinical diagnostics. We highlight the integration of these imaging techniques with computational tools, which has facilitated rapid, accurate pathogen identification and real-time monitoring of bacterial responses to treatments. The ongoing development of these optical imaging technologies promises to significantly advance our understanding of microbiology and to catalyze the translation of these insights into practical healthcare solutions.

Submitted 2 January, 2025; originally announced January 2025.

arXiv:2501.00628 [pdf, other]

Matrix factorization and prediction for high dimensional co-occurrence count data via shared parameter alternating zero inflated Gamma model

Authors: Taejoon Kim, Haiyan Wang

Abstract: High-dimensional sparse matrix data frequently arise in various applications. A notable example is the weighted word-word co-occurrence count data, which summarizes the weighted frequency of word pairs appearing within the same context window. This type of data typically contains highly skewed non-negative values with an abundance of zeros. Another example is the co-occurrence of item-item or user-item pairs in e-commerce, which also generates high-dimensional data. The objective is to utilize this data to predict the relevance between items or users. In this paper, we assume that items or users can be represented by unknown dense vectors. The model treats the co-occurrence counts as arising from zero-inflated Gamma random variables and employs cosine similarity between the unknown vectors to summarize item-item relevance. The unknown values are estimated using the shared parameter alternating zero-inflated Gamma regression models (SA-ZIG). Both canonical link and log link models are considered. Two parameter updating schemes are proposed, along with an algorithm to estimate the unknown parameters. Convergence analysis is presented analytically. Numerical studies demonstrate that the SA-ZIG using Fisher scoring without learning rate adjustment may fail to find the maximum likelihood estimate. However, the SA-ZIG with learning rate adjustment performs satisfactorily in our simulation studies.

Submitted 31 December, 2024; originally announced January 2025.

Comments: 39 pages, 5 figures
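
The abstract above summarizes item-item relevance by the cosine similarity between dense vectors. A minimal sketch of that one ingredient follows; it illustrates only the similarity measure, not the SA-ZIG estimation procedure, and the function name and the example vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return cos(theta) between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical embeddings for two items.
    item_a = [0.2, 0.9, 0.1]
    item_b = [0.1, 0.8, 0.3]
    print(cosine_similarity(item_a, item_b))
```

The measure depends only on the angle between the vectors, so rescaling either embedding leaves the relevance score unchanged.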

arXiv:2501.00623 [pdf, other]

Global dense vector representations for words or items using shared parameter alternating Tweedie model

Authors: Taejoon Kim, Haiyan Wang

Abstract: In this article, we present a model for analyzing the cooccurrence count data derived from practical fields such as user-item or item-item data from online shopping platform, cooccurring word-word pairs in sequences of texts. Such data contain important information for developing recommender systems or studying relevance of items or words from non-numerical sources. Different from traditional regression models, there are no observations for covariates. Additionally, the cooccurrence matrix is typically of so high dimension that it does not fit into a computer's memory for modeling. We extract numerical data by defining windows of cooccurrence using weighted count on the continuous scale. Positive probability mass is allowed for zero observations. We present Shared parameter Alternating Tweedie (SA-Tweedie) model and an algorithm to estimate the parameters. We introduce a learning rate adjustment used along with the Fisher scoring method in the inner loop to help the algorithm stay on track of optimizing direction. Gradient descent with Adam update was also considered as an alternative method for the estimation. Simulation studies and an application showed that our algorithm with Fisher scoring and learning rate adjustment outperforms the other two methods. Pseudo-likelihood approach with alternating parameter update was also studied. Numerical studies showed that the pseudo-likelihood approach is not suitable in our shared parameter alternating regression models with unobserved covariates.

Submitted 31 December, 2024; originally announced January 2025.

Comments: 43 pages, 12 figures

arXiv:2501.00210 [pdf, other]

Debunking the CUDA Myth Towards GPU-based AI Systems

Authors: Yunjae Lee, Juntaek Lim, Jehyeon Bang, Eunyeong Cho, Huijong Jeong, Taesu Kim, Hyungjun Kim, Joonhyung Lee, Jinseop Im, Ranggi Hwang, Se Jung Kwon, Dongsoo Lee, Minsoo Rhu

Abstract: This paper presents a comprehensive evaluation of Intel Gaudi NPUs as an alternative to NVIDIA GPUs, which are currently the de facto standard in AI system design. First, we create a suite of microbenchmarks to compare Intel Gaudi-2 with NVIDIA A100, showing that Gaudi-2 achieves competitive performance not only in primitive AI compute, memory, and communication operations but also in executing several important AI workloads end-to-end. We then assess Gaudi NPU's programmability by discussing several software-level optimization strategies to employ for implementing critical FBGEMM operators and vLLM, evaluating their efficiency against GPU-optimized counterparts. Results indicate that Gaudi-2 achieves energy efficiency comparable to A100, though there are notable areas for improvement in terms of software maturity. Overall, we conclude that, with effective integration into high-level AI frameworks, Gaudi NPUs could challenge NVIDIA GPU's dominance in the AI server market, though further improvements are necessary to fully compete with NVIDIA's robust software ecosystem.

Submitted 21 March, 2025; v1 submitted 30 December, 2024; originally announced January 2025.

Comments: Accepted for publication at the 52nd IEEE/ACM International Symposium on Computer Architecture (ISCA-52), 2025

arXiv:2412.19289 [pdf, other]

ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning

Authors: Taewhan Kim, Soeun Lee, Si-Woo Kim, Dong-Jin Kim

Abstract: Recent lightweight image captioning models using retrieved data mainly focus on text prompts. However, previous works only utilize the retrieved text as text prompts, and the visual information relies only on the CLIP visual embedding. Because of this issue, there is a limitation that the image descriptions inherent in the prompt are not sufficiently reflected in the visual embedding space. To tackle this issue, we propose ViPCap, a novel retrieval text-based visual prompt for lightweight image captioning. ViPCap leverages the retrieved text with image information as visual prompts to enhance the ability of the model to capture relevant visual information. By mapping text prompts into the CLIP space and generating multiple randomized Gaussian distributions, our method leverages sampling to explore randomly augmented distributions and effectively retrieves the semantic features that contain image information. These retrieved features are integrated into the image and designated as the visual prompt, leading to performance improvements on the datasets such as COCO, Flickr30k, and NoCaps. Experimental results demonstrate that ViPCap significantly outperforms prior lightweight captioning models in efficiency and effectiveness, demonstrating the potential for a plug-and-play solution. The source code is available at https://github.com/taewhankim/VIPCAP.

Submitted 24 January, 2025; v1 submitted 26 December, 2024; originally announced December 2024.

Comments: Accepted to AAAI 2025

arXiv:2412.18848 [pdf, other]

Machine Learning-Based Detection of Pump-and-Dump Schemes in Real-Time

Authors: Manuel Bolz, Kevin Bründler, Liam Kane, Panagiotis Patsias, Liam Tessendorf, Krzysztof Gogol, Taehoon Kim, Claudio Tessone

Abstract: Cryptocurrency markets often face manipulation through prevalent pump-and-dump (P&D) schemes, where self-organized Telegram groups, some exceeding two million members, artificially inflate target cryptocurrency prices. These groups sell premium access to inside information, worsening information asymmetry and financial risks for subscribers and all investors. This paper presents a real-time prediction pipeline to forecast target coins and alert investors to possible P&D schemes. In a Poloniex case study, the model accurately identified the target coin among the top five from 50 random coins in 24 out of 43 (55.81%) P&D events. The pipeline uses advanced natural language processing (NLP) to classify Telegram messages, identifying 2,079 past pump events and detecting new ones in real-time. Our analysis also evaluates the susceptibility of token standards - ERC-20, ERC-721, BRC-20, Inscriptions, and Runes - to manipulation and identifies exchanges commonly involved in P&D schemes.

Submitted 25 December, 2024; originally announced December 2024.

arXiv:2412.17446 [pdf, other]

Prediction of Star Formation Rates Using an Artificial Neural Network

Authors: Ashraf Ayubinia, Jong-hak Woo, Fatemeh Hafezianzadeh, Taehwan Kim, Changseok Kim

Abstract: In this study, we develop an artificial neural network to estimate the infrared (IR) luminosity and star formation rates (SFR) of galaxies. Our network is trained using 'true' IR luminosity values derived from modeling the IR spectral energy distributions (SEDs) of FIR-detected galaxies. We explore five different sets of input features, each incorporating optical, mid-infrared (MIR), near-infrared (NIR), ultraviolet (UV), and emission line data, along with spectroscopic redshifts and uncertainties. All feature sets yield similar IR luminosity predictions, but including all photometric data leads to slightly improved performance. This suggests that comprehensive photometric information enhances the accuracy of our predictions. Our network is applied to a sample of SDSS galaxies defined as unseen data, and the results are compared with three published catalogs of SFRs. Overall, our network demonstrates excellent performance for star-forming galaxies while we observe discrepancies in composite and AGN samples. These inconsistencies may stem from uncertainties inherent in the compared catalogs or potential limitations in the performance of our network.

Submitted 23 December, 2024; originally announced December 2024.

Comments: 13 pages, 7 figures, and 2 tables. Accepted for publication in ApJ

arXiv:2412.17288 [pdf, other]

Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples

Authors: Taewoong Kim, Byeonghwi Kim, Jonghyun Choi

Abstract: Learning a perception and reasoning module for robotic assistants to plan steps to perform complex tasks based on natural language instructions often requires large free-form language annotations, especially for short high-level instructions. To reduce the cost of annotation, large language models (LLMs) are used as a planner with few data. However, when elaborating the steps, even the state-of-the-art planner that uses LLMs mostly relies on linguistic common sense, often neglecting the status of the environment at command reception, resulting in inappropriate plans. To generate plans grounded in the environment, we propose FLARE (Few-shot Language with environmental Adaptive Replanning Embodied agent), which improves task planning using both language command and environmental perception. As language instructions often contain ambiguities or incorrect expressions, we additionally propose to correct the mistakes using visual cues from the agent. The proposed scheme allows us to use a few language pairs thanks to the visual cues and outperforms state-of-the-art approaches. Our code is available at https://github.com/snumprlab/flare.

Submitted 23 December, 2024; originally announced December 2024.

Comments: AAAI 2025 (Project page: https://twoongg.github.io/projects/flare/)

arXiv:2412.16028 [pdf, other]

CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images

Authors: Jungho Lee, Suhwan Cho, Taeoh Kim, Ho-Deok Jang, Minhyeok Lee, Geonho Cha, Dongyoon Wee, Dogyoon Lee, Sangyoun Lee

Abstract: 3D Gaussian Splatting (3DGS) has attracted significant attention for its high-quality novel view rendering, inspiring research to address real-world challenges. While conventional methods depend on sharp images for accurate scene reconstruction, real-world scenarios are often affected by defocus blur due to finite depth of field, making it essential to account for realistic 3D scene representation. In this study, we propose CoCoGaussian, a Circle of Confusion-aware Gaussian Splatting that enables precise 3D scene representation using only defocused images. CoCoGaussian addresses the challenge of defocus blur by modeling the Circle of Confusion (CoC) through a physically grounded approach based on the principles of photographic defocus. Exploiting 3D Gaussians, we compute the CoC diameter from depth and learnable aperture information, generating multiple Gaussians to precisely capture the CoC shape. Furthermore, we introduce a learnable scaling factor to enhance robustness and provide more flexibility in handling unreliable depth in scenes with reflective or refractive surfaces. Experiments on both synthetic and real-world datasets demonstrate that CoCoGaussian achieves state-of-the-art performance across multiple benchmarks.

Submitted 15 May, 2025; v1 submitted 20 December, 2024; originally announced December 2024.

Comments: CVPR 2025, Project Page: https://Jho-Yonsei.github.io/CoCoGaussian/

arXiv:2412.14585 [pdf, other]

HiCM$^2$: Hierarchical Compact Memory Modeling for Dense Video Captioning

Authors: Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon, Jinwoo Choi, Seong Tae Kim

Abstract: With the growing demand for solutions to real-world video challenges, interest in dense video captioning (DVC) has been on the rise. DVC involves the automatic captioning and localization of untrimmed videos. Several studies highlight the challenges of DVC and introduce improved methods utilizing prior knowledge, such as pre-training and external memory. In this research, we propose a model that leverages the prior knowledge of human-oriented hierarchical compact memory inspired by human memory hierarchy and cognition. To mimic human-like memory recall, we construct a hierarchical memory and a hierarchical memory reading module. We build an efficient hierarchical compact memory by employing clustering of memory events and summarization using large language models. Comparative experiments demonstrate that this hierarchical memory recall process improves the performance of DVC by achieving state-of-the-art performance on YouCook2 and ViTT datasets.

Submitted 19 December, 2024; originally announced December 2024.

Comments: AAAI2025

arXiv:2412.13875 [pdf, other]

Denoising Nearest Neighbor Graph via Continuous CRF for Visual Re-ranking without Fine-tuning

Authors: Jaeyoon Kim, Yoonki Cho, Taeyong Kim, Sung-Eui Yoon

Abstract: Visual re-ranking using Nearest Neighbor graph (NN graph) has been adapted to yield high retrieval accuracy, since it is beneficial to exploring a high-dimensional manifold and applicable without additional fine-tuning. The quality of visual re-ranking using NN graph, however, is limited to that of connectivity, i.e., edges of the NN graph. Some edges can be misconnected with negative images. This is known as a noisy edge problem, resulting in a degradation of the retrieval quality. To address this, we propose a complementary denoising method based on Continuous Conditional Random Field (C-CRF) that uses a statistical distance of our similarity-based distribution. This method employs the concept of cliques to make the process computationally feasible. We demonstrate the complementarity of our method through its application to three visual re-ranking methods, observing quality boosts in landmark retrieval and person re-identification (re-ID).

Submitted 18 December, 2024; originally announced December 2024.

arXiv:2412.13081 [pdf, other]

Prompt Augmentation for Self-supervised Text-guided Image Manipulation

Authors: Rumeysa Bodur, Binod Bhattarai, Tae-Kyun Kim

Abstract: Text-guided image editing finds applications in various creative and practical fields. While recent studies in image generation have advanced the field, they often struggle with the dual challenges of coherent image transformation and context preservation. In response, our work introduces prompt augmentation, a method amplifying a single input prompt into several target prompts, strengthening textual context and enabling localised image editing. Specifically, we use the augmented prompts to delineate the intended manipulation area. We propose a Contrastive Loss tailored to driving effective image editing by displacing edited areas and drawing preserved regions closer. Acknowledging the continuous nature of image manipulations, we further refine our approach by incorporating the similarity concept, creating a Soft Contrastive Loss. The new losses are incorporated into the diffusion model, demonstrating improved or competitive image editing results on public datasets and generated images over state-of-the-art approaches.

Submitted 17 December, 2024; originally announced December 2024.

arXiv:2412.12567 [pdf, ps, other]

FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning

Authors: Seunghee Kim, Changhyeon Kim, Taeuk Kim

Abstract: Real-world decision-making often requires integrating and reasoning over information from multiple modalities. While recent multimodal large language models (MLLMs) have shown promise in such tasks, their ability to perform multi-hop reasoning across diverse sources remains insufficiently evaluated. Existing benchmarks, such as MMQA, face challenges due to (1) data contamination and (2) a lack of complex queries that necessitate operations across more than two modalities, hindering accurate performance assessment. To address this, we present Financial Cross-Modal Multi-Hop Reasoning (FCMR), a benchmark created to analyze the reasoning capabilities of MLLMs by urging them to combine information from textual reports, tables, and charts within the financial domain. FCMR is categorized into three difficulty levels (Easy, Medium, and Hard), facilitating a step-by-step evaluation. In particular, problems at the Hard level require precise cross-modal three-hop reasoning and are designed to prevent the disregard of any modality. Experiments on this new benchmark reveal that even state-of-the-art MLLMs struggle, with the best-performing model (Claude 3.5 Sonnet) achieving only 30.4% accuracy on the most challenging tier. We also conduct analysis to provide insights into the inner workings of the models, including the discovery of a critical bottleneck in the information retrieval phase.

Submitted 29 May, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

Comments: ACL 2025

arXiv:2412.12527 [pdf, other]

When to Speak, When to Abstain: Contrastive Decoding with Abstention

Authors: Hyuhng Joon Kim, Youna Kim, Sang-goo Lee, Taeuk Kim

Abstract: Large Language Models (LLMs) demonstrate exceptional performance across diverse tasks by leveraging pre-trained (i.e., parametric) and external (i.e., contextual) knowledge. While substantial efforts have been made to enhance the utilization of both forms of knowledge, situations in which models lack relevant information remain underexplored. To investigate this challenge, we first present a controlled testbed featuring four distinct knowledge access scenarios, including the aforementioned edge case, revealing that conventional LLM usage exhibits insufficient robustness in handling all instances. Addressing this limitation, we propose Contrastive Decoding with Abstention (CDA), a novel training-free decoding method that allows LLMs to generate responses when relevant knowledge is available and to abstain otherwise. CDA estimates the relevance of both knowledge sources for a given input, adaptively deciding which type of information to prioritize and which to exclude. Through extensive experiments, we demonstrate that CDA can effectively perform accurate generation and abstention simultaneously, enhancing reliability and preserving user trust.

Submitted 16 May, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

Comments: ACL 2025 (main)

arXiv:2412.12518 [pdf, ps, other]

Quantized blow-up dynamics for Calogero--Moser derivative nonlinear Schrödinger equation

Authors: Uihyeon Jeong, Taegyu Kim

Abstract: We consider the Calogero--Moser derivative nonlinear Schrödinger equation (CM-DNLS), an $L^2$-critical nonlinear Schrödinger type equation enjoying numerous structures, such as nonlocal nonlinearity, self-duality, pseudo-conformal symmetry, and complete integrability. In this paper, we construct smooth finite-time blow-up solutions to (CM-DNLS) that exhibit a sequence of discrete blow-up rates, so-called \emph{quantized blow-up rates}. Our strategy is a forward construction of the blow-up dynamics based on modulation analysis. Our main novelty is to utilize the \emph{nonlinear adapted derivative} suited to the \textit{Lax pair structure} and to rely on the \emph{hierarchy of conservation laws} inherent in this structure to control higher-order energies. This approach replaces a repulsivity-based energy method in the bootstrap argument, which significantly simplifies the analysis compared to earlier works. Our result highlights that the integrable structure remains a powerful tool, even in the presence of blow-up solutions. In (CM-DNLS), one of the distinctive features is \emph{chirality}. However, our constructed solutions are not chiral, since we assume the radial (even) symmetry in the gauge transformed equation. This radial assumption simplifies the modulation analysis.

Submitted 16 December, 2024; originally announced December 2024.

Comments: 35 pages

MSC Class: 35B44 (primary); 35Q55; 37K10

arXiv:2412.11656 [pdf, other]

Private Yet Social: How LLM Chatbots Support and Challenge Eating Disorder Recovery

Authors: Ryuhaerang Choi, Taehan Kim, Subin Park, Jennifer G Kim, Sung-Ju Lee

Abstract: Eating disorders (ED) are complex mental health conditions that require long-term management and support. Recent advancements in large language model (LLM)-based chatbots offer the potential to assist individuals in receiving immediate support. Yet, concerns remain about their reliability and safety in sensitive contexts such as ED. We explore the opportunities and potential harms of using LLM-based chatbots for ED recovery. We observe the interactions between 26 participants with ED and an LLM-based chatbot, WellnessBot, designed to support ED recovery, over 10 days. We discovered that our participants have felt empowered in recovery by discussing ED-related stories with the chatbot, which served as a personal yet social avenue. However, we also identified harmful chatbot responses, especially concerning individuals with ED, that went unnoticed partly due to participants' unquestioning trust in the chatbot's reliability. Based on these findings, we provide design implications for safe and effective LLM-based interventions in ED management.

Submitted 16 December, 2024; originally announced December 2024.

arXiv:2412.10651 [pdf, other]

LAN: Learning to Adapt Noise for Image Denoising

Authors: Changjin Kim, Tae Hyun Kim, Sungyong Baik

Abstract: Removing noise from images, a.k.a. image denoising, can be a very challenging task since the type and amount of noise can greatly vary for each image due to many factors including a camera model and capturing environments. While there have been striking improvements in image denoising with the emergence of advanced deep learning architectures and real-world datasets, recent denoising networks struggle to maintain performance on images with noise that has not been seen during training. One typical approach to address the challenge would be to adapt a denoising network to a new noise distribution. Instead, in this work, we shift our focus to adapting the input noise itself, rather than adapting a network. Thus, we keep a pretrained network frozen, and adapt an input noise to capture the fine-grained deviations. As such, we propose a new denoising algorithm, dubbed Learning-to-Adapt-Noise (LAN), where a learnable noise offset is directly added to a given noisy image to bring a given input noise closer towards the noise distribution a denoising network is trained to handle. Consequently, the proposed framework exhibits performance improvement on images with unseen noise, displaying the potential of the proposed research direction. The code is available at https://github.com/chjinny/LAN

Submitted 13 December, 2024; originally announced December 2024.

Comments: CVPR2024

arXiv:2412.10050 [pdf, other]

ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation?

Authors: Taewhan Kim, Hojin Bae, Zeming Li, Xiaoqi Li, Iaroslav Ponomarenko, Ruihai Wu, Hao Dong

Abstract: Visual actionable affordance has emerged as a transformative approach in robotics, focusing on perceiving interaction areas prior to manipulation. Traditional methods rely on pixel sampling to identify successful interaction samples or processing pointclouds for affordance mapping. However, these approaches are computationally intensive and struggle to adapt to diverse and dynamic environments. This paper introduces ManipGPT, a framework designed to predict optimal interaction areas for articulated objects using a large pre-trained vision transformer (ViT). We created a dataset of 9.9k simulated and real images to bridge the sim-to-real gap and enhance real-world applicability. By fine-tuning the vision transformer on this small dataset, we significantly improved part-level affordance segmentation, adapting the model's in-context segmentation capabilities to robot manipulation scenarios. This enables effective manipulation across simulated and real-world environments by generating part-level affordance masks, paired with an impedance adaptation policy, sufficiently eliminating the need for complex datasets or perception systems.

Submitted 18 December, 2024; v1 submitted 13 December, 2024; originally announced December 2024.

Comments: 8 pages, 6 figures

arXiv:2412.07077 [pdf, other]

Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling

Authors: Donggeun Kim, Yujin Jo, Myungjoo Lee, Taesup Kim

Abstract: The advancement of vision-language models, particularly the Contrastive Language-Image Pre-training (CLIP) model, has revolutionized the field of machine learning by enabling robust zero-shot learning capabilities. These capabilities allow models to understand and respond to previously unseen data without task-specific training. However, adapting CLIP to integrate specialized knowledge from various domains while retaining its zero-shot capabilities remains a significant challenge. To address this, we introduce a novel prompt ensemble learning approach called Group-wise Prompt Ensemble (GPE). This method aims to enhance CLIP's zero-shot capabilities by incorporating new domain knowledge while improving its adaptability and robustness against data distribution shifts. Our approach hinges on three main strategies: prompt grouping with masked attention to optimize CLIP's adaptability while safeguarding its zero-shot capabilities; the incorporation of auxiliary prompts for the seamless integration of new domain insights without disrupting the original model's representation; and an ensemble learning strategy that effectively merges original and new knowledge. Through rigorous experimentation, including more challenging cross-dataset transfer evaluations, our GPE method redefines the benchmarks for the adaptability and efficiency of vision-language models, surpassing existing models across various scenarios.

Submitted 9 December, 2024; originally announced December 2024.

Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025

arXiv:2412.07029 [pdf, other]

Key Focus Areas and Enabling Technologies for 6G

Authors: Christopher G. Brinton, Mung Chiang, Kwang Taik Kim, David J. Love, Michael Beesley, Morris Repeta, John Roese, Per Beming, Erik Ekudden, Clara Li, Geng Wu, Nishant Batra, Amitava Ghosh, Volker Ziegler, Tingfang Ji, Rajat Prakash, John Smee

Abstract: We provide a taxonomy of a dozen enabling network architectures, protocols, and technologies that will define the evolution from 5G to 6G. These technologies span the network protocol stack, different target deployment environments, and various perceived levels of technical maturity. We outline four areas of societal focus that will be impacted by these technologies, and overview several research directions that hold the potential to address the problems in these important focus areas.

Submitted 16 December, 2024; v1 submitted 9 December, 2024; originally announced December 2024.

Comments: This paper has been accepted for publication in the IEEE Communications Magazine. Portions were released online as a report titled 6G Roadmap: A Global Taxonomy in November 2023

arXiv:2412.05951 [pdf, other]

When Vision Models Meet Parameter Efficient Look-Aside Adapters Without Large-Scale Audio Pretraining

Authors: Juan Yeo, Jinkwan Jang, Kyubyung Chae, Seongkyu Mun, Taesup Kim

Abstract: Recent studies show that pretrained vision models can boost performance in audio downstream tasks. To enhance the performance further, an additional pretraining stage with large-scale audio data is typically required to infuse audio-specific knowledge into the vision model. However, such approaches require extensive audio data and a carefully designed objective function. In this work, we propose bypassing the pretraining stage by directly fine-tuning the vision model with our Look Aside Adapter (LoAA) designed for efficient audio understanding. Audio spectrum data is represented across two heterogeneous dimensions, time and frequency, and we refine adapters to facilitate interactions between tokens across these dimensions. Our experiments demonstrate that our adapters allow vision models to reach or surpass the performance of pretrained audio models in various audio and speech tasks, offering a resource-efficient and effective solution for leveraging vision models in audio applications.

Submitted 8 December, 2024; originally announced December 2024.

Comments: 5 pages, 3 figures

arXiv:2412.05748 [pdf, other]

Constrained Control for Autonomous Spacecraft Rendezvous: Learning-Based Time Shift Governor

Authors: Taehyeun Kim, Robin Inho Kee, Ilya Kolmanovsky, Anouck Girard

Abstract: This paper develops a Time Shift Governor (TSG)-based control scheme to enforce constraints during rendezvous and docking (RD) missions in the setting of the Two-Body problem. As an add-on scheme to the nominal closed-loop system, the TSG generates a time-shifted Chief spacecraft trajectory as a target reference for the Deputy spacecraft. This modification of the commanded reference trajectory ensures that constraints are enforced while the time shift is reduced to zero to effect the rendezvous. Our approach to TSG implementation integrates an LSTM neural network which approximates the time shift parameter as a function of a sequence of past Deputy and Chief spacecraft states. This LSTM neural network is trained offline from simulation data. We report simulation results for RD missions in the Low Earth Orbit (LEO) and on the Molniya orbit to demonstrate the effectiveness of the proposed control scheme. The proposed scheme reduces the time to compute the time shift parameter in most of the scenarios and successfully completes rendezvous missions.

Submitted 7 December, 2024; originally announced December 2024.

Comments: Taehyeun Kim and Robin Inho Kee contributed equally to this work. 18 pages, 12 figures

arXiv:2412.03745 [pdf, other]

doi 10.1145/3583780.3614838

Deep Variational Bayesian Modeling of Haze Degradation Process

Authors: Eun Woo Im, Junsung Shin, Sungyong Baik, Tae Hyun Kim

Abstract: Relying on the representation power of neural networks, most recent works have often neglected several factors involved in haze degradation, such as transmission (the amount of light reaching an observer from a scene over distance) and atmospheric light. These factors are generally unknown, making dehazing problems ill-posed and creating inherent uncertainties. To account for such uncertainties and factors involved in haze degradation, we introduce a variational Bayesian framework for single image dehazing. We propose to take not only a clean image but also a transmission map as latent variables, the posterior distributions of which are parameterized by corresponding neural networks: dehazing and transmission networks, respectively. Based on a physical model for haze degradation, our variational Bayesian framework leads to a new objective function that encourages cooperation between the two networks, facilitating their joint training and thereby boosting the performance of each. In our framework, a dehazing network can estimate a clean image independently of a transmission map estimation during inference, introducing no overhead. Furthermore, our model-agnostic framework can be seamlessly incorporated with other existing dehazing networks, greatly enhancing the performance consistently across datasets and models.

Submitted 4 December, 2024; originally announced December 2024.

Comments: Published in CIKM 2023, 10 pages, 9 figures

Journal ref: In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management 2023 Oct 21 (pp. 895-904)

arXiv:2412.03710 [pdf, other]

CIKAN: Constraint Informed Kolmogorov-Arnold Networks for Autonomous Spacecraft Rendezvous using Time Shift Governor

Authors: Taehyeun Kim, Anouck Girard, Ilya Kolmanovsky

Abstract: The paper considers a Constrained-Informed Neural Network (CINN) approximation for the Time Shift Governor (TSG), which is an add-on scheme to the nominal closed-loop system used to enforce constraints by time-shifting the reference trajectory in spacecraft rendezvous applications. We incorporate Kolmogorov-Arnold Networks (KANs), an emerging architecture in the AI community, as a fundamental component of CINN and propose a Constrained-Informed Kolmogorov-Arnold Network (CIKAN)-based approximation for TSG. We demonstrate the effectiveness of the CIKAN-based TSG through simulations of constrained spacecraft rendezvous missions on highly elliptic orbits and present comparisons between CIKANs, MLP-based CINNs, and the conventional TSG.

Submitted 6 December, 2024; v1 submitted 4 December, 2024; originally announced December 2024.

Comments: 10 pages, 4 figures

arXiv:2412.03667 [pdf, ps, other]

Lie, Noether, Kosmann, and Diffeomorphism Anomalies Redux

Authors: Taeyeon Kim, Piljin Yi

Abstract: The Noether procedure carries an inherent ambiguity due to the necessary local extension, no longer a symmetry, of the global symmetry. The gauging should fix the ambiguity once and for all, however, and, for translations, the general covariance demands us to use the Lie derivative. We argue that, with this alone and without any further tweaking, the Noether energy-momentum $\hat{\mathbb{T}}$ must equal the symmetric counterpart, $T$, inevitably and show the equality explicitly for general tensors. For spinors, a subtlety with the Lie derivative itself enters the issue and leads us to the Kosmann lift, often unnoticed by the physics community, from which $T=\hat{\mathbb{T}}$ again emerges straightforwardly and in a naturally symmetric form. Finally, we address how the same Kosmann lift affects the anomaly computations and show that the diffeomorphism anomaly from the seminal paper must be halved, while the venerable anomaly polynomials themselves stand unaffected. We discuss the ramifications of these findings.

Submitted 23 December, 2024; v1 submitted 4 December, 2024; originally announced December 2024.

Comments: 61 pages, no figure, references updated, a footnote revised and relocated

Report number: KIAS-P24066

arXiv:2412.03119 [pdf, ps, other]

Degenerate Eulerian polynomials and numbers

Authors: Taekyun Kim, Dae san Kim

Abstract: The aim of this paper is to study degenerate Eulerian polynomials and degenerate Eulerian numbers, respectively as degenerate versions of the Eulerian polynomials and the Eulerian numbers, and to derive some of their properties. Specifically, we derive an identity, recursive relations, generating function and degenerate version of Worpitzky's identity for the degenerate Eulerian polynomials and numbers. In addition, we obtain several results involving the degenerate Stirling numbers of the second kind and the degenerate Bernoulli numbers as well as the degenerate Eulerian numbers.

Submitted 4 December, 2024; originally announced December 2024.

Comments: 13

MSC Class: 11B68; 11B73; 11B83

arXiv:2412.02344 [pdf, other]

UniForm: A Reuse Attention Mechanism Optimized for Efficient Vision Transformers on Edge Devices

Authors: Seul-Ki Yeom, Tae-Ho Kim

Abstract: Transformer-based architectures have demonstrated remarkable success across various domains, but their deployment on edge devices remains challenging due to high memory and computational demands. In this paper, we introduce a novel Reuse Attention mechanism, tailored for efficient memory access and computational optimization, enabling seamless operation on resource-constrained platforms without compromising performance. Unlike traditional multi-head attention (MHA), which redundantly computes separate attention matrices for each head, Reuse Attention consolidates these computations into a shared attention matrix, significantly reducing memory overhead and computational complexity. Comprehensive experiments on ImageNet-1K and downstream tasks show that the proposed UniForm models leveraging Reuse Attention achieve state-of-the-art ImageNet classification accuracy while outperforming existing attention mechanisms, such as Linear Attention and Flash Attention, in inference speed and memory scalability. Notably, UniForm-l achieves a 76.7% Top-1 accuracy on ImageNet-1K with 21.8ms inference time on edge devices like the Jetson AGX Orin, representing up to a 5x speedup over competing benchmark methods. These results demonstrate the versatility of Reuse Attention across high-performance GPUs and edge platforms, paving the way for broader real-time applications.

Submitted 3 December, 2024; originally announced December 2024.

Comments: 13 Pages, 8 Tables, 7 Figures

arXiv:2412.01239 [pdf]

Light-induced hysteresis of electronic polarization in antiferromagnet FePS3

Authors: Kyung Ik Sim, Byung Cheol Park, Taesoo Kim, Byeong Wook Cho, Jae Hoon Kim, Eun-Mi Choi, Young Hee Lee

Abstract: Research on manipulating materials using light has garnered significant interest, yet examples of controlling electronic polarization in magnetic materials remain scarce. Here, we demonstrate the hysteresis of electronic polarization in the antiferromagnetic semiconductor FePS3 via light. Below the Néel temperature, we observe linear dichroism (i.e., optical anisotropy) without structural symmetry breaking. Light-induced net polarization aligns along the a-axis (zigzag direction) at 1.6 eV due to the dipolar polarization and along the b-axis (armchair direction) at 2.0 eV due to the combined effects of dipolar and octupolar polarizations, resulting from charge transfer from the armchair to the zigzag direction by light. Unexpected hysteresis of the electronic polarization occurs at 2.0 eV due to the octupolar polarization, in contrast to the absence of such hysteresis at 1.6 eV. We attribute this to a symmetry breaking of the light-induced phase of FePS3 involving electronic polarization within the spin lattice. This study suggests a new mechanism for generating and controlling electronic polarization in magnetic materials using light, with implications for future device applications.

Submitted 2 December, 2024; originally announced December 2024.

Comments: 34 pages, 5 figures, 13 supplementary figures

arXiv:2412.00611 [pdf, other]

Phase-resolving spin-wave microscopy using infrared strobe light

Authors: Yuzan Xiong, Andrew Christy, Muntasir Mahdi, Rui Sun, Yi Li, Robert D. Geil, James F. Cahoon, Frank Tsui, Binbin Yang, Tae Hee Kim, Jia-Mian Hu, Dali Sun, Michael C. Hamilton, Valentine Novosad, Wei Zhang

Abstract: The needs for sensitively and reliably probing magnetization dynamics have been increasing in various contexts such as studying novel hybrid magnonic systems, in which the spin dynamics strongly and coherently couple to other excitations, including microwave photons, light photons, or phonons. Recent advances in quantum magnonics also highlight the need for employing magnon phase as quantum state variables, which is to be detected and mapped out with high precision in on-chip micro- and nano-scale magnonic devices. Here, we demonstrate a facile optical technique that can directly perform concurrent spectroscopic and imaging functionalities with spatial- and phase-resolutions, using infrared strobe light operating at 1550-nm wavelength. To showcase the methodology, we spectroscopically studied the phase-resolved spin dynamics in a bilayer of Permalloy and Y3Fe5O12 (YIG), and spatially imaged the backward volume spin wave modes of YIG in the dipolar spin wave regime. Using the strobe light probe, the detected precessional phase contrast can be directly used to construct the map of the spin wave wavefront, in the continuous-wave regime of spin-wave propagation and in the stationary state, without needing any optical reference path. By selecting the applied field, frequency, and detection phase, the spin wave images can be made sensitive to the precession amplitude and phase. Our results demonstrate that infrared optical strobe light can serve as a versatile platform for magneto-optical probing of magnetization dynamics, with potential implications in investigating hybrid magnonic systems.

Submitted 30 November, 2024; originally announced December 2024.

Comments: 16 pages, 12 figures

arXiv:2411.18542 [pdf, other]

Fermi surface and pseudogap in highly doped Sr$_{2}$IrO$_{4}$

Authors: Y. Alexanian, A. de la Torre, S. McKeown Walker, M. Straub, G. Gatti, A. Hunter, S. Mandloi, E. Cappelli, S. Riccò, F. Y. Bruno, M. Radovic, N. C. Plumb, M. Shi, J. Osiecki, C. Polley, T. K. Kim, P. Dudin, M. Hoesch, R. S. Perry, A. Tamai, F. Baumberger

Abstract: The fate of the Fermi surface in bulk electron-doped Sr$_{2}$IrO$_{4}$ remains elusive, as does the origin and extension of its pseudogap phase. Here, we use high-resolution angle-resolved photoelectron spectroscopy (ARPES) to investigate the electronic structure of Sr$_{2-x}$La$_{x}$IrO$_{4}$ up to $x=0.2$, a factor of two higher than in previous work. We find that the antinodal pseudogap persists up to the highest doping level, and thus beyond the sharp increase in Hall carrier density to $\simeq 1+x$ recently observed above $x^{*}=0.16$ [Y.-T. Hsu et al., Nature Physics 20, 1593 (2024)]. This suggests that doped iridates host a unique phase of matter in which a large Hall density coexists with an anisotropic pseudogap, breaking up the Fermi surface into disconnected arcs. The temperature boundary of the pseudogap is $T^{*}\simeq 200$ K for $x=0.2$, comparable to cuprates and to the energy scale of short range antiferromagnetic correlations in cuprates and iridates.

Submitted 19 May, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

Comments: 9 pages, 4 figures. Supplementary Information: 3 pages, 3 figures

arXiv:2411.17995 [pdf, other]

Revisiting Misalignment in Multispectral Pedestrian Detection: A Language-Driven Approach for Cross-modal Alignment Fusion

Authors: Taeheon Kim, Sangyun Chung, Youngjoon Yu, Yong Man Ro

Abstract: Multispectral pedestrian detection is a crucial component in various critical applications. However, a significant challenge arises due to the misalignment between these modalities, particularly under real-world conditions where data often appear heavily misaligned. Conventional methods developed on well-aligned or minimally misaligned datasets fail to address these discrepancies adequately. This paper introduces a new framework for multispectral pedestrian detection designed specifically to handle heavily misaligned datasets without the need for costly and complex traditional pre-processing calibration. By leveraging Large-scale Vision-Language Models (LVLM) for cross-modal semantic alignment, our approach seeks to enhance detection accuracy by aligning semantic information across the RGB and thermal domains. This method not only simplifies the operational requirements but also extends the practical usability of multispectral detection technologies in practical applications.

Submitted 26 November, 2024; originally announced November 2024.

arXiv:2411.16761 [pdf, other]

Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning

Authors: Ji Hyeok Jung, Eun Tae Kim, Seoyeon Kim, Joo Ho Lee, Bumsoo Kim, Buru Chang

Abstract: Multimodal large language models (MLLMs) act as essential interfaces, connecting humans with AI technologies in multimodal applications. However, current MLLMs face challenges in accurately interpreting object orientation in images due to inconsistent orientation annotations in training data, hindering the development of a coherent orientation understanding. To overcome this, we propose egocentric instruction tuning, which aligns MLLMs' orientation understanding with the user's perspective, based on a consistent annotation standard derived from the user's egocentric viewpoint. We first generate egocentric instruction data that leverages MLLMs' ability to recognize object details and applies prior knowledge for orientation understanding. Using this data, we perform instruction tuning to enhance the model's capability for accurate orientation interpretation. In addition, we introduce EgoOrientBench, a benchmark that evaluates MLLMs' orientation understanding across three tasks using images collected from diverse domains. Experimental results on this benchmark show that egocentric instruction tuning significantly improves orientation understanding without compromising overall MLLM performance. The instruction data and benchmark dataset are available on our project page at https://github.com/jhCOR/EgoOrientBench.

Submitted 29 March, 2025; v1 submitted 24 November, 2024; originally announced November 2024.

Comments: CVPR2025 Camera-ready

arXiv:2411.16713 [pdf, other]

Conditional Text-to-Image Generation with Reference Guidance

Authors: Taewook Kim, Ze Wang, Zhengyuan Yang, Jiang Wang, Lijuan Wang, Zicheng Liu, Qiang Qiu

Abstract: Text-to-image diffusion models have demonstrated tremendous success in synthesizing visually stunning images given textual instructions. Despite remarkable progress in creating high-fidelity visuals, text-to-image models can still struggle with precisely rendering subjects, such as text spelling. To address this challenge, this paper explores using additional conditions of an image that provides visual guidance of the particular subjects for diffusion models to generate. In addition, this reference condition empowers the model to be conditioned in ways that the vocabularies of the text tokenizer cannot adequately represent, and further extends the model's generalization to novel capabilities such as generating non-English text spellings. We develop several small-scale expert plugins that efficiently endow a Stable Diffusion model with the capability to take different references. Each plugin is trained with auxiliary networks and loss functions customized for applications such as English scene-text generation, multi-lingual scene-text generation, and logo-image generation. Our expert plugins demonstrate superior results than the existing methods on all tasks, each containing only 28.55M trainable parameters.

Submitted 22 November, 2024; originally announced November 2024.

arXiv:2411.16160 [pdf, other]

Stop Playing the Guessing Game! Target-free User Simulation for Evaluating Conversational Recommender Systems

Authors: Sunghwan Kim, Tongyoung Kim, Kwangwook Seo, Jinyoung Yeo, Dongha Lee

Abstract: Recent approaches in Conversational Recommender Systems (CRSs) have tried to simulate real-world users engaging in conversations with CRSs to create more realistic testing environments that reflect the complexity of human-agent dialogue. Despite the significant advancements, reliably evaluating the capability of CRSs to elicit user preferences still faces a significant challenge. Existing evaluation metrics often rely on target-biased user simulators that assume users have predefined preferences, leading to interactions that devolve into a simplistic guessing game. These simulators typically guide the CRS toward specific target items based on fixed attributes, limiting the dynamic exploration of user preferences and struggling to capture the evolving nature of real-user interactions. Additionally, current evaluation metrics are predominantly focused on single-turn recall of target items, neglecting the intermediate processes of preference elicitation. To address this, we introduce PEPPER, a novel CRS evaluation protocol with target-free user simulators constructed from real-user interaction histories and reviews. PEPPER enables realistic user-CRS dialogues without falling into simplistic guessing games, allowing users to gradually discover their preferences through enriched interactions, thereby providing a more accurate and reliable assessment of the CRS's ability to elicit personal preferences. Furthermore, PEPPER presents detailed measures for comprehensively evaluating the preference elicitation capabilities of CRSs, encompassing both quantitative and qualitative measures that capture four distinct aspects of the preference elicitation process. Through extensive experiments, we demonstrate the validity of PEPPER as a simulation environment and conduct a thorough analysis of how effectively existing CRSs perform in preference elicitation and recommendation.

Submitted 25 November, 2024; originally announced November 2024.

Comments: Work in progress

arXiv:2411.13955 [pdf, other]

A silicon-based ion trap chip protected from semiconductor charging

Authors: Daun Chung, Kwangyeul Choi, Woojun Lee, Chiyoon Kim, Hosung Shon, Jeonghyun Park, Beomgeun Cho, Kyungmin Lee, Suhan Kim, Seungwoo Yoo, Eui Hwan Jung, Changhyun Jung, Jiyong Kang, Kyunghye Kim, Roberts Berkis, Tracy Northup, Dong-Il "Dan" Cho, Taehyun Kim

Abstract: Silicon-based ion trap chips can benefit from existing advanced fabrication technologies, such as multi-metal layer techniques for two-dimensional architectures and silicon photonics for the integration of on-chip optical components. However, the scalability of these technologies may be compromised by semiconductor charging, where photogenerated charge carriers produce electric potentials that disrupt ion motion. Inspired by recent studies on charge distribution mechanisms in semiconductors, we developed a silicon-based chip with gold coated on all exposed silicon surfaces. This modification significantly stabilized ion motion compared to a chip without such metallic shielding, a result that underscores the detrimental effects of exposed silicon. With the mitigation of background silicon-induced fields to negligible levels, quantum operations such as sideband cooling and two-ion entangling gates, which were previously infeasible with the unshielded chip, can now be implemented.

Submitted 21 November, 2024; originally announced November 2024.

arXiv:2411.13309 [pdf]

Anisotropic manipulation of terahertz spin-waves by spin-orbit torque in a canted antiferromagnet

Authors: T. H. Kim, Jung-Il Kim, Geun-Ju Kim, Kwang-Ho Jang, G. -M. Choi

Abstract: We theoretically and numerically elucidate the electrical control over spin waves in antiferromagnetic materials (AFM) with biaxial anisotropies and Dzyaloshinskii-Moriya interactions. The spin wave dispersion in an AFM manifests as a bifurcated spectrum with distinct high-frequency and low-frequency bands. Utilizing a heterostructure comprised of platinum and the AFM, we demonstrate anisotropic control of spin-wave bands via spin currents with three-dimensional spin polarizations, encompassing both resonant and propagating wave modes. Moreover, leveraging the confined geometry, we explore the possibility of controlling spin waves within a spectral domain ranging from tens of gigahertz to sub-terahertz frequencies. The implications of our findings suggest the potential for developing a terahertz wave source with electrical tunability, thereby facilitating its incorporation into ultrafast, broadband, and wireless communication technologies.

Submitted 20 November, 2024; originally announced November 2024.

arXiv:2411.12887 [pdf, other]

Investigation of magnetic excitations and charge order in a van der Waals ferromagnet Fe$_5$GeTe$_2$

Authors: V. K. Bhartiya, T. Kim, J. Li, T. P. Darlington, D. J. Rizzo, Y. Gu, S. Fan, C. Nelson, J. W. Freeland, X. Xu, D. N. Basov, J. Pelliciari, A. F. May, C. Mazzoli, V. Bisogni

Abstract: Understanding the complex ground state of van der Waals (vdW) magnets is essential for designing new materials and devices that leverage these platforms. Here, we investigate a two-dimensional vdW ferromagnet -- Fe$_5$GeTe$_2$-- with one of the highest reported Curie temperatures, to elucidate its magnetic excitations and charge order. Using Fe $L_3 - $edge resonant inelastic x-ray scattering, we find the dual character of magnetic excitations, consisting of a coherent magnon and a continuum, similar to what is reported for its sister compound Fe$_3$GeTe$_2$. The magnon has an energy of $\approx$ 36 meV at the maximum in-plane momentum transfer ($-$0.35 r.l.u.) allowed at Fe $L_3 - $edge. A broad and non-dispersive continuum extends up to 150 meV, 50$\%$ higher energy than in Fe$_3$GeTe$_2$. Its intensity is sinusoidally modulated along the $L$ direction, with a period matching the inter-slab distance. Our findings suggest that while the unconventional dual character of magnetic excitations is generic to ternary Fe-Ge-Te vdW magnets, the correlation length of the out-of-plane magnetic interaction increases in Fe$_5$GeTe$_2$ as compared to Fe$_3$GeTe$_2$, supporting a stronger three-dimensional character for the former. Furthermore, by investigating the $\pm$(1/3, 1/3, $L$) peaks by resonant x-ray diffraction, we conclude these to have structural origin rather than charge order -- as previously reported -- and suggest doubling of the structural unit cell along the $c-$axis.

Submitted 20 November, 2024; v1 submitted 19 November, 2024; originally announced November 2024.

Comments: 17 pages, 3 figures

arXiv:2411.10981 [pdf, other]

Accuracy of Stellar Mass-to-light Ratios of Nearby Galaxies in the Near-Infrared

Authors: Taehyun Kim, Minjin Kim, Luis C. Ho, Yang A. Li, Woong-Seob Jeong, Dohyeong Kim, Yongjung Kim, Bomee Lee, Dongseob Lee, Jeong Hwan Lee, Jeonghyun Pyo, Hyunjin Shim, Suyeon Son, Hyunmi Song, Yujin Yang

Abstract: Future satellite missions are expected to perform all-sky surveys, thus providing the entire sky near-infrared spectral data and consequently opening a new window to investigate the evolution of galaxies. Specifically, the infrared spectral data facilitate the precise estimation of stellar masses of numerous low-redshift galaxies. We utilize the synthetic spectral energy distribution (SED) of 2853 nearby galaxies drawn from the DustPedia (435) and Stripe 82 regions (2418). The stellar mass-to-light ratio ($M_*/L$) estimation accuracy over a wavelength range of $0.75-5.0$ $\mu$m is computed through the SED fitting of the multi-wavelength photometric dataset, which has not yet been intensively explored in previous studies. We find that the scatter in $M_*/L$ is significantly larger in the shorter and longer wavelength regimes due to the effect of the young stellar population and the dust contribution, respectively. While the scatter in $M_*/L$ approaches its minimum ($\sim0.10$ dex) at $\sim1.6$ $\mu$m, it remains sensitive to the adopted star formation history model. Furthermore, $M_*/L$ demonstrates weak and strong correlations with the stellar mass and the specific star formation rate (SFR), respectively. Upon adequately correcting the dependence of $M_*/L$ on the specific SFR, the scatter in the $M_*/L$ further reduces to $0.02$ dex at $\sim1.6$ $\mu$m. This indicates that the stellar mass can be estimated with an accuracy of $\sim0.02$ dex with a prior knowledge of SFR, which can be estimated using the infrared spectra obtained with future survey missions.

Submitted 17 November, 2024; originally announced November 2024.

Comments: Accepted for publication in AJ. 19 pages, 14 figures

arXiv:2411.09969 [pdf, other]

Steering AI-Driven Personalization of Scientific Text for General Audiences

Authors: Taewook Kim, Dhruv Agarwal, Jordan Ackerman, Manaswi Saha

Abstract: Digital media platforms (e.g., social media, science blogs) offer opportunities to communicate scientific content to general audiences at scale. However, these audiences vary in their scientific expertise, literacy levels, and personal backgrounds, making effective science communication challenging. To address this challenge, we designed TranSlider, an AI-powered tool that generates personalized translations of scientific text based on individual user profiles (e.g., hobbies, location, and education). Our tool features an interactive slider that allows users to steer the degree of personalization from 0 (weakly relatable) to 100 (strongly relatable), leveraging LLMs to generate the translations with given degrees. Through an exploratory study with 15 participants, we investigated both the utility of these AI-personalized translations and how interactive reading features influenced users' understanding and reading experiences. We found that participants who preferred higher degrees of personalization appreciated the relatable and contextual translations, while those who preferred lower degrees valued concise translations with subtle contextualization. Furthermore, participants reported the compounding effect of multiple translations on their understanding of scientific content. Given these findings, we discuss several implications of AI-personalized translation tools in facilitating communication in collaborative contexts.

Submitted 15 November, 2024; originally announced November 2024.

Comments: 23 pages, 5 figures, 1 table

Showing 151–200 of 2,310 results for author: Kim, t