
Showing 1–50 of 371 results for author: Chai, J

  1. arXiv:2506.09655

    cs.AI cs.LG

    DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

    Authors: Kaixuan Xu, Jiajun Chai, Sicheng Li, Yuqian Fu, Yuanheng Zhu, Dongbin Zhao

    Abstract: Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted to the 42nd International Conference on Machine Learning (ICML 2025)

  2. arXiv:2506.05904

    cs.AI cs.CL cs.CV cs.HC

    Proactive Assistant Dialogue Generation from Streaming Egocentric Videos

    Authors: Yichi Zhang, Xin Luna Dong, Zhaojiang Lin, Andrea Madotto, Anuj Kumar, Babak Damavandi, Joyce Chai, Seungwhan Moon

    Abstract: Recent advances in conversational AI have been substantial, but developing real-time systems for perceptual task guidance remains challenging. These systems must provide interactive, proactive assistance based on streaming visual inputs, yet their development is constrained by the costly and labor-intensive process of data collection and system evaluation. To address these limitations, we present…

    Submitted 6 June, 2025; originally announced June 2025.

  3. arXiv:2506.02112

    cs.CV

    SAB3R: Semantic-Augmented Backbone in 3D Reconstruction

    Authors: Xuweiyi Chen, Tian Xia, Sihan Xu, Jianing Yang, Joyce Chai, Zezhou Cheng

    Abstract: We introduce a new task, Map and Locate, which unifies the traditionally distinct objectives of open-vocabulary segmentation - detecting and segmenting object instances based on natural language queries - and 3D reconstruction, the process of estimating a scene's 3D structure from visual inputs. Specifically, Map and Locate involves generating a point cloud from an unposed video and segmenting obj…

    Submitted 3 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: 3D-LLM/VLA @ CVPR2025 | Project page: https://uva-computer-vision-lab.github.io/sab3r/

  4. arXiv:2506.00439

    cs.LG cs.AI

    RLAE: Reinforcement Learning-Assisted Ensemble for LLMs

    Authors: Yuqian Fu, Yuanheng Zhu, Jiajun Chai, Guojun Yin, Wei Lin, Qichao Zhang, Dongbin Zhao

    Abstract: Ensembling large language models (LLMs) can effectively combine diverse strengths of different models, offering a promising approach to enhance performance across various tasks. However, existing methods typically rely on fixed weighting strategies that fail to adapt to the dynamic, context-dependent characteristics of LLM capabilities. In this work, we propose Reinforcement Learning-Assisted Ense…

    Submitted 31 May, 2025; originally announced June 2025.

  5. arXiv:2505.23723

    cs.CL cs.AI cs.LG

    ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering

    Authors: Zexi Liu, Jingyi Chai, Xinyu Zhu, Shuo Tang, Rui Ye, Bo Zhang, Lei Bai, Siheng Chen

    Abstract: The emergence of large language model (LLM)-based agents has significantly advanced the development of autonomous machine learning (ML) engineering. However, most existing approaches rely heavily on manual prompt engineering, failing to adapt and optimize based on diverse experimental experiences. Focusing on this, for the first time, we explore the paradigm of learning-based agentic ML, where an…

    Submitted 29 May, 2025; originally announced May 2025.

  6. arXiv:2505.19381

    cs.AI cs.CV cs.RO

    DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving

    Authors: Anqing Jiang, Yu Gao, Zhigang Sun, Yiru Wang, Jijun Wang, Jinghao Chai, Qian Cao, Yuweng Heng, Hao Jiang, Yunda Dong, Zongzheng Zhang, Xianda Guo, Hao Sun, Hao Zhao

    Abstract: Research interest in end-to-end autonomous driving has surged owing to its fully differentiable design integrating modular tasks, i.e., perception, prediction, and planning, which enables optimization in pursuit of the ultimate goal. Despite the great potential of the end-to-end paradigm, existing methods suffer from several aspects including expensive BEV (bird's eye view) computation, action divers…

    Submitted 2 June, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

    Comments: 4 pages

  7. arXiv:2505.11326

    cs.CV cs.AI

    Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models

    Authors: Keunwoo Peter Yu, Joyce Chai

    Abstract: Vision-language models (VLMs) have shown remarkable progress in offline tasks such as image captioning and video question answering. However, real-time interactive environments impose new demands on VLMs, requiring them to generate utterances that are not only semantically accurate but also precisely timed. We identify two core capabilities necessary for such settings --…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 18 pages

  8. arXiv:2505.08808

    cs.CV cs.AI

    SparseMeXT Unlocking the Potential of Sparse Representations for HD Map Construction

    Authors: Anqing Jiang, Jinhao Chai, Yu Gao, Yiru Wang, Yuwen Heng, Zhigang Sun, Hao Sun, Zezhong Zhao, Li Sun, Jian Zhou, Lijuan Zhu, Shugong Xu, Hao Zhao

    Abstract: Recent advancements in high-definition (HD) map construction have demonstrated the effectiveness of dense representations, which heavily rely on computationally intensive bird's-eye view (BEV) features. While sparse representations offer a more efficient alternative by avoiding dense BEV processing, existing methods often lag behind due to the lack of tailored designs. These limitations…

    Submitted 11 May, 2025; originally announced May 2025.

  9. arXiv:2505.02462

    cs.AI cs.CL cs.GT

    Incentivizing Inclusive Contributions in Model Sharing Markets

    Authors: Enpei Zhang, Jingyi Chai, Rui Ye, Yanfeng Wang, Siheng Chen

    Abstract: While data plays a crucial role in training contemporary AI models, it is acknowledged that valuable public data will be exhausted in a few years, directing the world's attention towards the massive decentralized private data. However, the privacy-sensitive nature of raw data and the lack of incentive mechanisms prevent these valuable data from being fully exploited. Addressing these challenges, this p…

    Submitted 5 May, 2025; originally announced May 2025.

  10. arXiv:2504.16060

    cs.CL

    Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation

    Authors: Ziqiao Ma, Jing Ding, Xuejun Zhang, Dezhi Luo, Jiahe Ding, Sihan Xu, Yuchen Huang, Run Peng, Joyce Chai

    Abstract: Referring Expression Generation (REG) is a core task for evaluating the pragmatic competence of vision-language systems, requiring not only accurate semantic grounding but also adherence to principles of cooperative communication (Grice, 1975). However, current evaluations of vision-language models (VLMs) often overlook the pragmatic dimension, reducing REG to a region-based captioning task and ne…

    Submitted 30 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: Homepage: https://vlm-reg.github.io/

  11. arXiv:2503.14350

    cs.CV cs.AI cs.CL

    VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation

    Authors: Shoubin Yu, Difan Liu, Ziqiao Ma, Yicong Hong, Yang Zhou, Hao Tan, Joyce Chai, Mohit Bansal

    Abstract: Recent video diffusion models have enhanced video editing, but it remains challenging to handle instructional editing and diverse tasks (e.g., adding, removing, changing) within a unified framework. In this paper, we introduce VEGGIE, a Video Editor with Grounded Generation from Instructions, a simple end-to-end framework that unifies video concept editing, grounding, and reasoning based on divers…

    Submitted 19 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: First three authors contributed equally. Project page: https://veggie-gen.github.io/

  12. arXiv:2502.13311

    cs.CL cs.AI

    Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors

    Authors: Jian Wang, Yinpei Dai, Yichi Zhang, Ziqiao Ma, Wenjie Li, Joyce Chai

    Abstract: Intelligent tutoring agents powered by large language models (LLMs) have been increasingly explored to deliver personalized knowledge in areas such as language learning and science education. However, their capabilities in guiding users to solve complex real-world tasks remain underexplored. To address this limitation, in this work, we focus on coding tutoring, a challenging problem that requires…

    Submitted 25 May, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

    Comments: Accepted to Findings of ACL 2025

  13. arXiv:2502.02800

    hep-ph

    The $CP$ violations and branching ratios for $B_c^+\to D_{(s)}^+π^+π^-(K^{+}K^{-})$ from interference of the vector mesons in Perturbative QCD

    Authors: Kun Shuai Ye, Gang Lü, Na-Wang, Jian Chai, Xin-Heng Guo

    Abstract: Within the framework of the perturbative QCD approach utilizing $K_T$ factorization, we have investigated the CP violations and branching ratios in the decay processes of $B_{c}^{+}\to D_{(s)} ^{+}V(V\rightarrowπ^{+}π^{-})$ and $B_{c}^{+}\to D_{(s)}^{+}V(V\rightarrow K^{+}K^{-})$, where V denotes three vector mesons $ρ^0$, $ω$, and $φ$. During the $V\to π^+π^-$ and $V\to K^+K^-$ decay processes, w…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: arXiv admin note: text overlap with arXiv:2309.15351

  14. arXiv:2502.00534

    stat.ML cs.LG

    Transition Transfer $Q$-Learning for Composite Markov Decision Processes

    Authors: Jinhang Chai, Elynn Chen, Lin Yang

    Abstract: To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework where high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific vari…

    Submitted 1 February, 2025; originally announced February 2025.

  15. arXiv:2501.13928

    cs.CV cs.AI cs.GR cs.RO

    Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

    Authors: Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli

    Abstract: Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives. Current leading methods such as DUSt3R employ a fundamentally pairwise approach, processing images in pairs and necessitating costly global alignment procedures to reconstruct from multiple views. In this work, we propos…

    Submitted 19 March, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

    Comments: CVPR 2025. Project website: https://fast3r-3d.github.io/

  16. arXiv:2501.08783

    hep-ph

    Form factors of light pseudoscalar mesons from the perturbative QCD approach

    Authors: Jian Chai, Shan Cheng

    Abstract: We study the electromagnetic and meson-photon transition form factors (TFF) of light pseudoscalar mesons from the perturbative QCD (pQCD) approach. To comprehensively account for both the longitudinal and transverse nonperturbative dynamics of hadronic constituents, we incorporate intrinsic transverse momentum distributions (iTMDs) alongside the conventional light-cone distribution amplitudes (LC…

    Submitted 4 June, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

    Comments: 46 pages, 14 figures, 6 tables, figure 7 updated, version to appear in JHEP

  17. arXiv:2501.04870

    stat.ML cs.LG

    Deep Transfer $Q$-Learning for Offline Non-Stationary Reinforcement Learning

    Authors: Jinhang Chai, Elynn Chen, Jianqing Fan

    Abstract: In dynamic decision-making scenarios across business and healthcare, leveraging sample trajectories from diverse populations can significantly enhance reinforcement learning (RL) performance for specific target populations, especially when sample sizes are limited. While existing transfer learning methods primarily focus on linear regression settings, they lack direct applicability to reinforcemen…

    Submitted 11 April, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

  18. arXiv:2412.19252

    stat.ML cs.LG math.OC

    Localized exploration in contextual dynamic pricing achieves dimension-free regret

    Authors: Jinhang Chai, Yaqi Duan, Jianqing Fan, Kaizheng Wang

    Abstract: We study the problem of contextual dynamic pricing with a linear demand model. We propose a novel localized exploration-then-commit (LetC) algorithm which starts with a pure exploration stage, followed by a refinement stage that explores near the learned optimal pricing policy, and finally enters a pure exploitation stage. The algorithm is shown to achieve a minimax optimal, dimension-free regret…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: 60 pages, 9 figures

  19. arXiv:2412.11927

    cs.AI cs.CL

    Transparent and Coherent Procedural Mistake Detection

    Authors: Shane Storks, Itamar Bar-Yossef, Yayuan Li, Zheyuan Zhang, Jason J. Corso, Joyce Chai

    Abstract: Procedural mistake detection (PMD) is a challenging problem of classifying whether a human user (observed through egocentric video) has successfully executed a task (specified by a procedural text). Despite significant recent efforts, machine performance in the wild remains nonviable, and the reasoning processes underlying this performance are opaque. As such, we extend PMD to require generating v…

    Submitted 27 May, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

  20. arXiv:2412.05941

    hep-ph

    Shedding light on the intrinsic transversal momentum distributions of pion and kaon

    Authors: Jian Chai, Shan Cheng

    Abstract: We propose to introduce the intrinsic transversal momentum distribution functions (iTMDs), in conjunction with the light-cone distribution amplitudes (LCDAs), to elucidate the probability amplitude of encountering a meson state wherein the partons swiftly traverse along the longitudinal axis while gently oscillating in the transversal plane. The primary motivation stems from the oversight of soft…

    Submitted 22 March, 2025; v1 submitted 8 December, 2024; originally announced December 2024.

    Comments: 7 pages, 6 figures, 1 table. Matches the version accepted in Physical Review D (Letter)

  21. arXiv:2412.01708

    cs.CL cs.AI cs.HC cs.LG

    Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review

    Authors: Rui Ye, Xianghe Pang, Jingyi Chai, Jiaao Chen, Zhenfei Yin, Zhen Xiang, Xiaowen Dong, Jing Shao, Siheng Chen

    Abstract: Scholarly peer review is a cornerstone of scientific advancement, but the system is under strain due to increasing manuscript submissions and the labor-intensive nature of the process. Recent advancements in large language models (LLMs) have led to their integration into peer review, with promising results such as substantial overlaps between LLM- and human-generated reviews. However, the unchecke…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: 27 pages, 24 figures

  22. Spin-phase transition in an array of quantum rings controlled by cavity photons

    Authors: Vidar Gudmundsson, Vram Mughnetsyan, Hsi-Sheng Goan, Jeng-Da Chai, Nzar Rauf Abdullah, Chi-Shung Tang, Valeriu Moldoveanu, Andrei Manolescu

    Abstract: We model a spin-phase transition in a two-dimensional square array, or a lateral superlattice, of quantum rings in an external perpendicular homogeneous magnetic field. The electron system is placed in a circular cylindrical far-infrared photon cavity with a single circularly symmetric photon mode. Our numerical results reveal that the spin ordering of the two-dimensional electron gas in each quan…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: RevTeX - pdfLaTeX, 11 pages with 9 included pdf figures

    Journal ref: Phys. Rev. B 111, 115304 (2025)

  23. arXiv:2411.08558

    cond-mat.mtrl-sci physics.app-ph

    Effect of Top Al$_2$O$_3$ Interlayer Thickness on Memory Window and Reliability of FeFETs With TiN/Al$_2$O$_3$/Hf$_{0.5}$Zr$_{0.5}$O$_2$/SiO$_x$/Si (MIFIS) Gate Structure

    Authors: Tao Hu, Xinpei Jia, Runhao Han, Jia Yang, Mingkai Bai, Saifei Dai, Zeqi Chen, Yajing Ding, Shuai Yang, Kai Han, Yanrong Wang, Jing Zhang, Yuanyuan Zhao, Xiaoyu Ke, Xiaoqing Sun, Junshuai Chai, Hao Xu, Xiaolei Wang, Wenwu Wang, Tianchun Ye

    Abstract: We investigate the effect of top Al$_2$O$_3$ interlayer thickness on the memory window (MW) of Si channel ferroelectric field-effect transistors (Si-FeFETs) with TiN/Al$_2$O$_3$/Hf$_{0.5}$Zr$_{0.5}$O$_2$/SiO$_x$/Si (MIFIS) gate structure. We find that the MW first increases and then remains almost constant with the increasing thickness of the top Al$_2$O$_3$. The phenomenon is attributed to the lower electric…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: 7 pages, 12 figures

  24. arXiv:2411.03968

    hep-ph hep-ex

    $B\to K\bar K(πη)h$ decays in the presence of isovector scalar resonances $a_0(980,1450)$

    Authors: Si-Yang Wang, Zhi-Qing Zhang, Zhi-Jie Sun, Jian Chai, Peng Li

    Abstract: Different from the previous treatment in a two-body framework, we introduce the dimeson distribution amplitudes (DAs) to describe the strong dynamics between the S-wave resonances $a_0(980, 1450)$ and the $K\bar K (πη)$ pair, where the Gegenbauer coefficient required is determined from the experimental data on the time-like form factors involved. The branching ratios and direct CP asymmetries of t…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: 22 pages, 2 figures

  25. arXiv:2411.03603

    cs.MA

    CPIG: Leveraging Consistency Policy with Intention Guidance for Multi-agent Exploration

    Authors: Yuqian Fu, Yuanheng Zhu, Haoran Li, Zijie Zhao, Jiajun Chai, Dongbin Zhao

    Abstract: Efficient exploration is crucial in cooperative multi-agent reinforcement learning (MARL), especially in sparse-reward settings. However, due to the reliance on the unimodal policy, existing methods are prone to falling into local optima, hindering the effective exploration of better policies. Furthermore, in sparse-reward settings, each agent tends to receive a scarce reward, which poses sign…

    Submitted 6 December, 2024; v1 submitted 5 November, 2024; originally announced November 2024.

  26. arXiv:2410.24218

    cs.CL cs.AI cs.CV cs.LG cs.RO

    Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use

    Authors: Jiajun Xi, Yinong He, Jianing Yang, Yinpei Dai, Joyce Chai

    Abstract: In real-world scenarios, it is desirable for embodied agents to have the ability to leverage human language to gain explicit or implicit knowledge for learning tasks. Despite recent progress, most previous approaches adopt simple low-level instructions as language inputs, which may not reflect natural human communication. It's not clear how to incorporate rich language use to facilitate task learn…

    Submitted 31 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024 Main. Project website: https://github.com/sled-group/Teachable_RL

  27. arXiv:2410.17385

    cs.CL cs.CV

    Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities

    Authors: Zheyuan Zhang, Fengyuan Hu, Jayjun Lee, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma

    Abstract: Spatial expressions in situated communication can be ambiguous, as their meanings vary depending on the frames of reference (FoR) adopted by speakers and listeners. While spatial language understanding and reasoning by vision-language models (VLMs) have gained increasing attention, potential ambiguities in these models are still under-explored. To address this issue, we present the COnsistent Mult…

    Submitted 17 April, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: Accepted to ICLR 2025 (Oral) | Project page: https://spatial-comfort.github.io/

  28. arXiv:2410.05725

    cs.CR cs.AI

    KnowledgeSG: Privacy-Preserving Synthetic Text Generation with Knowledge Distillation from Server

    Authors: Wenhao Wang, Xiaoyu Liang, Rui Ye, Jingyi Chai, Siheng Chen, Yanfeng Wang

    Abstract: The success of large language models (LLMs) enables many parties to fine-tune LLMs on their own private data. However, this practice raises privacy concerns due to the memorization of LLMs. Existing solutions, such as utilizing synthetic data for substitution, struggle to simultaneously improve performance and preserve privacy. They either rely on a local model for generation, resulting in a pe…

    Submitted 9 October, 2024; v1 submitted 8 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024 Main

  29. arXiv:2409.14674

    cs.RO cs.CL cs.CV

    RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning

    Authors: Yinpei Dai, Jayjun Lee, Nima Fazeli, Joyce Chai

    Abstract: Developing robust and correctable visuomotor policies for robotic manipulation is challenging due to the lack of self-recovery mechanisms from failures and the limitations of simple language instructions in guiding robot actions. To address these issues, we propose a scalable data generation pipeline that automatically augments expert demonstrations with failure recovery trajectories and fine-grai…

    Submitted 22 September, 2024; originally announced September 2024.

    Comments: Project Website: https://rich-language-failure-recovery.github.io

  30. arXiv:2409.12485

    cond-mat.mtrl-sci cond-mat.mes-hall physics.app-ph

    Liquid Metal Oxide-assisted Integration of High-k Dielectrics and Metal Contacts for Two-Dimensional Electronics

    Authors: Dasari Venkatakrishnarao, Abhishek Mishra, Yaoju Tarn, Michel Bosman, Rainer Lee, Sarthak Das, Subhrajit Mukherjee, Teymour Talha-Dean, Yiyu Zhang, Siew Lang Teo, Jian Wei Chai, Fabio Bussolotti, Kuan Eng Johnson Goh, Chit Siong Lau

    Abstract: Two-dimensional van der Waals semiconductors are promising for future nanoelectronics. However, integrating high-k gate dielectrics for device applications is challenging as the inert van der Waals material surfaces hinder uniform dielectric growth. Here, we report a liquid metal oxide-assisted approach to integrate ultrathin, high-k HfO2 dielectric on 2D semiconductors with atomically smooth inte…

    Submitted 19 September, 2024; originally announced September 2024.

    Journal ref: ACS Nano, 2024

  31. The tuning of para- and diamagnetic cavity photon excitations in a square array of quantum dots in a magnetic field

    Authors: Vidar Gudmundsson, Vram Mughnetsyan, Hsi-Sheng Goan, Jeng-Da Chai, Nzar Rauf Abdullah, Chi-Shung Tang, Valeriu Moldoveanu, Andrei Manolescu

    Abstract: We employ a "real-time" excitation scheme to calculate the excitation spectra of a two-dimensional electron system in a square array of quantum dots placed in a circular cylindrical far-infrared photon cavity subjected to a perpendicular homogeneous external magnetic field. The Coulomb interaction of the electrons is handled via spin density functional theory and the para- and the diamagnetic pa…

    Submitted 15 September, 2024; originally announced September 2024.

    Comments: RevTeX - pdfLaTeX, 14 pages with 15 included pdf and png figures

    Journal ref: Physical Review B 110, 205301 (2024)

  32. arXiv:2409.07136

    cs.CL cs.AI cs.MA

    Leveraging Unstructured Text Data for Federated Instruction Tuning of Large Language Models

    Authors: Rui Ye, Rui Ge, Yuchi Fengting, Jingyi Chai, Yanfeng Wang, Siheng Chen

    Abstract: Federated instruction tuning enables multiple clients to collaboratively fine-tune a shared large language model (LLM) that can follow humans' instructions without directly sharing raw data. However, existing literature impractically requires that all the clients readily hold instruction-tuning data (i.e., structured instruction-response pairs), which necessitates massive human annotations since c…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 11 pages, work in progress

  33. arXiv:2409.05847

    cs.CV

    LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation

    Authors: Henghui Ding, Lingyi Hong, Chang Liu, Ning Xu, Linjie Yang, Yuchen Fan, Deshui Miao, Yameng Gu, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Jinming Chai, Qin Ma, Junpei Zhang, Licheng Jiao, Fang Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Xu Liu, LingLing Li, Hao Fang, Feiyu Pan, Xiankai Lu, et al. (8 additional authors not shown)

    Abstract: Despite the promising performance of current video segmentation models on existing benchmarks, these models still struggle with complex scenes. In this paper, we introduce the 6th Large-scale Video Object Segmentation (LSVOS) challenge in conjunction with ECCV 2024 workshop. This year's challenge includes two tasks: Video Object Segmentation (VOS) and Referring Video Object Segmentation (RVOS). In…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: ECCV 2024 LSVOS Challenge Report: https://lsvos.github.io/

  34. arXiv:2409.02508

    cs.CV

    TLD: A Vehicle Tail Light signal Dataset and Benchmark

    Authors: Jinhao Chai, Shiyi Mu, Shugong Xu

    Abstract: Understanding other drivers' intentions is crucial for safe driving. The role of taillights in conveying these intentions is underemphasized in current autonomous driving systems. Accurately identifying taillight signals is essential for predicting vehicle behavior and preventing collisions. Open-source taillight datasets are scarce, often small and inconsistently annotated. To address this gap, w…

    Submitted 4 September, 2024; originally announced September 2024.

  35. arXiv:2408.13582

    cs.CV

    CSS-Segment: 2nd Place Report of LSVOS Challenge VOS Track

    Authors: Jinming Chai, Qin Ma, Junpei Zhang, Licheng Jiao, Fang Liu

    Abstract: Video object segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving. In this technical report, we briefly introduce the solution of our team "yuanjie" for video object segmentation in the 6th LSVOS Challenge VOS Track at ECCV 2024. We believe that our proposed CSS-Segment will perform better in videos o…

    Submitted 24 August, 2024; originally announced August 2024.

  36. arXiv:2407.10038

    math.NT math.RT

    Asai gamma factors over finite fields

    Authors: Jingsong Chai

    Abstract: In this note, we define and study Asai gamma factors over finite fields. We also prove some results about local Asai L-functions over p-adic fields for level zero representations.

    Submitted 13 July, 2024; originally announced July 2024.

  37. arXiv:2407.07035

    cs.CL cs.CV

    Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

    Authors: Yue Zhang, Ziqiao Ma, Jialu Li, Yanyuan Qiao, Zun Wang, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi

    Abstract: Vision-and-Language Navigation (VLN) has gained increasing attention over recent years and many approaches have emerged to advance its development. The remarkable achievements of foundation models have shaped the challenges and proposed methods for VLN research. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and emphasizes the…

    Submitted 29 December, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Authors contributed equally to this work, and supervisors contributed equal advising to this work; GitHub repository: https://github.com/zhangyuejoslin/VLN-Survey-with-Foundation-Models

  38. arXiv:2407.06192

    cs.CV cs.AI cs.CL

    Multi-Object Hallucination in Vision-Language Models

    Authors: Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

    Abstract: Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates multi-object hallucination, examining how models misperceive (e.g., invent nonexistent o…

    Submitted 31 October, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to NeurIPS 2024 | Project page: https://multi-object-hallucination.github.io/

  39. arXiv:2406.17044

    quant-ph

    Fault-tolerant embedding of quantum circuits on hardware architectures via swap gates

    Authors: Shao-Hen Chiew, Ezequiel Ignacio Rodriguez Chiacchio, Vishal Sharma, Jing Hao Chai, Hui Khoon Ng

    Abstract: In near-term quantum computing devices, connectivity between qubits remains limited by architectural constraints. A computational circuit with given connectivity requirements necessary for multi-qubit gates has to be embedded within physical hardware with fixed connectivity. Long-distance gates have to be done by first routing the relevant qubits together. The simplest routing strategy involves th…

    Submitted 24 June, 2024; originally announced June 2024.

  40. arXiv:2406.15478

    cond-mat.mtrl-sci physics.app-ph

    Impact of the Top SiO2 Interlayer Thickness on Memory Window of Si Channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) Gate Structure

    Authors: Tao Hu, Xianzhou Shao, Mingkai Bai, Xinpei Jia, Saifei Dai, Xiaoqing Sun, Runhao Han, Jia Yang, Xiaoyu Ke, Fengbin Tian, Shuai Yang, Junshuai Chai, Hao Xu, Xiaolei Wang, Wenwu Wang, Tianchun Ye

    Abstract: We study the impact of top SiO2 interlayer thickness on the memory window (MW) of Si channel ferroelectric field-effect transistor (FeFET) with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) gate structure. We find that the MW increases with the increasing thickness of the top SiO2 interlayer, and such an increase exhibits a two-stage linear dependence. The physical origin is the presence of the different…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 6 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2404.15825

  41. arXiv:2406.10630  [pdf, other]

    cs.CL cs.AI cs.CR cs.MA

    Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models

    Authors: Rui Ye, Jingyi Chai, Xiangrui Liu, Yaodong Yang, Yanfeng Wang, Siheng Chen

    Abstract: Federated learning (FL) enables multiple parties to collaboratively fine-tune a large language model (LLM) without the need for direct data sharing. Ideally, by training on decentralized data that is aligned with human preferences and safety principles, federated instruction tuning can result in an LLM that could behave in a helpful and safe manner. In this paper, we for the first time reveal the…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 18 pages

  42. arXiv:2406.09264  [pdf, other]

    cs.HC cs.AI cs.CL

    Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

    Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th…

    Submitted 10 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: proposing "bidirectional human-AI alignment" framework after a systematic review of over 400 alignment papers

  43. arXiv:2406.05132  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

    Authors: Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai

    Abstract: The integration of language and 3D perception is crucial for embodied agents and robots that comprehend and interact with the physical world. While large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, their adaptation to 3D environments (3D-LLMs) remains in its early stages. A primary challenge is a lack of large-scale datasets with dense gr…

    Submitted 20 March, 2025; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: CVPR 2025. Project website: https://3d-grand.github.io

  44. arXiv:2406.04845  [pdf, other]

    cs.CL cs.AI cs.DC cs.LG cs.MA

    FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

    Authors: Rui Ye, Rui Ge, Xinyu Zhu, Jingyi Chai, Yaxin Du, Yang Liu, Yanfeng Wang, Siheng Chen

    Abstract: Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM). Following this training paradigm, the community has put massive efforts from diverse aspects including framework, performance, and privacy. However, an unpleasant fact is that there are currently no realistic datasets and benchmarks for FedLLM and previous wo…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 22 pages

  45. arXiv:2406.04640  [pdf, other]

    cs.LG

    LinkGPT: Teaching Large Language Models To Predict Missing Links

    Authors: Zhongmou He, Jing Zhu, Shengyi Qian, Joyce Chai, Danai Koutra

    Abstract: Large Language Models (LLMs) have shown promising results on various language and vision tasks. Recently, there has been growing interest in applying LLMs to graph-based tasks, particularly on Text-Attributed Graphs (TAGs). However, most studies have focused on node classification, while the use of LLMs for link prediction (LP) remains understudied. In this work, we propose a new task on LLMs, whe…

    Submitted 7 June, 2024; originally announced June 2024.

  46. arXiv:2406.03008  [pdf, other]

    cs.CV cs.AI cs.CL

    DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

    Authors: Yidong Huang, Jacob Sansom, Ziqiao Ma, Felix Gervits, Joyce Chai

    Abstract: Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are preliminary, over-simplified, and fail to capture the complexity of real-world driving scenarios in human environments. It remains under-explored whether FM agents can handle long-horizon navigation tasks with free-form dialogue and deal with unexpect…

    Submitted 15 October, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  47. arXiv:2405.18256  [pdf]

    cond-mat.mtrl-sci

    Electrical Control Grain Dimensionality with Multilevel Magnetic Anisotropy

    Authors: Shengyao Li, Sabpreet Bhatti, Siew Lang Teo, Ming Lin, Xinyue Pan, Zherui Yang, Peng Song, Wanghao Tian, Xinyu He, Jianwei Chai, Xian Jun Loh, Qiang Zhu, S. N. Piramanayagam, Xiao Renshaw Wang

    Abstract: In alignment with the increasing demand for larger storage capacity and longer data retention, electrical control of magnetic anisotropy has been a research focus in the realm of spintronics. Typically, magnetic anisotropy is determined by grain dimensionality, which is set during the fabrication of magnetic thin films. Despite the intrinsic correlation between magnetic anisotropy and grain dimens…

    Submitted 18 October, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  48. arXiv:2405.13828  [pdf, other]

    cs.CL cs.AI

    Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations

    Authors: Ziqiao Ma, Zekun Wang, Joyce Chai

    Abstract: Humans are efficient language learners and inherently social creatures. Our language development is largely shaped by our social interactions, for example, the demonstration and feedback from caregivers. Contrary to human language learning, recent advancements in large language models have primarily adopted a non-interactive training paradigm, and refined pre-trained models through feedback afterw…

    Submitted 18 April, 2025; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: NAACL 2025 (Main) & Workshop on Large Language Models and Cognition @ ICML 2024 (Oral)

  49. arXiv:2405.09187  [pdf, ps, other]

    physics.chem-ph cond-mat.mtrl-sci physics.comp-ph quant-ph

    Spin Symmetry in Thermally-Assisted-Occupation Density Functional Theory

    Authors: Yu-Yang Wang, Jeng-Da Chai

    Abstract: For electronic systems with multi-reference (MR) character, Kohn-Sham density functional theory (KS-DFT) with the conventional exchange-correlation (xc) energy functionals can lead to incorrect spin densities and related properties. For example, for H2 dissociation, the spin-restricted and spin-unrestricted solutions obtained with the same xc energy functional in KS-DFT can be distinctly different…

    Submitted 29 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: accepted for publication in Phys. Rev. A, 23 pages, 5 figures

    Journal ref: Phys. Rev. A 109, 062808 (2024)

  50. arXiv:2404.15825  [pdf]

    physics.app-ph

    Impact of Top SiO2 interlayer Thickness on Memory Window of Si Channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) Gate Structure

    Authors: Tao Hu, Xianzhou Shao, Mingkai Bai, Xinpei Jia, Saifei Dai, Xiaoqing Sun, Runhao Han, Jia Yang, Xiaoyu Ke, Fengbin Tian, Shuai Yang, Junshuai Chai, Hao Xu, Xiaolei Wang, Wenwu Wang, Tianchun Ye

    Abstract: We study the impact of top SiO2 interlayer thickness on memory window of Si channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) gate structure. The memory window increases with thicker top SiO2. We realize the memory window of 6.3 V for 3.4 nm top SiO2. Moreover, we find that the endurance characteristic degrades with increasing the initial memory window.

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 4 page 7 figures