Showing 51–100 of 984 results for author: Yin, Y

  1. arXiv:2503.22718  [pdf, other]

    cs.MA

    LLM-ABM for Transportation: Assessing the Potential of LLM Agents in System Analysis

    Authors: Tianming Liu, Jirong Yang, Yafeng Yin

    Abstract: Agent-based modeling approaches represent the state of the art in modeling travel demand and transportation system dynamics and are valuable tools for transportation planning. However, established agent-based approaches in transportation rely on multi-hierarchical mathematical models to simulate travel behavior, which faces theoretical and practical limitations. The advent of large language models (LL…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted by The 1st Workshop on AI for Urban Planning at AAAI 2025

  2. arXiv:2503.22204  [pdf, other]

    cs.CV

    Segment then Splat: A Unified Approach for 3D Open-Vocabulary Segmentation based on Gaussian Splatting

    Authors: Yiren Lu, Yunlai Zhou, Yiran Qiao, Chaoda Song, Tuo Liang, Jing Ma, Yu Yin

    Abstract: Open-vocabulary querying in 3D space is crucial for enabling more intelligent perception in applications such as robotics, autonomous systems, and augmented reality. However, most existing methods rely on 2D pixel-level parsing, leading to multi-view inconsistencies and poor 3D object retrieval. Moreover, they are limited to static scenes and struggle with dynamic scenes due to the complexities of…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: Project page: https://vulab-ai.github.io/Segment-then-Splat/

  3. arXiv:2503.22038  [pdf, other]

    cs.MA cs.CL

    Debate-Driven Multi-Agent LLMs for Phishing Email Detection

    Authors: Ngoc Tuong Vy Nguyen, Felix D Childress, Yunting Yin

    Abstract: Phishing attacks remain a critical cybersecurity threat. Attackers constantly refine their methods, making phishing emails harder to detect. Traditional detection methods, including rule-based systems and supervised machine learning models, either rely on predefined patterns like blacklists, which can be bypassed with slight modifications, or require large datasets for training and still can gener…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Accepted to the 13th International Symposium on Digital Forensics and Security (ISDFS 2025)

    Report number: 2768-1831

    Journal ref: 2025 13th International Symposium on Digital Forensics and Security (ISDFS)

  4. arXiv:2503.21544  [pdf, other]

    cs.CL cs.AI cs.LG

    SWI: Speaking with Intent in Large Language Models

    Authors: Yuwei Yin, EunJeong Hwang, Giuseppe Carenini

    Abstract: Intent, typically clearly formulated and planned, functions as a cognitive framework for reasoning and problem-solving. This paper introduces the concept of Speaking with Intent (SWI) in large language models (LLMs), where the explicitly generated intent encapsulates the model's underlying intention and provides high-level planning to guide subsequent analysis and communication. By emulating delib…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: 24 pages. Code: https://github.com/YuweiYin/SWI

    ACM Class: I.2.7

  5. arXiv:2503.20233  [pdf, other]

    cs.SI cs.AI cs.CE cs.HC

    Dynamic Learning and Productivity for Data Analysts: A Bayesian Hidden Markov Model Perspective

    Authors: Yue Yin

    Abstract: Data analysts are essential in organizations, transforming raw data into insights that drive decision-making and strategy. This study explores how analysts' productivity evolves on a collaborative platform, focusing on two key learning activities: writing queries and viewing peer queries. While traditional research often assumes static models, where performance improves steadily with cumulative le…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: 29 pages; a shorter 11-page version is accepted by HCI International (HCII) 2025

  6. arXiv:2503.19445  [pdf, other]

    physics.chem-ph

    LOCAL: A Graph-Based Active Learning Approach for Stability Analysis of DAC@NG Catalysts

    Authors: Yue Yin, Jiangshan He, Hai Xiao

    Abstract: Dual atomic catalysts supported by nitrogen-doped graphene (DAC@NG) offer significant potential in catalytic applications by overcoming intrinsic limitations associated with single atomic catalysts. However, accurately determining their stability and atomic-scale configurations remains computationally challenging due to extensive structural variability. In this study, we present the LOCalization a…

    Submitted 25 March, 2025; originally announced March 2025.

  7. arXiv:2503.19425  [pdf, other]

    physics.chem-ph

    Oxidation States in Solids from Data-Driven Paradigms

    Authors: Yue Yin, Hai Xiao

    Abstract: The oxidation state (OS) is an essential chemical concept that embodies chemical intuition but cannot be computed with well-defined physical laws. We establish a data-driven paradigm, with its implementation as Tsinghua Oxidation States in Solids (TOSS), to explicitly compute the OSs in crystal structures as the emergent properties from large-sized datasets based on Bayesian maximum a posteriori p…

    Submitted 25 March, 2025; originally announced March 2025.

  8. arXiv:2503.18461  [pdf, other]

    cs.CV

    MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing

    Authors: Lingting Zhu, Jingrui Ye, Runze Zhang, Zeyu Hu, Yingda Yin, Lanjiong Li, Jinnan Chen, Shengju Qian, Xin Wang, Qingmin Liao, Lequan Yu

    Abstract: Current methods for 3D generation still fall short in physically based rendering (PBR) texturing, primarily due to limited data and challenges in modeling multi-channel materials. In this work, we propose MuMA, a method for 3D PBR texturing through Multi-channel Multi-view generation and Agentic post-processing. Our approach features two key innovations: 1) We opt to model shaded and albedo appear…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: 17 pages, 14 figures

  9. arXiv:2503.16965  [pdf, other]

    cs.CL cs.CV

    Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning

    Authors: Zhe Hu, Jing Li, Zhongzhu Pu, Hou Pong Chan, Yu Yin

    Abstract: Vision Language Models exhibited immense potential for embodied AI, yet they often lack the sophisticated situational reasoning required for complex decision-making. This paper shows that VLMs can achieve surprisingly strong decision-making performance when visual scenes are represented merely as text-only descriptions, suggesting foundational reasoning can be effectively learned from language. Mo…

    Submitted 22 May, 2025; v1 submitted 21 March, 2025; originally announced March 2025.

  10. arXiv:2503.15974  [pdf, other]

    physics.atom-ph

    Thermal resonance-enhanced transparency in room temperature Rydberg gases

    Authors: Jinlian Hu, Yuechun Jiao, Yuwen Yin, Cheng Lu, Jingxu Bai, Suotang Jia, Weibin Li, Zhengyang Bai, Jianming Zhao

    Abstract: We report the enhanced optical transmission in the coherent, off-resonant excitation of Rydberg atom gases at room temperature via a two-photon process. Here thermal resonance-enhanced transparency (TRET) is induced when the detuning of the two lasers is adjusted to compensate the atomic thermal-motion-induced energy shifts, i.e. single and two-photon Doppler shifts. We show that the atomic veloci…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: 7 pages, 4 figures

  11. arXiv:2503.15835  [pdf, other]

    cs.CV

    BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting

    Authors: Yiren Lu, Yunlai Zhou, Disheng Liu, Tuo Liang, Yu Yin

    Abstract: 3D Gaussian Splatting (3DGS) has shown remarkable potential for static scene reconstruction, and recent advancements have extended its application to dynamic scenes. However, the quality of reconstructions depends heavily on high-quality input images and precise camera poses, which are not that trivial to fulfill in real-world scenarios. Capturing dynamic scenes with handheld monocular cameras, fo…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: CVPR 2025. Project page at https://vulab-ai.github.io/BARD-GS/

  12. arXiv:2503.12890  [pdf, other]

    cond-mat.str-el

    $3d$ flat bands and coupled $4f$ moments in the kagome-honeycomb permanent magnet Sm$_{2}$Co$_{17}$

    Authors: Hao Zheng, Zhiguang Xiao, Ze Pan, Guowei Yang, Yonghao Liu, Jianzhou Bian, Yi Wu, Teng Hua, Jiawen Zhang, Jiayi Lu, Jiong Li, Tulai Sun, Yu Song, Ruihua He, J. Larrea Jiménez, Guanghan Cao, Huiqiu Yuan, Yuanfeng Xu, Yi Yin, Ming Shi, Chao Cao, Yang Liu

    Abstract: Rare earth permanent magnets (REPMs) with both localized moments and itinerant conduction bands are not only important for fundamental research but also have significant technological applications. In particular, Sm$_{\rm 2}$Co$_{\rm 17}$ is a prototypical high-temperature REPM, where the Co atoms form a kagome-honeycomb stacked lattice. Here we report synthesis of epitaxial Sm$_{\rm 2}$Co…

    Submitted 19 May, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

    Comments: 9 pages, 5 figures

  13. arXiv:2503.11251  [pdf, other]

    cs.CV cs.CL

    Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

    Authors: Haoyang Huang, Guoqing Ma, Nan Duan, Xing Chen, Changyi Wan, Ranchen Ming, Tianyu Wang, Bo Wang, Zhiying Lu, Aojie Li, Xianfang Zeng, Xinhao Zhang, Gang Yu, Yuhe Yin, Qiling Wu, Wen Sun, Kang An, Xin Han, Deshan Sun, Wei Ji, Bizhu Huang, Brian Li, Chenfei Wu, Guanzhe Huang, Huixin Xiong , et al. (29 additional authors not shown)

    Abstract: We present Step-Video-TI2V, a state-of-the-art text-driven image-to-video generation model with 30B parameters, capable of generating videos up to 102 frames based on both text and image inputs. We build Step-Video-TI2V-Eval as a new benchmark for the text-driven image-to-video task and compare Step-Video-TI2V with open-source and commercial TI2V engines using this dataset. Experimental results de…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 7 pages

  14. arXiv:2503.10127  [pdf, other]

    cs.CV

    PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models

    Authors: Runze He, Bo Cheng, Yuhang Ma, Qingxiang Jia, Shanyuan Liu, Ao Ma, Xiaoyu Wu, Liebucha Wu, Dawei Leng, Yuhui Yin

    Abstract: In this paper, we propose a unified layout planning and image generation model, PlanGen, which can pre-plan spatial layout conditions before generating images. Unlike previous diffusion-based models that treat layout planning and layout-to-image as two separate models, PlanGen jointly models the two tasks into one autoregressive transformer using only next-token prediction. PlanGen integrates layo…

    Submitted 30 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: 15 pages, 12 figures, project page: https://360cvgroup.github.io/PlanGen

  15. arXiv:2503.09937  [pdf, other]

    hep-ph

    Spin density matrix for neutral $ρ$ mesons in a pion gas in linear response theory

    Authors: Yi-Liang Yin, Wen-Bo Dong, Cong Yi, Qun Wang

    Abstract: We calculate the spin density matrix for neutral $ρ$ mesons from the spectral function and thermal shear tensor by Kubo formula in the linear response theory, which contributes to the $γ$ correlator for the CME search. We derive the spectral function of neutral $ρ$ mesons with $ρππ$ and $ρρππ$ interactions using the Dyson-Schwinger equation. The thermal shear tensor contribution is obtained from t…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: REVTeX 4.1, 15 pages, 5 figures

  16. arXiv:2503.09242  [pdf, other]

    cs.CV

    NAMI: Efficient Image Generation via Progressive Rectified Flow Transformers

    Authors: Yuhang Ma, Bo Cheng, Shanyuan Liu, Ao Ma, Xiaoyu Wu, Liebucha Wu, Dawei Leng, Yuhui Yin

    Abstract: Flow-based transformer models for image generation have achieved state-of-the-art performance with larger model parameters, but their inference deployment cost remains high. To enhance inference performance while maintaining generation quality, we propose progressive rectified flow transformers. We divide the rectified flow into different stages according to resolution, using fewer transformer lay…

    Submitted 12 March, 2025; originally announced March 2025.

  17. arXiv:2503.08157  [pdf, other]

    cs.CV

    U-StyDiT: Ultra-high Quality Artistic Style Transfer Using Diffusion Transformers

    Authors: Zhanjie Zhang, Ao Ma, Ke Cao, Jing Wang, Shanyuan Liu, Yuhang Ma, Bo Cheng, Dawei Leng, Yuhui Yin

    Abstract: Ultra-high quality artistic style transfer refers to repainting an ultra-high quality content image using the style information learned from the style image. Existing artistic style transfer methods can be categorized into style reconstruction-based and content-style disentanglement-based style transfer approaches. Although these methods can generate some artistic stylized images, they still exhib…

    Submitted 11 March, 2025; originally announced March 2025.

  18. arXiv:2503.08153  [pdf, other]

    cs.CV

    WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation

    Authors: Jing Wang, Ao Ma, Ke Cao, Jun Zheng, Zhanjie Zhang, Jiasong Feng, Shanyuan Liu, Yuhang Ma, Bo Cheng, Dawei Leng, Yuhui Yin, Xiaodan Liang

    Abstract: Recent rapid advancements in text-to-video (T2V) generation, such as SoRA and Kling, have shown great potential for building world simulators. However, current T2V models struggle to grasp abstract physical principles and generate videos that adhere to physical laws. This challenge arises primarily from a lack of clear guidance on physical information due to a significant gap between abstract phys…

    Submitted 11 March, 2025; originally announced March 2025.

  19. arXiv:2503.07654  [pdf, other]

    cs.LG

    MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration

    Authors: Jinguang Wang, Jingyu Wang, Haifeng Sun, Tingting Yang, Zirui Zhuang, Wanyi Ning, Yuexi Yin, Qi Qi, Jianxin Liao

    Abstract: Quantization has been widely used to compress and accelerate inference of large language models (LLMs). Existing methods focus on exploring the per-token dynamic calibration to ensure both inference acceleration and model accuracy under 4-bit quantization. However, in autoregressive generation inference of long sequences, the overhead of repeated dynamic quantization and dequantization steps becom…

    Submitted 6 March, 2025; originally announced March 2025.

  20. arXiv:2503.06252  [pdf, other]

    cs.CV cs.AI

    Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

    Authors: Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Kaixin Cai, Yiyang Yin, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Yu-Jie Yuan, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang

    Abstract: In this paper, we address the challenging task of multimodal mathematical reasoning by incorporating the ability of "slow thinking" into multimodal large language models (MLLMs). Our core idea is that different levels of reasoning abilities can be combined dynamically to tackle questions with different complexity. To this end, we propose a paradigm of Self-structured Chain of Thought (SCoT), which…

    Submitted 8 March, 2025; originally announced March 2025.

  21. arXiv:2503.04852  [pdf, other]

    cs.CV cs.LG

    CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data

    Authors: Disheng Liu, Yiran Qiao, Wuche Liu, Yiren Lu, Yunlai Zhou, Tuo Liang, Yu Yin, Jing Ma

    Abstract: True intelligence hinges on the ability to uncover and leverage hidden causal relations. Despite significant progress in AI and computer vision (CV), there remains a lack of benchmarks for assessing models' abilities to infer latent causality from complex visual data. In this paper, we introduce Causal3D, a novel and comprehensive benchmark that integrates structured data (tables…

    Submitted 5 March, 2025; originally announced March 2025.

  22. arXiv:2503.04778  [pdf, ps, other]

    physics.gen-ph

    The Aesthetic Imperative of Lev Landau's Geometric Reductionism in Theoretical Physics

    Authors: Jingxu Wu, Yuwei Yin

    Abstract: This paper explores the ontological and epistemological foundations of Lev Landau's theoretical physics through the lens of his unpublished philosophical notes and scientific practice. We identify a unique form of geometric reductionism where physical laws emerge as inevitable consequences of symmetry breaking in progressively constrained phase spaces. Landau's dismissal of quantum interpretation…

    Submitted 21 February, 2025; originally announced March 2025.

  23. arXiv:2503.04369  [pdf, other]

    cs.CL

    Lost in Literalism: How Supervised Training Shapes Translationese in LLMs

    Authors: Yafu Li, Ronghao Zhang, Zhilin Wang, Huajian Zhang, Leyang Cui, Yongjing Yin, Tong Xiao, Yue Zhang

    Abstract: Large language models (LLMs) have achieved remarkable success in machine translation, demonstrating impressive performance across diverse languages. However, translationese, characterized by overly literal and unnatural translations, remains a persistent challenge in LLM-based translation systems. Despite their pre-training on vast corpora of natural utterances, LLMs exhibit translationese errors…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 19 pages

  24. arXiv:2503.04306  [pdf, other]

    astro-ph.HE

    EP240801a/XRF 240801B: An X-ray Flash Detected by the Einstein Probe and Implications of its Multiband Afterglow

    Authors: Shuai-Qing Jiang, Dong Xu, Agnes P. C. van Hoof, Wei-Hua Lei, Yuan Liu, Hao Zhou, Yong Chen, Shao-Yu Fu, Jun Yang, Xing Liu, Zi-Pei Zhu, Alexei V. Filippenko, Peter G. Jonker, A. S. Pozanenko, He Gao, Xue-Feng Wu, Bing Zhang, Gavin P Lamb, Massimiliano De Pasquale, Shiho Kobayashi, Franz Erik Bauer, Hui Sun, Giovanna Pugliese, Jie An, Valerio D'Elia , et al. (67 additional authors not shown)

    Abstract: We present multiband observations and analysis of EP240801a, a low-energy, extremely soft gamma-ray burst (GRB) discovered on August 1, 2024 by the Einstein Probe (EP) satellite, with a weak contemporaneous signal also detected by Fermi/GBM. Optical spectroscopy of the afterglow, obtained by GTC and Keck, identified the redshift of $z = 1.6734$. EP240801a exhibits a burst duration of 148 s in X-ra…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 22 pages, 11 figures, submitted to ApJ

  25. arXiv:2503.04258  [pdf, other]

    cs.SD cs.AI cs.CV eess.AS

    TAIL: Text-Audio Incremental Learning

    Authors: Yingfei Sun, Xu Gu, Wei Ji, Hanbin Zhao, Hao Fei, Yifang Yin, Roger Zimmermann

    Abstract: Many studies combine text and audio to capture multi-modal information but they overlook the model's generalization ability on new datasets. Introducing new datasets may affect the feature space of the original dataset, leading to catastrophic forgetting. Meanwhile, large model parameters can significantly impact training performance. To address these limitations, we introduce a novel task called…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 4 figures, 5 tables

    ACM Class: I.2

  26. arXiv:2503.03997  [pdf, other]

    physics.flu-dyn

    Efficient neural topology optimization via active learning for enhancing turbulent mass transfer in fluid channels

    Authors: Chenhui Kou, Yuhui Yin, Min Zhu, Shengkun Jia, Yiqing Luo, Xigang Yuana, Lu Lu

    Abstract: The design of fluid channel structures of reactors or separators of chemical processes is key to enhancing the mass transfer processes inside the devices. However, the systematic design of channel topological structures is difficult for complex turbulent flows. Here, we address this challenge by developing a machine learning framework to efficiently perform topology optimization of channel structu…

    Submitted 5 March, 2025; originally announced March 2025.

  27. arXiv:2503.02951  [pdf, other]

    cs.LG cs.AI cs.CL

    KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding

    Authors: Zhangchen Xu, Yang Liu, Yueqin Yin, Mingyuan Zhou, Radha Poovendran

    Abstract: We introduce KodCode, a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data across diverse difficulties and domains for training Large Language Models for coding. Existing code-focused resources typically fail to ensure either the breadth of coverage (e.g., spanning simple coding tasks to advanced algorithmic problems) or verifiable correct…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: Codes and Data: https://kodcode-ai.github.io/

  28. arXiv:2503.01164  [pdf, other]

    cs.CV

    Med-LEGO: Editing and Adapting toward Generalist Medical Image Diagnosis

    Authors: Yitao Zhu, Yuan Yin, Jiaming Li, Mengjie Xu, Zihao Zhao, Honglin Xiong, Sheng Wang, Qian Wang

    Abstract: The adoption of visual foundation models has become a common practice in computer-aided diagnosis (CAD). While these foundation models provide a viable solution for creating generalist medical AI, privacy concerns make it difficult to pre-train or continuously update such models across multiple domains and datasets, leading many studies to focus on specialist models. To address this challenge, we…

    Submitted 2 March, 2025; originally announced March 2025.

  29. arXiv:2503.01143  [pdf, other]

    cs.LG

    DPR: Diffusion Preference-based Reward for Offline Reinforcement Learning

    Authors: Teng Pang, Bingzheng Wang, Guoqiang Wu, Yilong Yin

    Abstract: Offline preference-based reinforcement learning (PbRL) mitigates the need for reward definition, aligning with human preferences via preference-driven reward feedback without interacting with the environment. However, the effectiveness of preference-driven reward functions depends on the modeling ability of the learning model, which current MLP-based and Transformer-based methods may fail to adequ…

    Submitted 13 May, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

  30. arXiv:2503.00884  [pdf, other]

    cs.LG

    Re-Evaluating the Impact of Unseen-Class Unlabeled Data on Semi-Supervised Learning Model

    Authors: Rundong He, Yicong Dong, Lanzhe Guo, Yilong Yin, Tailin Wu

    Abstract: Semi-supervised learning (SSL) effectively leverages unlabeled data and has been proven successful across various fields. Current safe SSL methods believe that unseen classes in unlabeled data harm the performance of SSL models. However, previous methods for assessing the impact of unseen classes on SSL model performance are flawed. They fix the size of the unlabeled dataset and adjust the proport…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: Published as a conference paper at ICLR 2025

  31. arXiv:2503.00476  [pdf, other]

    cs.LG

    G-OSR: A Comprehensive Benchmark for Graph Open-Set Recognition

    Authors: Yicong Dong, Rundong He, Guangyao Chen, Wentao Zhang, Zhongyi Han, Jieming Shi, Yilong Yin

    Abstract: Graph Neural Networks (GNNs) have achieved significant success in machine learning, with wide applications in social networks, bioinformatics, knowledge graphs, and other fields. Most research assumes ideal closed-set environments. However, in real-world open-set environments, graph learning models face challenges in robustness and reliability due to unseen classes. This highlights the need for Gr…

    Submitted 1 March, 2025; originally announced March 2025.

    Comments: 10 pages, 2 figures

  32. arXiv:2502.21016  [pdf, other]

    cond-mat.supr-con

    The $s\pm$ pairing symmetry in the pressured La$_3$Ni$_2$O$_7$ from electron-phonon coupling

    Authors: Yucong Yin, Jun Zhan, Boyang Liu, Xinloong Han

    Abstract: The recently discovered bilayer Ruddlesden-Popper nickelate La$_3$Ni$_2$O$_7$ exhibits superconductivity with a remarkable transition temperature $T_c\approx 80 $ K under applied pressures above 14.0 GPa. This discovery of a new family of high-temperature superconductors has garnered significant attention in the condensed matter physics community. In this work, we assume that this high-temperature su…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: 7 pages, 7 figures

  33. arXiv:2502.18992  [pdf]

    cs.IR

    OntologyRAG: Better and Faster Biomedical Code Mapping with Retrieval-Augmented Generation (RAG) Leveraging Ontology Knowledge Graphs and Large Language Models

    Authors: Hui Feng, Yuntzu Yin, Emiliano Reynares, Jay Nanavati

    Abstract: Biomedical ontologies, which comprehensively define concepts and relations for biomedical entities, are crucial for structuring and formalizing domain-specific information representations. Biomedical code mapping identifies similarity or equivalence between concepts from different ontologies. Obtaining high-quality mapping usually relies on automatic generation of unrefined mapping with ontology d…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: This paper has been accepted as a workshop paper for KEIR@ECIR 2025

  34. arXiv:2502.18442  [pdf, other]

    physics.atom-ph physics.app-ph physics.chem-ph quant-ph

    Ion counting and temperature determination of Coulomb-crystallized laser-cooled ions in traps using convolutional neural networks

    Authors: Yanning Yin, Stefan Willitsch

    Abstract: Coulomb crystals, ordered structures of cold ions confined in ion traps, find applications in a variety of research fields. The number and temperature of the ions forming the Coulomb crystals are two key attributes of interest in many trapped-ion experiments. Here, we present a fast and accurate approach of determining these attributes from fluorescence images of the ions based on convolutional ne…

    Submitted 25 February, 2025; originally announced February 2025.

  35. arXiv:2502.16913  [pdf, other]

    cs.CV

    HVIS: A Human-like Vision and Inference System for Human Motion Prediction

    Authors: Kedi Lyu, Haipeng Chen, Zhenguang Liu, Yifang Yin, Yukang Lin, Yingying Jiao

    Abstract: Grasping the intricacies of human motion, which involve perceiving spatio-temporal dependence and multi-scale effects, is essential for predicting human motion. While humans inherently possess the requisite skills to navigate this issue, it proves to be markedly more challenging for machines to emulate. To bridge the gap, we propose the Human-like Vision and Inference System (HVIS) for human motio…

    Submitted 24 February, 2025; originally announced February 2025.

  36. arXiv:2502.14627  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors

    Authors: Yuguo Yin, Yuxin Xie, Wenyuan Yang, Dongchao Yang, Jinghan Ru, Xianwei Zhuang, Liming Liang, Yuexian Zou

    Abstract: Multilingual audio-text retrieval (ML-ATR) is a challenging task that aims to retrieve audio clips or multilingual texts from databases. However, existing ML-ATR schemes suffer from inconsistencies for instance similarity matching across languages. We theoretically analyze the inconsistency in terms of both multilingual modal alignment direction error and weight error, and propose the theoretical…

    Submitted 4 June, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  37. arXiv:2502.14377  [pdf, other]

    cs.CV

    RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers

    Authors: Ke Cao, Jing Wang, Ao Ma, Jiasong Feng, Zhanjie Zhang, Xuanhua He, Shanyuan Liu, Bo Cheng, Dawei Leng, Yuhui Yin, Jie Zhang

    Abstract: The Diffusion Transformer plays a pivotal role in advancing text-to-image and text-to-video generation, owing primarily to its inherent scalability. However, existing controlled diffusion transformer methods incur significant parameter and computational overheads and suffer from inefficient resource allocation due to their failure to account for the varying relevance of control information across…

    Submitted 23 March, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: Homepage: https://360cvgroup.github.io/RelaCtrl/ GitHub: https://github.com/360CVGroup/RelaCtrl

  38. arXiv:2502.14213  [pdf, ps, other]

    math.NA

    Asynchronous Stochastic Block Projection Algorithm for Solving Linear Systems under Predefined Communication Patterns

    Authors: Yanchen Yin, Yongli Wang

    Abstract: This paper proposes an event-triggered asynchronous distributed randomized block Kaczmarz projection (ER-AD-RBKP) algorithm for efficiently solving large-scale linear systems in resource-constrained and communication-unstable environments. The algorithm enables each agent to update its local state estimate independently and engage in communication only when specific triggering conditions are satis…

    Submitted 15 June, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: 20 pages, 2 figures

  39. arXiv:2502.12150  [pdf, ps, other]

    cs.CL

    Idiosyncrasies in Large Language Models

    Authors: Mingjie Sun, Yida Yin, Zhiqiu Xu, J. Zico Kolter, Zhuang Liu

    Abstract: In this work, we unveil and study idiosyncrasies in Large Language Models (LLMs) -- unique patterns in their outputs that can be used to distinguish the models. To do so, we consider a simple classification task: given a particular text output, the objective is to predict the source LLM that generates the text. We evaluate this synthetic task across various groups of LLMs and find that simply fine…

    Submitted 16 June, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: Published in ICML 2025. Website at https://eric-mingjie.github.io/llm-idiosyncrasies/index.html

  40. arXiv:2502.12022  [pdf, other]

    cs.CL cs.AI

    Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving

    Authors: Xin Xu, Yan Xu, Tianhao Chen, Yuchen Yan, Chengwu Liu, Zaoyu Chen, Yufei Wang, Yichun Yin, Yasheng Wang, Lifeng Shang, Qun Liu

    Abstract: Existing approaches to mathematical reasoning with large language models (LLMs) rely on Chain-of-Thought (CoT) for generalizability or Tool-Integrated Reasoning (TIR) for precise computation. While efforts have been made to combine these methods, they primarily rely on post-selection or predefined strategies, leaving an open question: whether LLMs can autonomously adapt their reasoning strategy ba…

    Submitted 25 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: 8 pages

  41. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu , et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  42. arXiv:2502.10795  [pdf, ps, other]

    cs.DS

    Local Gibbs sampling beyond local uniformity

    Authors: Hongyang Liu, Chunyang Wang, Yitong Yin

    Abstract: Local samplers are algorithms that generate random samples based on local queries to high-dimensional distributions, ensuring the samples follow the correct induced distributions while maintaining time complexity that scales locally with the query size. These samplers have broad applications, including deterministic approximate counting [He, Wang, Yin, SODA '23; Feng et al., FOCS '23], sampling fr…

    Submitted 15 February, 2025; originally announced February 2025.

  43. arXiv:2502.10248  [pdf, other]

    cs.CV cs.CL

    Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

    Authors: Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, et al. (90 additional authors not shown)

    Abstract: We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length. A deep compression Variational Autoencoder, Video-VAE, is designed for video generation tasks, achieving 16x16 spatial and 8x temporal compression ratios, while maintaining exceptional video reconstruction quality. User prompts are encoded…

    Submitted 24 February, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: 36 pages, 14 figures

  44. arXiv:2502.06604  [pdf, other]

    cs.CL

    Do we really have to filter out random noise in pre-training data for language models?

    Authors: Jinghan Ru, Yuxin Xie, Xianwei Zhuang, Yuguo Yin, Zhihui Guo, Zhiming Liu, Qianli Ren, Yuexian Zou

    Abstract: Web-scale pre-training datasets are the cornerstone of LLMs' success. However, text data curated from the Internet inevitably contains random noise caused by decoding errors or unregulated web content. In contrast to previous works that focus on low quality or synthetic data, our study provides the first systematic investigation of such random noise through a cohesive "What-Why-How" fram…

    Submitted 15 May, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  45. arXiv:2502.05210  [pdf]

    q-fin.ST cs.LG

    Regression and Forecasting of U.S. Stock Returns Based on LSTM

    Authors: Shicheng Zhou, Zizhou Zhang, Rong Zhang, Yuchen Yin, Chia Hong Chang, Qinyan Shen

    Abstract: This paper analyses the investment returns of three stock sectors, Manuf, Hitec, and Other, in the U.S. stock market, based on the Fama-French three-factor model, the Carhart four-factor model, and the Fama-French five-factor model, in order to test the validity of the Fama-French three-factor model, the Carhart four-factor model, and the Fama-French five-factor model for the three sectors of the…

    Submitted 28 May, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: 5 pages

  46. arXiv:2502.04689  [pdf, other]

    cs.CL cs.AI cs.LG

    ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning

    Authors: Yuwei Yin, Giuseppe Carenini

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities on complex evaluation benchmarks, many of which are formulated as question-answering (QA) tasks. Enhancing the performance of LLMs in QA contexts is becoming increasingly vital for advancing their development and applicability. This paper introduces ARR, an intuitive, effective, and general QA solving method that explicitly inc…

    Submitted 15 May, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

    Comments: 21 pages. Code: https://github.com/YuweiYin/ARR

    ACM Class: I.2.7

  47. arXiv:2502.00283  [pdf, other]

    cs.HC

    How Generative AI supports human in conceptual design

    Authors: Liuging Chen, Yaxuan Song, Jia Guo, Lingyun Sun, Peter Childs, Yuan Yin

    Abstract: Generative Artificial Intelligence (Generative AI) is a collection of AI technologies that can generate new information such as texts and images. With its strong capabilities, Generative AI has been actively studied in creative design processes. However, limited studies have explored the roles of humans and Generative AI in conceptual design processes, leaving a gap for human-AI collaboration inve…

    Submitted 31 January, 2025; originally announced February 2025.

    Comments: 20 pages, 2 figures, accepted by Design Science

  48. arXiv:2501.19304  [pdf]

    cond-mat.mtrl-sci cond-mat.str-el physics.app-ph

    Solid-state Synapse Based on Magnetoelectrically Coupled Memristor

    Authors: Weichuan Huang, Yue-Wen Fang, Yuewei Yin, Bobo Tian, Wenbo Zhao, Chuangming Hou, Chao Ma, Qi Li, Evgeny Y. Tsymbal, Chun-Gang Duan, Xiaoguang Li

    Abstract: Brain-inspired computing architectures attempt to emulate the computations performed in the neurons and the synapses in human brain. Memristors with continuously tunable resistances are ideal building blocks for artificial synapses. Through investigating the memristor behaviors in a La0.7Sr0.3MnO3/BaTiO3/La0.7Sr0.3MnO3 multiferroic tunnel junction, it was found that the ferroelectric domain dynami…

    Submitted 31 January, 2025; originally announced January 2025.

    Comments: 5 figures, 20 pages

    Journal ref: ACS Applied Materials & Interfaces 2018, 10, 6, 5649-5656

  49. arXiv:2501.18858  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

    Authors: Han Zhong, Yutong Yin, Shenao Zhang, Xiaojun Xu, Yuanxin Liu, Yifei Zuo, Zhihan Liu, Boyi Liu, Sirui Zheng, Hongyi Guo, Liwei Wang, Mingyi Hong, Zhaoran Wang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, yet generating reliable reasoning processes remains a significant challenge. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. Within this framework, we introduce the Bootstrapping…

    Submitted 6 June, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

    Comments: ICML 2025

  50. arXiv:2501.15857  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?

    Authors: Yutong Yin, Zhaoran Wang

    Abstract: Humans exhibit remarkable compositional reasoning by integrating knowledge from various sources. For example, if someone learns $B = f(A)$ from one source and $C = g(B)$ from another, they can deduce $C = g(B) = g(f(A))$ even without encountering $ABC$ together, showcasing the generalization ability of human intelligence. In this paper, we introduce a synthetic learning task, "FTCT" (Fragmente…

    Submitted 2 June, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: Accepted by ICLR 2025