
Showing 1–50 of 1,457 results for author: Guo, J

Searching in archive cs.
  1. arXiv:2505.10281  [pdf, other]

    cs.CV

    MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting

    Authors: Mengqiu Xu, Kaixin Chen, Heng Guo, Yixiang Huang, Ming Wu, Zhenwei Shi, Chuang Zhang, Jun Guo

    Abstract: Deep learning approaches for marine fog detection and forecasting have outperformed traditional methods, demonstrating significant scientific and practical importance. However, the limited availability of open-source datasets remains a major challenge. Existing datasets, often focused on a single region or satellite, restrict the ability to evaluate model performance across diverse conditions and…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.10075  [pdf, ps, other]

    cs.RO cs.CV

    FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation

    Authors: Jun Guo, Xiaojian Ma, Yikai Wang, Min Yang, Huaping Liu, Qing Li

    Abstract: This paper investigates training better visual world models for robot manipulation, i.e., models that can predict future visual observations by conditioning on past frames and robot actions. Specifically, we consider world models that operate on RGB-D frames (RGB-D world models). As opposed to canonical approaches that handle dynamics prediction mostly implicitly and reconcile it with visual rende…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Project page: see https://sharinka0715.github.io/FlowDreamer/

  3. arXiv:2505.08774  [pdf, ps, other]

    q-bio.BM cs.LG

    Generative Molecular Design with Steerable and Granular Synthesizability Control

    Authors: Jeff Guo, Víctor Sabanza-Gil, Zlatko Jončev, Jeremy S. Luterbacher, Philippe Schwaller

    Abstract: Synthesizability in small molecule generative design remains a bottleneck. Existing works that do consider synthesizability can output predicted synthesis routes for generated molecules. However, there has been minimal attention in addressing the ease of synthesis and enabling flexibility to incorporate desired reaction constraints. In this work, we propose a small molecule generative design frame…

    Submitted 13 May, 2025; originally announced May 2025.

  4. arXiv:2505.07347  [pdf, other]

    cs.CV

    AI-Enabled Accurate Non-Invasive Assessment of Pulmonary Hypertension Progression via Multi-Modal Echocardiography

    Authors: Jiewen Yang, Taoran Huang, Shangwei Ding, Xiaowei Xu, Qinhua Zhao, Yong Jiang, Jiarong Guo, Bin Pu, Jiexuan Zheng, Caojin Zhang, Hongwen Fei, Xiaomeng Li

    Abstract: Echocardiographers can detect pulmonary hypertension using Doppler echocardiography; however, accurately assessing its progression often proves challenging. Right heart catheterization (RHC), the gold standard for precise evaluation, is invasive and unsuitable for routine use, limiting its practicality for timely diagnosis and monitoring of pulmonary hypertension progression. Here, we propose MePH…

    Submitted 12 May, 2025; originally announced May 2025.

  5. arXiv:2505.07027  [pdf, ps, other]

    cs.AI cs.CL cs.LG cs.NE physics.chem-ph

    LLM-Augmented Chemical Synthesis and Design Decision Programs

    Authors: Haorui Wang, Jeff Guo, Lingkai Kong, Rampi Ramprasad, Philippe Schwaller, Yuanqi Du, Chao Zhang

    Abstract: Retrosynthesis, the process of breaking down a target molecule into simpler precursors through a series of valid reactions, stands at the core of organic chemistry and drug development. Although recent machine learning (ML) research has advanced single-step retrosynthetic modeling and subsequent route searches, these solutions remain restricted by the extensive combinatorial space of possible path…

    Submitted 11 May, 2025; originally announced May 2025.

  6. arXiv:2505.06493  [pdf, other]

    cs.CR cs.AI

    System Prompt Poisoning: Persistent Attacks on Large Language Models Beyond User Injection

    Authors: Jiawei Guo, Haipeng Cai

    Abstract: Large language models (LLMs) have gained widespread adoption across diverse applications due to their impressive generative capabilities. Their plug-and-play nature enables both developers and end users to interact with these models through simple prompts. However, as LLMs become more integrated into various systems in diverse domains, concerns around their security are growing. Existing studies m…

    Submitted 9 May, 2025; originally announced May 2025.

  7. arXiv:2505.06464  [pdf, ps, other]

    cs.AI

    Opening the Scope of Openness in AI

    Authors: Tamara Paris, AJung Moon, Jin Guo

    Abstract: The concept of openness in AI has so far been heavily inspired by the definition and community practice of open source software. This positions openness in AI as having positive connotations; it introduces assumptions of certain advantages, such as collaborative innovation and transparency. However, the practices and benefits of open source software are not fully transferable to AI, which has its…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: To appear in ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT) 2025

  8. arXiv:2505.05192  [pdf, other]

    cs.LG

    Long-Term Individual Causal Effect Estimation via Identifiable Latent Representation Learning

    Authors: Ruichu Cai, Junjie Wan, Weilin Chen, Zeqin Yang, Zijian Li, Peng Zhen, Jiecheng Guo

    Abstract: Estimating long-term causal effects by combining long-term observational and short-term experimental data is a crucial but challenging problem in many real-world scenarios. In existing methods, several ideal assumptions, e.g. latent unconfoundedness assumption or additive equi-confounding bias assumption, are proposed to address the latent confounder problem raised by the observational data. Howev…

    Submitted 8 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  9. arXiv:2505.04668  [pdf, other]

    cs.GR

    SGCR: Spherical Gaussians for Efficient 3D Curve Reconstruction

    Authors: Xinran Yang, Donghao Ji, Yuanqi Li, Jie Guo, Yanwen Guo, Junyuan Xie

    Abstract: Neural rendering techniques have made substantial progress in generating photo-realistic 3D scenes. The latest 3D Gaussian Splatting technique has achieved high quality novel view synthesis as well as fast rendering speed. However, 3D Gaussians lack proficiency in defining accurate 3D geometric structures despite their explicit primitive representations. This is due to the fact that Gaussian's att…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025, 8 pages

  10. arXiv:2505.04588  [pdf, other]

    cs.CL

    ZeroSearch: Incentivize the Search Capability of LLMs without Searching

    Authors: Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang

    Abstract: Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Do…

    Submitted 7 May, 2025; originally announced May 2025.

  11. arXiv:2505.03538  [pdf, other]

    cs.CV

    RAIL: Region-Aware Instructive Learning for Semi-Supervised Tooth Segmentation in CBCT

    Authors: Chuyu Zhao, Hao Huang, Jiashuo Guo, Ziyu Shen, Zhongwei Zhou, Jie Liu, Zekuan Yu

    Abstract: Semi-supervised learning has become a compelling approach for 3D tooth segmentation from CBCT scans, where labeled data is minimal. However, existing methods still face two persistent challenges: limited corrective supervision in structurally ambiguous or mislabeled regions during supervised training and performance degradation caused by unreliable pseudo-labels on unlabeled data. To address these…

    Submitted 6 May, 2025; originally announced May 2025.

  12. arXiv:2505.03132  [pdf, other]

    cs.CV cs.AI cs.HC

    VISLIX: An XAI Framework for Validating Vision Models with Slice Discovery and Analysis

    Authors: Xinyuan Yan, Xiwei Xuan, Jorge Piazentin Ono, Jiajing Guo, Vikram Mohanty, Shekar Arvind Kumar, Liang Gou, Bei Wang, Liu Ren

    Abstract: Real-world machine learning models require rigorous evaluation before deployment, especially in safety-critical domains like autonomous driving and surveillance. The evaluation of machine learning models often focuses on data slices, which are subsets of the data that share a set of characteristics. Data slice finding automatically identifies conditions or data subgroups where models underperform,…

    Submitted 5 May, 2025; originally announced May 2025.

  13. arXiv:2505.02628  [pdf, other]

    eess.IV cs.CV

    DeepSparse: A Foundation Model for Sparse-View CBCT Reconstruction

    Authors: Yiqun Lin, Hualiang Wang, Jixiang Chen, Jiewen Yang, Jiarong Guo, Xiaomeng Li

    Abstract: Cone-beam computed tomography (CBCT) is a critical 3D imaging technology in the medical field, while the high radiation exposure required for high-quality imaging raises significant concerns, particularly for vulnerable populations. Sparse-view reconstruction reduces radiation by using fewer X-ray projections while maintaining image quality, yet existing methods face challenges such as high comput…

    Submitted 5 May, 2025; originally announced May 2025.

  14. arXiv:2505.02567  [pdf, other]

    cs.CV

    Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

    Authors: Xinjie Zhang, Jintao Guo, Shanshan Zhao, Minghao Fu, Lunhao Duan, Guo-Hua Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang

    Abstract: Recent years have seen remarkable progress in both multimodal understanding models and image generation models. Despite their respective successes, these two domains have evolved independently, leading to distinct architectural paradigms: While autoregressive-based architectures have dominated multimodal understanding, diffusion-based models have become the cornerstone of image generation. Recentl…

    Submitted 7 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: This work is still in progress; Github project: https://github.com/AIDC-AI/Awesome-Unified-Multimodal-Models

  15. arXiv:2505.02214  [pdf, other]

    cs.LG

    An Empirical Study of Qwen3 Quantization

    Authors: Xingyu Zheng, Yuye Li, Haoran Chu, Yue Feng, Xudong Ma, Jie Luo, Jinyang Guo, Haotong Qin, Michele Magno, Xianglong Liu

    Abstract: The Qwen series has emerged as a leading family of open-source Large Language Models (LLMs), demonstrating remarkable capabilities in natural language understanding tasks. With the recent release of Qwen3, which exhibits superior performance across diverse benchmarks, there is growing interest in deploying these models efficiently in resource-constrained environments. Low-bit quantization presents…

    Submitted 4 May, 2025; originally announced May 2025.

  16. arXiv:2505.02146  [pdf, other]

    cs.CL cs.LG cs.PL

    QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach

    Authors: Shouyang Dong, Yuanbo Wen, Jun Bi, Di Huang, Jiaming Guo, Jianxing Xu, Ruibai Xu, Xinkai Song, Yifan Hao, Xuehai Zhou, Tianshi Chen, Qi Guo, Yunji Chen

    Abstract: Heterogeneous deep learning systems (DLS) such as GPUs and ASICs have been widely deployed in industrial data centers, which requires to develop multiple low-level tensor programs for different platforms. An attractive solution to relieve the programming burden is to transcompile the legacy code of one platform to others. However, current transcompilation techniques struggle with either tremendous…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: Accepted to OSDI 2025

  17. arXiv:2505.01859  [pdf, other]

    stat.ML cs.LG stat.CO

    Bayesian learning of the optimal action-value function in a Markov decision process

    Authors: Jiaqi Guo, Chon Wai Ho, Sumeetpal S. Singh

    Abstract: The Markov Decision Process (MDP) is a popular framework for sequential decision-making problems, and uncertainty quantification is an essential component of it to learn optimal decision-making strategies. In particular, a Bayesian framework is used to maintain beliefs about the optimal decisions and the unknown ingredients of the model, which are also to be learned from the data, such as the rewa…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: 66 pages

  18. arXiv:2505.01043  [pdf, other]

    cs.LG cs.AI

    Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

    Authors: Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Guoxia Wang, Dianhai Yu, Yonggang Wen, Dacheng Tao

    Abstract: Large language models (LLMs) have achieved impressive performance across various domains. However, the substantial hardware resources required for their training present a significant barrier to efficiency and scalability. To mitigate this challenge, low-precision training techniques have been widely adopted, leading to notable advancements in training efficiency. Despite these gains, low-precisio…

    Submitted 2 May, 2025; originally announced May 2025.

  19. arXiv:2505.00979  [pdf, other]

    cs.CL cs.AI

    Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models

    Authors: Xuhui Jiang, Shengjie Ma, Chengjin Xu, Cehao Yang, Liyu Zhang, Jian Guo

    Abstract: Large Language Models (LLMs) have achieved remarkable success but remain data-inefficient, especially when learning from small, specialized corpora with limited and proprietary data. Existing synthetic data generation methods for continue pre-training focus on intra-document content and overlook cross-document knowledge associations, limiting content diversity and depth. We propose Synthetic-on-Gr…

    Submitted 1 May, 2025; originally announced May 2025.

  20. arXiv:2504.20830  [pdf, other]

    cs.CV

    CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation

    Authors: Jianyu Wu, Yizhou Wang, Xiangyu Yue, Xinzhu Ma, Jingyang Guo, Dongzhan Zhou, Wanli Ouyang, Shixiang Tang

    Abstract: While accurate and user-friendly Computer-Aided Design (CAD) is crucial for industrial design and manufacturing, existing methods still struggle to achieve this due to their over-simplified representations or architectures incapable of supporting multimodal design requirements. In this paper, we attempt to tackle this problem from both methods and datasets aspects. First, we propose a cascade MAR…

    Submitted 29 April, 2025; originally announced April 2025.

  21. arXiv:2504.20471  [pdf, other]

    cs.LG cs.AI stat.ME

    The Estimation of Continual Causal Effect for Dataset Shifting Streams

    Authors: Baining Chen, Yiming Zhang, Yuqiao Han, Ruyue Zhang, Ruihuan Du, Zhishuo Zhou, Zhengdan Zhu, Xun Liu, Jiecheng Guo

    Abstract: Causal effect estimation has been widely used in marketing optimization. The framework of an uplift model followed by a constrained optimization algorithm is popular in practice. To enhance performance in the online environment, the framework needs to be improved to address the complexities caused by temporal dataset shift. This paper focuses on capturing the dataset shift from user behavior and d…

    Submitted 29 April, 2025; originally announced April 2025.

  22. arXiv:2504.20409  [pdf, other]

    cs.CV

    GarmentX: Autoregressive Parametric Representations for High-Fidelity 3D Garment Generation

    Authors: Jingfeng Guo, Jinnan Chen, Weikai Chen, Zhenyu Sun, Lanjiong Li, Baozhu Zhao, Lingting Zhu, Xin Wang, Qi Liu

    Abstract: This work presents GarmentX, a novel framework for generating diverse, high-fidelity, and wearable 3D garments from a single input image. Traditional garment reconstruction methods directly predict 2D pattern edges and their connectivity, an overly unconstrained approach that often leads to severe self-intersections and physically implausible garment structures. In contrast, GarmentX introduces a…

    Submitted 29 April, 2025; originally announced April 2025.

  23. arXiv:2504.20303  [pdf, other]

    cs.CV

    DeepAndes: A Self-Supervised Vision Foundation Model for Multi-Spectral Remote Sensing Imagery of the Andes

    Authors: Junlin Guo, James R. Zimmer-Dauphinee, Jordan M. Nieusma, Siqi Lu, Quan Liu, Ruining Deng, Can Cui, Jialin Yue, Yizhe Lin, Tianyuan Yao, Juming Xiong, Junchao Zhu, Chongyu Qu, Yuechen Yang, Mitchell Wilkes, Xiao Wang, Parker VanValkenburgh, Steven A. Wernke, Yuankai Huo

    Abstract: By mapping sites at large scales using remotely sensed data, archaeologists can generate unique insights into long-term demographic trends, inter-regional social networks, and past adaptations to climate change. Remote sensing surveys complement field-based approaches, and their reach can be especially great when combined with deep learning and computer vision techniques. However, conventional sup…

    Submitted 28 April, 2025; originally announced April 2025.

  24. arXiv:2504.20097  [pdf, other]

    cs.CV quant-ph

    Long-Distance Field Demonstration of Imaging-Free Drone Identification in Intracity Environments

    Authors: Junran Guo, Tonglin Mu, Keyuan Li, Jianing Li, Ziyang Luo, Ye Chen, Xiaodong Fan, Jinquan Huang, Minjie Liu, Jinbei Zhang, Ruoyang Qi, Naiting Gu, Shihai Sun

    Abstract: Detecting small objects, such as drones, over long distances presents a significant challenge with broad implications for security, surveillance, environmental monitoring, and autonomous systems. Traditional imaging-based methods rely on high-resolution image acquisition, but are often constrained by range, power consumption, and cost. In contrast, data-driven single-photon-single-pixel light dete…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: 15 pages, 9 figures

  25. arXiv:2504.19163  [pdf, other]

    cs.GR

    Bernstein Bounds for Caustics

    Authors: Zhimin Fan, Chen Wang, Yiming Wang, Boxuan Li, Yuxuan Guo, Ling-Qi Yan, Yanwen Guo, Jie Guo

    Abstract: Systematically simulating specular light transport requires an exhaustive search for primitive tuples containing admissible paths. Given the extreme inefficiency of enumerating all combinations, we propose to significantly reduce the search domain by sampling such tuples. The challenge is to design proper sampling probabilities that keep the noise level controllable. Our key insight is that by bou…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: ACM Transactions on Graphics (Proceedings of SIGGRAPH 2025)

  26. arXiv:2504.18768  [pdf, other]

    cs.GR cs.CV

    TransparentGS: Fast Inverse Rendering of Transparent Objects with Gaussians

    Authors: Letian Huang, Dongwei Ye, Jialin Dan, Chengzhi Tao, Huiwen Liu, Kun Zhou, Bo Ren, Yuanqi Li, Yanwen Guo, Jie Guo

    Abstract: The emergence of neural and Gaussian-based radiance field methods has led to considerable advancements in novel view synthesis and 3D object reconstruction. Nonetheless, specular reflection and refraction continue to pose significant challenges due to the instability and incorrect overfitting of radiance fields to high-frequency light variations. Currently, even 3D Gaussian Splatting (3D-GS), as a…

    Submitted 1 May, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

    Comments: accepted by SIGGRAPH 2025; https://letianhuang.github.io/transparentgs/

  27. arXiv:2504.18600  [pdf, other]

    q-fin.CP cs.AI cs.CE

    QuantBench: Benchmarking AI Methods for Quantitative Investment

    Authors: Saizhuo Wang, Hao Kong, Jiadong Guo, Fengrui Hua, Yiyan Qi, Wanyun Zhou, Jiahao Zheng, Xinyu Wang, Lionel M. Ni, Jian Guo

    Abstract: The field of artificial intelligence (AI) in quantitative investment has seen significant advancements, yet it lacks a standardized benchmark aligned with industry practices. This gap hinders research progress and limits the practical application of academic innovations. We present QuantBench, an industrial-grade benchmark platform designed to address this critical need. QuantBench offers three ke…

    Submitted 24 April, 2025; originally announced April 2025.

  28. arXiv:2504.18413  [pdf, other]

    cs.IR

    An Empirical Study of Evaluating Long-form Question Answering

    Authors: Ning Xian, Yixing Fan, Ruqing Zhang, Maarten de Rijke, Jiafeng Guo

    Abstract: Long-form question answering (LFQA) aims to generate lengthy answers to complex questions. This scenario presents great flexibility as well as significant challenges for evaluation. Most evaluations rely on deterministic metrics that depend on string or n-gram matching, while the reliability of large language model-based evaluations for long-form answers remains relatively unexplored. We address this gap by conducting an i…

    Submitted 25 April, 2025; originally announced April 2025.

  29. arXiv:2504.17515  [pdf, other]

    cs.CV

    Mamba-Sea: A Mamba-based Framework with Global-to-Local Sequence Augmentation for Generalizable Medical Image Segmentation

    Authors: Zihan Cheng, Jintao Guo, Jian Zhang, Lei Qi, Luping Zhou, Yinghuan Shi, Yang Gao

    Abstract: To segment medical images with distribution shifts, domain generalization (DG) has emerged as a promising setting to train models on source domains that can generalize to unseen target domains. Existing DG methods are mainly based on CNN or ViT architectures. Recently, advanced state space models, represented by Mamba, have shown promising results in various supervised medical image segmentation.…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE TMI 2025. The code is available at https://github.com/orange-czh/Mamba-Sea

  30. arXiv:2504.15494  [pdf, other]

    cs.HC cs.SE

    "Ohhh, He's the Boss!": Unpacking Power Dynamics Among Developers, Designers, and End-Users in FLOSS Usability

    Authors: Jazlyn Hellman, Itai Epstein, Jinghui Cheng, Jin L. C. Guo

    Abstract: Addressing usability in free, libre, and open-source software (FLOSS) is a challenging issue, particularly due to a long-existing "by developer, for developer" mentality. Engaging designers and end-users to work with developers can help improve its usability, but unequal power dynamics among those stakeholder roles must be mitigated. To explore how the power of different FLOSS stakeholders manifes…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 30 pages, 4 figures, Accepted to ACM CSCW 2025

  31. arXiv:2504.15415  [pdf, other]

    cs.CV cs.CL

    IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs

    Authors: David Ma, Yuanxing Zhang, Jincheng Ren, Jarvis Guo, Yifan Yao, Zhenlin Wei, Zhenzhu Yang, Zhongyuan Peng, Boyu Feng, Jun Ma, Xiao Gu, Zhoufutu Wen, King Zhu, Yancheng He, Meng Cao, Shiwen Ni, Jiaheng Liu, Wenhao Huang, Ge Zhang, Xiaojie Jin

    Abstract: Existing evaluation frameworks for Multimodal Large Language Models (MLLMs) primarily focus on image reasoning or general video understanding tasks, largely overlooking the significant role of image context in video comprehension. To bridge this gap, we propose IV-Bench, the first comprehensive benchmark for evaluating Image-Grounded Video Perception and Reasoning. IV-Bench consists of 967 videos…

    Submitted 21 April, 2025; originally announced April 2025.

  32. arXiv:2504.14848  [pdf, other]

    cs.CV cs.AI

    Object-Level Verbalized Confidence Calibration in Vision-Language Models via Semantic Perturbation

    Authors: Yunpu Zhao, Rui Zhang, Junbin Xiao, Ruibo Hou, Jiaming Guo, Zihao Zhang, Yifan Hao, Yunji Chen

    Abstract: Vision-language models (VLMs) excel in various multimodal tasks but frequently suffer from poor calibration, resulting in misalignment between their verbalized confidence and response correctness. This miscalibration undermines user trust, especially when models confidently provide incorrect or fabricated information. In this work, we propose a novel Confidence Calibration through Semantic Perturb…

    Submitted 21 April, 2025; originally announced April 2025.

  33. arXiv:2504.14804  [pdf, ps, other]

    cs.CL cs.AI

    Automatic Evaluation Metrics for Document-level Translation: Overview, Challenges and Trends

    Authors: Jiaxin GUO, Xiaoyu Chen, Zhiqiang Rao, Jinlong Yang, Zongyao Li, Hengchao Shang, Daimeng Wei, Hao Yang

    Abstract: With the rapid development of deep learning technologies, the field of machine translation has witnessed significant progress, especially with the advent of large language models (LLMs) that have greatly propelled the advancement of document-level translation. However, accurately evaluating the quality of document-level translation remains an urgent issue. This paper first introduces the developme…

    Submitted 20 April, 2025; originally announced April 2025.

  34. arXiv:2504.14600  [pdf, ps, other]

    cs.CV

    NTIRE 2025 Challenge on Real-World Face Restoration: Methods and Results

    Authors: Zheng Chen, Jingkai Wang, Kai Liu, Jue Gong, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Jianxing Zhang, Jinlong Wu, Jun Wang, Zheng Xie, Hakjae Jeon, Suejin Han, Hyung-Ju Chun, Hyunhee Park, Zhicun Yin, Junjie Chen, Ming Liu, Xiaoming Li, Chao Zhou, Wangmeng Zuo, Weixia Zhang, Dingquan Li, Kede Ma, et al. (29 additional authors not shown)

    Abstract: This paper provides a review of the NTIRE 2025 challenge on real-world face restoration, highlighting the proposed solutions and the resulting outcomes. The challenge focuses on generating natural, realistic outputs while maintaining identity consistency. Its goal is to advance state-of-the-art solutions for perceptual quality and realism, without imposing constraints on computational resources or…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: NTIRE 2025 webpage: https://www.cvlai.net/ntire/2025. Code: https://github.com/zhengchen1999/NTIRE2025_RealWorld_Face_Restoration

  35. arXiv:2504.11895  [pdf, other]

    cs.CV

    Search is All You Need for Few-shot Anomaly Detection

    Authors: Qishan Wang, Jia Guo, Shuyong Gao, Haofen Wang, Li Xiong, Junjie Hu, Hanqi Guo, Wenqiang Zhang

    Abstract: Few-shot anomaly detection (FSAD) has emerged as a crucial yet challenging task in industrial inspection, where normal distribution modeling must be accomplished with only a few normal images. While existing approaches typically employ multi-modal foundation models combining language and vision modalities for prompt-guided anomaly detection, these methods often demand sophisticated prompt engineer…

    Submitted 8 May, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

  36. arXiv:2504.11650  [pdf, ps, other]

    eess.SY cs.AI cs.LG math.NA

    Data driven approach towards more efficient Newton-Raphson power flow calculation for distribution grids

    Authors: Shengyuan Yan, Farzad Vazinram, Zeynab Kaseb, Lindsay Spoor, Jochen Stiasny, Betul Mamudi, Amirhossein Heydarian Ardakani, Ugochukwu Orji, Pedro P. Vergara, Yu Xiang, Jerry Guo

    Abstract: Power flow (PF) calculations are fundamental to power system analysis to ensure stable and reliable grid operation. The Newton-Raphson (NR) method is commonly used for PF analysis due to its rapid convergence when initialized properly. However, as power grids operate closer to their capacity limits, ill-conditioned cases and convergence issues pose significant challenges. This work, therefore, add…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 7 pages, 9 figures, 3 tables, 14 equations, 1 lemma, and 2 theorems. ICT for Industry 2025 Alliander usecase workshop paper. Oral presentation of this paper accepted and to be given on 16th April 2025 in ICT.OPEN 2025 conference of Netherlands in the Beatrix Theatre in Utrecht

    ACM Class: I.2.8

  37. arXiv:2504.10798  [pdf, other]

    cs.IT eess.SP

    AdapCsiNet: Environment-Adaptive CSI Feedback via Scene Graph-Aided Deep Learning

    Authors: Jiayi Liu, Jiajia Guo, Yiming Cui, Chao-Kai Wen, Shi Jin

    Abstract: Accurate channel state information (CSI) is critical for realizing the full potential of multiple-antenna wireless communication systems. While deep learning (DL)-based CSI feedback methods have shown promise in reducing feedback overhead, their generalization capability across varying propagation environments remains limited due to their data-driven nature. Existing solutions based on online trai…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 7 pages, 7 figures, submitted to IEEE conference for possible publication

  38. arXiv:2504.10078  [pdf, other]

    cs.CE

    Unleashing Expert Opinion from Social Media for Stock Prediction

    Authors: Wanyun Zhou, Saizhuo Wang, Xiang Li, Yiyan Qi, Jian Guo, Xiaowen Chu

    Abstract: While stock prediction task traditionally relies on volume-price and fundamental data to predict the return ratio or price movement trend, sentiment factors derived from social media platforms such as StockTwits offer a complementary and useful source of real-time market information. However, we find that most social media posts, along with the public sentiment they reflect, provide limited value…

    Submitted 14 April, 2025; originally announced April 2025.

  39. arXiv:2504.09527  [pdf, other]

    cs.CR

    A Secure Communication Protocol for Remote Keyless Entry System with Adaptive Adjustment of Transmission Parameters

    Authors: Jingjing Guo, Bo Tang, Jiayuan Xu, Qingyi Li, Yuyuan Qin, Xinghua Li

    Abstract: Remote Keyless Entry (RKE) systems have become a standard feature in modern vehicles, yet their unidirectional fixed-frequency radio communication renders them vulnerable to replay attacks, impersonation attacks, cryptanalysis, and intentional interference. Existing cryptographic authentication methods enhance security but often fail to address real-world constraints such as computational efficien…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 15 pages

    MSC Class: 94A60 (Primary); 68M10; 68P25 (Secondary) ACM Class: C.2.2

  40. arXiv:2504.09466  [pdf, other]

    cs.CR cs.CL

    AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender

    Authors: Weixiang Zhao, Jiahe Guo, Yulin Hu, Yang Deng, An Zhang, Xingyu Sui, Xinyang Han, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu

    Abstract: Despite extensive efforts in safety alignment, large language models (LLMs) remain vulnerable to jailbreak attacks. Activation steering offers a training-free defense method but relies on fixed steering coefficients, resulting in suboptimal protection and increased false rejections of benign inputs. To address this, we propose AdaSteer, an adaptive activation steering method that dynamically adjus…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 17 pages, 6 figures, 9 tables

  41. arXiv:2504.08600  [pdf, other]

    cs.DB

    SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

    Authors: Peixian Ma, Xialie Zhuang, Chengjin Xu, Xuhui Jiang, Ran Chen, Jian Guo

    Abstract: Natural Language to SQL (NL2SQL) enables intuitive interactions with databases by transforming natural language queries into structured SQL statements. Despite recent advancements in enhancing human-computer interaction within database applications, significant challenges persist, particularly regarding the inference performance in complex scenarios involving multi-table joins and nested queries.…

    Submitted 12 May, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

  42. arXiv:2504.08388  [pdf, other]

    cs.CV cs.AI

    MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft

    Authors: Junliang Guo, Yang Ye, Tianyu He, Haoyu Wu, Yushu Jiang, Tim Pearce, Jiang Bian

    Abstract: World modeling is a crucial task for enabling intelligent agents to effectively interact with humans and operate in dynamic environments. In this work, we propose MineWorld, a real-time interactive world model on Minecraft, an open-ended sandbox game which has been utilized as a common testbed for world modeling. MineWorld is driven by a visual-action autoregressive Transformer, which takes paired…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: Technical report. Project page https://aka.ms/mineworld

  43. arXiv:2504.07618  [pdf]

    cs.LG

    CTSR: Cartesian tensor-based sparse regression for data-driven discovery of high-dimensional invariant governing equations

    Authors: Boqian Zhang, Juanmian Lei, Guoyou Sun, Shuaibing Ding, Jian Guo

    Abstract: Accurate and concise governing equations are crucial for understanding system dynamics. Recently, data-driven methods such as sparse regression have been employed to automatically uncover governing equations from data, representing a significant shift from traditional first-principles modeling. However, most existing methods focus on scalar equations, limiting their applicability to simple, low-di…

    Submitted 10 April, 2025; originally announced April 2025.

  44. arXiv:2504.07158   

    cs.LG cs.CL

    Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models

    Authors: Ling Team, Caizhi Tang, Chilin Fu, Chunwei Wu, Jia Guo, Jianwen Wang, Jingyu Hu, Liang Jiang, Meng Li, Peng Jiao, Pingping Liu, Shaomian Zheng, Shiwei Liang, Shuaicheng Li, Yalin Zhang, Yingting Wu, Yongkang Liu, Zhenyu Huang

    Abstract: This technical report presents Ring-Lite-Distill, a lightweight reasoning model derived from our open-source Mixture-of-Experts (MoE) Large Language Model (LLM) Ling-Lite. This study demonstrates that through meticulous high-quality data curation and ingenious training paradigms, the compact MoE model Ling-Lite can be further trained to achieve exceptional reasoning capabilities, while maintaini…

    Submitted 10 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: Based on the further discussion of the working group, the current version is deemed unsuitable for release. We are currently undertaking further work that is expected to involve significant revisions, but this process will require some additional time. We plan to proceed with the release once these updates have been fully implemented.

  45. arXiv:2504.06895  [pdf, other]

    cs.CV

    ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities

    Authors: Dingkun Yan, Xinrui Wang, Yusuke Iwasawa, Yutaka Matsuo, Suguru Saito, Jiaxian Guo

    Abstract: Reference-based sketch colorization methods have garnered significant attention due to their potential applications in the animation production industry. However, most existing methods are trained with image triplets of sketch, reference, and ground truth that are semantically and spatially well-aligned, while real-world references and sketches often exhibit substantial misalignment. This mismatch…

    Submitted 9 April, 2025; originally announced April 2025.

  46. arXiv:2504.06878  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    CRYSIM: Prediction of Symmetric Structures of Large Crystals with GPU-based Ising Machines

    Authors: Chen Liang, Diptesh Das, Jiang Guo, Ryo Tamura, Zetian Mao, Koji Tsuda

    Abstract: Solving black-box optimization problems with Ising machines is increasingly common in materials science. However, their application to crystal structure prediction (CSP) is still ineffective due to symmetry-agnostic encoding of atomic coordinates. We introduce CRYSIM, an algorithm that encodes the space group, the Wyckoff positions combination, and coordinates of independent atomic sites as separa…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 18 pages, 4 figures, 1 table

  47. arXiv:2504.06551  [pdf, other]

    cs.IR

    Bridging Queries and Tables through Entities in Table Retrieval

    Authors: Da Li, Keping Bi, Jiafeng Guo, Xueqi Cheng

    Abstract: Table retrieval is essential for accessing information stored in structured tabular formats; however, it remains less explored than text retrieval. The content of the table primarily consists of phrases and words, which include a large number of entities, such as time, locations, persons, and organizations. Entities are well-studied in the context of text retrieval, but there is a noticeable lack…

    Submitted 8 April, 2025; originally announced April 2025.

  48. arXiv:2504.06511  [pdf, other]

    cs.LG

    GTS-LUM: Reshaping User Behavior Modeling with LLMs in Telecommunications Industry

    Authors: Liu Shi, Tianwu Zhou, Wei Xu, Li Liu, Zhexin Cui, Shaoyi Liang, Haoxing Niu, Yichong Tian, Jianwei Guo

    Abstract: As telecommunication service providers shift their focus to analyzing user behavior for package design and marketing interventions, a critical challenge lies in developing a unified, end-to-end framework capable of modeling long-term and periodic user behavior sequences with diverse time granularities, multi-modal data inputs, and heterogeneous labels. This paper introduces GTS-LUM, a novel use…

    Submitted 8 April, 2025; originally announced April 2025.

  49. arXiv:2504.05831  [pdf, other]

    cs.CL

    Leveraging Robust Optimization for LLM Alignment under Distribution Shifts

    Authors: Mingye Zhu, Yi Liu, Junbo Guo, Quan Wang, Yongdong Zhang, Zhendong Mao

    Abstract: Large language models (LLMs) increasingly rely on preference alignment methods to steer outputs toward human values, yet these methods are often constrained by the scarcity of high-quality human-annotated data. To tackle this, recent approaches have turned to synthetic data generated by LLMs as a scalable alternative. However, synthetic data can introduce distribution shifts, compromising the nuan…

    Submitted 8 April, 2025; originally announced April 2025.

  50. arXiv:2504.05541  [pdf, other]

    cs.CV

    Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

    Authors: Yunlong Tang, Jing Bi, Chao Huang, Susan Liang, Daiki Shimada, Hang Hua, Yunzhong Xiao, Yizhi Song, Pinxin Liu, Mingqian Feng, Junjia Guo, Zhuo Liu, Luchuan Song, Ali Vosoughi, Jinxi He, Liu He, Zeliang Zhang, Jiebo Luo, Chenliang Xu

    Abstract: We present CAT-V (Caption AnyThing in Video), a training-free framework for fine-grained object-centric video captioning that enables detailed descriptions of user-selected objects through time. CAT-V integrates three key components: a Segmenter based on SAMURAI for precise object segmentation across frames, a Temporal Analyzer powered by TRACE-Uni for accurate event boundary detection and tempora…

    Submitted 8 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.