Showing 1–50 of 201 results for author: Zhan, J

Searching in archive cs.
  1. arXiv:2506.13846 [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Fake it till You Make it: Reward Modeling as Discriminative Prediction

    Authors: Runtao Liu, Jiahao Zhan, Yingqing He, Chen Wei, Alan Yuille, Qifeng Chen

    Abstract: An effective reward model plays a pivotal role in reinforcement learning for post-training enhancement of visual generative models. However, current approaches of reward modeling suffer from implementation complexity due to their reliance on extensive human-annotated preference data or meticulously engineered quality dimensions that are often incomplete and engineering-intensive. Inspired by adver…

    Submitted 26 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  2. arXiv:2506.09409 [pdf, other]

    cs.IR

    MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed

    Authors: Jiaqi Samantha Zhan, Crystina Zhang, Shengyao Zhuang, Xueguang Ma, Jimmy Lin

    Abstract: Effective video retrieval remains challenging due to the complexity of integrating visual, auditory, and textual modalities. In this paper, we explore unified retrieval methods using OmniEmbed, a powerful multimodal embedding model from the Tevatron 2.0 toolkit, in the context of the MAGMaR shared task. Evaluated on the comprehensive MultiVENT 2.0 dataset, OmniEmbed generates unified embeddings fo…

    Submitted 11 June, 2025; originally announced June 2025.

  3. arXiv:2506.09108 [pdf, ps, other]

    cs.LG cs.AI cs.CL

    SensorLM: Learning the Language of Wearable Sensors

    Authors: Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, Girish Narayanswamy, Maxwell A. Xu, Ahmed A. Metwally, Shawn Xu, Jake Garrison, Xuhai Xu, Tim Althoff, Yun Liu, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Cecilia Mascolo, Xin Liu, Daniel McDuff, Yuzhe Yang

    Abstract: We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language. Despite its pervasive nature, aligning and interpreting sensor data with language remains challenging due to the lack of paired, richly annotated sensor-text descriptions in uncurated, real-world wearable data. We introduce a hierarchical caption generation pipel…

    Submitted 10 June, 2025; originally announced June 2025.

  4. arXiv:2506.06704 [pdf, ps, other]

    cs.CL cs.IR

    Dynamic and Parametric Retrieval-Augmented Generation

    Authors: Weihang Su, Qingyao Ai, Jingtao Zhan, Qian Dong, Yiqun Liu

    Abstract: Retrieval-Augmented Generation (RAG) has become a foundational paradigm for equipping large language models (LLMs) with external knowledge, playing a critical role in information retrieval and knowledge-intensive applications. However, conventional RAG systems typically adopt a static retrieve-then-generate pipeline and rely on in-context knowledge injection, which can be suboptimal for complex ta…

    Submitted 7 June, 2025; originally announced June 2025.

  5. arXiv:2506.00908 [pdf, ps, other]

    cs.CV

    DS-VTON: High-Quality Virtual Try-on via Disentangled Dual-Scale Generation

    Authors: Xianbing Sun, Yan Hong, Jiahui Zhan, Jun Lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang

    Abstract: Despite recent progress, most existing virtual try-on methods still struggle to simultaneously address two core challenges: accurately aligning the garment image with the target human body, and preserving fine-grained garment textures and patterns. In this paper, we propose DS-VTON, a dual-scale virtual try-on framework that explicitly disentangles these objectives for more effective modeling. DS-…

    Submitted 1 June, 2025; originally announced June 2025.

  6. arXiv:2505.17745 [pdf, ps, other]

    cs.LG cs.AI cs.NE

    MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization

    Authors: Zeyuan Ma, Yue-Jiao Gong, Hongshu Guo, Wenjie Qiu, Sijie Ma, Hongqiao Lian, Jiajun Zhan, Kaixu Chen, Chen Wang, Zhiyang Huang, Zechuan Huang, Guojun Peng, Ran Cheng, Yining Ma

    Abstract: Meta-Black-Box Optimization (MetaBBO) streamlines the automation of optimization algorithm design through meta-learning. It typically employs a bi-level structure: the meta-level policy undergoes meta-training to reduce the manual effort required in developing algorithms for low-level optimization tasks. The original MetaBox (2023) provided the first open-source framework for reinforcement learnin…

    Submitted 23 May, 2025; originally announced May 2025.

  7. arXiv:2505.13886 [pdf, ps, other]

    cs.CL

    Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning

    Authors: Jingqi Tong, Jixin Tang, Hangcheng Li, Yurong Mou, Ming Zhang, Jun Zhao, Yanbo Wen, Fan Song, Jiahao Zhan, Yuyang Lu, Chaoran Tao, Zhiyuan Guo, Jizhou Yu, Tianhao Cheng, Changhao Jiang, Zhen Wang, Tao Liang, Zhihui Fei, Mingyang Wan, Guojun Ma, Weifeng Ge, Guanhua Chen, Tao Gui, Xipeng Qiu, Qi Zhang, et al. (1 additional author not shown)

    Abstract: Visual-language Chain-of-Thought (CoT) data resources are relatively scarce compared to text-only counterparts, limiting the improvement of reasoning capabilities in Vision Language Models (VLMs). However, high-quality vision-language reasoning data is expensive and labor-intensive to annotate. To address this issue, we leverage a promising resource: game code, which naturally contains logical str…

    Submitted 3 July, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: 63 pages, 23 figures, submitted to NeurIPS 2025

    ACM Class: I.2.7; I.2.10

  8. arXiv:2505.09087 [pdf]

    q-bio.BM cs.LG

    A Comparative Review of RNA Language Models

    Authors: He Wang, Yikun Zhang, Jie Chen, Jian Zhan, Yaoqi Zhou

    Abstract: Given usefulness of protein language models (LMs) in structure and functional inference, RNA LMs have received increased attentions in the last few years. However, these RNA models are often not compared against the same standard. Here, we divided RNA LMs into three classes (pretrained on multiple RNA types (especially noncoding RNAs), specific-purpose RNAs, and LMs that unify RNA with DNA or prot…

    Submitted 13 May, 2025; originally announced May 2025.

  9. arXiv:2505.07062 [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  10. arXiv:2505.02466 [pdf, other]

    cs.IR

    Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality

    Authors: Xueguang Ma, Luyu Gao, Shengyao Zhuang, Jiaqi Samantha Zhan, Jamie Callan, Jimmy Lin

    Abstract: Recent advancements in large language models (LLMs) have driven interest in billion-scale retrieval models with strong generalization across retrieval tasks and languages. Additionally, progress in large vision-language models has created new opportunities for multimodal retrieval. In response, we have updated the Tevatron toolkit, introducing a unified pipeline that enables researchers to explore…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Accepted in SIGIR 2025 (Demo)

  11. arXiv:2504.18361 [pdf, other]

    cs.CV cs.AI

    COCO-Inpaint: A Benchmark for Image Inpainting Detection and Manipulation Localization

    Authors: Haozhen Yan, Yan Hong, Jiahui Zhan, Yikun Ji, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang

    Abstract: Recent advancements in image manipulation have achieved unprecedented progress in generating photorealistic content, but also simultaneously eliminating barriers to arbitrary manipulation and editing, raising concerns about multimedia authenticity and cybersecurity. However, existing Image Manipulation Detection and Localization (IMDL) methodologies predominantly focus on splicing or copy-move for…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: 10 pages, 3 figures

  12. arXiv:2504.15577 [pdf]

    cs.NI cs.LG

    State-Aware IoT Scheduling Using Deep Q-Networks and Edge-Based Coordination

    Authors: Qingyuan He, Chang Liu, Juecen Zhan, Weiqiang Huang, Ran Hao

    Abstract: This paper addresses the challenge of energy efficiency management faced by intelligent IoT devices in complex application environments. A novel optimization method is proposed, combining Deep Q-Network (DQN) with an edge collaboration mechanism. The method builds a state-action-reward interaction model and introduces edge nodes as intermediaries for state aggregation and policy scheduling. This e…

    Submitted 22 April, 2025; originally announced April 2025.

  13. arXiv:2504.14245 [pdf, other]

    cs.CV cs.CL

    Towards Explainable Fake Image Detection with Multi-Modal Large Language Models

    Authors: Yikun Ji, Yan Hong, Jiahui Zhan, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang

    Abstract: Progress in image generation raises significant public security concerns. We argue that fake image detection should not operate as a "black box". Instead, an ideal approach must ensure both strong generalization and transparency. Recent progress in Multi-modal Large Language Models (MLLMs) offers new opportunities for reasoning-based AI-generated image detection. In this work, we evaluate the capa…

    Submitted 19 April, 2025; originally announced April 2025.

    ACM Class: I.2.7; I.2.10

  14. arXiv:2504.09344 [pdf]

    cs.LG

    Context-Aware Adaptive Sampling for Intelligent Data Acquisition Systems Using DQN

    Authors: Weiqiang Huang, Juecen Zhan, Yumeng Sun, Xu Han, Tai An, Nan Jiang

    Abstract: Multi-sensor systems are widely used in the Internet of Things, environmental monitoring, and intelligent manufacturing. However, traditional fixed-frequency sampling strategies often lead to severe data redundancy, high energy consumption, and limited adaptability, failing to meet the dynamic sensing needs of complex environments. To address these issues, this paper proposes a DQN-based multi-sen…

    Submitted 12 April, 2025; originally announced April 2025.

  15. arXiv:2504.07307 [pdf, ps, other]

    cs.LG stat.ML

    Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems

    Authors: Jingxin Zhan, Yuchen Xin, Chenjie Sun, Zhihua Zhang

    Abstract: We consider a common case of the combinatorial semi-bandit problem, the $m$-set semi-bandit, where the learner exactly selects $m$ arms from the total $d$ arms. In the adversarial setting, the best regret bound, known to be $\mathcal{O}(\sqrt{nmd})$ for time horizon $n$, is achieved by the well-known Follow-the-Regularized-Leader (FTRL) policy. However, this requires to explicitly compute the arm-…

    Submitted 7 July, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  16. arXiv:2504.05137 [pdf, other]

    cs.CV

    BoxSeg: Quality-Aware and Peer-Assisted Learning for Box-supervised Instance Segmentation

    Authors: Jinxiang Lai, Wenlong Wu, Jiawei Zhan, Jian Li, Bin-Bin Gao, Jun Liu, Jie Zhang, Song Guo

    Abstract: Box-supervised instance segmentation methods aim to achieve instance segmentation with only box annotations. Recent methods have demonstrated the effectiveness of acquiring high-quality pseudo masks under the teacher-student framework. Building upon this foundation, we propose a BoxSeg framework involving two novel and general modules named the Quality-Aware Module (QAM) and the Peer-assisted Copy…

    Submitted 7 April, 2025; originally announced April 2025.

  17. arXiv:2503.09030 [pdf, other]

    cs.LG cs.CV

    Adaptive Temperature Based on Logits Correlation in Knowledge Distillation

    Authors: Kazuhiro Matsuyama, Usman Anjum, Satoko Matsuyama, Tetsuo Shoda, Justin Zhan

    Abstract: Knowledge distillation is a technique to imitate a performance that a deep learning model has, but reduce the size on another model. It applies the outputs of a model to train another model having comparable accuracy. These two distinct models are similar to the way information is delivered in human society, with one acting as the "teacher" and the other as the "student". Softmax plays a role in c…

    Submitted 11 March, 2025; originally announced March 2025.

  18. arXiv:2503.08638 [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    YuE: Scaling Open Foundation Models for Long-Form Music Generation

    Authors: Ruibin Yuan, Hanfeng Lin, Shuyue Guo, Ge Zhang, Jiahao Pan, Yongyi Zang, Haohe Liu, Yiming Liang, Wenye Ma, Xingjian Du, Xinrun Du, Zhen Ye, Tianyu Zheng, Yinghao Ma, Minghao Liu, Zeyue Tian, Ziya Zhou, Liumeng Xue, Xingwei Qu, Yizhi Li, Shangda Wu, Tianhao Shen, Ziyang Ma, Jun Zhan, Chunhui Wang, et al. (32 additional authors not shown)

    Abstract: We tackle the task of long-form music generation, particularly the challenging lyrics-to-song problem, by introducing YuE, a family of open foundation models based on the LLaMA2 architecture. Specifically, YuE scales to trillions of tokens and generates up to five minutes of music while maintaining lyrical alignment, coherent musical structure, and engaging vocal melodies with appropriate…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: https://github.com/multimodal-art-projection/YuE

  19. arXiv:2503.08516 [pdf, other]

    cs.CV

    High-Quality 3D Head Reconstruction from Any Single Portrait Image

    Authors: Jianfu Zhang, Yujie Gao, Jiahui Zhan, Wentao Wang, Yiyi Zhang, Haohua Zhao, Liqing Zhang

    Abstract: In this work, we introduce a novel high-fidelity 3D head reconstruction method from a single portrait image, regardless of perspective, expression, or accessories. Despite significant efforts in adapting 2D generative models for novel view synthesis and 3D optimization, most methods struggle to produce high-quality 3D portraits. The lack of crucial information, such as identity, expression, hair,…

    Submitted 18 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  20. arXiv:2502.20682 [pdf, other]

    cs.CL cs.AI

    Fine-tuning BERT with Bidirectional LSTM for Fine-grained Movie Reviews Sentiment Analysis

    Authors: Gibson Nkhata, Susan Gauch, Usman Anjum, Justin Zhan

    Abstract: Sentiment Analysis (SA) is instrumental in understanding peoples viewpoints facilitating social media monitoring recognizing products and brands and gauging customer satisfaction. Consequently SA has evolved into an active research domain within Natural Language Processing (NLP). Many approaches outlined in the literature devise intricate frameworks aimed at achieving high accuracy, focusing exclu…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 14 pages, 5 figures, published in International Journal On Advances in Systems and Measurements, volume 16, numbers 3 and 4, 2023

  21. arXiv:2502.18858 [pdf, other]

    cs.AI cs.CL cs.CV cs.IR cs.LG

    Evaluating Intelligence via Trial and Error

    Authors: Jingtao Zhan, Jiahao Zhao, Jiayu Li, Yiqun Liu, Bo Zhang, Qingyao Ai, Jiaxin Mao, Hongning Wang, Min Zhang, Shaoping Ma

    Abstract: Intelligence is a crucial trait for species to find solutions within a limited number of trial-and-error attempts. Building on this idea, we introduce Survival Game as a framework to evaluate intelligence based on the number of failed attempts in a trial-and-error process. Fewer failures indicate higher intelligence. When the expectation and variance of failure counts are both finite, it signals t…

    Submitted 3 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  22. arXiv:2502.18841 [pdf, other]

    cs.CL

    Sentiment Analysis of Movie Reviews Using BERT

    Authors: Gibson Nkhata, Usman Anjum, Justin Zhan

    Abstract: Sentiment Analysis (SA) or opinion mining is analysis of emotions and opinions from any kind of text. SA helps in tracking peoples viewpoints and it is an important factor when it comes to social media monitoring product and brand recognition customer satisfaction customer loyalty advertising and promotions success and product acceptance. That is why SA is one of the active research areas in Natur…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 7 pages, 3 figures, published in the proceedings The Fifteenth International Conference on Information, Process, and Knowledge Management (eKNOW 2023)

  23. arXiv:2501.11127 [pdf, other]

    math.OC cs.LG stat.ML

    A Regularized Online Newton Method for Stochastic Convex Bandits with Linear Vanishing Noise

    Authors: Jingxin Zhan, Yuchen Xin, Kaicheng Jin, Zhihua Zhang

    Abstract: We study a stochastic convex bandit problem where the subgaussian noise parameter is assumed to decrease linearly as the learner selects actions closer and closer to the minimizer of the convex loss function. Accordingly, we propose a Regularized Online Newton Method (RONM) for solving the problem, based on the Online Newton Method (ONM) of arXiv:2406.06506. Our RONM reaches a polylogarithmic regr…

    Submitted 19 January, 2025; originally announced January 2025.

  24. arXiv:2501.02842 [pdf, other]

    cs.IR cs.LG

    Foundations of GenIR

    Authors: Qingyao Ai, Jingtao Zhan, Yiqun Liu

    Abstract: The chapter discusses the foundational impact of modern generative AI models on information access (IA) systems. In contrast to traditional AI, the large-scale training and superior data modeling of generative AI models enable them to produce high-quality, human-like responses, which brings brand new opportunities for the development of IA paradigms. In this chapter, we identify and introduce two…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: Chapter 2 of the book on Information Access in the Era of Generative AI

  25. arXiv:2501.00473 [pdf, other]

    cs.DL

    Quantifying the Dynamics of Harm Caused by Retracted Research

    Authors: Yunyou Huang, Jiahui Zhao, Dandan Cui, Zhengxin Yang, Bingjie Xia, Qi Liang, Wenjing Liu, Li Ma, Suqin Tang, Tianyong Hao, Zhifei Zhang, Wanling Gao, Jianfeng Zhan

    Abstract: Despite enormous efforts devoted to understand the characteristics and impacts of retracted papers, little is known about the mechanisms underlying the dynamics of their harm and the dynamics of its propagation. Here, we propose a citation-based framework to quantify the harm caused by retracted papers, aiming to uncover why their harm persists and spreads so widely. We uncover an ''attention esca…

    Submitted 18 February, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

  26. arXiv:2501.00254 [pdf, other]

    cs.AI cs.CL

    Automatically Planning Optimal Parallel Strategy for Large Language Models

    Authors: Zongbiao Li, Xiezhao Li, Yinghao Cui, Yijun Chen, Zhixuan Gu, Yuxuan Liu, Wenbo Zhu, Fei Jia, Ke Liu, Qifeng Li, Junyao Zhan, Jiangtao Zhou, Chenxi Zhang, Qike Liu

    Abstract: The number of parameters in large-scale language models based on transformers is gradually increasing, and the scale of computing clusters is also growing. The technology of quickly mobilizing large amounts of computing resources for parallel computing is becoming increasingly important. In this paper, we propose an automatic parallel algorithm that automatically plans the parallel strategy with m…

    Submitted 30 December, 2024; originally announced January 2025.

  27. arXiv:2412.15674 [pdf, other]

    cs.CV

    PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium

    Authors: Xinzhe Li, Jiahui Zhan, Shengfeng He, Yangyang Xu, Junyu Dong, Huaidong Zhang, Yong Du

    Abstract: Personalized image generation has made significant strides in adapting content to novel concepts. However, a persistent challenge remains: balancing the accurate reconstruction of unseen concepts with the need for editability according to the prompt, especially when dealing with the complex nuances of facial features. In this study, we delve into the temporal dynamics of the text-to-image conditio…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: This paper is accepted by AAAI 2025. The code is available at https://github.com/xzhe-Vision/PersonaMagic

  28. arXiv:2412.06335 [pdf, other]

    cs.DB

    StructRide: A Framework to Exploit the Structure Information of Shareability Graph in Ridesharing

    Authors: Jiexi Zhan, Yu Chen, Peng Cheng, Lei Chen, Wangze Ni, Xuemin Lin

    Abstract: Ridesharing services play an essential role in modern transportation, which significantly reduces traffic congestion and exhaust pollution. In the ridesharing problem, improving the sharing rate between riders can not only save the travel cost of drivers but also utilize vehicle resources more efficiently. The existing online-based and batch-based methods for the ridesharing problem lack the analy…

    Submitted 11 December, 2024; v1 submitted 9 December, 2024; originally announced December 2024.

    Comments: ICDE 2025

  29. arXiv:2412.05882 [pdf, other]

    cs.LG cs.AI

    Towards Modeling Data Quality and Machine Learning Model Performance

    Authors: Usman Anjum, Chris Trentman, Elrod Caden, Justin Zhan

    Abstract: Understanding the effect of uncertainty and noise in data on machine learning models (MLM) is crucial in developing trust and measuring performance. In this paper, a new model is proposed to quantify uncertainties and noise in data on MLMs. Using the concept of signal-to-noise ratio (SNR), a new metric called deterministic-non-deterministic ratio (DDR) is proposed to formulate performance of a mod…

    Submitted 8 December, 2024; originally announced December 2024.

  30. arXiv:2411.08494 [pdf, other]

    cs.PF

    Achieving Consistent and Comparable CPU Evaluation Outcomes

    Authors: Chenxi Wang, Lei Wang, Wanling Gao, Yikang Yang, Yutong Zhou, Jianfeng Zhan

    Abstract: The SPEC CPU2017 benchmark suite is an industry standard for accessing CPU performance. It adheres strictly to some workload and system configurations - arbitrary specificity - while leaving other system configurations undefined - arbitrary ambiguity. This article reveals: (1) Arbitrary specificity proves not meaningful, obscuring many scenarios, as evidenced by significant performance variations,…

    Submitted 13 November, 2024; originally announced November 2024.

  31. arXiv:2410.20130 [pdf]

    cs.HC cs.CY

    The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships

    Authors: Renwen Zhang, Han Li, Han Meng, Jinyuan Zhan, Hongyuan Gan, Yi-Chieh Lee

    Abstract: As conversational AI systems increasingly permeate the socio-emotional realms of human life, they bring both benefits and risks to individuals and society. Despite extensive research on detecting and categorizing harms in AI systems, less is known about the harms that arise from social interactions with AI chatbots. Through a mixed-methods analysis of 35,390 conversation excerpts shared on r/repli…

    Submitted 26 January, 2025; v1 submitted 26 October, 2024; originally announced October 2024.

  32. arXiv:2410.15774 [pdf, other]

    cs.RO cs.CV

    Generalizing Motion Planners with Mixture of Experts for Autonomous Driving

    Authors: Qiao Sun, Huimin Wang, Jiahao Zhan, Fan Nie, Xin Wen, Leimeng Xu, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

    Abstract: Large real-world driving datasets have sparked significant research into various aspects of data-driven motion planners for autonomous driving. These include data augmentation, model architecture, reward design, training strategies, and planner pipelines. These planners promise better generalizations on complicated and few-shot cases than previous methods. However, experiment results show that man…

    Submitted 29 October, 2024; v1 submitted 21 October, 2024; originally announced October 2024.

    Comments: 7 pages, 3 figures

  33. arXiv:2410.13638 [pdf, other]

    cs.LG cs.AI cs.HC

    Scaling Wearable Foundation Models

    Authors: Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, Shun Liao, Jake Garrison, Shyam Tailor, Jake Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, Daniel McDuff

    Abstract: Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful repre…

    Submitted 17 October, 2024; originally announced October 2024.

  34. arXiv:2409.15670 [pdf, other]

    cs.CR cs.NE

    Data Poisoning-based Backdoor Attack Framework against Supervised Learning Rules of Spiking Neural Networks

    Authors: Lingxin Jin, Meiyu Lin, Wei Jiang, Jinyu Zhan

    Abstract: Spiking Neural Networks (SNNs), the third generation neural networks, are known for their low energy consumption and high robustness. SNNs are developing rapidly and can compete with Artificial Neural Networks (ANNs) in many fields. To ensure that the widespread use of SNNs does not cause serious security incidents, much research has been conducted to explore the robustness of SNNs under adversari…

    Submitted 23 September, 2024; originally announced September 2024.

  35. arXiv:2408.12817 [pdf, other]

    cs.LG physics.chem-ph

    Data-Driven Parametrization of Molecular Mechanics Force Fields for Expansive Chemical Space Coverage

    Authors: Tianze Zheng, Ailun Wang, Xu Han, Yu Xia, Xingyuan Xu, Jiawei Zhan, Yu Liu, Yang Chen, Zhi Wang, Xiaojie Wu, Sheng Gong, Wen Yan

    Abstract: A force field is a critical component in molecular dynamics simulations for computational drug discovery. It must achieve high accuracy within the constraints of molecular mechanics' (MM) limited functional forms, which offers high computational efficiency. With the rapid expansion of synthetically accessible chemical space, traditional look-up table approaches face significant challenges. In this…

    Submitted 8 October, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: ByteFF, a machine learning parametrized MMFF. Code available at https://github.com/bytedance/byteff

  36. arXiv:2408.12158 [pdf, other]

    cs.CE cs.CY

    Could Bibliometrics Reveal Top Science and Technology Achievements and Researchers? The Case for Evaluatology-based Science and Technology Evaluation

    Authors: Guoxin Kang, Wanling Gao, Lei Wang, Chunjie Luo, Hainan Ye, Qian He, Shaopeng Dai, Jianfeng Zhan

    Abstract: By utilizing statistical methods to analyze bibliographic data, bibliometrics faces inherent limitations in identifying the most significant science and technology achievements and researchers. To overcome this challenge, we present an evaluatology-based science and technology evaluation methodology. At the heart of this approach lies the concept of an extended evaluation condition, encompassing e…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 18 pages, 8 figures, and 2 tables

  37. arXiv:2408.08920 [pdf, other]

    cs.CR cs.CV

    A Survey of Trojan Attacks and Defenses to Deep Neural Networks

    Authors: Lingxin Jin, Xianyu Wen, Wei Jiang, Jinyu Zhan

    Abstract: Deep Neural Networks (DNNs) have found extensive applications in safety-critical artificial intelligence systems, such as autonomous driving and facial recognition systems. However, recent research has revealed their susceptibility to Neural Network Trojans (NN Trojans) maliciously injected by adversaries. This vulnerability arises due to the intricate architecture and opacity of DNNs, resulting i…

    Submitted 15 August, 2024; originally announced August 2024.

  38. arXiv:2408.03281 [pdf, other]

    cs.CL cs.AI cs.LG

    StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation

    Authors: Boxi Cao, Mengjie Ren, Hongyu Lin, Xianpei Han, Feng Zhang, Junfeng Zhan, Le Sun

    Abstract: Evaluation is the baton for the development of large language models. Current evaluations typically employ a single-item assessment paradigm for each atomic test objective, which struggles to discern whether a model genuinely possesses the required capabilities or merely memorizes/guesses the answers to specific questions. To this end, we propose a novel evaluation framework referred to as StructE…

    Submitted 6 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: ACL 2024; Benchmark at https://github.com/c-box/StructEval; Leaderboard at https://huggingface.co/spaces/Bowieee/StructEval_leaderboard

  39. arXiv:2407.08554 [pdf, other]

    cs.AI cs.HC

    Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

    Authors: Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

    Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of cl…

    Submitted 28 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 24 pages

  40. arXiv:2407.01905 [pdf, other]

    cs.CV

    Enhancing Multi-Class Anomaly Detection via Diffusion Refinement with Dual Conditioning

    Authors: Jiawei Zhan, Jinxiang Lai, Bin-Bin Gao, Jun Liu, Xiaochen Chen, Chengjie Wang

    Abstract: Anomaly detection, the technique of identifying abnormal samples using only normal samples, has attracted widespread interest in industry. Existing one-model-per-category methods often struggle with limited generalization capabilities due to their focus on a single category, and can fail when encountering variations in product. Recent feature reconstruction methods, as representatives in one-model…

    Submitted 1 July, 2024; originally announced July 2024.

  41. arXiv:2407.00247 [pdf, other]

    cs.CV

    Prompt Refinement with Image Pivot for Text-to-Image Generation

    Authors: Jingtao Zhan, Qingyao Ai, Yiqun Liu, Yingwei Pan, Ting Yao, Jiaxin Mao, Shaoping Ma, Tao Mei

    Abstract: For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from "user languages" into "system languages". However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement mod…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024

  42. arXiv:2406.15132  [pdf, other]

    cs.LG cs.AI

    Younger: The First Dataset for Artificial Intelligence-Generated Neural Network Architecture

    Authors: Zhengxin Yang, Wanling Gao, Luzhou Peng, Yunyou Huang, Fei Tang, Jianfeng Zhan

    Abstract: Designing and optimizing neural network architectures typically requires extensive expertise, starting with handcrafted designs and then manual or automated refinement. This dependency presents a significant barrier to rapid innovation. Recognizing the complexity of automatically generating neural network architecture from scratch, we introduce Younger, a pioneering dataset to advance this ambitio…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 31 pages, 29 figures, 11 tables

  43. arXiv:2406.07362  [pdf, other]

    cs.HC

    AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database

    Authors: Wanling Gao, Yuan Liu, Zhuoming Yu, Dandan Cui, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Fan Huang, Gangyuan Zhao, Chongrong Jiang, Tianyi Wei, Zhifei Zhang, Yunyou Huang, Jianfeng Zhan

    Abstract: Artificial Intelligence (AI) plays a crucial role in medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI f…

    Submitted 28 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 12 pages

  44. arXiv:2406.06474  [pdf, other]

    cs.AI cs.CL

    Towards a Personal Health Large Language Model

    Authors: Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra, et al. (9 additional authors not shown)

    Abstract: In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 72 pages

  45. arXiv:2406.06464  [pdf, other]

    cs.AI cs.CL

    Transforming Wearable Data into Health Insights using Large Language Model Agents

    Authors: Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y. McLean, Mark Malhotra, Shwetak Patel, Jiening Zhan, Tim Althoff, Daniel McDuff, Xin Liu

    Abstract: Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 38 pages

  46. arXiv:2405.12491  [pdf, other]

    cs.SE

    Bridging the Gap Between Domain-specific Frameworks and Multiple Hardware Devices

    Authors: Xu Wen, Wanling Gao, Lei Wang, Jianfeng Zhan

    Abstract: The rapid development of domain-specific frameworks has presented us with a significant challenge: The current approach of implementing solutions on a case-by-case basis incurs a theoretical complexity of O(M*N), thereby increasing the cost of porting applications to different hardware platforms. To address these challenges, we propose a systematic methodology that effectively bridges the gap betw…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 15 pages, 8 figures

  47. arXiv:2405.11427  [pdf, other]

    quant-ph cs.LG eess.SP eess.SY math.OC

    Quantum Neural Networks for Solving Power System Transient Simulation Problem

    Authors: Mohammadreza Soltaninia, Junpeng Zhan

    Abstract: Quantum computing, leveraging principles of quantum mechanics, represents a transformative approach in computational methodologies, offering significant enhancements over traditional classical systems. This study tackles the complex and computationally demanding task of simulating power system transients through solving differential algebraic equations (DAEs). We introduce two novel Quantum Neural…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 10 pages, 11 figures

  48. arXiv:2404.00021  [pdf, other]

    cs.HC cs.CE cs.CY cs.PF

    Evaluatology: The Science and Engineering of Evaluation

    Authors: Jianfeng Zhan, Lei Wang, Wanling Gao, Hongxiao Li, Chenxi Wang, Yunyou Huang, Yatao Li, Zhengxin Yang, Guoxin Kang, Chunjie Luo, Hainan Ye, Shaopeng Dai, Zhifei Zhang

    Abstract: Evaluation is a crucial aspect of human existence and plays a vital role in various fields. However, it is often approached in an empirical and ad-hoc manner, lacking consensus on universal concepts, terminologies, theories, and methodologies. This lack of agreement has significant repercussions. This article aims to formally introduce the discipline of evaluatology, which encompasses the science…

    Submitted 19 March, 2024; originally announced April 2024.

    Comments: 29 pages, 16 figures, and 2 tables

  49. arXiv:2403.19716  [pdf, other]

    cs.CL cs.AI cs.CV cs.IR

    Capability-aware Prompt Reformulation Learning for Text-to-Image Generation

    Authors: Jingtao Zhan, Qingyao Ai, Yiqun Liu, Jia Chen, Shaoping Ma

    Abstract: Text-to-image generation systems have emerged as revolutionary tools in the realm of artistic creation, offering unprecedented ease in transforming textual prompts into visual art. However, the efficacy of these systems is intricately linked to the quality of user-provided prompts, which often poses a challenge to users unfamiliar with prompt crafting. This paper addresses this challenge by levera…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at SIGIR 2024

  50. arXiv:2403.18684  [pdf, other]

    cs.IR cs.CL

    Scaling Laws For Dense Retrieval

    Authors: Yan Fang, Jingtao Zhan, Qingyao Ai, Jiaxin Mao, Weihang Su, Jia Chen, Yiqun Liu

    Abstract: Scaling up neural models has yielded significant advancements in a wide array of tasks, particularly in language generation. Previous studies have found that the performance of neural models frequently adheres to predictable scaling laws, correlated with factors such as training set size and model size. This insight is invaluable, especially as large-scale experiments grow increasingly resource-in…

    Submitted 15 July, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at SIGIR 2024. V2 fixes a bug in the experiments