Showing 151–200 of 1,126 results for author: Chéng, Z

  1. arXiv:2412.11362  [pdf, other]

    eess.IV cs.CV

    VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression

    Authors: Qiang Hu, Houqiang Zhong, Zihan Zheng, Xiaoyun Zhang, Zhengxue Cheng, Li Song, Guangtao Zhai, Yanfeng Wang

    Abstract: Neural Radiance Field (NeRF)-based volumetric video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences that provide audiences with unprecedented immersion and interactivity. However, the substantial data volumes pose significant challenges for storage and transmission. Existing solutions typically optimize NeRF representation and compression indepen…

    Submitted 15 December, 2024; originally announced December 2024.

  2. arXiv:2412.10680  [pdf, other]

    cs.CV cs.IR cs.MM

    UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval

    Authors: Haoyu Jiang, Zhi-Qi Cheng, Gabriel Moreira, Jiawen Zhu, Jingdong Sun, Bukun Ren, Jun-Yan He, Qi Dai, Xian-Sheng Hua

    Abstract: Universal Cross-Domain Retrieval (UCDR) retrieves relevant images from unseen domains and classes without semantic labels, ensuring robust generalization. Existing methods commonly employ prompt tuning with pre-trained vision-language models but are inherently limited by static prompts, reducing adaptability. We propose UCDR-Adapter, which enhances pre-trained models with adapters and dynamic prom…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: Accepted to WACV 2025. Project link: https://github.com/fine68/UCDR2024

  3. arXiv:2412.08781  [pdf, other]

    cs.CV cs.LG

    GMem: A Modular Approach for Ultra-Efficient Generative Models

    Authors: Yi Tang, Peng Sun, Zhenglin Cheng, Tao Lin

    Abstract: Recent studies indicate that the denoising process in deep generative diffusion models implicitly learns and memorizes semantic information from the data distribution. These findings suggest that capturing more complex data distributions requires larger neural networks, leading to a substantial increase in computational demands, which in turn become the primary bottleneck in both training and infe…

    Submitted 11 February, 2025; v1 submitted 11 December, 2024; originally announced December 2024.

    Comments: 9 pages, 5 figures, 3 tables

  4. arXiv:2412.08086  [pdf, other]

    physics.optics

    Frequency-resolved Transient Absorption Spectroscopy for High Pressure System

    Authors: Zi-Qian Cheng, Xiao-Shuang Yin, Liu-Xiang Yang, Hui Dong

    Abstract: Dynamics of materials under high-pressure conditions has been an important focus of materials science, especially in the timescale of pico- and femto-second of electronic and vibrational motion, which is typically probed by ultrafast laser pulses. To probe such dynamics, it requires an integration of high-pressure devices with the ultrafast laser system. In this work, we construct a frequency-reso…

    Submitted 10 December, 2024; originally announced December 2024.

  5. arXiv:2412.08074  [pdf, other]

    cs.CV cs.LG

    EM-Net: Gaze Estimation with Expectation Maximization Algorithm

    Authors: Zhang Cheng, Yanxia Wang, Guoyu Xia

    Abstract: In recent years, the accuracy of gaze estimation techniques has gradually improved, but existing methods often rely on large datasets or large models to improve performance, which leads to high demands on computational resources. In terms of this issue, this paper proposes a lightweight gaze estimation model EM-Net based on deep learning and traditional machine learning algorithms Expectation Maxi…

    Submitted 10 December, 2024; originally announced December 2024.

  6. arXiv:2412.05970  [pdf]

    cond-mat.mtrl-sci

    Robust magnetoelectric coupling in altermagnetic-ferroelectric type-III multiferroics

    Authors: Wei Sun, Wenxuan Wang, Changhong Yang, Ying Liu, Xiaotian Wang, Shifeng Huang, Zhenxiang Cheng

    Abstract: Multiferroic materials, characterized by the coexisting of ferroelectric polarization (breaking spatial inversion symmetry) and magnetism (breaking time-reversal symmetry), with strong magnetoelectric coupling, are highly sought after for advanced technological applications. Novel altermagnets, distinct from conventional magnets, have recently been revealed to exhibit unique spin polarization prot…

    Submitted 8 December, 2024; originally announced December 2024.

    Comments: 16 pages, 4 figures

    Journal ref: Adv. Mater. 2025, 2502575

  7. arXiv:2412.05798  [pdf, other]

    cond-mat.str-el

    A new pathway to impact ionization in a photo-excited one-dimensional ionic Hubbard model

    Authors: Zhenyu Cheng, Li Yang, Xiang Hu, Hantao Lu, Zhongbing Huang, Liang Du

    Abstract: Using the time-dependent Lanczos method, we study the non-equilibrium dynamics of the half-filled one-dimensional ionic Hubbard model, deep within the Mott insulating regime, under the influence of a transient laser pulse. In equilibrium, increasing the staggered potential in the Mott regime reduces the Mott gap and broadens the Hubbard bands, creating favorable conditions for impact ionization. A…

    Submitted 7 December, 2024; originally announced December 2024.

    Comments: 6 pages, 3 figures

  8. arXiv:2412.04414  [pdf, other]

    quant-ph

    Emergent unitary designs for encoded qubits from coherent errors and syndrome measurements

    Authors: Zihan Cheng, Eric Huang, Vedika Khemani, Michael J. Gullans, Matteo Ippoliti

    Abstract: Unitary $k$-designs are distributions of unitary gates that match the Haar distribution up to its $k$-th statistical moment. They are a crucial resource for randomized quantum protocols. However, their implementation on encoded logical qubits is nontrivial due to the need for magic gates, which can require a large resource overhead. In this work, we propose an efficient approach to generate unitar…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: 15+3 pages, 8+2 figures

  9. arXiv:2412.02335  [pdf, other]

    cs.RO cs.LG eess.SY

    An Adaptive Grasping Force Tracking Strategy for Nonlinear and Time-Varying Object Behaviors

    Authors: Ziyang Cheng, Xiangyu Tian, Ruomin Sui, Tiemin Li, Yao Jiang

    Abstract: Accurate grasp force control is one of the key skills for ensuring successful and damage-free robotic grasping of objects. Although existing methods have conducted in-depth research on slip detection and grasping force planning, they often overlook the issue of adaptive tracking of the actual force to the target force when handling objects with different material properties. The optimal parameters…

    Submitted 25 April, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

  10. arXiv:2412.01165  [pdf, other]

    cs.IT

    Double-Directional V2V Channel Measurement using ReRoMA at 60 GHz

    Authors: Hussein Hammoud, Yuning Zhang, Zihang Cheng, Seun Sangodoyin, Markus Hofer, Faruk Pasic, Thomas M. Pohl, Radek Závorka, Ales Prokes, Thomas Zemen, Christoph F. Mecklenbräuker, Andreas F. Molisch

    Abstract: The coordination of vehicles is a crucial element of autonomous driving, as it enhances the efficiency, convenience, and safety of road traffic. In order to fully exploit the capabilities of such coordination, communication with high data rate and low latency is required. It can be reasonably argued that millimeter-wave (mm-wave) vehicle-to-vehicle (V2V) systems are capable of fulfilling the afore…

    Submitted 3 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: 15 pages

  11. Lightweight Gaze Estimation Model Via Fusion Global Information

    Authors: Zhang Cheng, Yanxia Wang

    Abstract: Deep learning-based appearance gaze estimation methods are gaining popularity due to their high accuracy and fewer constraints from the environment. However, existing high-precision models often rely on deeper networks, leading to problems such as large parameters, long training time, and slow convergence. In terms of this issue, this paper proposes a novel lightweight gaze estimation model FGI-Ne…

    Submitted 27 November, 2024; originally announced November 2024.

  12. arXiv:2411.18061  [pdf, other]

    cs.CV

    Multi-task Gaze Estimation Via Unidirectional Convolution

    Authors: Zhang Cheng, Yanxia Wang

    Abstract: Using lightweight models as backbone networks in gaze estimation tasks often results in significant performance degradation. The main reason is that the number of feature channels in lightweight networks is usually small, which makes the model expression ability limited. In order to improve the performance of lightweight models in gaze estimation tasks, a network model named Multitask-Gaze is prop…

    Submitted 8 December, 2024; v1 submitted 27 November, 2024; originally announced November 2024.

  13. arXiv:2411.17697  [pdf, other]

    cs.CV cs.AI

    StableAnimator: High-Quality Identity-Preserving Human Image Animation

    Authors: Shuyuan Tu, Zhen Xing, Xintong Han, Zhi-Qi Cheng, Qi Dai, Chong Luo, Zuxuan Wu

    Abstract: Current diffusion models for human image animation struggle to ensure identity (ID) consistency. This paper presents StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference image and a sequence of poses. Building upon a video diffusion model, StableAnimator contains carefully designe…

    Submitted 27 November, 2024; v1 submitted 26 November, 2024; originally announced November 2024.

  14. arXiv:2411.17474  [pdf, other]

    cs.CV

    Probing the Mid-level Vision Capabilities of Self-Supervised Learning

    Authors: Xuweiyi Chen, Markus Marks, Zezhou Cheng

    Abstract: Mid-level vision capabilities - such as generic object localization and 3D geometric understanding - are not only fundamental to human vision but are also crucial for many real-world applications of computer vision. These abilities emerge with minimal supervision during the early stages of human visual development. Despite their significance, current self-supervised learning (SSL) approaches are p…

    Submitted 16 December, 2024; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: Project Page: https://midvision-probe.cs.virginia.edu/

  15. arXiv:2411.17467  [pdf, ps, other]

    cs.CV

    Learning 3D Representations from Procedural 3D Programs

    Authors: Xuweiyi Chen, Zezhou Cheng

    Abstract: Self-supervised learning has emerged as a promising approach for acquiring transferable 3D representations from unlabeled 3D point clouds. Unlike 2D images, which are widely accessible, acquiring 3D assets requires specialized expertise or professional 3D scanning equipment, making it difficult to scale and raising copyright concerns. To address these challenges, we propose learning 3D representat…

    Submitted 4 June, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: SynData4CV @ CVPR2025 | Project Page: https://point-mae-zero.cs.virginia.edu/

  16. arXiv:2411.17091  [pdf, other]

    cs.CR

    LESS: Efficient Log Storage System Based on Learned Model and Minimum Attribute Tree

    Authors: Zhiyang Cheng, Zizhen Zhu, Haoran Dang, Hai Wan, Xibin Zhao

    Abstract: In recent years, cyber attacks have become increasingly sophisticated and persistent. Detection and investigation based on the provenance graph can effectively mitigate cyber intrusion. However, in the long time span of defenses, the sheer size of the provenance graph will pose significant challenges to the storage systems. Faced with long-term storage tasks, existing methods are unable to simulta…

    Submitted 25 November, 2024; originally announced November 2024.

  17. arXiv:2411.16833  [pdf, other]

    cs.CV

    Open Vocabulary Monocular 3D Object Detection

    Authors: Jin Yao, Hao Gu, Xuweiyi Chen, Jiayun Wang, Zezhou Cheng

    Abstract: In this work, we pioneer the study of open-vocabulary monocular 3D object detection, a novel task that aims to detect and localize objects in 3D space from a single RGB image without limiting detection to a predefined set of categories. We formalize this problem, establish baseline methods, and introduce a class-agnostic approach that leverages open-vocabulary 2D detectors and lifts 2D bounding bo…

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: Project page: https://cvlab.cs.virginia.edu/ovmono3d

  18. arXiv:2411.15614  [pdf, ps, other]

    math.GT

    Constructing topological biquandles via skew braces

    Authors: Zhiyun Cheng

    Abstract: In this short note, we construct some nontrivial examples of topological biquandle. The key ingredient of the construction is the notion of skew brace.

    Submitted 23 November, 2024; originally announced November 2024.

    Comments: 9 pages, no figures

    MSC Class: 57K12; 16T25

  19. arXiv:2411.15333  [pdf]

    cond-mat.str-el cond-mat.mes-hall cond-mat.mtrl-sci cond-mat.supr-con

    Unconventional gapping behavior in a kagome superconductor

    Authors: Md Shafayat Hossain, Qi Zhang, Eun Sang Choi, Danilo Ratkovski, Bernhard Lüscher, Yongkai Li, Yu-Xiao Jiang, Maksim Litskevich, Zi-Jia Cheng, Jia-Xin Yin, Tyler A. Cochran, Brian Casas, Byunghoon Kim, Xian Yang, Jinjin Liu, Yugui Yao, Ali Bangura, Zhiwei Wang, Mark H. Fischer, Titus Neupert, Luis Balicas, M. Zahid Hasan

    Abstract: Determining the types of superconducting order in quantum materials is a challenge, especially when multiple degrees of freedom, such as bands or orbitals, contribute to the fermiology and when superconductivity competes, intertwines, or coexists with other symmetry-breaking orders. Here, we study the Kagome-lattice superconductor CsV3Sb5, in which multiband superconductivity coexists with a charg…

    Submitted 22 November, 2024; originally announced November 2024.

    Comments: Nature Physics (2024); in press

    Journal ref: Nature Physics 21, 556 (2025)

  20. arXiv:2411.14355  [pdf, other]

    nucl-ex

    Measurement of two-neutrino double electron capture half-life of $^{124}$Xe with PandaX-4T

    Authors: PandaX Collaboration, Zihao Bo, Wei Chen, Xun Chen, Yunhua Chen, Zhaokan Cheng, Xiangyi Cui, Yingjie Fan, Deqing Fang, Zhixing Gao, Lisheng Geng, Karl Giboni, Xunan Guo, Xuyuan Guo, Zichao Guo, Chencheng Han, Ke Han, Changda He, Jinrong He, Di Huang, Houqi Huang, Junting Huang, Ruquan Hou, Yu Hou, Xiangdong Ji, et al. (77 additional authors not shown)

    Abstract: Detailed studies of two-neutrino double electron capture (2$ν$DEC) is a crucial step towards searching for the neutrino-less mode to explore the Majorana nature of neutrinos. We have measured precisely the half-life of the 2$ν$DEC process in $^{124}$Xe, utilizing a total exposure of 1.73 tonne$\cdot$year from the commissioning run and the first science run of the PandaX-4T experiment. A time-depen…

    Submitted 16 May, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

    Comments: 19 pages, 5 figures, 4 tables; version3 accepted by JHEP

  21. arXiv:2411.13057  [pdf, other]

    cs.IR cs.AI

    Branches, Assemble! Multi-Branch Cooperation Network for Large-Scale Click-Through Rate Prediction at Taobao

    Authors: Xu Chen, Zida Cheng, Yuangang Pan, Shuai Xiao, Xiaoming Liu, Jinsong Lan, Qingwen Liu, Ivor W. Tsang

    Abstract: Existing click-through rate (CTR) prediction works have studied the role of feature interaction through a variety of techniques. Each interaction technique exhibits its own strength, and solely using one type could constrain the model's capability to capture the complex feature relationships, especially for industrial large-scale data with enormous users and items. Recent research shows that effec…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: 10 pages

  22. Layered semiconducting electrides in p-block metal oxides

    Authors: Jiaqi Dai, Feng Yang, Cong Wang, Fei Pang, Zhihai Cheng, Wei Ji

    Abstract: In conventional electrides, excess electrons are localized in crystal voids to serve as anions. Most of these electrides are metallic and the metal cations are primarily from the s-block, d-block, or rare-earth elements. Here, we report a class of p-block metal-based electrides found in bilayer SnO and PbO, which are semiconducting and feature electride states in both the valence band (VB) and con…

    Submitted 18 November, 2024; originally announced November 2024.

  23. arXiv:2411.08147  [pdf, other]

    cs.CL cs.AI

    Large Language Models Can Self-Improve in Long-context Reasoning

    Authors: Siheng Li, Cheng Yang, Zesen Cheng, Lemao Liu, Mo Yu, Yujiu Yang, Wai Lam

    Abstract: Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. Existing approaches typically involve fine-tuning LLMs with synthetic data, which depends on annotations from human experts or advanced models like GPT-4, thus restricting further advancements. To address this issue, we investigate the potential for LLMs to se…

    Submitted 12 November, 2024; originally announced November 2024.

    Comments: Project Page: https://github.com/SihengLi99/SEALONG

  24. arXiv:2411.05322  [pdf, other]

    cs.MM cs.CV

    Rate-aware Compression for NeRF-based Volumetric Video

    Authors: Zhiyu Zhang, Guo Lu, Huanxiong Liang, Zhengxue Cheng, Anni Tang, Li Song

    Abstract: The neural radiance fields (NeRF) have advanced the development of 3D volumetric video technology, but the large data volumes they involve pose significant challenges for storage and transmission. To address these problems, the existing solutions typically compress these NeRF representations after the training stage, leading to a separation between representation training and compression. In this…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: Accepted by ACM MM 2024 (Oral)

  25. arXiv:2411.03887  [pdf, ps, other]

    cs.AI cs.CR

    Reclaiming "Open AI" -- AI Model Serving Can Be Open Access, Yet Monetizable and Loyal

    Authors: Zerui Cheng, Edoardo Contente, Ben Finch, Oleg Golev, Jonathan Hayase, Andrew Miller, Niusha Moshrefi, Anshul Nasery, Sandeep Nailwal, Sewoong Oh, Himanshu Tyagi, Pramod Viswanath

    Abstract: The rapid rise of AI has split model serving between open-weight distribution, which often lacks owner control and monetization, and opaque API-based approaches that risk user privacy and model transparency, forming a dichotomy that hinders an equitable AI ecosystem. This position paper introduces, rigorously formulates, and champions the Open-access, Monetizable, and Loyal (OML) paradigm for AI m…

    Submitted 3 June, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

    Comments: 54 pages

  26. arXiv:2410.23872  [pdf, other]

    cond-mat.str-el cond-mat.mtrl-sci

    Pressure-dependent magnetotransport measurement in Kagome metal Yb$_{0.5}$Co$_3$Ge$_3$

    Authors: Zhiyuan Cheng, Yaojia Wang, Heng Wu, Mazhar N. Ali, Julia Y. Chan, Semonti Bhattacharyya

    Abstract: Kagome materials are known to be an ideal platform that hosts a plethora of interesting phases such as topological states, electronic correlation, and magnetism, owing to their unique band structure and geometry. We report magnetotransport measurement in Kagome metal Yb$_{0.5}$Co$_3$Ge$_3$ as a function of pressure. Below $\sim25^\circ$ K the temperature dependence of resistance shows an upturn that…

    Submitted 31 October, 2024; originally announced October 2024.

  27. arXiv:2410.23170  [pdf, other]

    stat.ML cs.LG

    Functional Gradient Flows for Constrained Sampling

    Authors: Shiyue Zhang, Longlin Yu, Ziheng Cheng, Cheng Zhang

    Abstract: Recently, through a unified gradient flow perspective of Markov chain Monte Carlo (MCMC) and variational inference (VI), particle-based variational inference methods (ParVIs) have been proposed that tend to combine the best of both worlds. While typical ParVIs such as Stein Variational Gradient Descent (SVGD) approximate the gradient flow within a reproducing kernel Hilbert space (RKHS), many atte…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024 camera-ready (30 pages, 26 figures)

  28. arXiv:2410.22823  [pdf]

    cond-mat.supr-con cond-mat.mtrl-sci cond-mat.str-el

    Coexistence of superconductivity and sliding polar metal state in HgPSe3

    Authors: Xiaohui Yu, Wei Zhong, Saori Kawaguchi, Hirokazu Kadobayashi, Xiaolin Wang, Zhenxiang Cheng, Changfeng Chen, Binbin Yue, Jian-Tao Wang, Ho-Kwang Mao, Fang Hong

    Abstract: The simultaneous presence of polarity and metallicity in a material signifies an exotic polar metal state, but such materials are extremely rare, especially in bulk form, due to mutually exclusive nature of the fundamental defining properties. Here, we report experimental findings that HgPSe3 is a robust bulk polar metal at room temperature with a chiral structure stabilized by pressure and, remar…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: 19 pages, 4 main figures + 6 extended figures

  29. arXiv:2410.22211  [pdf, other]

    cs.CL

    ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding

    Authors: Kimihiro Hasegawa, Wiradee Imrattanatrai, Zhi-Qi Cheng, Masaki Asada, Susan Holm, Yuran Wang, Ken Fukuda, Teruko Mitamura

    Abstract: Multimodal systems have great potential to assist humans in procedural activities, where people follow instructions to achieve their goals. Despite diverse application scenarios, systems are typically evaluated on traditional classification tasks, e.g., action recognition or temporal action segmentation. In this paper, we present a novel evaluation dataset, ProMQA, to measure system advancements i…

    Submitted 29 October, 2024; originally announced October 2024.

    Comments: 18 pages, 11 figures

  30. arXiv:2410.19636  [pdf]

    cond-mat.str-el cond-mat.mes-hall cond-mat.mtrl-sci

    Pomeranchuk instability of a topological crystal

    Authors: Md Shafayat Hossain, Zahir Muhammad, Rajibul Islam, Zi-Jia Cheng, Yu-Xiao Jiang, Maksim Litskevich, Tyler A. Cochran, Xian P. Yang, Byunghoon Kim, Fei Xue, Ilias E. Perakis, Weisheng Zhao, Mehdi Kargarian, Luis Balicas, Titus Neupert, M. Zahid Hasan

    Abstract: Nematic quantum fluids appear in strongly interacting systems and break the rotational symmetry of the crystallographic lattice. In metals, this is connected to a well-known instability of the Fermi liquid-the Pomeranchuk instability. Using scanning tunneling microscopy, we identified this instability in a highly unusual setting: on the surface of an elemental topological metal, arsenic. By direct…

    Submitted 25 October, 2024; originally announced October 2024.

  31. arXiv:2410.19394  [pdf]

    cs.LG cs.AI

    Analysis of Financial Risk Behavior Prediction Using Deep Learning and Big Data Algorithms

    Authors: Haowei Yang, Zhan Cheng, Zhaoyang Zhang, Yuanshuai Luo, Shuaishuai Huang, Ao Xiang

    Abstract: As the complexity and dynamism of financial markets continue to grow, traditional financial risk prediction methods increasingly struggle to handle large datasets and intricate behavior patterns. This paper explores the feasibility and effectiveness of using deep learning and big data algorithms for financial risk behavior prediction. First, the application and advantages of deep learning and big…

    Submitted 22 December, 2024; v1 submitted 25 October, 2024; originally announced October 2024.

  32. arXiv:2410.17935  [pdf, other]

    stat.ML cs.LG

    Semi-Implicit Functional Gradient Flow for Efficient Sampling

    Authors: Shiyue Zhang, Ziheng Cheng, Cheng Zhang

    Abstract: Particle-based variational inference methods (ParVIs) use nonparametric variational families represented by particles to approximate the target distribution according to the kernelized Wasserstein gradient flow for the Kullback-Leibler (KL) divergence. Although functional gradient flows have been introduced to expand the kernel space for better flexibility, the deterministic updating mechanism may…

    Submitted 21 March, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

    Comments: 46 pages, 13 figures

  33. arXiv:2410.17243  [pdf, other]

    cs.CV

    Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

    Authors: Zesen Cheng, Hang Zhang, Kehan Li, Sicong Leng, Zhiqiang Hu, Fei Wu, Deli Zhao, Xin Li, Lidong Bing

    Abstract: Contrastive loss is a powerful approach for representation learning, where larger batch sizes enhance performance by providing more negative samples to better distinguish between similar and dissimilar data. However, scaling batch sizes is constrained by the quadratic growth in GPU memory consumption, primarily due to the full instantiation of the similarity matrix. To address this, we propose a t…

    Submitted 22 October, 2024; originally announced October 2024.

  34. arXiv:2410.17193  [pdf, other]

    cs.CV cs.AI

    Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios

    Authors: Kai Wang, Zekai Li, Zhi-Qi Cheng, Samir Khaki, Ahmad Sajedi, Ramakrishna Vedantam, Konstantinos N Plataniotis, Alexander Hauptmann, Yang You

    Abstract: Dataset distillation has demonstrated strong performance on simple datasets like CIFAR, MNIST, and TinyImageNet but struggles to achieve similar results in more complex scenarios. In this paper, we propose EDF (emphasizes the discriminative features), a dataset distillation method that enhances key discriminative regions in synthetic images using Grad-CAM activation maps. Our approach is inspired…

    Submitted 31 March, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: 24 pages, 13 figures

  35. arXiv:2410.15392  [pdf, other]

    cs.CV

    EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting

    Authors: Bohao Liao, Wei Zhai, Zengyu Wan, Zhixin Cheng, Wenfei Yang, Tianzhu Zhang, Yang Cao, Zheng-Jun Zha

    Abstract: Scene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attempted to simultaneously optimize scene representations (NeRF or 3DGS) and camera poses. Despite recent progress, existing methods relying on traditional camera input tend to fail in high-speed (or equivalently lo…

    Submitted 23 March, 2025; v1 submitted 20 October, 2024; originally announced October 2024.

    Comments: Project Page: https://lbh666.github.io/ef-3dgs/

  36. Observation of quantum superposition of topological defects in a trapped ion quantum simulator

    Authors: Zhijie Cheng, Yukai Wu, Shijiao Li, Quanxin Mei, Bowen Li, Gangxi Wang, Yue Jiang, Binxiang Qi, Zichao Zhou, Panyu Hou, Luming Duan

    Abstract: Topological defects are discontinuities of a system protected by global properties, with wide applications in mathematics and physics. While previous experimental studies mostly focused on their classical properties, it has been predicted that topological defects can exhibit quantum superposition. Despite the fundamental interest and potential applications in understanding symmetry-breaking dynami…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: 8 pages, 6 figures, already published in Science Advances

    Journal ref: Sci. Adv. 10, eadr9527 (2024)

  37. arXiv:2410.14966  [pdf, other]

    cs.CR

    Attack as Defense: Run-time Backdoor Implantation for Image Content Protection

    Authors: Haichuan Zhang, Meiyu Lin, Zhaoyi Liu, Renyuan Li, Zhiyuan Cheng, Carl Yang, Mingjie Tang

    Abstract: As generative models achieve great success, tampering and modifying the sensitive image contents (i.e., human faces, artist signatures, commercial logos, etc.) have induced a significant threat with social impact. The backdoor attack is a method that implants vulnerabilities in a target model, which can be activated through a trigger. In this work, we innovatively prevent the abuse of image conten…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 10 pages, 6 figures

  38. arXiv:2410.14894  [pdf, other]

    cs.AI cs.CR cs.LG

    Soft-Label Integration for Robust Toxicity Classification

    Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Shuo Han, Xin-Qiang Cai, Xinyu Xing

    Abstract: Toxicity classification in textual content remains a significant problem. Data with labels from a single annotator fall short of capturing the diversity of human perspectives. Therefore, there is a growing need to incorporate crowdsourced annotations for training an effective toxicity classifier. Additionally, the standard approach to training a classifier using empirical risk minimization (ERM) m…

    Submitted 7 November, 2024; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

  39. Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement

    Authors: Zihao Cheng, Li Zhou, Feng Jiang, Benyou Wang, Haizhou Li

    Abstract: The rapid development of large language models (LLMs), like ChatGPT, has resulted in the widespread presence of LLM-generated content on social media platforms, raising concerns about misinformation, data biases, and privacy violations, which can undermine trust in online discourse. While detecting LLM-generated content is crucial for mitigating these risks, current methods often focus on binary c…

    Submitted 6 February, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: Social Media, Large Language Models, LLM-generated Text Detection, AI-assisted News Detection; Accepted by WWW2025

    Journal ref: Proceedings of the ACM Web Conference 2025 (WWW '25), April 28-May 2, 2025, Sydney, NSW, Australia

  40. arXiv:2410.12787  [pdf, other]

    cs.CV

    The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

    Authors: Sicong Leng, Yun Xing, Zesen Cheng, Yang Zhou, Hang Zhang, Xin Li, Deli Zhao, Shijian Lu, Chunyan Miao, Lidong Bing

    Abstract: Recent advancements in large multimodal models (LMMs) have significantly enhanced performance across diverse tasks, with ongoing efforts to further integrate additional modalities such as video and audio. However, most existing LMMs remain vulnerable to hallucinations, the discrepancy between the factual multimodal input and the generated textual output, which has limited their applicability in va…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: Project Page: cmm-damovl.site

  41. arXiv:2410.10366  [pdf, other]

    cs.CV cs.AI

    Affinity-Graph-Guided Contractive Learning for Pretext-Free Medical Image Segmentation with Minimal Annotation

    Authors: Zehua Cheng, Di Yuan, Thomas Lukasiewicz

    Abstract: The combination of semi-supervised learning (SemiSL) and contrastive learning (CL) has been successful in medical image segmentation with limited annotations. However, these works often rely on pretext tasks that lack the specificity required for pixel-level segmentation, and still face overfitting issues due to insufficient supervision signals resulting from too few annotations. Therefore, this p…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: BIBM 2024

  42. arXiv:2410.09583  [pdf, other]

    cs.CV

    POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search

    Authors: Chong-Yang Xiang, Jun-Yan He, Zhi-Qi Cheng, Xiao Wu, Xian-Sheng Hua

    Abstract: Achieving a balance between accuracy and efficiency is a critical challenge in facial landmark detection (FLD). This paper introduces Parallel Optimal Position Search (POPoS), a high-precision encoding-decoding framework designed to address the limitations of traditional FLD methods. POPoS employs three key contributions: (1) Pseudo-range multilateration is utilized to correct heatmap errors, impr…

    Submitted 20 December, 2024; v1 submitted 12 October, 2024; originally announced October 2024.

    Comments: Accepted to AAAI 2025, 9 pages, 6 figures. Code: https://github.com/teslatasy/POPoS

  43. arXiv:2410.08565  [pdf, other]

    cs.AI cs.CL cs.CV

    Baichuan-Omni Technical Report

    Authors: Yadong Li, Haoze Sun, Mingan Lin, Tianpeng Li, Guosheng Dong, Tao Zhang, Bowen Ding, Wei Song, Zhenglin Cheng, Yuqi Huo, Song Chen, Xu Li, Da Pan, Shusen Zhang, Xin Wu, Zheng Liang, Jun Liu, Tao Zhang, Keer Lu, Yaqi Zhao, Yanjun Shen, Fan Yang, Kaicheng Yu, Tao Lin, Jianhua Xu, et al. (2 additional authors not shown)

    Abstract: The salient multimodal capabilities and interactive experience of GPT-4o highlight its critical role in practical applications, yet it lacks a high-performing open-source counterpart. In this paper, we introduce Baichuan-omni, the first open-source 7B Multimodal Large Language Model (MLLM) adept at concurrently processing and analyzing modalities of image, video, audio, and text, while delivering…

    Submitted 27 December, 2024; v1 submitted 11 October, 2024; originally announced October 2024.

  44. HorGait: A Hybrid Model for Accurate Gait Recognition in LiDAR Point Cloud Planar Projections

    Authors: Jiaxing Hao, Yanxi Wang, Zhigang Chang, Hongmin Gao, Zihao Cheng, Chen Wu, Xin Zhao, Peiye Fang, Rachmat Muwardi

    Abstract: Gait recognition is a remote biometric technology that utilizes the dynamic characteristics of human movement to identify individuals even under various extreme lighting conditions. Due to the limitation in spatial perception capability inherent in 2D gait representations, LiDAR can directly capture 3D gait features and represent them as point clouds, reducing environmental and lighting interferen…

    Submitted 23 October, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

  45. arXiv:2410.07946  [pdf]

    physics.app-ph

    Field-free spin-orbit switching of canted magnetization in Pt/Co/Ru/RuO2(101) multilayers

    Authors: Yunzhuo Wu, Tong Wu, Haoran Chen, Yongwei Cui, Hongyue Xu, Nan Jiang, Zhen Cheng, Yizheng Wu

    Abstract: Enabling field-free current-induced switching of perpendicular magnetization is essential for advancing spin-orbit-torque magnetic random access memory technology. Our research on the Pt/Co/Ru/RuO2(101) system has successfully demonstrated field-free switching through current injection along the RuO2[010] axis. We discovered that the system exhibits a tilted easy axis, inclined from the out-of-pla…

    Submitted 10 October, 2024; originally announced October 2024.

  46. arXiv:2410.04109  [pdf]

    physics.app-ph physics.ao-ph

    Radiative cooling capacity on Earth

    Authors: Cunhai Wang, Hao Chen, Yanyan Feng, Ziming Cheng, Jingchong Liu, Fuqiang Wang

    Abstract: By passively dissipating thermal emission into the ultracold deep space, radiative cooling (RC) is an environment-friendly means for gaining cooling capacity, paving a bright future for global energy saving and carbon dioxide reduction. However, assessing the global RC capacity at the day-to-annual scale remains challenging as the RC capacity significantly depends on geographic and environmental c…

    Submitted 5 October, 2024; originally announced October 2024.

    Comments: Four figures

  47. arXiv:2410.04040  [pdf, other]

    physics.optics cond-mat.mes-hall

    Flatbands from Bound States in the Continuum for Orbital Angular Momentum Localization

    Authors: Weiwei Zhu, Hongyu Zou, Yong Ge, Yin Wang, Zheyu Cheng, Bing-bing Wang, Shou-qi Yuan, Hong-xiang Sun, Haoran Xue, Baile Zhang

    Abstract: A flatband material is a system characterized by energy bands with zero dispersion, allowing for the compact localization of wavefunctions in real space. This compact localization significantly enhances inter-particle correlations and light-matter interactions, leading to notable advancements such as fractional Chern insulators in condensed matter systems and flat-band lasers in photonics. Previou…

    Submitted 5 October, 2024; originally announced October 2024.

    Comments: 15 pages, 4 figures

  48. arXiv:2410.02758  [pdf, other]

    quant-ph cond-mat.stat-mech hep-th

    Pseudoentanglement from tensor networks

    Authors: Zihan Cheng, Xiaozhou Feng, Matteo Ippoliti

    Abstract: Pseudoentangled states are defined by their ability to hide their entanglement structure: they are indistinguishable from random states to any observer with polynomial resources, yet can have much less entanglement than random states. Existing constructions of pseudoentanglement based on phase- and/or subset-states are limited in the entanglement structures they can hide: e.g., the states may have…

    Submitted 16 October, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: 5+6 pages, 3 figures. v2: fixed typos and minor issues

  49. arXiv:2410.01300  [pdf]

    physics.chem-ph

    Atmospheric Pressure Ammonia Synthesis on AuRu Catalysts Enabled by Plasmon-Controlled Hydrogenation and Nitrogen-species Desorption

    Authors: Lin Yuan, Briley B. Bourgeois, Elijah Begin, Yirui Zhang, Alan X. Dai, Zhihua Cheng, Amy S. McKeown-Green, Zhichen Xue, Yi Cui, Kun Xu, Yu Wang, Matthew R. Jones, Yi Cui, Arun Majumdar, Junwei Lucas Bao, Jennifer A. Dionne

    Abstract: Ammonia is a key component of fertilizer and a potential clean fuel and hydrogen carrier. The Haber-Bosch process for ammonia synthesis consumes more than half of industrial hydrogen and contributes up to ~3% of global greenhouse gas emissions. Light-driven reactions via surface plasmon resonances offer a less energy-intensive pathway for ammonia production by altering reaction intermediates. Here…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: 21 pages, 4 figures, journal article submission soon

  50. arXiv:2409.19043  [pdf, other]

    quant-ph

    Parallel Quantum Signal Processing Via Polynomial Factorization

    Authors: John M. Martyn, Zane M. Rossi, Kevin Z. Cheng, Yuan Liu, Isaac L. Chuang

    Abstract: Quantum signal processing (QSP) is a methodology for constructing polynomial transformations of a linear operator encoded in a unitary. Applied to an encoding of a state $ρ$, QSP enables the evaluation of nonlinear functions of the form $\text{tr}(P(ρ))$ for a polynomial $P(x)$, which encompasses relevant properties like entropies and fidelity. However, QSP is a sequential algorithm: implementing…

    Submitted 27 September, 2024; originally announced September 2024.

    Report number: MIT-CTP/5780