
Showing 101–150 of 922 results for author: Zheng, C

  1. arXiv:2412.05154 [pdf, other]

    cs.CV cs.AI

    Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection

    Authors: Chaoda Zheng, Feng Wang, Naiyan Wang, Shuguang Cui, Zhen Li

    Abstract: While 3D object bounding box (bbox) representation has been widely used in autonomous driving perception, it lacks the ability to capture the precise details of an object's intrinsic geometry. Recently, occupancy has emerged as a promising alternative for 3D scene perception. However, constructing a high-resolution occupancy map remains infeasible for large scenes due to computational constraints.…

    Submitted 6 December, 2024; originally announced December 2024.

    Comments: NeurIPS 2024

  2. arXiv:2412.01253 [pdf, other]

    cs.CL cs.AI cs.LG

    Yi-Lightning Technical Report

    Authors: Alan Wake, Bei Chen, C. X. Lv, Chao Li, Chengen Huang, Chenglin Cai, Chujie Zheng, Daniel Cooper, Fan Zhou, Feng Hu, Ge Zhang, Guoyin Wang, Heng Ji, Howard Qiu, Jiangcheng Zhu, Jun Tian, Katherine Su, Lihuan Zhang, Liying Li, Ming Song, Mou Li, Peng Liu, Qicheng Hu, Shawn Wang, Shijun Zhou, et al. (19 additional authors not shown)

    Abstract: This technical report presents Yi-Lightning, our latest flagship large language model (LLM). It achieves exceptional performance, ranking 6th overall on Chatbot Arena, with particularly strong results (2nd to 4th place) in specialized categories including Chinese, Math, Coding, and Hard Prompts. Yi-Lightning leverages an enhanced Mixture-of-Experts (MoE) architecture, featuring advanced expert seg…

    Submitted 22 January, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

  3. arXiv:2412.00542 [pdf, other]

    cs.AI cs.CV

    Rethinking Generalizability and Discriminability of Self-Supervised Learning from Evolutionary Game Theory Perspective

    Authors: Jiangmeng Li, Zehua Zang, Qirui Ji, Chuxiong Sun, Wenwen Qiang, Junge Zhang, Changwen Zheng, Fuchun Sun, Hui Xiong

    Abstract: Representations learned by self-supervised approaches are generally considered to possess sufficient generalizability and discriminability. However, we disclose a nontrivial mutual-exclusion relationship between these critical representation properties through an exploratory demonstration on self-supervised learning. State-of-the-art self-supervised methods tend to enhance either generalizability…

    Submitted 30 November, 2024; originally announced December 2024.

    Comments: Accepted by IJCV, 2024

  4. arXiv:2411.18613 [pdf, other]

    cs.CV

    CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

    Authors: Rundi Wu, Ruiqi Gao, Ben Poole, Alex Trevithick, Changxi Zheng, Jonathan T. Barron, Aleksander Holynski

    Abstract: We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video. CAT4D leverages a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis at any specified camera poses and timestamps. Combined with a novel sampling approach, this model can transform a single monocular video into a multi-view video, enabling robust 4D reconst…

    Submitted 18 December, 2024; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: Project page: https://cat-4d.github.io/

  5. arXiv:2411.17773 [pdf, other]

    cs.CV

    Efficient Multi-modal Large Language Models via Visual Token Grouping

    Authors: Minbin Huang, Runhui Huang, Han Shi, Yimeng Chen, Chuanyang Zheng, Xiangguo Sun, Xin Jiang, Zhenguo Li, Hong Cheng

    Abstract: The development of Multi-modal Large Language Models (MLLMs) enhances Large Language Models (LLMs) with the ability to perceive data formats beyond text, significantly advancing a range of downstream applications, such as visual question answering and image captioning. However, the substantial computational costs associated with processing high-resolution images and videos pose a barrier to their…

    Submitted 2 December, 2024; v1 submitted 26 November, 2024; originally announced November 2024.

  6. arXiv:2411.16301 [pdf, other]

    cs.CV cs.LG

    DiffDesign: Controllable Diffusion with Meta Prior for Efficient Interior Design Generation

    Authors: Yuxuan Yang, Jingyao Wang, Tao Geng, Wenwen Qiang, Changwen Zheng, Fuchun Sun

    Abstract: Interior design is a complex and creative discipline involving aesthetics, functionality, ergonomics, and materials science. Effective solutions must meet diverse requirements, typically producing multiple deliverables such as renderings and design drawings from various perspectives. Consequently, interior design processes are often inefficient and demand significant creativity. With advances in m…

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: 32 pages

  7. arXiv:2411.15526 [pdf, other]

    eess.IV cs.CV

    Multi-scale Cascaded Large-Model for Whole-body ROI Segmentation

    Authors: Rui Hao, Dayu Tan, Yansen Su, Chunhou Zheng

    Abstract: Organs-at-risk segmentation is critical for ensuring the safety and precision of radiotherapy and surgical procedures. However, existing methods for organs-at-risk image segmentation often suffer from uncertainties and biases in target selection, as well as insufficient model validation experiments, limiting their generality and reliability in practical applications. To address these issues, we pr…

    Submitted 23 November, 2024; originally announced November 2024.

  8. arXiv:2411.15183 [pdf, other]

    physics.chem-ph cs.AI q-bio.BM

    Balancing property optimization and constraint satisfaction for constrained multi-property molecular optimization

    Authors: Xin Xia, Yajie Zhang, Xiangxiang Zeng, Xingyi Zhang, Chunhou Zheng, Yansen Su

    Abstract: Molecular optimization, which aims to discover improved molecules from a vast chemical search space, is a critical step in chemical development. Various artificial intelligence technologies have demonstrated high effectiveness and efficiency on molecular optimization tasks. However, few of these technologies focus on balancing property optimization with constraint satisfaction, making it difficult…

    Submitted 18 November, 2024; originally announced November 2024.

  9. arXiv:2411.12994 [pdf, other]

    astro-ph.SR

    Revisiting the activity-rotation relation for evolved stars

    Authors: Henggeng Han, Song Wang, Xue Li, Chuanjie Zheng, Jifeng Liu

    Abstract: The magnetic dynamo mechanism of giant stars remains an open question, which can be explored by investigating their activity-rotation relations with multiple proxies. By using the data from the LAMOST and GALEX surveys, we carried out a comprehensive study of activity-rotation relations of evolved stars based on Ca II H&K lines, Hα lines and near ultraviolet (NUV) emissions. Our results…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: Accepted by ApJ

  10. arXiv:2411.12026 [pdf, other]

    astro-ph.CO

    Modified Gravity Constraints from the Full Shape Modeling of Clustering Measurements from DESI 2024

    Authors: M. Ishak, J. Pan, R. Calderon, K. Lodha, G. Valogiannis, A. Aviles, G. Niz, L. Yi, C. Zheng, C. Garcia-Quintero, A. de Mattia, L. Medina-Varela, J. L. Cervantes-Cota, U. Andrade, D. Huterer, H. E. Noriega, G. Zhao, A. Shafieloo, W. Fang, S. Ahlen, D. Bianchi, D. Brooks, E. Burtin, E. Chaussidon, T. Claybaugh, et al. (45 additional authors not shown)

    Abstract: We present cosmological constraints on deviations from general relativity (GR) from the first-year of clustering observations from the Dark Energy Spectroscopic Instrument (DESI) in combination with other datasets. We first consider the $μ(a,k)$-$Σ(a,k)$ modified gravity (MG) parametrization (as well as $η(a,k)$) in flat $Λ$CDM and $w_0 w_a$CDM backgrounds. Using a functional form for time-only ev…

    Submitted 20 December, 2024; v1 submitted 18 November, 2024; originally announced November 2024.

    Comments: 55 pages, 13 figures. This DESI Collaboration Publication is part of the 2024 publication series using the first year of observations (see https://data.desi.lbl.gov/doc/papers/). Added 3 figures and more discussions

  11. arXiv:2411.11621 [pdf, other]

    physics.plasm-ph physics.acc-ph

    Plasma acceleration of polarized particle beams

    Authors: Lars Reichwein, Zheng Gong, Chuan Zheng, Liangliang Ji, Alexander Pukhov, Markus Büscher

    Abstract: Spin-polarized particle beams are of interest for applications like deep-inelastic scattering, e.g. to gain further understanding of the proton's nuclear structure. With the advent of high-intensity laser facilities, laser-plasma-based accelerators offer a promising alternative to common radiofrequency-based accelerators, as they can shorten the required acceleration length significantly. However,…

    Submitted 18 November, 2024; originally announced November 2024.

    Comments: 42 pages, 14 figures, submitted to Reports on Progress in Physics

  12. arXiv:2411.08837 [pdf, other]

    astro-ph.SR astro-ph.HE

    A massive white dwarf or low-mass neutron star discovered by LAMOST

    Authors: Xinlin Zhao, Song Wang, Pengfei Wang, Chuanjie Zheng, Haibo Yuan, Jifeng Liu

    Abstract: We report the discovery of a close binary J0606+2132 (Gaia DR3 3423365496448406272) with $P_{\rm obs}=2.77$ days containing a possible massive white dwarf or a neutron star using the LAMOST spectroscopic data. By a joint fitting of the radial velocity from LAMOST and the light curve from TESS, we derived a circular Keplerian orbit with an inclination of $i=81.31^{\circ}$…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: 17 pages, 8 figures, accepted for publication in ApJ

  13. arXiv:2411.06746 [pdf, other]

    cs.LG

    Neuromodulated Meta-Learning

    Authors: Jingyao Wang, Huijie Guo, Wenwen Qiang, Jiangmeng Li, Changwen Zheng, Hui Xiong, Gang Hua

    Abstract: Humans excel at adapting perceptions and actions to diverse environments, enabling efficient interaction with the external world. This adaptive capability relies on the biological nervous system (BNS), which activates different brain regions for distinct tasks. Meta-learning similarly trains machines to handle multiple tasks but relies on a fixed network structure, not as flexible as BNS. To inves…

    Submitted 11 November, 2024; originally announced November 2024.

  14. arXiv:2411.06307 [pdf, other]

    cs.SD eess.AS

    Acoustic Volume Rendering for Neural Impulse Response Fields

    Authors: Zitong Lan, Chenhao Zheng, Zhiwei Zheng, Mingmin Zhao

    Abstract: Realistic audio synthesis that captures accurate acoustic phenomena is essential for creating immersive experiences in virtual and augmented reality. Synthesizing the sound received at any position relies on the estimation of impulse response (IR), which characterizes how sound propagates in one scene along different paths before arriving at the listener's position. In this paper, we present Acous…

    Submitted 9 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024 Spotlight

  15. arXiv:2411.04924 [pdf, other]

    cs.CV

    MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views

    Authors: Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, Jianfei Cai

    Abstract: We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations. This setting is inherently ill-posed due to minimal overlap among input views and insufficient visual information provided, making it challenging for conventional methods to achieve high-quality results. Our MVSplat360 addresses this by effectively comb…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024, Project page: https://donydchen.github.io/mvsplat360, Code: https://github.com/donydchen/mvsplat360

  16. arXiv:2411.01107 [pdf]

    physics.optics

    High-space-bandwidth product characterization of metalenses with Fourier ptychographic microscopy

    Authors: Chuanjian Zheng, Wenli Wang, Yanfang Ji, Yao Hu, Shaohui Zhang, Qun Hao

    Abstract: Large numerical aperture (NA) and large aperture metalenses have shown significant performance and abundant applications in biomedical and astronomical imaging fields. However, the high space-bandwidth product (SBP) requirements for measuring the phase of these metalenses, characterized by small phase periods and large apertures, have resulted in no effective techniques for sufficient characteriza…

    Submitted 1 November, 2024; originally announced November 2024.

  17. arXiv:2410.21549 [pdf, other]

    cs.IR cs.CL

    Semantic Search Evaluation

    Authors: Chujie Zheng, Jeffrey Wang, Shuqian Albee Zhang, Anand Kishore, Siddharth Singh

    Abstract: We propose a novel method for evaluating the performance of a content search system that measures the semantic match between a query and the results returned by the search system. We introduce a metric called "on-topic rate" to measure the percentage of results that are relevant to the query. To achieve this, we design a pipeline that defines a golden query set, retrieves the top K results for eac…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: Accepted by 3rd International Workshop on Industrial Recommendation Systems (at CIKM 2024)

  18. arXiv:2410.20230 [pdf, other]

    cs.RO

    FRTree Planner: Robot Navigation in Cluttered and Unknown Environments with Tree of Free Regions

    Authors: Yulin Li, Zhicheng Song, Chunxin Zheng, Zhihai Bi, Kai Chen, Michael Yu Wang, Jun Ma

    Abstract: In this work, we present FRTree planner, a novel robot navigation framework that leverages a tree structure of free regions, specifically designed for navigation in cluttered and unknown environments with narrow passages. The framework continuously incorporates real-time perceptive information to identify distinct navigation options and dynamically expands the tree toward explorable and traversabl…

    Submitted 13 February, 2025; v1 submitted 26 October, 2024; originally announced October 2024.

  19. arXiv:2410.19577 [pdf, ps, other]

    cond-mat.supr-con cond-mat.mes-hall

    Landau-Level Quantization and Band Splitting of FeSe Monolayers Revealed by Scanning Tunneling Spectroscopy

    Authors: Wantong Huang, Haicheng Lin, Yuguo Yin, Cheng Zheng, Wei Chen, Lichen Ji, Jack Hughes, Fedor Kusmartsev, Anna Kusmartseva, Qi-Kun Xue, Xi Chen, Shuai-Hua Ji

    Abstract: Two-dimensional (2D) superconductors that reside on substrates must be influenced by Rashba spin-orbit coupling (SOC). The intriguing effect of Rashba-type SOCs on iron-based superconductors (IBSs) has remained largely a mystery. In this work, we unveil modified Landau-level spectroscopy and the intricate band splitting of FeSe monolayers through the precision of scanning tunneling spectroscopy, w…

    Submitted 25 October, 2024; originally announced October 2024.

    Comments: 21 pages, 5 figures

  20. arXiv:2410.13032 [pdf, other]

    cs.AI cs.LG stat.ML

    Hypothesis Testing the Circuit Hypothesis in LLMs

    Authors: Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei

    Abstract: Large language models (LLMs) demonstrate surprising capabilities, but we do not understand how they are implemented. One hypothesis suggests that these capabilities are primarily executed by small subnetworks within the LLM, known as circuits. But how can we evaluate this hypothesis? In this paper, we formalize a set of criteria that a circuit is hypothesized to meet and develop a suite of hypothe…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: Code available here: https://github.com/blei-lab/circuitry

  21. arXiv:2410.10527 [pdf, other]

    cs.CV

    Motion-guided small MAV detection in complex and non-planar scenes

    Authors: Hanqing Guo, Canlun Zheng, Shiyu Zhao

    Abstract: In recent years, there has been a growing interest in the visual detection of micro aerial vehicles (MAVs) due to its importance in numerous applications. However, the existing methods based on either appearance or motion features encounter difficulties when the background is complex or the MAV is too small. In this paper, we propose a novel motion-guided MAV detector that can accurately identify…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 8 pages, 6 figures

    Journal ref: Pattern Recognition Letters 2024

  22. arXiv:2410.10102 [pdf, other]

    cs.GR math.NA

    Trust-Region Eigenvalue Filtering for Projected Newton

    Authors: Honglin Chen, Hsueh-Ti Derek Liu, Alec Jacobson, David I. W. Levin, Changxi Zheng

    Abstract: We introduce a novel adaptive eigenvalue filtering strategy to stabilize and accelerate the optimization of Neo-Hookean energy and its variants under the Projected Newton framework. For the first time, we show that Newton's method, Projected Newton with eigenvalue clamping and Projected Newton with absolute eigenvalue filtering can be unified using ideas from the generalized trust region method. B…

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: SIGGRAPH Asia 2024 (Conference track). Project page: https://www.cs.columbia.edu/cg/trust-region/

  23. New JWST redshifts for the host galaxies of CDF-S XT1 and XT2: understanding their nature

    Authors: J. Quirola-Vásquez, F. E. Bauer, P. G. Jonker, A. Levan, W. N. Brandt, M. Ravasio, D. Eappachen, Y. Q. Xue, X. C. Zheng

    Abstract: CDF-S XT1 and XT2 are considered two canonical extragalactic fast X-ray transients (FXTs). In this work, we report new constraints on both FXTs, based on recent JWST NIRCam and MIRI photometry, as well as NIRSpec spectroscopy for CDF-S XT2 that allow us to improve our understanding of their distances, energetics, and host galaxy properties compared to the pre-JWST era. We use the available HST and…

    Submitted 24 February, 2025; v1 submitted 13 October, 2024; originally announced October 2024.

    Comments: The manuscript was accepted by Astronomy & Astrophysics in January 2025

    Journal ref: A&A 695, A279 (2025)

  24. arXiv:2410.08935 [pdf, other]

    cs.RO

    Voxel-SLAM: A Complete, Accurate, and Versatile LiDAR-Inertial SLAM System

    Authors: Zheng Liu, Haotian Li, Chongjian Yuan, Xiyuan Liu, Jiarong Lin, Rundong Li, Chunran Zheng, Bingyang Zhou, Wenyi Liu, Fu Zhang

    Abstract: In this work, we present Voxel-SLAM: a complete, accurate, and versatile LiDAR-inertial SLAM system that fully utilizes short-term, mid-term, long-term, and multi-map data associations to achieve real-time estimation and high precision mapping. The system consists of five modules: initialization, odometry, local mapping, loop closure, and global mapping, all employing the same map representation,…

    Submitted 11 October, 2024; originally announced October 2024.

  25. arXiv:2410.06854 [pdf, other]

    cs.GR cs.HC

    Focal Surface Holographic Light Transport using Learned Spatially Adaptive Convolutions

    Authors: Chuanjun Zheng, Yicheng Zhan, Liang Shi, Ozan Cakmakci, Kaan Akşit

    Abstract: Computer-Generated Holography (CGH) is a set of algorithmic methods for identifying holograms that reconstruct Three-Dimensional (3D) scenes in holographic displays. CGH algorithms decompose 3D scenes into multiplanes at different depth levels and rely on simulations of light that propagated from a source plane to a targeted plane. Thus, for n planes, CGH typically optimizes holograms using n plan…

    Submitted 14 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: SIGGRAPH Asia 2024 Technical Communications

  26. arXiv:2410.05739 [pdf, other]

    cs.SD cs.AI eess.AS

    Array2BR: An End-to-End Noise-immune Binaural Audio Synthesis from Microphone-array Signals

    Authors: Cheng Chi, Xiaoyu Li, Andong Li, Yuxuan Ke, Xiaodong Li, Chengshi Zheng

    Abstract: Telepresence technology aims to provide an immersive virtual presence for remote conference applications, and it is extremely important to synthesize high-quality binaural audio signals for this aim. Because the ambient noise is often inevitable in practical application scenarios, it is highly desired that binaural audio signals without noise can be obtained from microphone-array signals directly.…

    Submitted 8 October, 2024; originally announced October 2024.

  27. arXiv:2410.04798 [pdf, other]

    cs.CL

    DAPE V2: Process Attention Score as Feature Map for Length Extrapolation

    Authors: Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, Jingyao Li, Minbin Huang, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

    Abstract: The attention mechanism is a fundamental component of the Transformer model, contributing to interactions among distinct tokens, in contrast to earlier feed-forward neural networks. In general, the attention scores are determined simply by the key-query products. However, this work's occasional trial (combining DAPE and NoPE) of including additional MLPs on attention scores without position encodi…

    Submitted 10 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Tech Report. Compared to DAPE, this work (DAPE V2) further analyzes the length extrapolation problem and translates the length extrapolation issue into a well-understood feature map processing problem. arXiv admin note: text overlap with arXiv:2405.14722

  28. arXiv:2410.03090 [pdf, other]

    cs.CL cs.LG

    UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference

    Authors: Jing Xiong, Jianghan Shen, Fanghua Ye, Chaofan Tao, Zhongwei Wan, Jianqiao Lu, Xun Wu, Chuanyang Zheng, Zhijiang Guo, Lingpeng Kong, Ngai Wong

    Abstract: Deploying large language models (LLMs) is challenging due to their high memory and computational demands, especially during long-context inference. While key-value (KV) caching accelerates inference by reusing previously computed keys and values, it also introduces significant memory overhead. Existing KV cache compression methods such as eviction and merging typically compress the KV cache after…

    Submitted 3 October, 2024; originally announced October 2024.

  29. arXiv:2410.02719 [pdf, other]

    cs.CL

    UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation

    Authors: Zixuan Li, Jing Xiong, Fanghua Ye, Chuanyang Zheng, Xun Wu, Jianqiao Lu, Zhongwei Wan, Xiaodan Liang, Chengming Li, Zhenan Sun, Lingpeng Kong, Ngai Wong

    Abstract: We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG) that utilizes Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks. This span uncertainty enhances model calibration, improving robustness and mitigating semantic inconsistencies introduced by random chunking. Leveraging this insight, we propose an efficient un…

    Submitted 3 October, 2024; originally announced October 2024.

  30. arXiv:2410.00772 [pdf, other]

    cs.CV cs.LG

    On the Generalization and Causal Explanation in Self-Supervised Learning

    Authors: Wenwen Qiang, Zeen Song, Ziyin Gu, Jiangmeng Li, Changwen Zheng, Fuchun Sun, Hui Xiong

    Abstract: Self-supervised learning (SSL) methods learn from unlabeled data and achieve high generalization performance on downstream tasks. However, they may also suffer from overfitting to their training data and lose the ability to adapt to new tasks. To investigate this phenomenon, we conduct experiments on various SSL methods and datasets and make two observations: (1) Overfitting occurs abruptly in lat…

    Submitted 1 October, 2024; originally announced October 2024.

  31. arXiv:2409.19699 [pdf, other]

    quant-ph math-ph

    Efficient Verification of Stabilizer Code Subspaces with Local Measurements

    Authors: Congcong Zheng, Xutao Yu, Zaichen Zhang, Ping Xu, Kun Wang

    Abstract: We address the task of verifying whether a quantum computer, designed to be protected by a specific stabilizer code, correctly encodes the corresponding logical qubits. To achieve this, we develop a general framework for subspace verification and explore several stabilizer code subspaces of practical significance. First, we present two efficient verification strategies for general stabilizer code…

    Submitted 7 December, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: After the submission of this work, we have become aware of a related work by Chen et al. in arXiv:2410.12551

  32. arXiv:2409.19676 [pdf, other]

    cs.CV cs.AI

    See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning

    Authors: Chengxin Zheng, Junzhong Ji, Yanzhao Shi, Xiaodan Zhang, Liangqiong Qu

    Abstract: Brain CT report generation is significant to aid physicians in diagnosing cranial diseases. Recent studies concentrate on handling the consistency between visual and textual pathological features to improve the coherence of report. However, there exist some challenges: 1) Redundant visual representing: Massive irrelevant areas in 3D scans distract models from representing salient visual contexts.…

    Submitted 1 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: Our work has been accepted by EMNLP 2024 Findings

  33. arXiv:2409.17830 [pdf, other]

    cs.CV

    Unsupervised Learning Based Multi-Scale Exposure Fusion

    Authors: Chaobing Zheng, Shiqian Wu, Zhengguo Li

    Abstract: Unsupervised learning based multi-scale exposure fusion (ULMEF) is efficient for fusing differently exposed low dynamic range (LDR) images into a higher quality LDR image for a high dynamic range (HDR) scene. Unlike supervised learning, loss functions play a crucial role in the ULMEF. In this paper, novel loss functions are proposed for the ULMEF and they are defined by using all the images to be…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: 11 pages

  34. arXiv:2409.16997 [pdf, other]

    cs.LG cs.AI

    INT-FlashAttention: Enabling Flash Attention for INT8 Quantization

    Authors: Shimao Chen, Zirui Liu, Zhiying Wu, Ce Zheng, Peizhuang Cong, Zihan Jiang, Yuhan Wu, Lei Su, Tong Yang

    Abstract: As the foundation of large language models (LLMs), self-attention module faces the challenge of quadratic time and memory complexity with respect to sequence length. FlashAttention accelerates attention computation and reduces its memory usage by leveraging the GPU memory hierarchy. A promising research direction is to integrate FlashAttention with quantization methods. This paper introduces INT-F…

    Submitted 26 September, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

  35. arXiv:2409.15269 [pdf, other]

    cs.CV

    ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild

    Authors: Chen Guo, Tianjian Jiang, Manuel Kaufmann, Chengwei Zheng, Julien Valentin, Jie Song, Otmar Hilliges

    Abstract: While previous years have seen great progress in the 3D reconstruction of humans from monocular videos, few of the state-of-the-art methods are able to handle loose garments that exhibit large non-rigid surface deformations during articulation. This limits the application of such methods to humans that are dressed in standard pants or T-shirts. Our method, ReLoo, overcomes this limitation and reco…

    Submitted 28 September, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: Project page: https://moygcc.github.io/ReLoo/

  36. arXiv:2409.14741 [pdf, other]

    cs.CV cs.AI

    Less yet robust: crucial region selection for scene recognition

    Authors: Jianqi Zhang, Mengxuan Wang, Jingyao Wang, Lingyu Si, Changwen Zheng, Fanjiang Xu

    Abstract: Scene recognition, particularly for aerial and underwater images, often suffers from various types of degradation, such as blurring or overexposure. Previous works that focus on convolutional neural networks have been shown to be able to extract panoramic semantic features and perform well on scene recognition tasks. However, low-quality images still impede model performance due to the inappropria…

    Submitted 20 October, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

  37. arXiv:2409.14228 [pdf, other]

    cs.HC

    Mentigo: An Intelligent Agent for Mentoring Students in the Creative Problem Solving Process

    Authors: Siyu Zha, Yujia Liu, Chengbo Zheng, Jiaqi XU, Fuze Yu, Jiangtao Gong, Yingqing XU

    Abstract: With the increasing integration of large language models (LLMs) in education, there is growing interest in using AI agents to support student learning in creative tasks. This study presents an interactive Mentor Agent system named Mentigo, which is designed to assist middle school students in the creative problem solving (CPS) process. We created a comprehensive dataset of real classroom interacti…

    Submitted 21 September, 2024; originally announced September 2024.

    Comments: 19 pages, 5 figures. Submitted to CHI 2025

    MSC Class: 68U35 (Primary); 68T50 (Secondary)

    ACM Class: H.5.2; K.3.1

  38. arXiv:2409.11505 [pdf, other]

    cs.IR

    Perceptions of Edinburgh: Capturing Neighbourhood Characteristics by Clustering Geoparsed Local News

    Authors: Andreas Grivas, Claire Grover, Richard Tobin, Clare Llewellyn, Eleojo Oluwaseun Abubakar, Chunyu Zheng, Chris Dibben, Alan Marshall, Jamie Pearce, Beatrice Alex

    Abstract: The communities that we live in affect our health in ways that are complex and hard to define. Moreover, our understanding of the place-based processes affecting health and inequalities is limited. This undermines the development of robust policy interventions to improve local health and well-being. News media provides social and community information that may be useful in health studies. Here we…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: Preprint - paper under submission

  39. arXiv:2409.08474 [pdf, other]

    cs.LG cs.CV

    Rethinking Meta-Learning from a Learning Lens

    Authors: Jingyao Wang, Wenwen Qiang, Changwen Zheng, Hui Xiong, Gang Hua

    Abstract: Meta-learning seeks to learn a well-generalized model initialization from training tasks to solve unseen tasks. From the "learning to learn" perspective, the quality of the initialization is modeled with one-step gradient descent in the inner loop. However, contrary to theoretical expectations, our empirical analysis reveals that this may expose meta-learning to underfitting. To bridge the gap betw…

    Submitted 6 May, 2025; v1 submitted 12 September, 2024; originally announced September 2024.

  40. arXiv:2409.05310 [pdf, other]

    cs.RO cs.CV

    Neural Surface Reconstruction and Rendering for LiDAR-Visual Systems

    Authors: Jianheng Liu, Chunran Zheng, Yunfei Wan, Bowen Wang, Yixi Cai, Fu Zhang

    Abstract: This paper presents a unified surface reconstruction and rendering framework for LiDAR-visual systems, integrating Neural Radiance Fields (NeRF) and Neural Distance Fields (NDF) to recover both appearance and structural information from posed images and point clouds. We address the structural visible gap between NeRF and NDF by utilizing a visible-aware occupancy map to classify space into the fre…

    Submitted 8 September, 2024; originally announced September 2024.

  41. arXiv:2409.04679  [pdf, other]

    cs.CV

    Neural Augmentation Based Panoramic High Dynamic Range Stitching

    Authors: Chaobing Zheng, Yilun Xu, Weihai Chen, Shiqian Wu, Sen Zhang, Zhengguo Li

    Abstract: Due to saturated regions in the input low dynamic range (LDR) images and large intensity changes among the LDR images caused by different exposures, it is challenging to produce an information-enriched panoramic LDR image without visual artifacts for a high dynamic range (HDR) scene by stitching multiple geometrically synchronized LDR images with different exposures and pairwise overlapping f…

    Submitted 20 February, 2025; v1 submitted 6 September, 2024; originally announced September 2024.

    Comments: 11 pages

  42. arXiv:2409.02795  [pdf, other]

    cs.CL

    Towards a Unified View of Preference Learning for Large Language Models: A Survey

    Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Shanghaoran Quan, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

    Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to unde…

    Submitted 31 October, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: 23 pages, 6 figures

  43. arXiv:2409.00992  [pdf, other]

    cs.RO

    MFCalib: Single-shot and Automatic Extrinsic Calibration for LiDAR and Camera in Targetless Environments Based on Multi-Feature Edge

    Authors: Tianyong Ye, Wei Xu, Chunran Zheng, Yukang Cui

    Abstract: This paper presents MFCalib, an innovative extrinsic calibration technique for LiDAR and RGB camera that operates automatically in targetless environments with a single data capture. At the heart of this method is the use of a rich set of edge information, which significantly enhances calibration accuracy and robustness. Specifically, we extract both depth-continuous and depth-discontinuous edges, along wit…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 8 pages, 10 figures, accepted by IROS 2024

  44. arXiv:2408.16228  [pdf, other]

    cs.RO cs.LG

    Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation

    Authors: Vivek Myers, Bill Chunyuan Zheng, Oier Mees, Sergey Levine, Kuan Fang

    Abstract: Learned language-conditioned robot policies often struggle to effectively adapt to new real-world tasks even when pre-trained across a diverse set of instructions. We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition provided by vision-language models (VLMs). Our method, Policy Adaptation via Language Optimization (PALO)…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 27 pages, 14 figures

    Journal ref: Conference on Robot Learning, 2024

  45. arXiv:2408.14089  [pdf, other]

    cs.IT eess.SP

    Mini-Slot-Assisted Short Packet URLLC: Differential or Coherent Detection?

    Authors: Canjian Zheng, Fu-Chun Zheng, Jingjing Luo, Pengcheng Zhu, Xiaohu You, Daquan Feng

    Abstract: One of the primary challenges in short packet ultra-reliable and low-latency communications (URLLC) is to achieve reliable channel estimation and data detection while minimizing the impact on latency performance. Given the small packet size in mini-slot-assisted URLLC, relying solely on pilot-based coherent detection can hardly meet the seemingly contradictory requirements of high cha…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 14 pages, 8 figures, journal

  46. arXiv:2408.14035  [pdf, other]

    cs.RO cs.CV

    FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry

    Authors: Chunran Zheng, Wei Xu, Zuhao Zou, Tong Hua, Chongjian Yuan, Dongjiao He, Bingyang Zhou, Zheng Liu, Jiarong Lin, Fangcheng Zhu, Yunfan Ren, Rong Wang, Fanle Meng, Fu Zhang

    Abstract: This paper proposes FAST-LIVO2: a fast, direct LiDAR-inertial-visual odometry framework that achieves accurate and robust state estimation in SLAM tasks and shows great potential for real-time, onboard robotic applications. FAST-LIVO2 efficiently fuses IMU, LiDAR and image measurements through an ESIKF. To address the dimension mismatch between the heterogeneous LiDAR and image measurements, we…

    Submitted 28 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: 30 pages, 31 figures. Due to the limitation that 'The abstract field cannot exceed 1,920 characters', the abstract presented here is shorter than the one in the PDF file

  47. arXiv:2408.13912  [pdf, other]

    cs.CV cs.LG

    Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs

    Authors: Brandon Smart, Chuanxia Zheng, Iro Laina, Victor Adrian Prisacariu

    Abstract: In this paper, we introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs. Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information. For generalizability, we build Splatt3R upon a "foundation" 3D geometry reconstruction method, MASt3R, by extend…

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Our project page can be found at: https://splatt3r.active.vision/

  48. arXiv:2408.13861  [pdf, ps, other]

    math.DS

    Topological rigidity of closures of certain sparse unipotent orbits in finite-volume quotients of $\prod_{i=1}^k\operatorname{SL}_2(\mathbb R)$

    Authors: Cheng Zheng

    Abstract: We give a simple proof of the topological rigidity of closures of certain sparse unipotent orbits in $G/\Gamma$ where $G=\prod_{i=1}^k\operatorname{SL}_2(\mathbb R)$ and $\Gamma$ is an irreducible lattice in $G$.

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 18 pages

    MSC Class: Primary 37A17; Secondary 11J99

  49. arXiv:2408.13598  [pdf, other]

    astro-ph.HE astro-ph.IM

    Advancing Gamma-Ray Burst Identification through Transfer Learning with Convolutional Neural Networks

    Authors: Peng Zhang, Bing Li, Ren-zhou Gui, Shao-lin Xiong, Yu Wang, Yan-qiu Zhang, Chen-wei Wang, Jia-cong Liu, Wang-chen Xue, Chao Zheng, Zheng-hang Yu, Wen-long Zhang

    Abstract: Rapid and accurate identification of Gamma-Ray Bursts (GRBs) is crucial for unraveling their origins. However, current burst search algorithms frequently miss low-threshold signals or lack universality for observations. In this study, we propose a novel approach that uses transfer learning with a convolutional neural network (CNN) to establish a universal GRB identification method…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 17 pages, 7 figures

  50. arXiv:2408.10519  [pdf, other]

    cs.DC cs.DS

    Almost Optimal Algorithms for Token Collision in Anonymous Networks

    Authors: Sirui Bai, Xinyu Fu, Xudong Wu, Penghui Yao, Chaodong Zheng

    Abstract: In distributed systems, situations often arise where some nodes each hold a collection of tokens, and all nodes collectively need to determine whether all tokens are distinct. For example, if each token represents a logged-in user, the problem corresponds to checking whether there are duplicate logins. Similarly, if each token represents a data object or a timestamp, the problem corresponds to ch…

    Submitted 19 August, 2024; originally announced August 2024.