Showing 1–15 of 15 results for author: Sui, L

  1. arXiv:2506.15395

    eess.IV cs.AI cs.CV

    A Real-time Endoscopic Image Denoising System

    Authors: Yu Xing, Shishi Huang, Meng Lv, Guo Chen, Huailiang Wang, Lingzhi Sui

    Abstract: Endoscopes featuring a miniaturized design have significantly enhanced operational flexibility, portability, and diagnostic capability while substantially reducing the invasiveness of medical procedures. Recently, single-use endoscopes equipped with an ultra-compact analogue image sensor measuring less than 1mm x 1mm bring revolutionary advancements to medical diagnosis. They reduce the structural…

    Submitted 18 June, 2025; originally announced June 2025.

  2. arXiv:2505.23359

    cs.CV

    VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

    Authors: Yuanxin Liu, Kun Ouyang, Haoning Wu, Yi Liu, Lin Sui, Xinhao Li, Yan Zhong, Y. Charles, Xinyu Zhou, Xu Sun

    Abstract: Recent studies have shown that long chain-of-thought (CoT) reasoning can significantly enhance the performance of large language models (LLMs) on complex tasks. However, this benefit is yet to be demonstrated in the domain of video understanding, since most existing benchmarks lack the reasoning depth required to demonstrate the advantages of extended CoT chains. While recent efforts have proposed…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Project Page: https://llyx97.github.io/video_reason_bench/

  3. arXiv:2505.06668

    cs.CV cs.LG eess.IV

    StableMotion: Repurposing Diffusion-Based Image Priors for Motion Estimation

    Authors: Ziyi Wang, Haipeng Li, Lin Sui, Tianhao Zhou, Hai Jiang, Lang Nie, Shuaicheng Liu

    Abstract: We present StableMotion, a novel framework that leverages knowledge (geometry and content priors) from pretrained large-scale image diffusion models to perform motion estimation, solving single-image-based image rectification tasks such as Stitched Image Rectangling (SIR) and Rolling Shutter Correction (RSC). Specifically, the StableMotion framework takes text-to-image Stable Diffusion (SD) models as backb…

    Submitted 10 May, 2025; originally announced May 2025.

  4. arXiv:2504.07491

    cs.CV

    Kimi-VL Technical Report

    Authors: Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, et al. (70 additional authors not shown)

    Abstract: We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B). Kimi-VL demonstrates strong performance across challenging domains: as a general-purpose VLM, Kimi-VL excels in multi-…

    Submitted 23 June, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: Updated Kimi-VL-A3B-Thinking-2506 information

  5. arXiv:2503.06526

    cs.CV cs.MM

    TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos

    Authors: Chen-Lin Zhang, Lin Sui, Shuming Liu, Fangzhou Mu, Zhangcheng Wang, Bernard Ghanem

    Abstract: Temporal localization in untrimmed videos, which aims to identify specific timestamps, is crucial for video understanding but remains challenging. This task encompasses several subtasks, including temporal action localization, temporal video grounding, moment retrieval, and generic event boundary detection. Existing methods in each subfield are typically designed for specific tasks and lack genera…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: Code & models will be released at https://github.com/sming256/TimeLoc. The first 4 authors contribute equally

  6. arXiv:2407.17792

    cs.CV

    Harnessing Temporal Causality for Advanced Temporal Action Detection

    Authors: Shuming Liu, Lin Sui, Chen-Lin Zhang, Fangzhou Mu, Chen Zhao, Bernard Ghanem

    Abstract: As a fundamental task in long-form video understanding, temporal action detection (TAD) aims to capture inherent temporal relations in untrimmed videos and identify candidate actions with precise boundaries. Over the years, various networks, including convolutions, graphs, and transformers, have been explored for effective temporal modeling for TAD. However, these modules typically treat past and…

    Submitted 25 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: 1st in Moment Queries track at the Ego4D Challenge 2024; 1st in Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024

  7. arXiv:2307.02025

    cs.CV

    NMS Threshold matters for Ego4D Moment Queries -- 2nd place solution to the Ego4D Moment Queries Challenge 2023

    Authors: Lin Sui, Fangzhou Mu, Yin Li

    Abstract: This report describes our submission to the Ego4D Moment Queries Challenge 2023. Our submission extends ActionFormer, a recent method for temporal action localization. Our extension combines an improved ground-truth assignment strategy during training and a refined version of SoftNMS at inference time. Our solution is ranked 2nd on the public leaderboard with 26.62% average mAP and 45.69% Recall@1…

    Submitted 5 July, 2023; originally announced July 2023.

  8. arXiv:2306.15230

    cs.IT eess.SP

    Probability of Error for Optimal Codes in a Reconfigurable Intelligent Surface Aided URLLC System

    Authors: Likun Sui, Zihuai Lin

    Abstract: The lower bound on the decoding error probability for the optimal code given a signal-to-noise ratio and a code rate is investigated in this letter for the reconfigurable intelligent surface (RIS) communication system over a Rician fading channel at the short blocklength regime, which is the key characteristic of ultra-reliable low-latency communications (URLLC) to meet the need for strict adhere…

    Submitted 27 June, 2023; originally announced June 2023.

  9. arXiv:2208.12020

    cs.IT eess.SP

    Performance Analysis for Reconfigurable Intelligent Surface Assisted MIMO Systems

    Authors: Likun Sui, Zihuai Lin, Pei Xiao, Branka Vucetic

    Abstract: This paper investigates the maximal achievable rate for a given average error probability and blocklength for the reconfigurable intelligent surface (RIS) assisted multiple-input and multiple-output (MIMO) system. The result consists of a finite blocklength channel coding achievability bound and a converse bound based on the Berry-Esseen theorem, the Mellin transform and the mutual information. Nu…

    Submitted 25 August, 2022; originally announced August 2022.

  10. arXiv:2206.03064

    cs.CV

    A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector

    Authors: Lin Sui, Chen-Lin Zhang, Lixin Gu, Feng Han

    Abstract: Spatial-temporal action detection is a vital part of video understanding. Current spatial-temporal action detection methods mostly use an object detector to obtain person candidates and classify these person candidates into different action categories. So-called two-stage methods are heavy and hard to apply in real-world applications. Some existing methods build one-stage pipelines, but a large pe…

    Submitted 27 October, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: Accepted By WACV 2023

  11. A Study of Single Statement Bugs Involving Dynamic Language Features

    Authors: Li Sui, Shawn Rasheed, Amjed Tahir, Jens Dietrich

    Abstract: Dynamic language features are widely available in programming languages to implement functionality that can adapt to multiple usage contexts, enabling reuse. Functionality such as data binding, object-relational mapping and user interface builders can be heavily dependent on these features. However, their use has risks and downsides as they affect the soundness of static analyses and techniques t…

    Submitted 2 April, 2022; originally announced April 2022.

    Comments: Accepted at the 30th IEEE/ACM International Conference on Program Comprehension (ICPC 2022) - ERA track

  12. arXiv:2201.10042

    cs.IT eess.SP

    Performance Analysis of Multiple-Antenna Ambient Backscatter Systems at Finite Blocklengths

    Authors: Likun Sui, Zihuai Lin, Pei Xiao, H. Vincent Poor, Branka Vucetic

    Abstract: This paper analyzes the maximal achievable rate for a given blocklength and error probability over a multiple-antenna ambient backscatter channel with perfect channel state information at the receiver. The result consists of a finite blocklength channel coding achievability bound and a converse bound based on the Neyman-Pearson test and the normal approximation based on the Berry-Esseen Theorem.…

    Submitted 20 March, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

  13. arXiv:2106.04073

    cs.CV

    Salvage of Supervision in Weakly Supervised Object Detection

    Authors: Lin Sui, Chen-Lin Zhang, Jianxin Wu

    Abstract: Weakly supervised object detection (WSOD) has recently attracted much attention. However, the lack of bounding-box supervision makes its accuracy much lower than fully supervised object detection (FSOD), and currently modern FSOD techniques cannot be applied to WSOD. To bridge the performance and technical gaps between WSOD and FSOD, this paper proposes a new framework, Salvage of Supervision (SoS…

    Submitted 9 May, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: accepted by CVPR 2022

  14. arXiv:1902.07463

    cs.DC cs.CV

    DNNVM : End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-based CNN Accelerators

    Authors: Yu Xing, Shuang Liang, Lingzhi Sui, Xijie Jia, Jiantao Qiu, Xin Liu, Yushun Wang, Yu Wang, Yi Shan

    Abstract: The convolutional neural network (CNN) has become a state-of-the-art method for several artificial intelligence domains in recent years. The increasingly complex CNN models are both computation-bound and I/O-bound. FPGA-based accelerators driven by custom instruction set architecture (ISA) achieve a balance between generality and efficiency, but there is much on them left to be optimized. We propo…

    Submitted 25 July, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

    Comments: 18 pages, 9 figures, 5 tables

  15. arXiv:1402.0273

    cs.CY cs.HC

    The Designing of Online Multiple Intelligence Tools for Lecturers at Polytechnic

    Authors: Sazilah Salam, Siti Nurul Mahfuzah Mohamad, Norasiken Bakar, Linda Khoo Mei Sui

    Abstract: This paper addresses the designing of Online Multiple Intelligence (MI) Teaching Tools for Polytechnic lecturers. These teaching tools can assist lecturers to create their own teaching materials without having any knowledge of Information Technology (IT) especially in programming. The theory of MI is used in this paper and this theory postulates that everybody has at least two or more intelligence…

    Submitted 2 February, 2014; originally announced February 2014.

    Comments: 7 pages, 4 figures, 1 table, International Journal of Soft Computing and Software Engineering [JSCSE], Vol. 3, No. 3