Showing 1–50 of 122 results for author: Shu, X

Searching in archive cs.

  1. arXiv:2505.18934  [pdf, ps, other]

    cs.LG cs.AI cs.IR cs.SI

    Chi-Square Wavelet Graph Neural Networks for Heterogeneous Graph Anomaly Detection

    Authors: Xiping Li, Xiangyu Dong, Xingyi Zhang, Kun Xie, Yuanhao Feng, Bo Wang, Guilin Li, Wuxiong Zeng, Xiujun Shu, Sibo Wang

    Abstract: Graph Anomaly Detection (GAD) in heterogeneous networks presents unique challenges due to node and edge heterogeneity. Existing Graph Neural Network (GNN) methods primarily focus on homogeneous GAD and thus fail to address three key issues: (C1) Capturing abnormal signal and rich semantics across diverse meta-paths; (C2) Retaining high-frequency content in HIN dimension alignment; and (C3) Learnin…

    Submitted 24 May, 2025; originally announced May 2025.

  2. arXiv:2505.18897  [pdf, ps, other]

    cs.IR cs.AI

    Improving Ad matching via Cluster-Adaptive Keyword Expansion and Relevance tuning

    Authors: Dipanwita Saha, Anis Zaman, Hua Zou, Ning Chen, Xinxin Shu, Nadia Vase, Abraham Bagherjeiran

    Abstract: In search advertising, keyword matching connects user queries with relevant ads. While token-based matching increases ad coverage, it can reduce relevance due to overly permissive semantic expansion. This work extends keyword reach through document-side semantic keyword expansion, using a language model to broaden token-level matching without altering queries. We propose a solution using a pre-tra…

    Submitted 24 May, 2025; originally announced May 2025.

  3. arXiv:2505.18668  [pdf, ps, other]

    cs.CV cs.CL

    ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation

    Authors: Zhen Li, Duan Li, Yukai Guo, Xinyuan Guo, Bowen Li, Lanxi Xiao, Shenyu Qiao, Jiashu Chen, Zijian Wu, Hui Zhang, Xinhuan Shu, Shixia Liu

    Abstract: Infographic charts are a powerful medium for communicating abstract data by combining visual elements (e.g., charts, images) with textual information. However, their visual and structural richness poses challenges for large vision-language models (LVLMs), which are typically trained on plain charts. To bridge this gap, we introduce ChartGalaxy, a million-scale dataset designed to advance the under…

    Submitted 7 June, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

    Comments: 56 pages

  4. arXiv:2505.15274  [pdf, ps, other]

    cs.AI

    Identification of Probabilities of Causation: A Complete Characterization

    Authors: Xin Shu, Shuai Wang, Ang Li

    Abstract: Probabilities of causation are fundamental to modern decision-making. Pearl first introduced three binary probabilities of causation, and Tian and Pearl later derived tight bounds for them using Balke's linear programming. The theoretical characterization of probabilities of causation with multi-valued treatments and outcomes has remained unresolved for decades, limiting the scope of causality-bas…

    Submitted 21 May, 2025; originally announced May 2025.

  5. arXiv:2505.09331  [pdf, other]

    cs.LG

    MUST: Multi-Scale Structural-Temporal Link Prediction Model for UAV Ad Hoc Networks

    Authors: Cunlai Pu, Fangrui Wu, Rajput Ramiz Sharafat, Guangzhao Dai, Xiangbo Shu

    Abstract: Link prediction in unmanned aerial vehicle (UAV) ad hoc networks (UANETs) aims to predict the potential formation of future links between UAVs. In adversarial environments where the route information of UAVs is unavailable, predicting future links must rely solely on the observed historical topological information of UANETs. However, the highly dynamic and sparse nature of UANET topologies present…

    Submitted 14 May, 2025; originally announced May 2025.

  6. arXiv:2504.17033  [pdf, ps, other]

    cs.DS

    Breaking the Sorting Barrier for Directed Single-Source Shortest Paths

    Authors: Ran Duan, Jiayi Mao, Xiao Mao, Xinkai Shu, Longhui Yin

    Abstract: We give a deterministic $O(m\log^{2/3}n)$-time algorithm for single-source shortest paths (SSSP) on directed graphs with real non-negative edge weights in the comparison-addition model. This is the first result to break the $O(m+n\log n)$ time bound of Dijkstra's algorithm on sparse graphs, showing that Dijkstra's algorithm is not optimal for SSSP.

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 17 pages

  7. arXiv:2504.13430  [pdf, ps, other]

    cs.GT cs.DS

    The Long Arm of Nashian Allocation in Online $p$-Mean Welfare Maximization

    Authors: Zhiyi Huang, Chui Shan Lee, Xinkai Shu, Zhaozi Wang

    Abstract: We study the online allocation of divisible items to $n$ agents with additive valuations for $p$-mean welfare maximization, a problem introduced by Barman, Khan, and Maiti (2022). Our algorithmic and hardness results characterize the optimal competitive ratios for the entire spectrum of $-\infty \le p \le 1$. Surprisingly, our improved algorithms for all $p \le \frac{1}{\log n}$ are simply the gre…

    Submitted 17 April, 2025; originally announced April 2025.

  8. arXiv:2504.10079  [pdf, other]

    cs.CV

    Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition

    Authors: Hongyu Qu, Ling Xing, Rui Yan, Yazhou Yao, Guo-Sen Xie, Xiangbo Shu

    Abstract: Few-shot action recognition (FSAR) aims to recognize novel action categories with few exemplars. Existing methods typically learn frame-level representations independently for each video by designing various inter-frame temporal modeling strategies. However, they neglect explicit relation modeling between videos and tasks, thus failing to capture shared temporal patterns across videos and reuse te…

    Submitted 14 April, 2025; originally announced April 2025.

  9. arXiv:2504.01396  [pdf, other]

    cs.CV

    All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning

    Authors: Zheng Yang, Ruoxin Chen, Zhiyuan Yan, Ke-Yue Zhang, Xinghe Fu, Shuang Wu, Xiujun Shu, Taiping Yao, Shouhong Ding, Xi Li

    Abstract: The exponential growth of AI-generated images (AIGIs) underscores the urgent need for robust and generalizable detection methods. In this paper, we establish two key principles for AIGI detection through systematic analysis: (1) All Patches Matter: Unlike conventional image classification where discriminative features concentrate on object-centric regions, each patch in AIGIs inherently contains s…

    Submitted 29 May, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

  10. arXiv:2503.23529  [pdf, other]

    cs.CV

    ViLAaD: Enhancing "Attracting and Dispersing" Source-Free Domain Adaptation with Vision-and-Language Model

    Authors: Shuhei Tarashima, Xinqi Shu, Norio Tagawa

    Abstract: Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to a target dataset from a different domain without access to the source data. Conventional SFDA methods are limited by the information encoded in the pre-trained source model and the unlabeled target data. Recently, approaches leveraging auxiliary resources have emerged, yet remain in their early stages, offering ample…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: 15 pages

  11. arXiv:2503.17080  [pdf, other]

    cs.CV

    Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection

    Authors: Gensheng Pei, Tao Chen, Yujia Wang, Xinhao Cai, Xiangbo Shu, Tianfei Zhou, Yazhou Yao

    Abstract: The CLIP model has demonstrated significant advancements in aligning visual and language modalities through large-scale pre-training on image-text pairs, enabling strong zero-shot classification and retrieval capabilities on various domains. However, CLIP's training remains computationally intensive, with high demands on both data processing and memory. To address these challenges, recent masking…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: accepted by CVPR 2025

  12. arXiv:2503.08262  [pdf, ps, other]

    cs.DS cs.NI

    Cost-driven prunings for iterative solving of constrained routing problem with SRLG-disjoint protection

    Authors: P. A. Mosharev, Choon-Meng Lee, Xu Shu, Xiaoshan Zhang, Man-Hong Yung

    Abstract: The search for the optimal pair of active and protection paths in a network with Shared Risk Link Groups (SRLG) is a challenging but high-value problem in the industry that is inevitable in ensuring reliable connections on the modern Internet. We propose a new approach to solving this problem, with a novel use of statistical analysis of the distribution of paths with respect to their cost, which i…

    Submitted 9 June, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: Included more references of works in adjacent areas. Discussed in more details the nature of SRLG in experiment dataset. Corrected writing style

  13. arXiv:2502.08076  [pdf, other]

    cs.HC

    RouteFlow: Trajectory-Aware Animated Transitions

    Authors: Duan Li, Xinyuan Guo, Xinhuan Shu, Lanxi Xiao, Lingyun Yu, Shixia Liu

    Abstract: Animating objects' movements is widely used to facilitate tracking changes and observing both the global trend and local hotspots where objects converge or diverge. Existing methods, however, often obscure critical local hotspots by only considering the start and end positions of objects' trajectories. To address this gap, we propose RouteFlow, a trajectory-aware animated transition method that ef…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: Accepted to CHI 2025

  14. arXiv:2502.06501  [pdf, other]

    cs.CV

    Learning Clustering-based Prototypes for Compositional Zero-shot Learning

    Authors: Hongyu Qu, Jianan Wei, Xiangbo Shu, Wenguan Wang

    Abstract: Learning primitive (i.e., attribute and object) concepts from seen compositions is the primary challenge of Compositional Zero-Shot Learning (CZSL). Existing CZSL solutions typically rely on oversimplified data assumptions, e.g., modeling each primitive with a single centroid primitive representation, ignoring the natural diversities of the attribute (resp. object) when coupled with different obje…

    Submitted 22 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted to ICLR 2025; Project page: https://github.com/quhongyu/ClusPro

  15. arXiv:2502.00791  [pdf, other]

    cs.CL cs.CV

    Vision-centric Token Compression in Large Language Model

    Authors: Ling Xing, Alex Jinpeng Wang, Rui Yan, Xiangbo Shu, Jinhui Tang

    Abstract: Real-world applications are stretching context windows to hundreds of thousands of tokens while Large Language Models (LLMs) swell from billions to trillions of parameters. This dual expansion sends compute and memory costs skyrocketing, making token compression indispensable. We introduce Vision Centric Token Compression (Vist), a slow-fast compression framework that mirrors human reading: the fast…

    Submitted 19 May, 2025; v1 submitted 2 February, 2025; originally announced February 2025.

  16. arXiv:2501.09720  [pdf, other]

    cs.CV cs.AI

    A Simple Aerial Detection Baseline of Multimodal Language Models

    Authors: Qingyun Li, Yushi Chen, Xinya Shu, Dong Chen, Xin He, Yi Yu, Xue Yang

    Abstract: The multimodal language models (MLMs) based on generative pre-trained Transformer are considered powerful candidates for unifying various domains and tasks. MLMs developed for remote sensing (RS) have demonstrated outstanding performance in multiple tasks, such as visual question answering and visual grounding. In addition to visual grounding that detects specific objects corresponding to given ins…

    Submitted 31 January, 2025; v1 submitted 16 January, 2025; originally announced January 2025.

    Comments: 4 pages, 1 table, 4 figures

  17. arXiv:2501.09349  [pdf, other]

    cs.CL cs.HC

    ChartInsighter: An Approach for Mitigating Hallucination in Time-series Chart Summary Generation with A Benchmark Dataset

    Authors: Fen Wang, Bomiao Wang, Xueli Shu, Zhen Liu, Zekai Shao, Chao Liu, Siming Chen

    Abstract: Effective chart summary can significantly reduce the time and effort decision makers spend interpreting charts, enabling precise and efficient communication of data insights. Previous studies have faced challenges in generating accurate and semantically rich summaries of time-series data charts. In this paper, we identify summary elements and common hallucination types in the generation of time-se…

    Submitted 16 January, 2025; originally announced January 2025.

  18. arXiv:2412.17619  [pdf, other]

    cs.CV

    Kernel-Aware Graph Prompt Learning for Few-Shot Anomaly Detection

    Authors: Fenfang Tao, Guo-Sen Xie, Fang Zhao, Xiangbo Shu

    Abstract: Few-shot anomaly detection (FSAD) aims to detect unseen anomaly regions with the guidance of very few normal support images from the same class. Existing FSAD methods usually find anomalies by directly designing complex text prompts to align them with visual features under the prevailing large vision-language model paradigm. However, these methods almost always neglect intrinsic contextual infor…

    Submitted 16 April, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

    Comments: Accepted to AAAI 2025

  19. arXiv:2412.06510  [pdf, other]

    cs.CV cs.AI

    AnomalyControl: Learning Cross-modal Semantic Features for Controllable Anomaly Synthesis

    Authors: Shidan He, Lei Liu, Xiujun Shu, Bo Wang, Yuanhao Feng, Shen Zhao

    Abstract: Anomaly synthesis is a crucial approach to augment abnormal data for advancing anomaly inspection. Based on the knowledge from the large-scale pre-training, existing text-to-image anomaly synthesis methods predominantly focus on textual information or coarse-aligned visual features to guide the entire generation process. However, these methods often lack sufficient descriptors to capture the compl…

    Submitted 18 April, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

  20. arXiv:2411.18328  [pdf, other]

    cs.CV

    EventCrab: Harnessing Frame and Point Synergy for Event-based Action Recognition and Beyond

    Authors: Meiqi Cao, Xiangbo Shu, Jiachao Zhang, Rui Yan, Zechao Li, Jinhui Tang

    Abstract: Event-based Action Recognition (EAR) possesses the advantages of high-temporal resolution capturing and privacy preservation compared with traditional action recognition. Current leading EAR solutions typically follow two regimes: project unconstructed event streams into dense constructed event frames and adopt powerful frame-specific networks, or employ lightweight point-specific networks to hand…

    Submitted 27 November, 2024; originally announced November 2024.

  21. arXiv:2411.17532  [pdf, other]

    cs.CV

    FTMoMamba: Motion Generation with Frequency and Text State Space Models

    Authors: Chengjian Li, Xiangbo Shu, Qiongjie Cui, Yazhou Yao, Jinhui Tang

    Abstract: Diffusion models achieve impressive performance in human motion generation. However, current approaches typically ignore the significance of frequency-domain information in capturing fine-grained motions within the latent space (e.g., low frequencies correlate with static poses, and high frequencies align with fine-grained motions). Additionally, there is a semantic discrepancy between text and mo…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: 8 pages, 6 figures

  22. arXiv:2411.16053  [pdf, other]

    cs.CV cs.AI

    UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation

    Authors: Guangzhao Dai, Jian Zhao, Yuantao Chen, Yusen Qin, Hao Zhao, Guosen Xie, Yazhou Yao, Xiangbo Shu, Xuelong Li

    Abstract: Vision-and-Language Navigation (VLN), where an agent follows instructions to reach a target destination, has recently seen significant advancements. In contrast to navigation in discrete environments with predefined trajectories, VLN in Continuous Environments (VLN-CE) presents greater challenges, as the agent is free to navigate any unobstructed location and is more vulnerable to visual occlusion…

    Submitted 16 March, 2025; v1 submitted 24 November, 2024; originally announced November 2024.

  23. arXiv:2411.14717  [pdf, other]

    cs.LG cs.CL cs.CV

    FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data

    Authors: Binqian Xu, Xiangbo Shu, Haiyang Mei, Guosen Xie, Basura Fernando, Jinhui Tang

    Abstract: Multimodal Large Language Models (MLLMs) have made significant advancements, demonstrating powerful capabilities in processing and understanding multimodal data. Fine-tuning MLLMs with Federated Learning (FL) allows for expanding the training data scope by including private data sources, thereby enhancing their practical applicability in privacy-sensitive domains. However, current research remains…

    Submitted 8 March, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

  24. arXiv:2411.02542  [pdf, other]

    cs.LG cs.SI

    Enhancing Graph Neural Networks in Large-scale Traffic Incident Analysis with Concurrency Hypothesis

    Authors: Xiwen Chen, Sayed Pedram Haeri Boroujeni, Xin Shu, Huayu Li, Abolfazl Razi

    Abstract: Despite recent progress in reducing road fatalities, the persistently high rate of traffic-related deaths highlights the necessity for improved safety interventions. Leveraging large-scale graph-based nationwide road network data across 49 states in the USA, our study first posits the Concurrency Hypothesis from intuitive observations, suggesting a significant likelihood of incidents occurring at…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: Accepted by Sigspatial 2024

  25. arXiv:2411.02265  [pdf, other]

    cs.CL cs.AI

    Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

    Authors: Xingwu Sun, Yanfeng Chen, Yiqing Huang, Ruobing Xie, Jiaqi Zhu, Kai Zhang, Shuaipeng Li, Zhen Yang, Jonny Han, Xiaobo Shu, Jiahao Bu, Zhongzhi Chen, Xuemeng Huang, Fengzong Lian, Saiyong Yang, Jianfeng Yan, Yuyuan Zeng, Xiaoqin Ren, Chao Yu, Lulu Wu, Yue Mao, Jun Xia, Tao Yang, Suncong Zheng, Kan Wu, et al. (83 additional authors not shown)

    Abstract: In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activation parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's superior performance across various benchmarks including language understanding and generation, logica…

    Submitted 6 November, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: 17 pages, 4 Figures

  26. arXiv:2410.18260  [pdf, other]

    eess.IV cs.CV

    Predicting total time to compress a video corpus using online inference systems

    Authors: Xin Shu, Vibhoothi Vibhoothi, Anil Kokaram

    Abstract: Predicting the computational cost of compressing/transcoding clips in a video corpus is important for resource management of cloud services and VOD (Video On Demand) providers. Currently, customers of cloud video services are unaware of the cost of transcoding their files until the task is completed. Previous work concentrated on predicting per-clip compression time, and thus estimating the cost of…

    Submitted 23 October, 2024; originally announced October 2024.

    Comments: Accepted by IEEE International Conference on Visual Communications and Image Processing (VCIP) 2024

  27. arXiv:2410.13213  [pdf, other]

    cs.AI cs.LG

    LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch

    Authors: Caigao Jiang, Xiang Shu, Hong Qian, Xingyu Lu, Jun Zhou, Aimin Zhou, Yang Yu

    Abstract: Optimization problems are prevalent across various scenarios. Formulating and then solving optimization problems described by natural language often requires highly specialized human expertise, which could block the widespread application of optimization-based decision making. To automate problem formulation and solving, leveraging large language models (LLMs) has emerged as a potential way. Howev…

    Submitted 2 March, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

  28. arXiv:2410.06868  [pdf, ps, other]

    cs.DS cs.GT

    Online Matching Meets Sampling Without Replacement

    Authors: Zhiyi Huang, Chui Shan Lee, Jianqiao Lu, Xinkai Shu

    Abstract: Sampling without replacement is a natural online rounding strategy for converting fractional bipartite matching into an integral one. In Online Bipartite Matching, we can use the Balance algorithm to fractionally match each online vertex, and then sample an unmatched offline neighbor with probability proportional to the fractional matching. In Online Stochastic Matching, we can take the solution t…

    Submitted 9 October, 2024; originally announced October 2024.

  29. arXiv:2409.19835  [pdf, other]

    cs.CV eess.IV

    MoCoLSK: Modality Conditioned High-Resolution Downscaling for Land Surface Temperature

    Authors: Qun Dai, Chunyang Yuan, Yimian Dai, Yuxuan Li, Xiang Li, Kang Ni, Jianhui Xu, Xiangbo Shu, Jian Yang

    Abstract: Land Surface Temperature (LST) is a critical parameter for environmental studies, but directly obtaining high spatial resolution LST data remains challenging due to the spatio-temporal trade-off in satellite remote sensing. Guided LST downscaling has emerged as an alternative solution to overcome these limitations, but current methods often neglect spatial non-stationarity, and there is a lack of…

    Submitted 2 March, 2025; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: Accepted by IEEE TGRS

  30. arXiv:2409.17792  [pdf, ps, other]

    cs.CV

    Reblurring-Guided Single Image Defocus Deblurring: A Learning Framework with Misaligned Training Pairs

    Authors: Dongwei Ren, Xinya Shu, Yu Li, Xiaohe Wu, Jin Li, Wangmeng Zuo

    Abstract: For single image defocus deblurring, acquiring well-aligned training pairs (or training triplets), i.e., a defocus blurry image, an all-in-focus sharp image (and a defocus blur map), is a challenging task for developing effective deblurring models. Existing image defocus deblurring methods typically rely on training data collected by specialized imaging equipment, with the assumption that these pa…

    Submitted 26 June, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: Accepted to International Journal of Computer Vision. The source code and dataset are available at https://github.com/ssscrystal/Reblurring-guided-JDRL

  31. arXiv:2409.07967  [pdf, other]

    cs.CV

    Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization

    Authors: Ling Xing, Hongyu Qu, Rui Yan, Xiangbo Shu, Jinhui Tang

    Abstract: Dense-localization Audio-Visual Events (DAVE) aims to identify time boundaries and corresponding categories for events that are both audible and visible in a long video, where events may co-occur and exhibit varying durations. However, complex audio-visual scenes often involve asynchronization between modalities, making accurate localization challenging. Existing DAVE solutions extract audio and v…

    Submitted 9 May, 2025; v1 submitted 12 September, 2024; originally announced September 2024.

  32. arXiv:2408.07483  [pdf, other]

    cs.HC

    Visualization Atlases: Explaining and Exploring Complex Topics through Data, Visualization, and Narration

    Authors: Jinrui Wang, Xinhuan Shu, Benjamin Bach, Uta Hinrichs

    Abstract: This paper defines, analyzes, and discusses the emerging genre of visualization atlases. We currently witness an increase in web-based, data-driven initiatives that call themselves "atlases" while explaining complex, contemporary issues through data and visualizations: climate change, sustainability, AI, or cultural discoveries. To understand this emerging genre and inform their design, study, and…

    Submitted 14 August, 2024; originally announced August 2024.

  33. arXiv:2408.01272  [pdf, other]

    cs.HC

    Does This Have a Particular Meaning? Interactive Pattern Explanation for Network Visualizations

    Authors: Xinhuan Shu, Alexis Pister, Junxiu Tang, Fanny Chevalier, Benjamin Bach

    Abstract: This paper presents an interactive technique to explain visual patterns in network visualizations to analysts who do not understand these visualizations and who are learning to read them. Learning a visualization requires mastering its visual grammar and decoding information presented through visual marks, graphical encodings, and spatial configurations. To help people learn network visualization…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: to be published in IEEE VIS 2024

  34. arXiv:2407.20078  [pdf, other]

    cs.CV

    Background Semantics Matter: Cross-Task Feature Exchange Network for Clustered Infrared Small Target Detection With Sky-Annotated Dataset

    Authors: Mengxuan Xiao, Qun Dai, Yiming Zhu, Kehua Guo, Huan Wang, Xiangbo Shu, Jian Yang, Yimian Dai

    Abstract: Infrared small target detection poses unique challenges due to the scarcity of intrinsic target features and the abundance of similar background distractors. We argue that background semantics play a pivotal role in distinguishing visually similar objects for this task. To address this, we introduce a new task--clustered infrared small target detection, and present DenseSIRST, a novel benchmark da…

    Submitted 2 November, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  35. arXiv:2407.19727  [pdf, other]

    cs.IR

    Adaptive Utilization of Cross-scenario Information for Multi-scenario Recommendation

    Authors: Xiufeng Shu, Ruidong Han, Xiang Li, Wei Lin

    Abstract: The recommender system of an e-commerce platform usually serves multiple business scenarios. Multi-scenario Recommendation (MSR) is an important topic that improves ranking performance by leveraging information from different scenarios. Recent methods for MSR mostly construct scenario shared or specific modules to model commonalities and differences among scenarios. However, when the amount of data a…

    Submitted 29 July, 2024; originally announced July 2024.

  36. arXiv:2405.17188  [pdf, other]

    cs.CV

    The SkatingVerse Workshop & Challenge: Methods and Results

    Authors: Jian Zhao, Lei Jin, Jianshu Li, Zheng Zhu, Yinglei Teng, Jiaojiao Zhao, Sadaf Gulshad, Zheng Wang, Bo Zhao, Xiangbo Shu, Yunchao Wei, Xuecheng Nie, Xiaojie Jin, Xiaodan Liang, Shin'ichi Satoh, Yandong Guo, Cewu Lu, Junliang Xing, Jane Shen Shengmei

    Abstract: The SkatingVerse Workshop & Challenge aims to encourage research in developing novel and accurate methods for human action understanding. The SkatingVerse dataset used for the SkatingVerse Challenge has been publicly released. There are two subsets in the dataset, i.e., the training subset and testing subset. The training subset consists of 19,993 RGB video sequences, and the testing subset cons…

    Submitted 27 May, 2024; originally announced May 2024.

  37. arXiv:2405.02538  [pdf, other]

    cs.CV

    AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

    Authors: Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie

    Abstract: Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) heavily rely on manually annotated detection boxes in training and inference, hindering further practical deployment; or 2) directly employ normal detectors to detect multip…

    Submitted 3 May, 2024; originally announced May 2024.

  38. arXiv:2405.02077  [pdf, other]

    cs.CV

    MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

    Authors: Hongyu Qu, Rui Yan, Xiangbo Shu, Hailiang Gao, Peng Huang, Guo-Sen Xie

    Abstract: Recent few-shot action recognition (FSAR) methods typically perform semantic matching on learned discriminative features to achieve promising performance. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, etc.) feature alignment, which ignores that human actions with the same semantics may appear at different velocities. To this end, we develop a novel Multi-Velocit…

    Submitted 5 March, 2025; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted to TMM 2025

  39. arXiv:2401.10039  [pdf, other]

    cs.CV

    GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition

    Authors: Guangzhao Dai, Xiangbo Shu, Wenhao Wu, Rui Yan, Jiachao Zhang

    Abstract: Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks. This advancement paves the way for notable performance in Zero-Shot Egocentric Action Recognition (ZS-EAR). Typically, VLMs handle ZS-EAR as a global video-text matching task, which often leads to suboptimal alignment of vision and linguistic knowledge. We prop…

    Submitted 11 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  40. arXiv:2312.11225  [pdf, other]

    cs.CR

    MAD-MulW: A Multi-Window Anomaly Detection Framework for BGP Security Events

    Authors: Songtao Peng, Yiping Chen, Xincheng Shu, Wu Shuai, Shenhao Fang, Zhongyuan Ruan, Qi Xuan

    Abstract: In recent years, various international security events have occurred frequently and interacted between real society and cyberspace. Traditional traffic monitoring mainly focuses on the local anomalous status of events due to a large amount of data. BGP-based event monitoring makes it possible to perform differential analysis of international events. For many existing traffic anomaly detection meth…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 10 pages, 8 figures

  41. arXiv:2309.15728  [pdf, other]

    cs.SI

    Line Graph Neural Networks for Link Weight Prediction

    Authors: Jinbi Liang, Cunlai Pu, Xiangbo Shu, Yongxiang Xia, Chengyi Xia

    Abstract: In real-world networks, predicting the weight (strength) of links is as crucial as predicting the existence of the links themselves. Previous studies have primarily used shallow graph features for link weight prediction, limiting the prediction performance. In this paper, we propose a new link weight prediction method, namely Line Graph Neural Networks for Link Weight Prediction (LGLWP), which lea…

    Submitted 28 October, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

  42. CCSPNet-Joint: Efficient Joint Training Method for Traffic Sign Detection Under Extreme Conditions

    Authors: Haoqin Hong, Yue Zhou, Xiangyu Shu, Xiaofang Hu

    Abstract: Traffic sign detection is an important research direction in intelligent driving. Unfortunately, existing methods often overlook extreme conditions such as fog, rain, and motion blur. Moreover, the end-to-end training strategy for image denoising and object detection models fails to utilize inter-model information effectively. To address these issues, we propose CCSPNet, an efficient feature extra…

    Submitted 3 February, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Journal ref: 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 2024, pp. 1-8

  43. arXiv:2308.15795  [pdf, other]

    cs.CV

    Occlusion-Aware Detection and Re-ID Calibrated Network for Multi-Object Tracking

    Authors: Yukun Su, Ruizhou Sun, Xin Shu, Yu Zhang, Qingyao Wu

    Abstract: Multi-Object Tracking (MOT) is a crucial computer vision task that aims to predict the bounding boxes and identities of objects simultaneously. While state-of-the-art methods have made remarkable progress by jointly optimizing the multi-task problems of detection and Re-ID feature learning, few approaches tackle the occlusion issue, which is a long-standing challenge in the MOT fie…

    Submitted 30 August, 2023; originally announced August 2023.

  44. arXiv:2308.14105  [pdf, other]

    cs.CV cs.AI

    Unified and Dynamic Graph for Temporal Character Grouping in Long Videos

    Authors: Xiujun Shu, Wei Wen, Liangsheng Xu, Ruizhi Qiao, Taian Guo, Hanjun Li, Bei Gan, Xiao Wang, Xing Sun

    Abstract: Video temporal character grouping locates appearing moments of major characters within a video according to their identities. To this end, recent works have evolved from unsupervised clustering to graph-based supervised clustering. However, graph methods are built upon the premise of fixed affinity graphs, bringing many inexact connections. Besides, they extract multi-modal features with kinds of…

    Submitted 22 June, 2024; v1 submitted 27 August, 2023; originally announced August 2023.

  45. arXiv:2308.04197  [pdf, other]

    cs.CV

    D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation

    Authors: Hanjun Li, Xiujun Shu, Sunan He, Ruizhi Qiao, Wei Wen, Taian Guo, Bei Gan, Xing Sun

    Abstract: Temporal sentence grounding (TSG) aims to locate a specific moment from an untrimmed video with a given natural language query. Weakly supervised methods still show a large performance gap compared to fully supervised ones, while the latter require laborious timestamp annotations. In this study, we aim to reduce the annotation cost yet keep competitive performance for the TSG task compared…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: ICCV2023

  46. arXiv:2308.04040  [pdf, other]

    cs.HC

    WonderFlow: Narration-Centric Design of Animated Data Videos

    Authors: Yun Wang, Leixian Shen, Zhengxin You, Xinhuan Shu, Bongshin Lee, John Thompson, Haidong Zhang, Dongmei Zhang

    Abstract: Creating an animated data video enriched with audio narration takes a significant amount of time and effort and requires expertise. Users need not only to design complex animations, but also to turn written text scripts into audio narrations and synchronize visual changes with the narrations. This paper presents WonderFlow, an interactive authoring tool that facilitates narration-centric design of a…

    Submitted 6 June, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted by TVCG

  47. arXiv:2307.11957  [pdf, other]

    physics.optics cs.CV cs.ET cs.LG

    High-performance real-world optical computing trained by in situ gradient-based model-free optimization

    Authors: Guangyuan Zhao, Xin Shu, Renjie Zhou

    Abstract: Optical computing systems provide high-speed and low-energy data processing but face deficiencies in computationally demanding training and simulation-to-reality gaps. We propose a gradient-based model-free optimization (G-MFO) method based on a Monte Carlo gradient estimation algorithm for computationally efficient in situ training of optical computing systems. This approach treats an optical com…

    Submitted 21 November, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: The paper titled "High-performance real-world optical computing trained by in situ gradient-based model-free optimization" has been accepted at ICCP&TPAMI 2024. For more details, please visit the [project page](https://shuxin626.github.io/mfo_optical_computing/index.html)

  48. arXiv:2307.04139  [pdf, ps, other]

    cs.DS

    A Randomized Algorithm for Single-Source Shortest Path on Undirected Real-Weighted Graphs

    Authors: Ran Duan, Jiayi Mao, Xinkai Shu, Longhui Yin

    Abstract: In undirected graphs with real non-negative weights, we give a new randomized algorithm for the single-source shortest path (SSSP) problem with running time $O(m\sqrt{\log n \cdot \log\log n})$ in the comparison-addition model. This is the first algorithm to break the $O(m+n\log n)$ time bound for real-weighted sparse graphs by Dijkstra's algorithm with Fibonacci heaps. Previous undirected non-neg…

    Submitted 4 October, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

    Comments: 17 pages

    MSC Class: 68W20; ACM Class: F.2.2

  49. Creating Emordle: Animating Word Cloud for Emotion Expression

    Authors: Liwenhan Xie, Xinhuan Shu, Jeon Cheol Su, Yun Wang, Siming Chen, Huamin Qu

    Abstract: We propose emordle, a conceptual design that animates wordles (compact word clouds) to deliver their emotional context to audiences. To inform the design, we first reviewed online examples of animated texts and animated wordles, and summarized strategies for injecting emotion into the animations. We introduced a composite approach that extends an existing animation scheme for one word to multi…

    Submitted 14 June, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted in IEEE Transactions on Visualization and Computer Graphics

  50. arXiv:2304.04023  [pdf, other]

    cs.CV

    Attack-Augmentation Mixing-Contrastive Skeletal Representation Learning

    Authors: Binqian Xu, Xiangbo Shu, Jiachao Zhang, Rui Yan, Guo-Sen Xie

    Abstract: Contrastive learning, relying on effective positive and negative sample pairs, is beneficial for learning informative skeleton representations in unsupervised skeleton-based action recognition. To obtain these positive and negative pairs, existing weak/strong data augmentation methods have to randomly change the appearance of skeletons to indirectly pursue semantic perturbations. However, such app…

    Submitted 2 October, 2024; v1 submitted 8 April, 2023; originally announced April 2023.