Showing 1–50 of 68 results for author: Chai, Z

Searching in archive cs.
  1. arXiv:2505.04421

    cs.IR

    LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders

    Authors: Zheng Chai, Qin Ren, Xijun Xiao, Huizhi Yang, Bo Han, Sijun Zhang, Di Chen, Hui Lu, Wenlin Zhao, Lele Yu, Xionghang Xie, Shiru Ren, Xiang Sun, Yaocheng Tan, Peng Xu, Yuchao Zheng, Di Wu

    Abstract: Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Rec…

    Submitted 7 May, 2025; originally announced May 2025.

  2. arXiv:2502.09662

    q-bio.QM cs.CV eess.IV

    Generalizable Cervical Cancer Screening via Large-scale Pretraining and Test-Time Adaptation

    Authors: Hao Jiang, Cheng Jin, Huangjing Lin, Yanning Zhou, Xi Wang, Jiabo Ma, Li Ding, Jun Hou, Runsheng Liu, Zhizhong Chai, Luyang Luo, Huijuan Shi, Yinling Qian, Qiong Wang, Changzhong Li, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

    Abstract: Cervical cancer is a leading malignancy in the female reproductive system. While AI-assisted cytology offers a cost-effective and non-invasive screening solution, current systems struggle with generalizability in complex clinical scenarios. To address this issue, we introduced Smart-CCS, a generalizable Cervical Cancer Screening paradigm based on pretraining and adaptation to create robust and general…

    Submitted 12 February, 2025; originally announced February 2025.

  3. arXiv:2502.05558

    cs.IR

    Large Memory Network for Recommendation

    Authors: Hui Lu, Zheng Chai, Yuchao Zheng, Zhe Chen, Deping Xie, Peng Xu, Xun Zhou, Di Wu

    Abstract: Modeling user behavior sequences in recommender systems is essential for understanding user preferences over time, enabling personalized and accurate recommendations for improving user retention and enhancing business values. Despite its significance, there are two challenges for current sequential modeling approaches. From the spatial dimension, it is difficult to mutually perceive similar users'…

    Submitted 17 February, 2025; v1 submitted 8 February, 2025; originally announced February 2025.

    Journal ref: WWW 2025

  4. Adaptive Domain Scaling for Personalized Sequential Modeling in Recommenders

    Authors: Zheng Chai, Hui Lu, Di Chen, Qin Ren, Yuchao Zheng, Xun Zhou

    Abstract: Users generally exhibit complex behavioral patterns and diverse intentions in multiple business scenarios of super applications like Douyin, presenting great challenges to current industrial multi-domain recommenders. To mitigate the discrepancies across diverse domains, research and industrial practice generally emphasize sophisticated network structures to accommodate diverse data distribution…

    Submitted 28 April, 2025; v1 submitted 8 February, 2025; originally announced February 2025.

    Journal ref: Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

  5. arXiv:2410.00773

    cs.AI cs.CL

    BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data

    Authors: Xuwu Wang, Qiwen Cui, Yunzhe Tao, Yiran Wang, Ziwei Chai, Xiaotian Han, Boyi Liu, Jianbo Yuan, Jing Su, Guoyin Wang, Tingkai Liu, Liyu Chen, Tianyi Liu, Tao Sun, Yufeng Zhang, Sirui Zheng, Quanzeng You, Yang Yang, Hongxia Yang

    Abstract: Large language models (LLMs) have become increasingly pivotal across various domains, especially in handling complex data types. This includes structured data processing, as exemplified by ChartQA and ChatGPT-Ada, and multimodal unstructured data processing as seen in Visual Question Answering (VQA). These areas have attracted significant attention from both industry and academia. Despite this, th…

    Submitted 1 October, 2024; originally announced October 2024.

  6. arXiv:2407.05010

    cs.CV

    PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference

    Authors: Ye Li, Chen Tang, Yuan Meng, Jiajun Fan, Zenghao Chai, Xinzhu Ma, Zhi Wang, Wenwu Zhu

    Abstract: We introduce PRANCE, a Vision Transformer compression framework that jointly optimizes the activated channels and reduces tokens, based on the characteristics of inputs. Specifically, PRANCE leverages adaptive token optimization strategies for a certain computational budget, aiming to accelerate ViTs' inference from a unified data and architectural perspective. However, the joint framework poses…

    Submitted 6 July, 2024; originally announced July 2024.

  7. arXiv:2406.04629

    cs.CV cs.GR cs.MM

    STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting

    Authors: Zenghao Chai, Chen Tang, Yongkang Wong, Mohan Kankanhalli

    Abstract: The creation of 4D avatars (i.e., animated 3D avatars) from text description typically uses text-to-image (T2I) diffusion models to synthesize 3D avatars in the canonical space and subsequently applies animation with target motions. However, such an optimization-by-animation paradigm has several drawbacks. (1) For pose-agnostic optimization, the rendered images in canonical pose for naive Score Di…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Tech report

  8. arXiv:2403.18569

    cs.LG cs.AI

    PDNNet: PDN-Aware GNN-CNN Heterogeneous Network for Dynamic IR Drop Prediction

    Authors: Yuxiang Zhao, Zhuomin Chai, Xun Jiang, Yibo Lin, Runsheng Wang, Ru Huang

    Abstract: IR drop on the power delivery network (PDN) is closely related to PDN's configuration and cell current consumption. As the integrated circuit (IC) design is growing larger, dynamic IR drop simulation becomes computationally unaffordable and machine learning based IR drop prediction has been explored as a promising solution. Although CNN-based methods have been adapted to IR drop prediction task in…

    Submitted 5 December, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Journal ref: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2024

  9. arXiv:2403.16854

    cs.CL cs.AI

    An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing

    Authors: Ziwei Chai, Guoyin Wang, Jing Su, Tianjie Zhang, Xuanwen Huang, Xuwu Wang, Jingjing Xu, Jianbo Yuan, Hongxia Yang, Fei Wu, Yang Yang

    Abstract: We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. Our framework represents expert LLMs as special expert tokens within the vocabulary of a meta LLM. The meta LLM can route to an expert LLM like generating new tokens. Expert-Token-Routing not only supports learning the implicit expertise of expert LLMs from existing instru…

    Submitted 11 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  10. arXiv:2402.12984

    cs.CL cs.AI

    Can GNN be Good Adapter for LLMs?

    Authors: Xuanwen Huang, Kaiqiao Han, Yang Yang, Dezheng Bao, Quanjin Tao, Ziwei Chai, Qi Zhu

    Abstract: Recently, large language models (LLMs) have demonstrated superior capabilities in understanding and zero-shot learning on textual data, promising significant advances for many text-related domains. In the graph domain, various real-world scenarios also involve textual data, where tasks and node features can be described by text. These text-attributed graphs (TAGs) have broad applications in social…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted by WWW'24

  11. arXiv:2402.09372

    eess.IV cs.AI cs.CV

    Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

    Authors: Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

    Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmar…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

  12. arXiv:2402.02168

    cs.LG cs.AI cs.SI

    Enhancing Cross-domain Link Prediction via Evolution Process Modeling

    Authors: Xuanwen Huang, Wei Chow, Yize Zhu, Yang Wang, Ziwei Chai, Chunping Wang, Lei Chen, Yang Yang

    Abstract: This work proposes DyExpert, a dynamic graph model for cross-domain link prediction. It can explicitly model historical evolving processes to learn the evolution pattern of a specific downstream graph and subsequently make pattern-specific link predictions. DyExpert adopts a decoder-only transformer and is capable of efficient parallel training and inference by conditioned link generation…

    Submitted 5 February, 2025; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: Accepted by WWW'25

  13. arXiv:2401.07314

    cs.AI cs.CV cs.RO

    MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation

    Authors: Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong

    Abstract: Embodied agents equipped with GPT as their brains have exhibited extraordinary decision-making and generalization abilities across various tasks. However, existing zero-shot agents for vision-and-language navigation (VLN) only prompt GPT-4 to select potential locations within localized environments, without constructing an effective "global-view" for the agent to understand the overall environment…

    Submitted 20 June, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: LLM/VLM-based VLN Agents. Accepted to ACL 2024. Project: https://chen-judge.github.io/MapGPT/

  14. arXiv:2401.05507

    cs.CL cs.AI

    InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks

    Authors: Xueyu Hu, Ziyu Zhao, Shuang Wei, Ziwei Chai, Qianli Ma, Guoyin Wang, Xuwu Wang, Jing Su, Jingjing Xu, Ming Zhu, Yao Cheng, Jianbo Yuan, Jiwei Li, Kun Kuang, Yang Yang, Hongxia Yang, Fei Wu

    Abstract: In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks. These tasks require agents to solve complex tasks end-to-end by interacting with an execution environment. This benchmark contains DAEval, a dataset consisting of 257 data analysis questions derived from 52 CSV files, and an agent framework which incorpora…

    Submitted 11 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: 27 pages, 7 figures, work in progress

  15. arXiv:2401.00625

    cs.LG

    Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models

    Authors: Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Xinyuan Song, Carl Yang, Yue Cheng, Liang Zhao

    Abstract: The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated models like OpenAI's ChatGPT, represents a significant advancement in artificial intelligence. These models, however, bring forth substantial challenges in the high consumption of computational, memory, energy, and financial resources, especially in environments with limited resource capabilities. This survey aims t…

    Submitted 29 December, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: GitHub repo: https://github.com/tiingweii-shii/Awesome-Resource-Efficient-LLM-Papers

  16. arXiv:2310.05845

    cs.CL cs.AI

    GraphLLM: Boosting Graph Reasoning Ability of Large Language Model

    Authors: Ziwei Chai, Tianjie Zhang, Liang Wu, Kaiqiao Han, Xiaohai Hu, Xuanwen Huang, Yang Yang

    Abstract: The advancement of Large Language Models (LLMs) has remarkably pushed the boundaries towards artificial general intelligence (AGI), with their exceptional ability in understanding diverse types of information, including but not limited to images and audio. Despite this progress, a critical gap remains in empowering LLMs to proficiently understand and reason on graph data. Recent studies underscore…

    Submitted 9 October, 2023; originally announced October 2023.

  17. arXiv:2308.13466

    cs.LG

    Staleness-Alleviated Distributed GNN Training via Online Dynamic-Embedding Prediction

    Authors: Guangji Bai, Ziyang Yu, Zheng Chai, Yue Cheng, Liang Zhao

    Abstract: Despite the recent success of Graph Neural Networks (GNNs), it remains challenging to train GNNs on large-scale graphs due to neighbor explosions. As a remedy, distributed computing becomes a promising solution by leveraging abundant computing resources (e.g., GPU). However, the node dependency of graph data increases the difficulty of achieving high concurrency in distributed GNN training, which…

    Submitted 10 December, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: Preprint. Do not distribute

  18. arXiv:2307.11618

    cs.CV

    Divide and Adapt: Active Domain Adaptation via Customized Learning

    Authors: Duojun Huang, Jichang Li, Weikai Chen, Junshi Huang, Zhenhua Chai, Guanbin Li

    Abstract: Active domain adaptation (ADA) aims to improve the model adaptation performance by incorporating active learning (AL) techniques to label a maximally-informative subset of target samples. Conventional AL methods do not consider the existence of domain shift, and hence, fail to identify the truly valuable samples in the context of domain adaptation. To accommodate active learning and domain adaptio…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: CVPR2023, Highlight paper

  19. arXiv:2306.17699

    cs.CV

    Exploration and Exploitation of Unlabeled Data for Open-Set Semi-Supervised Learning

    Authors: Ganlong Zhao, Guanbin Li, Yipeng Qin, Jinjin Zhang, Zhenhua Chai, Xiaolin Wei, Liang Lin, Yizhou Yu

    Abstract: In this paper, we address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples. Unlike previous methods that only consider ID samples to be useful and aim to filter out OOD ones completely during training, we argue that the exploration and exploitation of both ID and OOD s…

    Submitted 30 June, 2023; originally announced June 2023.

  20. arXiv:2306.13301

    cs.CV

    Deep Omni-supervised Learning for Rib Fracture Detection from Chest Radiology Images

    Authors: Zhizhong Chai, Luyang Luo, Huangjing Lin, Pheng-Ann Heng, Hao Chen

    Abstract: Deep learning (DL)-based rib fracture detection has shown promise of playing an important role in preventing mortality and improving patient outcome. Normally, developing DL-based object detection models requires a huge amount of bounding box annotation. However, annotating medical data is time-consuming and expertise-demanding, making obtaining a large amount of fine-grained annotations extremely…

    Submitted 19 January, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: TMI 2024. Zhizhong Chai and Luyang Luo contributed equally. Code is available via: https://github.com/zhizhongchai/ORF-Net/tree/main

  21. arXiv:2305.05374

    cs.LG cs.AI

    HybridNet: Dual-Branch Fusion of Geometrical and Topological Views for VLSI Congestion Prediction

    Authors: Yuxiang Zhao, Zhuomin Chai, Yibo Lin, Runsheng Wang, Ru Huang

    Abstract: Accurate early congestion prediction can prevent unpleasant surprises at the routing stage, playing a crucial role in assisting designers to iterate faster in VLSI design cycles. In this paper, we introduce a novel strategy to fully incorporate topological and geometrical features of circuits by making several key designs in our network architecture. To be more specific, we construct two indi…

    Submitted 12 June, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

    Journal ref: 2023 IEEE International Symposium of EDA

  22. arXiv:2305.03378

    cs.CV cs.LG

    Towards Effective Collaborative Learning in Long-Tailed Recognition

    Authors: Zhengzhuo Xu, Zenghao Chai, Chengyin Xu, Chun Yuan, Haiqin Yang

    Abstract: Real-world data usually suffers from severe class imbalance and long-tailed distributions, where minority classes are significantly underrepresented compared to the majority ones. Recent research prefers to utilize multi-expert architectures to mitigate the model uncertainty on the minority, where collaborative learning is employed to aggregate the knowledge of experts, i.e., online distillation.…

    Submitted 5 May, 2023; originally announced May 2023.

  23. arXiv:2304.03994

    cs.CV

    RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors

    Authors: Rui-Qi Wu, Zheng-Peng Duan, Chun-Le Guo, Zhi Chai, Chong-Yi Li

    Abstract: Existing dehazing approaches struggle to process real-world hazy images owing to the lack of paired real data and robust priors. In this work, we present a new paradigm for real image dehazing from the perspectives of synthesizing more realistic hazy data and introducing more robust priors into the network. Specifically, (1) instead of adopting the de facto physical scattering model, we rethink th…

    Submitted 8 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023

  24. Towards Accurate Post-Training Quantization for Vision Transformer

    Authors: Yifu Ding, Haotong Qin, Qinghua Yan, Zhenhua Chai, Junjie Liu, Xiaolin Wei, Xianglong Liu

    Abstract: Vision transformer emerges as a potential architecture for vision tasks. However, the intense computation and non-negligible delay hinder its application in the real world. As a widespread model compression technique, existing post-training quantization methods still cause severe performance drops. We find the main reasons lie in (1) the existing calibration metric is inaccurate in measuring the q…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: 9 pages, 5 figures, accepted by ACM Multimedia 2022

  25. arXiv:2303.11225

    cs.CV cs.GR

    HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details

    Authors: Zenghao Chai, Tianke Zhang, Tianyu He, Xu Tan, Tadas Baltrušaitis, HsiangTao Wu, Runnan Li, Sheng Zhao, Chun Yuan, Jiang Bian

    Abstract: 3D Morphable Models (3DMMs) demonstrate great potential for reconstructing faithful and animatable 3D facial surfaces from a single image. The facial surface is influenced by the coarse shape, as well as the static detail (e.g., person-specific appearance) and dynamic detail (e.g., expression-driven wrinkles). Previous work struggles to decouple the static and dynamic details through image-level s…

    Submitted 23 August, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023, camera-ready version; Project page: https://project-hiface.github.io/

  26. arXiv:2302.06845

    cs.CV

    SEAM: Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization

    Authors: Chen Tang, Kai Ouyang, Zenghao Chai, Yunpeng Bai, Yuan Meng, Zhi Wang, Wenwu Zhu

    Abstract: Mixed-precision quantization (MPQ) suffers from the time-consuming process of searching the optimal bit-width allocation (i.e., the policy) for each layer, especially when using large-scale datasets such as ILSVRC-2012. This limits the practicality of MPQ in real-world deployment scenarios. To address this issue, this paper proposes a novel method for efficiently searching for effective MPQ policie…

    Submitted 22 August, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

  27. arXiv:2301.12935

    cs.LG cs.AI cs.CV

    ERA-Solver: Error-Robust Adams Solver for Fast Sampling of Diffusion Probabilistic Models

    Authors: Shengmeng Li, Luping Liu, Zenghao Chai, Runnan Li, Xu Tan

    Abstract: Though denoising diffusion probabilistic models (DDPMs) have achieved remarkable generation results, the low sampling efficiency of DDPMs still limits further applications. Since DDPMs can be formulated as diffusion ordinary differential equations (ODEs), various fast sampling methods can be derived from solving diffusion ODEs. However, we notice that previous sampling methods with fixed analytica…

    Submitted 6 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: 16 pages, 12 figures

  28. arXiv:2212.02015

    cs.CV cs.LG

    Learning Imbalanced Data with Vision Transformers

    Authors: Zhengzhuo Xu, Ruikang Liu, Shuo Yang, Zenghao Chai, Chun Yuan

    Abstract: Real-world data tends to be heavily imbalanced and severely skews data-driven deep neural networks, which makes Long-Tailed Recognition (LTR) a massively challenging task. Existing LTR methods seldom train Vision Transformers (ViTs) with Long-Tailed (LT) data, while the off-the-shelf pretrained weights of ViTs always lead to unfair comparisons. In this paper, we systematically investigate the V…

    Submitted 8 March, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

    Comments: Accepted to CVPR 2023, camera-ready version; Code: https://github.com/XuZhengzhuo/LiVT

  29. arXiv:2209.10907

    cs.CV

    DRKF: Distilled Rotated Kernel Fusion for Efficient Rotation Invariant Descriptors in Local Feature Matching

    Authors: Ranran Huang, Jiancheng Cai, Chao Li, Zhuoyuan Wu, Xinmin Liu, Zhenhua Chai

    Abstract: The performance of local feature descriptors degrades in the presence of large rotation variations. To address this issue, we present an efficient approach to learning rotation invariant descriptors. Specifically, we propose Rotated Kernel Fusion (RKF) which imposes rotations on the convolution kernel to improve the inherent nature of CNN. Since RKF can be processed by the subsequent re-parameteri…

    Submitted 5 January, 2024; v1 submitted 22 September, 2022; originally announced September 2022.

    Comments: 8 pages, 7 figures

  30. arXiv:2208.06866

    cs.CV cs.LG

    HyP$^2$ Loss: Beyond Hypersphere Metric Space for Multi-label Image Retrieval

    Authors: Chengyin Xu, Zenghao Chai, Zhengzhuo Xu, Chun Yuan, Yanbo Fan, Jue Wang

    Abstract: Image retrieval has become an increasingly appealing technique with broad multimedia application prospects, where deep hashing serves as the dominant branch towards low storage and efficient retrieval. In this paper, we carried out in-depth investigations on metric learning in deep hashing for establishing a powerful metric space in multi-label scenarios, where the pair loss suffers high computati…

    Submitted 14 August, 2022; originally announced August 2022.

    Comments: Accepted by ACM International Conference on Multimedia (ACM MM) 2022

  31. arXiv:2208.06857

    cs.CV

    Underwater Ranker: Learn Which Is Better and How to Be Better

    Authors: Chunle Guo, Ruiqi Wu, Xin Jin, Linghao Han, Zhi Chai, Weidong Zhang, Chongyi Li

    Abstract: In this paper, we present a ranking-based underwater image quality assessment (UIQA) method, abbreviated as URanker. The URanker is built on the efficient conv-attentional image Transformer. In terms of underwater images, we specially devise (1) the histogram prior that embeds the color distribution of an underwater image as histogram token to attend global degradation and (2) the dynamic cross-sc…

    Submitted 26 November, 2022; v1 submitted 14 August, 2022; originally announced August 2022.

    Comments: 9 pages, 10 figures

  32. CircuitNet: An Open-Source Dataset for Machine Learning Applications in Electronic Design Automation (EDA)

    Authors: Zhuomin Chai, Yuxiang Zhao, Yibo Lin, Wei Liu, Runsheng Wang, Ru Huang

    Abstract: The electronic design automation (EDA) community has been actively exploring machine learning (ML) for very large-scale integrated computer-aided design (VLSI CAD). Many studies explored learning-based techniques for cross-stage prediction tasks in the design flow to achieve faster design convergence. Although building ML models usually requires a large amount of data, most studies can only genera…

    Submitted 31 August, 2022; v1 submitted 31 July, 2022; originally announced August 2022.

    Journal ref: SCIENCE CHINA Information Sciences 2022

  33. arXiv:2207.01842

    cs.CV

    ORF-Net: Deep Omni-supervised Rib Fracture Detection from Chest CT Scans

    Authors: Zhizhong Chai, Huangjing Lin, Luyang Luo, Pheng-Ann Heng, Hao Chen

    Abstract: Most of the existing object detection works are based on the bounding box annotation: each object has a precise annotated box. However, for rib fractures, the bounding box annotation is very labor-intensive and time-consuming because radiologists need to investigate and annotate the rib fractures on a slice-by-slice basis. Although a few studies have proposed weakly-supervised methods or semi-supe…

    Submitted 5 July, 2022; originally announced July 2022.

  34. arXiv:2206.00057

    cs.LG cs.DC

    Distributed Graph Neural Network Training with Periodic Stale Representation Synchronization

    Authors: Zheng Chai, Guangji Bai, Liang Zhao, Yue Cheng

    Abstract: Despite the recent success of Graph Neural Networks, it remains challenging to train a GNN on large graphs with millions of nodes and billions of edges, which are prevalent in many graph-based applications. Traditional sampling-based methods accelerate GNN training by dropping edges and nodes, which impairs the graph integrity and model performance. In contrast, distributed GNN algorithms accelera…

    Submitted 2 October, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

    Comments: Preprint: 20 pages, 9 figures

  35. arXiv:2204.04715

    cs.CV

    Image Harmonization by Matching Regional References

    Authors: Ziyue Zhu, Zhao Zhang, Zheng Lin, Ruiqi Wu, Zhi Chai, Chun-Le Guo

    Abstract: To achieve visual consistency in composite images, recent image harmonization methods typically summarize the appearance pattern of global background and apply it to the global foreground without location discrepancy. However, for a real image, the appearances (illumination, color temperature, saturation, hue, texture, etc.) of different regions can vary significantly. So previous methods, which tr…

    Submitted 10 April, 2022; originally announced April 2022.

  36. arXiv:2203.09729

    cs.CV cs.GR

    REALY: Rethinking the Evaluation of 3D Face Reconstruction

    Authors: Zenghao Chai, Haoxian Zhang, Jing Ren, Di Kang, Zhengzhuo Xu, Xuefei Zhe, Chun Yuan, Linchao Bao

    Abstract: The evaluation of 3D face reconstruction results typically relies on a rigid shape alignment between the estimated 3D model and the ground-truth scan. We observe that aligning two shapes with different reference points can largely affect the evaluation results. This poses difficulties for precisely diagnosing and improving a 3D face reconstruction method. In this paper, we propose a novel evaluati…

    Submitted 19 July, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: Accepted to ECCV 2022, camera-ready version; Project page: https://realy3dface.com; Code: https://github.com/czh-98/REALY

  37. Conquering Data Variations in Resolution: A Slice-Aware Multi-Branch Decoder Network

    Authors: Shuxin Wang, Shilei Cao, Zhizhong Chai, Dong Wei, Kai Ma, Liansheng Wang, Yefeng Zheng

    Abstract: Fully convolutional neural networks have made promising progress in joint liver and liver tumor segmentation. Instead of following the debates over 2D versus 3D networks (for example, pursuing the balance between large-scale 2D pretraining and 3D context), in this paper, we novelly identify the wide variation in the ratio between intra- and inter-slice resolutions as a crucial obstacle to the perf…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: Published by IEEE TMI

  38. arXiv:2201.02620

    cs.LG cs.CV

    Compressing Models with Few Samples: Mimicking then Replacing

    Authors: Huanyu Wang, Junjie Liu, Xin Ma, Yang Yong, Zhenhua Chai, Jianxin Wu

    Abstract: Few-sample compression aims to compress a big redundant model into a small compact one with only a few samples. If we fine-tune models with these limited few samples directly, models will be prone to overfitting and learn almost nothing. Hence, previous methods optimize the compressed model layer-by-layer and try to make every layer have the same outputs as the corresponding layer in the teacher mo…

    Submitted 7 January, 2022; originally announced January 2022.

    Comments: 12 pages, 3 figures

  39. arXiv:2112.10310

    cs.CV

    Contrastive Attention Network with Dense Field Estimation for Face Completion

    Authors: Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Gengyun Jia, Zhenhua Chai, Xiaolin Wei

    Abstract: Most modern face completion approaches adopt an autoencoder or its variants to restore missing regions in face images. Encoders are often utilized to learn powerful representations that play an important role in meeting the challenges of sophisticated learning tasks. Specifically, various kinds of masks are often presented in face images in the wild, forming complex patterns, especially in this ha…

    Submitted 19 December, 2021; originally announced December 2021.

    Comments: Accepted by Pattern Recognition 2021. arXiv admin note: substantial text overlap with arXiv:2010.15643

  40. arXiv:2112.02225

    cs.CV

    HHF: Hashing-guided Hinge Function for Deep Hashing Retrieval

    Authors: Chengyin Xu, Zenghao Chai, Zhengzhuo Xu, Hongjia Li, Qiruyi Zuo, Lingyu Yang, Chun Yuan

    Abstract: Deep hashing has shown promising performance in large-scale image retrieval. However, latent codes extracted by Deep Neural Networks (DNNs) will inevitably lose semantic information during the binarization process, which damages the retrieval accuracy and makes it challenging. Although many existing approaches perform regularization to alleviate quantization errors, we figure out an incompatible c…

    Submitted 12 January, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

  41. arXiv:2112.01335

    cs.CV

    Semantic-Sparse Colorization Network for Deep Exemplar-based Colorization

    Authors: Yunpeng Bai, Chao Dong, Zenghao Chai, Andong Wang, Zhengzhuo Xu, Chun Yuan

    Abstract: Exemplar-based colorization approaches rely on reference image to provide plausible colors for target gray-scale image. The key and difficulty of exemplar-based colorization is to establish an accurate correspondence between these two images. Previous approaches have attempted to construct such a correspondence but are faced with two obstacles. First, using luminance channels for the calculation o…

    Submitted 18 July, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: Accepted by ECCV2022; 14 pages, 10 figures

  42. arXiv:2111.03874  [pdf, other]

    cs.CV cs.LG

    Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective

    Authors: Zhengzhuo Xu, Zenghao Chai, Chun Yuan

    Abstract: Real-world data universally confronts a severe class-imbalance problem and exhibits a long-tailed distribution, i.e., most labels are associated with limited instances. The naïve models supervised by such datasets would prefer dominant labels, encounter a serious generalization challenge and become poorly calibrated. We propose two novel methods from the prior perspective to alleviate this dilemma…

    Submitted 6 November, 2021; originally announced November 2021.

    Comments: Accepted at NeurIPS 2021

  43. arXiv:2110.12978  [pdf, other]

    cs.CV

    MoDeRNN: Towards Fine-grained Motion Details for Spatiotemporal Predictive Learning

    Authors: Zenghao Chai, Zhengzhuo Xu, Chun Yuan

    Abstract: Spatiotemporal predictive learning (ST-PL) aims at predicting the subsequent frames via limited observed sequences, and it has broad applications in the real world. However, learning representative spatiotemporal features for prediction is challenging. Moreover, chaotic uncertainty among consecutive frames exacerbates the difficulty in long-term prediction. This paper concentrates on improving pre…

    Submitted 12 February, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

    Comments: Accepted at ICASSP 2022

  44. arXiv:2109.08466  [pdf, other]

    cs.CV cs.RO

    LOF: Structure-Aware Line Tracking based on Optical Flow

    Authors: Meixiang Quan, Zheng Chai, Xiao Liu

    Abstract: Lines provide the significantly richer geometric structural information about the environment than points, so lines are widely used in recent Visual Odometry (VO) works. Since VO with lines use line tracking results to locate and map, line tracking is a crucial component in VO. Although the state-of-the-art line tracking methods have made great progress, they are still heavily dependent on line de…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: 7 pages, 4 figures

  45. arXiv:2109.00200  [pdf]

    cs.RO

    A real-time global re-localization framework for 3D LiDAR SLAM

    Authors: Ziqi Chai, Xiaoyu Shi, Yan Zhou, Zhenhua Xiong

    Abstract: Simultaneous localization and mapping (SLAM) has been a hot research field in the past years. Against the backdrop of more affordable 3D LiDAR sensors, research on 3D LiDAR SLAM is becoming increasingly popular. Furthermore, the re-localization problem with a point cloud map is the foundation for other SLAM applications. In this paper, a template matching framework is proposed to re-localize a rob…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: 7 pages, 8 figures, 5 tables

  46. arXiv:2109.00151  [pdf, other]

    cs.LG cs.DC

    Asynchronous Federated Learning for Sensor Data with Concept Drift

    Authors: Yujing Chen, Zheng Chai, Yue Cheng, Huzefa Rangwala

    Abstract: Federated learning (FL) involves multiple distributed devices jointly training a shared model without any of the participants having to reveal their local data to a centralized server. Most of previous FL approaches assume that data on devices are fixed and stationary during the training process. However, this assumption is unrealistic because these devices usually have varying sampling rates and…

    Submitted 31 August, 2021; originally announced September 2021.

  47. arXiv:2108.05617  [pdf, other]

    cs.CV

    Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning

    Authors: Junkai Huang, Chaowei Fang, Weikai Chen, Zhenhua Chai, Xiaolin Wei, Pengxu Wei, Liang Lin, Guanbin Li

    Abstract: Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data. While the mainstream technique seeks to completely filter out the OOD samples for semi-supervised learning (SSL), we propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced fea…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV2021

  48. arXiv:2108.04466  [pdf, ps, other]

    cs.CV

    Method Towards CVPR 2021 SimLocMatch Challenge

    Authors: Xiaopeng Bi, Ran Yan, Zheng Chai, Haotian Zhang, Xiao Liu

    Abstract: This report describes Megvii-3D team's approach towards SimLocMatch Challenge @ CVPR 2021 Image Matching Workshop.

    Submitted 10 August, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

  49. arXiv:2108.04453  [pdf, other]

    cs.CV cs.AI

    Method Towards CVPR 2021 Image Matching Challenge

    Authors: Xiaopeng Bi, Yu Chen, Xinyang Liu, Dehao Zhang, Ran Yan, Zheng Chai, Haotian Zhang, Xiao Liu

    Abstract: This report describes Megvii-3D team's approach towards CVPR 2021 Image Matching Workshop.

    Submitted 10 August, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

  50. arXiv:2105.09837  [pdf, other]

    cs.LG cs.DC math.OC

    Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on Gradient-Free ADMM Framework

    Authors: Junxiang Wang, Hongyi Li, Zheng Chai, Yongchao Wang, Yue Cheng, Liang Zhao

    Abstract: While Graph Neural Networks (GNNs) are popular in the deep learning community, they suffer from several challenges including over-smoothing, over-squashing, and gradient vanishing. Recently, a series of models have attempted to relieve these issues by first augmenting the node features and then imposing node-wise functions based on Multi-Layer Perceptron (MLP), which are widely referred to as GA-M…

    Submitted 16 November, 2022; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: Accepted by the IEEE Transactions on Neural Networks and Learning Systems (TNNLS). arXiv admin note: substantial text overlap with arXiv:2009.02868