
Showing 1–50 of 84 results for author: Yin, G

  1. arXiv:2506.19767  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

    Authors: Yuqian Fu, Tinghong Chen, Jiajun Chai, Xihuai Wang, Songjun Tu, Guojun Yin, Wei Lin, Qichao Zhang, Yuanheng Zhu, Dongbin Zhao

    Abstract: Large language models (LLMs) have achieved remarkable progress in reasoning tasks, yet the optimal integration of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) remains a fundamental challenge. Through comprehensive analysis of token distributions, learning dynamics, and integration mechanisms from entropy-based perspectives, we reveal key differences between these paradigms: SFT ind…

    Submitted 24 June, 2025; originally announced June 2025.

  2. arXiv:2506.16652  [pdf, ps, other]

    cs.RO cs.CV cs.LG cs.SE

    CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity

    Authors: Guang Yin, Yitong Li, Yixuan Wang, Dale McConachie, Paarth Shah, Kunimatsu Hashimoto, Huan Zhang, Katherine Liu, Yunzhu Li

    Abstract: Natural language instructions for robotic manipulation tasks often exhibit ambiguity and vagueness. For instance, the instruction "Hang a mug on the mug tree" may involve multiple valid actions if there are several mugs and branches to choose from. Existing language-conditioned policies typically rely on end-to-end models that jointly handle high-level semantic understanding and low-level action g…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted to Robotics: Science and Systems (RSS) 2025. The first three authors contributed equally. Project Page: https://robopil.github.io/code-diffuser/

  3. arXiv:2506.00439  [pdf, ps, other]

    cs.LG cs.AI

    RLAE: Reinforcement Learning-Assisted Ensemble for LLMs

    Authors: Yuqian Fu, Yuanheng Zhu, Jiajun Chai, Guojun Yin, Wei Lin, Qichao Zhang, Dongbin Zhao

    Abstract: Ensembling large language models (LLMs) can effectively combine diverse strengths of different models, offering a promising approach to enhance performance across various tasks. However, existing methods typically rely on fixed weighting strategies that fail to adapt to the dynamic, context-dependent characteristics of LLM capabilities. In this work, we propose Reinforcement Learning-Assisted Ense…

    Submitted 31 May, 2025; originally announced June 2025.

  4. arXiv:2505.18280  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Feature Preserving Shrinkage on Bayesian Neural Networks via the R2D2 Prior

    Authors: Tsai Hor Chan, Dora Yan Zhang, Guosheng Yin, Lequan Yu

    Abstract: Bayesian neural networks (BNNs) treat neural network weights as random variables, which aim to provide posterior uncertainty estimates and avoid overfitting by performing inference on the posterior weights. However, the selection of appropriate prior distributions remains a challenging task, and BNNs may suffer from catastrophic inflated variance or poor predictive performance when poor choices ar…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: To appear in TPAMI

  5. arXiv:2505.16429  [pdf, other]

    cs.CL cs.AI

    Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems

    Authors: Song Jin, Juntian Zhang, Yuhan Liu, Xun Zhang, Yufei Zhang, Guojun Yin, Fei Jiang, Wei Lin, Rui Yan

    Abstract: Evaluating and iterating upon recommender systems is crucial, yet traditional A/B testing is resource-intensive, and offline methods struggle with dynamic user-platform interactions. While agent-based simulation is promising, existing platforms often lack a mechanism for user actions to dynamically reshape the environment. To bridge this gap, we introduce RecInter, a novel agent-based simulation p…

    Submitted 22 May, 2025; originally announced May 2025.

  6. arXiv:2503.22748  [pdf, other]

    cs.LG cs.AI

    Ignite Forecasting with SPARK: An Efficient Generative Framework for Refining LLMs in Temporal Knowledge Graph Forecasting

    Authors: Gongzhu Yin, Hongli Zhang, Yi Luo, Yuchen Yang, Kun Lu, Chao Meng

    Abstract: Temporal Knowledge Graph (TKG) forecasting is crucial for predicting future events using historical data. With the surge of Large Language Models (LLMs), recent studies have begun exploring their integration into TKG forecasting and achieved some success. However, they still face limitations such as limited input length, inefficient output generation, and resource-intensive refinement, which under…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: To be published in the 30th International Conference on Database Systems for Advanced Applications (DASFAA 2025)

    ACM Class: I.2.4

  7. Inductive Link Prediction on N-ary Relational Facts via Semantic Hypergraph Reasoning

    Authors: Gongzhu Yin, Hongli Zhang, Yuchen Yang, Yi Luo

    Abstract: N-ary relational facts represent semantic correlations among more than two entities. While recent studies have developed link prediction (LP) methods to infer missing relations for knowledge graphs (KGs) containing n-ary relational facts, they are generally limited to transductive settings. Fully inductive settings, where predictions are made on previously unseen entities, remain a significant cha…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: To be published in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 (KDD'25)

    ACM Class: I.2.4

  8. arXiv:2503.19383  [pdf, other]

    cs.CV

    MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation

    Authors: Yukang Lin, Hokit Fung, Jianjin Xu, Zeping Ren, Adela S. M. Lau, Guosheng Yin, Xiu Li

    Abstract: Recent portrait animation methods have made significant strides in generating realistic lip synchronization. However, they often lack explicit control over head movements and facial expressions, and cannot produce videos from multiple viewpoints, resulting in less controllable and expressive animations. Moreover, text-guided portrait animation remains underexplored, despite its user-friendly natur…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  9. arXiv:2503.13883  [pdf, ps, other]

    cs.CV

    YOLO-LLTS: Real-Time Low-Light Traffic Sign Detection via Prior-Guided Enhancement and Multi-Branch Feature Interaction

    Authors: Ziyu Lin, Yunfan Wu, Yuhang Ma, Junzhou Chen, Ronghui Zhang, Jiaming Wu, Guodong Yin, Liang Lin

    Abstract: Traffic sign detection is essential for autonomous driving and Advanced Driver Assistance Systems (ADAS). However, existing methods struggle with low-light conditions due to issues like indistinct small-object features, limited feature interaction, and poor image quality, which degrade detection accuracy and speed. To address this issue, we propose YOLO-LLTS, an end-to-end real-time traffic sign d…

    Submitted 29 June, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

  10. arXiv:2503.05514  [pdf, other]

    eess.SP cs.AI

    Noise-Robust Radio Frequency Fingerprint Identification Using Denoise Diffusion Model

    Authors: Guolin Yin, Junqing Zhang, Yuan Ding, Simon Cotton

    Abstract: Securing Internet of Things (IoT) devices presents increasing challenges due to their limited computational and energy resources. Radio Frequency Fingerprint Identification (RFFI) emerges as a promising authentication technique to identify wireless devices through hardware impairments. RFFI performance under low signal-to-noise ratio (SNR) scenarios is significantly degraded because the minute har…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: 6 pages, 8 figures, WCNC 2025

  11. arXiv:2502.16654  [pdf, other]

    cs.CV

    VPNeXt -- Rethinking Dense Decoding for Plain Vision Transformer

    Authors: Xikai Tang, Ye Huang, Guangqiang Yin, Lixin Duan

    Abstract: We present VPNeXt, a new and simple model for the Plain Vision Transformer (ViT). Unlike the many related studies that share the same homogeneous paradigms, VPNeXt offers a fresh perspective on dense representation based on ViT. In more detail, the proposed VPNeXt addressed two concerns about the existing paradigm: (1) Is it necessary to use a complex Transformer Mask Decoder architecture to obtai…

    Submitted 24 February, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

    Comments: Tech report, minor fix

  12. arXiv:2502.13555  [pdf, other]

    cs.LG cs.AI

    Democratizing Large Language Model-Based Graph Data Augmentation via Latent Knowledge Graphs

    Authors: Yushi Feng, Tsai Hor Chan, Guosheng Yin, Lequan Yu

    Abstract: Data augmentation is necessary for graph representation learning due to the scarcity and noise present in graph data. Most of the existing augmentation methods overlook the context information inherited from the dataset as they rely solely on the graph structure for augmentation. Despite the success of some large language model-based (LLM) graph learning methods, they are mostly white-box which re…

    Submitted 19 February, 2025; originally announced February 2025.

  13. arXiv:2502.00527  [pdf, other]

    cs.LG cs.CL

    PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration

    Authors: Songhao Wu, Ang Lv, Xiao Feng, Yufei Zhang, Xun Zhang, Guojun Yin, Wei Lin, Rui Yan

    Abstract: The KV cache in large language models is a dominant factor in memory usage, limiting their broader applicability. Quantizing the cache to lower bit widths is an effective way to reduce computational costs; however, previous methods struggle with quantizing key vectors due to outliers, resulting in excessive overhead. We propose a novel quantization approach called PolarQuant, which efficiently add…

    Submitted 1 February, 2025; originally announced February 2025.

    Comments: preprint

  14. arXiv:2501.14197  [pdf, other]

    cs.LG cs.SI stat.ML

    Bi-directional Curriculum Learning for Graph Anomaly Detection: Dual Focus on Homogeneity and Heterogeneity

    Authors: Yitong Hao, Enbo He, Yue Zhang, Guisheng Yin

    Abstract: Graph anomaly detection (GAD) aims to identify nodes from a graph that are significantly different from normal patterns. Most previous studies are model-driven, focusing on enhancing the detection effect by improving the model structure. However, these approaches often treat all nodes equally, neglecting the different contributions of various nodes to the training. Therefore, we introduce graph cu…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: 8 pages, 5 figures

  15. arXiv:2501.13418  [pdf, other]

    cs.CV cs.AI

    Rethinking the Sample Relations for Few-Shot Classification

    Authors: Guowei Yin, Sheng Huang, Luwen Huangfu, Yi Zhang, Xiaohong Zhang

    Abstract: Feature quality is paramount for classification performance, particularly in few-shot scenarios. Contrastive learning, a widely adopted technique for enhancing feature quality, leverages sample relations to extract intrinsic features that capture semantic information and has achieved remarkable success in Few-Shot Learning (FSL). Nevertheless, current few-shot contrastive learning approaches often…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: 32 pages

  16. arXiv:2501.02086  [pdf, ps, other]

    cs.CL

    Instruction-Following Pruning for Large Language Models

    Authors: Bairu Hou, Qibin Chen, Jianyu Wang, Guoli Yin, Chong Wang, Nan Du, Ruoming Pang, Shiyu Chang, Tao Lei

    Abstract: With the rapid scaling of large language models (LLMs), structured pruning has become a widely used technique to learn efficient, smaller models from larger ones, delivering superior performance compared to training similarly sized models from scratch. In this paper, we move beyond the traditional static pruning approach of determining a fixed pruning mask for a model, and propose a dynamic approa…

    Submitted 2 June, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

    Comments: ICML 2025

  17. arXiv:2412.13771  [pdf, other]

    cs.IR cs.AI cs.CL

    Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization

    Authors: Guanghan Li, Xun Zhang, Yufei Zhang, Yifan Yin, Guojun Yin, Wei Lin

    Abstract: Large language models (LLMs), endowed with exceptional reasoning capabilities, are adept at discerning profound user interests from historical behaviors, thereby presenting a promising avenue for the advancement of recommendation systems. However, a notable discrepancy persists between the sparse collaborative semantics typically found in recommendation systems and the dense token representations…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: 7 pages, 3 figures, AAAI 2025

  18. arXiv:2411.01475  [pdf, other]

    cs.RO

    Interaction-Aware Trajectory Prediction for Safe Motion Planning in Autonomous Driving: A Transformer-Transfer Learning Approach

    Authors: Jinhao Liang, Chaopeng Tan, Longhao Yan, Jingyuan Zhou, Guodong Yin, Kaidi Yang

    Abstract: A critical aspect of safe and efficient motion planning for autonomous vehicles (AVs) is to handle the complex and uncertain behavior of surrounding human-driven vehicles (HDVs). Despite intensive research on driver behavior prediction, existing approaches typically overlook the interactions between AVs and HDVs assuming that HDV trajectories are not affected by AV actions. To address this gap, we…

    Submitted 3 November, 2024; originally announced November 2024.

  19. arXiv:2410.17488  [pdf, other]

    cs.RO cs.CV cs.LG

    GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy

    Authors: Yixuan Wang, Guang Yin, Binghao Huang, Tarik Kelestemur, Jiuguang Wang, Yunzhu Li

    Abstract: Diffusion-based policies have shown remarkable capability in executing complex robotic manipulation tasks but lack explicit characterization of geometry and semantics, which often limits their ability to generalize to unseen objects and layouts. To enhance the generalization capabilities of Diffusion Policy, we introduce a novel framework that incorporates explicit spatial and semantic information…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: Accepted to Conference on Robot Learning (CoRL 2024). Project Page: https://robopil.github.io/GenDP/

  20. arXiv:2410.08449  [pdf, ps, other]

    cs.LG eess.SY

    Finite Sample and Large Deviations Analysis of Stochastic Gradient Algorithm with Correlated Noise

    Authors: George Yin, Vikram Krishnamurthy

    Abstract: We analyze the finite sample regret of a decreasing step size stochastic gradient algorithm. We assume correlated noise and use a perturbed Lyapunov function as a systematic approach for the analysis. Finally we analyze the escape time of the iterates using large deviations theory.

    Submitted 10 October, 2024; originally announced October 2024.

  21. arXiv:2410.07138  [pdf, other]

    q-bio.NC cs.LG stat.AP

    Diagnosis and Pathogenic Analysis of Autism Spectrum Disorder Using Fused Brain Connection Graph

    Authors: Lu Wei, Yi Huang, Guosheng Yin, Fode Zhang, Manxue Zhang, Bin Liu

    Abstract: We propose a model for diagnosing Autism spectrum disorder (ASD) using multimodal magnetic resonance imaging (MRI) data. Our approach integrates brain connectivity data from diffusion tensor imaging (DTI) and functional MRI (fMRI), employing graph neural networks (GNNs) for fused graph classification. To improve diagnostic accuracy, we introduce a loss function that maximizes inter-class and minim…

    Submitted 21 September, 2024; originally announced October 2024.

  22. arXiv:2408.07569  [pdf, other]

    cs.LG cs.AI

    Multi-task Heterogeneous Graph Learning on Electronic Health Records

    Authors: Tsai Hor Chan, Guosheng Yin, Kyongtae Bae, Lequan Yu

    Abstract: Learning electronic health records (EHRs) has received emerging attention because of its capability to facilitate accurate medical diagnosis. Since the EHRs contain enriched information specifying complex interactions between entities, modeling EHRs with graphs is shown to be effective in practice. The EHRs, however, present a great degree of heterogeneity, sparsity, and complexity, which hamper t…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted by Neural Networks

  23. arXiv:2408.04682  [pdf, other]

    cs.CL cs.AI cs.LG

    ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

    Authors: Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang

    Abstract: Recent large language models (LLMs) advancements sparked a growing research interest in tool assisted LLMs solving real-world challenges, which calls for comprehensive evaluation of tool-use capabilities. While previous works focused on either evaluating over stateless web services (RESTful API), based on a single turn user prompt, or an off-policy dialog trajectory, ToolSandbox includes stateful…

    Submitted 16 April, 2025; v1 submitted 8 August, 2024; originally announced August 2024.

  24. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  25. arXiv:2407.18961  [pdf, other]

    cs.AI

    MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

    Authors: Guoli Yin, Haoping Bai, Shuang Ma, Feng Nan, Yanchao Sun, Zhaoyang Xu, Shen Ma, Jiarui Lu, Xiang Kong, Aonan Zhang, Dian Ang Yap, Yizhe Zhang, Karsten Ahnert, Vik Kamath, Mathias Berglund, Dominic Walsh, Tobias Gindele, Juergen Wiest, Zhengfeng Lai, Xiaoming Wang, Jiulong Shan, Meng Cao, Ruoming Pang, Zirui Wang

    Abstract: Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion but failing to dissect the underlying skills that drive these outcomes. This lack of granularity makes it difficult to deeply discern…

    Submitted 15 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  26. arXiv:2407.11448  [pdf, other]

    cs.CV

    cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process

    Authors: Yihang Chen, Tsai Hor Chan, Guosheng Yin, Yuming Jiang, Lequan Yu

    Abstract: Multiple instance learning (MIL) has been extensively applied to whole slide histopathology image (WSI) analysis. The existing aggregation strategy in MIL, which primarily relies on the first-order distance (e.g., mean difference) between instances, fails to accurately approximate the true feature distribution of each instance, leading to biased slide-level representations. Moreover, the scarcity…

    Submitted 19 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  27. arXiv:2403.15944  [pdf, other]

    cs.CV cs.AI eess.IV

    Adaptive Super Resolution For One-Shot Talking-Head Generation

    Authors: Luchuan Song, Pinxin Liu, Guojun Yin, Chenliang Xu

    Abstract: The one-shot talking-head generation learns to synthesize a talking-head video with one source portrait image under the driving of same or different identity video. Usually these methods require plane-based pixel transformations via Jacobian matrices or facial image warps for novel poses generation. The constraints of using a single image source and pixel displacements often compromise the clarity…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 figures

  28. arXiv:2403.09611  [pdf, other]

    cs.CV cs.CL cs.LG

    MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

    Authors: Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, et al. (7 additional authors not shown)

    Abstract: In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for la…

    Submitted 18 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  29. arXiv:2401.15175  [pdf, other]

    cs.CV

    Kitchen Food Waste Image Segmentation and Classification for Compost Nutrients Estimation

    Authors: Raiyan Rahman, Mohsena Chowdhury, Yueyang Tang, Huayi Gao, George Yin, Guanghui Wang

    Abstract: The escalating global concern over extensive food wastage necessitates innovative solutions to foster a net-zero lifestyle and reduce emissions. The LILA home composter presents a convenient means of recycling kitchen scraps and daily food waste into nutrient-rich, high-quality compost. To capture the nutritional information of the produced compost, we have created and annotated a large high-resol…

    Submitted 26 January, 2024; originally announced January 2024.

  30. arXiv:2401.09386  [pdf, other]

    cs.CV

    Tri$^{2}$-plane: Thinking Head Avatar via Feature Pyramid

    Authors: Luchuan Song, Pinxin Liu, Lele Chen, Guojun Yin, Chenliang Xu

    Abstract: Recent years have witnessed considerable achievements in facial avatar reconstruction with neural volume rendering. Despite notable advancements, the reconstruction of complex and dynamic head movements from monocular videos still suffers from capturing and restoring fine-grained details. In this work, we propose a novel approach, named Tri$^2$-plane, for monocular photo-realistic volumetric head…

    Submitted 10 July, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: 24 pages, 9 figures and 3 tables

  31. arXiv:2401.01625  [pdf, other]

    cs.SI cs.CY cs.LG

    SCALA: Sparsification-based Contrastive Learning for Anomaly Detection on Attributed Networks

    Authors: Enbo He, Yitong Hao, Yue Zhang, Guisheng Yin, Lina Yao

    Abstract: Anomaly detection on attributed networks aims to find the nodes whose behaviors are significantly different from other majority nodes. Generally, network data contains information about relationships between entities, and the anomaly is usually embodied in these relationships. Therefore, how to comprehensively model complex interaction patterns in networks is still a major focus. It can be observe…

    Submitted 8 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: 9 pages, 14 figures

  32. arXiv:2310.16587  [pdf, other]

    cs.LG cs.AI cs.CV

    Adaptive Uncertainty Estimation via High-Dimensional Testing on Latent Representations

    Authors: Tsai Hor Chan, Kin Wai Lau, Jiajun Shen, Guosheng Yin, Lequan Yu

    Abstract: Uncertainty estimation aims to evaluate the confidence of a trained deep neural network. However, existing uncertainty estimation approaches rely on low-dimensional distributional assumptions and thus suffer from the high dimensionality of latent features. Existing approaches tend to focus on uncertainty on discrete classification probabilities, which leads to poor generalizability to uncertainty…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  33. arXiv:2310.05804  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM

    Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis

    Authors: Haoyu Zhang, Yu Wang, Guanghao Yin, Kejun Liu, Yuanyuan Liu, Tianshu Yu

    Abstract: Though Multimodal Sentiment Analysis (MSA) proves effective by utilizing rich information from multiple sources (e.g., language, video, and audio), the potential sentiment-irrelevant and conflicting information across modalities may hinder the performance from being further improved. To alleviate this, we present Adaptive Language-guided Multimodal Transformer (ALMT), which incorporates an Adaptiv…

    Submitted 14 December, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Published in EMNLP 2023

    Journal ref: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

  34. arXiv:2310.00068  [pdf, other]

    cs.GR cs.AI cs.MM

    Emotional Listener Portrait: Neural Listener Head Generation with Emotion

    Authors: Luchuan Song, Guojun Yin, Zhenchao Jin, Xiaoyi Dong, Chenliang Xu

    Abstract: Listener head generation centers on generating non-verbal behaviors (e.g., smile) of a listener in reference to the information delivered by a speaker. A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation, which varies depending on the emotions and attitudes of both the speaker and the listener. To tackle th…

    Submitted 8 October, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: Accepted by ICCV 2023

  35. arXiv:2309.01328  [pdf, ps, other]

    cs.IT

    Restoration Guarantee of Image Inpainting via Low Rank Patch Matrix Completion

    Authors: Jian-Feng Cai, Jae Kyu Choi, Jingyang Li, Guojian Yin

    Abstract: In recent years, patch-based image restoration approaches have demonstrated superior performance compared to conventional variational methods. This paper delves into the mathematical foundations underlying patch-based image restoration methods, with a specific focus on establishing restoration guarantees for patch-based image inpainting, leveraging the assumption of self-similarity among patches.…

    Submitted 19 November, 2023; v1 submitted 3 September, 2023; originally announced September 2023.

  36. arXiv:2307.04336  [pdf]

    cs.AI cs.LG cs.SI

    Source-Aware Embedding Training on Heterogeneous Information Networks

    Authors: Tsai Hor Chan, Chi Ho Wong, Jiajun Shen, Guosheng Yin

    Abstract: Heterogeneous information networks (HINs) have been extensively applied to real-world tasks, such as recommendation systems, social networks, and citation networks. While existing HIN representation learning methods can effectively learn the semantic and structural features in the network, little awareness was given to the distribution discrepancy of subgraphs within a single HIN. However, we find…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Published in Data Intelligence 2023

  37. arXiv:2307.04189  [pdf, ps, other]

    cs.CV

    Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning

    Authors: Tsai Hor Chan, Fernando Julio Cendra, Lan Ma, Guosheng Yin, Lequan Yu

    Abstract: Graph-based methods have been extensively applied to whole-slide histopathology image (WSI) analysis due to the advantage of modeling the spatial relationships among different entities. However, most of the existing methods focus on modeling WSIs with homogeneous graphs (e.g., with homogeneous node type). Despite their successes, these works are incapable of mining the complex structural relations…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: Accepted by CVPR 2023

  38. arXiv:2306.15932  [pdf, other]

    cs.CV cs.LG

    NIPD: A Federated Learning Person Detection Benchmark Based on Real-World Non-IID Data

    Authors: Kangning Yin, Zhen Ding, Zhihua Dong, Dongsheng Chen, Jie Fu, Xinhui Ji, Guangqiang Yin, Zhiguo Wang

    Abstract: Federated learning (FL), a privacy-preserving distributed machine learning, has been rapidly applied in wireless communication networks. FL enables Internet of Things (IoT) clients to obtain well-trained models while preventing privacy leakage. Person detection can be deployed on edge devices with limited computing power if combined with FL to process the video data directly at the edge. However,…

    Submitted 11 August, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: 8 pages, 5 figures, 3 tables, FL-IJCAI 23 conference

  39. arXiv:2305.02814  [pdf, other]

    cs.MM cs.AI cs.CV cs.HC

    Noise-Resistant Multimodal Transformer for Emotion Recognition

    Authors: Yuanyuan Liu, Haoyu Zhang, Yibing Zhan, Zijing Chen, Guanghao Yin, Lin Wei, Zhe Chen

    Abstract: Multimodal emotion recognition identifies human emotions from various data modalities like video, text, and audio. However, we found that this task can be easily affected by noisy information that does not contain useful semantics. To this end, we present a novel paradigm that attempts to extract noise-resistant features in its pipeline and introduces a noise-aware learning scheme to effectively i…

    Submitted 4 May, 2023; originally announced May 2023.

  40. arXiv:2303.16532  [pdf, other]

    cs.LG q-fin.ST stat.AP

    Futures Quantitative Investment with Heterogeneous Continual Graph Neural Network

    Authors: Min Hu, Zhizhong Tan, Bin Liu, Guosheng Yin

    Abstract: This study aims to address the challenges of futures price prediction in high-frequency trading (HFT) by proposing a continuous learning factor predictor based on graph neural networks. The model integrates multi-factor pricing theories with real-time market dynamics, effectively bypassing the limitations of existing methods that lack financial theory guidance and ignore various trend signals and…

    Submitted 19 December, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

  41. arXiv:2303.00334  [pdf, other]

    eess.IV cs.CV

    Online Streaming Video Super-Resolution with Convolutional Look-Up Table

    Authors: Guanghao Yin, Zefan Qu, Xinyang Jiang, Shan Jiang, Zhenhua Han, Ningxin Zheng, Xiaohong Liu, Huan Yang, Yuqing Yang, Dongsheng Li, Lili Qiu

    Abstract: Online video streaming has fundamental limitations on the transmission bandwidth and computational capacity and super-resolution is a promising potential solution. However, applying existing video super-resolution methods to online streaming is non-trivial. Existing video codecs and streaming protocols (e.g., WebRTC) dynamically change the video quality both spatially and temporally, which leads to…

    Submitted 25 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  42. arXiv:2212.05814  [pdf, other]

    cs.LG stat.ML

    GWRBoost: A geographically weighted gradient boosting method for explainable quantification of spatially-varying relationships

    Authors: Han Wang, Zhou Huang, Ganmin Yin, Yi Bao, Xiao Zhou, Yong Gao

    Abstract: The geographically weighted regression (GWR) is an essential tool for estimating the spatial variation of relationships between dependent and independent variables in geographical contexts. However, GWR suffers from the problem that classical linear regressions, which compose the GWR model, are prone to underfitting, especially for significant volume and complex nonlinear data, causing inf…

    Submitted 15 December, 2022; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: 13 pages, 8 figures, 4 tables

  43. arXiv:2212.04320  [pdf]

    cs.AR cs.LG

    A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface

    Authors: Guodong Yin, Mufeng Zhou, Yiming Chen, Wenjun Tang, Zekun Yang, Mingyen Lee, Xirui Du, Jinshan Yue, Jiaxin Liu, Huazhong Yang, Yongpan Liu, Xueqing Li

    Abstract: Performing data-intensive tasks in the von Neumann architecture is challenging to achieve both high performance and power efficiency due to the memory wall bottleneck. Computing-in-memory (CiM) is a promising mitigation approach by enabling parallel in-situ multiply-accumulate (MAC) operations within the memory with support from the peripheral interface and datapath. SRAM-based charge-domain CiM (…

    Submitted 2 April, 2024; v1 submitted 23 November, 2022; originally announced December 2022.

    Comments: Accepted by IEEE 48th European Solid-State Circuits Conference (ESSCIRC 2022)

  44. arXiv:2208.08600  [pdf]

    cs.AR

    GRAPHIC: GatheR-And-Process in Highly parallel with In-SSD Compression Architecture in Very Large-Scale Graph

    Authors: Yiming Chen, Guohao Dai, Mufeng Zhou, Mingyen Lee, Nagadastagiri Challapalle, Guodong Yin, Zekun Yang, Yongpan Liu, Huazhong Yang, Vijaykrishnan Narayanan, Xueqing Li

    Abstract: Graph convolutional network (GCN), an emerging algorithm for graph computing, has achieved promising performance in graph-structure tasks. To achieve acceleration for data-intensive and sparse graph computing, ASICs such as GCNAX have been proposed for efficient execution of aggregation and combination in GCN. GCNAX reduces DRAM accesses by 8x compared with previous efforts. However, as graphs have r…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 9 pages, 16 figures

  45. arXiv:2208.00967  [pdf, other]

    cs.CV

    Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification

    Authors: Xulin Li, Yan Lu, Bin Liu, Yating Liu, Guojun Yin, Qi Chu, Jinyang Huang, Feng Zhu, Rui Zhao, Nenghai Yu

    Abstract: Graph-based models have achieved great success in person re-identification tasks recently, which compute the graph topology structure (affinities) among different people first and then pass the information across them to achieve stronger features. But we find existing graph-based methods in the visible-infrared person re-identification task (VI-ReID) suffer from bad generalization because of two i…

    Submitted 14 November, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

  46. arXiv:2208.00847  [pdf]

    cs.CV

    MAFW: A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in the Wild

    Authors: Yuanyuan Liu, Wei Dai, Chuanxu Feng, Wenbin Wang, Guanghao Yin, Jiabei Zeng, Shiguang Shan

    Abstract: Dynamic facial expression recognition (FER) databases provide important data support for affective computing and applications. However, most FER databases are annotated with several basic mutually exclusive emotional categories and contain only one modality, e.g., videos. The monotonous labels and modality cannot accurately imitate human emotions and fulfill applications in the real world. In this…

    Submitted 14 August, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

    Comments: This paper has been accepted by ACM MM'22

  47. YOLoC: DeploY Large-Scale Neural Network by ROM-based Computing-in-Memory using ResiduaL Branch on a Chip

    Authors: Yiming Chen, Guodong Yin, Zhanhong Tan, Mingyen Lee, Zekun Yang, Yongpan Liu, Huazhong Yang, Kaisheng Ma, Xueqing Li

    Abstract: Computing-in-memory (CiM) is a promising technique to achieve high energy efficiency in data-intensive matrix-vector multiplication (MVM) by relieving the memory bottleneck. Unfortunately, due to the limited SRAM capacity, existing SRAM-based CiM needs to reload the weights from DRAM in large-scale networks. This undesired fact weakens the energy efficiency significantly. This work, for the first…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: 6 pages, 14 figures. to be published in DAC 2022

    Journal ref: Design Automation Conference 2022

  48. DouFu: A Double Fusion Joint Learning Method For Driving Trajectory Representation

    Authors: Han Wang, Zhou Huang, Xiao Zhou, Ganmin Yin, Yi Bao, Yi Zhang

    Abstract: Driving trajectory representation learning is of great significance for various location-based services, such as driving pattern mining and route recommendation. However, previous representation generation approaches tend to rarely address three challenges: 1) how to represent the intricate semantic intentions of mobility inexpensively; 2) complex and weak spatial-temporal dependencies due to the…

    Submitted 14 October, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: 11 pages, 7 figures

  49. arXiv:2202.13123  [pdf, other]

    cs.CV eess.IV

    Content-Variant Reference Image Quality Assessment via Knowledge Distillation

    Authors: Guanghao Yin, Wei Wang, Zehuan Yuan, Chuchu Han, Wei Ji, Shouqian Sun, Changhu Wang

    Abstract: Generally, humans are more skilled at perceiving differences between high-quality (HQ) and low-quality (LQ) images than directly judging the quality of a single LQ image. This situation also applies to image quality assessment (IQA). Although recent no-reference (NR-IQA) methods have made great progress to predict image quality free from the reference image, they still have the potential to achiev…

    Submitted 26 February, 2022; originally announced February 2022.

    Comments: AAAI2022 oral accepted

  50. arXiv:2110.07436  [pdf, other]

    cs.LG

    Asymmetric Graph Representation Learning

    Authors: Zhuo Tan, Bin Liu, Guosheng Yin

    Abstract: Despite the enormous success of graph neural networks (GNNs), most existing GNNs can only be applicable to undirected graphs where relationships among connected nodes are two-way symmetric (i.e., information can be passed back and forth). However, there is a vast amount of applications where the information flow is asymmetric, leading to directed graphs where information can only be passed in one…

    Submitted 14 October, 2021; originally announced October 2021.