Showing 1–50 of 68 results for author: Hou, W

Searching in archive cs.
  1. arXiv:2504.00034  [pdf, other]

    quant-ph cs.LG

    Quantum Generative Models for Image Generation: Insights from MNIST and MedMNIST

    Authors: Chi-Sheng Chen, Wei An Hou, Hsiang-Wei Hu, Zhen-Sheng Cai

    Abstract: Quantum generative models offer a promising new direction in machine learning by leveraging quantum circuits to enhance data generation capabilities. In this study, we propose a hybrid quantum-classical image generation framework that integrates variational quantum circuits into a diffusion-based model. To improve training dynamics and generation quality, we introduce two novel noise strategies: i…

    Submitted 3 April, 2025; v1 submitted 30 March, 2025; originally announced April 2025.

  2. arXiv:2502.18874  [pdf, other]

    cs.CL cs.AI

    Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework

    Authors: Kaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Liangyou Li, Xin Jiang, Lifeng Shang, Qun Liu, Wenjie Li

    Abstract: Large Language Models (LLMs) are being used more and more extensively for automated evaluation in various scenarios. Previous studies have attempted to fine-tune open-source LLMs to replicate the evaluation explanations and judgments of powerful proprietary models, such as GPT-4. However, these methods are largely limited to text-based analyses under predefined general criteria, resulting in reduc…

    Submitted 3 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  3. arXiv:2502.10329  [pdf, other]

    cs.SD cs.CR cs.MM eess.AS

    VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect

    Authors: Qingyuan Fei, Wenjie Hou, Xuan Hai, Xin Liu

    Abstract: The rapid advancements in AI voice cloning, fueled by machine learning, have significantly impacted text-to-speech (TTS) and voice conversion (VC) fields. While these developments have led to notable progress, they have also raised concerns about the misuse of AI VC technology, causing economic losses and negative public perceptions. To address this challenge, this study focuses on creating active…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: 9 pages, four figures

  4. arXiv:2502.01377  [pdf, other]

    cs.CE cs.AI

    Data-Efficient Model for Psychological Resilience Prediction based on Neurological Data

    Authors: Zhi Zhang, Yan Liu, Mengxia Gao, Yu Yang, Jiannong Cao, Wai Kai Hou, Shirley Li, Sonata Yau, Yun Kwok Wing, Tatia M. C. Lee

    Abstract: Psychological resilience, defined as the ability to rebound from adversity, is crucial for mental health. Compared with traditional resilience assessments through self-reported questionnaires, resilience assessments based on neurological data offer more objective results with biological markers, hence significantly enhancing credibility. This paper proposes a novel data-efficient model to address…

    Submitted 3 February, 2025; originally announced February 2025.

  5. arXiv:2501.18401  [pdf, other]

    cs.CV

    MatIR: A Hybrid Mamba-Transformer Image Restoration Model

    Authors: Juan Wen, Weiyan Hou, Luc Van Gool, Radu Timofte

    Abstract: In recent years, Transformers-based models have made significant progress in the field of image restoration by leveraging their inherent ability to capture complex contextual features. Recently, Mamba models have made a splash in the field of computer vision due to their ability to handle long-range dependencies and their significant computational efficiency compared to Transformers. However, Mamb…

    Submitted 30 January, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

    Comments: 10 pages, 9 figures

  6. arXiv:2501.15043  [pdf, other]

    cs.CV

    Prompt-Aware Controllable Shadow Removal

    Authors: Kerui Chen, Zhiliang Wu, Wenjin Hou, Kun Li, Hehe Fan, Yi Yang

    Abstract: Shadow removal aims to restore the image content in shadowed regions. While deep learning-based methods have shown promising results, they still face key challenges: 1) uncontrolled removal of all shadows, or 2) controllable removal but heavily relies on precise shadow region masks. To address these issues, we introduce a novel paradigm: prompt-aware controllable shadow removal. Unlike existing ap…

    Submitted 2 February, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

  7. arXiv:2412.17219  [pdf, other]

    cs.CV

    Discriminative Image Generation with Diffusion Models for Zero-Shot Learning

    Authors: Dingjie Fu, Wenjin Hou, Shiming Chen, Shuhuang Chen, Xinge You, Salman Khan, Fahad Shahbaz Khan

    Abstract: Generative Zero-Shot Learning (ZSL) methods synthesize class-related features based on predefined class semantic prototypes, showcasing superior performance. However, this feature generation paradigm falls short of providing interpretable insights. In addition, existing approaches rely on semantic prototypes annotated by human experts, which exhibit a significant limitation in their scalability to…

    Submitted 25 December, 2024; v1 submitted 22 December, 2024; originally announced December 2024.

    Comments: Tech report, 16 pages

  8. arXiv:2411.10937  [pdf, other]

    cs.CV cs.CL

    Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry

    Authors: Wenjun Hou, Yi Cheng, Kaishuai Xu, Yan Hu, Wenjie Li, Jiang Liu

    Abstract: Comprehensively understanding surgical scenes in Surgical Visual Question Answering (Surgical VQA) requires reasoning over multiple objects. Previous approaches address this task using cross-modal fusion strategies to enhance reasoning ability. However, these methods often struggle with limited scene understanding and question comprehension, and some rely on external resources (e.g., pre-extracted…

    Submitted 16 November, 2024; originally announced November 2024.

  9. arXiv:2411.10309  [pdf, other]

    cs.CV

    Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting

    Authors: Ziqi Xie, Xiao Lai, Weidong Zhao, Siqi Jiang, Xianhui Liu, Wenlong Hou

    Abstract: Current image stitching methods often produce noticeable seams in challenging scenarios such as uneven hue and large parallax. To tackle this problem, we propose the Reference-Driven Inpainting Stitcher (RDIStitcher), which reformulates the image fusion and rectangling as a reference-based inpainting model, incorporating a larger modification fusion area and stronger modification intensity than pr…

    Submitted 7 March, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

    Comments: 18 pages, 10 figures

  10. arXiv:2411.01178  [pdf, other]

    cs.IR

    LLM4PR: Improving Post-Ranking in Search Engine with Large Language Models

    Authors: Yang Yan, Yihao Wang, Chi Zhang, Wenyuan Hou, Kang Pan, Xingkai Ren, Zelun Wu, Zhixin Zhai, Enyun Yu, Wenwu Ou, Yang Song

    Abstract: Alongside the rapid development of Large Language Models (LLMs), there has been a notable increase in efforts to integrate LLM techniques in information retrieval (IR) and search engines (SE). Recently, an additional post-ranking stage is suggested in SE to enhance user satisfaction in practical applications. Nevertheless, research dedicated to enhancing the post-ranking stage through LLMs remains…

    Submitted 2 November, 2024; originally announced November 2024.

  11. arXiv:2410.07230  [pdf, other]

    eess.SP cs.HC cs.LG

    RFBoost: Understanding and Boosting Deep WiFi Sensing via Physical Data Augmentation

    Authors: Weiying Hou, Chenshu Wu

    Abstract: Deep learning shows promising performance in wireless sensing. However, deep wireless sensing (DWS) heavily relies on large datasets. Unfortunately, building comprehensive datasets for DWS is difficult and costly, because wireless data depends on environmental factors and cannot be labeled offline. Despite recent advances in few-shot/cross-domain learning, DWS is still facing data scarcity issues.…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8, 2, Article 58 (June 2024), 26 pages

    Journal ref: Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8, 2, Article 58 (June 2024), 26 pages

  12. arXiv:2410.06638  [pdf, other]

    cs.CL cs.AI

    Subtle Errors Matter: Preference Learning via Error-injected Self-editing

    Authors: Kaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Chak Tou Leong, Liangyou Li, Xin Jiang, Lifeng Shang, Qun Liu, Wenjie Li

    Abstract: Large Language Models (LLMs) have exhibited strong mathematical reasoning prowess, tackling tasks ranging from basic arithmetic to advanced competition-level problems. However, frequently occurring subtle yet critical errors, such as miscalculations or incorrect substitutions, limit the LLMs' full potential. Existing studies to improve mathematical ability typically involve applying preference lea…

    Submitted 3 March, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

  13. arXiv:2410.01556  [pdf, other]

    cs.CL cs.AI cs.LG

    Integrative Decoding: Improve Factuality via Implicit Self-consistency

    Authors: Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong

    Abstract: Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models. Nonetheless, existing methods usually have strict constraints on the task format, largely limiting their applicability. In this paper, we present Integrative De…

    Submitted 23 January, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: Accepted by ICLR 2025

  14. arXiv:2408.14868  [pdf, other]

    cs.CV

    ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning

    Authors: Wenjin Hou, Dingjie Fu, Kun Li, Shiming Chen, Hehe Fan, Yi Yang

    Abstract: Zero-shot learning (ZSL) aims to recognize unseen classes by transferring semantic knowledge from seen classes to unseen ones, guided by semantic information. To this end, existing works have demonstrated remarkable performance by utilizing global visual features from Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for visual-semantic interactions. Due to the limited receptive f…

    Submitted 27 August, 2024; originally announced August 2024.

  15. arXiv:2408.01998  [pdf, other]

    cs.CV

    What Happens Without Background? Constructing Foreground-Only Data for Fine-Grained Tasks

    Authors: Yuetian Wang, Wenjin Hou, Qinmu Peng, Xinge You

    Abstract: Fine-grained recognition, a pivotal task in visual signal processing, aims to distinguish between similar subclasses based on discriminative information present in samples. However, prevailing methods often erroneously focus on background areas, neglecting the capture of genuinely effective discriminative information from the subject, thus impeding practical application. To facilitate research int…

    Submitted 4 August, 2024; originally announced August 2024.

  16. arXiv:2406.13960  [pdf, other]

    cs.CL cs.AI

    AutoPal: Autonomous Adaptation to Users for Personal AI Companionship

    Authors: Yi Cheng, Wenge Liu, Kaishuai Xu, Wenjun Hou, Yi Ouyang, Chak Tou Leong, Xian Wu, Yefeng Zheng

    Abstract: Previous research has demonstrated the potential of AI agents to act as companions that can provide constant emotional support for humans. In this paper, we emphasize the necessity of autonomous adaptation in personal AI companionship, an underexplored yet promising direction. Such adaptability is crucial as it can facilitate more tailored interactions with users and allow the agent to evolve in r…

    Submitted 17 October, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  17. arXiv:2406.13934  [pdf, other]

    cs.CL cs.AI

    Reasoning Like a Doctor: Improving Medical Dialogue Systems via Diagnostic Reasoning Process Alignment

    Authors: Kaishuai Xu, Yi Cheng, Wenjun Hou, Qiaoyu Tan, Wenjie Li

    Abstract: Medical dialogue systems have attracted significant attention for their potential to act as medical assistants. Enabling these medical systems to emulate clinicians' diagnostic reasoning process has been the long-standing research focus. Previous studies rudimentarily realized the simulation of clinicians' diagnostic process by fine-tuning language models on high-quality dialogue datasets. Nonethe…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 Findings

  18. arXiv:2404.14808  [pdf, other]

    cs.CV

    Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

    Authors: Wenjin Hou, Shiming Chen, Shuhuang Chen, Ziming Hong, Yan Wang, Xuetao Feng, Salman Khan, Fahad Shahbaz Khan, Xinge You

    Abstract: Generative Zero-shot learning (ZSL) learns a generator to synthesize visual samples for unseen classes, which is an effective way to advance ZSL. However, existing generative methods rely on the conditions of Gaussian noise and the predefined semantic prototype, which limit the generator only optimized on specific seen classes rather than characterizing each visual instance, resulting in poor gene…

    Submitted 23 April, 2024; originally announced April 2024.

  19. arXiv:2404.07713  [pdf, other]

    cs.CV cs.LG

    Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning

    Authors: Shiming Chen, Wenjin Hou, Salman Khan, Fahad Shahbaz Khan

    Abstract: Zero-shot learning (ZSL) recognizes the unseen classes by conducting visual-semantic interactions to transfer semantic knowledge from seen classes to unseen ones, supported by semantic information (e.g., attributes). However, existing ZSL methods simply extract visual features using a pre-trained network backbone (i.e., CNN or ViT), which fail to learn matched visual-semantic correspondences for r…

    Submitted 22 July, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR'24

  20. arXiv:2404.04617  [pdf, other]

    cs.CV

    Empowering Image Recovery_ A Multi-Attention Approach

    Authors: Juan Wen, Yawei Li, Chao Zhang, Weiyan Hou, Radu Timofte, Luc Van Gool

    Abstract: We propose Diverse Restormer (DART), a novel image restoration method that effectively integrates information from various sources (long sequences, local and global regions, feature dimensions, and positional dimensions) to address restoration challenges. While Transformer models have demonstrated excellent performance in image restoration due to their self-attention mechanism, they face limitatio…

    Submitted 9 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: 12 pages, 10 figures, 12 tables

    MSC Class: 68T07 (Primary) 168T45 (Secondary) ACM Class: I.4.4

  21. arXiv:2403.00894  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    Comparing large language models and human programmers for generating programming code

    Authors: Wenpin Hou, Zhicheng Ji

    Abstract: We systematically evaluated the performance of seven large language models in generating programming code using various prompt strategies, programming languages, and task difficulties. GPT-4 substantially outperforms other large language models, including Gemini Ultra and Claude 2. The coding performance of GPT-4 varies considerably with different prompt strategies. In most LeetCode and GeeksforGe…

    Submitted 4 October, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Journal ref: Adv Sci (Weinh). 2024 Dec 30:e2412279. Epub ahead of print. PMID: 39736107

  22. arXiv:2402.12844  [pdf, other]

    cs.CV cs.CL

    ICON: Improving Inter-Report Consistency in Radiology Report Generation via Lesion-aware Mixup Augmentation

    Authors: Wenjun Hou, Yi Cheng, Kaishuai Xu, Yan Hu, Wenjie Li, Jiang Liu

    Abstract: Previous research on radiology report generation has made significant progress in terms of increasing the clinical accuracy of generated reports. In this paper, we emphasize another crucial quality that it should possess, i.e., inter-report consistency, which refers to the capability of generating consistent reports for semantically equivalent radiographs. This quality is even of greater significa…

    Submitted 26 September, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  23. arXiv:2401.06541  [pdf, other]

    cs.CL cs.AI

    Medical Dialogue Generation via Intuitive-then-Analytical Differential Diagnosis

    Authors: Kaishuai Xu, Wenjun Hou, Yi Cheng, Jian Wang, Wenjie Li

    Abstract: Medical dialogue systems have attracted growing research attention as they have the potential to provide rapid diagnoses, treatment plans, and health consultations. In medical dialogues, a proper diagnosis is crucial as it establishes the foundation for future consultations. Clinicians typically employ both intuitive and analytic reasoning to formulate a differential diagnosis. This reasoning proc…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Work in progress

  24. arXiv:2312.11111  [pdf, other]

    cs.AI cs.CL cs.HC

    The Good, The Bad, and Why: Unveiling Emotions in Generative AI

    Authors: Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

    Abstract: Emotion significantly impacts our daily behaviors and interactions. While recent generative AI models, such as large language models, have shown impressive performance in various tasks, it remains unclear whether they truly comprehend emotions. This paper aims to address this gap by incorporating psychological theories to gain a holistic understanding of emotions in generative AI models. Specifica…

    Submitted 7 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: International Conference on Machine Learning (ICML) 2024; an extension to EmotionPrompt (arXiv:2307.11760)

  25. arXiv:2311.16832  [pdf, other]

    cs.CL cs.AI

    CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models

    Authors: Jinfeng Zhou, Zhuang Chen, Dazhen Wan, Bosi Wen, Yi Song, Jifan Yu, Yongkang Huang, Libiao Peng, Jiaming Yang, Xiyao Xiao, Sahand Sabour, Xiaohan Zhang, Wenjing Hou, Yijia Zhang, Yuxiao Dong, Jie Tang, Minlie Huang

    Abstract: In this paper, we present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters. Our CharacterGLM is designed for generating Character-based Dialogues (CharacterDial), which aims to equip a conversational AI system with character customization for satisfying people's inherent social desires and emotional needs. On top of CharacterGLM, we can custom…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Work in progress

  26. arXiv:2311.05050  [pdf, other]

    cs.LG quant-ph

    Quantum Generative Modeling of Sequential Data with Trainable Token Embedding

    Authors: Wanda Hou, Miao Li, Yi-Zhuang You

    Abstract: Generative models are a class of machine learning models that aim to learn the underlying probability distribution of data. Unlike discriminative models, generative models focus on capturing the data's inherent structure, allowing them to generate new samples that resemble the original data. To fully exploit the potential of modeling probability distributions using quantum physics, a quantum-inspi…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 5 pages, 4 figures

  27. arXiv:2310.13864  [pdf, other]

    cs.CL

    RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning

    Authors: Wenjun Hou, Yi Cheng, Kaishuai Xu, Wenjie Li, Jiang Liu

    Abstract: Automating radiology report generation can significantly alleviate radiologists' workloads. Previous research has primarily focused on realizing highly concise observations while neglecting the precise attributes that determine the severity of diseases (e.g., small pleural effusion). Since incorrect attributes will lead to imprecise radiology reports, strengthening the generation process with prec…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by Findings of EMNLP 2023

  28. Using a Nearest-Neighbour, BERT-Based Approach for Scalable Clone Detection

    Authors: Muslim Chochlov, Gul Aftab Ahmed, James Vincent Patten, Guoxian Lu, Wei Hou, David Gregg, Jim Buckley

    Abstract: Code clones can detrimentally impact software maintenance and manually detecting them in very large codebases is impractical. Additionally, automated approaches find detection of Type 3 and Type 4 (inexact) clones very challenging. While the most recent artificial deep neural networks (for example BERT-based artificial neural networks) seem to be highly effective in detecting such clones, their pa…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 10 pages, 2 figures, 38th IEEE International Conference on Software Maintenance and Evolution

  29. arXiv:2308.09915  [pdf, other]

    cs.CV cs.LG

    EGANS: Evolutionary Generative Adversarial Network Search for Zero-Shot Learning

    Authors: Shiming Chen, Shihuang Chen, Wenjin Hou, Weiping Ding, Xinge You

    Abstract: Zero-shot learning (ZSL) aims to recognize the novel classes which cannot be collected for training a prediction model. Accordingly, generative models (e.g., generative adversarial network (GAN)) are typically used to synthesize the visual samples conditioned by the class semantic vectors and achieve remarkable progress for ZSL. However, existing GAN-based generative ZSL methods are based on hand-…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted to TEVC

  30. arXiv:2308.05421  [pdf, other]

    cs.CV cs.MM

    Progressive Spatio-temporal Perception for Audio-Visual Question Answering

    Authors: Guangyao Li, Wenxuan Hou, Di Hu

    Abstract: Audio-Visual Question Answering (AVQA) task aims to answer questions about different visual objects, sounds, and their associations in videos. Such naturally multi-modal videos are composed of rich and complex dynamic audio-visual components, where most of which could be unrelated to the given questions, or even play as interference in answering the content of interest. Oppositely, only focusing o…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  31. arXiv:2308.04370  [pdf, other]

    cs.CV

    When Super-Resolution Meets Camouflaged Object Detection: A Comparison Study

    Authors: Juan Wen, Shupeng Cheng, Peng Xu, Bowen Zhou, Radu Timofte, Weiyan Hou, Luc Van Gool

    Abstract: Super Resolution (SR) and Camouflaged Object Detection (COD) are two hot topics in computer vision with various joint applications. For instance, low-resolution surveillance images can be successively processed by super-resolution techniques and camouflaged object detection. However, in previous work, these two areas are always studied in isolation. In this paper, we, for the first time, conduct a…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 23 pages with 8 figures

    MSC Class: 68T45 ACM Class: I.4.3

  32. arXiv:2307.11760  [pdf, other]

    cs.CL cs.AI cs.HC

    Large Language Models Understand and Can be Enhanced by Emotional Stimuli

    Authors: Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

    Abstract: Emotional intelligence significantly impacts our daily behaviors and interactions. Although Large Language Models (LLMs) are increasingly viewed as a stride toward artificial general intelligence, exhibiting impressive performance in numerous tasks, it is still uncertain if LLMs can genuinely grasp psychological emotional stimuli. Understanding and responding to emotional cues gives humans a disti…

    Submitted 12 November, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Technical report; updated the std error for human study; short version (v1) was accepted by LLM@IJCAI'23; 32 pages; more work: https://llm-enhance.github.io/

  33. arXiv:2307.04427  [pdf, other]

    astro-ph.HE astro-ph.GA cs.LG

    Observation of high-energy neutrinos from the Galactic plane

    Authors: R. Abbasi, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, M. Ahrens, J. M. Alameddine, A. A. Alves Jr., N. M. Amin, K. Andeen, T. Anderson, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. Axani, X. Bai, A. Balagopal V., S. W. Barwick, V. Basu, S. Baur, R. Bay, J. J. Beatty, K. -H. Becker, J. Becker Tjus, et al. (364 additional authors not shown)

    Abstract: The origin of high-energy cosmic rays, atomic nuclei that continuously impact Earth's atmosphere, has been a mystery for over a century. Due to deflection in interstellar magnetic fields, cosmic rays from the Milky Way arrive at Earth from random directions. However, near their sources and during propagation, cosmic rays interact with matter and produce high-energy neutrinos. We search for neutrin…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Submitted on May 12th, 2022; Accepted on May 4th, 2023

    Journal ref: Science 380, 6652, 1338-1343 (2023)

  34. arXiv:2306.09431  [pdf, other]

    cs.MM

    Towards Long Form Audio-visual Video Understanding

    Authors: Wenxuan Hou, Guangyao Li, Yapeng Tian, Di Hu

    Abstract: We live in a world filled with never-ending streams of multimodal information. As a more natural recording of the real scenario, long form audio-visual videos are expected as an important bridge for better exploring and understanding the world. In this paper, we propose the multisensory temporal event localization task in long form videos and strive to tackle the associated challenges. To facilita…

    Submitted 15 June, 2023; originally announced June 2023.

  35. arXiv:2306.06931  [pdf, other]

    cs.LG cs.CV

    Evolving Semantic Prototype Improves Generative Zero-Shot Learning

    Authors: Shiming Chen, Wenjin Hou, Ziming Hong, Xiaohan Ding, Yibing Song, Xinge You, Tongliang Liu, Kun Zhang

    Abstract: In zero-shot learning (ZSL), generative methods synthesize class-related sample features based on predefined semantic prototypes. They advance the ZSL performance by synthesizing unseen class sample features for better training the classifier. We observe that each class's predefined semantic prototype (also referred to as semantic embedding or condition) does not accurately match its real semantic…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Accepted to ICML'23

  36. arXiv:2306.06466  [pdf, other]

    cs.CL

    ORGAN: Observation-Guided Radiology Report Generation via Tree Reasoning

    Authors: Wenjun Hou, Kaishuai Xu, Yi Cheng, Wenjie Li, Jiang Liu

    Abstract: This paper explores the task of radiology report generation, which aims at generating free-text descriptions for a set of radiographs. One significant challenge of this task is how to correctly maintain the consistency between the images and the lengthy report. Previous research explored solving this issue through planning-based methods, which generate reports only based on high-level plans. Howev…

    Submitted 10 June, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023

  37. arXiv:2306.04893  [pdf, other]

    cs.CV

    Coping with Change: Learning Invariant and Minimum Sufficient Representations for Fine-Grained Visual Categorization

    Authors: Shuo Ye, Shujian Yu, Wenjin Hou, Yu Wang, Xinge You

    Abstract: Fine-grained visual categorization (FGVC) is a challenging task due to similar visual appearances between various species. Previous studies always implicitly assume that the training and test data have the same underlying distributions, and that features extracted by modern backbone architectures remain discriminative and generalize well to unseen test data. However, we empirically justify that th…

    Submitted 9 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Manuscript accepted by CVIU, code is available at Github

  38. arXiv:2305.18109  [pdf, other]

    cs.CL cs.AI

    Medical Dialogue Generation via Dual Flow Modeling

    Authors: Kaishuai Xu, Wenjun Hou, Yi Cheng, Jian Wang, Wenjie Li

    Abstract: Medical dialogue systems (MDS) aim to provide patients with medical services, such as diagnosis and prescription. Since most patients cannot precisely describe their symptoms, dialogue understanding is challenging for MDS. Previous studies mainly addressed this by extracting the mentioned medical entities as critical dialogue history information. In this work, we argue that it is also essential to…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted as Findings of ACL 2023

  39. arXiv:2302.12095  [pdf, other]

    cs.AI cs.CL cs.LG

    On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective

    Authors: Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Haojun Huang, Wei Ye, Xiubo Geng, Binxin Jiao, Yue Zhang, Xing Xie

    Abstract: ChatGPT is a recent chatbot service released by OpenAI and is receiving increasing attention over the past few months. While evaluations of various aspects of ChatGPT have been done, its robustness, i.e., the performance to unexpected inputs, is still unclear to the public. Robustness is of particular concern in responsible AI, especially for safety-critical applications. In this paper, we conduct…

    Submitted 29 August, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: Highlighted paper at ICLR 2023 workshop on Trustworthy and Reliable Large-Scale Machine Learning Models; code is at: https://github.com/microsoft/robustlearn; more works: https://llm-eval.github.io/

  40. arXiv:2209.03042  [pdf, other]

    hep-ex astro-ph.IM cs.LG physics.data-an physics.ins-det

    Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube

    Authors: R. Abbasi, M. Ackermann, J. Adams, N. Aggarwal, J. A. Aguilar, M. Ahlers, M. Ahrens, J. M. Alameddine, A. A. Alves Jr., N. M. Amin, K. Andeen, T. Anderson, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, V. Basu, R. Bay, J. J. Beatty, K. -H. Becker, et al. (359 additional authors not shown)

    Abstract: IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of data from IceCube. Reconstructing and classifying events is a challen…

    Submitted 11 October, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: Prepared for submission to JINST

  41. arXiv:2208.10833  [pdf, other]

    cs.SE cs.AI cs.LG

    LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction

    Authors: Hongcheng Guo, Yuhui Guo, Renjie Chen, Jian Yang, Jiaheng Liu, Zhoujun Li, Tieqiao Zheng, Weichao Hou, Liangfan Zheng, Bo Zhang

    Abstract: Fully supervised log anomaly detection methods suffer from the heavy burden of annotating massive unlabeled log data. Recently, many semi-supervised methods have been proposed to reduce annotation costs with the help of parsed templates. However, these methods consider each keyword independently, which disregards the correlation between keywords and the contextual relationships among log sequences. In…

    Submitted 11 April, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: 12 pages

  42. arXiv:2208.08280  [pdf, other]

    cs.CL

    Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction

    Authors: Yidong Wang, Hao Wu, Ao Liu, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki, Manabu Okumura, Yue Zhang

    Abstract: Target-oriented Opinion Words Extraction (TOWE) is a fine-grained sentiment analysis task that aims to extract the corresponding opinion words of a given opinion target from the sentence. Recently, deep learning approaches have made remarkable progress on this task. Nevertheless, the TOWE task still suffers from the scarcity of training data due to the expensive data annotation process. Limited la…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: Accepted by COLING 2022

  43. arXiv:2208.07204  [pdf, other]

    cs.LG cs.AI cs.CV

    USB: A Unified Semi-supervised Learning Benchmark for Classification

    Authors: Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, Renjie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang

    Abstract: Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, currently, popular SSL evaluation protocols are often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address the above issu…

    Submitted 13 October, 2022; v1 submitted 12 August, 2022; originally announced August 2022.

    Comments: Accepted by NeurIPS'22 dataset and benchmark track; code at https://github.com/microsoft/Semi-supervised-learning

  44. arXiv:2206.12169  [pdf, other]

    cs.LG cs.AI

    AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail Problems

    Authors: Wenzheng Hou, Qianqian Xu, Zhiyong Yang, Shilong Bao, Yuan He, Qingming Huang

    Abstract: It is well-known that deep learning models are vulnerable to adversarial examples. Existing studies of adversarial training have made great progress against this challenge. As a typical trait, they often assume that the class distribution is overall balanced. However, long-tail datasets are ubiquitous in a wide spectrum of applications, where the amount of head class instances is larger than the t…

    Submitted 24 June, 2022; originally announced June 2022.

  45. arXiv:2206.09783  [pdf, other]

    eess.AS cs.CL cs.SD

    Boosting Cross-Domain Speech Recognition with Self-Supervision

    Authors: Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan

    Abstract: The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to the mismatch between training and testing distributions. Since the target domain usually lacks labeled data, and domain shifts exist at acoustic and linguistic levels, it is challenging to perform unsupervised domain adaptation (UDA) for ASR. Previous work has shown that self-supervised learning (S…

    Submitted 30 July, 2023; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2023

  46. arXiv:2205.07246  [pdf, other]

    cs.LG cs.CV

    FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning

    Authors: Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing Xie

    Abstract: Semi-supervised Learning (SSL) has witnessed great success owing to the impressive performances brought by various methods based on pseudo labeling and consistency regularization. However, we argue that existing methods might fail to utilize the unlabeled data more effectively since they either use a pre-defined / fixed threshold or an ad-hoc threshold adjusting scheme, resulting in inferior perfo…

    Submitted 31 January, 2023; v1 submitted 15 May, 2022; originally announced May 2022.

    Comments: Accepted by ICLR 2023. Code: https://github.com/microsoft/Semi-supervised-learning

  47. arXiv:2205.04121  [pdf, other]

    cs.CV cs.HC

    Identifying Fixation and Saccades in Virtual Reality

    Authors: Xiao-lin Chen, Wen-jun Hou

    Abstract: Gaze recognition can significantly reduce the amount of eye movement data for a better understanding of cognitive and visual processing. Gaze recognition is an essential precondition for eye-based interaction applications in virtual reality. However, the three-dimensional characteristics of virtual reality environments also pose new challenges to existing recognition algorithms. Based on seven eva…

    Submitted 9 May, 2022; originally announced May 2022.

  48. arXiv:2112.08643  [pdf, other]

    cs.CV cs.AI

    TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning

    Authors: Shiming Chen, Ziming Hong, Wenjin Hou, Guo-Sen Xie, Yibing Song, Jian Zhao, Xinge You, Shuicheng Yan, Ling Shao

    Abstract: Zero-shot learning (ZSL) tackles the novel class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Existing attention-based models have struggled to learn inferior region features in a single image by solely using unidirectional attention, which ignores the transferability and discriminative attribute localization of visual features. In this paper, we propose…

    Submitted 13 December, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: This is an extension of the AAAI'22 paper (TransZero). Accepted to TPAMI. arXiv admin note: substantial text overlap with arXiv:2112.01683

  49. arXiv:2112.07225  [pdf, other]

    cs.CV cs.AI cs.LG

    Margin Calibration for Long-Tailed Visual Recognition

    Authors: Yidong Wang, Bowen Zhang, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki

    Abstract: The long-tailed class distribution in visual recognition tasks poses great challenges for neural networks on how to handle the biased predictions between head and tail classes, i.e., the model tends to classify tail classes as head classes. While existing research focused on data resampling and loss function engineering, in this paper, we take a different perspective: the classification margins. W…

    Submitted 7 October, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: Accepted by Asian Conference on Machine Learning (ACML) 2022; 16 pages

  50. arXiv:2110.08263  [pdf, other]

    cs.LG cs.CV

    FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling

    Authors: Bowen Zhang, Yidong Wang, Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, Takahiro Shinozaki

    Abstract: The recently proposed FixMatch achieved state-of-the-art results on most semi-supervised learning (SSL) benchmarks. However, like other modern SSL algorithms, FixMatch uses a pre-defined constant threshold for all classes to select unlabeled data that contribute to the training, thus failing to consider different learning status and learning difficulties of different classes. To address this issue…

    Submitted 28 January, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021; camera-ready version; 16 pages with appendix; code: https://github.com/TorchSSL/TorchSSL