
Showing 1–23 of 23 results for author: Bae, B

  1. arXiv:2506.21616  [pdf, ps, other]

    cs.CL cs.CY

    TIM: A Large-Scale Dataset and large Timeline Intelligence Model for Open-domain Timeline Summarization

    Authors: Chuanrui Hu, Wei Hu, Penghang Yu, Hua Zhang, Bing-Kun Bao

    Abstract: Open-domain Timeline Summarization (TLS) is crucial for monitoring the evolution of news topics. To identify changes in news topics, existing methods typically employ general Large Language Models (LLMs) to summarize relevant timestamps from retrieved news. While general LLMs demonstrate capabilities in zero-shot news summarization and timestamp localization, they struggle with assessing topic rel…

    Submitted 22 June, 2025; originally announced June 2025.

  2. arXiv:2506.16712  [pdf, ps, other]

    cs.CL cs.AI

    ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models

    Authors: Bin Chen, Xinzge Gao, Chuanrui Hu, Penghang Yu, Hua Zhang, Bing-Kun Bao

    Abstract: Generative Reward Models (GRMs) provide greater flexibility than scalar reward models in capturing human preferences, but their effectiveness is limited by poor reasoning capabilities. This often results in incomplete or overly speculative reasoning paths, leading to hallucinations or missing key information in complex tasks. We address this challenge with ReasonGRM, a three-stage generative rewar…

    Submitted 19 June, 2025; originally announced June 2025.

  3. arXiv:2506.13405  [pdf, ps, other]

    cs.CL

    RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis

    Authors: Pengzuo Wu, Yuhang Yang, Guangcheng Zhu, Chao Ye, Hong Gu, Xu Lu, Ruixuan Xiao, Bowen Bao, Yijing He, Liangyu Zha, Wentao Ye, Junbo Zhao, Haobo Wang

    Abstract: With the rapid advancement of Large Language Models (LLMs), there is an increasing need for challenging benchmarks to evaluate their capabilities in handling complex tabular data. However, existing benchmarks are either based on outdated data setups or focus solely on simple, flat table structures. In this paper, we introduce RealHiTBench, a comprehensive benchmark designed to evaluate the perform…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: ACL 2025

  4. arXiv:2505.19201  [pdf, ps, other]

    cs.CL

    DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding

    Authors: Yunhai Hu, Tianhua Xia, Zining Liu, Rahul Raman, Xingyu Liu, Bo Bao, Eric Sather, Vithursan Thangarasa, Sai Qian Zhang

    Abstract: Speculative decoding (SD) has emerged as a powerful method for accelerating autoregressive generation in large language models (LLMs), yet its integration into vision-language models (VLMs) remains underexplored. We introduce DREAM, a novel speculative decoding framework tailored for VLMs that combines three key innovations: (1) a cross-attention-based mechanism to inject intermediate features fro…

    Submitted 29 May, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

  5. arXiv:2412.12327  [pdf, other]

    cs.LG

    Leveraging Group Classification with Descending Soft Labeling for Deep Imbalanced Regression

    Authors: Ruizhi Pu, Gezheng Xu, Ruiyi Fang, Binkun Bao, Charles X. Ling, Boyu Wang

    Abstract: Deep imbalanced regression (DIR), where the target values have a highly skewed distribution and are also continuous, is an intriguing yet under-explored problem in machine learning. While recent works have already shown that incorporating various classification-based regularizers can produce enhanced outcomes, the role of classification remains elusive in DIR. Moreover, such regularizers (e.g.…

    Submitted 19 December, 2024; v1 submitted 16 December, 2024; originally announced December 2024.

  6. arXiv:2412.05619  [pdf, other]

    cs.CV

    Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC

    Authors: Ming Tao, Bing-Kun Bao, Yaowei Wang, Changsheng Xu

    Abstract: Large pretrained diffusion models have demonstrated impressive generation capabilities and have been adapted to various downstream tasks. However, unlike Large Language Models (LLMs) that can learn multiple tasks in a single model based on instructed data, diffusion models always require additional branches, task-specific training strategies, and losses for effective adaptation to different downst…

    Submitted 7 December, 2024; originally announced December 2024.

    Comments: 11 pages, 11 figures

  7. arXiv:2406.00323  [pdf, other]

    cs.IR cs.MM

    BeFA: A General Behavior-driven Feature Adapter for Multimedia Recommendation

    Authors: Qile Fan, Penghang Yu, Zhiyi Tan, Bing-Kun Bao, Guanming Lu

    Abstract: Multimedia recommender systems focus on utilizing behavioral information and content information to model user preferences. Typically, it employs pre-trained feature encoders to extract content features, then fuses them with behavioral features. However, pre-trained feature encoders often extract features from the entire content simultaneously, including excessive preference-irrelevant details. We…

    Submitted 13 January, 2025; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: This paper is accepted by AAAI2025

  8. arXiv:2404.05979  [pdf, other]

    cs.CV

    StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion

    Authors: Ming Tao, Bing-Kun Bao, Hao Tang, Yaowei Wang, Changsheng Xu

    Abstract: Story visualization aims to generate a series of realistic and coherent images based on a storyline. Current models adopt a frame-by-frame architecture by transforming the pre-trained text-to-image model into an auto-regressive manner. Although these models have shown notable progress, there are still three flaws. 1) The unidirectional generation of auto-regressive manner restricts the usability i…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 17 pages

  9. arXiv:2403.10744  [pdf, ps, other]

    cs.AI

    Game and Reference: Policy Combination Synthesis for Epidemic Prevention and Control

    Authors: Zhiyi Tan, Bingkun Bao

    Abstract: In recent years, epidemic policy-making models are increasingly being used to provide reference for governors on prevention and control policies against catastrophic epidemics such as SARS, H1N1 and COVID-19. Existing studies are currently constrained by two issues: First, previous methods develop policies based on effect evaluation, since few of factors in real-world decision-making can be modele…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 16 pages, single line, 7 figures, written with Springer conference template

  10. arXiv:2309.15363  [pdf, other]

    cs.IR

    LD4MRec: Simplifying and Powering Diffusion Model for Multimedia Recommendation

    Authors: Jiarui Zhu, Jun Hou, Penghang Yu, Zhiyi Tan, Bing-Kun Bao

    Abstract: Multimedia recommendation aims to predict users' future behaviors based on observed behaviors and item content information. However, the inherent noise contained in observed behaviors easily leads to suboptimal recommendation performance. Recently, the diffusion model's ability to generate information from noise presents a promising solution to this issue, prompting us to explore its application i…

    Submitted 12 April, 2025; v1 submitted 26 September, 2023; originally announced September 2023.

  11. arXiv:2308.15840  [pdf, other]

    cs.LG cs.AI physics.soc-ph q-bio.PE

    MSGNN: Multi-scale Spatio-temporal Graph Neural Network for Epidemic Forecasting

    Authors: Mingjie Qiu, Zhiyi Tan, Bing-kun Bao

    Abstract: Infectious disease forecasting has been a key focus and proved to be crucial in controlling epidemic. A recent trend is to develop forecasting models based on graph neural networks (GNNs). However, existing GNN-based methods suffer from two key limitations: (1) Current models broaden receptive fields by scaling the depth of GNNs, which is insufficient to preserve the semantics of long-range conn…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 29 pages

    Report number: DAMI-D-23-00319R2

    Journal ref: Data Min Knowl Disc (2024)

  12. Multi-View Graph Convolutional Network for Multimedia Recommendation

    Authors: Penghang Yu, Zhiyi Tan, Guanming Lu, Bing-Kun Bao

    Abstract: Multimedia recommendation has received much attention in recent years. It models user preferences based on both behavior information and item multimodal information. Though current GCN-based methods achieve notable success, they suffer from two limitations: (1) Modality noise contamination to the item representations. Existing methods often mix modality features and behavior features in a single v…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: MM'23

  13. arXiv:2304.14226  [pdf, other]

    cs.LG cs.AI cs.PF

    TorchBench: Benchmarking PyTorch with High API Surface Coverage

    Authors: Yueming Hao, Xu Zhao, Bin Bao, David Berard, Will Constable, Adnan Aziz, Xu Liu

    Abstract: Deep learning (DL) has been a revolutionary technique in various domains. To facilitate the model development and deployment, many deep learning frameworks are proposed, among which PyTorch is one of the most popular solutions. The performance of ecosystem around PyTorch is critically important, which saves the costs of training models and reduces the response time of model inferences. In this pap…

    Submitted 24 June, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

  14. arXiv:2303.16557  [pdf, other]

    cs.CV cs.AI

    Self-accumulative Vision Transformer for Bone Age Assessment Using the Sauvegrain Method

    Authors: Hong-Jun Choi, Dongbin Na, Kyungjin Cho, Byunguk Bae, Seo Taek Kong, Hyunjoon An

    Abstract: This study presents a novel approach to bone age assessment (BAA) using a multi-view, multi-task classification model based on the Sauvegrain method. A straightforward solution to automating the Sauvegrain method, which assesses a maturity score for each landmark in the elbow and predicts the bone age, is to train classifiers independently to score each region of interest (RoI), but this approach…

    Submitted 30 March, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: 13 pages

  15. arXiv:2301.12959  [pdf, other]

    cs.CV cs.AI

    GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

    Authors: Ming Tao, Bing-Kun Bao, Hao Tang, Changsheng Xu

    Abstract: Synthesizing high-fidelity complex images from text is challenging. Based on large pretraining, the autoregressive and diffusion models can synthesize photo-realistic images. Although these large models have shown notable progress, there remain three flaws. 1) These models require tremendous training data and parameters to achieve good performance. 2) The multi-step generation design slows the ima…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: 11 pages

  16. arXiv:2206.01160  [pdf, other]

    cs.CV cs.MM

    DE-Net: Dynamic Text-guided Image Editing Adversarial Networks

    Authors: Ming Tao, Bing-Kun Bao, Hao Tang, Fei Wu, Longhui Wei, Qi Tian

    Abstract: Text-guided image editing models have shown remarkable results. However, there remain two problems. First, they employ fixed manipulation modules for various editing requirements (e.g., color changing, texture changing, content adding and removing), which results in over-editing or insufficient editing. Second, they do not clearly distinguish between text-required and text-irrelevant parts, which…

    Submitted 20 August, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

  17. arXiv:2107.04768  [pdf, other]

    cs.MM cs.AI cs.CV

    DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering

    Authors: Jianyu Wang, Bing-Kun Bao, Changsheng Xu

    Abstract: Video question answering is a challenging task, which requires agents to be able to understand rich video contents and perform spatial-temporal reasoning. However, existing graph-based methods fail to perform multi-step reasoning well, neglecting two properties of VideoQA: (1) Even for the same video, different questions may require different amount of video clips or objects to infer the answer wi…

    Submitted 10 July, 2021; originally announced July 2021.

    Comments: 12 pages, 12 figures

    Journal ref: IEEE Transactions on Multimedia 2021

  18. arXiv:2106.05735  [pdf, other]

    eess.IV cs.CV cs.LG

    The Medical Segmentation Decathlon

    Authors: Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Bram van Ginneken, Michel Bilello, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc J. Gollub, Stephan H. Heckers, Henkjan Huisman, William R. Jarnagin, Maureen K. McHugo, Sandy Napel, Jennifer S. Goli Pernicka, Kawal Rhode, Catalina Tobon-Gomez, Eugene Vorontsov, et al. (34 additional authors not shown)

    Abstract: International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical pro…

    Submitted 10 June, 2021; originally announced June 2021.

    MSC Class: 68T07

  19. arXiv:2010.08924   

    cs.LG cs.AI

    Meta-path Free Semi-supervised Learning for Heterogeneous Networks

    Authors: Shin-woo Park, Byung Jun Bae, Jinyoung Yeo, Seung-won Hwang

    Abstract: Graph neural networks (GNNs) have been widely used in representation learning on graphs and achieved superior performance in tasks such as node classification. However, analyzing heterogeneous graph of different types of nodes and links still brings great challenges for injecting the heterogeneity into a graph neural network. A general remedy is to manually or automatically design meta-paths to tr…

    Submitted 6 January, 2021; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: The technical description of [Proposed Models] section has an error. Especially, the training process

  20. arXiv:2008.05865  [pdf, other]

    cs.CV

    DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

    Authors: Ming Tao, Hao Tang, Fei Wu, Xiao-Yuan Jing, Bing-Kun Bao, Changsheng Xu

    Abstract: Synthesizing high-quality realistic images from text descriptions is a challenging task. Existing text-to-image Generative Adversarial Networks generally employ a stacked architecture as the backbone yet still remain three flaws. First, the stacked architecture introduces the entanglements between generators of different image scales. Second, existing studies prefer to apply and fix extra networks…

    Submitted 14 October, 2022; v1 submitted 13 August, 2020; originally announced August 2020.

  21. arXiv:2004.13840  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Using LSTM to Translate French to Senegalese Local Languages: Wolof as a Case Study

    Authors: Lo Alla, Dione Cheikh Bamba, Nguer Elhadji Mamadou, Ba Sileye O. Ba, Lo Moussa

    Abstract: In this paper, we propose a neural machine translation system for Wolof, a low-resource Niger-Congo language. First we gathered a parallel corpus of 70000 aligned French-Wolof sentences. Then we developed a baseline LSTM based encoder-decoder architecture which was further extended to bidirectional LSTMs with attention mechanisms. Our models are trained on a limited amount of parallel French-Wolo…

    Submitted 27 March, 2020; originally announced April 2020.

    Comments: 4 pages, 2 tables, ICLR AfricaNLP2020 workshop

  22. arXiv:1904.00623  [pdf, other]

    cs.AI cs.CV cs.LG cs.MM

    Constructing Hierarchical Q&A Datasets for Video Story Understanding

    Authors: Yu-Jung Heo, Kyoung-Woon On, Seongho Choi, Jaeseo Lim, Jinah Kim, Jeh-Kwang Ryu, Byung-Chull Bae, Byoung-Tak Zhang

    Abstract: Video understanding is emerging as a new paradigm for studying human-like AI. Question-and-Answering (Q&A) is used as a general benchmark to measure the level of intelligence for video understanding. While several previous studies have suggested datasets for video Q&A tasks, they did not really incorporate story-level understanding, resulting in highly-biased and lack of variance in degree of ques…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: Accepted to AAAI 2019 Spring Symposium Series : Story-Enabled Intelligence

  23. arXiv:1607.05327  [pdf, other]

    cs.RO cs.HC

    Emotional Storytelling using Virtual and Robotic Agents

    Authors: Sandra Costa, Alberto Brunete, Byung-Chull Bae, Nikolaos Mavridis

    Abstract: In order to create effective storytelling agents three fundamental questions must be answered: first, is a physically embodied agent preferable to a virtual agent or a voice-only narration? Second, does a human voice have an advantage over a synthesised voice? Third, how should the emotional trajectory of the different characters in a story be related to a storyteller's facial expressions during s…

    Submitted 18 July, 2016; originally announced July 2016.

    Comments: 14 pages, 10 Figures, 3 Tables