-
Beyond Linearity: Squeeze-and-Recalibrate Blocks for Few-Shot Whole Slide Image Classification
Authors:
Conghao Xiong,
Zhengrui Guo,
Zhe Xu,
Yifei Zhang,
Raymond Kai-Yu Tong,
Si Yong Yeo,
Hao Chen,
Joseph J. Y. Sung,
Irwin King
Abstract:
Deep learning has advanced computational pathology but expert annotations remain scarce. Few-shot learning mitigates annotation burdens yet suffers from overfitting and discriminative feature mischaracterization. In addition, the current few-shot multiple instance learning (MIL) approaches leverage pretrained vision-language models to alleviate these issues, but at the cost of complex preprocessing and high computational cost. We propose a Squeeze-and-Recalibrate (SR) block, a drop-in replacement for linear layers in MIL models to address these challenges. The SR block comprises two core components: a pair of low-rank trainable matrices (squeeze pathway, SP) that reduces parameter count and imposes a bottleneck to prevent spurious feature learning, and a frozen random recalibration matrix that preserves geometric structure, diversifies feature directions, and redefines the optimization objective for the SP. We provide theoretical guarantees that the SR block can approximate any linear mapping to arbitrary precision, thereby ensuring that the performance of a standard MIL model serves as a lower bound for its SR-enhanced counterpart. Extensive experiments demonstrate that our SR-MIL models consistently outperform prior methods while requiring significantly fewer parameters and no architectural changes.
Submitted 21 May, 2025;
originally announced May 2025.
-
RASMD: RGB And SWIR Multispectral Driving Dataset for Robust Perception in Adverse Conditions
Authors:
Youngwan Jin,
Michal Kovac,
Yagiz Nalcakan,
Hyeongjin Ju,
Hanbin Song,
Sanghyeop Yeo,
Shiho Kim
Abstract:
Current autonomous driving algorithms heavily rely on the visible spectrum, which is prone to performance degradation in adverse conditions like fog, rain, snow, glare, and high contrast. Although other spectral bands like near-infrared (NIR) and long-wave infrared (LWIR) can enhance vision perception in such situations, they have limitations and lack large-scale datasets and benchmarks. Short-wave infrared (SWIR) imaging offers several advantages over NIR and LWIR. However, no publicly available large-scale datasets currently incorporate SWIR data for autonomous driving. To address this gap, we introduce the RGB and SWIR Multispectral Driving (RASMD) dataset, which comprises 100,000 synchronized and spatially aligned RGB-SWIR image pairs collected across diverse locations, lighting, and weather conditions. In addition, we provide a subset for RGB-SWIR translation and object detection annotations for a subset of challenging traffic scenarios to demonstrate the utility of SWIR imaging through experiments on both object detection and RGB-to-SWIR image translation. Our experiments show that combining RGB and SWIR data in an ensemble framework significantly improves detection accuracy compared to RGB-only approaches, particularly in conditions where visible-spectrum sensors struggle. We anticipate that the RASMD dataset will advance research in multispectral imaging for autonomous driving and robust perception systems.
Submitted 10 April, 2025;
originally announced April 2025.
-
Cooperative Dilemmas in Rational Debate
Authors:
Toby Handfield,
Julián Garcia,
Christian Hilbe,
Shang Long Yeo
Abstract:
As an epistemic activity, rational debate and discussion requires cooperation, yet involves a tension between collective and individual interests. While all participants benefit from collective outcomes like reaching consensus on true beliefs, individuals face personal costs when changing their minds. This creates an incentive for each debater to let others bear the cognitive burden of exploring alternative perspectives. We present a model to examine the strategic dynamics between debaters motivated by two competing goals: discovering truth and minimizing belief revisions. Our model demonstrates that this tension creates social dilemmas where strategies that are optimal for individuals systematically undermine the collective pursuit of truth. Paradoxically, our analysis reveals that increasing debaters' motivation to seek truth can sometimes produce equilibria with worse outcomes for collective truth discovery. These findings illuminate why rational debate can fail to achieve optimal epistemic outcomes, even when participants genuinely value truth.
Submitted 8 April, 2025;
originally announced April 2025.
-
PVChat: Personalized Video Chat with One-Shot Learning
Authors:
Yufei Shi,
Weilong Yan,
Gang Xu,
Yumeng Li,
Yucheng Chen,
Zhenxi Li,
Fei Richard Yu,
Ming Li,
Si Yong Yeo
Abstract:
Video large language models (ViLLMs) excel in general video understanding, e.g., recognizing activities like talking and eating, but struggle with identity-aware comprehension, such as "Wilson is receiving chemotherapy" or "Tom is discussing with Sarah", limiting their applicability in smart healthcare and smart home environments. To address this limitation, we propose a one-shot learning framework PVChat, the first personalized ViLLM that enables subject-aware question answering (QA) from a single video for each subject. Our approach optimizes a Mixture-of-Heads (MoH) enhanced ViLLM on a synthetically augmented video-QA dataset, leveraging a progressive image-to-video learning strategy. Specifically, we introduce an automated augmentation pipeline that synthesizes identity-preserving positive samples and retrieves hard negatives from existing video corpora, generating a diverse training dataset with four QA types: existence, appearance, action, and location inquiries. To enhance subject-specific learning, we propose a ReLU Routing MoH attention mechanism, alongside two novel objectives: (1) Smooth Proximity Regularization for progressive learning through exponential distance scaling and (2) Head Activation Enhancement for balanced attention routing. Finally, we adopt a two-stage training strategy, transitioning from image pre-training to video fine-tuning, enabling a gradual learning process from static attributes to dynamic representations. We evaluate PVChat on diverse datasets covering medical scenarios, TV series, anime, and real-world footage, demonstrating its superiority in personalized feature understanding after learning from a single video, compared to state-of-the-art ViLLMs.
Submitted 8 July, 2025; v1 submitted 21 March, 2025;
originally announced March 2025.
-
Future-Aware Interaction Network For Motion Forecasting
Authors:
Shijie Li,
Xun Xu,
Si Yong Yeo,
Xulei Yang
Abstract:
Motion forecasting is a crucial component of autonomous driving systems, enabling the generation of accurate and smooth future trajectories to ensure safe navigation to the destination. In previous methods, potential future trajectories are often absent in the scene encoding stage, which may lead to suboptimal outcomes. Additionally, prior approaches typically employ transformer architectures for spatiotemporal modeling of trajectories and map information, which suffer from the quadratic scaling complexity of the transformer architecture. In this work, we propose an interaction-based method, named Future-Aware Interaction Network, that introduces potential future trajectories into scene encoding for a comprehensive traffic representation. Furthermore, a State Space Model (SSM), specifically Mamba, is introduced for both spatial and temporal modeling. To adapt Mamba for spatial interaction modeling, we propose an adaptive reordering strategy that transforms unordered data into a structured sequence. Additionally, Mamba is employed to refine generated future trajectories temporally, ensuring more consistent predictions. These enhancements not only improve model efficiency but also enhance the accuracy and diversity of predictions. We conduct comprehensive experiments on the widely used Argoverse 1 and Argoverse 2 datasets, demonstrating that the proposed method achieves superior performance compared to previous approaches in a more efficient way. The code will be released upon acceptance.
Submitted 9 March, 2025;
originally announced March 2025.
-
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
Authors:
Ziyang Zhang,
Yang Yu,
Yucheng Chen,
Xulei Yang,
Si Yong Yeo
Abstract:
Despite significant progress in Vision-Language Pre-training (VLP), current approaches predominantly emphasize feature extraction and cross-modal comprehension, with limited attention to generating or transforming visual content. This gap hinders the model's ability to synthesize coherent and novel visual representations from textual prompts, thereby reducing the effectiveness of multi-modal learning. In this work, we propose MedUnifier, a unified VLP framework tailored for medical data. MedUnifier seamlessly integrates text-grounded image generation capabilities with multi-modal learning strategies, including image-text contrastive alignment, image-text matching and image-grounded text generation. Unlike traditional methods that rely on continuous visual representations, our approach employs visual vector quantization, which not only facilitates a more cohesive learning strategy for cross-modal understanding but also enhances multi-modal generation quality by effectively leveraging discrete representations. Our framework's effectiveness is evidenced by the experiments on established benchmarks, including uni-modal tasks (supervised fine-tuning), cross-modal tasks (image-text retrieval and zero-shot image classification), and multi-modal tasks (medical report generation, image synthesis), where it achieves state-of-the-art performance across various tasks. MedUnifier also offers a highly adaptable tool for a wide range of language and vision tasks in healthcare, marking advancement toward the development of a generalizable AI model for medical applications.
Submitted 20 April, 2025; v1 submitted 2 March, 2025;
originally announced March 2025.
-
Enhancing Deliberativeness: Evaluating the Impact of Multimodal Reflection Nudges
Authors:
ShunYi Yeo,
Zhuoqun Jiang,
Anthony Tang,
Simon Tangi Perrault
Abstract:
Nudging participants with text-based reflective nudges enhances deliberation quality on online deliberation platforms. The effectiveness of multimodal reflective nudges, however, remains largely unexplored. Given the multi-sensory nature of human perception, incorporating diverse modalities into self-reflection mechanisms has the potential to better support various reflective styles. This paper explores how presenting reflective nudges of different types (direct: persona and indirect: storytelling) in different modalities (text, image, video and audio) affects deliberation quality. We conducted two user studies with 20 and 200 participants respectively. The first study identifies the preferred modality for each type of reflective nudges, revealing that text is most preferred for persona and video is most preferred for storytelling. The second study assesses the impact of these modalities on deliberation quality. Our findings reveal distinct effects associated with each modality, providing valuable insights for developing more inclusive and effective online deliberation platforms.
Submitted 7 February, 2025; v1 submitted 6 February, 2025;
originally announced February 2025.
-
Chain of Grounded Objectives: Bridging Process and Goal-oriented Prompting for Code Generation
Authors:
Sangyeop Yeo,
Seung-won Hwang,
Yu-Seung Ma
Abstract:
The use of Large Language Models (LLMs) for code generation has gained significant attention in recent years. Existing methods often aim to improve the quality of generated code by incorporating additional contextual information or guidance into input prompts. Many of these approaches adopt sequential reasoning strategies, mimicking human-like step-by-step thinking. However, such strategies may constrain flexibility, as they do not always align with the structured characteristics of programming languages. This paper introduces the Chain of Grounded Objectives (CGO), a method that embeds functional objectives into input prompts to enhance code generation. By leveraging appropriately structured objectives as input and avoiding explicit sequential procedures, CGO adapts effectively to the structured nature of programming tasks. Empirical evaluations demonstrate that CGO effectively enhances code generation, addressing limitations of existing approaches.
Submitted 28 May, 2025; v1 submitted 22 January, 2025;
originally announced January 2025.
-
MindCoder: Automated and Controllable Reasoning Chain in Qualitative Analysis
Authors:
Jie Gao,
Zhiyao Shu,
Shun Yi Yeo
Abstract:
Extracting insights from qualitative analysis involves a series of reasoning steps, such as open coding, grouping, and identifying themes. We introduce the MindCoder reasoning chain, built on Chain-of-Thought (CoT) prompting, to support the insight extraction process step by step, including topic clustering, code labeling, conceptualization, and reporting. We designed the MindCoder web application to help users 1) automatically run this reasoning chain (i.e., obtain analysis report results in approximately 3-5 minutes) and 2) interactively control the reasoning process on demand. Our technical evaluations assess its reliability across various data types and demonstrate that simulated human iteration can potentially enhance coding quality. A user study further confirmed positive feedback regarding MindCoder's automation and its on-demand reasoning functionality.
Submitted 16 April, 2025; v1 submitted 1 January, 2025;
originally announced January 2025.
-
Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees
Authors:
Yannis Montreuil,
Shu Heng Yeo,
Axel Carlier,
Lai Xing Ng,
Wei Tsang Ooi
Abstract:
Large Language Models excel in generative tasks but exhibit inefficiencies in structured text selection, particularly in extractive question answering. This challenge is magnified in resource-constrained environments, where deploying multiple specialized models for different tasks is impractical. We propose a Learning-to-Defer framework that allocates queries to specialized experts, ensuring high-confidence predictions while optimizing computational efficiency. Our approach integrates a principled allocation strategy with theoretical guarantees on optimal deferral that balances performance and cost. Empirical evaluations on SQuADv1, SQuADv2, and TriviaQA demonstrate that our method enhances answer reliability while significantly reducing computational overhead, making it well-suited for scalable and efficient EQA deployment.
Submitted 18 February, 2025; v1 submitted 21 October, 2024;
originally announced October 2024.
-
A Two-Stage Learning-to-Defer Approach for Multi-Task Learning
Authors:
Yannis Montreuil,
Shu Heng Yeo,
Axel Carlier,
Lai Xing Ng,
Wei Tsang Ooi
Abstract:
The Two-Stage Learning-to-Defer (L2D) framework has been extensively studied for classification and, more recently, regression tasks. However, many real-world applications require solving both tasks jointly in a multi-task setting. We introduce a novel Two-Stage L2D framework for multi-task learning that integrates classification and regression through a unified deferral mechanism. Our method leverages a two-stage surrogate loss family, which we prove to be both Bayes-consistent and $(\mathcal{G}, \mathcal{R})$-consistent, ensuring convergence to the Bayes-optimal rejector. We derive explicit consistency bounds tied to the cross-entropy surrogate and the $L_1$-norm of agent-specific costs, and extend minimizability gap analysis to the multi-expert two-stage regime. We also make explicit how shared representation learning, commonly used in multi-task models, affects these consistency guarantees. Experiments on object detection and electronic health record analysis demonstrate the effectiveness of our approach and highlight the limitations of existing L2D methods in multi-task scenarios.
Submitted 23 May, 2025; v1 submitted 21 October, 2024;
originally announced October 2024.
-
Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction
Authors:
Jin Jie Sean Yeo,
Ee-Leng Tan,
Jisheng Bai,
Santi Peksi,
Woon-Seng Gan
Abstract:
In this technical report, we describe the SNTL-NTU team's submission for Task 1 Data-Efficient Low-Complexity Acoustic Scene Classification of the detection and classification of acoustic scenes and events (DCASE) 2024 challenge. Three systems are introduced to tackle training splits of different sizes. For small training splits, we explored reducing the complexity of the provided baseline model by reducing the number of base channels. We introduce data augmentation in the form of mixup to increase the diversity of training samples. For the larger training splits, we use FocusNet to provide confusing class information to an ensemble of multiple Patchout faSt Spectrogram Transformer (PaSST) models and baseline models trained on the original sampling rate of 44.1 kHz. We use Knowledge Distillation to distill the ensemble model to the baseline student model. Training the systems on the TAU Urban Acoustic Scene 2022 Mobile development dataset yielded the highest average testing accuracy of (62.21, 59.82, 56.81, 53.03, 47.97)% on split (100, 50, 25, 10, 5)% respectively over the three systems.
Submitted 18 September, 2024;
originally announced September 2024.
-
Not Too Long, Not Too Short: Goldilocks Principle of 'Optimal' Reflection Time on Online Deliberation Platforms
Authors:
ShunYi Yeo,
Simon Tangi Perrault
Abstract:
The deliberative potential of online platforms has been widely examined but the impact of reflection time on the quality of deliberation remains under-explored. This paper presents two user studies involving 100 and 72 participants respectively, to investigate the impact of reflection time on the quality of deliberation in minute-scale deliberations. In the first study, we identified an optimal reflection time for composing short opinion comments. In the second study, we introduced four distinct interface-based time nudges aimed at encouraging reflection near the optimal time. While these nudges may not improve the quality of deliberation, they effectively prolonged reflection periods. Additionally, we observed mixed effects on users' experience, influenced by the nature of the time nudges. Our findings suggest that reflection time is crucial, particularly for users who typically deliberate below the optimal reflection threshold.
Submitted 16 August, 2024;
originally announced August 2024.
-
ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading
Authors:
Zhiyuan Yang,
Bo Zhang,
Yufei Shi,
Ningze Zhong,
Johnathan Loh,
Huihui Fang,
Yanwu Xu,
Si Yong Yeo
Abstract:
Glaucoma is one of the leading causes of vision impairment. Digital imaging techniques, such as color fundus photography (CFP) and optical coherence tomography (OCT), provide quantitative and noninvasive methods for glaucoma diagnosis. Recently, in the field of computer-aided glaucoma diagnosis, multi-modality methods that integrate the CFP and OCT modalities have achieved greater diagnostic accuracy compared to single-modality methods. However, it remains challenging to extract reliable features due to the high similarity of medical images and the unbalanced multi-modal data distribution. Moreover, existing methods overlook the uncertainty estimation of different modalities, leading to unreliable predictions. To address these challenges, we propose a novel framework, namely ETSCL, which consists of a contrastive feature extraction stage and a decision-level fusion stage. Specifically, the supervised contrastive loss is employed to enhance the discriminative power in the feature extraction process, resulting in more effective features. In addition, we utilize the Frangi vesselness algorithm as a preprocessing step to incorporate vessel information to assist in the prediction. In the decision-level fusion stage, an evidence theory-based multi-modality classifier is employed to combine multi-source information with uncertainty estimation. Extensive experiments demonstrate that our method achieves state-of-the-art performance. The code is available at https://github.com/master-Shix/ETSCL.
Submitted 19 July, 2024;
originally announced July 2024.
-
Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation
Authors:
Sangyeop Yeo,
Yoojin Jang,
Jaejun Yoo
Abstract:
In this paper, we address the challenge of compressing generative adversarial networks (GANs) for deployment in resource-constrained environments by proposing two novel methodologies: Distribution Matching for Efficient compression (DiME) and Network Interactive Compression via Knowledge Exchange and Learning (NICKEL). DiME employs foundation models as embedding kernels for efficient distribution matching, leveraging maximum mean discrepancy to facilitate effective knowledge distillation. Simultaneously, NICKEL employs an interactive compression method that enhances the communication between the student generator and discriminator, achieving a balanced and stable compression process. Our comprehensive evaluation on the StyleGAN2 architecture with the FFHQ dataset shows the effectiveness of our approach, with NICKEL & DiME achieving FID scores of 10.45 and 15.93 at compression rates of 95.73% and 98.92%, respectively. Remarkably, our methods sustain generative quality even at an extreme compression rate of 99.69%, surpassing the previous state-of-the-art performance by a large margin. These findings not only demonstrate our methodologies' capacity to significantly lower GANs' computational demands but also pave the way for deploying high-quality GAN models in settings with limited resources. Our code will be released soon.
Submitted 4 September, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
Help Me Reflect: Leveraging Self-Reflection Interface Nudges to Enhance Deliberativeness on Online Deliberation Platforms
Authors:
Shun Yi Yeo,
Gionnieve Lim,
Jie Gao,
Weiyu Zhang,
Simon Tangi Perrault
Abstract:
The deliberative potential of online platforms has been widely examined. However, little is known about how various interface-based reflection nudges impact the quality of deliberation. This paper presents two user studies with 12 and 120 participants, respectively, to investigate the impacts of different reflective nudges on the quality of deliberation. In the first study, we examined five distinct reflective nudges: persona, temporal prompts, analogies and metaphors, cultural prompts and storytelling. Persona, temporal prompts, and storytelling emerged as the preferred nudges for implementation on online deliberation platforms. In the second study, we assess the impacts of these preferred reflectors more thoroughly. Results revealed a significant positive impact of these reflectors on deliberative quality. Specifically, persona promotes a deliberative environment for balanced and opinionated viewpoints while temporal prompts promote more individualised viewpoints. Our findings suggest that the choice of reflectors can significantly influence the dynamics and shape the nature of online discussions.
Submitted 19 January, 2024;
originally announced January 2024.
-
Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head
Authors:
Qian Wu,
Si Yong Yeo,
Yufei Chen,
Jun Liu
Abstract:
Accurate localization of cephalometric landmarks holds great importance in the fields of orthodontics and orthognathics due to its potential for automating key point labeling. In the context of landmark detection, particularly in cephalometrics, it has been observed that existing methods often lack standardized pipelines and well-designed bias reduction processes, which significantly impact their performance. In this paper, we revisit a related task, human pose estimation (HPE), which shares numerous similarities with cephalometric landmark detection (CLD), and emphasize the potential for transferring techniques from the former field to benefit the latter. Motivated by this insight, we have developed a robust and adaptable benchmark based on the well-established HPE codebase known as MMPose. This benchmark can serve as a dependable baseline for achieving exceptional CLD performance. Furthermore, we introduce an upscaling design within the framework to further enhance performance. This enhancement involves the incorporation of a lightweight and efficient super-resolution module, which generates heatmap predictions on high-resolution features and leads to further performance refinement, benefiting from its ability to reduce quantization bias. In the MICCAI CLDetection2023 challenge, our method achieves 1st place ranking on three metrics and 3rd place on the remaining one. The code for our method is available at https://github.com/5k5000/CLdetection2023.
Submitted 29 September, 2023;
originally announced September 2023.
-
Impact of Human-AI Interaction on User Trust and Reliance in AI-Assisted Qualitative Coding
Authors:
Jie Gao,
Junming Cao,
ShunYi Yeo,
Kenny Tsu Wei Choo,
Zheng Zhang,
Toby Jia-Jun Li,
Shengdong Zhao,
Simon Tangi Perrault
Abstract:
While AI shows promise for enhancing the efficiency of qualitative analysis, the unique human-AI interaction resulting from varied coding strategies makes it challenging to develop a trustworthy AI-assisted qualitative coding system (AIQCs) that supports coding tasks effectively. We bridge this gap by exploring the impact of varying coding strategies on user trust and reliance on AI. We conducted a mixed-methods split-plot 3x3 study, involving 30 participants, and a follow-up study with 6 participants, exploring varying text selection and code length in the use of our AIQCs system for qualitative analysis. Our results indicate that qualitative open coding should be conceptualized as a series of distinct subtasks, each with differing levels of complexity, and therefore, should be given tailored design considerations. We further observed a discrepancy between perceived and behavioral measures, and emphasized the potential challenges of under- and over-reliance on AIQCs systems. Additional design implications were also proposed for consideration.
Submitted 24 September, 2023;
originally announced September 2023.
-
Code Will Tell: Visual Identification of Ponzi Schemes on Ethereum
Authors:
Xiaolin Wen,
Kim Siang Yeo,
Yong Wang,
Ling Cheng,
Feida Zhu,
Min Zhu
Abstract:
Ethereum has become a popular blockchain with smart contracts for investors nowadays. Due to the decentralization and anonymity of Ethereum, Ponzi schemes have been easily deployed and caused significant losses to investors. However, there are still no explainable and effective methods to help investors easily identify Ponzi schemes and validate whether a smart contract is actually a Ponzi scheme. To fill the research gap, we propose PonziLens, a novel visualization approach to help investors achieve early identification of Ponzi schemes by investigating the operation codes of smart contracts. Specifically, we conduct symbolic execution of opcode and extract the control flow for investing and rewarding with critical opcode instructions. Then, an intuitive directed-graph based visualization is proposed to display the investing and rewarding flows and the crucial execution paths, enabling easy identification of Ponzi schemes on Ethereum. Two usage scenarios involving both Ponzi and non-Ponzi schemes demonstrate the effectiveness of PonziLens.
Submitted 14 March, 2023;
originally announced March 2023.
-
Can We Find Strong Lottery Tickets in Generative Models?
Authors:
Sangyeop Yeo,
Yoojin Jang,
Jy-yong Sohn,
Dongyoon Han,
Jaejun Yoo
Abstract:
Yes. In this paper, we investigate strong lottery tickets in generative models, the subnetworks that achieve good generative performance without any weight update. Neural network pruning is considered the main cornerstone of model compression for reducing the costs of computation and memory. Unfortunately, pruning a generative model has not been extensively explored, and all existing pruning algorithms suffer from excessive weight-training costs, performance degradation, limited generalizability, or complicated training. To address these problems, we propose to find a strong lottery ticket via moment-matching scores. Our experimental results show that the discovered subnetwork can perform similarly to or better than the trained dense model even when only 10% of the weights remain. To the best of our knowledge, we are the first to show the existence of strong lottery tickets in generative models and provide an algorithm to find them stably. Our code and supplementary materials are publicly available.
Submitted 16 December, 2022;
originally announced December 2022.
-
Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding
Authors:
Xun Long Ng,
Kian Eng Ong,
Qichen Zheng,
Yun Ni,
Si Yong Yeo,
Jun Liu
Abstract:
Understanding animals' behaviors is significant for a wide range of applications. However, existing animal behavior datasets have limitations in multiple aspects, including limited numbers of animal classes, data samples and provided tasks, and also limited variations in environmental conditions and viewpoints. To address these limitations, we create a large and diverse dataset, Animal Kingdom, that provides multiple annotated tasks to enable a more thorough understanding of natural animal behaviors. The wild animal footage used in our dataset records different times of the day in an extensive range of environments containing variations in backgrounds, viewpoints, illumination and weather conditions. More specifically, our dataset contains 50 hours of annotated videos to localize relevant animal behavior segments in long videos for the video grounding task, 30K video sequences for the fine-grained multi-label action recognition task, and 33K frames for the pose estimation task, which correspond to a diverse range of animals with 850 species across 6 major animal classes. Such a challenging and comprehensive dataset should help the community to develop, adapt, and evaluate various types of advanced methods for animal behavior analysis. Moreover, we propose a Collaborative Action Recognition (CARe) model that learns general and specific features for action recognition with unseen new animals. This method achieves promising performance in our experiments. Our dataset can be found at https://sutdcv.github.io/Animal-Kingdom.
Submitted 3 June, 2022; v1 submitted 17 April, 2022;
originally announced April 2022.
-
Differentiable Simulation of Inertial Musculotendons
Authors:
Ying Wang,
Jasper Verheul,
Sang-Hoon Yeo,
Nima Khademi Kalantari,
Shinjiro Sueda
Abstract:
We propose a simple and practical approach for incorporating the effects of muscle inertia, which has been ignored by previous musculoskeletal simulators in both graphics and biomechanics. We approximate the inertia of the muscle by assuming that muscle mass is distributed along the centerline of the muscle. We express the motion of the musculotendons in terms of the motion of the skeletal joints using a chain of Jacobians, so that at the top level, only the reduced degrees of freedom of the skeleton are used to completely drive both bones and musculotendons. Our approach can handle all commonly used musculotendon path types, including those with multiple path points and wrapping surfaces. For muscle paths involving wrapping surfaces, we use neural networks to model the Jacobians, trained using existing wrapping surface libraries, which allows us to effectively handle the Jacobian discontinuities that occur when musculotendon paths collide with wrapping surfaces. We demonstrate support for higher-order time integrators, complex joints, inverse dynamics, Hill-type muscle models, and differentiability. In the limit, as the muscle mass is reduced to zero, our approach gracefully degrades to traditional simulators without support for muscle inertia. Finally, it is possible to mix and match inertial and non-inertial musculotendons, depending on the application.
Submitted 22 September, 2022; v1 submitted 4 February, 2022;
originally announced February 2022.
-
Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability
Authors:
Sangho Yeo,
Minho Bae,
Minjoong Jeong,
Oh-kyoung Kwon,
Sangyoon Oh
Abstract:
Distributed deep learning is an effective way to reduce the training time of deep learning for large datasets as well as complex models. However, the limited scalability caused by network overheads makes it difficult to synchronize the parameters of all workers. To resolve this problem, gossip-based methods that demonstrate stable scalability regardless of the number of workers have been proposed. However, to use gossip-based methods in general cases, the validation accuracy for a large mini-batch needs to be verified. To verify this, we first empirically study the characteristics of gossip methods in a large mini-batch problem and observe that the gossip methods preserve higher validation accuracy than AllReduce-SGD (Stochastic Gradient Descent) when the batch size is increased and the number of workers is fixed. However, the delayed parameter propagation of the gossip-based models decreases validation accuracy in large node scales. To cope with this problem, we propose Crossover-SGD, which alleviates the delayed propagation of weight parameters via segment-wise communication and a load-balancing random network topology. We also adapt hierarchical communication to limit the number of workers in gossip-based communication methods. To validate the effectiveness of our proposed method, we conduct empirical experiments and observe that our Crossover-SGD shows higher node scalability than SGP (Stochastic Gradient Push).
Submitted 17 October, 2022; v1 submitted 30 December, 2020;
originally announced December 2020.
-
Learning by Semantic Similarity Makes Abstractive Summarization Better
Authors:
Wonjin Yoon,
Yoon Sun Yeo,
Minbyul Jeong,
Bong-Jun Yi,
Jaewoo Kang
Abstract:
By harnessing pre-trained language models, summarization models have made rapid progress recently. However, the models are mainly assessed by automatic evaluation metrics such as ROUGE. Although ROUGE is known for having a positive correlation with human evaluation scores, it has been criticized for its vulnerability and the gap between actual qualities. In this paper, we compare the generated summaries from a recent LM, BART, and the reference summaries from a benchmark dataset, CNN/DM, using a crowd-sourced human evaluation metric. Interestingly, model-generated summaries receive higher scores relative to reference summaries. Stemming from our experimental results, we first argue the intrinsic characteristics of the CNN/DM dataset, the progress of pre-trained language models, and their ability to generalize on the training data. Finally, we share our insights into the model-generated summaries and present our thoughts on learning methods for abstractive summarization.
Submitted 2 June, 2021; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Towards Debiasing Fact Verification Models
Authors:
Tal Schuster,
Darsh J Shah,
Yun Jie Serene Yeo,
Daniel Filizzola,
Enrico Santus,
Regina Barzilay
Abstract:
Fact verification requires validating a claim in the context of evidence. We show, however, that in the popular FEVER dataset this might not necessarily be the case. Claim-only classifiers perform competitively with top evidence-aware models. In this paper, we investigate the cause of this phenomenon, identifying strong cues for predicting labels solely based on the claim, without considering any evidence. We create an evaluation set that avoids those idiosyncrasies. The performance of FEVER-trained models significantly drops when evaluated on this test set. Therefore, we introduce a regularization method which alleviates the effect of bias in the training data, obtaining improvements on the newly created test set. This work is a step towards a more sound evaluation of reasoning capabilities in fact verification models.
Submitted 30 August, 2019; v1 submitted 14 August, 2019;
originally announced August 2019.
-
On the Closest Vector Problem for Lattices Constructed from Polynomials and Their Cryptographic Applications
Authors:
Zhe Li,
San Ling,
Chaoping Xing,
Sze Ling Yeo
Abstract:
In this paper, we propose new classes of trapdoor functions to solve the closest vector problem in lattices. Specifically, we construct lattices based on properties of polynomials for which the closest vector problem is hard to solve unless some trapdoor information is revealed. We thoroughly analyze the security of our proposed functions using state-of-the-art attacks and results on lattice reductions. Finally, we describe how our functions can be used to design quantum-safe encryption schemes with reasonable public key sizes. In particular, our scheme can offer around $106$ bits of security with a public key size of around $6.4$ $\texttt{KB}$. Our encryption schemes are efficient with respect to key generation, encryption and decryption.
Submitted 5 October, 2017;
originally announced October 2017.
-
A Novel Multi-task Deep Learning Model for Skin Lesion Segmentation and Classification
Authors:
Xulei Yang,
Zeng Zeng,
Si Yong Yeo,
Colin Tan,
Hong Liang Tey,
Yi Su
Abstract:
In this study, a multi-task deep neural network is proposed for skin lesion analysis. The proposed multi-task learning model solves different tasks (e.g., lesion segmentation and two independent binary lesion classifications) at the same time by exploiting commonalities and differences across tasks. This results in improved learning efficiency and potential prediction accuracy for the task-specific models, when compared to training the individual models separately. The proposed multi-task deep learning model is trained and evaluated on the dermoscopic image sets from the International Skin Imaging Collaboration (ISIC) 2017 Challenge - Skin Lesion Analysis towards Melanoma Detection, which consists of 2000 training samples and 150 evaluation samples. The experimental results show that the proposed multi-task deep learning model achieves promising performances on skin lesion segmentation and classification. The average value of Jaccard index for lesion segmentation is 0.724, while the average values of area under the receiver operating characteristic curve (AUC) on two individual lesion classifications are 0.880 and 0.972, respectively.
Submitted 2 March, 2017;
originally announced March 2017.
-
On the last fall degree of zero-dimensional Weil descent systems
Authors:
Ming-Deh A. Huang,
Michiel Kosters,
Yun Yang,
Sze Ling Yeo
Abstract:
In this article we will discuss a new, mostly theoretical, method for solving (zero-dimensional) polynomial systems, which lies in between Gröbner basis computations and the heuristic first fall degree assumption and is not based on any heuristic. This method relies on the new concept of last fall degree.
Let $k$ be a finite field of cardinality $q^n$ and let $k'$ be its subfield of cardinality $q$. Let $\mathcal{F} \subset k[X_0,\ldots,X_{m-1}]$ be a finite subset generating a zero-dimensional ideal. We give an upper bound of the last fall degree of the Weil descent system of $\mathcal{F}$, which depends on $q$, $m$, the last fall degree of $\mathcal{F}$, the degree of $\mathcal{F}$ and the number of solutions of $\mathcal{F}$, but not on $n$. This shows that such Weil descent systems can be solved efficiently if $n$ grows. In particular, we apply these results for multi-HFE and essentially show that multi-HFE is insecure.
Finally, we discuss that the degree of regularity (or last fall degree) of Weil descent systems coming from summation polynomials to solve the elliptic curve discrete logarithm problem might depend on $n$, since such systems without field equations are not zero-dimensional.
Submitted 17 June, 2015; v1 submitted 11 May, 2015;
originally announced May 2015.
-
New Constant-Weight Codes from Propagation Rules
Authors:
Yeow Meng Chee,
Chaoping Xing,
Sze Ling Yeo
Abstract:
This paper proposes some simple propagation rules which give rise to new binary constant-weight codes.
Submitted 9 August, 2010;
originally announced August 2010.
-
Energy-Efficient Scheduling of HPC Applications in Cloud Computing Environments
Authors:
Saurabh Kumar Garg,
Chee Shin Yeo,
Arun Anandasivam,
Rajkumar Buyya
Abstract:
The use of High Performance Computing (HPC) in commercial and consumer IT applications is becoming popular. Such applications need the ability to gain rapid and scalable access to high-end computing capabilities. Cloud computing promises to deliver such a computing infrastructure using data centers so that HPC users can access applications and data from a Cloud anywhere in the world on demand and pay based on what they use. However, the growing demand drastically increases the energy consumption of data centers, which has become a critical issue. High energy consumption not only translates to high energy cost, which will reduce the profit margin of Cloud providers, but also to high carbon emissions, which are not environmentally sustainable. Hence, energy-efficient solutions are required that can address the sharp increase in energy consumption from the perspective of not only the Cloud provider but also the environment. To address this issue, we propose near-optimal scheduling policies that exploit heterogeneity across multiple data centers for a Cloud provider. We consider a number of energy efficiency factors such as energy cost, carbon emission rate, workload, and CPU power efficiency, which change across different data centers depending on their location, architectural design, and management system. Our carbon/energy based scheduling policies are able to achieve on average up to 30% energy savings in comparison to profit based scheduling policies, leading to higher profit and less carbon emissions.
Submitted 7 September, 2009;
originally announced September 2009.
-
Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities
Authors:
Rajkumar Buyya,
Chee Shin Yeo,
Srikumar Venugopal
Abstract:
This keynote paper: presents a 21st century vision of computing; identifies various computing paradigms promising to deliver the vision of computing utilities; defines Cloud computing and provides the architecture for creating market-oriented Clouds by leveraging technologies such as VMs; provides thoughts on market-based resource management strategies that encompass both customer-driven service management and computational risk management to sustain SLA-oriented resource allocation; presents some representative Cloud platforms especially those developed in industries along with our current work towards realising market-oriented resource allocation of Clouds by leveraging the 3rd generation Aneka enterprise Grid technology; reveals our early thoughts on interconnecting Clouds for dynamically creating an atmospheric computing environment along with pointers to future community research; and concludes with the need for convergence of competing IT paradigms for delivering our 21st century vision.
Submitted 26 August, 2008;
originally announced August 2008.
-
Utility Computing and Global Grids
Authors:
Chee Shin Yeo,
Marcos Dias de Assuncao,
Jia Yu,
Anthony Sulistio,
Srikumar Venugopal,
Martin Placek,
Rajkumar Buyya
Abstract:
This chapter focuses on the use of Grid technologies to achieve utility computing. An overview of how Grids can support utility computing is first presented through the architecture of Utility Grids. Then, utility-based resource allocation is described in detail at each level of the architecture. Finally, some industrial solutions for utility computing are discussed.
Submitted 12 May, 2006;
originally announced May 2006.