Showing 1–50 of 312 results for author: Chan, S

Searching in archive cs.

  1. arXiv:2505.09855

    cs.LG cs.AI cs.CL

    Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers

    Authors: Alexander Y. Ku, Thomas L. Griffiths, Stephanie C. Y. Chan

    Abstract: Transformer models learn in two distinct modes: in-weights learning (IWL), encoding knowledge into model weights, and in-context learning (ICL), adapting flexibly to context without weight modification. To better understand the interplay between these learning modes, we draw inspiration from evolutionary biology's analogous adaptive strategies: genetic encoding (akin to IWL, adapting over generati…

    Submitted 14 May, 2025; originally announced May 2025.

  2. arXiv:2505.08910

    cs.CV cs.CL

    Behind Maya: Building a Multilingual Vision Language Model

    Authors: Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji

    Abstract: In recent times, we have seen rapid development of large Vision-Language Models (VLMs). They have shown impressive results on academic benchmarks, primarily in widely spoken languages, but perform poorly on low-resource languages and varied cultural contexts. To address these limitations, we introduce Maya, an open-source Multilingual VLM. Our contributions are: 1) a multilingual image-text pre…

    Submitted 15 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted at VLMs4ALL CVPR 2025 Workshop; corrected workshop name spelling

  3. arXiv:2505.04616

    cs.CV

    Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait

    Authors: Feng Liu, Nicholas Chimitt, Lanqing Guo, Jitesh Jain, Aditya Kane, Minchul Kim, Wes Robbins, Yiyang Su, Dingqiang Ye, Xingguang Zhang, Jie Zhu, Siddharth Satyakam, Christopher Perry, Stanley H. Chan, Arun Ross, Humphrey Shi, Zhangyang Wang, Anil Jain, Xiaoming Liu

    Abstract: We address the problem of whole-body person recognition in unconstrained environments. This problem arises in surveillance scenarios such as those in the IARPA Biometric Recognition and Identification at Altitude and Range (BRIAR) program, where biometric data is captured at long standoff distances, elevated viewing angles, and under adverse atmospheric conditions (e.g., turbulence and high wind v…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 18 pages, 12 figures

  4. arXiv:2505.00661

    cs.CL cs.AI cs.LG

    On the generalization of language models from in-context learning and finetuning: a controlled study

    Authors: Andrew K. Lampinen, Arslan Chaudhry, Stephanie C. Y. Chan, Cody Wild, Diane Wan, Alex Ku, Jörg Bornschein, Razvan Pascanu, Murray Shanahan, James L. McClelland

    Abstract: Large language models exhibit exciting capabilities, yet can show surprisingly narrow generalization from finetuning. For example, they can fail to generalize to simple reversals of relations they are trained on, or fail to make simple logical deductions based on trained information. These failures to generalize from fine-tuning can hinder practical application of these models. On the other hand, language…

    Submitted 6 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  5. arXiv:2504.13462

    cs.LG

    Stratify: Rethinking Federated Learning for Non-IID Data through Balanced Sampling

    Authors: Hui Yeok Wong, Chee Kau Lim, Chee Seng Chan

    Abstract: Federated Learning (FL) on non-independently and identically distributed (non-IID) data remains a critical challenge, as existing approaches struggle with severe data heterogeneity. Current methods primarily address symptoms of non-IID by applying incremental adjustments to Federated Averaging (FedAvg), rather than directly resolving its inherent design limitations. Consequently, performance signi…

    Submitted 18 April, 2025; originally announced April 2025.

  6. arXiv:2504.13151

    cs.LG cs.AI cs.CL

    MIB: A Mechanistic Interpretability Benchmark

    Authors: Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov

    Abstract: How can we know whether new mechanistic interpretability methods achieve real improvements? In pursuit of meaningful and lasting evaluation standards, we propose MIB, a benchmark with two tracks spanning four tasks and five models. MIB favors methods that precisely and concisely recover relevant causal pathways or specific causal variables in neural language models. The circuit localization track…

    Submitted 17 April, 2025; originally announced April 2025.

  7. arXiv:2504.03888

    cs.HC cs.AI

    Investigating Affective Use and Emotional Well-being on ChatGPT

    Authors: Jason Phang, Michael Lampe, Lama Ahmad, Sandhini Agarwal, Cathy Mengying Fang, Auren R. Liu, Valdemar Danry, Eunhae Lee, Samantha W. T. Chan, Pat Pataranutaporn, Pattie Maes

    Abstract: As AI chatbots see increased adoption and integration into everyday life, questions have been raised about the potential impact of human-like or anthropomorphic AI on users. In this work, we investigate the extent to which interactions with ChatGPT (with a focus on Advanced Voice Mode) may impact users' emotional well-being, behaviors and experiences through two parallel studies. To study the affe…

    Submitted 4 April, 2025; originally announced April 2025.

  8. arXiv:2504.02697

    cs.CV eess.IV

    Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation

    Authors: Xingguang Zhang, Nicholas Chimitt, Xijun Wang, Yu Yuan, Stanley H. Chan

    Abstract: Atmospheric turbulence is a major source of image degradation in long-range imaging systems. Although numerous deep learning-based turbulence mitigation (TM) methods have been proposed, many are slow, memory-hungry, and do not generalize well. In the spatial domain, methods based on convolutional operators have a limited receptive field, so they cannot handle a large spatial dependency required by…

    Submitted 12 May, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: CVPR 2025 Highlight (extended), project page: https://xg416.github.io/MambaTM/

  9. arXiv:2504.01848

    cs.AI cs.CL

    PaperBench: Evaluating AI's Ability to Replicate AI Research

    Authors: Giulio Starace, Oliver Jaffe, Dane Sherburn, James Aung, Jun Shern Chan, Leon Maksin, Rachel Dias, Evan Mays, Benjamin Kinsella, Wyatt Thompson, Johannes Heidecke, Amelia Glaese, Tejal Patwardhan

    Abstract: We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research. Agents must replicate 20 ICML 2024 Spotlight and Oral papers from scratch, including understanding paper contributions, developing a codebase, and successfully executing experiments. For objective evaluation, we develop rubrics that hierarchically decompose each replication task into…

    Submitted 7 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

    Comments: 30 pages, 14 figures

  10. arXiv:2503.23250

    cs.CR cs.AI

    Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions

    Authors: Shih-Han Chan

    Abstract: Security threats like prompt injection attacks pose significant risks to applications that integrate Large Language Models (LLMs), potentially leading to unauthorized actions such as API misuse. Unlike previous approaches that aim to detect these attacks on a best-effort basis, this paper introduces a novel method that appends an Encrypted Prompt to each user prompt, embedding current permissions.…

    Submitted 29 March, 2025; originally announced March 2025.

  11. arXiv:2503.21676

    cs.CL cs.LG

    How do language models learn facts? Dynamics, curricula and hallucinations

    Authors: Nicolas Zucchet, Jörg Bornschein, Stephanie Chan, Andrew Lampinen, Razvan Pascanu, Soham De

    Abstract: Large language models accumulate vast knowledge during pre-training, yet the dynamics governing this acquisition remain poorly understood. This work investigates the learning dynamics of language models on a synthetic factual recall task, uncovering three key findings: First, language models learn in three phases, exhibiting a performance plateau before acquiring precise factual knowledge. Mechani…

    Submitted 27 March, 2025; originally announced March 2025.

  12. arXiv:2503.17813

    cs.CR

    Connectedness: a dimension of security bug severity assessment for measuring uncertainty

    Authors: Shue Long Chan

    Abstract: Current frameworks for evaluating security bug severity, such as the Common Vulnerability Scoring System (CVSS), prioritize the ratio of exploitability to impact. This paper suggests that the above approach measures the "known knowns" but inadequately addresses the "known unknowns" especially when there exist multiple possible exploit paths and side effects, which introduce significant uncertainty…

    Submitted 22 March, 2025; originally announced March 2025.

  13. arXiv:2503.17473

    cs.HC

    How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Randomized Controlled Study

    Authors: Cathy Mengying Fang, Auren R. Liu, Valdemar Danry, Eunhae Lee, Samantha W. T. Chan, Pat Pataranutaporn, Pattie Maes, Jason Phang, Michael Lampe, Lama Ahmad, Sandhini Agarwal

    Abstract: AI chatbots, especially those with voice capabilities, have become increasingly human-like, with more users seeking emotional support and companionship from them. Concerns are rising about how such interactions might impact users' loneliness and socialization with real people. We conducted a four-week randomized, controlled, IRB-approved experiment (n=981, >300K messages) to investigate how AI cha…

    Submitted 21 March, 2025; originally announced March 2025.

  14. arXiv:2503.10526

    cs.CV

    NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval

    Authors: Zengrong Lin, Zheng Wang, Tianwen Qian, Pan Mu, Sixian Chan, Cong Bai

    Abstract: Cross-modal retrieval aims to bridge the semantic gap between different modalities, such as visual and textual data, enabling accurate retrieval across them. Despite significant advancements with models like CLIP that align cross-modal representations, a persistent challenge remains: the hubness problem, where a small subset of samples (hubs) dominate as nearest neighbors, leading to biased repres…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025, 18 pages, 7 figures, 13 tables

  15. arXiv:2503.09130

    cs.GR cs.CV cs.MM

    InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images

    Authors: Jiun Tian Hoe, Weipeng Hu, Wei Zhou, Chao Xie, Ziwei Wang, Chee Seng Chan, Xudong Jiang, Yap-Peng Tan

    Abstract: This paper presents InteractEdit, a novel framework for zero-shot Human-Object Interaction (HOI) editing, addressing the challenging task of transforming an existing interaction in an image into a new, desired interaction while preserving the identities of the subject and object. Unlike simpler image editing scenarios such as attribute manipulation, object replacement or style transfer, HOI editin…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: Website: https://jiuntian.github.io/interactedit

  16. arXiv:2503.05825

    cs.RO eess.SY

    A Human-In-The-Loop Simulation Framework for Evaluating Control Strategies in Gait Assistive Robots

    Authors: Yifan Wang, Sherwin Stephen Chan, Mingyuan Lei, Lek Syn Lim, Henry Johan, Bingran Zuo, Wei Tech Ang

    Abstract: As the global population ages, effective rehabilitation and mobility aids will become increasingly critical. Gait assistive robots are promising solutions, but designing adaptable controllers for various impairments poses a significant challenge. This paper presents a Human-In-The-Loop (HITL) simulation framework tailored specifically for gait assistive robots, addressing unique challenges posed…

    Submitted 5 March, 2025; originally announced March 2025.

  17. arXiv:2503.05631

    cs.LG

    Strategy Coopetition Explains the Emergence and Transience of In-Context Learning

    Authors: Aaditya K. Singh, Ted Moskovitz, Sara Dragutinovic, Felix Hill, Stephanie C. Y. Chan, Andrew M. Saxe

    Abstract: In-context learning (ICL) is a powerful ability that emerges in transformer models, enabling them to learn from context without weight updates. Recent work has established emergent ICL as a transient phenomenon that can sometimes disappear after long training times. In this work, we sought a mechanistic understanding of these transient dynamics. Firstly, we find that, after the disappearance of IC…

    Submitted 10 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: 20 pages, 18 figures

  18. arXiv:2502.18749

    cs.RO

    Simulating Safe Bite Transfer in Robot-Assisted Feeding with a Soft Head and Articulated Jaw

    Authors: Yi Heng San, Vasanthamaran Ravichandram, J-Anne Yow, Sherwin Stephen Chan, Yifan Wang, Wei Tech Ang

    Abstract: Ensuring safe and comfortable bite transfer during robot-assisted feeding is challenging due to the close physical human-robot interaction required. This paper presents a novel approach to modeling physical human-robot interaction in a physics-based simulator (MuJoCo) using soft-body dynamics. We integrate a flexible head model with a rigid skeleton while accounting for internal dynamics, enabling…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: 6 pages, 6 figures

  19. arXiv:2502.12176

    cs.LG cs.AI

    Ten Challenging Problems in Federated Foundation Models

    Authors: Tao Fan, Hanlin Gu, Xuemei Cao, Chee Seng Chan, Qian Chen, Yiqiang Chen, Yihui Feng, Yang Gu, Jiaxiang Geng, Bing Luo, Shuoling Liu, Win Kent Ong, Chao Ren, Jiaqi Shao, Chuan Sun, Xiaoli Tang, Hong Xi Tae, Yongxin Tong, Shuyue Wei, Fan Wu, Wei Xi, Mingcong Xu, He Yang, Xin Yang, Jiangpeng Yan, et al. (8 additional authors not shown)

    Abstract: Federated Foundation Models (FedFMs) represent a distributed learning paradigm that fuses general competences of foundation models as well as privacy-preserving capabilities of federated learning. This combination allows the large foundation models and the small local domain models at the remote clients to learn from each other in a teacher-student learning setting. This paper provides a comprehen…

    Submitted 13 February, 2025; originally announced February 2025.

  20. arXiv:2502.04322

    cs.LG cs.AI cs.CL cs.CY

    Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

    Authors: Yik Siu Chan, Narutatsu Ri, Yuxin Xiao, Marzyeh Ghassemi

    Abstract: Despite extensive safety alignment efforts, large language models (LLMs) remain vulnerable to jailbreak attacks that elicit harmful behavior. While existing studies predominantly focus on attack methods that require technical expertise, two critical questions remain underexplored: (1) Are jailbroken responses truly useful in enabling average users to carry out harmful actions? (2) Do safety vulner…

    Submitted 6 February, 2025; originally announced February 2025.

  21. arXiv:2502.01801

    cs.HC

    MemPal: Leveraging Multimodal AI and LLMs for Voice-Activated Object Retrieval in Homes of Older Adults

    Authors: Natasha Maniar, Samantha W. T. Chan, Wazeer Zulfikar, Scott Ren, Christine Xu, Pattie Maes

    Abstract: Older adults have increasing difficulty with retrospective memory, hindering their abilities to perform daily activities and posing stress on caregivers to ensure their wellbeing. Recent developments in Artificial Intelligence (AI) and large context-aware multimodal models offer an opportunity to create memory support systems that assist older adults with common issues like object finding. This pa…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: 15 pages

    ACM Class: F.2.2, I.2.7

  22. arXiv:2501.08505

    cs.CV eess.IV

    Yuan: Yielding Unblemished Aesthetics Through A Unified Network for Visual Imperfections Removal in Generated Images

    Authors: Zhenyu Yu, Chee Seng Chan

    Abstract: Generative AI presents transformative potential across various domains, from creative arts to scientific visualization. However, the utility of AI-generated imagery is often compromised by visual flaws, including anatomical inaccuracies, improper object placements, and misplaced textual elements. These imperfections pose significant challenges for practical applications. To overcome these limitati…

    Submitted 14 January, 2025; originally announced January 2025.

  23. arXiv:2501.04144

    cs.CV cs.GR

    Chirpy3D: Creative Fine-grained 3D Object Fabrication via Part Sampling

    Authors: Kam Woh Ng, Jing Yang, Jia Wei Sii, Jiankang Deng, Chee Seng Chan, Yi-Zhe Song, Tao Xiang, Xiatian Zhu

    Abstract: We present Chirpy3D, a novel approach for fine-grained 3D object generation, tackling the challenging task of synthesizing creative 3D objects in a zero-shot setting, with access only to unposed 2D images of seen categories. Without structured supervision -- such as camera poses, 3D part annotations, or object-specific labels -- the model must infer plausible 3D structures, capture fine-grained de…

    Submitted 28 March, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: 19 pages

  24. arXiv:2412.20381

    cs.CV cs.MM

    Protégé: Learn and Generate Basic Makeup Styles with Generative Adversarial Networks (GANs)

    Authors: Jia Wei Sii, Chee Seng Chan

    Abstract: Makeup is no longer confined to physical application; people now use mobile apps to digitally apply makeup to their photos, which they then share on social media. However, while this shift has made makeup more accessible, designing diverse makeup styles tailored to individual faces remains a challenge. At present, this task must still be done manually by humans. Existing systems, such as makeup…

    Submitted 29 December, 2024; originally announced December 2024.

    Comments: 8 pages, 5 figures

  25. arXiv:2412.16429

    cs.CY cs.AI cs.LG

    LearnLM: Improving Gemini for Learning

    Authors: LearnLM Team, Abhinit Modi, Aditya Srikanth Veerubhotla, Aliya Rysbek, Andrea Huber, Brett Wiltshire, Brian Veprek, Daniel Gillick, Daniel Kasenberg, Derek Ahmed, Irina Jurenka, James Cohan, Jennifer She, Julia Wilkowski, Kaiz Alarakyia, Kevin R. McKee, Lisa Wang, Markus Kunesch, Mike Schaekermann, Miruna Pîslar, Nikhil Joshi, Parsa Mahmoudieh, Paul Jhun, Sara Wiltberger, Shakir Mohamed, et al. (21 additional authors not shown)

    Abstract: Today's generative AI systems are tuned to present information by default rather than engage users in service of learning as a human tutor would. To address the wide range of potential education use cases for these systems, we reframe the challenge of injecting pedagogical behavior as one of pedagogical instruction following, where training and evaluation examples include system-level ins…

    Submitted 25 December, 2024; v1 submitted 20 December, 2024; originally announced December 2024.

  26. arXiv:2412.14327

    cs.CV

    Personalized Generative Low-light Image Denoising and Enhancement

    Authors: Xijun Wang, Prateek Chennuri, Yu Yuan, Bole Ma, Xingguang Zhang, Stanley Chan

    Abstract: While smartphone cameras today can produce astonishingly good photos, their performance in low light is still not completely satisfactory because of the fundamental limits in photon shot noise and sensor read noise. Generative image restoration methods have demonstrated promising results compared to traditional methods, but they suffer from hallucinatory content generation when the signal-to-noise…

    Submitted 10 March, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

  27. arXiv:2412.09619

    cs.CV

    SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

    Authors: Dongting Hu, Jierun Chen, Xijie Huang, Huseyin Coskun, Arpit Sahni, Aarush Gupta, Anujraaj Goyal, Dishani Lahiri, Rajesh Singh, Yerlan Idelbayev, Junli Cao, Yanyu Li, Kwang-Ting Cheng, S. -H. Gary Chan, Mingming Gong, Sergey Tulyakov, Anil Kag, Yanwu Xu, Jian Ren

    Abstract: Existing text-to-image (T2I) diffusion models face several limitations, including large model sizes, slow runtime, and low-quality generation on mobile devices. This paper aims to address all of these challenges by developing an extremely small and fast T2I model that generates high-resolution and high-quality images on mobile platforms. We propose several techniques to achieve this goal. First, w…

    Submitted 12 December, 2024; originally announced December 2024.

  28. arXiv:2412.07112

    cs.CV cs.CL

    Maya: An Instruction Finetuned Multilingual Multimodal Model

    Authors: Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji

    Abstract: The rapid development of large Vision-Language Models (VLMs) has led to impressive results on academic benchmarks, primarily in widely spoken languages. However, significant gaps remain in the ability of current VLMs to handle low-resource languages and varied cultural contexts, largely due to a lack of high-quality, diverse, and safety-vetted data. Consequently, these models often struggle to und…

    Submitted 9 December, 2024; originally announced December 2024.

  29. arXiv:2412.03782

    cs.CL cs.LG

    The broader spectrum of in-context learning

    Authors: Andrew Kyle Lampinen, Stephanie C. Y. Chan, Aaditya K. Singh, Murray Shanahan

    Abstract: The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning within a much broader spectrum of meta-learned in-context learning. Indeed, we suggest that any distribution of sequences in which context non-trivially decreases loss on subsequent predictions can b…

    Submitted 9 December, 2024; v1 submitted 4 December, 2024; originally announced December 2024.

  30. arXiv:2412.02168

    cs.CV

    Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis

    Authors: Yu Yuan, Xijun Wang, Yichen Sheng, Prateek Chennuri, Xingguang Zhang, Stanley Chan

    Abstract: Image generation today can produce somewhat realistic images from text prompts. However, if one asks the generator to synthesize a specific camera setting such as creating different fields of view using a 24mm lens versus a 70mm lens, the generator will not be able to interpret and generate scene-consistent images. This limitation not only hinders the adoption of generative tools in professional p…

    Submitted 24 March, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: Accepted by CVPR 2025. Project page: https://generative-photography.github.io/project/

  31. arXiv:2411.14701

    cs.RO

    Personalised 3D Human Digital Twin with Soft-Body Feet for Walking Simulation

    Authors: Kum Yew Loke, Sherwin Stephen Chan, Mingyuan Lei, Henry Johan, Bingran Zuo, Wei Tech Ang

    Abstract: With the increasing use of assistive robots in rehabilitation and assisted mobility of human patients, there has been a need for a deeper understanding of human-robot interactions particularly through simulations, allowing an understanding of these interactions in a digital environment. There is an emphasis on accurately modelling personalised 3D human digital twins in these simulations, to glean…

    Submitted 21 November, 2024; originally announced November 2024.

    Comments: 10 pages, 16th International Conference on Social Robotics

  32. arXiv:2411.12946

    cs.CL cs.LG

    A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection

    Authors: Gabriel Chua, Shing Yee Chan, Shaun Khoo

    Abstract: Large Language Models (LLMs) are prone to off-topic misuse, where users may prompt these models to perform tasks beyond their intended scope. Current guardrails, which often rely on curated examples or custom classifiers, suffer from high false-positive rates, limited adaptability, and the impracticality of requiring real-world data that is not available in pre-production. In this paper, we introd…

    Submitted 9 April, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

    Comments: 8 pages, 5 figures

    MSC Class: 68T50

    ACM Class: I.2.7

  33. arXiv:2411.06618

    cs.LG cs.DC

    Using Diffusion Models as Generative Replay in Continual Federated Learning -- What will Happen?

    Authors: Yongsheng Mei, Liangqi Yuan, Dong-Jun Han, Kevin S. Chan, Christopher G. Brinton, Tian Lan

    Abstract: Federated learning (FL) has become a cornerstone in decentralized learning, where, in many scenarios, the incoming data distribution will change dynamically over time, introducing continuous learning (CL) problems. This continual federated learning (CFL) task presents unique challenges, particularly regarding catastrophic forgetting and non-IID input data. Existing solutions include using a replay…

    Submitted 10 November, 2024; originally announced November 2024.

  34. arXiv:2411.02278

    cs.SE

    Is This the Same Code? A Comprehensive Study of Decompilation Techniques for WebAssembly Binaries

    Authors: Wei-Cheng Wu, Yutian Yan, Hallgrimur David Egilsson, David Park, Steven Chan, Christophe Hauser, Weihang Wang

    Abstract: WebAssembly is a low-level bytecode language designed for client-side execution in web browsers. The need for decompilation techniques that recover high-level source code from WASM binaries has grown as WASM gains widespread adoption and raises security concerns. However, little research has been done to assess the quality of decompiled code from WASM. This paper aims to fill this gap by c…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: SecureComm'24: Proceedings of the 20th EAI International Conference on Security and Privacy in Communication Networks

  35. arXiv:2411.00248

    cs.CL

    A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making

    Authors: Yubin Kim, Chanwoo Park, Hyewon Jeong, Cristina Grau-Vilchez, Yik Siu Chan, Xuhai Xu, Daniel McDuff, Hyeonhoon Lee, Cynthia Breazeal, Hae Won Park

    Abstract: Medical Decision-Making (MDM) is a multi-faceted process that requires clinicians to assess complex multi-modal patient data, often collaboratively. Large Language Models (LLMs) promise to streamline this process by synthesizing vast medical knowledge and multi-modal health data. However, single-agent systems are often ill-suited for nuanced medical contexts requiring adaptable, collaborative prob…

    Submitted 19 November, 2024; v1 submitted 31 October, 2024; originally announced November 2024.

    Comments: Under Review for ML4H 2024

  36. arXiv:2410.23129

    cs.LG cs.CV stat.ML

    Why Fine-grained Labels in Pretraining Benefit Generalization?

    Authors: Guan Zhe Hong, Yin Cui, Ariel Fuxman, Stanley Chan, Enming Luo

    Abstract: Recent studies show that pretraining a deep neural network with fine-grained labeled data, followed by fine-tuning on coarse-labeled data for downstream tasks, often yields better generalization than pretraining with coarse-labeled data. While there is ample empirical evidence supporting this, the theoretical justification remains an open problem. This paper addresses this gap by introducing a "hi…

    Submitted 10 December, 2024; v1 submitted 30 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.16887

  37. Quanta Video Restoration

    Authors: Prateek Chennuri, Yiheng Chi, Enze Jiang, G. M. Dilshan Godaliyadda, Abhiram Gnanasambandam, Hamid R. Sheikh, Istvan Gyongy, Stanley H. Chan

    Abstract: The proliferation of single-photon image sensors has opened the door to a plethora of high-speed and low-light imaging applications. However, data collected by these sensors are often 1-bit or few-bit, and corrupted by noise and strong motion. Conventional video restoration methods are not designed to handle this situation, while specialized quanta burst algorithms have limited performance when th…

    Submitted 14 November, 2024; v1 submitted 19 October, 2024; originally announced October 2024.

    Comments: Accepted at European Conference on Computer Vision (ECCV) 2024, Milano, Italy, Sept 29 - Oct 4, 2024, Part XL, LNCS 15098

    Journal ref: European Conference on Computer Vision (ECCV) 2024

  38. arXiv:2410.10922

    cs.LG cs.CR cs.CV

    A few-shot Label Unlearning in Vertical Federated Learning

    Authors: Hanlin Gu, Hong Xi Tae, Chee Seng Chan, Lixin Fan

    Abstract: This paper addresses the critical challenge of unlearning in Vertical Federated Learning (VFL), an area that has received limited attention compared to horizontal federated learning. We introduce the first approach specifically designed to tackle label unlearning in VFL, focusing on scenarios where the active party aims to mitigate the risk of label leakage. Our method leverages a limited amount o…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: We introduce the first method for label unlearning in vertical federated learning (VFL), focused on preventing label leakage by the active party

  39. arXiv:2410.09314

    cs.CL cs.AI

    \llinstruct: An Instruction-tuned model for English Language Proficiency Assessments

    Authors: Debanjan Ghosh, Sophia Chan

    Abstract: We present \llinstruct: An 8B instruction-tuned model that is designed to generate content for English Language Proficiency Assessments (ELPA) and related applications. Our work involves creating a new dataset of 70K instructions and explanations in the ELPA domain and using these to fine-tune Llama-3 8B models (SFT) of different sizes (e.g., SFT-17K, SFT-50K and SFT-70K). Human evaluations are co…

    Submitted 11 October, 2024; originally announced October 2024.

  40. arXiv:2410.08794  [pdf, other]

    cs.LG cs.AI

    M$^3$-Impute: Mask-guided Representation Learning for Missing Value Imputation

    Authors: Zhongyi Yu, Zhenghao Wu, Shuhan Zhong, Weifeng Su, S. -H. Gary Chan, Chul-Ho Lee, Weipeng Zhuo

    Abstract: Missing values are a common problem that poses significant challenges to data analysis and machine learning. This problem necessitates the development of an effective imputation method to fill in the missing values accurately, thereby enhancing the overall quality and utility of the datasets. Existing imputation methods, however, fall short of explicitly considering the 'missingness' information i…

    Submitted 11 October, 2024; originally announced October 2024.

  41. arXiv:2410.07095  [pdf, other]

    cs.CL

    MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

    Authors: Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Lilian Weng, Aleksander Mądry

    Abstract: We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering. To this end, we curate 75 ML engineering-related competitions from Kaggle, creating a diverse set of challenging tasks that test real-world ML engineering skills such as training models, preparing datasets, and running experiments. We establish human baselines for each competition using Ka…

    Submitted 26 February, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: 10 pages, 17 pages appendix. Equal contribution by first seven authors, authors randomized. ICLR version

  42. Leveraging AI-Generated Emotional Self-Voice to Nudge People towards their Ideal Selves

    Authors: Cathy Mengying Fang, Phoebe Chua, Samantha Chan, Joanne Leong, Andria Bao, Pattie Maes

    Abstract: Emotions, shaped by past experiences, significantly influence decision-making and goal pursuit. Traditional cognitive-behavioral techniques for personal development rely on mental imagery to envision ideal selves, but may be less effective for individuals who struggle with visualization. This paper introduces Emotional Self-Voice (ESV), a novel system combining emotionally expressive language mode…

    Submitted 9 April, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

  43. arXiv:2409.08895  [pdf, other]

    cs.HC cs.AI

    Synthetic Human Memories: AI-Edited Images and Videos Can Implant False Memories and Distort Recollection

    Authors: Pat Pataranutaporn, Chayapatr Archiwaranguprok, Samantha W. T. Chan, Elizabeth Loftus, Pattie Maes

    Abstract: AI is increasingly used to enhance images and videos, both intentionally and unintentionally. As AI editing tools become more integrated into smartphones, users can modify or animate photos into realistic videos. This study examines the impact of AI-altered visuals on false memories--recollections of events that didn't occur or deviate from reality. In a pre-registered study, 200 participants were…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: 22 pages, 11 figures, 2 tables

  44. arXiv:2408.16465  [pdf, other]

    cs.HC

    Human and LLM-Based Voice Assistant Interaction: An Analytical Framework for User Verbal and Nonverbal Behaviors

    Authors: Szeyi Chan, Shihan Fu, Jiachen Li, Bingsheng Yao, Smit Desai, Mirjana Prpa, Dakuo Wang

    Abstract: Recent progress in large language model (LLM) technology has significantly enhanced the interaction experience between humans and voice assistants (VAs). This project aims to explore a user's continuous interaction with LLM-based VA (LLM-VA) during a complex task. We recruited 12 participants to interact with an LLM-VA during a cooking task, selected for its complexity and the requirement for cont…

    Submitted 3 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

  45. arXiv:2408.11847  [pdf, other]

    cs.CL

    Prompto: An open source library for asynchronous querying of LLM endpoints

    Authors: Ryan Sze-Yin Chan, Federico Nanni, Angus R. Williams, Edwin Brown, Liam Burke-Moore, Ed Chapman, Kate Onslow, Tvesha Sippy, Jonathan Bright, Evelina Gabasova

    Abstract: Recent surge in Large Language Model (LLM) availability has opened exciting avenues for research. However, efficiently interacting with these models presents a significant hurdle since LLMs often reside on proprietary or self-hosted API endpoints, each requiring custom code for interaction. Conducting comparative studies between different models can therefore be time-consuming and necessitate sign…

    Submitted 16 December, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  46. arXiv:2408.10624  [pdf, other]

    cs.CV cs.AI

    WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification

    Authors: Yonggan Wu, Ling-Chao Meng, Yuan Zichao, Sixian Chan, Hong-Qiang Wang

    Abstract: For the visible-infrared person re-identification (VI-ReID) task, one of the primary challenges lies in significant cross-modality discrepancy. Existing methods struggle to conduct modality-invariant information mining. They often focus solely on mining singular dimensions like spatial or channel, and overlook the extraction of specific-modality multi-dimension information. To fully mine modality-…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 18 pages, 5 figures

  47. arXiv:2408.06731  [pdf, other]

    cs.CY cs.AI cs.CL

    Large language models can consistently generate high-quality content for election disinformation operations

    Authors: Angus R. Williams, Liam Burke-Moore, Ryan Sze-Yin Chan, Florence E. Enock, Federico Nanni, Tvesha Sippy, Yi-Ling Chung, Evelina Gabasova, Kobi Hackenburg, Jonathan Bright

    Abstract: Advances in large language models have raised concerns about their potential use in generating compelling election disinformation at scale. This study presents a two-part investigation into the capabilities of LLMs to automate stages of an election disinformation operation. First, we introduce DisElect, a novel evaluation dataset designed to measure LLM compliance with instructions to generate con…

    Submitted 13 August, 2024; originally announced August 2024.

  48. arXiv:2408.04681  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC

    Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews

    Authors: Samantha Chan, Pat Pataranutaporn, Aditya Suri, Wazeer Zulfikar, Pattie Maes, Elizabeth F. Loftus

    Abstract: This study examines the impact of AI on human false memories -- recollections of events that did not occur or deviate from actual occurrences. It explores false memory induction through suggestive questioning in Human-AI interactions, simulating crime witness interviews. Four conditions were tested: control, survey-based, pre-scripted chatbot, and generative chatbot using a large language model (L…

    Submitted 8 August, 2024; originally announced August 2024.

  49. arXiv:2408.02960  [pdf, other]

    cs.AI

    Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic

    Authors: Thomy Phan, Benran Zhang, Shao-Hung Chan, Sven Koenig

    Abstract: Anytime multi-agent path finding (MAPF) is a promising approach to scalable path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach where a fast initial solution is iteratively optimized by destroying and repairing selected paths of the solution. Current MAPF-LNS variants commonly use an adaptive selection mechanism to…

    Submitted 17 December, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted to AAAI 2025

  50. arXiv:2408.00118  [pdf, other]

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman, et al. (173 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…

    Submitted 2 October, 2024; v1 submitted 31 July, 2024; originally announced August 2024.