Showing 1–50 of 136 results for author: Wong, Y

Searching in archive cs.
  1. arXiv:2507.00185  [pdf]

    eess.IV cs.AI cs.CV

    Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM)

    Authors: Yang Zhou, Chrystie Wan Ning Quek, Jun Zhou, Yan Wang, Yang Bai, Yuhe Ke, Jie Yao, Laura Gutierrez, Zhen Ling Teo, Darren Shu Jeng Ting, Brian T. Soetikno, Christopher S. Nielsen, Tobias Elze, Zengxiang Li, Linh Le Dinh, Lionel Tim-Ee Cheng, Tran Nguyen Tuan Anh, Chee Leong Cheng, Tien Yin Wong, Nan Liu, Iain Beehuat Tan, Tony Kiat Hon Lim, Rick Siow Mong Goh, Yong Liu, Daniel Shu Wei Ting

    Abstract: Current artificial intelligence models for medical imaging are predominantly single modality and single disease. Attempts to create multimodal and multi-disease models have resulted in inconsistent clinical accuracy. Furthermore, training these models typically requires large, labour-intensive, well-labelled datasets. We developed MerMED-FM, a state-of-the-art multimodal, multi-specialty foundatio…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: 42 pages, 3 composite figures, 4 tables

  2. A Sea of Cyber Threats: Maritime Cybersecurity from the Perspective of Mariners

    Authors: Anna Raymaker, Akshaya Kumar, Miuyin Yong Wong, Ryan Pickren, Animesh Chhotaray, Frank Li, Saman Zonouz, Raheem Beyah

    Abstract: Maritime systems, including ships and ports, are critical components of global infrastructure, essential for transporting over 80% of the world's goods and supporting internet connectivity. However, these systems face growing cybersecurity threats, as shown by recent attacks disrupting Maersk, one of the world's largest shipping companies, causing widespread impacts on international trade. The uni…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 18 pages, 2 figures, To appear in the Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS '25)

  3. arXiv:2506.02461  [pdf, ps, other]

    cs.CL

    XToM: Exploring the Multilingual Theory of Mind for Large Language Models

    Authors: Chunkit Chan, Yauwai Yim, Hongchuan Zeng, Zhiying Zou, Xinyuan Cheng, Zhifan Sun, Zheye Deng, Kawai Chung, Yuzhuo Ao, Yixiang Fan, Cheng Jiayang, Ercong Nie, Ginny Y. Wong, Helmut Schmid, Hinrich Schütze, Simon See, Yangqiu Song

    Abstract: Theory of Mind (ToM), the ability to infer mental states in others, is pivotal for human social cognition. Existing evaluations of ToM in LLMs are largely limited to English, neglecting the linguistic diversity that shapes human cognition. This limitation raises a critical question: can LLMs exhibit Multilingual Theory of Mind, which is the capacity to reason about mental states across diverse lin…

    Submitted 3 June, 2025; originally announced June 2025.

  4. arXiv:2505.10261  [pdf]

    cs.CL cs.AI

    The Evolving Landscape of Generative Large Language Models and Traditional Natural Language Processing in Medicine

    Authors: Rui Yang, Huitao Li, Matthew Yu Heng Wong, Yuhe Ke, Xin Li, Kunyu Yu, Jingchi Liao, Jonathan Chong Kai Liew, Sabarinath Vinod Nair, Jasmine Chiat Ling Ong, Irene Li, Douglas Teodoro, Chuan Hong, Daniel Shu Wei Ting, Nan Liu

    Abstract: Natural language processing (NLP) has been traditionally applied to medicine, and generative large language models (LLMs) have become prominent recently. However, the differences between them across different medical tasks remain underexplored. We analyzed 19,123 studies, finding that generative LLMs demonstrate advantages in open-ended tasks, while traditional NLP dominates in information extract…

    Submitted 15 May, 2025; originally announced May 2025.

  5. arXiv:2505.08414  [pdf]

    eess.IV cs.CV

    An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

    Authors: Zhi Da Soh, Yang Bai, Kai Yu, Yang Zhou, Xiaofeng Lei, Sahil Thakur, Zann Lee, Lee Ching Linette Phang, Qingsheng Peng, Can Can Xue, Rachel Shujuan Chong, Quan V. Hoang, Lavanya Raghavan, Yih Chung Tham, Charumathi Sabanayagam, Wei-Chi Wu, Ming-Chih Ho, Jiangnan He, Preeti Gupta, Ecosse Lamoureux, Seang Mei Saw, Vinay Nangia, Songhomitra Panda-Jonas, Jie Xu, Ya Xing Wang, et al. (6 additional authors not shown)

    Abstract: Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low Rank Adaptati…

    Submitted 13 May, 2025; originally announced May 2025.

  6. arXiv:2505.07835  [pdf, other]

    cs.NI cs.AI cs.MA

    Intelligent Product 3.0: Decentralised AI Agents and Web3 Intelligence Standards

    Authors: Alex C. Y. Wong, Duncan McFarlane, C. Ellarby, M. Lee, M. Kuok

    Abstract: Twenty-five years ago, the specification of the Intelligent Product was established, envisaging real-time connectivity that not only enables products to gather accurate data about themselves but also allows them to assess and influence their own destiny. Early work by the Auto-ID project focused on creating a single, open-standard repository for storing and retrieving product information, laying a…

    Submitted 14 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

    Comments: 18 pages, 1 Figure, 3 Tables; Corrected typo in Section 3.4 heading

    ACM Class: I.2.11; C.3; C.2.4

  7. arXiv:2504.13462  [pdf, other]

    cs.LG

    Stratify: Rethinking Federated Learning for Non-IID Data through Balanced Sampling

    Authors: Hui Yeok Wong, Chee Kau Lim, Chee Seng Chan

    Abstract: Federated Learning (FL) on non-independently and identically distributed (non-IID) data remains a critical challenge, as existing approaches struggle with severe data heterogeneity. Current methods primarily address symptoms of non-IID by applying incremental adjustments to Federated Averaging (FedAvg), rather than directly resolving its inherent design limitations. Consequently, performance signi…

    Submitted 18 April, 2025; originally announced April 2025.

  8. arXiv:2504.05081  [pdf, other]

    cs.CL

    The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning

    Authors: Tianshi Zheng, Yixiang Chen, Chengxi Li, Chunyang Li, Qing Zong, Haochen Shi, Baixuan Xu, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Chain-of-Thought (CoT) prompting has been widely recognized for its ability to enhance reasoning capabilities in large language models (LLMs) through the generation of explicit explanatory rationales. However, our study reveals a surprising contradiction to this prevailing perspective. Through extensive experiments involving 16 state-of-the-art LLMs and nine diverse pattern-based in-context learni…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: 30 pages, 12 tables, 6 figures

  9. arXiv:2504.02283  [pdf]

    cs.LG

    Ga$_2$O$_3$ TCAD Mobility Parameter Calibration using Simulation Augmented Machine Learning with Physics Informed Neural Network

    Authors: Le Minh Long Nguyen, Edric Ong, Matthew Eng, Yuhao Zhang, Hiu Yung Wong

    Abstract: In this paper, we demonstrate the possibility of performing automatic Technology Computer-Aided-Design (TCAD) parameter calibration using machine learning, verified with experimental data. The machine only needs to be trained by TCAD data. Schottky Barrier Diode (SBD) fabricated with emerging ultra-wide-bandgap material, Gallium Oxide (Ga$_2$O$_3$), is measured and its current-voltage (IV) is used…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 4 pages, 3 figures

  10. arXiv:2503.18436  [pdf, other]

    cs.LG

    Distributionally Robust Federated Learning: An ADMM Algorithm

    Authors: Wen Bai, Yi Wong, Xiao Qiao, Chin Pang Ho

    Abstract: Federated learning (FL) aims to train machine learning (ML) models collaboratively using decentralized data, bypassing the need for centralized data aggregation. Standard FL models often assume that all data come from the same unknown distribution. However, in practical situations, decentralized data frequently exhibit heterogeneity. We propose a novel FL model, Distributionally Robust Federated L…

    Submitted 24 March, 2025; originally announced March 2025.

  11. arXiv:2503.12440  [pdf, ps, other]

    cs.CL

    HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs

    Authors: Tsz Chung Cheng, Chung Shing Cheng, Chaak Ming Lau, Eugene Tin-Ho Lam, Chun Yat Wong, Hoi On Yu, Cheuk Hei Chong

    Abstract: The ability of language models to comprehend and interact in diverse linguistic and cultural landscapes is crucial. The Cantonese language used in Hong Kong presents unique challenges for natural language processing due to its rich cultural nuances and lack of dedicated evaluation datasets. The HKCanto-Eval benchmark addresses this gap by evaluating the performance of large language models (LLMs)…

    Submitted 6 July, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

  12. arXiv:2503.11861  [pdf, other]

    cs.LG cs.HC cs.IT

    Banking on Feedback: Text Analysis of Mobile Banking iOS and Google App Reviews

    Authors: Yekta Amirkhalili, Ho Yi Wong

    Abstract: The rapid growth of mobile banking (m-banking), especially after the COVID-19 pandemic, has reshaped the financial sector. This study analyzes consumer reviews of m-banking apps from five major Canadian banks, collected from Google Play and iOS App stores. Sentiment analysis and topic modeling classify reviews as positive, neutral, or negative, highlighting user preferences and areas for improveme…

    Submitted 14 March, 2025; originally announced March 2025.

  13. arXiv:2502.11176  [pdf, other]

    cs.CL

    LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning

    Authors: Tianshi Zheng, Jiayang Cheng, Chunyang Li, Haochen Shi, Zihao Wang, Jiaxin Bai, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Modern large language models (LLMs) employ various forms of logical inference, both implicitly and explicitly, when addressing reasoning tasks. Understanding how to optimally leverage these inference paradigms is critical for advancing LLMs' reasoning capabilities. This paper adopts an exploratory approach by introducing a controlled evaluation environment for analogical reasoning -- a fundamental…

    Submitted 9 April, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: 21 pages

  14. arXiv:2501.08710  [pdf, other]

    cs.LG stat.ML

    Disentangled Interleaving Variational Encoding

    Authors: Noelle Y. L. Wong, Eng Yeow Cheu, Zhonglin Chiam, Dipti Srinivasan

    Abstract: Conflicting objectives present a considerable challenge in interleaving multi-task learning, necessitating the need for meticulous design and balance to ensure effective learning of a representative latent data space across all tasks without mutual negative impact. Drawing inspiration from the concept of marginal and conditional probability distributions in probability theory, we design a principl…

    Submitted 16 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

  15. arXiv:2412.20418  [pdf, other]

    eess.IV cs.CV

    Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment

    Authors: Shiyun Chen, Li Lin, Pujin Cheng, ZhiCheng Jin, JianJian Chen, HaiDong Zhu, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Multimodal learning has been demonstrated to enhance performance across various clinical tasks, owing to the diverse perspectives offered by different modalities of data. However, existing multimodal segmentation methods rely on well-registered multimodal data, which is unrealistic for real-world clinical images, particularly for indistinct and diffuse regions such as liver tumors. In this paper,…

    Submitted 29 December, 2024; originally announced December 2024.

  16. arXiv:2412.15614  [pdf, other]

    cs.CR cs.CV

    Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM

    Authors: Yangyang Guo, Ziwei Xu, Xilie Xu, YongKang Wong, Liqiang Nie, Mohan Kankanhalli

    Abstract: This technical report introduces our top-ranked solution that employs two approaches, i.e., suffix injection and projected gradient descent (PGD), to address the TiFA workshop MLLM attack challenge. Specifically, we first append the text from an incorrectly labeled option (pseudo-labeled) to the original query as a suffix. Using this modified query, our second approach applies the PGD method to add…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: ICML TiFA Challenge Technical Report

  17. arXiv:2412.14757  [pdf, other]

    quant-ph cs.NI

    Space-time Peer-to-Peer Distribution of Multi-party Entanglement for Any Quantum Network

    Authors: Yuexun Huang, Xiangyu Ren, Bikun Li, Yat Wong, Zhiding Liang, Liang Jiang

    Abstract: Graph states are a class of important multiparty entangled states, of which Bell pairs are the special case. Realizing a robust and fast distribution of arbitrary graph states in the downstream layer of the quantum network can be essential for further large-scale quantum networks. We propose a novel quantum network protocol called P2PGSD inspired by the classical Peer-to-Peer (P2P) network to effi…

    Submitted 5 April, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

    Journal ref: Proc. ACM Meas. Anal. Comput. Syst. 9, 2, Article 31 (June 2025)

  18. arXiv:2412.10726  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries

    Authors: Tao Wu, Chuhao Zhou, Yen Heng Wong, Lin Gu, Jianfei Yang

    Abstract: The rapid advancement of Vision-Language Models (VLMs) has significantly advanced the development of Embodied Question Answering (EQA), enhancing agents' abilities in language understanding and reasoning within complex and realistic scenarios. However, EQA in real-world scenarios remains challenging, as human-posed questions often contain noise that can interfere with an agent's exploration and re…

    Submitted 14 December, 2024; originally announced December 2024.

  19. Artificial Intelligence without Restriction Surpassing Human Intelligence with Probability One: Theoretical Insight into Secrets of the Brain with AI Twins of the Brain

    Authors: Guang-Bin Huang, M. Brandon Westover, Eng-King Tan, Haibo Wang, Dongshun Cui, Wei-Ying Ma, Tiantong Wang, Qi He, Haikun Wei, Ning Wang, Qiyuan Tian, Kwok-Yan Lam, Xin Yao, Tien Yin Wong

    Abstract: Artificial Intelligence (AI) has apparently become one of the most important techniques discovered by humans in history while the human brain is widely recognized as one of the most complex systems in the universe. One fundamental critical question which would affect human sustainability remains open: Will artificial intelligence (AI) evolve to surpass human intelligence in the future? This paper…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: Accepted by journal Neurocomputing

  20. arXiv:2412.04629  [pdf, ps, other]

    cs.HC cs.CY cs.IR

    Argumentative Experience: Reducing Confirmation Bias on Controversial Issues through LLM-Generated Multi-Persona Debates

    Authors: Li Shi, Houjiang Liu, Yian Wong, Utkarsh Mujumdar, Dan Zhang, Jacek Gwizdka, Matthew Lease

    Abstract: Large language models (LLMs) are enabling designers to give life to exciting new user experiences for information access. In this work, we present a system that generates LLM personas to debate a topic of interest from different perspectives. How might information seekers use and benefit from such a system? Can centering information access around diverse viewpoints help to mitigate thorny challeng…

    Submitted 29 May, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

  21. arXiv:2410.20635  [pdf, other]

    cs.RO

    Generating and Optimizing Topologically Distinct Guesses for Mobile Manipulator Path Planning

    Authors: Rufus Cheuk Yin Wong, Mayank Sewlia, Adrian Wiltz, Dimos V. Dimarogonas

    Abstract: Optimal path planning often suffers from getting stuck in a local optimum. This is often the case for mobile manipulators due to nonconvexities induced by obstacles and robot kinematics. This paper attempts to circumvent this issue by proposing a pipeline to obtain multiple distinct local optima. By evaluating and selecting the optimum among multiple distinct local optima, it is likely to obtain a…

    Submitted 27 October, 2024; originally announced October 2024.

  22. arXiv:2410.20436  [pdf, other]

    cs.CV

    CoralSCOP-LAT: Labeling and Analyzing Tool for Coral Reef Images with Dense Mask

    Authors: Yuk-Kwan Wong, Ziqiang Zheng, Mingzhe Zhang, David Suggett, Sai-Kit Yeung

    Abstract: Images of coral reefs provide invaluable information, which is essentially critical for surveying and monitoring the coral reef ecosystems. Robust and precise identification of coral reef regions within surveying imagery is paramount for assessing coral coverage, spatial distribution, and other statistical analyses. However, existing coral reef analytical approaches mainly focus on sparse points s…

    Submitted 27 October, 2024; originally announced October 2024.

    Comments: The coral reef labeling and analysis tool is available at https://coralscop.hkustvgd.com/

  23. arXiv:2410.04239  [pdf, other]

    cs.CL

    Persona Knowledge-Aligned Prompt Tuning Method for Online Debate

    Authors: Chunkit Chan, Cheng Jiayang, Xin Liu, Yauwai Yim, Yuxin Jiang, Zheye Deng, Haoran Li, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Debate is the process of exchanging viewpoints or convincing others on a particular issue. Recent research has provided empirical evidence that the persuasiveness of an argument is determined not only by language usage but also by communicator characteristics. Researchers have paid much attention to aspects of languages, such as linguistic features and discourse structures, but combining argument…

    Submitted 5 October, 2024; originally announced October 2024.

    Comments: Accepted to ECAI 2024

  24. The Future of HCI-Policy Collaboration

    Authors: Qian Yang, Richmond Y Wong, Steven J Jackson, Sabine Junginger, Margaret D Hagan, Thomas Gilbert, John Zimmerman

    Abstract: Policies significantly shape computation's societal impact, a crucial HCI concern. However, challenges persist when HCI professionals attempt to integrate policy into their work or affect policy outcomes. Prior research considered these challenges at the "border" of HCI and policy. This paper asks: What if HCI considers policy integral to its intellectual concerns, placing system-people-policy i…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24)

  25. arXiv:2408.15287  [pdf, other]

    quant-ph cs.LG

    Quantum-Powered Personalized Learning

    Authors: Yifan Zhou, Chong Cheng Xu, Mingi Song, Yew Kee Wong

    Abstract: This paper explores the transformative potential of quantum computing in the realm of personalized learning. Traditional machine learning models and GPU-based approaches have long been utilized to tailor educational experiences to individual student needs. However, these methods face significant challenges in terms of scalability, computational efficiency, and real-time adaptation to the dynamic n…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 9 pages, 2 figures

  26. arXiv:2408.07921  [pdf]

    cs.LG

    Physics-Informed Neural Network for Predicting Out-of-Training-Range TCAD Solution with Minimized Domain Expertise

    Authors: Albert Lu, Yu Foon Chau, Hiu Yung Wong

    Abstract: Machine learning (ML) is promising in assisting technology computer-aided design (TCAD) simulations to alleviate difficulty in convergence and prolonged simulation time. While ML is widely used in TCAD, they either require access to the internal solver, require extensive domain expertise, are only trained by terminal quantities such as currents and voltages, and/or lack out-of-training-range predi…

    Submitted 15 August, 2024; originally announced August 2024.

  27. arXiv:2408.05564  [pdf, other]

    cs.NE cs.CE

    Meta-heuristic Optimizer Inspired by the Philosophy of Yi Jing

    Authors: Yisheng Yang, Sim Kuan Goh, Qing Cai, Shen Yuong Wong, Ho-Kin Tang

    Abstract: Drawing inspiration from the philosophy of Yi Jing, the Yin-Yang pair optimization (YYPO) algorithm has been shown to achieve competitive performance in single objective optimizations, in addition to the advantage of low time complexity when compared to other population-based meta-heuristics. Building upon a reversal concept in Yi Jing, we propose the novel Yi optimization (YI) algorithm. Specific…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. arXiv admin note: substantial text overlap with arXiv:2104.08564

  28. arXiv:2407.07666  [pdf]

    cs.CL cs.AI

    A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models : Safety, Consensus, Objectivity, Reproducibility and Explainability

    Authors: Ting Fang Tan, Kabilan Elangovan, Jasmine Ong, Nigam Shah, Joseph Sung, Tien Yin Wong, Lan Xue, Nan Liu, Haibo Wang, Chang Fu Kuo, Simon Chesterman, Zee Kin Yeong, Daniel SW Ting

    Abstract: A comprehensive qualitative evaluation framework for large language models (LLM) in healthcare that expands beyond traditional accuracy and quantitative metrics is needed. We propose 5 key aspects for evaluation of LLMs: Safety, Consensus, Objectivity, Reproducibility and Explainability (S.C.O.R.E.). We suggest that S.C.O.R.E. may form the basis for an evaluation framework for future LLM-based models…

    Submitted 10 July, 2024; originally announced July 2024.

  29. arXiv:2406.15527  [pdf, other]

    cs.LG cs.CL

    Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling

    Authors: Cong Xu, Gayathri Saranathan, Mahammad Parwez Alam, Arpit Shah, James Lim, Soon Yee Wong, Foltin Martin, Suparna Bhattacharya

    Abstract: Evaluating LLMs and text-to-image models is a computationally intensive task often overlooked. Efficient evaluation is crucial for understanding the diverse capabilities of these models and enabling comparisons across a growing number of new models and benchmarks. To address this, we introduce SubLIME, a data-efficient evaluation framework that employs adaptive sampling techniques, such as cluster…

    Submitted 21 June, 2024; originally announced June 2024.

  30. arXiv:2406.04629  [pdf, other]

    cs.CV cs.GR cs.MM

    STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting

    Authors: Zenghao Chai, Chen Tang, Yongkang Wong, Mohan Kankanhalli

    Abstract: The creation of 4D avatars (i.e., animated 3D avatars) from text description typically uses text-to-image (T2I) diffusion models to synthesize 3D avatars in the canonical space and subsequently applies animation with target motions. However, such an optimization-by-animation paradigm has several drawbacks. (1) For pose-agnostic optimization, the rendered images in canonical pose for naive Score Di…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Tech report

  31. Ethics Pathways: A Design Activity for Reflecting on Ethics Engagement in HCI Research

    Authors: Inha Cha, Ajit G. Pillai, Richmond Y. Wong

    Abstract: This paper introduces Ethics Pathways, a design activity aimed at understanding HCI and design researchers' ethics engagements and flows during their research process. Despite a strong ethical commitment in these fields, challenges persist in grasping the complexity of researchers' engagement with ethics -- practices conducted to operationalize ethics -- in situated institutional contexts. Ethics…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted at ACM Designing Interactive Systems (DIS) 2024

  32. arXiv:2405.13911  [pdf, other]

    cs.CV cs.AI cs.CL

    TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment

    Authors: Wei Li, Hehe Fan, Yongkang Wong, Mohan Kankanhalli, Yi Yang

    Abstract: Recent advancements in image understanding have benefited from the extensive use of web image-text pairs. However, video understanding remains a challenge despite the availability of substantial web video-text data. This difficulty primarily arises from the inherent complexity of videos and the inefficient language supervision in recent web-collected video-text datasets. In this paper, we introduc…

    Submitted 3 November, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024 (Spotlight)

  33. arXiv:2405.12538  [pdf, other]

    cs.CV

    Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

    Authors: Yi Cheng, Ziwei Xu, Dongyun Lin, Harry Cheng, Yongkang Wong, Ying Sun, Joo Hwee Lim, Mohan Kankanhalli

    Abstract: For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leadi…

    Submitted 21 May, 2024; originally announced May 2024.

  34. Broadening Privacy and Surveillance: Eliciting Interconnected Values with a Scenarios Workbook on Smart Home Cameras

    Authors: Richmond Y. Wong, Jason Caleb Valdez, Ashten Alexander, Ariel Chiang, Olivia Quesada, James Pierce

    Abstract: We use a design workbook of speculative scenarios as a values elicitation activity with 14 participants. The workbook depicts use case scenarios with smart home camera technologies that involve surveillance and uneven power relations. The scenarios were initially designed by the researchers to explore scenarios of privacy and surveillance within three social relationships involving "primary" and "…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Proceedings of the 2023 ACM Designing Interactive Systems Conference (DIS '23)

  35. arXiv:2403.16304  [pdf]

    cs.CR

    SoK: An Essential Guide For Using Malware Sandboxes In Security Applications: Challenges, Pitfalls, and Lessons Learned

    Authors: Omar Alrawi, Miuyin Yong Wong, Athanasios Avgetidis, Kevin Valakuzhy, Boladji Vinny Adjibi, Konstantinos Karakatsanis, Mustaque Ahamad, Doug Blough, Fabian Monrose, Manos Antonakakis

    Abstract: Malware sandboxes provide many benefits for security applications, but they are complex. These complexities can overwhelm new users in different research areas and make it difficult to select, configure, and use sandboxes. Even worse, incorrectly using sandboxes can have a negative impact on security applications. In this paper, we address this knowledge gap by systematizing 84 representative pape…

    Submitted 24 March, 2024; originally announced March 2024.

  36. arXiv:2403.12381  [pdf, other]

    cs.CE

    Explainable AutoML (xAutoML) with adaptive modeling for yield enhancement in semiconductor smart manufacturing

    Authors: Weihong Zhai, Xiupeng Shi, Yiik Diew Wong, Qing Han, Lisheng Chen

    Abstract: Enhancing yield is recognized as a paramount driver to reducing production costs in semiconductor smart manufacturing. However, optimizing and ensuring high yield rates is a highly complex and technical challenge, especially while maintaining reliable yield diagnosis and prognosis, and this shall require understanding all the confounding factors in a complex condition. This study proposes a domain…

    Submitted 18 March, 2024; originally announced March 2024.

  37. arXiv:2402.17502  [pdf, other]

    cs.CV eess.IV

    FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-supervised Medical Image Segmentation

    Authors: Li Lin, Yixiang Liu, Jiewei Wu, Pujin Cheng, Zhiyuan Cai, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Federated learning (FL) effectively mitigates the data silo challenge brought about by policies and privacy concerns, implicitly harnessing more data for deep model training. However, traditional centralized FL models grapple with diverse multi-center data, especially in the face of significant data heterogeneity, notably in medical contexts. In the realm of medical image segmentation, the growing…

    Submitted 31 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 10 figures

  38. Stuck-at Faults in ReRAM Neuromorphic Circuit Array and their Correction through Machine Learning

    Authors: Vedant Sawal, Hiu Yung Wong

    Abstract: In this paper, we study the inference accuracy of the Resistive Random Access Memory (ReRAM) neuromorphic circuit due to stuck-at faults (stuck-on, stuck-off, and stuck at a certain resistive value). A simulation framework using Python is used to perform supervised machine learning (neural network with 3 hidden layers, 1 input layer, and 1 output layer) of handwritten digits and construct a corres…

    Submitted 15 February, 2024; originally announced February 2024.

  39. arXiv:2402.10646  [pdf, other]

    cs.CL

    AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation

    Authors: Zhaowei Wang, Wei Fan, Qing Zong, Hongming Zhang, Sehyun Choi, Tianqing Fang, Xin Liu, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Abstraction ability is crucial in human intelligence, which can also benefit various tasks in NLP study. Existing work shows that LLMs are deficient in abstract ability, and how to improve it remains unexplored. In this work, we design the framework AbsInstruct to enhance LLMs' abstraction ability through instruction tuning. The framework builds instructions with in-depth explanations to assist LL…

    Submitted 17 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024

  40. arXiv:2402.09108  [pdf, other]

    quant-ph cs.CR

    Novel Long Distance Free Space Quantum Secure Direct Communication for Web 3.0 Networks

    Authors: Yifan Zhou, Xinlin Zhou, Zi Yan Li, Yew Kee Wong, Yan Shing Liang

    Abstract: With the advent of Web 3.0, the swift advancement of technology confronts an imminent threat from quantum computing. Security protocols safeguarding the integrity of Web 2.0 and Web 3.0 are growing more susceptible to both quantum attacks and sophisticated classical threats. The article introduces our novel long-distance free-space quantum secure direct communication (LF QSDC) as a method to safeg…

    Submitted 29 August, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 17 pages, 6 figures

  41. arXiv:2401.14074  [pdf, other]

    cs.CV cs.LG

    ProCNS: Progressive Prototype Calibration and Noise Suppression for Weakly-Supervised Medical Image Segmentation

    Authors: Y. Liu, L. Lin, K. K. Y. Wong, X. Tang

    Abstract: Weakly-supervised segmentation (WSS) has emerged as a solution to mitigate the conflict between annotation cost and model performance by adopting sparse annotation formats (e.g., point, scribble, block, etc.). Typical approaches attempt to exploit anatomy and topology priors to directly expand sparse annotations into pseudo-labels. However, due to a lack of attention to the ambiguous edges in medi…

    Submitted 23 December, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  42. arXiv:2312.03231  [pdf, other]

    cs.LG cs.AI cs.CV cs.HC eess.AS

    Deep Multimodal Fusion for Surgical Feedback Classification

    Authors: Rafal Kocielnik, Elyssa Y. Wong, Timothy N. Chu, Lydia Lin, De-An Huang, Jiayun Wang, Anima Anandkumar, Andrew J. Hung

    Abstract: Quantification of real-time informal feedback delivered by an experienced surgeon to a trainee during surgery is important for skill improvements in surgical training. Such feedback in the live operating room is inherently multimodal, consisting of verbal conversations (e.g., questions and answers) as well as non-verbal elements (e.g., through visual cues like pointing to anatomic elements). In th…

    Submitted 5 December, 2023; originally announced December 2023.

    Journal ref: Published in Proceedings of Machine Learning for Health 2024

  43. arXiv:2311.07604  [pdf, other]

    cs.LG cs.AI cs.CV cs.CY

    Finetuning Text-to-Image Diffusion Models for Fairness

    Authors: Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, Mohan Kankanhalli

    Abstract: The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a skewed worldview and restrict opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss…

    Submitted 15 March, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

    Comments: ICLR 2024 oral presentation

  44. arXiv:2311.03032  [pdf, other]

    cs.RO

    Reconfigurable, Transformable Soft Pneumatic Actuator with Tunable 3D Deformations for Dexterous Soft Robotics Applications

    Authors: Dickson Chiu Yu Wong, Mingtan Li, Shijie Kang, Lifan Luo, Hongyu Yu

    Abstract: Numerous soft actuators based on PneuNet design have already been proposed and extensively employed across various soft robotics applications in recent years. Despite their widespread use, a common limitation of most existing designs is that their action is pre-determined during the fabrication process, thereby restricting the ability to modify or alter their function during operation. To address…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Submitted to Soft Robotics Journal. 12 pages, 10 figures

  45. arXiv:2310.16684  [pdf, other]

    cs.CV

    Local Statistics for Generative Image Detection

    Authors: Yung Jer Wong, Teck Khim Ng

    Abstract: Diffusion models (DMs) are generative models that learn to synthesize images from Gaussian noise. DMs can be trained to do a variety of tasks such as image generation and image super-resolution. Researchers have made significant improvements in the capability of synthesizing photorealistic images in the past few years. These successes also hasten the need to address the potential misuse of synthes…

    Submitted 3 March, 2025; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 4 pages

  46. arXiv:2310.05210  [pdf, other]

    cs.AI cs.CL

    TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining

    Authors: Qing Zong, Zhaowei Wang, Baixuan Xu, Tianshi Zheng, Haochen Shi, Weiqi Wang, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: A main goal of Argument Mining (AM) is to analyze an author's stance. Unlike previous AM datasets focusing only on text, the shared task at the 10th Workshop on Argument Mining introduces a dataset including both text and images. Importantly, these images contain both visual elements and optical characters. Our new framework, TILFA (A Unified Framework for Text, Image, and Layout Fusion in Argumen…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to the 10th Workshop on Argument Mining, co-located with EMNLP 2023

  47. VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

    Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv , et al. (17 additional authors not shown)

    Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassifi…

    Submitted 7 October, 2023; originally announced October 2023.

    Journal ref: The latest VisionFM work has been published in NEJM AI, 2024

  48. arXiv:2309.16738  [pdf, other]

    cs.CV

    ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens

    Authors: Yangyang Guo, Haoyu Zhang, Yongkang Wong, Liqiang Nie, Mohan Kankanhalli

    Abstract: Learning a versatile language-image model is computationally prohibitive under a limited computing budget. This paper delves into the efficient language-image pre-training, an area that has received relatively little attention despite its importance in reducing computational cost and footprint. To that end, we propose a vision token pruning and merging method ELIP, to remove less influentia…

    Submitted 17 November, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

  49. arXiv:2309.03031  [pdf, other]

    cs.CV

    MCM: Multi-condition Motion Synthesis Framework for Multi-scenario

    Authors: Zeyu Ling, Bo Han, Yongkang Wong, Mohan Kangkanhalli, Weidong Geng

    Abstract: The objective of the multi-condition human motion synthesis task is to incorporate diverse conditional inputs, encompassing various forms like text, music, speech, and more. This endows the task with the capability to adapt across multiple scenarios, ranging from text-to-motion and music-to-dance, among others. While existing research has primarily focused on single conditions, the multi-condition…

    Submitted 6 September, 2023; originally announced September 2023.

  50. arXiv:2308.08561  [pdf]

    q-bio.BM cs.AI cs.LG

    Implementation of The Future of Drug Discovery: QuantumBased Machine Learning Simulation (QMLS)

    Authors: Yifan Zhou, Yan Shing Liang, Yew Kee Wong, Haichuan Qiu, Yu Xi Wu, Bin He

    Abstract: The Research & Development (R&D) phase of drug development is a lengthy and costly process. To revolutionize this process, we introduce our new concept QMLS to shorten the whole R&D phase to three to six months and decrease the cost to merely fifty to eighty thousand USD. For Hit Generation, Machine Learning Molecule Generation (MLMG) generates possible hits according to the molecular structure of…

    Submitted 5 September, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: 13 pages, 6 figures

    Journal ref: International Journal of Computer Science and Mobile Applications, Vol 11 Issue 5, May 2023