Search | arXiv e-print repository

One-Stage Top-$k$ Learning-to-Defer: Score-Based Surrogates with Theoretical Guarantees

Authors: Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Abstract: We introduce the first one-stage Top-$k$ Learning-to-Defer framework, which unifies prediction and deferral by learning a shared score-based model that selects the $k$ most cost-effective entities (labels or experts) per input. While existing one-stage L2D methods are limited to deferring to a single expert, our approach jointly optimizes prediction and deferral across multiple entities through a single end-to-end objective. We define a cost-sensitive loss and derive a novel convex surrogate that is independent of the cardinality parameter $k$, enabling generalization across Top-$k$ regimes without retraining. Our formulation recovers the Top-1 deferral policy of prior score-based methods as a special case, and we prove that our surrogate is both Bayes-consistent and $\mathcal{H}$-consistent under mild assumptions. We further introduce an adaptive variant, Top-$k(x)$, which dynamically selects the number of consulted entities per input to balance predictive accuracy and consultation cost. Experiments on CIFAR-10 and SVHN confirm that our one-stage Top-$k$ method strictly outperforms Top-1 deferral, while Top-$k(x)$ achieves superior accuracy-cost trade-offs by tailoring allocations to input complexity.

Submitted 15 May, 2025; originally announced May 2025.
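
The selection step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `top_k_entities`, the entity names, and the score values are all hypothetical, and the scores stand in for the output of some already-trained shared scoring model. The only point shown is that a single score vector over labels and experts can be reused for any $k$ at selection time.

```python
def top_k_entities(scores, k):
    """Return the indices of the k highest-scoring entities, best first.

    `scores` is a hypothetical list with one score per entity (labels
    followed by experts), as a shared score-based model might produce.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of entities")
    # Sort entity indices by descending score and keep the first k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical entities: three labels followed by two experts.
entities = ["cat", "dog", "bird", "expert_A", "expert_B"]
# Made-up scores for a single input x (illustration only).
scores = [0.2, 1.5, -0.3, 2.1, 0.9]

top1 = top_k_entities(scores, 1)  # the Top-1 special case
top3 = top_k_entities(scores, 3)  # same scores, larger k, no recomputation
print([entities[i] for i in top1])
print([entities[i] for i in top3])
```

Because the ranking depends only on the scores, changing $k$ changes how many entities are consulted but not the scores themselves. An adaptive Top-$k(x)$ rule would additionally choose $k$ per input, which this sketch does not attempt.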

arXiv:2504.16533 [pdf, other]

SafeSpect: Safety-First Augmented Reality Heads-up Display for Drone Inspections

Authors: Peisen Xu, Jérémie Garcia, Wei Tsang Ooi, Christophe Jouffrais

Abstract: Current tablet-based interfaces for drone operations often impose a heavy cognitive load on pilots and reduce situational awareness by dividing attention between the video feed and the real world. To address these challenges, we designed a heads-up augmented reality (AR) interface that overlays in-situ information to support drone pilots in safety-critical tasks. Through participatory design workshops with professional pilots, we identified key features and developed an adaptive AR interface that dynamically switches between task and safety views to prevent information overload. We evaluated our prototype by creating a realistic building inspection task and comparing three interfaces: a 2D tablet, a static AR, and our adaptive AR design. A user study with 15 participants showed that the AR interface improved access to safety information, while the adaptive AR interface reduced cognitive load and enhanced situational awareness without compromising task performance. We offer design insights for developing safety-first heads-up AR interfaces.

Submitted 23 April, 2025; originally announced April 2025.

arXiv:2504.12988 [pdf, other]

Why Ask One When You Can Ask $k$? Two-Stage Learning-to-Defer to the Top-$k$ Experts

Authors: Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Abstract: Although existing Learning-to-Defer (L2D) frameworks support multiple experts, they allocate each query to a single expert, limiting their ability to leverage collective expertise in complex decision-making scenarios. To address this, we introduce the first framework for Top-$k$ Learning-to-Defer, enabling systems to defer each query to the $k$ most cost-effective experts. Our formulation strictly generalizes classical two-stage L2D by supporting multi-expert deferral, a capability absent in prior work. We further propose Top-$k(x)$ Learning-to-Defer, an adaptive extension that learns the optimal number of experts per query based on input complexity, expert quality, and consultation cost. We introduce a novel surrogate loss that is Bayes-consistent, $(\mathcal{R}, \mathcal{G})$-consistent, and independent of the cardinality parameter $k$, enabling efficient reuse across different values of $k$. We show that classical model cascades arise as a special case of our method, situating our framework as a strict generalization of both selective deferral and cascaded inference. Experiments on classification and regression demonstrate that Top-$k$ and Top-$k(x)$ yield improved accuracy-cost trade-offs, establishing a new direction for multi-expert deferral in Learning-to-Defer.

Submitted 15 May, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

arXiv:2503.19916 [pdf, other]

EventFly: Event Camera Perception from Ground to the Sky

Authors: Lingdong Kong, Dongyue Lu, Xiang Xu, Lai Xing Ng, Wei Tsang Ooi, Benoit R. Cottereau

Abstract: Cross-platform adaptation in event-based dense perception is crucial for deploying event cameras across diverse settings, such as vehicles, drones, and quadrupeds, each with unique motion dynamics, viewpoints, and class distributions. In this work, we introduce EventFly, a framework for robust cross-platform adaptation in event camera perception. Our approach comprises three key components: i) Event Activation Prior (EAP), which identifies high-activation regions in the target domain to minimize prediction entropy, fostering confident, domain-adaptive predictions; ii) EventBlend, a data-mixing strategy that integrates source and target event voxel grids based on EAP-driven similarity and density maps, enhancing feature alignment; and iii) EventMatch, a dual-discriminator technique that aligns features from source, target, and blended domains for better domain-invariant learning. To holistically assess cross-platform adaptation abilities, we introduce EXPo, a large-scale benchmark with diverse samples across vehicle, drone, and quadruped platforms. Extensive experiments validate our effectiveness, demonstrating substantial gains over popular adaptation methods. We hope this work can pave the way for more adaptive, high-performing event perception across diverse and complex environments.

Submitted 25 March, 2025; originally announced March 2025.

Comments: CVPR 2025; 30 pages, 8 figures, 16 tables; Project Page at https://event-fly.github.io/

arXiv:2502.01234 [pdf, ps, other]

Homeomorphism of the Revuz correspondence for finite energy integrals

Authors: Takumu Ooi

Abstract: We provide necessary and sufficient conditions for the convergence of Revuz measures of finite energy integrals. More precisely, the Revuz map from the set of all smooth measures of finite energy integrals, equipped with the topology induced by the norm given by the sum of the Dirichlet form and the $L^2(m)$-norm, to the space of positive continuous additive functionals, equipped with the topology induced by the $L^2(\mathbb{P}_{m+κ})$-norm with the local uniform topology, is a homeomorphism, where $m$ is the underlying measure and $κ$ is the killing measure of a Dirichlet form.

Submitted 15 March, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

Comments: 23 pages

MSC Class: 31C25; 60J55; 28A33

arXiv:2502.01027 [pdf, other]

Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees

Authors: Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Abstract: Two-stage Learning-to-Defer (L2D) enables optimal task delegation by assigning each input to either a fixed main model or one of several offline experts, supporting reliable decision-making in complex, multi-agent environments. However, existing L2D frameworks assume clean inputs and are vulnerable to adversarial perturbations that can manipulate query allocation, causing costly misrouting or expert overload. We present the first comprehensive study of adversarial robustness in two-stage L2D systems. We introduce two novel attack strategies, untargeted and targeted, which respectively disrupt optimal allocations or force queries to specific agents. To defend against such threats, we propose SARD, a convex learning algorithm built on a family of surrogate losses that are provably Bayes-consistent and $(\mathcal{R}, \mathcal{G})$-consistent. These guarantees hold across classification, regression, and multi-task settings. Empirical results demonstrate that SARD significantly improves robustness under adversarial attacks while maintaining strong clean performance, marking a critical step toward secure and trustworthy L2D deployment.

Submitted 23 May, 2025; v1 submitted 2 February, 2025; originally announced February 2025.

arXiv:2501.13045 [pdf, other]

Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes

Authors: Yuang Shi, Simone Gasparini, Géraldine Morin, Chenggang Yang, Wei Tsang Ooi

Abstract: 3D Gaussian Splatting (3DGS) has emerged as a promising representation for photorealistic rendering of 3D scenes. However, its high storage requirements pose significant challenges for practical applications. We observe that Gaussians exhibit distinct roles and characteristics that are analogous to traditional artistic techniques: just as artists first sketch outlines before filling in broader areas with color, some Gaussians capture high-frequency features like edges and contours, while other Gaussians represent broader, smoother regions, analogous to the broad brush strokes that add volume and depth to a painting. Based on this observation, we propose a novel hybrid representation that categorizes Gaussians into (i) Sketch Gaussians, which define scene boundaries, and (ii) Patch Gaussians, which cover smooth regions. Sketch Gaussians are efficiently encoded using parametric models, leveraging their geometric coherence, while Patch Gaussians undergo optimized pruning, retraining, and vector quantization to maintain volumetric consistency and storage efficiency. Our comprehensive evaluation across diverse indoor and outdoor scenes demonstrates that this structure-aware approach achieves up to 32.62% improvement in PSNR, 19.12% in SSIM, and 45.41% in LPIPS at equivalent model sizes, and correspondingly, for an indoor scene, our model maintains the visual quality with 2.3% of the original model size.

Submitted 22 January, 2025; originally announced January 2025.

arXiv:2501.12060 [pdf, other]

GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting

Authors: Longan Wang, Yuang Shi, Wei Tsang Ooi

Abstract: 3D Gaussian splats have emerged as a revolutionary, effective, learned representation for static 3D scenes. In this work, we explore using 2D Gaussian splats as a new primitive for representing videos. We propose GSVC, an approach to learning a set of 2D Gaussian splats that can effectively represent and compress video frames. GSVC incorporates the following techniques: (i) To exploit temporal redundancy among adjacent frames, which can speed up training and improve the compression efficiency, we predict the Gaussian splats of a frame based on its previous frame; (ii) To control the trade-offs between file size and quality, we remove Gaussian splats with low contribution to the video quality; (iii) To capture dynamics in videos, we randomly add Gaussian splats to fit content with large motion or newly-appeared objects; (iv) To handle significant changes in the scene, we detect key frames based on loss differences during the learning process. Experimental results show that GSVC achieves good rate-distortion trade-offs, comparable to state-of-the-art video codecs such as AV1 and VVC, and a rendering speed of 1500 fps for a 1920x1080 video.

Submitted 22 January, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

arXiv:2412.06708 [pdf, ps, other]

FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies

Authors: Dongyue Lu, Lingdong Kong, Gim Hee Lee, Camille Simon Chane, Wei Tsang Ooi

Abstract: Event cameras offer unparalleled advantages for real-time perception in dynamic environments, thanks to the microsecond-level temporal resolution and asynchronous operation. Existing event detectors, however, are limited by fixed-frequency paradigms and fail to fully exploit the high-temporal resolution and adaptability of event data. To address these limitations, we propose FlexEvent, a novel framework that enables detection at varying frequencies. Our approach consists of two key components: FlexFuse, an adaptive event-frame fusion module that integrates high-frequency event data with rich semantic information from RGB frames, and FlexTune, a frequency-adaptive fine-tuning mechanism that generates frequency-adjusted labels to enhance model generalization across varying operational frequencies. This combination allows our method to detect objects with high accuracy in both fast-moving and static scenarios, while adapting to dynamic environments. Extensive experiments on large-scale event camera datasets demonstrate that our approach surpasses state-of-the-art methods, achieving significant improvements in both standard and high-frequency settings. Notably, our method maintains robust performance when scaling from 20 Hz to 90 Hz and delivers accurate detection up to 180 Hz, proving its effectiveness in extreme conditions. Our framework sets a new benchmark for event-based object detection and paves the way for more adaptable, real-time vision systems. Code is publicly available.

Submitted 29 May, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

Comments: Preprint; 27 pages, 14 figures, 10 tables; Code at https://flexevent.github.io/

arXiv:2410.15761 [pdf, ps, other]

Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees

Authors: Yannis Montreuil, Shu Heng Yeo, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Abstract: Large Language Models excel in generative tasks but exhibit inefficiencies in structured text selection, particularly in extractive question answering. This challenge is magnified in resource-constrained environments, where deploying multiple specialized models for different tasks is impractical. We propose a Learning-to-Defer framework that allocates queries to specialized experts, ensuring high-confidence predictions while optimizing computational efficiency. Our approach integrates a principled allocation strategy with theoretical guarantees on optimal deferral that balances performance and cost. Empirical evaluations on SQuADv1, SQuADv2, and TriviaQA demonstrate that our method enhances answer reliability while significantly reducing computational overhead, making it well-suited for scalable and efficient EQA deployment.

Submitted 18 February, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

Comments: 25 pages, 17 main paper

arXiv:2410.15729 [pdf, ps, other]

A Two-Stage Learning-to-Defer Approach for Multi-Task Learning

Authors: Yannis Montreuil, Shu Heng Yeo, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Abstract: The Two-Stage Learning-to-Defer (L2D) framework has been extensively studied for classification and, more recently, regression tasks. However, many real-world applications require solving both tasks jointly in a multi-task setting. We introduce a novel Two-Stage L2D framework for multi-task learning that integrates classification and regression through a unified deferral mechanism. Our method leverages a two-stage surrogate loss family, which we prove to be both Bayes-consistent and $(\mathcal{G}, \mathcal{R})$-consistent, ensuring convergence to the Bayes-optimal rejector. We derive explicit consistency bounds tied to the cross-entropy surrogate and the $L_1$-norm of agent-specific costs, and extend minimizability gap analysis to the multi-expert two-stage regime. We also make explicit how shared representation learning (commonly used in multi-task models) affects these consistency guarantees. Experiments on object detection and electronic health record analysis demonstrate the effectiveness of our approach and highlight the limitations of existing L2D methods in multi-task scenarios.

Submitted 23 May, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

Comments: 32 pages, 17 main paper

arXiv:2410.01753 [pdf, other]

doi 10.1038/s41586-024-08256-5

$^{229}\mathrm{ThF}_4$ thin films for solid-state nuclear clocks

Authors: Chuankun Zhang, Lars von der Wense, Jack F. Doyle, Jacob S. Higgins, Tian Ooi, Hans U. Friebel, Jun Ye, R. Elwell, J. E. S. Terhune, H. W. T. Morgan, A. N. Alexandrova, H. B. Tran Tan, Andrei Derevianko, Eric R. Hudson

Abstract: After nearly fifty years of searching, the vacuum ultraviolet $^{229}$Th nuclear isomeric transition has recently been directly laser excited [1,2] and measured with high spectroscopic precision [3]. Nuclear clocks based on this transition are expected to be more robust [4,5] than and may outperform [6,7] current optical atomic clocks. They also promise sensitive tests for new physics beyond the standard model [5,8,9]. In light of these important advances and applications, a dramatic increase in the need for $^{229}$Th spectroscopy targets in a variety of platforms is anticipated. However, the growth and handling of high-concentration $^{229}$Th-doped crystals [5] used in previous measurements [1-3,10] are challenging due to the scarcity and radioactivity of the $^{229}$Th material. Here, we demonstrate a potentially scalable solution to these problems by demonstrating laser excitation of the nuclear transition in $^{229}$ThF$_4$ thin films grown with a physical vapor deposition process, consuming only micrograms of $^{229}$Th material. The $^{229}$ThF$_4$ thin films are intrinsically compatible with photonics platforms and nanofabrication tools for integration with laser sources and detectors, paving the way for an integrated and field-deployable solid-state nuclear clock with radioactivity up to three orders of magnitude smaller than typical $^{229}$Th-doped crystals [1-3,10]. The high nuclear emitter density in $^{229}$ThF$_4$ also potentially enables quantum optics studies in a new regime. Finally, we describe the operation and present the estimation of the performance of a nuclear clock based on a defect-free ThF$_4$ crystal.

Submitted 2 October, 2024; originally announced October 2024.

Comments: 15 pages, 3 figures

Journal ref: Nature 636, 603-608 (2024)

arXiv:2409.11590 [pdf, other]

Temperature sensitivity of a Thorium-229 solid-state nuclear clock

Authors: Jacob S. Higgins, Tian Ooi, Jack F. Doyle, Chuankun Zhang, Jun Ye, Kjeld Beeks, Tomas Sikorsky, Thorsten Schumm

Abstract: Quantum state-resolved spectroscopy of the low energy thorium-229 nuclear transition was recently achieved. The five allowed transitions within the electric quadrupole structure were measured to the kilohertz level in a calcium fluoride host crystal, opening many new areas of research using nuclear clocks. Central to the performance of solid-state clock operation is an understanding of systematic shifts such as the temperature dependence of the clock transitions. In this work, we measure the four strongest transitions of thorium-229 in the same crystal at three temperature values: 150 K, 229 K, and 293 K. We find shifts of the unsplit frequency and the electric quadrupole splittings, corresponding to decreases in the electron density, electric field gradient, and field gradient asymmetry at the nucleus as temperature increases. The $m = \pm 5/2 \rightarrow \pm 3/2$ line shifts only 62(6) kHz over the temperature range, i.e., approximately 0.4 kHz/K, representing a promising candidate for a future solid-state optical clock. Achieving 10$^{-18}$ precision requires crystal temperature stability of 5 $μ$K.

Submitted 22 January, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

Comments: 7 pages, 3 figures, 2 tables
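
The two figures quoted in the abstract above (about 0.4 kHz/K and 5 $μ$K) can be reproduced with a short back-of-the-envelope calculation. This is a rough sketch rather than the authors' analysis: it assumes the shift is linear in temperature across the measured range, and it uses a transition frequency of roughly $2.020 \times 10^{15}$ Hz, a value that is not stated in this abstract and is supplied here as an assumption.

```python
# Values taken from the abstract.
shift_khz = 62.0                    # total line shift over the range, in kHz
t_low_k, t_high_k = 150.0, 293.0    # lowest and highest measured temperatures, in K

# Assumption (not in the abstract): approximate Th-229 transition frequency.
transition_hz = 2.020e15

# Average slope, assuming a linear dependence on temperature.
slope_hz_per_k = shift_khz * 1e3 / (t_high_k - t_low_k)

# Frequency error allowed at a fractional precision of 1e-18.
target_fractional = 1e-18
allowed_shift_hz = target_fractional * transition_hz

# Temperature stability needed to keep the shift below that error.
required_stability_k = allowed_shift_hz / slope_hz_per_k

print(f"slope: {slope_hz_per_k / 1e3:.2f} kHz/K")
print(f"required stability: {required_stability_k * 1e6:.1f} microkelvin")
```

Under these assumptions the slope comes out near 0.43 kHz/K and the required stability near 4.7 $μ$K, consistent with the rounded values in the abstract.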

arXiv:2408.14823 [pdf, other]

LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming

Authors: Yuang Shi, Géraldine Morin, Simone Gasparini, Wei Tsang Ooi

Abstract: The rise of Extended Reality (XR) requires efficient streaming of 3D online worlds, challenging current 3DGS representations to adapt to bandwidth-constrained environments. This paper proposes LapisGS, a layered 3DGS that supports adaptive streaming and progressive rendering. Our method constructs a layered structure for cumulative representation, incorporates dynamic opacity optimization to maintain visual fidelity, and utilizes occupancy maps to efficiently manage Gaussian splats. This proposed model offers a progressive representation supporting a continuous rendering quality adapted for bandwidth-aware streaming. Extensive experiments validate the effectiveness of our approach in balancing visual fidelity with the compactness of the model, with up to 50.71% improvement in SSIM and 286.53% improvement in LPIPS with 23% of the original model size, and show its potential for bandwidth-adapted 3D streaming and rendering applications.

Submitted 10 February, 2025; v1 submitted 27 August, 2024; originally announced August 2024.

Comments: 3DV 2025; Project Page: https://yuang-ian.github.io/lapisgs/ ; Code: https://github.com/nus-vv-streams/lapis-gs

arXiv:2408.14357 [pdf, other]

Exploring ChatGPT App Ecosystem: Distribution, Deployment and Security

Authors: Chuan Yan, Ruomai Ren, Mark Huasong Meng, Liuhuo Wan, Tian Yang Ooi, Guangdong Bai

Abstract: ChatGPT has enabled third-party developers to create plugins to expand ChatGPT's capabilities. These plugins are distributed through OpenAI's plugin store, making them easily accessible to users. With ChatGPT as the backbone, this app ecosystem has illustrated great business potential by offering users personalized services in a conversational manner. Nonetheless, many crucial aspects regarding app development, deployment, and security of this ecosystem have yet to be thoroughly studied in the research community, potentially hindering a broader adoption by both developers and users. In this work, we conduct the first comprehensive study of the ChatGPT app ecosystem, aiming to illuminate its landscape for our research community. Our study examines the distribution and deployment models in the integration of LLMs and third-party apps, and assesses their security and privacy implications. We uncover an uneven distribution of functionality among ChatGPT plugins, highlighting prevalent and emerging topics. We also identify severe flaws in the authentication and user data protection for third-party app APIs integrated within LLMs, revealing a concerning status quo of security and privacy in this app ecosystem. Our work provides insights for the secure and sustainable development of this rapidly evolving ecosystem.

Submitted 26 August, 2024; originally announced August 2024.

Comments: Accepted by the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)

arXiv:2407.17300 [pdf, other]

Fine-structure constant sensitivity of the Th-229 nuclear clock transition

Authors: Kjeld Beeks, Georgy A. Kazakov, Fabian Schaden, Ira Morawetz, Luca Toscani de Col, Thomas Riebner, Michael Bartokos, Tomas Sikorsky, Thorsten Schumm, Chuankun Zhang, Tian Ooi, Jacob S. Higgins, Jack F. Doyle, Jun Ye, Marianna S. Safronova

Abstract: State-resolved laser spectroscopy at the 10$^{-12}$ precision level recently reported in arXiv:2406.18719 determined the fractional change in nuclear quadrupole moment between the ground and isomeric state of $^{229}\rm{Th}$, $ΔQ_0/Q_0$=1.791(2)%. Assuming a prolate spheroid nucleus, this allows us to quantify the sensitivity of the nuclear transition frequency to variations of the fine-structure constant $α$ as $K=5900(2300)$, with the uncertainty dominated by the experimentally measured charge radius difference $Δ\langle r^2 \rangle$ between the ground and isomeric state. This result indicates a three orders of magnitude enhancement over atomic clock schemes based on electron shell transitions. We find that $ΔQ_0$ is highly sensitive to tiny changes in the nuclear volume, thus the constant volume approximation cannot be used to accurately relate changes in $\langle r^2 \rangle$ and $Q_0$. The difference between the experimental and estimated values in $ΔQ_0/Q_0$ raises a further question on the octupole contribution to the alpha-sensitivity.

Submitted 24 July, 2024; originally announced July 2024.

Comments: 10 pages, 2 figures

arXiv:2406.18719 [pdf]

doi 10.1038/s41586-024-07839-6

Frequency ratio of the $^{229\mathrm{m}}$Th nuclear isomeric transition and the $^{87}$Sr atomic clock

Authors: Chuankun Zhang, Tian Ooi, Jacob S. Higgins, Jack F. Doyle, Lars von der Wense, Kjeld Beeks, Adrian Leitner, Georgy Kazakov, Peng Li, Peter G. Thirolf, Thorsten Schumm, Jun Ye

Abstract: Optical atomic clocks$^{1,2}$ use electronic energy levels to precisely keep track of time. A clock based on nuclear energy levels promises a next-generation platform for precision metrology and fundamental physics studies. Thorium-229 nuclei exhibit a uniquely low energy nuclear transition within reach of state-of-the-art vacuum ultraviolet (VUV) laser light sources and have therefore been proposed for construction of the first nuclear clock$^{3,4}$. However, quantum state-resolved spectroscopy of the $^{229m}$Th isomer to determine the underlying nuclear structure and establish a direct frequency connection with existing atomic clocks has yet to be performed. Here, we use a VUV frequency comb to directly excite the narrow $^{229}$Th nuclear clock transition in a solid-state CaF$_2$ host material and determine the absolute transition frequency. We stabilize the fundamental frequency comb to the JILA $^{87}$Sr clock$^2$ and coherently upconvert the fundamental to its 7th harmonic in the VUV range using a femtosecond enhancement cavity. This VUV comb establishes a frequency link between nuclear and electronic energy levels and allows us to directly measure the frequency ratio of the $^{229}$Th nuclear clock transition and the $^{87}$Sr atomic clock. We also precisely measure the nuclear quadrupole splittings and extract intrinsic properties of the isomer. These results mark the start of nuclear-based solid-state optical clocks and demonstrate the first comparison of nuclear and atomic clocks for fundamental physics studies. This work represents a confluence of precision metrology, ultrafast strong field physics, nuclear physics, and fundamental physics.

Submitted 7 September, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: 22 pages, 5 figures, 1 extended data figure

Journal ref: Nature 633, 63-70 (2024)

arXiv:2405.08816 [pdf, other]

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios.
Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field.

Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

arXiv:2405.05259 [pdf, other]

OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies

Authors: Lingdong Kong, Youquan Liu, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi

Abstract: Event-based semantic segmentation (ESS) is a fundamental yet challenging task for event camera sensing. The difficulties in interpreting and annotating event data limit its scalability. While domain adaptation from images to event data can help to mitigate this issue, there exist data representational differences that require additional effort to resolve. In this work, for the first time, we synergize information from image, text, and event-data domains and introduce OpenESS to enable scalable ESS in an open-world, annotation-efficient manner. We achieve this goal by transferring the semantically rich CLIP knowledge from image-text pairs to event streams. To pursue better cross-modality adaptation, we propose a frame-to-event contrastive distillation and a text-to-event semantic consistency regularization. Experimental results on popular ESS benchmarks showed our approach outperforms existing methods. Notably, we achieve 53.93% and 43.31% mIoU on DDD17 and DSEC-Semantic without using either event or frame labels.

Submitted 8 May, 2024; originally announced May 2024.

Comments: CVPR 2024 (Highlight); 26 pages, 12 figures, 11 tables; Code at https://github.com/ldkong1205/OpenESS

arXiv:2405.05258 [pdf, other]

doi 10.1109/TPAMI.2025.3535625

Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

Authors: Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu

Abstract: Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the efficacy of unlabeled datasets. We introduce LaserMix++, an evolved framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to further assist data-efficient learning. Our framework is tailored to enhance 3D scene consistency regularization by incorporating multi-modality, including 1) multi-modal LaserMix operation for fine-grained cross-sensor interactions; 2) camera-to-LiDAR feature distillation that enhances LiDAR feature learning; and 3) language-driven knowledge guidance generating auxiliary supervisions using open-vocabulary models. The versatility of LaserMix++ enables applications across LiDAR representations, establishing it as a universally applicable solution. Our framework is rigorously validated through theoretical analysis and extensive experiments on popular driving perception datasets. Results demonstrate that LaserMix++ markedly outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations and significantly improving the supervised-only baselines.
This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.

Submitted 1 February, 2025; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: TPAMI 2025; 18 pages, 6 figures, 9 tables; Code at https://github.com/ldkong1205/LaserMix

arXiv:2403.20156 [pdf, other]

CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening

Authors: Hei Yi Mak, Flint Xiaofeng Fan, Luca A. Lanzendörfer, Cheston Tan, Wei Tsang Ooi, Roger Wattenhofer

Abstract: In this study, we delve into Federated Reinforcement Learning (FedRL) in the context of value-based agents operating across diverse Markov Decision Processes (MDPs). Existing FedRL methods typically aggregate agents' learning by averaging the value functions across them to improve their performance. However, this aggregation strategy is suboptimal in heterogeneous environments where agents converge to diverse optimal value functions. To address this problem, we introduce the Convergence-AwarE SAmpling with scReening (CAESAR) aggregation scheme designed to enhance the learning of individual agents across varied MDPs. CAESAR is an aggregation strategy used by the server that combines convergence-aware sampling with a screening mechanism. By exploiting the fact that agents learning in identical MDPs are converging to the same optimal value function, CAESAR enables the selective assimilation of knowledge from more proficient counterparts, thereby significantly enhancing the overall learning efficiency. We empirically validate our hypothesis and demonstrate the effectiveness of CAESAR in enhancing the learning efficiency of agents, using both a custom-built GridWorld environment and the classical FrozenLake-v1 task, each presenting varying levels of environmental heterogeneity.

Submitted 16 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

arXiv:2312.11844 [pdf, ps, other]

doi 10.7566/JPSJ.93.053702

Multiband Metallic Ground State in Multilayered Nickelates La$_3$Ni$_2$O$_7$ and La$_4$Ni$_3$O$_{10}$ Probed by $^{139}$La-NMR at Ambient Pressure

Authors: Masataka Kakoi, Takashi Oi, Yujiro Ohshita, Mitsuharu Yashima, Kazuhiko Kuroki, Takeru Kato, Hidefumi Takahashi, Shintaro Ishiwata, Yoshinobu Adachi, Naoyuki Hatada, Tetsuya Uda, Hidekazu Mukuda

Abstract: We report a $^{139}$La-NMR study of polycrystalline samples of multi($n$)-layered nickelates, La$_3$Ni$_2$O$_{7-δ}$ ($n=2$) and La$_4$Ni$_3$O$_{10-δ}$ ($n=3$), at ambient pressure. Measurements of the nuclear magnetic resonance (NMR) spectra and nuclear spin relaxation rate ($1/T_1$) indicate the emergence of a density wave order with a gap below $T^*\sim150$ K for La$_3$Ni$_2$O$_{7-δ}$ and $\sim130$ K for La$_4$Ni$_3$O$_{10-δ}$. The finite value of $1/T_1$ below $T^*$ indicates metallic ground states with the remaining density of states at the Fermi level ($E_{\rm F}$) under the density wave order. These features are attributed to multiple $d$ electron bands with different characteristics. Above $T^*$, the gradual decrease in $1/T_1T$ upon cooling implies the presence of a band with flat dispersion near $E_{\rm F}$. From our microscopic probes, we point out that these nickelates ($n=2$ and $3$) possess similar electronic states despite the difference in the formal valence of the Ni-$d$ electron states, which provides a basis for understanding the novel high-$T_{\rm c}$ superconductivity under high pressures.

Submitted 11 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: 5+1 pages, 3+1 figures

Journal ref: J. Phys. Soc. Jpn. 93, 053702 (2024)

arXiv:2310.15171 [pdf, other]

RoboDepth: Robust Out-of-Distribution Depth Estimation under Corruptions

Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi

Abstract: Depth estimation from monocular images is pivotal for real-world visual perception systems. While current learning-based depth estimation models train and test on meticulously curated data, they often overlook out-of-distribution (OoD) situations. Yet, in practical settings -- especially safety-critical ones like autonomous driving -- common corruptions can arise. Addressing this oversight, we introduce a comprehensive robustness test suite, RoboDepth, encompassing 18 corruptions spanning three categories: i) weather and lighting conditions; ii) sensor failures and movement; and iii) data processing anomalies. We subsequently benchmark 42 depth estimation models across indoor and outdoor scenes to assess their resilience to these corruptions. Our findings underscore that, in the absence of a dedicated robustness evaluation framework, many leading depth estimation models may be susceptible to typical corruptions. We delve into design considerations for crafting more robust depth estimation models, touching upon pre-training, augmentation, modality, model capacity, and learning paradigms. We anticipate our benchmark will establish a foundational platform for advancing robust OoD depth estimation.

Submitted 23 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023; 45 pages, 25 figures, 13 tables; Code at https://github.com/ldkong1205/RoboDepth

arXiv:2309.04267 [pdf, other]

The use of deception in dementia-care robots: Should robots tell "white lies" to limit emotional distress?

Authors: Samuel Rhys Cox, Grace Cheong, Wei Tsang Ooi

Abstract: With projections of ageing populations and increasing rates of dementia, there is a need for professional caregivers. Assistive robots have been proposed as a solution to this, as they can assist people both physically and socially. However, caregivers often need to use acts of deception (such as misdirection or white lies) in order to ensure necessary care is provided while limiting negative impacts on the cared-for such as emotional distress or loss of dignity. We discuss such use of deception, and contextualise its use within robotics.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 3 pages, to be published in Proceedings of the 11th International Conference on Human-Agent Interaction (ACM HAI'23)

arXiv:2308.13479 [pdf, ps, other]

Prompting a Large Language Model to Generate Diverse Motivational Messages: A Comparison with Human-Written Messages

Authors: Samuel Rhys Cox, Ashraf Abdul, Wei Tsang Ooi

Abstract: Large language models (LLMs) are increasingly capable and prevalent, and can be used to produce creative content. The quality of content is influenced by the prompt used, with more specific prompts that incorporate examples generally producing better results. Following on from this, instructions written for crowdsourcing tasks (which are specific and include examples to guide workers) could prove to be effective LLM prompts. To explore this, we used a previous crowdsourcing pipeline that gave examples to people to help them generate a collectively diverse corpus of motivational messages. We then used this same pipeline to generate messages using GPT-4, and compared the collective diversity of messages from: (1) crowd-writers, (2) GPT-4 using the pipeline, and (3 & 4) two baseline GPT-4 prompts. We found that the LLM prompts using the crowdsourcing pipeline caused GPT-4 to produce more diverse messages than the two baseline prompts. We also discuss implications from messages generated by both human writers and LLMs.

Submitted 25 August, 2023; originally announced August 2023.

Comments: 3 pages, 1 figure, 1 table, to be published in Proceedings of the 11th International Conference on Human-Agent Interaction (ACM HAI'23)

arXiv:2308.05216 [pdf, other]

doi 10.1088/2632-2153/ad1437

High-dimensional reinforcement learning for optimization and control of ultracold quantum gases

Authors: Nicholas Milson, Arina Tashchilina, Tian Ooi, Anna Czarnecka, Zaheen F. Ahmad, Lindsay J. LeBlanc

Abstract: Machine-learning techniques are emerging as a valuable tool in experimental physics, and among them, reinforcement learning offers the potential to control high-dimensional, multistage processes in the presence of fluctuating environments. In this experimental work, we apply reinforcement learning to the preparation of an ultracold quantum gas to realize a consistent and large number of atoms at microkelvin temperatures. This reinforcement learning agent determines an optimal set of thirty control parameters in a dynamically changing environment that is characterized by thirty sensed parameters. By comparing this method to that of training supervised-learning regression models, as well as to human-driven control schemes, we find that both machine learning approaches accurately predict the number of cooled atoms and both result in occasional superhuman control schemes. However, only the reinforcement learning method achieves consistent outcomes, even in the presence of a dynamic environment.

Submitted 29 December, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

Journal ref: Mach. Learn.: Sci. Technol. 4 045057 (2023)

arXiv:2308.04879 [pdf, other]

Comparing How a Chatbot References User Utterances from Previous Chatting Sessions: An Investigation of Users' Privacy Concerns and Perceptions

Authors: Samuel Rhys Cox, Yi-Chieh Lee, Wei Tsang Ooi

Abstract: Chatbots are capable of remembering and referencing previous conversations, but does this enhance user engagement or infringe on privacy? To explore this trade-off, we investigated the format of how a chatbot references previous conversations with a user and its effects on a user's perceptions and privacy concerns. In a three-week longitudinal between-subjects study, 169 participants talked about their dental flossing habits to a chatbot that either (1-None): did not explicitly reference previous user utterances, (2-Verbatim): referenced previous utterances verbatim, or (3-Paraphrase): used paraphrases to reference previous utterances. Participants perceived Verbatim and Paraphrase chatbots as more intelligent and engaging. However, the Verbatim chatbot also raised privacy concerns with participants. To gain insights as to why people preferred certain conditions or had privacy concerns, we conducted semi-structured interviews with 15 participants. We discuss implications from our findings that can help designers choose an appropriate format to reference previous user utterances and inform the design of longitudinal dialogue scripting.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 10 pages, 3 figures, to be published in Proceedings of the 11th International Conference on Human-Agent Interaction (ACM HAI'23)

arXiv:2308.02565 [pdf, other]

SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning

Authors: Keyu Duan, Qian Liu, Tat-Seng Chua, Shuicheng Yan, Wei Tsang Ooi, Qizhe Xie, Junxian He

Abstract: Textual graphs (TGs) are graphs whose nodes correspond to text (sentences or documents), which are widely prevalent. The representation learning of TGs involves two stages: (i) unsupervised feature extraction and (ii) supervised graph representation learning. In recent years, extensive efforts have been devoted to the latter stage, where Graph Neural Networks (GNNs) have dominated. However, the former stage for most existing graph benchmarks still relies on traditional feature engineering techniques. More recently, with the rapid development of language models (LMs), researchers have focused on leveraging LMs to facilitate the learning of TGs, either by jointly training them in a computationally intensive framework (merging the two stages), or designing complex self-supervised training tasks for feature extraction (enhancing the first stage). In this work, we present SimTeG, a frustratingly Simple approach for Textual Graph learning that does not innovate in frameworks, models, and tasks. Instead, we first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task, such as node classification. We then generate node embeddings using the last hidden states of the fine-tuned LM. These derived features can be further utilized by any GNN for training on the same task. We evaluate our approach on two fundamental graph representation learning tasks: node classification and link prediction.
Through extensive experiments, we show that our approach significantly improves the performance of various GNNs on multiple graph benchmarks.

Submitted 3 August, 2023; originally announced August 2023.

Comments: 9 pages, 3 figures

arXiv:2307.15061 [pdf, other]

The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation

Authors: Lingdong Kong, Yaru Niu, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Liangjun Zhang, Hesheng Wang, Wei Tsang Ooi, Ruijie Zhu, Ziyang Song, Li Liu, Tianzhu Zhang, Jun Yu, Mohan Jing, Pengwei Li, Xiaohua Qi, Cheng Jin, Yingfeng Chen, Jie Hou, Jie Zhang, Zhen Kan, Qiang Ling, Liang Peng, Minglei Li, et al. (17 additional authors not shown)

Abstract: Accurate depth estimation under out-of-distribution (OoD) scenarios, such as adverse weather conditions, sensor failure, and noise contamination, is desirable for safety-critical applications. Existing depth estimation systems, however, suffer inevitably from real-world corruptions and perturbations and struggle to provide reliable depth predictions in such cases. In this paper, we summarize the winning solutions from the RoboDepth Challenge -- an academic competition designed to facilitate and advance robust OoD depth estimation. This challenge was developed based on the newly established KITTI-C and NYUDepth2-C benchmarks. We hosted two stand-alone tracks, with an emphasis on robust self-supervised and robust fully-supervised depth estimation, respectively. Out of more than two hundred participants, nine unique and top-performing solutions have appeared, with novel designs spanning the following aspects: spatial- and frequency-domain augmentations, masked image modeling, image restoration and super-resolution, adversarial training, diffusion-based noise suppression, vision-language pre-training, learned model ensembling, and hierarchical feature enhancement. Extensive experimental analyses along with insightful observations are drawn to better understand the rationale behind each design. We hope this challenge could lay a solid foundation for future research on robust and reliable depth estimation and beyond.
The datasets, competition toolkit, workshop recordings, and source code from the winning teams are publicly available on the challenge website.

Submitted 24 September, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

Comments: Technical Report; 65 pages, 34 figures, 24 tables; Code at https://github.com/ldkong1205/RoboDepth

arXiv:2307.12957 [pdf, other]

doi 10.1103/PhysRevResearch.6.013057

Investigation of Floquet engineered non-Abelian geometric phase for holonomic quantum computing

Authors: Logan W. Cooke, Arina Tashchilina, Mason Protter, Joseph Lindon, Tian Ooi, Frank Marsiglio, Joseph Maciejko, Lindsay J. LeBlanc

Abstract: Holonomic quantum computing (HQC) functions by transporting an adiabatically degenerate manifold of computational states around a closed loop in a control-parameter space; this cyclic evolution results in a non-Abelian geometric phase which may couple states within the manifold. Realizing the required degeneracy is challenging, and typically requires auxiliary levels or intermediate-level couplings. One potential way to circumvent this is through Floquet engineering, where the periodic driving of a nondegenerate Hamiltonian leads to degenerate Floquet bands, and subsequently non-Abelian gauge structures may emerge. Here we present an experiment in ultracold $^{87}$Rb atoms where atomic spin states are dressed by modulated RF fields to induce periodic driving of a family of Hamiltonians linked through a fully tuneable parameter space. The adiabatic motion through this parameter space leads to the holonomic evolution of the degenerate spin states in $SU(2)$, characterized by a non-Abelian connection. We study the holonomic transformations of spin eigenstates in the presence of a background magnetic field, characterizing the fidelity of these single-qubit gate operations. Results indicate that while the Floquet engineering technique removes the need for explicit degeneracies, it inherits many of the same limitations present in degenerate systems.

Submitted 6 March, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

Journal ref: Phys. Rev. Research 6, 013057 (2024)

arXiv:2305.00734 [pdf, ps, other]

Convergence of processes time-changed by Gaussian multiplicative chaos

Authors: Takumu Ooi

Abstract: As represented by the Liouville measure, Gaussian multiplicative chaos is a random measure constructed from a Gaussian field. Under certain technical assumptions, we prove the convergence of a process time-changed by Gaussian multiplicative chaos in the case the latter object is square integrable (the $L^2$-regime). As examples of the main result, we prove that, in the whole $L^2$-regime, the scaling limit of the Liouville simple random walk on $\mathbb{Z}^2$ is Liouville Brownian motion and, as $α\to 1$, Liouville $α$-stable processes on $\mathbb{R}$ converge weakly to the Liouville Cauchy process.

Submitted 1 October, 2024; v1 submitted 1 May, 2023; originally announced May 2023.

MSC Class: 60K37; 31C25; 60J55; 60G57; 60G60

arXiv:2304.11897 [pdf, ps, other]

Dynkin games for Markov processes associated with semi-Dirichlet forms

Authors: Takumu Ooi, Toshihiro Uemura

Abstract: We consider Dynkin games for Markov processes associated with semi-Dirichlet forms. Dynkin games are optimal stopping games introduced as models of two-player zero-sum games. We prove that the solution to a certain variational inequality with two obstacles is the equilibrium price of the Dynkin game. Moreover, we obtain the saddle point of the game.

Submitted 24 April, 2023; originally announced April 2023.

Comments: 14 pages

MSC Class: 31C25; 60J46; 91A05; 93E20

arXiv:2206.09839 [pdf, other]

Bandwidth-Efficient Multi-video Prefetching for Short Video Streaming

Authors: Xutong Zuo, Yishu Li, Mohan Xu, Wei Tsang Ooi, Jiangchuan Liu, Junchen Jiang, Xinggong Zhang, Kai Zheng, Yong Cui

Abstract: Applications that allow sharing of user-created short videos exploded in popularity in recent years. A typical short video application allows a user to swipe away the current video being watched and start watching the next video in a video queue. Such a user interface causes significant bandwidth waste if users frequently swipe a video away before finishing watching. Solutions to reduce bandwidth waste without impairing the Quality of Experience (QoE) are needed. Solving the problem requires adaptive prefetching of short video chunks, which is challenging as the download strategy needs to match unknown user viewing behavior and network conditions. In our work, we first formulate the problem of adaptive multi-video prefetching in short video streaming. Then, to facilitate the integration and comparison of researchers' algorithms towards solving the problem, we design and implement a discrete-event simulator, which we release as open source. Finally, based on the organization of the Short Video Streaming Grand Challenge at ACM Multimedia 2022, we analyze and summarize the algorithms of the contestants, with the hope of promoting the research community towards addressing this problem.

Submitted 25 June, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

arXiv:2202.14012 [pdf, ps, other]

Markov properties for Gaussian fields associated with Dirichlet forms

Authors: Takumu Ooi

Abstract: We prove the equivalence of the local property for an irreducible regular Dirichlet form and the Markov property for the Gaussian field associated with the Dirichlet form. Moreover we introduce a strong Markov property for Gaussian fields and present some sufficient conditions for this to hold.

Submitted 28 February, 2022; originally announced February 2022.

Comments: 15 pages

MSC Class: 31C25; 60G60

arXiv:2007.11985

Shape-CD: Change-Point Detection in Time-Series Data with Shapes and Neurons

Authors: Varsha Suresh, Wei Tsang Ooi

Abstract: Change-point detection in a time series aims to discover the time points at which some unknown underlying physical process that generates the time-series data has changed. We found that existing approaches become less accurate when the underlying process is complex and generates a large variety of patterns in the time series. To address this shortcoming, we propose Shape-CD, a simple, fast, and accurate change-point detection method. Shape-CD uses shape-based features to model the patterns and a conditional neural field to model the temporal correlations among the time regions. We evaluated the performance of Shape-CD using four highly dynamic time-series datasets, including the ExtraSensory dataset with up to 2000 classes. Shape-CD demonstrated improved accuracy (7-60% higher in AUC) and faster computational speed compared to existing approaches. Furthermore, the Shape-CD model consists of only hundreds of parameters and requires less data to train than other deep supervised learning models.

Submitted 31 July, 2020; v1 submitted 22 July, 2020; originally announced July 2020.

Comments: The authors have withdrawn this paper as it needs a major revision. An error in the evaluation code invalidates the reported results

arXiv:2003.06760 [pdf, ps, other]

Heat kernel estimates on spaces with varying dimension

Authors: Takumu Ooi

Abstract: We obtain sharp two-sided heat kernel estimates on spaces with varying dimension, in which two spaces of general dimension are connected at one point. On these spaces, if the dimensions of the two constituent parts are different, the volume doubling property fails with respect to the measure induced by the associated Lebesgue measures. Thus the parabolic Harnack inequalities fail and the heat kernels do not enjoy Aronson type estimates. Our estimates show that the on-diagonal estimates are independent of the dimensions of the two parts of the space for small time, whereas they depend on their transience or recurrence for large time. These spaces are multidimensional versions of a space considered by Z.-Q. Chen and S. Lou (Ann. Probab. 2019), in which a 1-dimensional space and a 2-dimensional space are connected at one point.

Submitted 12 July, 2020; v1 submitted 15 March, 2020; originally announced March 2020.

Comments: 41 pages

MSC Class: 60J60; 60J35; 31C25; 60H30; 60J45

arXiv:1505.01933 [pdf, other]

Wireless Multicast for Zoomable Video Streaming

Authors: Hui Wang, Mun Choon Chan, Wei Tsang Ooi

Abstract: Zoomable video streaming refers to a new class of interactive video applications, where users can zoom into a video stream to view a selected region of interest in higher resolutions and pan around to move the region of interest. The zoom and pan effects are typically achieved by breaking the source video into a grid of independently decodable tiles. Streaming the tiles to a set of heterogeneous users using broadcast is challenging, as users have different link rates and different regions of interest at different resolution levels. In this paper, we consider the following problem: given the subset of tiles that each user requested, the link rate of each user, and the available time slots, at which resolution should each tile be sent to maximize the overall video quality received by all users? We design an efficient algorithm to solve this problem, and evaluate the solution on a testbed using 10 mobile devices. Our method is able to achieve up to 12 dB improvements over other heuristic methods.

Submitted 8 May, 2015; originally announced May 2015.

arXiv:1407.1909 [pdf, ps, other]

On the equivalence between the cell-based smoothed finite element method and the virtual element method

Authors: Sundararajan Natarajan, Stéphane P. A. Bordas, Ean Tat Ooi

Abstract: We revisit the cell-based smoothed finite element method (SFEM) for quadrilateral elements and extend it to arbitrary polygons and polyhedrons in 2D and 3D, respectively. We highlight the similarity between the SFEM and the virtual element method (VEM). Based on the VEM, we propose a new stabilization approach to the SFEM when applied to arbitrary polygons and polyhedrons. The accuracy and the convergence properties of the SFEM are studied with a few benchmark problems in 2D and 3D linear elasticity. Later, the SFEM is combined with the scaled boundary finite element method for problems involving singularity within the framework of linear elastic fracture mechanics in 2D.

Submitted 7 October, 2014; v1 submitted 7 July, 2014; originally announced July 2014.

arXiv:1402.5186 [pdf, ps, other]

Towards Automatic Stress Analysis using Scaled Boundary Finite Element Method with Quadtree Mesh of High-order Elements

Authors: Hou Man, Chongmin Song, Sundararajan Natarajan, Ean Tat Ooi, Carlin Birk

Abstract: This paper presents a technique for stress and fracture analysis by using the scaled boundary finite element method (SBFEM) with quadtree mesh of high-order elements. The cells of the quadtree mesh are modelled as scaled boundary polygons that can have any number of edges, be of any high orders and represent the stress singularity around a crack tip accurately without asymptotic enrichment or other special techniques. Owing to these features, a simple and automatic meshing algorithm is devised. No special treatment is required for the hanging nodes and no displacement incompatibility occurs. Curved boundaries and cracks are modelled without excessive local refinement. Five numerical examples are presented to demonstrate the simplicity and applicability of the proposed technique.

Submitted 20 February, 2014; originally announced February 2014.

arXiv:1310.2913 [pdf, ps, other]

Finite element computations on quadtree meshes: strain smoothing and semi-analytical formulation

Authors: Sundararajan Natarajan, Ean Tat Ooi, Hou Man, Chongmin Song

Abstract: This short communication discusses two alternative techniques to treat hanging nodes in a quadtree mesh. The two techniques are similar in that they require only boundary information. Moreover, they do not require an explicit form of the shape functions, unlike the conventional approaches, for example, as in the work of Gupta \cite{gupta1978} or Tabarraei and Sukumar \cite{tabarraeisukumar2005}. Hence, no special numerical integration technique is required. One of the techniques relies on the strain projection procedure, whilst the other is based on the scaled boundary finite element method. Numerical examples are presented to demonstrate the accuracy and the convergence properties of the two techniques.

Submitted 16 October, 2013; v1 submitted 10 October, 2013; originally announced October 2013.

arXiv:1309.1329 [pdf, ps, other]

Displacement based finite element formulations over polygons: a comparison between Laplace interpolants, strain smoothing and scaled boundary polygon formulation

Authors: Sundararajan Natarajan, Ean Tat Ooi, Irene Chiong, Chongmin Song

Abstract: Three different displacement-based finite element formulations over arbitrary polygons are studied in this paper. The formulations considered are: the conventional polygonal finite element method (FEM) with Laplace interpolants, the cell-based smoothed polygonal FEM with simple averaging technique and the scaled boundary polygon formulation. For the purpose of numerical integration, we employ sub-triangulation for the polygonal FEM and classical Gaussian quadrature for the smoothed FEM and for the scaled boundary polygon formulation. The accuracy and the convergence properties of these formulations are studied with a few benchmark problems in the context of linear elasticity and linear elastic fracture mechanics. The extension of the scaled boundary polygon formulation to higher-order polygons is also discussed.

Submitted 5 September, 2013; originally announced September 2013.

arXiv:1211.2063 [pdf]

Mobile-to-Mobile Video Recommendation

Authors: Padmanabha Venkatagiri Seshadri, Mun Choon Chan, Wei Tsang Ooi

Abstract: Mobile device users can now easily capture and socially share video clips in a timely manner by uploading them wirelessly to a server. When attending crowded events, such as an exhibition or the Olympic Games, however, timely sharing of videos becomes difficult due to choking bandwidth in the network infrastructure, preventing like-minded attendees from easily sharing videos with each other through a server. One solution to alleviate this problem is to use direct device-to-device communication to share videos among nearby attendees. Contact capacity between two devices, however, is limited, and thus a recommendation algorithm, such as collaborative filtering, is needed to select and transmit only videos of potential interest to an attendee. In this paper, we address the question: which video clip should be transmitted to which user? We propose a video transmission scheduling algorithm, called CoFiGel, that runs in a distributed manner and aims to improve both the prediction coverage and precision of the collaborative filtering algorithm. At each device, CoFiGel transmits the video that would increase the estimated number of positive user-video ratings the most if this video is transferred to the destination device. We evaluated CoFiGel using real-world traces and show that substantial improvement can be achieved compared to baseline schemes that do not consider rating or contact history.

Submitted 9 November, 2012; originally announced November 2012.

arXiv:0807.2328 [pdf, ps, other]

Avatar Mobility in Networked Virtual Environments: Measurements, Analysis, and Implications

Authors: Huiguang Liang, Ian Tay, Ming Feng Neo, Wei Tsang Ooi, Mehul Motani

Abstract: We collected mobility traces of 84,208 avatars spanning 22 regions over two months in Second Life, a popular networked virtual environment. We analyzed the traces to characterize the dynamics of the avatars' mobility and behavior, both temporally and spatially. We discuss the implications of our findings for the design of peer-to-peer networked virtual environments, interest management, mobility modeling of avatars, server load balancing and zone partitioning, client-side caching, and prefetching.

Submitted 15 July, 2008; originally announced July 2008.

ACM Class: H.5.1; C.2.4

Showing 1–43 of 43 results for author: Ooi, T