-
Intrinsic and Extrinsic Organized Attention: Softmax Invariance and Network Sparsity
Authors:
Oluwadamilola Fasina,
Ruben V. C. Pohle,
Pei-Chun Su,
Ronald R. Coifman
Abstract:
We examine the intrinsic (within the attention head) and extrinsic (amongst the attention heads) structure of the self-attention mechanism in transformers. Theoretical evidence for invariance of the self-attention mechanism to softmax activation is obtained by appealing to paradifferential calculus (and is supported by computational examples), which relies on the intrinsic organization of the attention heads. Furthermore, we use an existing methodology for hierarchical organization of tensors to examine network structure by constructing hierarchical partition trees with respect to the query, key, and head axes of network 3-tensors. Such an organization is consequential since it allows one to profitably execute common signal processing tasks on a geometry where the organized network 3-tensors exhibit regularity. We exemplify this qualitatively, by visualizing the hierarchical organization of the tree comprised of attention heads and the diffusion map embeddings, and quantitatively, by investigating network sparsity with the expansion coefficients of individual attention heads and the entire network with respect to the bi-Haar and tri-Haar bases, respectively, on the space of queries, keys, and heads of the network. To showcase the utility of our theoretical and methodological findings, we provide computational examples using vision and language transformers. The ramifications of these findings are two-fold: (1) a subsequent step in interpretability analysis is theoretically admitted, and can be exploited empirically for downstream interpretability tasks; (2) one can use the network 3-tensor organization for empirical network applications such as model pruning (by virtue of network sparsity) and network architecture comparison.
Submitted 18 June, 2025;
originally announced June 2025.
-
Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
Authors:
Jiamin Xie,
Ju Lin,
Yiteng Huang,
Tyler Vuong,
Zhaojiang Lin,
Zhaojun Yang,
Peng Su,
Prashant Rawat,
Sangeeta Srivastava,
Ming Sun,
Florian Metze
Abstract:
Recent studies have demonstrated that prompting large language models (LLMs) with audio encodings enables effective speech recognition capabilities. However, the ability of Speech LLMs to comprehend and process multi-channel audio with spatial cues remains a relatively uninvestigated area of research. In this work, we present directional-SpeechLlama, a novel approach that leverages the microphone array of smart glasses to achieve directional speech recognition, source localization, and bystander cross-talk suppression. To enhance the model's ability to understand directivity, we propose two key techniques: serialized directional output training (S-DOT) and contrastive direction data augmentation (CDDA). Experimental results show that our proposed directional-SpeechLlama effectively captures the relationship between textual cues and spatial audio, yielding strong performance in both speech recognition and source localization tasks.
Submitted 17 June, 2025;
originally announced June 2025.
-
RationalVLA: A Rational Vision-Language-Action Model with Dual System
Authors:
Wenxuan Song,
Jiayi Chen,
Wenxue Li,
Xu He,
Han Zhao,
Can Cui,
Pengxiang Ding,
Shiyan Su,
Feilong Tang,
Xuelian Cheng,
Donglin Wang,
Zongyuan Ge,
Xinhu Zheng,
Zhe Liu,
Hesheng Wang,
Haoang Li
Abstract:
A fundamental requirement for real-world robotic deployment is the ability to understand and respond to natural language instructions. Existing language-conditioned manipulation tasks typically assume that instructions are perfectly aligned with the environment. This assumption limits robustness and generalization in realistic scenarios where instructions may be ambiguous, irrelevant, or infeasible. To address this problem, we introduce RAtional MAnipulation (RAMA), a new benchmark that challenges models with both unseen executable instructions and defective ones that should be rejected. In RAMA, we construct a dataset with over 14,000 samples, including diverse defective instructions spanning six dimensions: visual, physical, semantic, motion, safety, and out-of-context. We further propose the Rational Vision-Language-Action model (RationalVLA). It is a dual system for robotic arms that integrates the high-level vision-language model with the low-level manipulation policy by introducing learnable latent space embeddings. This design enables RationalVLA to reason over instructions, reject infeasible commands, and execute manipulation effectively. Experiments demonstrate that RationalVLA outperforms state-of-the-art baselines on RAMA by a 14.5% higher success rate and 0.94 average task length, while maintaining competitive performance on standard manipulation tasks. Real-world trials further validate its effectiveness and robustness in practical applications. Our project page is https://irpn-eai.github.io/RationalVLA.
Submitted 13 June, 2025; v1 submitted 12 June, 2025;
originally announced June 2025.
-
A Unified Framework and Efficient Computation for Privacy Amplification via Shuffling
Authors:
Pengcheng Su,
Haibo Cheng,
Ping Wang
Abstract:
The shuffle model offers significant privacy amplification over local differential privacy (LDP), enabling improved privacy-utility trade-offs. To analyze and quantify this amplification effect, two primary frameworks have been proposed: the privacy blanket (Balle et al., CRYPTO 2019) and the clone paradigm, which includes both the standard clone and stronger clone (Feldman et al., FOCS 2021; SODA 2023). All of these approaches are grounded in decomposing the behavior of local randomizers.
In this work, we present a unified perspective--termed the general clone paradigm--that captures all decomposition-based analyses. We identify the optimal decomposition within this framework and design a simple yet efficient algorithm based on the Fast Fourier Transform (FFT) to compute tight privacy amplification bounds. Empirical results show that our computed upper bounds nearly match the corresponding lower bounds, demonstrating the accuracy and tightness of our method.
Furthermore, we apply our algorithm to derive optimal privacy amplification bounds for both joint composition and parallel composition of LDP mechanisms in the shuffle model.
Submitted 16 April, 2025; v1 submitted 9 April, 2025;
originally announced April 2025.
-
Process Reward Modeling with Entropy-Driven Uncertainty
Authors:
Lang Cao,
Renhong Chen,
Yingtian Zou,
Chao Peng,
Wu Ning,
Huacong Xu,
Qian Chen,
Yuxian Wang,
Peishuo Su,
Mofan Peng,
Zijie Chen,
Yitong Li
Abstract:
This paper presents the Entropy-Driven Unified Process Reward Model (EDU-PRM), a novel framework that approximates state-of-the-art performance in process supervision while drastically reducing training costs. EDU-PRM introduces an entropy-guided dynamic step partitioning mechanism, using logit distribution entropy to pinpoint high-uncertainty regions during token generation dynamically. This self-assessment capability enables precise step-level feedback without manual fine-grained annotation, addressing a critical challenge in process supervision. Experiments on the Qwen2.5-72B model with only 7,500 EDU-PRM-generated training queries demonstrate accuracy closely approximating the full Qwen2.5-72B-PRM (71.1% vs. 71.6%), achieving a 98% reduction in query cost compared to prior methods. This work establishes EDU-PRM as an efficient approach for scalable process reward model training.
Submitted 28 March, 2025;
originally announced March 2025.
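The entropy-guided step partitioning described in the EDU-PRM abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold value, and the toy logits are all assumptions made for illustration. It only shows the general idea of computing the Shannon entropy of a softmax distribution over logits and flagging token positions whose entropy exceeds a threshold as candidate step boundaries.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_positions(logits_per_token, threshold):
    """Return indices of token positions whose entropy exceeds the threshold.

    Hypothetical helper: the threshold is an assumed hyperparameter,
    not a value taken from the paper.
    """
    return [
        i
        for i, logits in enumerate(logits_per_token)
        if entropy(softmax(logits)) > threshold
    ]


# Toy logits for three token positions (illustrative values only).
steps = [
    [10.0, 0.0, 0.0],  # one dominant candidate: low entropy
    [1.0, 1.0, 1.0],   # uniform over three candidates: entropy = ln 3
    [5.0, 5.0, -5.0],  # near two-way tie: entropy slightly above ln 2
]
boundaries = high_entropy_positions(steps, threshold=0.6)
print(boundaries)
```

With these toy values, the second and third positions are flagged, since their entropies (about ln 3 and ln 2) exceed the assumed threshold of 0.6 nats.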
-
MTGS: Multi-Traversal Gaussian Splatting
Authors:
Tianyu Li,
Yihang Qiu,
Zhenhua Wu,
Carl Lindström,
Peng Su,
Matthias Nießner,
Hongyang Li
Abstract:
Multi-traversal data, commonly collected through daily commutes or by self-driving fleets, provides multiple viewpoints for scene reconstruction within a road block. This data offers significant potential for high-quality novel view synthesis, which is crucial for applications such as autonomous vehicle simulators. However, inherent challenges in multi-traversal data often result in suboptimal reconstruction quality, including variations in appearance and the presence of dynamic objects. To address these issues, we propose Multi-Traversal Gaussian Splatting (MTGS), a novel approach that reconstructs high-quality driving scenes from arbitrarily collected multi-traversal data by modeling a shared static geometry while separately handling dynamic elements and appearance variations. Our method employs a multi-traversal dynamic scene graph with a shared static node and traversal-specific dynamic nodes, complemented by color correction nodes with learnable spherical harmonics coefficient residuals. This approach enables high-fidelity novel view synthesis and provides flexibility to navigate any viewpoint. We conduct extensive experiments on a large-scale driving dataset, nuPlan, with multi-traversal data. Our results demonstrate that MTGS improves LPIPS by 23.5% and geometry accuracy by 46.3% compared to single-traversal baselines. The code and data will be made available to the public.
Submitted 22 March, 2025; v1 submitted 16 March, 2025;
originally announced March 2025.
-
Multi-view Granular-ball Contrastive Clustering
Authors:
Peng Su,
Shudong Huang,
Weihong Ma,
Deng Xiong,
Jiancheng Lv
Abstract:
Previous multi-view contrastive learning methods typically operate at two scales: instance-level and cluster-level. Instance-level approaches construct positive and negative pairs based on sample correspondences, aiming to bring positive pairs closer and push negative pairs further apart in the latent space. Cluster-level methods focus on calculating cluster assignments for samples under each view and maximize view consensus by reducing distribution discrepancies, e.g., minimizing KL divergence or maximizing mutual information. However, these two types of methods either introduce false negatives, leading to reduced model discriminability, or overlook local structures and cannot measure relationships between clusters across views explicitly. To this end, we propose a method named Multi-view Granular-ball Contrastive Clustering (MGBCC). MGBCC segments the sample set into coarse-grained granular balls, and establishes associations between intra-view and cross-view granular balls. These associations are reinforced in a shared latent space, thereby achieving multi-granularity contrastive learning. Granular balls lie between instances and clusters, naturally preserving the local topological structure of the sample set. We conduct extensive experiments to validate the effectiveness of the proposed method.
Submitted 18 December, 2024; v1 submitted 18 December, 2024;
originally announced December 2024.
-
GPT-4o System Card
Authors:
OpenAI,
Aaron Hurst,
Adam Lerer,
Adam P. Goucher,
Adam Perelman,
Aditya Ramesh,
Aidan Clark,
AJ Ostrow,
Akila Welihinda,
Alan Hayes,
Alec Radford,
Aleksander Mądry,
Alex Baker-Whitcomb,
Alex Beutel,
Alex Borzunov,
Alex Carney,
Alex Chow,
Alex Kirillov,
Alex Nichol,
Alex Paino,
Alex Renzin,
Alex Tachard Passos,
Alexander Kirillov,
Alexi Christakis
, et al. (395 additional authors not shown)
Abstract:
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.
Submitted 25 October, 2024;
originally announced October 2024.
-
LR-SQL: A Supervised Fine-Tuning Method for Text2SQL Tasks under Low-Resource Scenarios
Authors:
Wen Wuzhenghong,
Zhang Yongpan,
Pan Su,
Sun Yuwei,
Lu Pengwei,
Ding Cheng
Abstract:
Large language models revolutionize Text2SQL through supervised fine-tuning, yet a crucial limitation is overlooked: the complexity of databases leads to an increased context length, consequently resulting in higher GPU memory demands for model fine-tuning. To address this issue, we propose LR-SQL. LR-SQL comprises two supervised fine-tuning models: the schema_link model and the SQL_generation model, with the schema_link model serving as the focal point for streamlining the overall process. During the fine-tuning of the schema_link model, LR-SQL breaks down the complete database into flexible combinations of tables with adjustable quantities, enabling the model to learn the relationships within the entire database from these dispersed slices. Furthermore, to enhance the model's ability to perceive the relationships among various discrete slices during inference, LR-SQL trains the model's Chain-of-Thought capability for this task. Experimental results demonstrate that LR-SQL can reduce the total GPU memory usage by 40% compared to existing fine-tuning methods, while only losing 2% of table prediction accuracy in the schema_link task. For the overall Text2SQL task, the Execution Accuracy decreases by 0.6%. Our project is now available at https://github.com/hongWin/LR-SQL
Submitted 15 October, 2024;
originally announced October 2024.
-
Learning Two-factor Representation for Magnetic Resonance Image Super-resolution
Authors:
Weifeng Wei,
Heng Chen,
Pengxiang Su
Abstract:
Magnetic Resonance Imaging (MRI) requires a trade-off between resolution, signal-to-noise ratio, and scan time, making high-resolution (HR) acquisition challenging. Therefore, super-resolution of MR images is a feasible solution. However, most existing methods face challenges in accurately learning a continuous volumetric representation from low-resolution images or require HR images for supervision. To solve these challenges, we propose a novel method for MR image super-resolution based on two-factor representation. Specifically, we factorize intensity signals into a linear combination of learnable basis and coefficient factors, enabling efficient continuous volumetric representation from low-resolution MR images. In addition, we introduce a coordinate-based encoding to capture structural relationships between sparse voxels, facilitating smooth completion in unobserved regions. Experiments on the BraTS 2019 and MSSEG 2016 datasets demonstrate that our method achieves state-of-the-art performance, providing superior visual fidelity and robustness, particularly in MR image super-resolution at large up-sampling scales.
Submitted 15 September, 2024;
originally announced September 2024.
-
Off-policy Evaluation with Deeply-abstracted States
Authors:
Meiling Hao,
Pingfan Su,
Liyuan Hu,
Zoltan Szabo,
Qingyuan Zhao,
Chengchun Shi
Abstract:
Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions -- originally designed for policy learning -- in the context of OPE. Our contributions are three-fold: (i) We define a set of irrelevance conditions central to learning state abstractions for OPE, and derive a backward-model-irrelevance condition for achieving irrelevance in (marginalized) importance sampling ratios by constructing a time-reversed Markov decision process (MDP). (ii) We propose a novel iterative procedure that sequentially projects the original state space into a smaller space, resulting in a deeply-abstracted state, which substantially simplifies the sample complexity of OPE arising from high cardinality. (iii) We prove the Fisher consistencies of various OPE estimators when applied to our proposed abstract state spaces.
Submitted 3 March, 2025; v1 submitted 27 June, 2024;
originally announced June 2024.
-
JNI Global References Are Still Vulnerable: Attacks and Defenses
Authors:
Yi He,
Yuan Zhou,
Yacong Gu,
Purui Su,
Qi Li,
Yajin Zhou,
Yong Jiang
Abstract:
System services and resources in Android are accessed through IPC based mechanisms. Previous research has demonstrated that they are vulnerable to the denial-of-service attack (DoS attack). For instance, the JNI global reference (JGR), which is widely used by system services, can be exhausted to cause the system reboot (hence the name JGRE attack). Even though the Android team tries to fix the problem by enforcing security checks, we find that it is still possible to construct a JGR exhaustion DoS attack in the latest Android system.
In this paper, we propose a new JGR exhaustion DoS attack, which is effective in different Android versions, including the latest one (i.e., Android 10). Specifically, we developed JGREAnalyzer, a tool that can systematically detect JGR vulnerable services APIs via a call graph analysis and a forwarding reachability analysis. We applied this tool to different Android versions and found multiple vulnerabilities. In particular, among 148 system services in Android 10, 12 of them have 21 vulnerabilities. Among them, 9 can be successfully exploited without any permissions. We further analyze the root cause of the vulnerabilities and propose a new defense to mitigate the JGRE attack by restricting resource consumption via global reference counting.
Submitted 1 May, 2024;
originally announced May 2024.
-
Incentive-Compatible Vertiport Reservation in Advanced Air Mobility: An Auction-Based Approach
Authors:
Pan-Yang Su,
Chinmay Maheshwari,
Victoria Tuck,
Shankar Sastry
Abstract:
The rise of advanced air mobility (AAM) is expected to become a multibillion-dollar industry in the near future. Market-based mechanisms are touted to be an integral part of AAM operations, which comprise heterogeneous operators with private valuations. In this work, we study the problem of designing a mechanism to coordinate the movement of electric vertical take-off and landing (eVTOL) aircraft, operated by multiple operators each having heterogeneous valuations associated with their fleet, between vertiports, while enforcing the arrival, departure, and parking constraints at vertiports. Particularly, we propose an incentive-compatible and individually rational vertiport reservation mechanism that maximizes a social welfare metric, which encapsulates the objective of maximizing the overall valuations of all operators while minimizing the congestion at vertiports. Additionally, we improve the computational tractability of designing the reservation mechanism by proposing a mixed binary linear programming approach that leverages the network flow structure.
Submitted 30 September, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Bodioid: philosophical reflections on the hybrid of bodies and artefacts towards post-human
Authors:
Jiang Xu,
Gang Sun,
Jingyu Xu,
Pujie Su
Abstract:
The advent of the post-human era has blurred the boundary between the body and artefacts. Further, external materials and information are more deeply integrated into the body, making emerging technology a key driving force for shaping post-human existence and promoting bodily evolution. Based on this, this study analyses the transformation process of three technological forms, namely tools, machines, and cyborgs, and reveals the construction of bodies and artefacts. From the phenomenological perspective, the essences of body and artefact existences are reflected upon, and the 'existence is construction' viewpoint is proposed. Furthermore, a technological design concept, 'bodioid', is proposed to meticulously depict the characteristics of integrating similarities and differences towards unity between the body and artefacts, based on the theoretical foundation of technology mediation and the materialization of morality. Finally, through analogizing the organizational form of language, the two key forms and specific mechanisms of bodioid construction, namely extension and mirroring, are indicated. With this in mind, the post-human existence landscape is discussed with the objective of providing theoretical insights into the study of the underlying philosophical principles of technological design.
Submitted 5 March, 2024;
originally announced March 2024.
-
Single-pixel 3D imaging based on fusion temporal data of single photon detector and millimeter-wave radar
Authors:
Tingqin Lai,
Xiaolin Liang,
Yi Zhu,
Xinyi Wu,
Lianye Liao,
Xuelin Yuan,
Ping Su,
Shihai Sun
Abstract:
Recently, there has been increased attention towards 3D imaging using single-pixel single-photon detection (also known as temporal data) due to its potential advantages in terms of cost and power efficiency. However, to eliminate the symmetry blur in the reconstructed images, a fixed background is required. This paper proposes a fusion-data-based 3D imaging method that utilizes a single-pixel single-photon detector and a millimeter-wave radar to capture temporal histograms of a scene from multiple perspectives. Subsequently, the 3D information can be reconstructed from the one-dimensional fusion temporal data using an artificial neural network (ANN). Both the simulation and experimental results demonstrate that our fusion method effectively eliminates symmetry blur and improves the quality of the reconstructed images.
Submitted 20 October, 2023;
originally announced December 2023.
-
An Information-theoretic Security Analysis of Honeyword
Authors:
Pengcheng Su,
Haibo Cheng,
Wenting Li,
Ping Wang
Abstract:
Honeyword is a representative "honey" technique that employs decoy objects to mislead adversaries and protect the real ones. To assess the security of a Honeyword system, two metrics--flatness and success-number--have been proposed and evaluated using various simulated attackers. Existing evaluations typically apply statistical learning methods to distinguish real passwords from decoys on real-world datasets. However, such evaluations may overestimate the system's security, as more effective distinguishing attacks could potentially exist.
In this paper, we aim to analyze the security of Honeyword systems under the strongest theoretical attack, rather than relying on specific, expert-crafted attacks evaluated in prior experimental studies. We first derive mathematical expressions for the flatness and success-number under the strongest attack. We conduct analyses and computations for several typical scenarios, and determine the security of honeyword generation methods using a uniform distribution and the List model as examples.
We further evaluate the security of existing honeyword generation methods based on password probability models (PPMs), which depends on the sample size used for training. We investigate, for the first time, the sample complexity of several representative PPMs, introducing two novel polynomial-time approximation schemes for computing the total variation between PCFG models and between higher-order Markov models. Our experimental results show that for small-scale password distributions, sample sizes on the order of millions--often tens of millions--are required to reduce the total variation below 0.1. A surprising result is that we establish an equivalence between flatness and total variation, thus bridging the theoretical study of Honeyword systems with classical information theory. Finally, we discuss the practical implications of our findings.
Submitted 21 April, 2025; v1 submitted 17 November, 2023;
originally announced November 2023.
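The Honeyword abstract above relies on the total variation between a true password distribution and a model's distribution. As a minimal sketch of that quantity only (not the paper's approximation schemes for PCFG or Markov models), the code below computes the total variation distance between two small discrete distributions. The function name and the toy password probabilities are assumptions made for illustration.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    p and q map outcomes to probabilities; outcomes missing from one
    distribution are treated as having probability 0 there.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


# Toy distributions (illustrative values, not data from the paper).
real = {"123456": 0.5, "password": 0.3, "qwerty": 0.2}
model = {"123456": 0.4, "password": 0.3, "letmein": 0.3}

tv = total_variation(real, model)
print(round(tv, 6))
```

For these toy values the distance is 0.3 up to floating-point rounding: half of the summed absolute differences 0.1 + 0 + 0.2 + 0.3.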
-
A Brief Survey of Open Radio Access Network (O-RAN) Security
Authors:
Yi-Zih Chen,
Terrance Yu-Hao Chen,
Po-Jung Su,
Chi-Ting Liu
Abstract:
Open Radio Access Network (O-RAN), a novel architecture that separates the traditional radio access network (RAN) into multiple disaggregated components, leads a revolution in the telecommunication ecosystems. Compared to the traditional RAN, the proposed O-RAN paradigm is more flexible and more cost-effective for the operators, vendors, and the public. The key design considerations of O-RAN include virtualization and intelligent capabilities in order to meet the new requirements of 5G. However, because of the open nature and the newly imported techniques in O-RAN architecture, the assessment of the security in O-RAN architecture during its early development stage is crucial. This project aims to present an investigation of the current O-RAN architecture from several attack surfaces, including (1) Architectural openness, (2) Cloud and Virtualization, (3) Network slicing, and (4) Machine Learning. The existing attack surfaces and corresponding mitigation methods of these attacks are also surveyed and provided in this report, serving as a guiding principle and valuable recommendation for the O-RAN implementers and framework designers.
Submitted 3 November, 2023;
originally announced November 2023.
-
Technical Note: Feasibility of translating 3.0T-trained Deep-Learning Segmentation Models Out-of-the-Box on Low-Field MRI 0.55T Knee-MRI of Healthy Controls
Authors:
Rupsa Bhattacharjee,
Zehra Akkaya,
Johanna Luitjens,
Pan Su,
Yang Yang,
Valentina Pedoia,
Sharmila Majumdar
Abstract:
In the current study, our purpose is to evaluate the feasibility of applying deep learning (DL) enabled algorithms to quantify bilateral knee biomarkers in healthy controls scanned at 0.55T, compared with 3.0T. The study assesses the performance of standard in-practice bone and cartilage segmentation algorithms at 0.55T, both qualitatively and quantitatively, by comparing segmentation performance, areas of improvement, and compartment-wise cartilage thickness values between 0.55T and 3.0T. Initial results demonstrate usable-to-good technical feasibility of translating existing quantitative deep-learning-based image segmentation techniques, trained on 3.0T, out-of-the-box to 0.55T knee MRI in a multi-vendor acquisition environment. Especially in terms of segmenting cartilage compartments, the models perform almost equivalently to 3.0T in terms of Likert ranking. The 0.55T low-field, sustainable, and easy-to-install MRI, as demonstrated, can thus be utilized for evaluating knee cartilage thickness and bone segmentations, aided initially by established DL algorithms trained at higher field strengths and applied out-of-the-box. This could be utilized at far-spread point-of-care locations that lack radiologists available to manually segment low-field images, at least until a decent pool of low-field data is collated. With further fine-tuning using manual labeling of low-field data, or by utilizing synthesized higher-SNR images from low-field images, OA biomarker quantification performance could potentially be further improved.
Submitted 26 October, 2023;
originally announced October 2023.
-
Recovering from Privacy-Preserving Masking with Large Language Models
Authors:
Arpita Vats,
Zhe Liu,
Peng Su,
Debjyoti Paul,
Yingyi Ma,
Yutong Pang,
Zeeshan Ahmed,
Ozlem Kalinli
Abstract:
Model adaptation is crucial to handle the discrepancy between proxy training data and actual users data received. To effectively perform adaptation, textual data of users is typically stored on servers or their local devices, where downstream natural language processing (NLP) models can be directly trained using such in-domain data. However, this might raise privacy and security concerns due to the extra risks of exposing user information to adversaries. Replacing identifying information in textual data with a generic marker has been recently explored. In this work, we leverage large language models (LLMs) to suggest substitutes of masked tokens and have their effectiveness evaluated on downstream language modeling tasks. Specifically, we propose multiple pre-trained and fine-tuned LLM-based approaches and perform empirical studies on various datasets for the comparison of these methods. Experimental results show that models trained on the obfuscation corpora are able to achieve comparable performance with the ones trained on the original data without privacy-preserving token masking.
Submitted 13 December, 2023; v1 submitted 12 September, 2023;
originally announced September 2023.
-
3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability
Authors:
Ruowei Wang,
Yu Liu,
Pei Su,
Jianwei Zhang,
Qijun Zhao
Abstract:
Shape generation is the practice of producing 3D shapes as various representations for 3D content creation. Previous studies on 3D shape generation have focused on shape quality and structure, with little or no consideration of the importance of semantic information. Consequently, such generative models often fail to preserve the semantic consistency of shape structure or enable manipulation of the semantic attributes of shapes during generation. In this paper, we propose a novel semantic generative model named 3D Semantic Subspace Traverser that utilizes semantic attributes for category-specific 3D shape generation and editing. Our method utilizes implicit functions as the 3D shape representation and combines a novel latent-space GAN with a linear subspace model to discover semantic dimensions in the local latent space of 3D shapes. Each dimension of the subspace corresponds to a particular semantic attribute, and we can edit the attributes of generated shapes by traversing the coefficients of those dimensions. Experimental results demonstrate that our method can produce plausible shapes with complex structures and enable the editing of semantic attributes. The code and trained models are available at https://github.com/TrepangCat/3D_Semantic_Subspace_Traverser
Submitted 15 August, 2023; v1 submitted 26 July, 2023;
originally announced July 2023.
-
Discovery of Optimal Quantum Error Correcting Codes via Reinforcement Learning
Authors:
Vincent Paul Su,
ChunJun Cao,
Hong-Ye Hu,
Yariv Yanay,
Charles Tahan,
Brian Swingle
Abstract:
The recently introduced Quantum Lego framework provides a powerful method for generating complex quantum error correcting codes (QECCs) out of simple ones. We gamify this process and unlock a new avenue for code design and discovery using reinforcement learning (RL). One benefit of RL is that we can specify \textit{arbitrary} properties of the code to be optimized. We train on two such properties, maximizing the code distance, and minimizing the probability of logical error under biased Pauli noise. For the first, we show that the trained agent identifies ways to increase code distance beyond naive concatenation, saturating the linear programming bound for CSS codes on 13 qubits. With a learning objective to minimize the logical error probability under biased Pauli noise, we find the best known CSS code at this task for $\lesssim 20$ qubits. Compared to other (locally deformed) CSS codes, including Surface, XZZX, and 2D Color codes, our $[[17,1,3]]$ code construction actually has \textit{lower} adversarial distance, yet better protects the logical information, highlighting the importance of QECC desiderata. Lastly, we comment on how this RL framework can be used in conjunction with physical quantum devices to tailor a code without explicit characterization of the noise model.
Submitted 12 June, 2023; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Language Agnostic Data-Driven Inverse Text Normalization
Authors:
Szu-Jui Chen,
Debjyoti Paul,
Yutong Pang,
Peng Su,
Xuedong Zhang
Abstract:
With the emergence of automatic speech recognition (ASR) models, converting spoken-form text (from ASR) to written form is urgently needed. This inverse text normalization (ITN) problem attracts the attention of researchers from various fields. Recently, several works have shown that data-driven ITN methods can output high-quality written-form text. Due to the scarcity of labeled spoken-written datasets, studies on non-English data-driven ITN are quite limited. In this work, we propose a language-agnostic data-driven ITN framework to fill this gap. Specifically, we leverage data augmentation in conjunction with neural machine translated data for low-resource languages. Moreover, we design an evaluation method for language-agnostic ITN models when only English data is available. Our empirical evaluation shows that this language-agnostic modeling approach is effective for low-resource languages while preserving the performance for high-resource languages.
Submitted 23 January, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Towards Quantum Gravity in the Lab on Quantum Processors
Authors:
Illya Shapoval,
Vincent Paul Su,
Wibe de Jong,
Miro Urbanek,
Brian Swingle
Abstract:
The holographic principle and its realization in the AdS/CFT correspondence led to unexpected connections between general relativity and quantum information. This set the stage for studying aspects of quantum gravity models, which are otherwise difficult to access, in table-top quantum-computational experiments. Recent works have designed a special teleportation protocol that realizes a surprising communication phenomenon most naturally explained by the physics of a traversable wormhole. In this work, we have carried out quantum experiments based on this protocol on state-of-the-art quantum computers. The target quantum processing units (QPUs) included the Quantinuum's trapped-ion System Model H1-1 and five IBM superconducting QPUs of various architectures, with public and premium user access. We report the observed teleportation signals from these QPUs with the best one reaching 80% of theoretical predictions. We outline the experimental challenges we have faced in the course of implementation, as well as the new theoretical insights into quantum dynamics the work has led to. We also developed QGLab -- an open-source end-to-end software solution that facilitates conducting the wormhole-inspired teleportation experiments on state-of-the-art and emergent generations of QPUs supported by the Qiskit and tket SDKs. We consider our study and deliverables as an early practical step towards the realization of more complex experiments for the indirect probing of quantum gravity in the lab.
Submitted 11 October, 2023; v1 submitted 27 May, 2022;
originally announced May 2022.
-
OJXPerf: Featherlight Object Replica Detection for Java Programs
Authors:
Bolun Li,
Hao Xu,
Qidong Zhao,
Pengfei Su,
Milind Chabbi,
Shuyin Jiao,
Xu Liu
Abstract:
Memory bloat is an important source of inefficiency in complex production software, especially in software written in managed languages such as Java. Prior approaches to this problem have focused on identifying objects that outlive their life span. Few studies have, however, looked into whether and to what extent myriad objects of the same type are identical. A quantitative assessment of identical objects with code-level attribution can assist developers in refactoring code to eliminate object bloat, and favor reuse of existing object(s). The result is reduced memory pressure, reduced allocation and garbage collection, enhanced data locality, and reduced re-computation, all of which result in superior performance.
We develop OJXPerf, a lightweight sampling-based profiler, which probabilistically identifies identical objects. OJXPerf employs hardware performance monitoring units (PMU) in conjunction with hardware debug registers to sample and compare field values of different objects of the same type allocated at the same calling context but potentially accessed at different program points. The result is a lightweight measurement, a combination of object allocation contexts and usage contexts ordered by duplication frequency. This class of duplicated objects is relatively easier to optimize. OJXPerf incurs 9% runtime and 6% memory overheads on average. We empirically show the benefit of OJXPerf by using its profiles to instruct us to optimize a number of Java programs, including well-known benchmarks and real-world applications. The results show a noticeable reduction in memory usage (up to 11%) and a significant speedup (up to 25%).
Submitted 23 March, 2022;
originally announced March 2022.
-
A Systematic Study of Android Non-SDK (Hidden) Service API Security
Authors:
Yi He,
Yacong Gu,
Purui Su,
Kun Sun,
Yajin Zhou,
Zhi Wang,
Qi Li
Abstract:
Android allows apps to communicate with its system services via system service helpers so that these apps can use various functions provided by the system services. Meanwhile, the system services rely on their service helpers to enforce security checks for protection. Unfortunately, the security checks in the service helpers may be bypassed via directly exploiting the non-SDK (hidden) APIs, degrading the stability and posing severe security threats such as privilege escalation, automatic function execution without users' interactions, crashes, and DoS attacks. Google has proposed various approaches to address this problem, e.g., case-by-case fixing the bugs or even proposing a blacklist to block all the non-SDK APIs. However, the developers can still figure out new ways of exploiting these hidden APIs to evade the non-SDKs restrictions.
In this paper, we systematically study the vulnerabilities due to the hidden API exploitation and analyze the effectiveness of Google's countermeasures. We aim to answer if there are still vulnerable hidden APIs that can be exploited in the newest Android 12. We develop a static analysis tool called ServiceAudit to automatically mine the inconsistent security enforcement between service helper classes and the hidden service APIs. We apply ServiceAudit to Android 6~12. Our tool discovers 112 vulnerabilities in Android 6 with higher precision than existing approaches. Moreover, in Android 11 and 12, we identify more than 25 hidden APIs with inconsistent protections; however, only one of the vulnerable APIs can lead to severe security problems in Android 11, and none of them work on Android 12.
Submitted 17 March, 2022;
originally announced March 2022.
-
Motion Prediction via Joint Dependency Modeling in Phase Space
Authors:
Pengxiang Su,
Zhenguang Liu,
Shuang Wu,
Lei Zhu,
Yifang Yin,
Xuanjing Shen
Abstract:
Motion prediction is a classic problem in computer vision, which aims at forecasting future motion given the observed pose sequence. Various deep learning models have been proposed, achieving state-of-the-art performance on motion prediction. However, existing methods typically focus on modeling temporal dynamics in the pose space. Unfortunately, the complicated and high-dimensional nature of human motion brings inherent challenges for dynamic context capturing. Therefore, we move away from the conventional pose-based representation and present a novel approach employing a phase space trajectory representation of individual joints. Moreover, current methods tend to consider only the dependencies between physically connected joints. In this paper, we introduce a novel convolutional neural model to effectively leverage explicit prior knowledge of motion anatomy, and simultaneously capture both spatial and temporal information of joint trajectory dynamics. We then propose a global optimization module that learns the implicit relationships between individual joint features.
Empirically, our method is evaluated on large-scale 3D human motion benchmark datasets (i.e., Human3.6M, CMU MoCap). These results demonstrate that our method sets the new state-of-the-art on the benchmark datasets. Our code will be available at https://github.com/Pose-Group/TEID.
Submitted 7 January, 2022;
originally announced January 2022.
-
Measuring Outcomes in Healthcare Economics using Artificial Intelligence: with Application to Resource Management
Authors:
Chih-Hao Huang,
Feras A. Batarseh,
Adel Boueiz,
Ajay Kulkarni,
Po-Hsuan Su,
Jahan Aman
Abstract:
The quality of service in healthcare is constantly challenged by outlier events such as pandemics (e.g., Covid-19) and natural disasters (such as hurricanes and earthquakes). In most cases, such events lead to critical uncertainties in decision making, as well as in multiple medical and economic aspects at a hospital. External (geographic) or internal (medical and managerial) factors lead to shifts in planning and budgeting and, most importantly, reduce confidence in conventional processes. In some cases, support from other hospitals proves necessary, which further complicates planning. This manuscript presents three data-driven methods that provide indicators to help healthcare managers organize their economics and identify the optimal plan for resource allocation and sharing. Conventional decision-making methods fall short in recommending validated policies for managers. Using reinforcement learning, genetic algorithms, traveling salesman, and clustering, we experimented with different healthcare variables and presented tools and outcomes that could be applied at health institutes. Experiments are performed; the results are recorded, evaluated, and presented.
Submitted 14 November, 2021;
originally announced November 2021.
-
ConvFiT: Conversational Fine-Tuning of Pretrained Language Models
Authors:
Ivan Vulić,
Pei-Hao Su,
Sam Coope,
Daniela Gerz,
Paweł Budzianowski,
Iñigo Casanueva,
Nikola Mrkšić,
Tsung-Hsien Wen
Abstract:
Transformer-based language models (LMs) pretrained on large text collections are proven to store a wealth of semantic knowledge. However, 1) they are not effective as sentence encoders when used off-the-shelf, and 2) thus typically lag behind conversationally pretrained (e.g., via response selection) encoders on conversational tasks such as intent detection (ID). In this work, we propose ConvFiT, a simple and efficient two-stage procedure which turns any pretrained LM into a universal conversational encoder (after Stage 1 ConvFiT-ing) and task-specialised sentence encoder (after Stage 2). We demonstrate that 1) full-blown conversational pretraining is not required, and that LMs can be quickly transformed into effective conversational encoders with much smaller amounts of unannotated data; 2) pretrained LMs can be fine-tuned into task-specialised sentence encoders, optimised for the fine-grained semantics of a particular task. Consequently, such specialised sentence encoders allow for treating ID as a simple semantic similarity task based on interpretable nearest neighbours retrieval. We validate the robustness and versatility of the ConvFiT framework with such similarity-based inference on the standard ID evaluation sets: ConvFiT-ed LMs achieve state-of-the-art ID performance across the board, with particular gains in the most challenging, few-shot setups.
Submitted 21 September, 2021;
originally announced September 2021.
-
Quantum-Inspired Keyword Search on Multi-Model Databases
Authors:
Gongsheng Yuan,
Jiaheng Lu,
Peifeng Su
Abstract:
With the growing number of applications implemented in different domains, databases inevitably need to adopt appropriate data models to store and exchange data derived from various sources. To handle these data models in a single platform, the database community introduced the multi-model database, and many vendors are extending their products from supporting a single data model to being multi-model databases. Although this brings benefits, requiring users to invest considerable effort in mastering a multi-model query language before they can explore a database is unfriendly to most of them. Therefore, we study using keyword searches as an alternative way to explore and query multi-model databases. In this paper, we attempt to utilize quantum physics's probabilistic formalism to bring the problem into vector spaces and represent events (e.g., words) as subspaces. Then we employ a density matrix to encapsulate all the information over these subspaces and use density matrices to measure the divergence between query and candidate answers for finding the top-\textit{k} most relevant results. In this process, we propose using pattern mining to identify compounds for improving accuracy and using dimensionality reduction for reducing complexity. Finally, empirical experiments demonstrate the performance superiority of our approaches over the state-of-the-art approaches.
Submitted 31 August, 2021;
originally announced September 2021.
-
Improving BERT Model Using Contrastive Learning for Biomedical Relation Extraction
Authors:
Peng Su,
Yifan Peng,
K. Vijay-Shanker
Abstract:
Contrastive learning has been used to learn a high-quality representation of the image in computer vision. However, contrastive learning is not widely utilized in natural language processing due to the lack of a general method of data augmentation for text data. In this work, we explore the method of employing contrastive learning to improve the text representation from the BERT model for relation extraction. The key knob of our framework is a unique contrastive pre-training step tailored for the relation extraction tasks by seamlessly integrating linguistic knowledge into the data augmentation. Furthermore, we investigate how large-scale data constructed from the external knowledge bases can enhance the generality of contrastive pre-training of BERT. The experimental results on three relation extraction benchmark datasets demonstrate that our method can improve the BERT model representation and achieve state-of-the-art performance. In addition, we explore the interpretability of models by showing that BERT with contrastive pre-training relies more on rationales for prediction. Our code and data are publicly available at: https://github.com/udel-biotm-lab/BERT-CLRE.
Submitted 28 April, 2021;
originally announced April 2021.
-
Multilingual and Cross-Lingual Intent Detection from Spoken Data
Authors:
Daniela Gerz,
Pei-Hao Su,
Razvan Kusztos,
Avishek Mondal,
Michał Lis,
Eshan Singhal,
Nikola Mrkšić,
Tsung-Hsien Wen,
Ivan Vulić
Abstract:
We present a systematic study on multilingual and cross-lingual intent detection from spoken data. The study leverages a new resource put forth in this work, termed MInDS-14, a first training and evaluation resource for the intent detection task with spoken data. It covers 14 intents extracted from a commercial system in the e-banking domain, associated with spoken examples in 14 diverse language varieties. Our key results indicate that combining machine translation models with state-of-the-art multilingual sentence encoders (e.g., LaBSE) can yield strong intent detectors in the majority of target languages covered in MInDS-14, and offer comparative analyses across different axes: e.g., zero-shot versus few-shot learning, translation direction, and impact of speech recognition. We see this work as an important step towards more inclusive development and evaluation of multilingual intent detectors from spoken data, in a much wider spectrum of languages compared to prior work.
Submitted 17 April, 2021;
originally announced April 2021.
-
DJXPerf: Identifying Memory Inefficiencies via Object-centric Profiling for Java
Authors:
Bolun Li,
Pengfei Su,
Milind Chabbi,
Shuyin Jiao,
Xu Liu
Abstract:
Java is the "go-to" programming language choice for developing scalable enterprise cloud applications. In such systems, even a few percent CPU time savings can offer a significant competitive advantage and cost saving. Although performance tools abound in Java, those that focus on the data locality in the memory hierarchy are rare.
In this paper, we present DJXPerf, a lightweight, object-centric memory profiler for Java, which associates memory-hierarchy performance metrics (e.g., cache/TLB misses) with Java objects. DJXPerf uses statistical sampling of hardware performance monitoring counters to attribute metrics to not only source code locations but also Java objects. DJXPerf presents Java object allocation contexts combined with their usage contexts and presents them ordered by the poor locality behaviors. DJXPerf's performance measurement, object attribution, and presentation techniques guide optimizing object allocation, layout, and access patterns. DJXPerf incurs only ~8% runtime overhead and ~5% memory overhead on average, requiring no modifications to hardware, OS, Java virtual machine, or application source code, which makes it attractive to use in production. Guided by DJXPerf, we study and optimize a number of Java and Scala programs, including well-known benchmarks and real-world applications, and demonstrate significant speedups.
Submitted 7 April, 2021;
originally announced April 2021.
-
Gradient Regularized Contrastive Learning for Continual Domain Adaptation
Authors:
Shixiang Tang,
Peng Su,
Dapeng Chen,
Wanli Ouyang
Abstract:
Human beings can quickly adapt to environmental changes by leveraging learning experience. However, adapting deep neural networks to dynamic environments by machine learning algorithms remains a challenge. To better understand this issue, we study the problem of continual domain adaptation, where the model is presented with a labelled source domain and a sequence of unlabelled target domains. The obstacles in this problem are both domain shift and catastrophic forgetting. We propose Gradient Regularized Contrastive Learning (GRCL) to solve the obstacles. At the core of our method, gradient regularization plays two key roles: (1) enforcing the gradient not to harm the discriminative ability of source features which can, in turn, benefit the adaptation ability of the model to target domains; (2) constraining the gradient not to increase the classification loss on old target domains, which enables the model to preserve the performance on old target domains when adapting to an in-coming target domain. Experiments on Digits, DomainNet and Office-Caltech benchmarks demonstrate the strong performance of our approach when compared to the other state-of-the-art methods.
Submitted 23 March, 2021;
originally announced March 2021.
-
An O(n) time algorithm for finding Hamilton cycles with high probability
Authors:
Rajko Nenadov,
Angelika Steger,
Pascal Su
Abstract:
We design a randomized algorithm that finds a Hamilton cycle in $\mathcal{O}(n)$ time with high probability in a random graph $G_{n,p}$ with edge probability $p\ge C \log n / n$. This closes a gap left open in a seminal paper by Angluin and Valiant from 1979.
Submitted 4 December, 2020;
originally announced December 2020.
-
Mastermind with a Linear Number of Queries
Authors:
Anders Martinsson,
Pascal Su
Abstract:
Since the 1960s Mastermind has been studied for the combinatorial and information theoretical interest the game has to offer. Many results have been discovered starting with Erdős and Rényi determining the optimal number of queries needed for two colors. For $k$ colors and $n$ positions, Chvátal found asymptotically optimal bounds when $k \le n^{1-\epsilon}$. Following a sequence of gradual improvements for $k \geq n$ colors, the central open question is to resolve the gap between $\Omega(n)$ and $\mathcal{O}(n\log \log n)$ for $k=n$.
In this paper, we resolve this gap by presenting the first algorithm for solving $k=n$ Mastermind with a linear number of queries. As a consequence, we are able to determine the query complexity of Mastermind for any parameters $k$ and $n$.
Submitted 19 September, 2023; v1 submitted 11 November, 2020;
originally announced November 2020.
-
Investigation of BERT Model on Biomedical Relation Extraction Based on Revised Fine-tuning Mechanism
Authors:
Peng Su,
K. Vijay-Shanker
Abstract:
With the explosive growth of biomedical literature, designing automatic tools to extract information from the literature has great significance in biomedical research. Recently, transformer-based BERT models adapted to the biomedical domain have produced leading results. However, all the existing BERT models for relation classification only utilize partial knowledge from the last layer. In this paper, we investigate the method of utilizing the entire layer in the fine-tuning process of the BERT model. To the best of our knowledge, we are the first to explore this method. The experimental results illustrate that our method improves the BERT model's performance and outperforms the state-of-the-art methods on three benchmark datasets for different relation extraction tasks. In addition, further analysis indicates that the key knowledge about the relations can be learned from the last layer of the BERT model.
Submitted 31 October, 2020;
originally announced November 2020.
-
CS2-Net: Deep Learning Segmentation of Curvilinear Structures in Medical Imaging
Authors:
Lei Mou,
Yitian Zhao,
Huazhu Fu,
Yonghuai Liu,
Jun Cheng,
Yalin Zheng,
Pan Su,
Jianlong Yang,
Li Chen,
Alejandro F Frangi,
Masahiro Akiba,
Jiang Liu
Abstract:
Automated detection of curvilinear structures, e.g., blood vessels or nerve fibres, from medical and biomedical images is a crucial early step in automatic image interpretation associated with the management of many diseases. Precise measurement of the morphological changes of these curvilinear organ structures informs clinicians' understanding of the mechanism, diagnosis, and treatment of, e.g., cardiovascular, kidney, eye, lung, and neurological conditions. In this work, we propose a generic and unified convolutional neural network for the segmentation of curvilinear structures and illustrate it on several 2D/3D medical imaging modalities. We introduce a new curvilinear structure segmentation network (CS2-Net), which includes a self-attention mechanism in the encoder and decoder to learn rich hierarchical representations of curvilinear structures. Two types of attention modules - spatial attention and channel attention - are utilized to enhance the inter-class discrimination and intra-class responsiveness, to further integrate local features with their global dependencies and normalization, adaptively. Furthermore, to facilitate the segmentation of curvilinear structures in medical images, we employ a 1x3 and a 3x1 convolutional kernel to capture boundary features. ...
Submitted 19 October, 2020; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Variational Preparation of the Sachdev-Ye-Kitaev Thermofield Double
Authors:
Vincent Paul Su
Abstract:
We provide an algorithm for preparing the thermofield double (TFD) state of the Sachdev-Ye-Kitaev model without the need for an auxiliary bath. Following previous work, the TFD can be cast as the approximate ground state of a Hamiltonian, $H_{\text{TFD}}$. Using variational quantum circuits, we propose and implement a gradient-based algorithm for learning parameters that find this ground state, an application of the variational quantum eigensolver. Concretely, we find quantum circuits that prepare the ground state of $H_{\text{TFD}}$ for the $q=4$ SYK model up to $N=12$.
Submitted 10 December, 2020; v1 submitted 9 September, 2020;
originally announced September 2020.
-
Map-Adaptive Goal-Based Trajectory Prediction
Authors:
Lingyao Zhang,
Po-Hsun Su,
Jerrick Hoang,
Galen Clark Haynes,
Micol Marchetti-Bowick
Abstract:
We present a new method for multi-modal, long-term vehicle trajectory prediction. Our approach relies on using lane centerlines captured in rich maps of the environment to generate a set of proposed goal paths for each vehicle. Using these paths -- which are generated at run time and therefore dynamically adapt to the scene -- as spatial anchors, we predict a set of goal-based trajectories along with a categorical distribution over the goals. This approach allows us to directly model the goal-directed behavior of traffic actors, which unlocks the potential for more accurate long-term prediction. Our experimental results on both a large-scale internal driving dataset and on the public nuScenes dataset show that our model outperforms state-of-the-art approaches for vehicle trajectory prediction over a 6-second horizon. We also empirically demonstrate that our model is better able to generalize to road scenes from a completely new city than existing methods.
Submitted 13 November, 2020; v1 submitted 9 September, 2020;
originally announced September 2020.
-
Contrastive Visual-Linguistic Pretraining
Authors:
Lei Shi,
Kai Shuang,
Shijie Geng,
Peng Su,
Zhengkai Jiang,
Peng Gao,
Zuohui Fu,
Gerard de Melo,
Sen Su
Abstract:
Several multi-modality representation learning approaches such as LXMERT and ViLBERT have been proposed recently. Such approaches can achieve superior performance due to the high-level semantic information captured during large-scale multimodal pretraining. However, as ViLBERT and LXMERT adopt visual region regression and classification loss, they often suffer from domain gap and noisy label problems, based on the visual features having been pretrained on the Visual Genome dataset. To overcome these issues, we propose unbiased Contrastive Visual-Linguistic Pretraining (CVLP), which constructs a visual self-supervised loss built upon contrastive learning. We evaluate CVLP on several down-stream tasks, including VQA, GQA and NLVR2 to validate the superiority of contrastive learning on multi-modality representation learning. Our code is available at: https://github.com/ArcherYunDong/CVLP-.
Submitted 26 July, 2020;
originally announced July 2020.
-
Gradient Regularized Contrastive Learning for Continual Domain Adaptation
Authors:
Peng Su,
Shixiang Tang,
Peng Gao,
Di Qiu,
Ni Zhao,
Xiaogang Wang
Abstract:
Human beings can quickly adapt to environmental changes by leveraging learning experience. However, the poor ability of adapting to dynamic environments remains a major challenge for AI models. To better understand this issue, we study the problem of continual domain adaptation, where the model is presented with a labeled source domain and a sequence of unlabeled target domains. There are two major obstacles in this problem: domain shifts and catastrophic forgetting. In this work, we propose Gradient Regularized Contrastive Learning to solve the above obstacles. At the core of our method, gradient regularization plays two key roles: (1) enforces the gradient of contrastive loss not to increase the supervised training loss on the source domain, which maintains the discriminative power of learned features; (2) regularizes the gradient update on the new domain not to increase the classification loss on the old target domains, which enables the model to adapt to an in-coming target domain while preserving the performance of previously observed domains. Hence our method can jointly learn both semantically discriminative and domain-invariant features with labeled source domain and unlabeled target domains. The experiments on Digits, DomainNet and Office-Caltech benchmarks demonstrate the strong performance of our approach when compared to the state-of-the-art.
Submitted 25 July, 2020;
originally announced July 2020.
-
Adversarial Learning for Supervised and Semi-supervised Relation Extraction in Biomedical Literature
Authors:
Peng Su,
K. Vijay-Shanker
Abstract:
Adversarial training is a technique for improving model performance by involving adversarial examples in the training process. In this paper, we investigate adversarial training with multiple adversarial examples to benefit the relation extraction task. We also apply the adversarial training technique in semi-supervised scenarios to utilize unlabeled data. The evaluation results on protein-protein interaction and protein subcellular localization tasks illustrate that adversarial training improves the supervised model and is also effective at involving unlabeled data in the semi-supervised training case. In addition, our method achieves state-of-the-art performance on two benchmarking datasets.
Submitted 25 September, 2020; v1 submitted 8 May, 2020;
originally announced May 2020.
-
Adapting Object Detectors with Conditional Domain Normalization
Authors:
Peng Su,
Kun Wang,
Xingyu Zeng,
Shixiang Tang,
Dapeng Chen,
Di Qiu,
Xiaogang Wang
Abstract:
Real-world object detectors are often challenged by the domain gaps between different datasets. In this work, we present Conditional Domain Normalization (CDN) to bridge the domain gap. CDN is designed to encode different domain inputs into a shared latent space, where the features from different domains carry the same domain attribute. To achieve this, we first disentangle the domain-specific attribute out of the semantic features from one domain via a domain embedding module, which learns a domain-vector to characterize the corresponding domain attribute information. Then this domain-vector is used to encode the features from another domain through a conditional normalization, resulting in different domains' features carrying the same domain attribute. We incorporate CDN into various convolution stages of an object detector to adaptively address the domain shifts of representations at different levels. In contrast to existing adaptation works that conduct domain confusion learning on semantic features to remove domain-specific factors, CDN aligns different domain distributions by modulating the semantic features of one domain conditioned on the learned domain-vector of another domain. Extensive experiments show that CDN outperforms existing methods remarkably on both real-to-real and synthetic-to-real adaptation benchmarks, including 2D image detection and 3D point cloud detection.
Submitted 22 July, 2020; v1 submitted 16 March, 2020;
originally announced March 2020.
-
The Quantum Entropy Cone of Hypergraphs
Authors:
Ning Bao,
Newton Cheng,
Sergio Hernández-Cuenca,
Vincent P. Su
Abstract:
In this work, we generalize the graph-theoretic techniques used for the holographic entropy cone to study hypergraphs and their analogously-defined entropy cone. This allows us to develop a framework to efficiently compute entropies and prove inequalities satisfied by hypergraphs. In doing so, we discover a class of quantum entropy vectors which reach beyond those of holographic states and obey constraints intimately related to the ones obeyed by stabilizer states and linear ranks. We show that, at least up to 4 parties, the hypergraph cone is identical to the stabilizer entropy cone, thus demonstrating that the hypergraph framework is broadly applicable to the study of entanglement entropy. We conjecture that this equality continues to hold for higher party numbers and report on partial progress in this direction. To physically motivate this conjectured equivalence, we also propose a plausible method inspired by tensor networks to construct a quantum state from a given hypergraph such that their entropy vectors match.
Submitted 12 February, 2020;
originally announced February 2020.
-
ConveRT: Efficient and Accurate Conversational Representations from Transformers
Authors:
Matthew Henderson,
Iñigo Casanueva,
Nikola Mrkšić,
Pei-Hao Su,
Tsung-Hsien Wen,
Ivan Vulić
Abstract:
General-purpose pretrained sentence encoders such as BERT are not ideal for real-world conversational AI applications; they are computationally heavy, slow, and expensive to train. We propose ConveRT (Conversational Representations from Transformers), a pretraining framework for conversational tasks satisfying all the following requirements: it is effective, affordable, and quick to train. We pretrain using a retrieval-based response selection task, effectively leveraging quantization and subword-level parameterization in the dual encoder to build a lightweight memory- and energy-efficient model. We show that ConveRT achieves state-of-the-art performance across widely established response selection tasks. We also demonstrate that the use of extended dialog history as context yields further performance gains. Finally, we show that pretrained representations from the proposed encoder can be transferred to the intent classification task, yielding strong results across three diverse data sets. ConveRT trains substantially faster than standard sentence encoders or previous state-of-the-art dual encoders. With its reduced size and superior performance, we believe this model promises wider portability and scalability for Conversational AI applications.
Submitted 29 April, 2020; v1 submitted 9 November, 2019;
originally announced November 2019.
-
PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking
Authors:
Matthew Henderson,
Ivan Vulić,
Iñigo Casanueva,
Paweł Budzianowski,
Daniela Gerz,
Sam Coope,
Georgios Spithourakis,
Tsung-Hsien Wen,
Nikola Mrkšić,
Pei-Hao Su
Abstract:
We present PolyResponse, a conversational search engine that supports task-oriented dialogue. It is a retrieval-based approach that bypasses the complex multi-component design of traditional task-oriented dialogue systems and the use of explicit semantics in the form of task-specific ontologies. The PolyResponse engine is trained on hundreds of millions of examples extracted from real conversations: it learns what responses are appropriate in different conversational contexts. It then ranks a large index of text and visual responses according to their similarity to the given context, and narrows down the list of relevant entities during the multi-turn conversation. We introduce a restaurant search and booking system powered by the PolyResponse engine, currently available in 8 different languages.
Submitted 3 September, 2019;
originally announced September 2019.
-
Pinpointing Performance Inefficiencies in Java
Authors:
Pengfei Su,
Qingsen Wang,
Milind Chabbi,
Xu Liu
Abstract:
Many performance inefficiencies such as inappropriate choice of algorithms or data structures, developers' inattention to performance, and missed compiler optimizations show up as wasteful memory operations. Wasteful memory operations are those that produce/consume data to/from memory that may have been avoided. We present JXPerf, a lightweight performance analysis tool for pinpointing wasteful memory operations in Java programs. Traditional byte-code instrumentation for such analysis (1) introduces prohibitive overheads and (2) misses inefficiencies in machine code generation. JXPerf overcomes both of these problems. JXPerf uses hardware performance monitoring units to sample memory locations accessed by a program and uses hardware debug registers to monitor subsequent accesses to the same memory. The result is a lightweight measurement at machine-code level with attribution of inefficiencies to their provenance: machine and source code within full calling contexts. JXPerf introduces only 7% runtime overhead and 7% memory overhead, making it useful in production. Guided by JXPerf, we optimize several Java applications by improving code generation and choosing superior data structures and algorithms, which yield significant speedups.
Submitted 28 June, 2019;
originally announced June 2019.
-
Training Neural Response Selection for Task-Oriented Dialogue Systems
Authors:
Matthew Henderson,
Ivan Vulić,
Daniela Gerz,
Iñigo Casanueva,
Paweł Budzianowski,
Sam Coope,
Georgios Spithourakis,
Tsung-Hsien Wen,
Nikola Mrkšić,
Pei-Hao Su
Abstract:
Despite their popularity in the chatbot literature, retrieval-based models have had modest impact on task-oriented dialogue systems, with the main obstacle to their application being the low-data regime of most task-oriented dialogue tasks. Inspired by the recent success of pretraining in language modelling, we propose an effective method for deploying response selection in task-oriented dialogue. To train response selection models for task-oriented dialogue tasks, we propose a novel method which: 1) pretrains the response selection model on large general-domain conversational corpora; and then 2) fine-tunes the pretrained model for the target dialogue domain, relying only on the small in-domain dataset to capture the nuances of the given dialogue domain. Our evaluation on six diverse application domains, ranging from e-commerce to banking, demonstrates the effectiveness of the proposed training method.
Submitted 7 June, 2019; v1 submitted 4 June, 2019;
originally announced June 2019.
-
Defocused images removal of axial overlapping scattering particles by using three-dimensional nonlinear diffusion based on digital holography
Authors:
Wei-Na Li,
Zhengyun Zhang,
Jianshe Ma,
Xiaohao Wang,
Ping Su
Abstract:
We propose a three-dimensional nonlinear diffusion method that provides an autofocusing-like function for multiple micro-objects while simultaneously removing the defocused images, so that the locations of scattering particles of certain sizes that overlap along the z-axis can be distinguished. It is applied to all of the reconstruction slices generated from the captured hologram after each back propagation. For particles of certain small sizes, after the proposed scheme is applied, the largest of the per-slice maximum gradient magnitudes appears at the ground-truth z position, provided the reconstruction range along the z-axis is sufficiently long and the reconstruction depth spacing is sufficiently fine. Therefore, the reconstructed image at the ground-truth z position is retained, while the defocused images are diffused out. The results demonstrate that the proposed scheme can diffuse out defocused images that are 20 um away from the ground-truth z position, even though several scattering particles with different diameters completely overlap along the z-axis at a distance of 800 um, when the hologram pixel pitch is 2 um. They also demonstrate that the sparsity distribution of the ground-truth z slice is not affected by the sparsity distribution of the corresponding defocused images when the particle diameter is not more than 35 um and the reconstruction depth spacing is not less than 20 um.
Submitted 14 August, 2019; v1 submitted 23 April, 2019;
originally announced April 2019.
-
A Repository of Conversational Datasets
Authors:
Matthew Henderson,
Paweł Budzianowski,
Iñigo Casanueva,
Sam Coope,
Daniela Gerz,
Girish Kumar,
Nikola Mrkšić,
Georgios Spithourakis,
Pei-Hao Su,
Ivan Vulić,
Tsung-Hsien Wen
Abstract:
Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using '1-of-100 accuracy'. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set.
Submitted 28 May, 2019; v1 submitted 12 April, 2019;
originally announced April 2019.
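Note on '1-of-100 accuracy': the metric named in the abstract above can be read as a ranking test over batches of 100 context-response pairs, where each context must rank its own response above the 99 other responses in the batch. The sketch below is a minimal illustration under that reading, not the repository's implementation. The function name `one_of_n_accuracy`, the dot-product scoring, and the random toy encodings are all assumptions made for illustration.

```python
import numpy as np


def one_of_n_accuracy(context_vecs, response_vecs):
    """Fraction of contexts whose own response scores highest in the batch.

    Row i of context_vecs is assumed to be paired with row i of
    response_vecs; the other rows act as negative candidates.
    """
    # (N, N) matrix of dot-product scores: entry [i, j] scores
    # context i against response j (scoring function is an assumption).
    scores = context_vecs @ response_vecs.T
    predicted = scores.argmax(axis=1)
    targets = np.arange(context_vecs.shape[0])
    return float((predicted == targets).mean())


# Toy batch of 100 pairs: contexts are noisy copies of their responses.
rng = np.random.default_rng(0)
responses = rng.normal(size=(100, 16))
contexts = responses + 0.1 * rng.normal(size=(100, 16))
acc = one_of_n_accuracy(contexts, responses)
```

With a batch size of 100, a model that scores candidates at random would be expected to land near 0.01 on this measure, which is what makes it a convenient fixed-size comparison across datasets.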