-
Reverse-Speech-Finder: A Neural Network Backtracking Architecture for Generating Alzheimer's Disease Speech Samples and Improving Diagnosis Performance
Authors:
Victor OK Li,
Yang Han,
Jacqueline CK Lam,
Lawrence YL Cheung
Abstract:
This study introduces Reverse-Speech-Finder (RSF), a groundbreaking neural network backtracking architecture designed to enhance Alzheimer's Disease (AD) diagnosis through speech analysis. Leveraging the power of pre-trained large language models, RSF identifies and utilizes the most probable AD-specific speech markers, addressing both the scarcity of real AD speech samples and the challenge of li…
▽ More
This study introduces Reverse-Speech-Finder (RSF), a groundbreaking neural network backtracking architecture designed to enhance Alzheimer's Disease (AD) diagnosis through speech analysis. Leveraging the power of pre-trained large language models, RSF identifies and utilizes the most probable AD-specific speech markers, addressing both the scarcity of real AD speech samples and the challenge of limited interpretability in existing models. RSF's unique approach consists of three core innovations: Firstly, it exploits the observation that speech markers most probable of predicting AD, defined as the most probable speech-markers (MPMs), must have the highest probability of activating those neurons (in the neural network) with the highest probability of predicting AD, defined as the most probable neurons (MPNs). Secondly, it utilizes a speech token representation at the input layer, allowing backtracking from MPNs to identify the most probable speech-tokens (MPTs) of AD. Lastly, it develops an innovative backtracking method to track backwards from the MPNs to the input layer, identifying the MPTs and the corresponding MPMs, and ingeniously uncovering novel speech markers for AD detection. Experimental results demonstrate RSF's superiority over traditional methods such as SHAP and Integrated Gradients, achieving a 3.5% improvement in accuracy and a 3.2% boost in F1-score. By generating speech data that encapsulates novel markers, RSF not only mitigates the limitations of real data scarcity but also significantly enhances the robustness and accuracy of AD diagnostic models. These findings underscore RSF's potential as a transformative tool in speech-based AD detection, offering new insights into AD-related linguistic deficits and paving the way for more effective non-invasive early intervention strategies.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
Memorization vs. Reasoning: Updating LLMs with New Knowledge
Authors:
Aochong Oliver Li,
Tanya Goyal
Abstract:
Large language models (LLMs) encode vast amounts of pre-trained knowledge in their parameters, but updating them as real-world information evolves remains a challenge. Existing methodologies and benchmarks primarily target entity substitutions, failing to capture the full breadth of complex real-world dynamics. In this paper, we introduce Knowledge Update Playground (KUP), an automatic pipeline fo…
▽ More
Large language models (LLMs) encode vast amounts of pre-trained knowledge in their parameters, but updating them as real-world information evolves remains a challenge. Existing methodologies and benchmarks primarily target entity substitutions, failing to capture the full breadth of complex real-world dynamics. In this paper, we introduce Knowledge Update Playground (KUP), an automatic pipeline for simulating realistic knowledge updates reflected in an evidence corpora. KUP's evaluation framework includes direct and indirect probes to both test memorization of updated facts and reasoning over them, for any update learning methods. Next, we present a lightweight method called memory conditioned training (MCT), which conditions tokens in the update corpus on self-generated "memory" tokens during training. Our strategy encourages LLMs to surface and reason over newly memorized knowledge at inference. Our results on two strong LLMs show that (1) KUP benchmark is highly challenging, with the best CPT models achieving $<2\%$ in indirect probing setting (reasoning) and (2) MCT training significantly outperforms prior continued pre-training (CPT) baselines, improving direct probing (memorization) results by up to $25.4\%$.
△ Less
Submitted 16 April, 2025;
originally announced April 2025.
-
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
Authors:
Ouxiang Li,
Yuan Wang,
Xinting Hu,
Houcheng Jiang,
Tao Liang,
Yanbin Hao,
Guojun Ma,
Fuli Feng
Abstract:
Erasing concepts from large-scale text-to-image (T2I) diffusion models has become increasingly crucial due to the growing concerns over copyright infringement, offensive content, and privacy violations. However, existing methods either require costly fine-tuning or degrade image quality for non-target concepts (i.e., prior) due to inherent optimization limitations. In this paper, we introduce SPEE…
▽ More
Erasing concepts from large-scale text-to-image (T2I) diffusion models has become increasingly crucial due to the growing concerns over copyright infringement, offensive content, and privacy violations. However, existing methods either require costly fine-tuning or degrade image quality for non-target concepts (i.e., prior) due to inherent optimization limitations. In this paper, we introduce SPEED, a model editing-based concept erasure approach that leverages null-space constraints for scalable, precise, and efficient erasure. Specifically, SPEED incorporates Influence-based Prior Filtering (IPF) to retain the most affected non-target concepts during erasing, Directed Prior Augmentation (DPA) to expand prior coverage while maintaining semantic consistency, and Invariant Equality Constraints (IEC) to regularize model editing by explicitly preserving key invariants during the T2I generation process. Extensive evaluations across multiple concept erasure tasks demonstrate that SPEED consistently outperforms existing methods in prior preservation while achieving efficient and high-fidelity concept erasure, successfully removing 100 concepts within just 5 seconds. Our code and models are available at: https://github.com/Ouxiang-Li/SPEED.
△ Less
Submitted 10 March, 2025;
originally announced March 2025.
-
DECT: Harnessing LLM-assisted Fine-Grained Linguistic Knowledge and Label-Switched and Label-Preserved Data Generation for Diagnosis of Alzheimer's Disease
Authors:
Tingyu Mo,
Jacqueline C. K. Lam,
Victor O. K. Li,
Lawrence Y. L. Cheung
Abstract:
Alzheimer's Disease (AD) is an irreversible neurodegenerative disease affecting 50 million people worldwide. Low-cost, accurate identification of key markers of AD is crucial for timely diagnosis and intervention. Language impairment is one of the earliest signs of cognitive decline, which can be used to discriminate AD patients from normal control individuals. Patient-interviewer dialogues may be…
▽ More
Alzheimer's Disease (AD) is an irreversible neurodegenerative disease affecting 50 million people worldwide. Low-cost, accurate identification of key markers of AD is crucial for timely diagnosis and intervention. Language impairment is one of the earliest signs of cognitive decline, which can be used to discriminate AD patients from normal control individuals. Patient-interviewer dialogues may be used to detect such impairments, but they are often mixed with ambiguous, noisy, and irrelevant information, making the AD detection task difficult. Moreover, the limited availability of AD speech samples and variability in their speech styles pose significant challenges in developing robust speech-based AD detection models. To address these challenges, we propose DECT, a novel speech-based domain-specific approach leveraging large language models (LLMs) for fine-grained linguistic analysis and label-switched label-preserved data generation. Our study presents four novelties: We harness the summarizing capabilities of LLMs to identify and distill key Cognitive-Linguistic information from noisy speech transcripts, effectively filtering irrelevant information. We leverage the inherent linguistic knowledge of LLMs to extract linguistic markers from unstructured and heterogeneous audio transcripts. We exploit the compositional ability of LLMs to generate AD speech transcripts consisting of diverse linguistic patterns to overcome the speech data scarcity challenge and enhance the robustness of AD detection models. We use the augmented AD textual speech transcript dataset and a more fine-grained representation of AD textual speech transcript data to fine-tune the AD detection model. The results have shown that DECT demonstrates superior model performance with an 11% improvement in AD detection accuracy on the datasets from DementiaBank compared to the baselines.
△ Less
Submitted 26 May, 2025; v1 submitted 5 February, 2025;
originally announced February 2025.
-
Unravelling Causal Genetic Biomarkers of Alzheimer's Disease via Neuron to Gene-token Backtracking in Neural Architecture: A Groundbreaking Reverse-Gene-Finder Approach
Authors:
Victor OK Li,
Yang Han,
Jacqueline CK Lam
Abstract:
Alzheimer's Disease (AD) affects over 55 million people globally, yet the key genetic contributors remain poorly understood. Leveraging recent advancements in genomic foundation models, we present the innovative Reverse-Gene-Finder technology, a ground-breaking neuron-to-gene-token backtracking approach in a neural network architecture to elucidate the novel causal genetic biomarkers driving AD on…
▽ More
Alzheimer's Disease (AD) affects over 55 million people globally, yet the key genetic contributors remain poorly understood. Leveraging recent advancements in genomic foundation models, we present the innovative Reverse-Gene-Finder technology, a ground-breaking neuron-to-gene-token backtracking approach in a neural network architecture to elucidate the novel causal genetic biomarkers driving AD onset. Reverse-Gene-Finder comprises three key innovations. Firstly, we exploit the observation that genes with the highest probability of causing AD, defined as the most causal genes (MCGs), must have the highest probability of activating those neurons with the highest probability of causing AD, defined as the most causal neurons (MCNs). Secondly, we utilize a gene token representation at the input layer to allow each gene (known or novel to AD) to be represented as a discrete and unique entity in the input space. Lastly, in contrast to the existing neural network architectures, which track neuron activations from the input layer to the output layer in a feed-forward manner, we develop an innovative backtracking method to track backwards from the MCNs to the input layer, identifying the Most Causal Tokens (MCTs) and the corresponding MCGs. Reverse-Gene-Finder is highly interpretable, generalizable, and adaptable, providing a promising avenue for application in other disease scenarios.
△ Less
Submitted 6 February, 2025;
originally announced February 2025.
-
Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
Authors:
Yuan Wang,
Ouxiang Li,
Tingting Mu,
Yanbin Hao,
Kuien Liu,
Xiang Wang,
Xiangnan He
Abstract:
Recent success of text-to-image (T2I) generation and its increasing practical applications, enabled by diffusion models, require urgent consideration of erasing unwanted concepts, e.g., copyrighted, offensive, and unsafe ones, from the pre-trained models in a precise, timely, and low-cost manner. The twofold demand of concept erasure includes not only a precise removal of the target concept (i.e.,…
▽ More
Recent success of text-to-image (T2I) generation and its increasing practical applications, enabled by diffusion models, require urgent consideration of erasing unwanted concepts, e.g., copyrighted, offensive, and unsafe ones, from the pre-trained models in a precise, timely, and low-cost manner. The twofold demand of concept erasure includes not only a precise removal of the target concept (i.e., erasure efficacy) but also a minimal change on non-target content (i.e., prior preservation), during generation. Existing methods face challenges in maintaining an effective balance between erasure efficacy and prior preservation, and they can be computationally costly. To improve, we propose a precise, fast, and low-cost concept erasure method, called Adaptive Value Decomposer (AdaVD), which is training-free. Our method is grounded in a classical linear algebraic operation of computing the orthogonal complement, implemented in the value space of each cross-attention layer within the UNet of diffusion models. We design a shift factor to adaptively navigate the erasure strength, enhancing effective prior preservation without sacrificing erasure efficacy. Extensive comparative experiments with both training-based and training-free state-of-the-art methods demonstrate that the proposed AdaVD excels in both single and multiple concept erasure, showing 2 to 10 times improvement in prior preservation than the second best, meanwhile achieving the best or near best erasure efficacy. AdaVD supports a series of diffusion models and downstream image generation tasks, with code available on: https://github.com/WYuan1001/AdaVD.
△ Less
Submitted 30 March, 2025; v1 submitted 8 December, 2024;
originally announced December 2024.
-
A Simple and Fast Algorithm for Fair Cuts
Authors:
Jason Li,
Owen Li
Abstract:
We present a simple and faster algorithm for computing fair cuts on undirected graphs, a concept introduced in recent work of Li et al. (SODA 2023). Informally, for any parameter $ε>0$, a $(1+ε)$-fair $(s,t)$-cut is an $(s,t)$-cut such that there exists an $(s,t)$-flow that uses $1/(1+ε)$ fraction of the capacity of every edge in the cut. Our algorithm computes a $(1+ε)$-fair cut in…
▽ More
We present a simple and faster algorithm for computing fair cuts on undirected graphs, a concept introduced in recent work of Li et al. (SODA 2023). Informally, for any parameter $ε>0$, a $(1+ε)$-fair $(s,t)$-cut is an $(s,t)$-cut such that there exists an $(s,t)$-flow that uses $1/(1+ε)$ fraction of the capacity of every edge in the cut. Our algorithm computes a $(1+ε)$-fair cut in $\tilde O(m/ε)$ time, improving on the $\tilde O(m/ε^3)$ time algorithm of Li et al. and matching the $\tilde O(m/ε)$ time algorithm of Sherman (STOC 2017) for standard $(1+ε)$-approximate min-cut.
Our main idea is to run Sherman's approximate max-flow/min-cut algorithm iteratively on a (directed) residual graph. While Sherman's algorithm is originally stated for undirected graphs, we show that it provides guarantees for directed graphs that are good enough for our purposes.
△ Less
Submitted 28 November, 2024;
originally announced November 2024.
-
Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective
Authors:
Ouxiang Li,
Jiayin Cai,
Yanbin Hao,
Xiaolong Jiang,
Yao Hu,
Fuli Feng
Abstract:
With recent generative models facilitating photo-realistic image synthesis, the proliferation of synthetic images has also engendered certain negative impacts on social platforms, thereby raising an urgent imperative to develop effective detectors. Current synthetic image detection (SID) pipelines are primarily dedicated to crafting universal artifact features, accompanied by an oversight about SI…
▽ More
With recent generative models facilitating photo-realistic image synthesis, the proliferation of synthetic images has also engendered certain negative impacts on social platforms, thereby raising an urgent imperative to develop effective detectors. Current synthetic image detection (SID) pipelines are primarily dedicated to crafting universal artifact features, accompanied by an oversight about SID training paradigm. In this paper, we re-examine the SID problem and identify two prevalent biases in current training paradigms, i.e., weakened artifact features and overfitted artifact features. Meanwhile, we discover that the imaging mechanism of synthetic images contributes to heightened local correlations among pixels, suggesting that detectors should be equipped with local awareness. In this light, we propose SAFE, a lightweight and effective detector with three simple image transformations. Firstly, for weakened artifact features, we substitute the down-sampling operator with the crop operator in image pre-processing to help circumvent artifact distortion. Secondly, for overfitted artifact features, we include ColorJitter and RandomRotation as additional data augmentations, to help alleviate irrelevant biases from color discrepancies and semantic differences in limited training samples. Thirdly, for local awareness, we propose a patch-based random masking strategy tailored for SID, forcing the detector to focus on local regions at training. Comparative experiments are conducted on an open-world dataset, comprising synthetic images generated by 26 distinct generative models. Our pipeline achieves a new state-of-the-art performance, with remarkable improvements of 4.5% in accuracy and 2.9% in average precision against existing methods. Our code is available at: https://github.com/Ouxiang-Li/SAFE.
△ Less
Submitted 4 January, 2025; v1 submitted 13 August, 2024;
originally announced August 2024.
-
Model Inversion Attacks Through Target-Specific Conditional Diffusion Models
Authors:
Ouxiang Li,
Yanbin Hao,
Zhicai Wang,
Bin Zhu,
Shuo Wang,
Zaixi Zhang,
Fuli Feng
Abstract:
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications. Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space. To alleviate these issues, leveraging on diffusion models' remarkable synthesis capabilities, w…
▽ More
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications. Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space. To alleviate these issues, leveraging on diffusion models' remarkable synthesis capabilities, we propose Diffusion-based Model Inversion (Diff-MI) attacks. Specifically, we introduce a novel target-specific conditional diffusion model (CDM) to purposely approximate target classifier's private distribution and achieve superior accuracy-fidelity balance. Our method involves a two-step learning paradigm. Step-1 incorporates the target classifier into the entire CDM learning under a pretrain-then-finetune fashion, with creating pseudo-labels as model conditions in pretraining and adjusting specified layers with image predictions in fine-tuning. Step-2 presents an iterative image reconstruction method, further enhancing the attack performance through a combination of diffusion priors and target knowledge. Additionally, we propose an improved max-margin loss that replaces the hard max with top-k maxes, fully leveraging feature information and soft labels from the target classifier. Extensive experiments demonstrate that Diff-MI significantly improves generative fidelity with an average decrease of 20\% in FID while maintaining competitive attack accuracy compared to state-of-the-art methods across various datasets and models. Our code is available at: \url{https://github.com/Ouxiang-Li/Diff-MI}.
△ Less
Submitted 21 November, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
A Sanity Check for AI-generated Image Detection
Authors:
Shilin Yan,
Ouxiang Li,
Jiayin Cai,
Yanbin Hao,
Xiaolong Jiang,
Yao Hu,
Weidi Xie
Abstract:
With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been solved". To start with, we present Chameleon dataset, consisting AIgenerated images that are genuinely challenging for human perception. To quantify th…
▽ More
With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been solved". To start with, we present Chameleon dataset, consisting AIgenerated images that are genuinely challenging for human perception. To quantify the generalization of existing methods, we evaluate 9 off-the-shelf AI-generated image detectors on Chameleon dataset. Upon analysis, almost all models classify AI-generated images as real ones. Later, we propose AIDE (AI-generated Image DEtector with Hybrid Features), which leverages multiple experts to simultaneously extract visual artifacts and noise patterns. Specifically, to capture the high-level semantics, we utilize CLIP to compute the visual embedding. This effectively enables the model to discern AI-generated images based on semantics or contextual information; Secondly, we select the highest frequency patches and the lowest frequency patches in the image, and compute the low-level patchwise features, aiming to detect AI-generated images by low-level artifacts, for example, noise pattern, anti-aliasing, etc. While evaluating on existing benchmarks, for example, AIGCDetectBenchmark and GenImage, AIDE achieves +3.5% and +4.6% improvements to state-of-the-art methods, and on our proposed challenging Chameleon benchmarks, it also achieves the promising results, despite this problem for detecting AI-generated images is far from being solved.
△ Less
Submitted 15 February, 2025; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
Authors:
Aashiq Muhamed,
Oscar Li,
David Woodruff,
Mona Diab,
Virginia Smith
Abstract:
Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory. While existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, they typically rely on dense projection matrices, which can introduce computational and memory overheads. In this work, we propose Grass (GRAdien…
▽ More
Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory. While existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, they typically rely on dense projection matrices, which can introduce computational and memory overheads. In this work, we propose Grass (GRAdient Stuctured Sparsification), a novel approach that leverages sparse projections to transform gradients into structured sparse updates. This design not only significantly reduces memory usage for optimizer states but also minimizes gradient memory footprint, computation, and communication costs, leading to substantial throughput improvements. Extensive experiments on pretraining and finetuning tasks demonstrate that Grass achieves competitive performance to full-rank training and existing projection-based methods. Notably, Grass enables half-precision pretraining of a 13B parameter LLaMA model on a single 40GB A100 GPU--a feat infeasible for previous methods--and yields up to a $2\times$ throughput improvement on an 8-GPU system. Code can be found at https://github.com/aashiqmuhamed/GRASS .
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Themis: Automatic and Efficient Deep Learning System Testing with Strong Fault Detection Capability
Authors:
Dong Huang,
Tsz On Li,
Xiaofei Xie,
Heming Cui
Abstract:
Deep Learning Systems (DLSs) have been widely applied in safety-critical tasks such as autopilot. However, when a perturbed input is fed into a DLS for inference, the DLS often has incorrect outputs (i.e., faults). DLS testing techniques (e.g., DeepXplore) detect such faults by generating perturbed inputs to explore data flows that induce faults. Since a DLS often has infinitely many data flows, e…
▽ More
Deep Learning Systems (DLSs) have been widely applied in safety-critical tasks such as autopilot. However, when a perturbed input is fed into a DLS for inference, the DLS often has incorrect outputs (i.e., faults). DLS testing techniques (e.g., DeepXplore) detect such faults by generating perturbed inputs to explore data flows that induce faults. Since a DLS often has infinitely many data flows, existing techniques require developers to manually specify a set of activation values in a DLS's neurons for exploring fault-inducing data flows. Unfortunately, recent studies show that such manual effort is tedious and can detect only a tiny proportion of fault-inducing data flows.
In this paper, we present Themis, the first automatic DLS testing system, which attains strong fault detection capability by ensuring a full coverage of fault-inducing data flows at a high probability. Themis carries a new workflow for automatically and systematically revealing data flows whose internal neurons' outputs vary substantially when the inputs are slightly perturbed, as these data flows are likely fault-inducing. We evaluated Themis on ten different DLSs and found that on average the number of faults detected by Themis was 3.78X more than four notable DLS testing techniques. By retraining all evaluated DLSs with the detected faults, Themis also increased (regained) these DLSs' accuracies on average 14.7X higher than all baselines.
△ Less
Submitted 16 August, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
OmniPred: Language Models as Universal Regressors
Authors:
Xingyou Song,
Oscar Li,
Chansoo Lee,
Bangding Yang,
Daiyi Peng,
Sagi Perel,
Yutian Chen
Abstract:
Regression is a powerful tool to accurately predict the outcome metric of a system given a set of parameters, but has traditionally been restricted to methods which are only applicable to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over $(x,y)$ data from arbitrary formats. Using data sourced from Google Vizier, on…
▽ More
Regression is a powerful tool to accurately predict the outcome metric of a system given a set of parameters, but has traditionally been restricted to methods which are only applicable to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over $(x,y)$ data from arbitrary formats. Using data sourced from Google Vizier, one of the largest proprietary blackbox optimization databases in the world, our extensive experiments demonstrate that language models are capable of very precise numerical regression using only textual representations of mathematical parameters and values, and if given the opportunity to train at scale over multiple tasks, can significantly outperform traditional regression models.
△ Less
Submitted 30 January, 2025; v1 submitted 22 February, 2024;
originally announced February 2024.
-
NormDial: A Comparable Bilingual Synthetic Dialog Dataset for Modeling Social Norm Adherence and Violation
Authors:
Oliver Li,
Mallika Subramanian,
Arkadiy Saakyan,
Sky CH-Wang,
Smaranda Muresan
Abstract:
Social norms fundamentally shape interpersonal communication. We present NormDial, a high-quality dyadic dialogue dataset with turn-by-turn annotations of social norm adherences and violations for Chinese and American cultures. Introducing the task of social norm observance detection, our dataset is synthetically generated in both Chinese and English using a human-in-the-loop pipeline by prompting…
▽ More
Social norms fundamentally shape interpersonal communication. We present NormDial, a high-quality dyadic dialogue dataset with turn-by-turn annotations of social norm adherences and violations for Chinese and American cultures. Introducing the task of social norm observance detection, our dataset is synthetically generated in both Chinese and English using a human-in-the-loop pipeline by prompting large language models with a small collection of expert-annotated social norms. We show that our generated dialogues are of high quality through human evaluation and further evaluate the performance of existing large language models on this task. Our findings point towards new directions for understanding the nuances of social norms as they manifest in conversational contexts that span across languages and cultures.
△ Less
Submitted 24 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Sociocultural Norm Similarities and Differences via Situational Alignment and Explainable Textual Entailment
Authors:
Sky CH-Wang,
Arkadiy Saakyan,
Oliver Li,
Zhou Yu,
Smaranda Muresan
Abstract:
Designing systems that can reason across cultures requires that they are grounded in the norms of the contexts in which they operate. However, current research on developing computational models of social norms has primarily focused on American society. Here, we propose a novel approach to discover and compare descriptive social norms across Chinese and American cultures. We demonstrate our approa…
▽ More
Designing systems that can reason across cultures requires that they are grounded in the norms of the contexts in which they operate. However, current research on developing computational models of social norms has primarily focused on American society. Here, we propose a novel approach to discover and compare descriptive social norms across Chinese and American cultures. We demonstrate our approach by leveraging discussions on a Chinese Q&A platform (Zhihu) and the existing SocialChemistry dataset as proxies for contrasting cultural axes, align social situations cross-culturally, and extract social norms from texts using in-context learning. Embedding Chain-of-Thought prompting in a human-AI collaborative framework, we build a high-quality dataset of 3,069 social norms aligned with social situations across Chinese and American cultures alongside corresponding free-text explanations. To test the ability of models to reason about social norms across cultures, we introduce the task of explainable social norm entailment, showing that existing models under 3B parameters have significant room for improvement in both automatic and human evaluation. Further analysis of cross-cultural norm differences based on our dataset shows empirical alignment with the social orientations framework, revealing several situational and descriptive nuances in norms across these cultures.
△ Less
Submitted 22 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies
Authors:
Oscar Li,
James Harrison,
Jascha Sohl-Dickstein,
Virginia Smith,
Luke Metz
Abstract:
Unrolled computation graphs are prevalent throughout machine learning but present challenges to automatic differentiation (AD) gradient estimation methods when their loss functions exhibit extreme local sensitivtiy, discontinuity, or blackbox characteristics. In such scenarios, online evolution strategies methods are a more capable alternative, while being more parallelizable than vanilla evolutio…
▽ More
Unrolled computation graphs are prevalent throughout machine learning but present challenges to automatic differentiation (AD) gradient estimation methods when their loss functions exhibit extreme local sensitivtiy, discontinuity, or blackbox characteristics. In such scenarios, online evolution strategies methods are a more capable alternative, while being more parallelizable than vanilla evolution strategies (ES) by interleaving partial unrolls and gradient updates. In this work, we propose a general class of unbiased online evolution strategies methods. We analytically and empirically characterize the variance of this class of gradient estimators and identify the one with the least variance, which we term Noise-Reuse Evolution Strategies (NRES). Experimentally, we show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of unroll steps across a variety of applications, including learning dynamical systems, meta-training learned optimizers, and reinforcement learning.
△ Less
Submitted 9 December, 2023; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Is ChatGPT the Ultimate Programming Assistant -- How far is it?
Authors:
Haoye Tian,
Weiqi Lu,
Tsz On Li,
Xunzhu Tang,
Shing-Chi Cheung,
Jacques Klein,
Tegawendé F. Bissyandé
Abstract:
Recently, the ChatGPT LLM has received great attention: it can be used as a bot for discussing source code, prompting it to suggest changes, provide descriptions or even generate code. Typical demonstrations generally focus on existing benchmarks, which may have been used in model training (i.e., data leakage). To assess the feasibility of using an LLM as a useful assistant bot for programmers, we…
▽ More
Recently, the ChatGPT LLM has received great attention: it can be used as a bot for discussing source code, prompting it to suggest changes, provide descriptions or even generate code. Typical demonstrations generally focus on existing benchmarks, which may have been used in model training (i.e., data leakage). To assess the feasibility of using an LLM as a useful assistant bot for programmers, we must assess its realistic capabilities on unseen problems as well as its capabilities on various tasks. In this paper, we present an empirical study of ChatGPT's potential as a fully automated programming assistant, focusing on the tasks of code generation, program repair, and code summariziation. The study investigates ChatGPT's performance on common programming problems and compares it with state-of-the-art approaches on two benchmarks. Among several findings, our study shows that ChatGPT is effective in dealing with common programming problems. However, our experiments also reveal limitations in terms of its attention span: detailed descriptions will constrain the focus of ChatGPT and prevent it from leveraging its vast knowledge to solve the actual problem. Surprisingly, we have identified the ability of ChatGPT to reason the original intention of the code. We expect future work to build on this insight for dealing with the open question of the oracle problem. Our findings contribute interesting insights to the development of LLMs for programming assistance, notably by demonstrating the importance of prompt engineering, and providing a better understanding of ChatGPT's practical applications for software engineering.
△ Less
Submitted 31 August, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
Bi-directional Distribution Alignment for Transductive Zero-Shot Learning
Authors:
Zhicai Wang,
Yanbin Hao,
Tingting Mu,
Ouxiang Li,
Shuo Wang,
Xiangnan He
Abstract:
It is well-known that zero-shot learning (ZSL) can suffer severely from the problem of domain shift, where the true and learned data distributions for the unseen classes do not match. Although transductive ZSL (TZSL) attempts to improve this by allowing the use of unlabelled examples from the unseen classes, there is still a high level of distribution shift. We propose a novel TZSL model (named as…
▽ More
It is well-known that zero-shot learning (ZSL) can suffer severely from the problem of domain shift, where the true and learned data distributions for the unseen classes do not match. Although transductive ZSL (TZSL) attempts to improve this by allowing the use of unlabelled examples from the unseen classes, there is still a high level of distribution shift. We propose a novel TZSL model (named as Bi-VAEGAN), which largely improves the shift by a strengthened distribution alignment between the visual and auxiliary spaces. The key proposal of the model design includes (1) a bi-directional distribution alignment, (2) a simple but effective L_2-norm based feature normalization approach, and (3) a more sophisticated unseen class prior estimation approach. In benchmark evaluation using four datasets, Bi-VAEGAN achieves the new state of the arts under both the standard and generalized TZSL settings. Code could be found at https://github.com/Zhicaiwww/Bi-VAEGAN
△ Less
Submitted 19 March, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Affective Idiosyncratic Responses to Music
Authors:
Sky CH-Wang,
Evan Li,
Oliver Li,
Smaranda Muresan,
Zhou Yu
Abstract:
Affective responses to music are highly personal. Despite consensus that idiosyncratic factors play a key role in regulating how listeners emotionally respond to music, precisely measuring the marginal effects of these variables has proved challenging. To address this gap, we develop computational methods to measure affective responses to music from over 403M listener comments on a Chinese social…
▽ More
Affective responses to music are highly personal. Despite consensus that idiosyncratic factors play a key role in regulating how listeners emotionally respond to music, precisely measuring the marginal effects of these variables has proved challenging. To address this gap, we develop computational methods to measure affective responses to music from over 403M listener comments on a Chinese social music platform. Building on studies from music psychology in systematic and quasi-causal analyses, we test for musical, lyrical, contextual, demographic, and mental health effects that drive listener affective responses. Finally, motivated by the social phenomenon known as wǎng-yì-yún, we identify influencing factors of platform user self-disclosures, the social support they receive, and notable differences in discloser user activity.
△ Less
Submitted 17 October, 2022;
originally announced October 2022.
-
COMET: Coverage-guided Model Generation For Deep Learning Library Testing
Authors:
Meiziniu Li,
Jialun Cao,
Yongqiang Tian,
Tsz On Li,
Ming Wen,
Shing-Chi Cheung
Abstract:
Recent deep learning (DL) applications are mostly built on top of DL libraries. The quality assurance of these libraries is critical to the dependable deployment of DL applications. Techniques have been proposed to generate various DL models and apply them to test these libraries. However, their test effectiveness is constrained by the diversity of layer API calls in their generated DL models. Our…
▽ More
Recent deep learning (DL) applications are mostly built on top of DL libraries. The quality assurance of these libraries is critical to the dependable deployment of DL applications. Techniques have been proposed to generate various DL models and apply them to test these libraries. However, their test effectiveness is constrained by the diversity of layer API calls in their generated DL models. Our study reveals that these techniques can cover at most 34.1% layer inputs, 25.9% layer parameter values, and 15.6% layer sequences. As a result, we find that many bugs arising from specific layer API calls (i.e., specific layer inputs, parameter values, or layer sequences) can be missed by existing techniques. Because of this limitation, we propose COMET to effectively generate DL models with diverse layer API calls for DL library testing. COMET: (1) designs a set of mutation operators and a coverage-based search algorithm to diversify layer inputs, layer parameter values, and layer sequences in DL models. (2) proposes a model synthesis method to boost the test efficiency without compromising the layer API call diversity. Our evaluation result shows that COMET outperforms baselines by covering twice as many layer inputs (69.7% vs. 34.1%), layer parameter values (50.2% vs. 25.9%), and layer sequences (39.0% vs. 15.6%) as those by the state-of-the-art. Moreover, COMET covers 3.4% more library branches than those by existing techniques. Finally, COMET detects 32 new bugs in the latest version of eight popular DL libraries, including TensorFlow and MXNet, with 21 of them confirmed by DL library developers and 7 of those confirmed bugs have been fixed by developers.
△ Less
Submitted 30 January, 2023; v1 submitted 2 August, 2022;
originally announced August 2022.
-
SQRQuerier: A Visual Querying Framework for Cross-national Survey Data Recycling
Authors:
Yamei Tu,
Olga Li,
Junpeng Wang,
Han-Wei Shen,
Przemek Powalko,
Irina Tomescu-Dubrow,
Kazimierz M. Slomczynski,
Spyros Blanas,
J. Craig Jenkins
Abstract:
Public opinion surveys constitute a powerful tool to study peoples' attitudes and behaviors in comparative perspectives. However, even worldwide surveys provide only partial geographic and time coverage, which hinders comprehensive knowledge production. To broaden the scope of comparison, social scientists turn to ex-post harmonization of variables from datasets that cover similar topics but in di…
▽ More
Public opinion surveys constitute a powerful tool to study peoples' attitudes and behaviors in comparative perspectives. However, even worldwide surveys provide only partial geographic and time coverage, which hinders comprehensive knowledge production. To broaden the scope of comparison, social scientists turn to ex-post harmonization of variables from datasets that cover similar topics but in different populations and/or years. The resulting new datasets can be analyzed as a single source, which can be flexibly accessed through many data portals. However, such portals offer little guidance to explore the data in-depth or query data with user-customized needs. As a result, it is still challenging for social scientists to efficiently identify related data for their studies and evaluate their theoretical models based on the sliced data. To overcome them, in the Survey Data Recycling (SDR) international cooperation research project, we propose SDRQuerier and apply it to the harmonized SDR database, which features over two million respondents interviewed in a total of 1,721 national surveys that are part of 22 well-known international projects. We design the SDRQuerier to solve three practical challenges that social scientists routinely face. First, a BERT-based model provides customized data queries through research questions or keywords. Second, we propose a new visual design to showcase the availability of the harmonized data at different levels, thus helping users decide if empirical data exist to address a given research question. Lastly, SDRQuerier discloses the underlying relational patterns among substantive and methodological variables in the database, to help social scientists rigorously evaluate or even improve their regression models. Through case studies with multiple social scientists in solving their daily challenges, we demonstrated the novelty, effectiveness of SDRQuerier.
△ Less
Submitted 25 January, 2022;
originally announced January 2022.
-
Show Me How To Revise: Improving Lexically Constrained Sentence Generation with XLNet
Authors:
Xingwei He,
Victor O. K. Li
Abstract:
Lexically constrained sentence generation allows the incorporation of prior knowledge such as lexical constraints into the output. This technique has been applied to machine translation, and dialog response generation. Previous work usually used Markov Chain Monte Carlo (MCMC) sampling to generate lexically constrained sentences, but they randomly determined the position to be edited and the actio…
▽ More
Lexically constrained sentence generation allows the incorporation of prior knowledge such as lexical constraints into the output. This technique has been applied to machine translation, and dialog response generation. Previous work usually used Markov Chain Monte Carlo (MCMC) sampling to generate lexically constrained sentences, but they randomly determined the position to be edited and the action to be taken, resulting in many invalid refinements. To overcome this challenge, we used a classifier to instruct the MCMC-based models where and how to refine the candidate sentences. First, we developed two methods to create synthetic data on which the pre-trained model is fine-tuned to obtain a reliable classifier. Next, we proposed a two-step approach, "Predict and Revise", for constrained sentence generation. During the predict step, we leveraged the classifier to compute the learned prior for the candidate sentence. During the revise step, we resorted to MCMC sampling to revise the candidate sentence by conducting a sampled action at a sampled position drawn from the learned prior. We compared our proposed models with many strong baselines on two tasks, generating sentences with lexical constraints and text infilling. Experimental results have demonstrated that our proposed model performs much better than the previous work in terms of sentence fluency and diversity. Our code and pre-trained models are available at https://github.com/NLPCode/MCMCXLNet.
△ Less
Submitted 13 September, 2021;
originally announced September 2021.
-
Deep-AIR: A Hybrid CNN-LSTM Framework for Air Quality Modeling in Metropolitan Cities
Authors:
Yang Han,
Qi Zhang,
Victor O. K. Li,
Jacqueline C. K. Lam
Abstract:
Air pollution has long been a serious environmental health challenge, especially in metropolitan cities, where air pollutant concentrations are exacerbated by the street canyon effect and high building density. Whilst accurately monitoring and forecasting air pollution are highly crucial, existing data-driven models fail to fully address the complex interaction between air pollution and urban dyna…
▽ More
Air pollution has long been a serious environmental health challenge, especially in metropolitan cities, where air pollutant concentrations are exacerbated by the street canyon effect and high building density. Whilst accurately monitoring and forecasting air pollution are highly crucial, existing data-driven models fail to fully address the complex interaction between air pollution and urban dynamics. Our Deep-AIR, a novel hybrid deep learning framework that combines a convolutional neural network with a long short-term memory network, aims to address this gap to provide fine-grained city-wide air pollution estimation and station-wide forecast. Our proposed framework creates 1x1 convolution layers to strengthen the learning of cross-feature spatial interaction between air pollution and important urban dynamic features, particularly road density, building density/height, and street canyon effect. Using Hong Kong and Beijing as case studies, Deep-AIR achieves a higher accuracy than our baseline models. Our model attains an accuracy of 67.6%, 77.2%, and 66.1% in fine-grained hourly estimation, 1-hr, and 24-hr air pollution forecast for Hong Kong, and an accuracy of 65.0%, 75.3%, and 63.5% for Beijing. Our saliency analysis has revealed that for Hong Kong, street canyon and road density are the best estimators for NO2, while meteorology is the best estimator for PM2.5.
△ Less
Submitted 25 March, 2021;
originally announced March 2021.
-
AQEyes: Visual Analytics for Anomaly Detection and Examination of Air Quality Data
Authors:
Dongyu Liu,
Kalyan Veeramachaneni,
Alexander Geiger,
Victor O. K. Li,
Huamin Qu
Abstract:
Anomaly detection plays a key role in air quality analysis by enhancing situational awareness and alerting users to potential hazards. However, existing anomaly detection approaches for air quality analysis have their own limitations regarding parameter selection (e.g., need for extensive domain knowledge), computational expense, general applicability (e.g., require labeled data), interpretability…
▽ More
Anomaly detection plays a key role in air quality analysis by enhancing situational awareness and alerting users to potential hazards. However, existing anomaly detection approaches for air quality analysis have their own limitations regarding parameter selection (e.g., need for extensive domain knowledge), computational expense, general applicability (e.g., require labeled data), interpretability, and the efficiency of analysis. Furthermore, the poor quality of collected air quality data (inconsistently formatted and sometimes missing) also increases the difficulty of analysis substantially. In this paper, we systematically formulate design requirements for a system that can solve these limitations and then propose AQEyes, an integrated visual analytics system for efficiently monitoring, detecting, and examining anomalies in air quality data. In particular, we propose a unified end-to-end tunable machine learning pipeline that includes several data pre-processors and featurizers to deal with data quality issues. The pipeline integrates an efficient unsupervised anomaly detection method that works without the use of labeled data and overcomes the limitations of existing approaches. Further, we develop an interactive visualization system to visualize the outputs from the pipeline. The system incorporates a set of novel visualization and interaction designs, allowing analysts to visually examine air quality dynamics and anomalous events in multiple scales and from multiple facets. We demonstrate the performance of this pipeline through a quantitative evaluation and show the effectiveness of the visualization system using qualitative case studies on real-world datasets.
△ Less
Submitted 23 March, 2021;
originally announced March 2021.
-
Two Sides of Meta-Learning Evaluation: In vs. Out of Distribution
Authors:
Amrith Setlur,
Oscar Li,
Virginia Smith
Abstract:
We categorize meta-learning evaluation into two settings: $\textit{in-distribution}$ [ID], in which the train and test tasks are sampled $\textit{iid}$ from the same underlying task distribution, and $\textit{out-of-distribution}$ [OOD], in which they are not. While most meta-learning theory and some FSL applications follow the ID setting, we identify that most existing few-shot classification ben…
▽ More
We categorize meta-learning evaluation into two settings: $\textit{in-distribution}$ [ID], in which the train and test tasks are sampled $\textit{iid}$ from the same underlying task distribution, and $\textit{out-of-distribution}$ [OOD], in which they are not. While most meta-learning theory and some FSL applications follow the ID setting, we identify that most existing few-shot classification benchmarks instead reflect OOD evaluation, as they use disjoint sets of train (base) and test (novel) classes for task generation. This discrepancy is problematic because -- as we show on numerous benchmarks -- meta-learning methods that perform better on existing OOD datasets may perform significantly worse in the ID setting. In addition, in the OOD setting, even though current FSL benchmarks seem befitting, our study highlights concerns in 1) reliably performing model selection for a given meta-learning method, and 2) consistently comparing the performance of different methods. To address these concerns, we provide suggestions on how to construct FSL benchmarks to allow for ID evaluation as well as more reliable OOD evaluation. Our work aims to inform the meta-learning community about the importance and distinction of ID vs. OOD evaluation, as well as the subtleties of OOD evaluation with current benchmarks.
△ Less
Submitted 27 October, 2021; v1 submitted 23 February, 2021;
originally announced February 2021.
-
Label Leakage and Protection in Two-party Split Learning
Authors:
Oscar Li,
Jiankai Sun,
Xin Yang,
Weihao Gao,
Hongyi Zhang,
Junyuan Xie,
Virginia Smith,
Chong Wang
Abstract:
Two-party split learning is a popular technique for learning a model across feature-partitioned data. In this work, we explore whether it is possible for one party to steal the private label information from the other party during split training, and whether there are methods that can protect against such attacks. Specifically, we first formulate a realistic threat model and propose a privacy loss…
▽ More
Two-party split learning is a popular technique for learning a model across feature-partitioned data. In this work, we explore whether it is possible for one party to steal the private label information from the other party during split training, and whether there are methods that can protect against such attacks. Specifically, we first formulate a realistic threat model and propose a privacy loss metric to quantify label leakage in split learning. We then show that there exist two simple yet effective methods within the threat model that can allow one party to accurately recover private ground-truth labels owned by the other party. To combat these attacks, we propose several random perturbation techniques, including $\texttt{Marvell}$, an approach that strategically finds the structure of the noise perturbation by minimizing the amount of label leakage (measured through our quantification metric) of a worst-case adversary. We empirically demonstrate the effectiveness of our protection techniques against the identified attacks, and show that $\texttt{Marvell}$ in particular has improved privacy-utility tradeoffs relative to baseline approaches.
△ Less
Submitted 24 May, 2022; v1 submitted 16 February, 2021;
originally announced February 2021.
-
Is Support Set Diversity Necessary for Meta-Learning?
Authors:
Amrith Setlur,
Oscar Li,
Virginia Smith
Abstract:
Meta-learning is a popular framework for learning with limited data in which an algorithm is produced by training over multiple few-shot learning tasks. For classification problems, these tasks are typically constructed by sampling a small number of support and query examples from a subset of the classes. While conventional wisdom is that task diversity should improve the performance of meta-learn…
▽ More
Meta-learning is a popular framework for learning with limited data in which an algorithm is produced by training over multiple few-shot learning tasks. For classification problems, these tasks are typically constructed by sampling a small number of support and query examples from a subset of the classes. While conventional wisdom is that task diversity should improve the performance of meta-learning, in this work we find evidence to the contrary: we propose a modification to traditional meta-learning approaches in which we keep the support sets fixed across tasks, thus reducing task diversity. Surprisingly, we find that not only does this modification not result in adverse effects, it almost always improves the performance for a variety of datasets and meta-learning methods. We also provide several initial analyses to understand this phenomenon. Our work serves to: (i) more closely investigate the effect of support set construction for the problem of meta-learning, and (ii) suggest a simple, general, and competitive baseline for few-shot learning.
△ Less
Submitted 7 October, 2021; v1 submitted 27 November, 2020;
originally announced November 2020.
-
On the Sparsity of Neural Machine Translation Models
Authors:
Yong Wang,
Longyue Wang,
Victor O. K. Li,
Zhaopeng Tu
Abstract:
Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different…
▽ More
Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures. We show that: 1) the pruned parameters can be rejuvenated to improve the baseline model by up to +0.8 BLEU points; 2) the rejuvenated parameters are reallocated to enhance the ability of modeling low-level lexical information.
△ Less
Submitted 6 October, 2020;
originally announced October 2020.
-
Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution
Authors:
Yingruo Fan,
Jacqueline C. K. Lam,
Victor O. K. Li
Abstract:
The intensity estimation of facial action units (AUs) is challenging due to subtle changes in the person's facial appearance. Previous approaches mainly rely on probabilistic models or predefined rules for modeling co-occurrence relationships among AUs, leading to limited generalization. In contrast, we present a new learning framework that automatically learns the latent relationships of AUs via…
▽ More
The intensity estimation of facial action units (AUs) is challenging due to subtle changes in the person's facial appearance. Previous approaches mainly rely on probabilistic models or predefined rules for modeling co-occurrence relationships among AUs, leading to limited generalization. In contrast, we present a new learning framework that automatically learns the latent relationships of AUs via establishing semantic correspondences between feature maps. In the heatmap regression-based network, feature maps preserve rich semantic information associated with AU intensities and locations. Moreover, the AU co-occurring pattern can be reflected by activating a set of feature channels, where each channel encodes a specific visual pattern of AU. This motivates us to model the correlation among feature channels, which implicitly represents the co-occurrence relationship of AU intensity levels. Specifically, we introduce a semantic correspondence convolution (SCC) module to dynamically compute the correspondences from deep and low resolution feature maps, and thus enhancing the discriminability of features. The experimental results demonstrate the effectiveness and the superior performance of our method on two benchmark datasets.
△ Less
Submitted 20 April, 2020;
originally announced April 2020.
-
Go From the General to the Particular: Multi-Domain Translation with Domain Transformation Networks
Authors:
Yong Wang,
Longyue Wang,
Shuming Shi,
Victor O. K. Li,
Zhaopeng Tu
Abstract:
The key challenge of multi-domain translation lies in simultaneously encoding both the general knowledge shared across domains and the particular knowledge distinctive to each domain in a unified model. Previous work shows that the standard neural machine translation (NMT) model, trained on mixed-domain data, generally captures the general knowledge, but misses the domain-specific knowledge. In re…
▽ More
The key challenge of multi-domain translation lies in simultaneously encoding both the general knowledge shared across domains and the particular knowledge distinctive to each domain in a unified model. Previous work shows that the standard neural machine translation (NMT) model, trained on mixed-domain data, generally captures the general knowledge, but misses the domain-specific knowledge. In response to this problem, we augment NMT model with additional domain transformation networks to transform the general representations to domain-specific representations, which are subsequently fed to the NMT decoder. To guarantee the knowledge transformation, we also propose two complementary supervision signals by leveraging the power of knowledge distillation and adversarial learning. Experimental results on several language pairs, covering both balanced and unbalanced multi-domain translation, demonstrate the effectiveness and universality of the proposed approach. Encouragingly, the proposed unified model achieves comparable results with the fine-tuning approach that requires multiple models to preserve the particular knowledge. Further analyses reveal that the domain transformation networks successfully capture the domain-specific knowledge as expected.
△ Less
Submitted 22 November, 2019;
originally announced November 2019.
-
Max-min Fairness of K-user Cooperative Rate-Splitting in MISO Broadcast Channel with User Relaying
Authors:
Yijie Mao,
Bruno Clerckx,
Jian Zhang,
Victor O. K. Li,
Mohammed Arafah
Abstract:
Cooperative Rate-Splitting (CRS) strategy, relying on linearly precoded rate-splitting at the transmitter and opportunistic transmission of the common message by the relaying user, has recently been shown to outperform typical Non-cooperative Rate-Splitting (NRS), Cooperative Non-Orthogonal Multiple Access (C-NOMA) and Space Division Multiple Access (SDMA) in a two-user Multiple Input Single Outpu…
▽ More
Cooperative Rate-Splitting (CRS) strategy, relying on linearly precoded rate-splitting at the transmitter and opportunistic transmission of the common message by the relaying user, has recently been shown to outperform typical Non-cooperative Rate-Splitting (NRS), Cooperative Non-Orthogonal Multiple Access (C-NOMA) and Space Division Multiple Access (SDMA) in a two-user Multiple Input Single Output (MISO) Broadcast Channel (BC) with user relaying. In this work, the existing two-user CRS transmission strategy is generalized to the K-user case. We study the problem of jointly optimizing the precoders, message split, time slot allocation, and relaying user scheduling with the objective of maximizing the minimum rate among users. An efficient self-organizing relaying protocol is first proposed followed by a Successive Convex Approximation (SCA)-based algorithm to jointly optimize time slot, precoders and message split. Numerical results show that the worst-case achievable rate achieved by CRS is significantly increased over that of NRS and SDMA in a wide range of network loads and user deployments. Importantly, the proposed SCA-based algorithm dramatically reduces the computational complexity without any rate loss compared with the conventional algorithm in the literature of CRS. Therefore, we conclude that the proposed K-user CRS is more powerful than the existing transmission schemes.
△ Less
Submitted 5 August, 2020; v1 submitted 17 October, 2019;
originally announced October 2019.
-
Interpretable Image Recognition with Hierarchical Prototypes
Authors:
Peter Hase,
Chaofan Chen,
Oscar Li,
Cynthia Rudin
Abstract:
Vision models are interpretable when they classify objects on the basis of features that a person can directly understand. Recently, methods relying on visual feature prototypes have been developed for this purpose. However, in contrast to how humans categorize objects, these approaches have not yet made use of any taxonomical organization of class labels. With such an approach, for instance, we m…
▽ More
Vision models are interpretable when they classify objects on the basis of features that a person can directly understand. Recently, methods relying on visual feature prototypes have been developed for this purpose. However, in contrast to how humans categorize objects, these approaches have not yet made use of any taxonomical organization of class labels. With such an approach, for instance, we may see why a chimpanzee is classified as a chimpanzee, but not why it was considered to be a primate or even an animal. In this work we introduce a model that uses hierarchically organized prototypes to classify objects at every level in a predefined taxonomy. Hence, we may find distinct explanations for the prediction an image receives at each level of the taxonomy. The hierarchical prototypes enable the model to perform another important task: interpretably classifying images from previously unseen classes at the level of the taxonomy to which they correctly relate, e.g. classifying a hand gun as a weapon, when the only weapons in the training data are rifles. With a subset of ImageNet, we test our model against its counterpart black-box model on two tasks: 1) classification of data from familiar classes, and 2) classification of data from previously unseen classes at the appropriate level in the taxonomy. We find that our model performs approximately as well as its counterpart black-box model while allowing for each classification to be interpreted.
△ Less
Submitted 24 August, 2019; v1 submitted 25 June, 2019;
originally announced June 2019.
-
Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations
Authors:
Jiatao Gu,
Yong Wang,
Kyunghyun Cho,
Victor O. K. Li
Abstract:
Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails, and is sensitive to hyper-parameter setting. The performance typically lags far behind the more conventional pivot-based approach which…
▽ More
Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails, and is sensitive to hyper-parameter setting. The performance typically lags far behind the more conventional pivot-based approach which translates twice using a third language as a pivot. In this work, we address the degeneracy problem due to capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences. Inspired by this analysis, we propose to use two simple but effective approaches: (1) decoder pre-training; (2) back-translation. These methods show significant improvement (4~22 BLEU points) over the vanilla zero-shot translation on three challenging multilingual datasets, and achieve similar or better results than the pivot-based approach.
△ Less
Submitted 3 June, 2019;
originally announced June 2019.
-
An Adaptive Remote Stochastic Gradient Method for Training Neural Networks
Authors:
Yushu Chen,
Hao Jing,
Wenlai Zhao,
Zhiqiang Liu,
Ouyi Li,
Liang Qiao,
Wei Xue,
Guangwen Yang
Abstract:
We present the remote stochastic gradient (RSG) method, which computes the gradients at configurable remote observation points, in order to improve the convergence rate and suppress gradient noise at the same time for different curvatures. RSG is further combined with adaptive methods to construct ARSG for acceleration. The method is efficient in computation and memory, and is straightforward to i…
▽ More
We present the remote stochastic gradient (RSG) method, which computes the gradients at configurable remote observation points, in order to improve the convergence rate and suppress gradient noise at the same time for different curvatures. RSG is further combined with adaptive methods to construct ARSG for acceleration. The method is efficient in computation and memory, and is straightforward to implement. We analyze the convergence properties by modeling the training process as a dynamic system, which provides a guideline to select the configurable observation factor without grid search. ARSG yields $O(1/\sqrt{T})$ convergence rate in non-convex settings, that can be further improved to $O(\log(T)/T)$ in strongly convex settings. Numerical experiments demonstrate that ARSG achieves both faster convergence and better generalization, compared with popular adaptive methods, such as ADAM, NADAM, AMSGRAD, and RANGER for the tested problems. In particular, for training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed and meanwhile it surpasses SGD in generalization.
△ Less
Submitted 6 September, 2020; v1 submitted 3 May, 2019;
originally announced May 2019.
-
Rate-Splitting for Multi-User Multi-Antenna Wireless Information and Power Transfer
Authors:
Yijie Mao,
Bruno Clerckx,
Victor O. K. Li
Abstract:
In a multi-user multi-antenna Simultaneous Wireless Information and Power Transfer (SWIPT) network, the transmitter sends information to the Information Receivers (IRs) and energy to Energy Receivers (ERs) concurrently. A conventional approach is based on Multi-User Linear Precoding (MU--LP) where each IR directly decodes the intended stream by fully treating the interference from other IRs and ER…
▽ More
In a multi-user multi-antenna Simultaneous Wireless Information and Power Transfer (SWIPT) network, the transmitter sends information to the Information Receivers (IRs) and energy to Energy Receivers (ERs) concurrently. A conventional approach is based on Multi-User Linear Precoding (MU--LP) where each IR directly decodes the intended stream by fully treating the interference from other IRs and ERs as noise. In this paper, we investigate the application of linearly-precoded Rate-Splitting (RS) in Multiple Input Single Output (MISO) SWIPT Broadcast Channel (BC). By splitting the messages of IRs into private and common parts and encoding the common parts into a common stream decoded by all IRs, RS manages the interference dynamically. The precoders are designed such that the Weighted Sum Rate (WSR) of IRs is maximized under the total transmit power constraint and the sum energy constraint for ERs. Numerical results show that the proposed RS-assisted strategy provides a better rate-energy tradeoff in MISO SWIPT BC. Under a sum energy constraint of ERs, RS-assisted strategy achieves better WSR performance of IRs than MU--LP and NOMA in a wide range of IR and ER deployments. Hence, we draw the conclusion that RS is superior for downlink SWIPT networks.
△ Less
Submitted 2 July, 2019; v1 submitted 20 February, 2019;
originally announced February 2019.
-
Meta-Learning for Low-Resource Neural Machine Translation
Authors:
Jiatao Gu,
Yong Wang,
Yun Chen,
Kyunghyun Cho,
Victor O. K. Li
Abstract:
In this paper, we propose to extend the recently introduced model-agnostic meta-learning algorithm (MAML) for low-resource neural machine translation (NMT). We frame low-resource translation as a meta-learning problem, and we learn to adapt to low-resource languages based on multilingual high-resource language tasks. We use the universal lexical representation~\citep{gu2018universal} to overcome t…
▽ More
In this paper, we propose to extend the recently introduced model-agnostic meta-learning algorithm (MAML) for low-resource neural machine translation (NMT). We frame low-resource translation as a meta-learning problem, and we learn to adapt to low-resource languages based on multilingual high-resource language tasks. We use the universal lexical representation~\citep{gu2018universal} to overcome the input-output mismatch across different languages. We evaluate the proposed meta-learning strategy using eighteen European languages (Bg, Cs, Da, De, El, Es, Et, Fr, Hu, It, Lt, Nl, Pl, Pt, Sk, Sl, Sv and Ru) as source tasks and five diverse languages (Ro, Lv, Fi, Tr and Ko) as target tasks. We show that the proposed approach significantly outperforms the multilingual, transfer learning based approach~\citep{zoph2016transfer} and enables us to train a competitive NMT system with only a fraction of training examples. For instance, the proposed approach can achieve as high as 22.04 BLEU on Romanian-English WMT'16 by seeing only 16,000 translated words (~600 parallel sentences).
△ Less
Submitted 25 August, 2018;
originally announced August 2018.
-
Rate-Splitting for Multi-Antenna Non-Orthogonal Unicast and Multicast Transmission: Spectral and Energy Efficiency Analysis
Authors:
Yijie Mao,
Bruno Clerckx,
Victor O. K. Li
Abstract:
In a Non-Orthogonal Unicast and Multicast (NOUM) transmission system, a multicast stream intended to all the receivers is superimposed in the power domain on the unicast streams. One layer of Successive Interference Cancellation (SIC) is required at each receiver to remove the multicast stream before decoding its intended unicast stream. In this paper, we first show that a linearly-precoded 1-laye…
▽ More
In a Non-Orthogonal Unicast and Multicast (NOUM) transmission system, a multicast stream intended to all the receivers is superimposed in the power domain on the unicast streams. One layer of Successive Interference Cancellation (SIC) is required at each receiver to remove the multicast stream before decoding its intended unicast stream. In this paper, we first show that a linearly-precoded 1-layer Rate-Splitting (RS) strategy at the transmitter can efficiently exploit this existing SIC receiver architecture. We further propose multi-layer transmission strategies based on the generalized RS and power-domain Non-Orthogonal Multiple Access (NOMA). Two different objectives are studied for the design of the precoders, namely, maximizing the Weighted Sum Rate (WSR) of the unicast messages and maximizing the system Energy Efficiency (EE), both subject to Quality of Service (QoS) rate requirements of all the messages and a sum power constraint. A Weighted Minimum Mean Square Error (WMMSE)-based algorithm and a Successive Convex Approximation (SCA)-based algorithm are proposed to solve the WSR and EE problems, respectively. Numerical results show that the proposed RS-assisted NOUM transmission strategies are more spectrally and energy efficient than the conventional Multi-User Linear-Precoding (MU-LP), Orthogonal Multiple Access (OMA) and power-domain NOMA in a wide range of user deployments (with a diversity of channel directions, channel strengths and qualities of channel state information at the transmitter) and network loads (underloaded and overloaded regimes). It is superior for the downlink multi-antenna NOUM transmission.
△ Less
Submitted 19 September, 2019; v1 submitted 24 August, 2018;
originally announced August 2018.
-
Efficient and DoS-resistant Consensus for Permissioned Blockchains
Authors:
Xusheng Chen,
Shixiong Zhao,
Ji Qi,
Jianyu Jiang,
Haoze Song,
Cheng Wang,
Tsz On Li,
T. -H. Hubert Chan,
Fengwei Zhang,
Xiapu Luo,
Sen Wang,
Gong Zhang,
Heming Cui
Abstract:
Existing permissioned blockchain systems designate a fixed and explicit group of committee nodes to run a consensus protocol that confirms the same sequence of blocks among all nodes. Unfortunately, when such a permissioned blockchain runs in a large scale on the Internet, these explicit committee nodes can be easily turned down by denial-of-service (DoS) or network partition attacks. Although wor…
▽ More
Existing permissioned blockchain systems designate a fixed and explicit group of committee nodes to run a consensus protocol that confirms the same sequence of blocks among all nodes. Unfortunately, when such a permissioned blockchain runs in a large scale on the Internet, these explicit committee nodes can be easily turned down by denial-of-service (DoS) or network partition attacks. Although work proposes scalable BFT protocols that run on a larger number of committee nodes, their efficiency drops dramatically when only a small number of nodes are attacked.
In this paper, our EGES protocol leverages Intel SGX to develop a new abstraction called "stealth committee", which effectively hides the committee nodes into a large pool of fake committee nodes. EGES selects a distinct group of stealth committee for each block and confirms the same sequence of blocks among all nodes with overwhelming probability. Evaluation on typical geo-distributed settings shows that: (1)EGES is the first permissioned blockchain's consensus protocol that can tolerate tough DoS and network partition attacks; and (2) EGES achieves comparable throughput and latency as existing permissioned blockchains' protocols
△ Less
Submitted 14 December, 2020; v1 submitted 7 August, 2018;
originally announced August 2018.
-
Multi-Region Ensemble Convolutional Neural Network for Facial Expression Recognition
Authors:
Yingruo Fan,
Jacqueline C. K. Lam,
Victor O. K. Li
Abstract:
Facial expressions play an important role in conveying the emotional states of human beings. Recently, deep learning approaches have been applied to image recognition field due to the discriminative power of Convolutional Neural Network (CNN). In this paper, we first propose a novel Multi-Region Ensemble CNN (MRE-CNN) framework for facial expression recognition, which aims to enhance the learning…
▽ More
Facial expressions play an important role in conveying the emotional states of human beings. Recently, deep learning approaches have been applied to image recognition field due to the discriminative power of Convolutional Neural Network (CNN). In this paper, we first propose a novel Multi-Region Ensemble CNN (MRE-CNN) framework for facial expression recognition, which aims to enhance the learning power of CNN models by capturing both the global and the local features from multiple human face sub-regions. Second, the weighted prediction scores from each sub-network are aggregated to produce the final prediction of high accuracy. Third, we investigate the effects of different sub-regions of the whole face on facial expression recognition. Our proposed method is evaluated based on two well-known publicly available facial expression databases: AFEW 7.0 and RAF-DB, and has been shown to achieve the state-of-the-art recognition accuracy.
△ Less
Submitted 11 July, 2018;
originally announced July 2018.
-
Large Margin Few-Shot Learning
Authors:
Yong Wang,
Xiao-Ming Wu,
Qimai Li,
Jiatao Gu,
Wangmeng Xiang,
Lei Zhang,
Victor O. K. Li
Abstract:
The key issue of few-shot learning is learning to generalize. This paper proposes a large margin principle to improve the generalization capacity of metric based methods for few-shot learning. To realize it, we develop a unified framework to learn a more discriminative metric space by augmenting the classification loss function with a large margin distance loss function for training. Extensive exp…
▽ More
The key issue of few-shot learning is learning to generalize. This paper proposes a large margin principle to improve the generalization capacity of metric based methods for few-shot learning. To realize it, we develop a unified framework to learn a more discriminative metric space by augmenting the classification loss function with a large margin distance loss function for training. Extensive experiments on two state-of-the-art few-shot learning methods, graph neural networks and prototypical networks, show that our method can improve the performance of existing models substantially with very little computational overhead, demonstrating the effectiveness of the large margin principle and the potential of our method.
△ Less
Submitted 21 September, 2018; v1 submitted 8 July, 2018;
originally announced July 2018.
-
This Looks Like That: Deep Learning for Interpretable Image Recognition
Authors:
Chaofan Chen,
Oscar Li,
Chaofan Tao,
Alina Jade Barnett,
Jonathan Su,
Cynthia Rudin
Abstract:
When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image, and pointing out prototypical aspects of one class or another. The mounting evidence for each of the classes helps us make our final decision. In this work, we introduce a deep network architecture -- prototypical part network (ProtoPNet), that reasons in a similar way: the networ…
▽ More
When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image, and pointing out prototypical aspects of one class or another. The mounting evidence for each of the classes helps us make our final decision. In this work, we introduce a deep network architecture -- prototypical part network (ProtoPNet), that reasons in a similar way: the network dissects the image by finding prototypical parts, and combines evidence from the prototypes to make a final classification. The model thus reasons in a way that is qualitatively similar to the way ornithologists, physicians, and others would explain to people on how to solve challenging image classification tasks. The network uses only image-level labels for training without any annotations for parts of images. We demonstrate our method on the CUB-200-2011 dataset and the Stanford Cars dataset. Our experiments show that ProtoPNet can achieve comparable accuracy with its analogous non-interpretable counterpart, and when several ProtoPNets are combined into a larger network, it can achieve an accuracy that is on par with some of the best-performing deep models. Moreover, ProtoPNet provides a level of interpretability that is absent in other interpretable deep models.
△ Less
Submitted 28 December, 2019; v1 submitted 27 June, 2018;
originally announced June 2018.
-
Rate-Splitting Multiple Access for Coordinated Multi-Point Joint Transmission
Authors:
Yijie Mao,
Bruno Clerckx,
Victor O. K. Li
Abstract:
As a promising downlink multiple access scheme, Rate-Splitting Multiple Access (RSMA) has been shown to achieve superior spectral and energy efficiencies compared with Space-Division Multiple Access (SDMA) and Non-Orthogonal Multiple Access (NOMA) in downlink single-cell systems. By relying on linearly precoded rate-splitting at the transmitter and successive interference cancellation at the recei…
▽ More
As a promising downlink multiple access scheme, Rate-Splitting Multiple Access (RSMA) has been shown to achieve superior spectral and energy efficiencies compared with Space-Division Multiple Access (SDMA) and Non-Orthogonal Multiple Access (NOMA) in downlink single-cell systems. By relying on linearly precoded rate-splitting at the transmitter and successive interference cancellation at the receivers, RSMA has the capability of partially decoding the interference and partially treating the interference as noise, and therefore copes with a wide range of user deployments and network loads. In this work, we further study RSMA in downlink Coordinated Multi-Point (CoMP) Joint Transmission (JT) networks by investigating the optimal beamformer design to maximize the Weighted Sum-Rate (WSR) of all users subject to individual Quality of Service (QoS) rate constraints and per base station power constraints. Numerical results show that, in CoMP JT, RSMA achieves significant WSR improvement over SDMA and NOMA in a wide range of inter-user and inter-cell channel strength disparities. Specifically, SDMA (resp. NOMA) is more suited to deployments with little (resp. large) inter-user channel strength disparity and large (resp. little) inter-cell channel disparity, while RSMA is suited to any deployment. We conclude that RSMA provides rate, robustness and QoS enhancements over SDMA and NOMA in CoMP JT networks.
△ Less
Submitted 16 January, 2019; v1 submitted 27 April, 2018;
originally announced April 2018.
-
Energy Efficiency of Rate-Splitting Multiple Access, and Performance Benefits over SDMA and NOMA
Authors:
Yijie Mao,
Bruno Clerckx,
Victor O. K. Li
Abstract:
Rate-Splitting Multiple Access (RSMA) is a general and powerful multiple access framework for downlink multi-antenna systems, and contains Space-Division Multiple Access (SDMA) and Non-Orthogonal Multiple Access (NOMA) as special cases. RSMA relies on linearly precoded rate-splitting with Successive Interference Cancellation (SIC) to decode part of the interference and treat the remaining part of…
▽ More
Rate-Splitting Multiple Access (RSMA) is a general and powerful multiple access framework for downlink multi-antenna systems, and contains Space-Division Multiple Access (SDMA) and Non-Orthogonal Multiple Access (NOMA) as special cases. RSMA relies on linearly precoded rate-splitting with Successive Interference Cancellation (SIC) to decode part of the interference and treat the remaining part of the interference as noise. Recently, RSMA has been shown to outperform both SDMA and NOMA rate-wise in a wide range of network loads (underloaded and overloaded regimes) and user deployments (with a diversity of channel directions, channel strengths and qualities of Channel State Information at the Transmitter). Moreover, RSMA was shown to provide spectral efficiency and QoS enhancements over NOMA at a lower computational complexity for the transmit scheduler and the receivers. In this paper, we build upon those results and investigate the energy efficiency of RSMA compared to SDMA and NOMA. Considering a multiple-input single-output broadcast channel, we show that RSMA is more energy-efficient than SDMA and NOMA in a wide range of user deployments (with a diversity of channel directions and channel strengths). We conclude that RSMA is more spectrally and energy-efficient than SDMA and NOMA.
△ Less
Submitted 21 November, 2018; v1 submitted 23 April, 2018;
originally announced April 2018.
-
A Stable and Effective Learning Strategy for Trainable Greedy Decoding
Authors:
Yun Chen,
Victor O. K. Li,
Kyunghyun Cho,
Samuel R. Bowman
Abstract:
Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation. However, this improvement comes at substantial computational cost. In this paper, we propose a flexible new method that allows us to reap nearly the full benefits of beam search with nearly no additional computational cost. The…
▽ More
Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation. However, this improvement comes at substantial computational cost. In this paper, we propose a flexible new method that allows us to reap nearly the full benefits of beam search with nearly no additional computational cost. The method revolves around a small neural network actor that is trained to observe and manipulate the hidden state of a previously-trained decoder. To train this actor network, we introduce the use of a pseudo-parallel corpus built using the output of beam search on a base model, ranked by a target quality metric like BLEU. Our method is inspired by earlier work on this problem, but requires no reinforcement learning, and can be trained reliably on a range of models. Experiments on three parallel corpora and three architectures show that the method yields substantial improvements in translation quality and speed over each base system.
△ Less
Submitted 27 August, 2018; v1 submitted 21 April, 2018;
originally announced April 2018.
-
Rate-Splitting for Multi-Antenna Non-Orthogonal Unicast and Multicast Transmission
Authors:
Yijie Mao,
Bruno Clerckx,
Victor O. K. Li
Abstract:
In a superimposed unicast and multicast transmission system, one layer of Successive Interference Cancellation (SIC) is required at each receiver to remove the multicast stream before decoding the unicast stream. In this paper, we show that a linearly-precoded Rate-Splitting (RS) strategy at the transmitter can efficiently exploit this existing SIC receiver architecture. By splitting the unicast m…
▽ More
In a superimposed unicast and multicast transmission system, one layer of Successive Interference Cancellation (SIC) is required at each receiver to remove the multicast stream before decoding the unicast stream. In this paper, we show that a linearly-precoded Rate-Splitting (RS) strategy at the transmitter can efficiently exploit this existing SIC receiver architecture. By splitting the unicast message into common and private parts and encoding the common parts along with the multicast message into a super-common stream decoded by all users, the SIC is used for the dual purpose of separating the unicast and multicast streams as well as better managing the multi-user interference between the unicast streams. The precoders are designed with the objective of maximizing the Weighted Sum Rate (WSR) of the unicast messages subject to a Quality of Service (QoS) requirement of the multicast message and a sum power constraint. Numerical results show that RS outperforms existing Multi-User Linear-Precoding (MU-LP) and power-domain Non-Orthogonal Multiple Access (NOMA) in a wide range of user deployments (with a diversity of channel directions and channel strengths). Moreover, since one layer of SIC is required to separate the unicast and multicast streams, the performance gain of RS comes without any increase in the receiver complexity compared with MU-LP. Hence, in such non-orthogonal unicast and multicast transmissions, RS provides rate and QoS enhancements at no extra cost for the receivers.
△ Less
Submitted 16 February, 2018; v1 submitted 14 February, 2018;
originally announced February 2018.
-
Universal Neural Machine Translation for Extremely Low Resource Languages
Authors:
Jiatao Gu,
Hany Hassan,
Jacob Devlin,
Victor O. K. Li
Abstract:
In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach utilizes a transfer-learning approach to share lexical and sentence level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multilingual wo…
▽ More
In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach utilizes a transfer-learning approach to share lexical and sentence level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multilingual word-level sharing. The sentence-level sharing is represented by a model of experts from all source languages that share the source encoders with all other languages. This enables the low-resource language to utilize the lexical and sentence representations of the higher resource languages. Our approach is able to achieve 23 BLEU on Romanian-English WMT2016 using a tiny parallel corpus of 6k sentences, compared to the 18 BLEU of strong baseline system which uses multilingual training and back-translation. Furthermore, we show that the proposed approach can achieve almost 20 BLEU on the same dataset through fine-tuning a pre-trained multi-lingual system in a zero-shot setting.
△ Less
Submitted 16 April, 2018; v1 submitted 14 February, 2018;
originally announced February 2018.
-
Zero-Resource Neural Machine Translation with Multi-Agent Communication Game
Authors:
Yun Chen,
Yang Liu,
Victor O. K. Li
Abstract:
While end-to-end neural machine translation (NMT) has achieved notable success in the past years in translating a handful of resource-rich language pairs, it still suffers from the data scarcity problem for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively exposed…
▽ More
While end-to-end neural machine translation (NMT) has achieved notable success in the past years in translating a handful of resource-rich language pairs, it still suffers from the data scarcity problem for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively exposed to large amounts of parallel corpora, our learners (implemented as encoder-decoder architecture) engage in cooperative image description games, and thus develop their own image captioning or neural machine translation model from the need to communicate in order to succeed at the game. Experimental results on the IAPR-TC12 and Multi30K datasets show that the proposed learning mechanism significantly improves over the state-of-the-art methods.
△ Less
Submitted 8 February, 2018;
originally announced February 2018.
-
A Unified Framework for Wide Area Measurement System Planning
Authors:
James J. Q. Yu,
Albert Y. S. Lam,
David J. Hill,
Victor O. K. Li
Abstract:
Wide area measurement system (WAMS) is one of the essential components in the future power system. To make WAMS construction plans, practical models of the power network observability, reliability, and underlying communication infrastructures need to be considered. To address this challenging problem, in this paper we propose a unified framework for WAMS planning to cover most realistic concerns i…
▽ More
Wide area measurement system (WAMS) is one of the essential components in the future power system. To make WAMS construction plans, practical models of the power network observability, reliability, and underlying communication infrastructures need to be considered. To address this challenging problem, in this paper we propose a unified framework for WAMS planning to cover most realistic concerns in the construction process. The framework jointly optimizes the system construction cost, measurement reliability, and volume of synchrophasor data traffic resulting in a multi-objective optimization problem, which provides multiple Pareto optimal solutions to suit different requirements by the utilities. The framework is verified on two IEEE test systems. The simulation results demonstrate the trade-off relationships among the proposed objectives. Moreover, the proposed framework can develop optimal WAMS plans for full observability with minimal cost. This work develops a comprehensive framework for most practical WAMS construction designs.
△ Less
Submitted 21 November, 2017;
originally announced November 2017.
-
Delay Aware Intelligent Transient Stability Assessment System
Authors:
James J. Q. Yu,
Albert Y. S. Lam,
David J. Hill,
Victor O. K. Li
Abstract:
Transient stability assessment is a critical tool for power system design and operation. With the emerging advanced synchrophasor measurement techniques, machine learning methods are playing an increasingly important role in power system stability assessment. However, most existing research makes a strong assumption that the measurement data transmission delay is negligible. In this paper, we focu…
▽ More
Transient stability assessment is a critical tool for power system design and operation. With the emerging advanced synchrophasor measurement techniques, machine learning methods are playing an increasingly important role in power system stability assessment. However, most existing research makes a strong assumption that the measurement data transmission delay is negligible. In this paper, we focus on investigating the influence of communication delay on synchrophasor-based transient stability assessment. In particular, we develop a delay aware intelligent system to address this issue. By utilizing an ensemble of multiple long short-term memory networks, the proposed system can make early assessments to achieve a much shorter response time by utilizing incomplete system variable measurements. Compared with existing work, our system is able to make accurate assessments with a significantly improved efficiency. We perform numerous case studies to demonstrate the superiority of the proposed intelligent system, in which accurate assessments can be developed with time one third less than state-of-the-art methodologies. Moreover, the simulations indicate that noise in the measurements has trivial impact on the assessment performance, demonstrating the robustness of the proposed system.
△ Less
Submitted 21 November, 2017;
originally announced November 2017.
-
Non-Autoregressive Neural Machine Translation
Authors:
Jiatao Gu,
James Bradbury,
Caiming Xiong,
Victor O. K. Li,
Richard Socher
Abstract:
Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we ac…
▽ More
Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we achieve this at a cost of as little as 2.0 BLEU points relative to the autoregressive Transformer network used as a teacher. We demonstrate substantial cumulative improvements associated with each of the three aspects of our training strategy, and validate our approach on IWSLT 2016 English-German and two WMT language pairs. By sampling fertilities in parallel at inference time, our non-autoregressive model achieves near-state-of-the-art performance of 29.8 BLEU on WMT 2016 English-Romanian.
△ Less
Submitted 8 March, 2018; v1 submitted 6 November, 2017;
originally announced November 2017.