-
MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting
Authors:
Liang Yu,
Lai Tu,
Xiang Bai
Abstract:
Multivariate time-series forecasting holds immense value across diverse applications, requiring methods to effectively capture complex temporal and inter-variable dynamics. A key challenge lies in uncovering the intrinsic patterns that govern predictability, going beyond conventional designs that focus on network architectures to explore latent relationships or temporal dependencies. Inspired by signal decomposition, this paper posits that time series predictability is derived from periodic characteristics at different frequencies. Consequently, we propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Through spectral analysis on long-term training data, we identify dominant spectral components and their harmonics to design base-pattern reference series. Unlike signal decomposition, which represents the original series as a linear combination of basis signals, our method uses a transformer model to compute cross-attention between the original series and reference series, capturing essential features for forecasting. Experiments on major open and synthetic datasets show state-of-the-art performance. Furthermore, by focusing on attention with a small number of reference series rather than pairwise variable attention, our method ensures scalability and broad applicability. The source code is available at: https://github.com/yuliang555/MFRS
Submitted 11 March, 2025;
originally announced March 2025.
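The spectral-analysis step described in the MFRS abstract above can be illustrated with a minimal sketch. This is not the authors' implementation (their repository is linked in the abstract). It assumes a single univariate training series, uses a naive O(n^2) DFT to stay dependency-free, and builds sine/cosine reference series for each dominant frequency and its multiples. The names `dominant_frequencies` and `reference_series` and the `harmonics` parameter are hypothetical, and the transformer cross-attention stage is not shown.

```python
import cmath
import math


def dominant_frequencies(series, k):
    """Return the k DFT bins (cycles per series length) with the largest
    magnitude, excluding the DC component. Naive O(n^2) DFT for illustration."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    magnitudes = []
    for f in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * f * t / n)
            for t, x in enumerate(centered)
        )
        magnitudes.append((abs(coeff), f))
    magnitudes.sort(reverse=True)
    return [f for _, f in magnitudes[:k]]


def reference_series(freq_bins, n_train, length, harmonics=1, start=0):
    """Build sine and cosine reference series for each bin in freq_bins and
    its multiples h = 1..harmonics (h = 1 is the fundamental).
    Assumption: reference series are pure sinusoids; the paper may differ."""
    refs = []
    for f in freq_bins:
        for h in range(1, harmonics + 1):
            w = 2 * math.pi * f * h / n_train
            refs.append([math.sin(w * (start + t)) for t in range(length)])
            refs.append([math.cos(w * (start + t)) for t in range(length)])
    return refs


# Usage: a synthetic series with two periodic components (bins 4 and 10).
n = 64
series = [
    math.sin(2 * math.pi * 4 * t / n) + 0.5 * math.sin(2 * math.pi * 10 * t / n)
    for t in range(n)
]
bins = dominant_frequencies(series, k=2)
refs = reference_series(bins, n_train=n, length=16, harmonics=2)
```

In a real pipeline an FFT routine would replace the naive DFT, and the reference series would be fed to the model alongside the original series.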
-
Assessing the Limitations of Large Language Models in Clinical Fact Decomposition
Authors:
Monica Munnangi,
Akshay Swaminathan,
Jason Alan Fries,
Jenelle Jindal,
Sanjana Narayanan,
Ivan Lopez,
Lucia Tu,
Philip Chung,
Jesutofunmi A. Omiye,
Mehr Kashyap,
Nigam Shah
Abstract:
Verifying factual claims is critical for using large language models (LLMs) in healthcare. Recent work has proposed fact decomposition, which uses LLMs to rewrite source text into concise sentences conveying a single piece of information, as an approach for fine-grained fact verification. Clinical documentation poses unique challenges for fact decomposition due to dense terminology and diverse note types. To explore these challenges, we present FactEHR, a dataset consisting of full document fact decompositions for 2,168 clinical notes spanning four types from three hospital systems. Our evaluation, including review by clinicians, highlights significant variability in the quality of fact decomposition for four commonly used LLMs, with some LLMs generating 2.6x more facts per sentence than others. The results underscore the need for better LLM capabilities to support factual verification in clinical text. To facilitate future research in this direction, we plan to release our code at https://github.com/som-shahlab/factehr.
Submitted 16 December, 2024;
originally announced December 2024.
-
Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown
Authors:
Lifu Tu,
Rui Meng,
Shafiq Joty,
Yingbo Zhou,
Semih Yavuz
Abstract:
Large language models (LLMs) have demonstrated strong capabilities in text understanding and generation. However, they often lack factuality, producing a mixture of true and false information, especially in long-form generation. In this work, we investigate the factuality of long-form text generation across various LLMs, including GPT-4, Gemini-1.5-Pro, Claude-3-Opus, Llama-3-70B, and Mistral. Our analysis reveals that factuality scores tend to decline in later sentences of the generated text, accompanied by a rise in the number of unsupported claims. Furthermore, we explore the effectiveness of different evaluation settings to assess whether LLMs can accurately judge the correctness of their own outputs: Self-Known (the percentage of supported atomic claims, decomposed from LLM outputs, that the corresponding LLMs judge as correct) and Self-Unknown (the percentage of unsupported atomic claims that the corresponding LLMs judge as incorrect). The results indicate that even advanced models like GPT-4 and Gemini-1.5-Pro fail to achieve perfect Self-Known scores, while their Self-Unknown scores remain notably above zero, reflecting ongoing uncertainty in their self-assessments. Moreover, we find a correlation between higher Self-Known scores and improved factuality, while higher Self-Unknown scores are associated with lower factuality. Interestingly, even without significant changes in the models' self-judgment (Self-Known and Self-Unknown), the number of unsupported claims can increase, likely as an artifact of long-form generation. These findings show the limitations of current LLMs in long-form generation, and provide valuable insights for improving factuality in long-form text generation.
Submitted 24 November, 2024;
originally announced November 2024.
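The Self-Known and Self-Unknown scores defined in the abstract above are simple conditional percentages, and a minimal sketch may make the definitions concrete. This is an illustration under stated assumptions, not the authors' evaluation code: the `Claim` record and both function names are hypothetical, and each atomic claim is assumed to already carry a ground-truth support label and the model's own judgment.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    supported: bool       # ground truth: is the atomic claim supported?
    judged_correct: bool  # does the generating model judge the claim correct?


def self_known(claims):
    """Percentage of supported claims that the model judges as correct."""
    supported = [c for c in claims if c.supported]
    if not supported:
        return float("nan")  # undefined when there are no supported claims
    return 100.0 * sum(c.judged_correct for c in supported) / len(supported)


def self_unknown(claims):
    """Percentage of unsupported claims that the model judges as incorrect."""
    unsupported = [c for c in claims if not c.supported]
    if not unsupported:
        return float("nan")  # undefined when there are no unsupported claims
    return 100.0 * sum(not c.judged_correct for c in unsupported) / len(unsupported)


# Usage: four supported claims (three judged correct) and two unsupported
# claims (one judged incorrect).
claims = [
    Claim(True, True),
    Claim(True, True),
    Claim(True, True),
    Claim(True, False),
    Claim(False, False),
    Claim(False, True),
]
sk = self_known(claims)    # 75.0
su = self_unknown(claims)  # 50.0
```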
-
Traffic Light or Light Traffic? Investigating Phrasal Semantics in Large Language Models
Authors:
Rui Meng,
Ye Liu,
Lifu Tu,
Daqing He,
Yingbo Zhou,
Semih Yavuz
Abstract:
Phrases are fundamental linguistic units through which humans convey semantics. This study critically examines the capacity of API-based large language models (LLMs) to comprehend phrase semantics, utilizing three human-annotated datasets. We assess the performance of LLMs in executing phrase semantic reasoning tasks guided by natural language instructions and explore the impact of common prompting techniques, including few-shot demonstrations and Chain-of-Thought reasoning. Our findings reveal that LLMs greatly outperform traditional embedding methods across the datasets; however, they do not show a significant advantage over fine-tuned methods. The effectiveness of advanced prompting strategies shows variability. We conduct detailed error analyses to interpret the limitations faced by LLMs in comprehending phrase semantics. Code and data can be found at https://github.com/memray/llm_phrase_semantics.
Submitted 3 October, 2024;
originally announced October 2024.
-
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Authors:
Can Qin,
Congying Xia,
Krithika Ramakrishnan,
Michael Ryoo,
Lifu Tu,
Yihao Feng,
Manli Shu,
Honglu Zhou,
Anas Awadalla,
Jun Wang,
Senthil Purushwalkam,
Le Xue,
Yingbo Zhou,
Huan Wang,
Silvio Savarese,
Juan Carlos Niebles,
Zeyuan Chen,
Ran Xu,
Caiming Xiong
Abstract:
We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI's Sora, we explore the latent diffusion model (LDM) architecture and introduce a video variational autoencoder (VidVAE). VidVAE compresses video data both spatially and temporally, significantly reducing the length of visual tokens and the computational demands associated with generating long-sequence videos. To further address the computational costs, we propose a divide-and-merge strategy that maintains temporal consistency across video segments. Our Diffusion Transformer (DiT) model incorporates spatial and temporal self-attention layers, enabling robust generalization across different timeframes and aspect ratios. We have devised a data processing pipeline from the very beginning and collected over 13M high-quality video-text pairs. The pipeline includes multiple steps such as clipping, text detection, motion estimation, aesthetics scoring, and dense captioning based on our in-house video-LLM model. Training the VidVAE and DiT models required approximately 40 and 642 H100 days, respectively. Our model supports over 14-second 720p video generation in an end-to-end way and demonstrates competitive performance against state-of-the-art T2V models.
Submitted 31 August, 2024; v1 submitted 22 August, 2024;
originally announced August 2024.
-
Distillation Learning Guided by Image Reconstruction for One-Shot Medical Image Segmentation
Authors:
Feng Zhou,
Yanjie Zhou,
Longjie Wang,
Yun Peng,
David E. Carlson,
Liyun Tu
Abstract:
Traditional one-shot medical image segmentation (MIS) methods use registration networks to propagate labels from a reference atlas or rely on comprehensive sampling strategies to generate synthetic labeled data for training. However, these methods often struggle with registration errors and low-quality synthetic images, leading to poor performance and generalization. To overcome this, we introduce a novel one-shot MIS framework based on knowledge distillation, which allows the network to directly 'see' real images through a distillation process guided by image reconstruction. It focuses on anatomical structures in a single labeled image and a few unlabeled ones. A registration-based data augmentation network creates realistic, labeled samples, while a feature distillation module helps the student network learn segmentation from these samples, guided by the teacher network. During inference, the streamlined student network accurately segments new images. Evaluations on three public datasets (OASIS for T1 brain MRI, BCV for abdomen CT, and VerSe for vertebrae CT) show superior segmentation performance and generalization across different medical image datasets and modalities compared to leading methods. Our code is available at https://github.com/NoviceFodder/OS-MedSeg.
Submitted 5 January, 2025; v1 submitted 7 August, 2024;
originally announced August 2024.
-
DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration
Authors:
Nhi Ngoc-Yen Nguyen,
Le-Huy Tu,
Dieu-Phuong Nguyen,
Nhat-Tan Do,
Minh Triet Thai,
Bao-Thien Nguyen-Tat
Abstract:
Purpose: Our study presents an enhanced approach to medical image caption generation by integrating concept detection into attention mechanisms. Method: This method utilizes sophisticated models to identify critical concepts within medical images, which are then refined and incorporated into the caption generation process. Results: Our concept detection task, which employed the Swin-V2 model, achieved an F1 score of 0.58944 on the validation set and 0.61998 on the private test set, securing the third position. For the caption prediction task, our BEiT+BioBart model, enhanced with concept integration and post-processing techniques, attained a BERTScore of 0.60589 on the validation set and 0.5794 on the private test set, placing ninth. Conclusion: These results underscore the efficacy of concept-aware algorithms in generating precise and contextually appropriate medical descriptions. The findings demonstrate that our approach significantly improves the quality of medical image captions, highlighting its potential to enhance medical image interpretation and documentation, thereby contributing to improved healthcare outcomes.
Submitted 1 June, 2024;
originally announced June 2024.
-
A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model
Authors:
Mingxiang Fu,
Yu Song,
Jiameng Lv,
Liang Cao,
Peng Jia,
Nan Li,
Xiangru Li,
Jifeng Liu,
A-Li Luo,
Bo Qiu,
Shiyin Shen,
Liangping Tu,
Lili Wang,
Shoulin Wei,
Haifeng Yang,
Zhenping Yi,
Zhiqiang Zou
Abstract:
The exponential growth of astronomical datasets provides an unprecedented opportunity for humans to gain insight into the Universe. However, effectively analyzing this vast amount of data poses a significant challenge. Astronomers are turning to deep learning techniques to address this, but the methods are limited by their specific training sets, leading to considerable duplicated work. Hence, as an example of how to overcome this issue, we built a framework for general analysis of galaxy images, based on a large vision model (LVM) plus downstream tasks (DST), including galaxy morphological classification, image restoration, object detection, parameter extraction, and more. Considering the low signal-to-noise ratio of galaxy images and the imbalanced distribution of galaxy categories, we have incorporated a Human-in-the-loop (HITL) module into our large vision model, which leverages human knowledge to enhance the reliability and interpretability of processing galaxy images interactively. The proposed framework exhibits notable few-shot learning capabilities and versatile adaptability to all the abovementioned tasks on galaxy images in the DESI legacy imaging surveys. Specifically, for object detection, trained on 1000 data points, our DST upon the LVM achieves an accuracy of 96.7%, while ResNet50 plus Mask R-CNN gives an accuracy of 93.1%; for morphology classification, to obtain AUC ~0.9, LVM plus DST and HITL requires only 1/50 of the training data compared to ResNet18. We expect that multimodal data can be integrated similarly, which opens up possibilities for conducting joint analyses with datasets spanning diverse domains in the era of multi-messenger astronomy.
Submitted 17 May, 2024;
originally announced May 2024.
-
Label-efficient multi-organ segmentation with a diffusion model
Authors:
Yongzhi Huang,
Fengjun Xi,
Liyun Tu,
Jinxin Zhu,
Haseeb Hassan,
Liyilei Su,
Yun Peng,
Jingyu Li,
Jun Ma,
Bingding Huang
Abstract:
Accurate segmentation of multiple organs in Computed Tomography (CT) images plays a vital role in computer-aided diagnosis systems. While various supervised learning approaches have been proposed recently, these methods heavily depend on a large amount of high-quality labeled data, which are expensive to obtain in practice. To address this challenge, we propose a label-efficient framework using knowledge transfer from a pre-trained diffusion model for CT multi-organ segmentation. Specifically, we first pre-train a denoising diffusion model on 207,029 unlabeled 2D CT slices to capture anatomical patterns. Then, the model backbone is transferred to the downstream multi-organ segmentation task, followed by fine-tuning with few labeled data. Two fine-tuning strategies, linear classification and fine-tuning decoder, are employed to enhance segmentation performance while preserving learned representations. Quantitative results show that the pre-trained diffusion model is capable of generating diverse and realistic 256x256 CT images (Fréchet inception distance (FID): 11.32, spatial Fréchet inception distance (sFID): 46.93, F1-score: 73.1%). Compared to state-of-the-art methods for multi-organ segmentation, our method achieves competitive performance on the FLARE 2022 dataset, particularly in limited labeled data scenarios. After fine-tuning with 1% and 10% labeled data, our method achieves Dice similarity coefficients (DSCs) of 71.56% and 78.51%, respectively. Remarkably, the method achieves a DSC of 51.81% using only four labeled CT slices. These results demonstrate the efficacy of our approach in overcoming the limitations of supervised learning approaches that are highly dependent on large-scale labeled data.
Submitted 19 March, 2025; v1 submitted 23 February, 2024;
originally announced February 2024.
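For readers unfamiliar with the Dice similarity coefficient (DSC) reported in the abstract above, the standard definition for two binary masks A and B is DSC = 2|A ∩ B| / (|A| + |B|). The sketch below computes it for a single foreground class on flattened masks. The function name is hypothetical, returning 1.0 when both masks are empty is one common convention assumed here, and the paper's multi-organ averaging protocol is not reproduced.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two flat binary masks
    (sequences of 0/1 values of equal length)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Usage: one overlapping voxel, two foreground voxels in each mask.
pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
dsc = dice_coefficient(pred, target)  # 0.5
```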
-
Unlocking Anticipatory Text Generation: A Constrained Approach for Large Language Models Decoding
Authors:
Lifu Tu,
Semih Yavuz,
Jin Qu,
Jiacheng Xu,
Rui Meng,
Caiming Xiong,
Yingbo Zhou
Abstract:
Large Language Models (LLMs) have demonstrated a powerful ability for text generation. However, achieving optimal results with a given prompt or instruction can be challenging, especially for billion-sized models. Additionally, undesired behaviors such as toxicity or hallucinations can manifest. While much larger models (e.g., ChatGPT) may demonstrate strength in mitigating these issues, there is still no guarantee of complete prevention. In this work, we propose formalizing text generation as a future-constrained generation problem to minimize undesirable behaviors and enforce faithfulness to instructions. The estimation of future constraint satisfaction, accomplished using LLMs, guides the text generation process. Our extensive experiments demonstrate the effectiveness of the proposed approach across three distinct text generation tasks: keyword-constrained generation (Lin et al., 2020), toxicity reduction (Gehman et al., 2020), and factual correctness in question-answering (Gao et al., 2023).
Submitted 4 October, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
The Impact of Gamified Auditory-Verbal Training for Hearing-Challenged Children at Intermediate and Advanced Rehabilitation Stages
Authors:
Yan Xiang,
Zhen Zhang,
Danni Chang,
Lei Tu
Abstract:
Auditory-verbal training is essential for children with hearing challenges, and the gamification approach has become a promising direction for improving the rehabilitation experience and effect. However, the specific influence of the gamified training approach on participants at different rehabilitation stages has not been empirically studied. This paper therefore investigates three research questions: Do the training performances of children at the advanced rehabilitation stage differ before and after using the gamified training system? Do the training performances of children at the intermediate rehabilitation stage differ before and after using the gamified training system? Do children enjoy the gamified training approach? To this end, an original digital gamified auditory-verbal training system was developed, and a series of user experiments was organized. In particular, 31 hearing-challenged children aged three to six at an auditory-verbal rehabilitation center were recruited to take the training, and six professional therapists were invited to assist with the experiments and attend the interviews. Based on observation of training performance and interviews with participants, their parents, and the therapists, the gamified training approach was found to generally improve the training experience and to help with basic auditory memory and expression capabilities. Regarding the specific influence, the gamified approach better improves the basic auditory-verbal performance of children at the intermediate stage, since they focus more on the ease of learning and adaptation to the training system. These findings and conclusions can provide insights for the further exploration and application of the gamification approach in children's auditory-verbal rehabilitation.
Submitted 22 January, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.
-
XGen-7B Technical Report
Authors:
Erik Nijkamp,
Tian Xie,
Hiroaki Hayashi,
Bo Pang,
Congying Xia,
Chen Xing,
Jesse Vig,
Semih Yavuz,
Philippe Laban,
Ben Krause,
Senthil Purushwalkam,
Tong Niu,
Wojciech Kryściński,
Lidiya Murakhovs'ka,
Prafulla Kumar Choubey,
Alex Fabbri,
Ye Liu,
Rui Meng,
Lifu Tu,
Meghana Bhat,
Chien-Sheng Wu,
Silvio Savarese,
Yingbo Zhou,
Shafiq Joty,
Caiming Xiong
Abstract:
Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens. We have also finetuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results when compared with state-of-the-art open-source LLMs. Our targeted evaluation on long sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.
Submitted 6 September, 2023;
originally announced September 2023.
-
RIS-Assisted Wireless Communications: Long-Term versus Short-Term Phase Shift Designs
Authors:
Trinh Van Chien,
Lam Thanh Tu,
Waqas Khalid,
Heejung Yu,
Symeon Chatzinotas,
Marco Di Renzo
Abstract:
Reconfigurable intelligent surface (RIS) has recently gained significant interest as an emerging technology for future wireless networks thanks to its potential for improving the coverage probability in challenging propagation environments. This paper studies an RIS-assisted propagation environment, where a source transmits data to a destination in the presence of a weak direct link. We analyze and compare RIS designs based on long-term and short-term channel statistics in terms of coverage probability and ergodic rate. For the considered optimization designs, we derive closed-form expressions for the coverage probability and ergodic rate, which explicitly unveil the impact of both the propagation environment and the RIS on the system performance. Besides the optimization of the RIS phase profile, we formulate an RIS placement optimization problem with the aim of maximizing the coverage probability by relying only on partial channel state information. An efficient algorithm is proposed based on the gradient ascent method. Simulation results are illustrated in order to corroborate the analytical framework and findings. The proposed RIS phase profile is shown to outperform several heuristic benchmarks in terms of outage probability and ergodic rate. In addition, the proposed RIS placement strategy provides an extra degree of freedom that remarkably improves system performance.
Submitted 6 September, 2023;
originally announced September 2023.
-
Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning
Authors:
Lifu Tu,
Jin Qu,
Semih Yavuz,
Shafiq Joty,
Wenhao Liu,
Caiming Xiong,
Yingbo Zhou
Abstract:
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks, but focus on conversational tasks has been rather limited. This is partly due to the high cost of obtaining non-English conversational data, which results in limited coverage. In this work, we introduce XSGD, a parallel and large-scale multilingual conversation dataset for cross-lingual alignment pretraining, which we created by translating the English-only Schema-Guided Dialogue (SGD) dataset (Rastogi et al., 2020) into 105 other languages. XSGD contains approximately 330k utterances per language. To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts. We also investigate two different classifiers, NLI-based and vanilla classifiers, and test the cross-lingual capability enabled by the aligned prompts. We evaluate our model's cross-lingual generalization capabilities on two conversation tasks: slot-filling and intent classification. Our results demonstrate the strong and efficient modeling ability of NLI-based classifiers and the large cross-lingual transfer improvements achieved by our aligned prompts, particularly in few-shot settings. In addition, we highlight the favorable results of our approach compared to LLMs such as text-davinci-003 and ChatGPT in both zero-shot and few-shot settings. While LLMs exhibit impressive performance in English, their cross-lingual capabilities in other languages, particularly low-resource languages, are limited.
Submitted 26 January, 2024; v1 submitted 3 April, 2023;
originally announced April 2023.
-
AugTriever: Unsupervised Dense Retrieval and Domain Adaptation by Scalable Data Augmentation
Authors:
Rui Meng,
Ye Liu,
Semih Yavuz,
Divyansh Agarwal,
Lifu Tu,
Ning Yu,
Jianguo Zhang,
Meghana Bhat,
Yingbo Zhou
Abstract:
Dense retrievers have made significant strides in text retrieval and open-domain question answering. However, most of these achievements have relied heavily on extensive human-annotated supervision. In this study, we aim to develop unsupervised methods for improving dense retrieval models. We propose two approaches that enable annotation-free and scalable training by creating pseudo query-document pairs: query extraction and transferred query generation. The query extraction method involves selecting salient spans from the original document to generate pseudo queries. On the other hand, the transferred query generation method utilizes generation models trained for other NLP tasks, such as summarization, to produce pseudo queries. Through extensive experimentation, we demonstrate that models trained using these augmentation methods can achieve comparable, if not better, performance than multiple strong dense baselines. Moreover, combining these strategies leads to further improvements, resulting in superior performance of unsupervised dense retrieval, unsupervised domain adaptation and supervised finetuning, benchmarked on both BEIR and ODQA datasets. Code and datasets are publicly available at https://github.com/salesforce/AugTriever.
Submitted 29 October, 2024; v1 submitted 17 December, 2022;
originally announced December 2022.
-
Prompt-Tuning Can Be Much Better Than Fine-Tuning on Cross-lingual Understanding With Multilingual Language Models
Authors:
Lifu Tu,
Caiming Xiong,
Yingbo Zhou
Abstract:
Pre-trained multilingual language models show significant performance gains for zero-shot cross-lingual model transfer on a wide range of natural language understanding (NLU) tasks. Previously, for zero-shot cross-lingual evaluation, pre-trained models are only fine-tuned on English data and tested on a variety of target languages. In this paper, we do cross-lingual evaluation on various NLU tasks (sentence classification, sequence labeling, question answering) using prompt-tuning and compare it with fine-tuning. The results show that prompt tuning achieves much better cross-lingual transfer than fine-tuning across datasets, with only 0.1% to 0.3% tuned parameters. Additionally, we demonstrate through the analysis that prompt tuning can have better cross-lingual transferability of representations on downstream tasks with better aligned decision boundaries.
Submitted 13 December, 2022; v1 submitted 22 October, 2022;
originally announced October 2022.
-
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
Authors:
Erik Nijkamp,
Bo Pang,
Hiroaki Hayashi,
Lifu Tu,
Huan Wang,
Yingbo Zhou,
Silvio Savarese,
Caiming Xiong
Abstract:
Program synthesis strives to generate a computer program as a solution to a given problem specification, expressed with input-output examples or natural language descriptions. The prevalence of large language models advances the state-of-the-art for program synthesis, though limited training resources and data impede open access to such models. To democratize this, we train and release a family of large language models up to 16.1B parameters, called CODEGEN, on natural language and programming language data, and open source the training library JAXFORMER. We show the utility of the trained model by demonstrating that it is competitive with the previous state-of-the-art on zero-shot Python code generation on HumanEval. We further investigate the multi-step paradigm for program synthesis, where a single program is factorized into multiple prompts specifying subproblems. To this end, we construct an open benchmark, Multi-Turn Programming Benchmark (MTPB), consisting of 115 diverse problem sets that are factorized into multi-turn prompts. Our analysis on MTPB shows that the same intent provided to CODEGEN in multi-turn fashion significantly improves program synthesis over that provided as a single turn. We make the training library JAXFORMER and model checkpoints available as open source contribution: https://github.com/salesforce/CodeGen.
Submitted 27 February, 2023; v1 submitted 25 March, 2022;
originally announced March 2022.
-
Coverage Probability and Spectral Efficiency Analysis of Multi-Gateway Downlink LoRa Networks
Authors:
Lam-Thanh Tu,
Abbas Bradai,
Yannis Pousset
Abstract:
The system-level performance of multi-gateway downlink long-range (LoRa) networks is investigated in the present paper.
Specifically, we first compute the active probability of a channel and the selection probability of an active end-device (ED) in the closed-form expressions. We then derive the coverage probability (Pcov) and the area spectral efficiency (ASE) under the impact of the capture effects and different spreading factor (SF) allocation schemes.
Our findings show that both the Pcov and the ASE of the considered networks can be enhanced significantly by increasing both the duty cycle and the transmit power.
Finally, Monte-Carlo simulations are provided to verify the accuracy of the proposed mathematical frameworks.
Submitted 10 February, 2022;
originally announced February 2022.
-
Variable Augmented Network for Invertible MR Coil Compression
Authors:
Xianghao Liao,
Shanshan Wang,
Lanlan Tu,
Yuhao Wang,
Dong Liang,
Qiegen Liu
Abstract:
A large number of coils are able to provide enhanced signal-to-noise ratio and improve imaging performance in parallel imaging. Nevertheless, the increasing growth of coil number simultaneously aggravates the drawbacks of data storage and reconstruction speed, especially in some iterative reconstructions. Coil compression addresses these issues by generating fewer virtual coils. In this work, a novel variable augmentation network for invertible coil compression termed VAN-ICC is presented. It utilizes inherent reversibility of normalizing flow-based models for high-precision compression and invertible recovery. By employing the variable augmentation technology to image/k-space variables from multi-coils, VAN-ICC trains invertible networks by finding an invertible and bijective function, which can map the original data to the compressed counterpart and vice versa. Experiments conducted on both fully-sampled and under-sampled data verified the effectiveness and flexibility of VAN-ICC. Quantitative and qualitative comparisons with traditional non-deep learning-based approaches demonstrated that VAN-ICC can carry much higher compression effects. Additionally, its performance is not susceptible to different number of virtual coils.
Submitted 19 March, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
-
Controlling Smart Propagation Environments: Long-Term versus Short-Term Phase Shift Optimization
Authors:
Trinh Van Chien,
Lam Thanh Tu,
Dinh-Hieu Tran,
Hieu Van Nguyen,
Symeon Chatzinotas,
Marco Di Renzo,
Björn Ottersten
Abstract:
Reconfigurable intelligent surfaces (RISs) have recently gained significant interest as an emerging technology for future wireless networks. This paper studies an RIS-assisted propagation environment, where a single-antenna source transmits data to a single-antenna destination in the presence of a weak direct link. We analyze and compare RIS designs based on long-term and short-term channel statistics in terms of coverage probability and ergodic rate. For the considered optimization designs, closed-form expressions for the coverage probability and ergodic rate are derived. We use numerical simulations to analyze and compare against analytic results in finite samples. Also, we show that the considered optimal phase shift designs outperform several heuristic benchmarks.
Submitted 25 October, 2021;
originally announced October 2021.
-
Supervising the Decoder of Variational Autoencoders to Improve Scientific Utility
Authors:
Liyun Tu,
Austin Talbot,
Neil Gallagher,
David Carlson
Abstract:
Probabilistic generative models are attractive for scientific modeling because their inferred parameters can be used to generate hypotheses and design experiments. This requires that the learned model provide an accurate representation of the input data and yield a latent space that effectively predicts outcomes relevant to the scientific question. Supervised Variational Autoencoders (SVAEs) have previously been used for this purpose, where a carefully designed decoder can be used as an interpretable generative model while the supervised objective ensures a predictive latent representation. Unfortunately, the supervised objective forces the encoder to learn a biased approximation to the generative posterior distribution, which renders the generative parameters unreliable when used in scientific models. This issue has remained undetected as reconstruction losses commonly used to evaluate model performance do not detect bias in the encoder. We address this previously-unreported issue by developing a second order supervision framework (SOS-VAE) that influences the decoder to induce a predictive latent representation. This ensures that the associated encoder maintains a reliable generative interpretation. We extend this technique to allow the user to trade-off some bias in the generative parameters for improved predictive performance, acting as an intermediate option between SVAEs and our new SOS-VAE. We also use this methodology to address missing data issues that often arise when combining recordings from multiple scientific experiments. We demonstrate the effectiveness of these developments using synthetic data and electrophysiological recordings with an emphasis on how our learned representations can be used to design scientific experiments.
Submitted 8 July, 2022; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Learning Energy-Based Approximate Inference Networks for Structured Applications in NLP
Authors:
Lifu Tu
Abstract:
Structured prediction in natural language processing (NLP) has a long history. The complex models used in structured applications come with difficulties in learning and inference. These difficulties lead researchers to focus more on models with simple structure components (e.g., local classifier). Deep representation learning has become increasingly popular in recent years. The structure components of these methods, on the other hand, are usually relatively simple. We concentrate on complex structured models in this dissertation. We provide a learning framework for complicated structured models as well as an inference method with a better speed/accuracy/search error trade-off. The dissertation begins with a general introduction to energy-based models. In NLP and other applications, an energy function is comparable to the concept of a scoring function. In this dissertation, we discuss the concept of the energy function and structured models with different energy functions. Then, we propose a method in which we train a neural network to do argmax inference under a structured energy function, referring to the trained networks as "inference networks" or "energy-based inference networks". We then develop ways of jointly learning energy functions and inference networks using an adversarial learning framework. Despite the inference and learning difficulties of energy-based models, we present approaches in this thesis that enable energy-based models to be applied more easily in structured NLP applications.
Submitted 27 August, 2021;
originally announced August 2021.
-
Outage Probability Analysis of IRS-Assisted Systems Under Spatially Correlated Channels
Authors:
Trinh Van Chien,
Anastasios K. Papazafeiropoulos,
Lam Thanh Tu,
Ribhu Chopra,
Symeon Chatzinotas,
Björn Ottersten
Abstract:
This paper investigates the impact of spatial channel correlation on the outage probability of intelligent reflecting surface (IRS)-assisted single-input single-output (SISO) communication systems. In particular, we derive a novel closed-form expression of the outage probability for arbitrary phase shifts and correlation matrices of the indirect channels. To shed light on the impact of the spatial correlation, we further attain the closed-form expressions for two common scenarios met in the literature when the large-scale fading coefficients are expressed by the loss over a propagation distance. Numerical results validate the tightness and effectiveness of the closed-form expressions. Furthermore, the spatial correlation offers significant decreases in the outage probability as the direct channel is blocked.
Submitted 22 April, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
-
An Exploration of Arbitrary-Order Sequence Labeling via Energy-Based Inference Networks
Authors:
Lifu Tu,
Tianyu Liu,
Kevin Gimpel
Abstract:
Many tasks in natural language processing involve predicting structured outputs, e.g., sequence labeling, semantic role labeling, parsing, and machine translation. Researchers are increasingly applying deep representation learning to these problems, but the structured component of these approaches is usually quite simplistic. In this work, we propose several high-order energy terms to capture complex dependencies among labels in sequence labeling, including several that consider the entire label sequence. We use neural parameterizations for these energy terms, drawing from convolutional, recurrent, and self-attention networks. We use the framework of learning energy-based inference networks (Tu and Gimpel, 2018) for dealing with the difficulties of training and inference with such models. We empirically demonstrate that this approach achieves substantial improvement using a variety of high-order energy terms on four sequence labeling tasks, while having the same decoding speed as simple, local classifiers. We also find high-order energies to help in noisy data conditions.
Submitted 6 October, 2020;
originally announced October 2020.
-
Coverage Probability and Ergodic Capacity of Intelligent Reflecting Surface-Enhanced Communication Systems
Authors:
Trinh Van Chien,
Lam Thanh Tu,
Symeon Chatzinotas,
Björn Ottersten
Abstract:
This paper studies the performance of a single-input single-output (SISO) system enhanced by the assistance of an intelligent reflecting surface (IRS), which is equipped with a finite number of elements under Rayleigh fading channels. From the instantaneous channel capacity, we compute a closed-form expression of the coverage probability as a function of statistical channel information only. A scaling law of the coverage probability and the number of phase shifts is further obtained. The ergodic capacity is derived, followed by a simple upper bound that avoids the use of symbolic functions and can be applied over a long period of time. Numerical results demonstrate the tightness and effectiveness of our closed-form expressions compared with Monte-Carlo simulations.
Submitted 15 September, 2020;
originally announced September 2020.
-
An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models
Authors:
Lifu Tu,
Garima Lalwani,
Spandana Gella,
He He
Abstract:
Recent work has shown that pre-trained language models such as BERT improve robustness to spurious correlations in the dataset. Intrigued by these results, we find that the key to their success is generalization from a small amount of counterexamples where the spurious correlations do not hold. When such minority examples are scarce, pre-trained models perform as poorly as models trained from scratch. In the case of extreme minority, we propose to use multi-task learning (MTL) to improve generalization. Our experiments on natural language inference and paraphrase identification show that MTL with the right auxiliary tasks significantly improves performance on challenging examples without hurting the in-distribution performance. Further, we show that the gain from MTL mainly comes from improved generalization from the minority examples. Our results highlight the importance of data diversity for overcoming spurious correlations.
Submitted 11 August, 2020; v1 submitted 13 July, 2020;
originally announced July 2020.
-
ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation
Authors:
Lifu Tu,
Richard Yuanzhe Pang,
Sam Wiseman,
Kevin Gimpel
Abstract:
We propose to train a non-autoregressive machine translation model to minimize the energy defined by a pretrained autoregressive model. In particular, we view our non-autoregressive translation system as an inference network (Tu and Gimpel, 2018) trained to minimize the autoregressive teacher energy. This contrasts with the popular approach of training a non-autoregressive model on a distilled corpus consisting of the beam-searched outputs of such a teacher model. Our approach, which we call ENGINE (ENerGy-based Inference NEtworks), achieves state-of-the-art non-autoregressive results on the IWSLT 2014 DE-EN and WMT 2016 RO-EN datasets, approaching the performance of autoregressive models.
Submitted 12 May, 2020; v1 submitted 2 May, 2020;
originally announced May 2020.
-
Improving Joint Training of Inference Networks and Structured Prediction Energy Networks
Authors:
Lifu Tu,
Richard Yuanzhe Pang,
Kevin Gimpel
Abstract:
Deep energy-based models are powerful, but pose challenges for learning and inference (Belanger and McCallum, 2016). Tu and Gimpel (2018) developed an efficient framework for energy-based models by training "inference networks" to approximate structured inference instead of using gradient descent. However, their alternating optimization approach suffers from instabilities during training, requiring additional loss terms and careful hyperparameter tuning. In this paper, we contribute several strategies to stabilize and improve this joint training of energy functions and inference networks for structured prediction. We design a compound objective to jointly train both cost-augmented and test-time inference networks along with the energy function. We propose joint parameterizations for the inference networks that encourage them to capture complementary functionality during learning. We empirically validate our strategies on two sequence labeling tasks, showing easier paths to strong performance than prior work, as well as further improvements with global energy terms.
Submitted 10 October, 2020; v1 submitted 7 November, 2019;
originally announced November 2019.
-
Generating Diverse Story Continuations with Controllable Semantics
Authors:
Lifu Tu,
Xiaoan Ding,
Dong Yu,
Kevin Gimpel
Abstract:
We propose a simple and effective modeling framework for controlled generation of multiple, diverse outputs. We focus on the setting of generating the next sentence of a story given its context. As controllable dimensions, we consider several sentence attributes, including sentiment, length, predicates, frames, and automatically-induced clusters. Our empirical results demonstrate: (1) our framework is accurate in terms of generating outputs that match the target control values; (2) our model yields increased maximum metric scores compared to standard n-best list generation via beam search; (3) controlling generation with semantic frames leads to a stronger combination of diversity and quality than other control variables as measured by automatic metrics. We also conduct a human evaluation to assess the utility of providing multiple suggestions for creative writing, demonstrating promising results for the potential of controllable, diverse generation in a collaborative writing system.
Submitted 1 June, 2020; v1 submitted 29 September, 2019;
originally announced September 2019.
-
Delivering Scientific Influence Analysis as a Service on Research Grants Repository
Authors:
Yuming Wang,
Yanbo Long,
Lai Tu,
Ling Liu
Abstract:
Research grants have played an important role in seeding and promoting fundamental research projects worldwide. There is a growing demand for developing and delivering scientific influence analysis as a service on research grant repositories. Such analysis can provide insight on how research grants help foster new research collaborations, encourage cross-organizational collaborations, influence new research trends, and identify technical leadership. This paper presents the design and development of a grants-based scientific influence analysis service, coined as GImpact. It takes a graph-theoretic approach to design and develop large scale scientific influence analysis over a large research-grant repository with three original contributions. First, we mine the grant database to identify and extract important features for grants influence analysis and represent such features using graph theoretic models. For example, we extract an institution graph and multiple associated aspect-based collaboration graphs, including a discipline graph and a keyword graph. Second, we introduce self-influence and co-influence algorithms to compute two types of collaboration relationship scores based on the number of grants and the types of grants for institutions. We compute the self-influence scores to reflect the grant based research collaborations among institutions and compute multiple co-influence scores to model the various types of cross-institution collaboration relationships in terms of disciplines and subject areas. Third, we compute the overall scientific influence score for every pair of institutions by introducing a weighted sum of the self-influence score and the multiple co-influence scores and conduct an influence-based clustering analysis. We evaluate GImpact using a real grant database, consisting of 2512 institutions and their grants received over a period of 14 years...
Submitted 23 August, 2019;
originally announced August 2019.
-
Benchmarking Approximate Inference Methods for Neural Structured Prediction
Authors:
Lifu Tu,
Kevin Gimpel
Abstract:
Exact structured inference with neural network scoring functions is computationally challenging but several methods have been proposed for approximating inference. One approach is to perform gradient descent with respect to the output structure directly (Belanger and McCallum, 2016). Another approach, proposed recently, is to train a neural network (an "inference network") to perform inference (Tu and Gimpel, 2018). In this paper, we compare these two families of inference methods on three sequence labeling datasets. We choose sequence labeling because it permits us to use exact inference as a benchmark in terms of speed, accuracy, and search error. Across datasets, we demonstrate that inference networks achieve a better speed/accuracy/search error trade-off than gradient descent, while also being faster than exact inference at similar accuracy levels. We find further benefit by combining inference networks and gradient descent, using the former to provide a warm start for the latter.
Submitted 6 July, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.
-
Learning Approximate Inference Networks for Structured Prediction
Authors:
Lifu Tu,
Kevin Gimpel
Abstract:
Structured prediction energy networks (SPENs; Belanger & McCallum 2016) use neural network architectures to define energy functions that can capture arbitrary dependencies among parts of structured outputs. Prior work used gradient descent for inference, relaxing the structured output to a set of continuous variables and then optimizing the energy with respect to them. We replace this use of gradient descent with a neural network trained to approximate structured argmax inference. This "inference network" outputs continuous values that we treat as the output structure. We develop large-margin training criteria for joint training of the structured energy function and inference network. On multi-label classification we report speed-ups of 10-60x compared to (Belanger et al, 2017) while also improving accuracy. For sequence labeling with simple structured energies, our approach performs comparably to exact inference while being much faster at test time. We then demonstrate improved accuracy by augmenting the energy with a "label language model" that scores entire output label sequences, showing it can improve handling of long-distance dependencies in part-of-speech tagging. Finally, we show how inference networks can replace dynamic programming for test-time inference in conditional random fields, suggestive for their general use for fast inference in structured settings.
Submitted 8 March, 2018;
originally announced March 2018.
-
Learning to Embed Words in Context for Syntactic Tasks
Authors:
Lifu Tu,
Kevin Gimpel,
Karen Livescu
Abstract:
We present models for embedding words in the context of surrounding words. Such models, which we refer to as token embeddings, represent the characteristics of a word that are specific to a given context, such as word sense, syntactic category, and semantic role. We explore simple, efficient token embedding models based on standard neural network architectures. We learn token embeddings on a large amount of unannotated text and evaluate them as features for part-of-speech taggers and dependency parsers trained on much smaller amounts of annotated data. We find that predictors endowed with token embeddings consistently outperform baseline predictors across a range of context window and training set sizes.
Submitted 11 June, 2017; v1 submitted 8 June, 2017;
originally announced June 2017.
-
MIMO Cellular Networks with Simultaneous Wireless Information and Power Transfer
Authors:
Lam Thanh Tu,
Marco Di Renzo,
Justin P. Coon
Abstract:
In this paper, we introduce a mathematical approach for system-level analysis and optimization of densely deployed multiple-antenna cellular networks, where low-energy devices are capable of decoding information data and harvesting power simultaneously. The base stations are assumed to be deployed according to a Poisson point process and tools from stochastic geometry are exploited to quantify the trade-off in terms of information rate and harvested power. It is shown that multiple-antenna transmission is capable of increasing information rate and harvested power at the same time.
Submitted 29 August, 2016;
originally announced August 2016.
-
Estimation of Passenger Route Choice Pattern Using Smart Card Data for Complex Metro Systems
Authors:
Juanjuan Zhao,
Fan Zhang,
Lai Tu,
Chengzhong Xu,
Dayong Shen,
Chen Tian,
Xiang-Yang Li,
Zhengxi Li
Abstract:
Nowadays, metro systems play an important role in meeting the urban transportation demand in large cities. Understanding passenger route choice is critical for public transit management. The wide deployment of Automated Fare Collection (AFC) systems opens up a new opportunity. However, only each trip's tap-in and tap-out timestamps and stations can be directly obtained from AFC system records; the train and route chosen by a passenger, which are needed to solve our problem, are unknown. While existing methods work well in some specific situations, they do not work for complicated situations. In this paper, we propose a solution that needs no additional equipment or human involvement beyond the AFC systems. We develop a probabilistic model that can estimate from empirical analysis how the passenger flows are dispatched to different routes and trains. We validate our approach using a large-scale data set collected from the Shenzhen metro system. The measured results provide us with useful inputs when building the passenger path choice model.
Submitted 19 April, 2016;
originally announced May 2016.
-
Network Inference by Learned Node-Specific Degree Prior
Authors:
Qingming Tang,
Lifu Tu,
Weiran Wang,
Jinbo Xu
Abstract:
We propose a novel method for network inference from partially observed edges using a node-specific degree prior. The degree prior is derived from observed edges in the network to be inferred, and its hyper-parameters are determined by cross validation. Then we formulate network inference as a matrix completion problem regularized by our degree prior. Our theoretical analysis indicates that this prior favors a network following the learned degree distribution, and may lead to an improved network recovery error bound compared with previous work. Experimental results on both simulated and real biological networks demonstrate the superior performance of our method in various settings.
Submitted 7 February, 2016;
originally announced February 2016.