-
Panoramic Interests: Stylistic-Content Aware Personalized Headline Generation
Authors:
Junhong Lian,
Xiang Ao,
Xinyu Liu,
Yang Liu,
Qing He
Abstract:
Personalized news headline generation aims to provide users with attention-grabbing headlines that are tailored to their preferences. Prevailing methods focus on user-oriented content preferences, but most of them overlook the fact that diverse stylistic preferences are integral to users' panoramic interests, leading to suboptimal personalization. In view of this, we propose a novel Stylistic-Content Aware Personalized Headline Generation (SCAPE) framework. SCAPE extracts both content and stylistic features from headlines with the aid of large language model (LLM) collaboration. It further adaptively integrates users' long- and short-term interests through a contrastive learning-based hierarchical fusion network. By incorporating the panoramic interests into the headline generator, SCAPE reflects users' stylistic-content preferences during the generation process. Extensive experiments on the real-world dataset PENS demonstrate the superiority of SCAPE over baselines.
Submitted 27 January, 2025; v1 submitted 21 January, 2025;
originally announced January 2025.
-
Fact-Preserved Personalized News Headline Generation
Authors:
Zhao Yang,
Junhong Lian,
Xiang Ao
Abstract:
Personalized news headline generation, which aims to generate user-specific headlines based on readers' preferences, has recently become a flourishing research direction. Existing studies generally inject a user interest embedding into an encoder-decoder headline generator to personalize the output, but the factual consistency of the generated headlines is insufficiently verified. In this paper, we propose a framework, Fact-Preserved Personalized News Headline Generation (FPG for short), to strike a tradeoff between personalization and consistency. In FPG, the similarity between the candidate news to be exposed and the historically clicked news is used to give different levels of attention to key facts in the candidate news, and the similarity scores help learn a fact-aware global user embedding. In addition, an extra training procedure based on contrastive learning is devised to further enhance the factual consistency of generated headlines. Extensive experiments on the real-world benchmark PENS validate the superiority of FPG, especially in the tradeoff between personalization and factual consistency.
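The idea of letting candidate/history similarity drive attention can be illustrated with a minimal sketch. It assumes news items are already encoded as plain vectors and uses cosine similarity followed by a softmax as the attention weights; the function names are hypothetical, and this is not FPG's actual architecture or fact-level attention.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_weighted_user_embedding(candidate, clicked):
    """Attend over clicked-news vectors using their similarity to the candidate.

    Returns the weighted-sum user vector and the attention weights.
    """
    weights = softmax([cosine(candidate, h) for h in clicked])
    user = [sum(w * h[d] for w, h in zip(weights, clicked))
            for d in range(len(candidate))]
    return user, weights


# Usage: a candidate close to the first clicked item attends to it more.
user, weights = similarity_weighted_user_embedding([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Clicked items that resemble the candidate dominate the resulting user vector, which is the intuition behind a candidate-aware user embedding.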
Submitted 20 January, 2025;
originally announced January 2025.
-
Controlling Large Language Models Through Concept Activation Vectors
Authors:
Hanyu Zhang,
Xiting Wang,
Chengao Li,
Xiang Ao,
Qing He
Abstract:
As large language models (LLMs) are widely deployed across various domains, the ability to control their generated outputs has become more critical. This control involves aligning LLM outputs with human values and ethical principles or customizing LLMs to specific topics or styles for individual users. Existing controlled generation methods either require significant computational resources and extensive trial-and-error or provide only coarse-grained control. In this paper, we propose Generation with Concept Activation Vector (GCAV), a lightweight model control framework that ensures accurate control without requiring resource-intensive fine-tuning. Specifically, GCAV first trains a concept activation vector for specified concepts to be controlled, such as toxicity. During inference, GCAV steers the concept vector in LLMs, for example, by removing the toxicity concept vector from the activation layers. Control experiments from different perspectives, including toxicity reduction, sentiment control, linguistic style, and topic control, demonstrate that our framework achieves state-of-the-art performance with granular control, allowing for fine-grained adjustments of both the steering layers and the steering magnitudes for individual samples.
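The two steps named above (obtain a concept direction, then add or remove it at inference) can be sketched on toy activation vectors. As a loud simplification, the concept vector here is the normalized difference of class-mean activations rather than a trained classifier, and a single vector with one global strength stands in for GCAV's per-layer, per-sample steering; all names are hypothetical.

```python
import math


def mean(vectors):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def concept_vector(pos_acts, neg_acts):
    """Unit-length difference of class-mean activations.

    A lightweight stand-in for a trained linear probe (assumption).
    """
    diff = [p - q for p, q in zip(mean(pos_acts), mean(neg_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def steer(activation, cav, strength):
    """Shift an activation along the concept direction by `strength`."""
    return [a + strength * c for a, c in zip(activation, cav)]


def remove_concept(activation, cav):
    """Project out the activation's component along the unit concept vector."""
    proj = sum(a * c for a, c in zip(activation, cav))
    return steer(activation, cav, -proj)


cav = concept_vector([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
cleaned = remove_concept([5.0, 7.0], cav)  # -> [0.0, 7.0]
```

Removing the projection zeroes the concept component while leaving orthogonal directions untouched; a positive `strength` in `steer` would instead amplify the concept.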
Submitted 10 January, 2025;
originally announced January 2025.
-
Financial Risk Assessment via Long-term Payment Behavior Sequence Folding
Authors:
Yiran Qiao,
Yateng Tang,
Xiang Ao,
Qi Yuan,
Ziming Liu,
Chen Shen,
Xuehao Zheng
Abstract:
Online inclusive financial services encounter significant financial risks due to their expansive user base and low default costs. Through real-world practice, we find that utilizing longer-term user payment behaviors can enhance models' ability to forecast financial risks. However, learning long behavior sequences is non-trivial for deep sequential models. Additionally, the diverse fields of payment behaviors carry rich information that requires thorough exploitation. These factors collectively complicate the task of long-term user behavior modeling. To tackle these challenges, we propose a Long-term Payment Behavior Sequence Folding method, referred to as LBSF. In LBSF, payment behavior sequences are folded based on merchants, using the merchant field as an intrinsic grouping criterion, which enables informative parallelism without reliance on external knowledge. Meanwhile, we maximize the utility of payment details through a multi-field behavior encoding mechanism. Subsequently, behavior aggregation at the merchant level followed by relational learning across merchants facilitates a comprehensive user financial representation. We evaluate LBSF on the financial risk assessment task using a large-scale real-world dataset. The results demonstrate that folding long behavior sequences based on internal behavioral cues effectively models long-term patterns and changes, thereby generating more accurate user financial profiles for practical applications.
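Folding a sequence by the merchant field amounts to a stable group-by. The sketch below assumes each behavior is a dict with hypothetical `ts`, `merchant`, and `amount` keys, and uses a toy count/sum summary in place of LBSF's learned multi-field encoding and merchant-level aggregation.

```python
from collections import defaultdict


def fold_by_merchant(behaviors):
    """Group a time-ordered behavior sequence into per-merchant subsequences.

    Relative order within each merchant is preserved; each folded
    subsequence is shorter than the original and could be encoded in parallel.
    """
    folded = defaultdict(list)
    for event in behaviors:
        folded[event["merchant"]].append(event)
    return dict(folded)


def summarize(folded):
    """Toy per-merchant aggregation: event count and total amount."""
    return {
        merchant: {"count": len(events), "total": sum(e["amount"] for e in events)}
        for merchant, events in folded.items()
    }


history = [
    {"ts": 1, "merchant": "m1", "amount": 10.0},
    {"ts": 2, "merchant": "m2", "amount": 5.0},
    {"ts": 3, "merchant": "m1", "amount": 2.5},
]
folded = fold_by_merchant(history)
stats = summarize(folded)
```

Because the grouping key comes from the data itself, no external knowledge is needed to decide how to split the sequence.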
Submitted 22 November, 2024;
originally announced November 2024.
-
Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning
Authors:
Zhengqing Gao,
Xiang Ao,
Xu-Yao Zhang,
Cheng-Lin Liu
Abstract:
Adapting pre-trained models to open classes is a challenging problem in machine learning. Vision-language models fully exploit the knowledge of the text modality and demonstrate strong zero-shot recognition performance, which makes them naturally suited to various open-set problems. More recently, some research has focused on fine-tuning such models for downstream tasks. Prompt tuning methods have achieved large improvements by learning context vectors on few-shot data. However, by evaluating under an open-set adaptation setting in which the test data include new classes, we find a dilemma: learned prompts generalize worse than hand-crafted prompts. In this paper, we consider combining the advantages of both and propose a test-time prompt tuning approach, which leverages maximum concept matching (MCM) scores as dynamic weights to generate an input-conditioned prompt for each image during testing. Through extensive experiments on 11 different datasets, we show that our proposed method outperforms all comparison methods on average across both base and new classes. The code is available at https://github.com/gaozhengqing/TTPT
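A rough illustration of MCM-weighted prompt mixing follows. The MCM score is computed as the largest softmax probability over temperature-scaled cosine similarities between an image feature and per-class text features. The mixing rule (each prompt set's share of the two MCM scores) and all function names are hypothetical, chosen only to show the shape of the idea, not the paper's exact formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mcm_score(image_feat, class_text_feats, temperature=1.0):
    """Largest softmax probability over classes of scaled image-text cosines."""
    logits = [cosine(image_feat, t) / temperature for t in class_text_feats]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)


def combine_prompts(image_feat, learned_feats, handcrafted_feats, temperature=1.0):
    """Mix learned and hand-crafted class text features for one image.

    The mixing weight is the learned set's share of the two MCM scores
    (a hypothetical rule for illustration).
    """
    s_learned = mcm_score(image_feat, learned_feats, temperature)
    s_hand = mcm_score(image_feat, handcrafted_feats, temperature)
    w = s_learned / (s_learned + s_hand)
    mixed = [[w * a + (1 - w) * b for a, b in zip(lf, hf)]
             for lf, hf in zip(learned_feats, handcrafted_feats)]
    return mixed, w


mixed, w = combine_prompts([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]])
```

Here the hand-crafted features cannot separate the two classes (MCM of 0.5), so the more confident learned prompt receives the larger weight for this image.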
Submitted 29 August, 2024;
originally announced August 2024.
-
AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors
Authors:
Hao Shi,
Weili Song,
Xinting Zhang,
Jiahe Shi,
Cuicui Luo,
Xiang Ao,
Hamid Arian,
Luis Seco
Abstract:
The complexity of financial data, characterized by its variability and low signal-to-noise ratio, necessitates advanced methods in quantitative investment that prioritize both performance and interpretability. Transitioning from early manual extraction to genetic programming, the most advanced approach in the alpha factor mining domain currently employs reinforcement learning to mine a set of combination factors with fixed weights. However, the performance of resultant alpha factors exhibits inconsistency, and the inflexibility of fixed factor weights proves insufficient in adapting to the dynamic nature of financial markets. To address this issue, this paper proposes a two-stage formulaic alpha generating framework AlphaForge, for alpha factor mining and factor combination. This framework employs a generative-predictive neural network to generate factors, leveraging the robust spatial exploration capabilities inherent in deep learning while concurrently preserving diversity. The combination model within the framework incorporates the temporal performance of factors for selection and dynamically adjusts the weights assigned to each component alpha factor. Experiments conducted on real-world datasets demonstrate that our proposed model outperforms contemporary benchmarks in formulaic alpha factor mining. Furthermore, our model exhibits a notable enhancement in portfolio returns within the realm of quantitative investment and real money investment.
Submitted 12 December, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
-
LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework
Authors:
Yiran Qiao,
Xiang Ao,
Yang Liu,
Jiarong Xu,
Xiaoqian Sun,
Qing He
Abstract:
Recent prevailing works on graph machine learning typically follow a similar methodology that involves designing advanced variants of graph neural networks (GNNs) to maintain the superior performance of GNNs on different graphs. In this paper, we aim to streamline the GNN design process and leverage the advantages of Large Language Models (LLMs) to improve the performance of GNNs on downstream tasks. We formulate a new paradigm, coined "LLMs-as-Consultants," which integrates LLMs with GNNs in an interactive manner. A framework named LOGIN (LLM Consulted GNN training) is instantiated, empowering the interactive utilization of LLMs within the GNN training process. First, we attentively craft concise prompts for spotted nodes, carrying comprehensive semantic and topological information, and serving as input to LLMs. Second, we refine GNNs by devising a complementary coping mechanism that utilizes the responses from LLMs, depending on their correctness. We empirically evaluate the effectiveness of LOGIN on node classification tasks across both homophilic and heterophilic graphs. The results illustrate that even basic GNN architectures, when employed within the proposed LLMs-as-Consultants paradigm, can achieve comparable performance to advanced GNNs with intricate designs. Our code is available at https://github.com/QiaoYRan/LOGIN.
Submitted 6 June, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
EFSA: Towards Event-Level Financial Sentiment Analysis
Authors:
Tianyu Chen,
Yiming Zhang,
Guoxin Yu,
Dapeng Zhang,
Li Zeng,
Qing He,
Xiang Ao
Abstract:
In this paper, we extend financial sentiment analysis (FSA) to the event level, since events usually serve as the subject of sentiment in financial text. Though extracting events from financial text may be conducive to accurate sentiment predictions, it poses specialized challenges due to the length and discontinuity of events in financial text. To this end, we reconceptualize event extraction as a classification task by designing a categorization comprising coarse-grained and fine-grained event categories. Under this setting, we formulate the Event-Level Financial Sentiment Analysis (EFSA for short) task, which outputs quintuples consisting of (company, industry, coarse-grained event, fine-grained event, sentiment) from financial text. A large-scale Chinese dataset containing 12,160 news articles and 13,725 quintuples is made public as a brand-new testbed for our task. A four-hop Chain-of-Thought LLM-based approach is devised for this task. Systematic investigations are conducted on our dataset, and the empirical results report benchmark scores of existing methods and show that our proposed method reaches the current state of the art. Our dataset and framework implementation are available at https://anonymous.4open.science/r/EFSA-645E
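Because the task output is a fixed quintuple, a typed record with light validation is a natural container for model predictions. This is a minimal sketch: the pipe-delimited line format, the three-way sentiment label set, and the example event names are all assumptions for illustration, not the dataset's actual schema or categories.

```python
from dataclasses import dataclass

SENTIMENTS = {"positive", "neutral", "negative"}  # assumed label set


@dataclass(frozen=True)
class EFSAQuintuple:
    company: str
    industry: str
    coarse_event: str
    fine_event: str
    sentiment: str

    def __post_init__(self):
        # Reject labels outside the assumed sentiment vocabulary.
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment label: {self.sentiment!r}")


def parse_quintuple(line: str) -> EFSAQuintuple:
    """Parse a hypothetical 'company | industry | coarse | fine | sentiment' line."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return EFSAQuintuple(*parts)


q = parse_quintuple("Acme Corp | Manufacturing | Capital operation | Share buyback | positive")
```

Freezing the dataclass makes predictions hashable, so sets of predicted and gold quintuples can be compared directly when scoring exact matches.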
Submitted 27 November, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Correlated Bayesian Additive Regression Trees with Gaussian Process for Regression Analysis of Dependent Data
Authors:
Xuetao Lu,
Robert E. McCulloch
Abstract:
Bayesian Additive Regression Trees (BART) has gained widespread popularity, prompting the development of various extensions for different applications. However, limited attention has been given to analyzing dependent data. Based on a general correlated error assumption and an innovative dummy representation, we introduce a novel extension of BART, called Correlated BART (CBART), designed to handle correlated errors. By integrating CBART with a Gaussian process (GP), we propose the CBART-GP model, in which the CBART and GP components are loosely coupled, allowing them to be estimated and applied independently. CBART captures the covariate mean function E[y|x] = f(x), while the Gaussian process models the dependency structure in the response y. We also develop a computationally efficient approach, named two-stage analysis of variance with weighted residuals, for the estimation of CBART-GP. Simulation studies demonstrate the superiority of CBART-GP over other models, and a real-world application illustrates its practical applicability.
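The split of roles described above (trees for the mean, a GP for the dependence) can be written as a generic additive model. The sketch uses the standard BART sum-of-trees notation, with tree structures $T_j$ and leaf parameters $M_j$; the dependence index $s_i$ (for example a location or time stamp), the kernel $k$, and the nugget $\sigma^2$ are assumptions, and neither the paper's dummy representation nor its two-stage estimation is shown.

```latex
\begin{aligned}
y_i &= f(x_i) + \varepsilon_i, \qquad i = 1, \dots, n,\\
f(x) &= \mathbb{E}[y \mid x] = \sum_{j=1}^{m} g(x;\, T_j, M_j),\\
\boldsymbol{\varepsilon} &\sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad
\Sigma_{ik} = k(s_i, s_k) + \sigma^2 \delta_{ik}.
\end{aligned}
```

Since $\mathbb{E}[\varepsilon_i] = 0$, the tree ensemble alone determines the conditional mean, while all correlation between responses is carried by $\Sigma$; this is what allows the two components to be fitted and used separately.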
Submitted 12 September, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning
Authors:
Shuo Yu,
Hongyan Xue,
Xiang Ao,
Feiyang Pan,
Jia He,
Dandan Tu,
Qing He
Abstract:
In the field of quantitative trading, it is common practice to transform raw historical stock data into indicative signals for the market trend. Such signals are called alpha factors. Alphas in formula forms are more interpretable and thus favored by practitioners concerned with risk. In practice, a set of formulaic alphas is often used together for better modeling precision, so we need to find synergistic formulaic alpha sets that work well together. However, most traditional alpha generators mine alphas one by one separately, overlooking the fact that the alphas will later be combined. In this paper, we propose a new alpha-mining framework that prioritizes mining a synergistic set of alphas, i.e., it directly uses the performance of the downstream combination model to optimize the alpha generator. Our framework also leverages the strong exploratory capabilities of reinforcement learning (RL) to better explore the vast search space of formulaic alphas. The contribution to the combination model's performance is used as the return in the RL process, driving the alpha generator to find better alphas that improve upon the current set. Experimental evaluations on real-world stock market data demonstrate both the effectiveness and the efficiency of our framework for stock trend forecasting. The investment simulation results show that our framework is able to achieve higher returns compared to previous approaches.
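Rewarding a candidate alpha by its contribution to the combination can be made concrete with a small sketch. It assumes the combination model is a fixed equal-weight average of alphas on comparable scales and that performance is the information coefficient (IC), the Pearson correlation between the combined signal and next-period returns across stocks; the paper's combination model fits weights, so this is a simplified stand-in with hypothetical names.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def combined_ic(alphas, returns):
    """IC of an equal-weight combination of alpha signals (simplified stand-in
    for a learned combination model)."""
    if not alphas:
        return 0.0
    combo = [sum(vals) / len(alphas) for vals in zip(*alphas)]
    return pearson(combo, returns)


def marginal_reward(current_set, candidate, returns):
    """Reward for the generator: how much the candidate raises the combined IC."""
    return combined_ic(current_set + [candidate], returns) - combined_ic(current_set, returns)


returns = [1.0, 2.0, 3.0, 4.0]  # toy cross-section of four stocks
pool = [[1.0, 3.0, 2.0, 4.0]]
r = marginal_reward(pool, [1.0, 2.0, 3.0, 4.0], returns)
```

A candidate that is individually strong but redundant with the pool earns little, whereas one that corrects the pool's errors earns more, which is the synergy the framework targets.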
Submitted 25 May, 2023;
originally announced June 2023.
-
Reliable Representations Make A Stronger Defender: Unsupervised Structure Refinement for Robust GNN
Authors:
Kuan Li,
Yang Liu,
Xiang Ao,
Jianfeng Chi,
Jinghua Feng,
Hao Yang,
Qing He
Abstract:
Benefiting from the message passing mechanism, Graph Neural Networks (GNNs) have been successful on a wide range of tasks over graph data. However, recent studies have shown that attackers can catastrophically degrade the performance of GNNs by maliciously modifying the graph structure. A straightforward solution to remedy this issue is to model the edge weights by learning a metric function between pairwise representations of two end nodes, which attempts to assign low weights to adversarial edges. The existing methods use either raw features or representations learned by supervised GNNs to model the edge weights. However, both strategies face some immediate problems: raw features cannot represent various properties of nodes (e.g., structure information), and representations learned by supervised GNNs may suffer from the poor performance of the classifier on the poisoned graph. We need representations that carry both feature information and as much correct structure information as possible and are insensitive to structural perturbations. To this end, we propose an unsupervised pipeline, named STABLE, to optimize the graph structure. Finally, we input the well-refined graph into a downstream classifier. For this part, we design an advanced GCN that significantly enhances the robustness of vanilla GCN without increasing the time complexity. Extensive experiments on four real-world graph benchmarks demonstrate that STABLE outperforms the state-of-the-art methods and successfully defends against various attacks.
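The metric-based edge reweighting mentioned above can be sketched with cosine similarity as the metric and a fixed pruning threshold. The node embeddings are assumed to come from some unsupervised encoder that is not shown, and the threshold rule is illustrative only; this is not STABLE's full structure-refinement procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def refine_edges(edges, embeddings, threshold):
    """Keep edges whose endpoints look alike and weight them by that similarity.

    edges: iterable of (u, v) node pairs; embeddings: dict node -> vector.
    Edges with cosine similarity below `threshold` are treated as suspicious
    and dropped.
    """
    kept = []
    for u, v in edges:
        s = cosine(embeddings[u], embeddings[v])
        if s >= threshold:
            kept.append((u, v, s))
    return kept


emb = {0: [1.0, 0.0], 1: [1.0, 0.1], 2: [0.0, 1.0]}
refined = refine_edges([(0, 1), (0, 2)], emb, threshold=0.5)
```

The quality of this step depends entirely on the embeddings, which is why the abstract stresses representations that stay reliable under structural perturbations.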
Submitted 21 April, 2023; v1 submitted 30 June, 2022;
originally announced July 2022.
-
Selective Fairness in Recommendation via Prompts
Authors:
Yiqing Wu,
Ruobing Xie,
Yongchun Zhu,
Fuzhen Zhuang,
Xiang Ao,
Xu Zhang,
Leyu Lin,
Qing He
Abstract:
Recommendation fairness has attracted great attention recently. In real-world systems, users usually have multiple sensitive attributes (e.g., age, gender, and occupation), and users may not want their recommendation results to be influenced by those attributes. Moreover, which of these user attributes should be considered in fairness-aware modeling, and when, should depend on users' specific demands. In this work, we define the selective fairness task, where users can flexibly choose the sensitive attributes with respect to which the recommendation model should be bias-free. We propose a novel parameter-efficient prompt-based fairness-aware recommendation (PFRec) framework, which relies on attribute-specific prompt-based bias eliminators with adversarial training, enabling selective fairness with different attribute combinations in sequential recommendation. Both task-specific and user-specific prompts are considered. We conduct extensive evaluations to verify PFRec's superiority in selective fairness. The source code is released at https://github.com/wyqing20/PFRec.
Submitted 5 July, 2022; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Combining Humor and Sarcasm for Improving Political Parody Detection
Authors:
Xiao Ao,
Danae Sánchez Villegas,
Daniel Preoţiuc-Pietro,
Nikolaos Aletras
Abstract:
Parody is a figurative device used for mimicking entities for comedic or critical purposes. Parody is intentionally humorous and often involves sarcasm. This paper explores jointly modelling these figurative tropes with the goal of improving performance of political parody detection in tweets. To this end, we present a multi-encoder model that combines three parallel encoders to enrich parody-specific representations with humor and sarcasm information. Experiments on a publicly available data set of political parody tweets demonstrate that our approach outperforms previous state-of-the-art methods.
Submitted 6 May, 2022;
originally announced May 2022.
-
Research on spatial information transmission efficiency and capability of safe evacuation signs
Authors:
Ruiwen Fan,
Zhangyin Dai,
Shixiang Tian,
Ting Xia,
Hui Zhou,
Congbao Huang
Abstract:
As safety evacuation signs are indispensable indicators of spatial direction information during emergency evacuation, the spatial relationship between the signs and evacuees affects both evacuees' response time and evacuation efficiency. This paper takes two kinds of common safety evacuation signs, hangtag-type and embedded, as the research object and studies the efficiency and capability of their spatial direction information transmission through a simulation experiment and a fire drill. The results show that the spatial angle of the hangtag-type safety evacuation sign is inversely proportional to the efficiency and capability of spatial direction information transmission, and the fire drill also confirms this conclusion. When the spatial angle of the embedded safety evacuation sign is 5°, the efficiency and capability of spatial direction information transmission increase. At the same time, the average escape time of the fire drill participants was lower, and the percentage choosing unfamiliar exits increased. Changes in spatial angle have no significant effect on the response intention of subjects of different genders; when choosing a direction, males are more easily affected by changes in spatial angle than females, whereas the confidence of females' choices is more easily affected by spatial angle. In addition, based on the research results, corresponding three-dimensional safety evacuation signs are designed. This refines the functional structure of safety evacuation signs and can effectively improve the efficiency of fire emergency evacuation.
Submitted 22 April, 2022;
originally announced April 2022.
-
User-Centric Conversational Recommendation with Multi-Aspect User Modeling
Authors:
Shuokai Li,
Ruobing Xie,
Yongchun Zhu,
Xiang Ao,
Fuzhen Zhuang,
Qing He
Abstract:
Conversational recommender systems (CRS) aim to provide high-quality recommendations in conversations. However, most conventional CRS models mainly focus on the dialogue understanding of the current session, ignoring other rich multi-aspect information about the central subjects (i.e., users) in recommendation. In this work, we highlight that the user's historical dialogue sessions and look-alike users are essential sources of user preferences besides the current dialogue session in CRS. To systematically model the multi-aspect information, we propose a User-Centric Conversational Recommendation (UCCR) model, which returns to the essence of user preference learning in CRS tasks. Specifically, we propose a historical session learner to capture users' multi-view preferences from knowledge, semantic, and consuming views as supplements to the current preference signals. A multi-view preference mapper is designed to learn the intrinsic correlations among different views in current and historical sessions via self-supervised objectives. We also design a temporal look-alike user selector to understand users via their similar users. The learned multi-aspect multi-view user preferences are then used for recommendation and dialogue generation. In experiments, we conduct comprehensive evaluations on both Chinese and English CRS datasets. The significant improvements over competitive models in both recommendation and dialogue generation verify the superiority of UCCR.
Submitted 25 April, 2022; v1 submitted 20 April, 2022;
originally announced April 2022.
-
Multi-view Multi-behavior Contrastive Learning in Recommendation
Authors:
Yiqing Wu,
Ruobing Xie,
Yongchun Zhu,
Xiang Ao,
Xin Chen,
Xu Zhang,
Fuzhen Zhuang,
Leyu Lin,
Qing He
Abstract:
Multi-behavior recommendation (MBR) aims to jointly consider multiple behaviors to improve performance on the target behavior. We argue that MBR models should: (1) model the coarse-grained commonalities between different behaviors of a user, (2) consider both the individual sequence view and the global graph view in multi-behavior modeling, and (3) capture the fine-grained differences between multiple behaviors of a user. In this work, we propose a novel Multi-behavior Multi-view Contrastive Learning Recommendation (MMCLR) framework, which includes three new CL tasks that address the above challenges, respectively. The multi-behavior CL aims to make the different single-behavior representations of the same user in each view similar. The multi-view CL attempts to bridge the gap between a user's sequence-view and graph-view representations. The behavior distinction CL focuses on modeling fine-grained differences between behaviors. In experiments, we conduct extensive evaluations and ablation tests on two real-world datasets to verify the effectiveness of MMCLR and its CL tasks, and MMCLR achieves SOTA performance over existing baselines. Our code will be available at https://github.com/wyqing20/MMCLR
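Pulling a user's sequence-view and graph-view representations together is commonly done with an InfoNCE loss that treats other users in the batch as negatives. The sketch below is that generic loss with cosine similarity and an assumed temperature of 0.1; MMCLR's exact objectives, negative sampling, and its behavior distinction task are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def info_nce(anchors, positives, temperature=0.1):
    """Average InfoNCE loss over a batch.

    anchors[i] and positives[i] (e.g. sequence-view and graph-view vectors of
    user i) form the positive pair; positives[j] for j != i act as negatives.
    """
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_denominator - logits[i]  # -log softmax of the positive pair
    return loss / len(anchors)


seq_view = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(seq_view, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(seq_view, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when each user's two views agree and large when they are mismatched, which is the signal a multi-view CL task uses to bridge the two representations.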
Submitted 20 March, 2022;
originally announced March 2022.
-
Highly sensitive fire alarm system based on cellulose paper with low temperature response and wireless signal conversion
Authors:
Xiaolu Li,
Jose Sanchez del Rio Saez,
Xiang Ao,
Abdulmalik Yusuf,
De-Yi Wang
Abstract:
Highly sensitive smart sensors for early fire detection with remote warning capabilities are urgently required to improve the fire safety of combustible materials in diverse applications. A highly sensitive fire alarm can detect a fire situation quickly when a fire disaster is about to occur, which is conducive to achieve fire tuned. Herein, a novel fire alarm is designed using flame-retardant cellulose paper loaded with graphene oxide (GO) and two-dimensional titanium carbide (Ti3C2, MXene). Owing to the excellent temperature-dependent electrical resistance switching effect of GO, it acts as an electrical insulator at room temperature and becomes electrically conductive at high temperature. During a fire incident, some of the oxygen-containing groups on GO are completely removed, which results in the conductivity transformation. Besides exploiting this feature of GO, this work also introduces conductive MXene to enhance fire detection speed and warning at low temperatures, especially below 300 °C. The designed flame-retardant fire alarm is sensitive enough to detect a fire incident, showing a response time of 2 s at 250 °C, as determined by a novel and quantifiable technique. More importantly, the designed fire alarm sensor is coupled to a wireless communication interface to conveniently transmit the fire signal remotely. Therefore, when an abnormal temperature is detected, the signal is wirelessly transmitted to a liquid crystal display (LCD) screen, which displays a message such as "FIRE DANGER". The designed smart fire alarm paper is promising for use as smart wallpaper for interior house decoration and other applications requiring early fire detection and warning.
Submitted 28 December, 2021;
originally announced January 2022.
-
Mind the Gap: Cross-Lingual Information Retrieval with Hierarchical Knowledge Enhancement
Authors:
Fuwei Zhang,
Zhao Zhang,
Xiang Ao,
Dehong Gao,
Fuzhen Zhuang,
Yi Wei,
Qing He
Abstract:
Cross-Lingual Information Retrieval (CLIR) aims to rank documents written in a language different from that of the user's query. The intrinsic gap between languages is an essential challenge for CLIR. In this paper, we introduce the multilingual knowledge graph (KG) to the CLIR task because it contains rich information about entities in multiple languages. The KG is regarded as a "silver bullet" that simultaneously performs explicit alignment between queries and documents and broadens the representations of queries. We propose a model named CLIR with hierarchical knowledge enhancement (HIKE) for this task. The proposed model encodes the textual information in queries, documents and the KG with multilingual BERT, and incorporates the KG information into the query-document matching process with a hierarchical information fusion mechanism. Specifically, HIKE first integrates the entities and their neighborhoods in the KG into query representations with a knowledge-level fusion, then combines the knowledge from both source and target languages with a language-level fusion to further mitigate the linguistic gap. Experimental results demonstrate that HIKE achieves substantial improvements over state-of-the-art competitors.
Submitted 26 December, 2021;
originally announced December 2021.
-
ConRPG: Paraphrase Generation using Contexts as Regularizer
Authors:
Yuxian Meng,
Xiang Ao,
Qing He,
Xiaofei Sun,
Qinghong Han,
Fei Wu,
Chun Fan,
Jiwei Li
Abstract:
A long-standing issue with paraphrase generation is how to obtain reliable supervision signals. In this paper, we propose an unsupervised paradigm for paraphrase generation based on the assumption that the probabilities of generating two sentences with the same meaning given the same context should be the same. Inspired by this fundamental idea, we propose a pipelined system which consists of paraphrase candidate generation based on contextual language models, candidate filtering using scoring functions, and paraphrase model training based on the selected candidates. The proposed paradigm offers merits over existing paraphrase generation methods: (1) using the context regularizer on meanings, the model is able to generate massive amounts of high-quality paraphrase pairs; and (2) using human-interpretable scoring functions to select paraphrase pairs from candidates, the proposed framework provides a channel for developers to intervene in the data generation process, leading to a more controllable model. Experimental results across different tasks and datasets demonstrate the effectiveness of the proposed model in both supervised and unsupervised setups.
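The filtering stage rests on the context assumption stated above. A minimal Python sketch of one such scoring function, assuming the log-probabilities log p(sentence | context) have already been computed by some contextual language model; the function names, the mean-absolute-gap score and the `max_gap` threshold are illustrative assumptions, not the paper's exact scoring functions:

```python
from typing import Dict, List, Sequence


def context_consistency(logp_src: Sequence[float], logp_cand: Sequence[float]) -> float:
    """Mean absolute gap between log p(source | c) and log p(candidate | c)
    over the same contexts c. A small gap means the contexts treat the two
    sentences alike, which is the paraphrase assumption."""
    if not logp_src or len(logp_src) != len(logp_cand):
        raise ValueError("need one log-probability per shared context")
    return sum(abs(a - b) for a, b in zip(logp_src, logp_cand)) / len(logp_src)


def filter_candidates(
    logp_src: Sequence[float],
    candidates: Dict[str, List[float]],
    max_gap: float,
) -> List[str]:
    """Keep candidates whose consistency gap is at most max_gap, smallest gap first."""
    scored = sorted(
        (context_consistency(logp_src, logps), text) for text, logps in candidates.items()
    )
    return [text for gap, text in scored if gap <= max_gap]


# Hypothetical scores from a contextual LM for three shared contexts.
source = [-12.0, -15.5, -9.8]
candidates = {
    "close paraphrase": [-12.3, -15.1, -10.0],
    "unrelated sentence": [-20.0, -11.0, -14.0],
}
print(filter_candidates(source, candidates, max_gap=1.0))  # ['close paraphrase']
```

Raw log-probabilities grow more negative with sentence length, so a practical filter would likely normalize them; that choice is left open here.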
Submitted 1 September, 2021;
originally announced September 2021.
-
Layer-wise Model Pruning based on Mutual Information
Authors:
Chun Fan,
Jiwei Li,
Xiang Ao,
Fei Wu,
Yuxian Meng,
Xiaofei Sun
Abstract:
The proposed pruning strategy offers merits over weight-based pruning techniques: (1) it avoids irregular memory access since representations and matrices can be squeezed into their smaller but dense counterparts, leading to greater speedup; (2) by pruning top-down, the proposed method operates from a more global perspective based on training signals in the top layer, and prunes each layer by propagating the effect of global signals through layers, leading to better performance at the same sparsity level. Extensive experiments show that at the same sparsity level, the proposed strategy offers both greater speedup and higher performance than weight-based pruning methods (e.g., magnitude pruning, movement pruning).
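Point (1) above is about structured pruning: removing a whole hidden unit shrinks the weight matrices rather than leaving scattered zeros. A minimal sketch with plain Python lists, assuming a two-layer ReLU block and a caller-supplied list of units to keep (the paper chooses them with a mutual-information criterion, which is not reproduced here):

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def ffn(w1: Matrix, w2: Matrix, x: Sequence[float]) -> List[float]:
    """Two-layer block y = W2 · relu(W1 · x); W1 is hidden x in, W2 is out x hidden."""
    hidden = [max(0.0, h) for h in matvec(w1, x)]
    return matvec(w2, hidden)


def squeeze(w1: Matrix, w2: Matrix, keep: Sequence[int]):
    """Remove every hidden unit not listed in `keep`: drop its row in W1 and its
    column in W2. Both results are still dense matrices, only smaller."""
    w1_small = [list(w1[i]) for i in keep]
    w2_small = [[row[i] for i in keep] for row in w2]
    return w1_small, w2_small


w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 hidden units, 2 inputs
w2 = [[1.0, 2.0, 3.0]]                     # 1 output
x = [1.0, 2.0]
print(ffn(w1, w2, x))                          # [14.0], full block
print(ffn(*squeeze(w1, w2, keep=[0, 2]), x))   # [10.0], hidden unit 1 pruned
```

The squeezed block computes the same output as zeroing the dropped unit, but with smaller dense matrices, which is where the speedup over irregular weight-level sparsity comes from.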
Submitted 28 August, 2021;
originally announced August 2021.
-
Follow the Prophet: Accurate Online Conversion Rate Prediction in the Face of Delayed Feedback
Authors:
Haoming Li,
Feiyang Pan,
Xiang Ao,
Zhao Yang,
Min Lu,
Junwei Pan,
Dapeng Liu,
Lei Xiao,
Qing He
Abstract:
The delayed feedback problem is one of the pressing challenges in online advertising. It is caused by the highly variable delay before a conversion is observed, which ranges from a few minutes to several days. It is hard to design an appropriate online learning system under such non-identical delays for different types of ads and users. In this paper, we propose to tackle the delayed feedback problem in online advertising by "Following the Prophet" (FTP for short). The key insight is that, if the feedback came instantly for all the logged samples, we could get a model without delayed feedback, namely the "prophet". Although the prophet cannot be obtained during online learning, we show that we could predict the prophet's predictions by an aggregation policy on top of a set of multi-task predictions, where each task captures the feedback patterns of a different period. We propose an objective and an optimization approach for the policy, and use the logged data to imitate the prophet. Extensive experiments on three real-world advertising datasets show that our method outperforms the previous state-of-the-art baselines.
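The aggregation idea can be sketched as a convex combination of the per-task predictions. In this minimal sketch the policy logits are passed in by hand; in FTP they come from a policy trained on logged data to imitate the prophet. The three task windows and all numbers are hypothetical:

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(task_preds: Sequence[float], policy_logits: Sequence[float]) -> float:
    """Combine per-window conversion predictions with policy weights.
    task_preds[k] is the k-th task's predicted conversion probability; the
    softmax keeps the result a convex combination, hence a valid probability."""
    if len(task_preds) != len(policy_logits):
        raise ValueError("one policy logit per task")
    weights = softmax(policy_logits)
    return sum(w * p for w, p in zip(weights, task_preds))


# Hypothetical tasks: "converts within 1 hour", "within 1 day", "within 7 days".
preds = [0.02, 0.05, 0.08]
print(aggregate(preds, [0.0, 0.0, 0.0]))   # uniform weights: plain average
print(aggregate(preds, [-5.0, 0.0, 5.0]))  # policy that trusts the long window
```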
Submitted 13 August, 2021;
originally announced August 2021.
-
GuideBoot: Guided Bootstrap for Deep Contextual Bandits
Authors:
Feiyang Pan,
Haoming Li,
Xiang Ao,
Wei Wang,
Yanrong Kang,
Ao Tan,
Qing He
Abstract:
The exploration/exploitation (E&E) dilemma lies at the core of interactive systems such as online advertising, for which contextual bandit algorithms have been proposed. Bayesian approaches provide guided exploration with principled uncertainty estimation, but their applicability is often limited by over-simplified assumptions. Non-Bayesian bootstrap methods, on the other hand, can apply to complex problems by using deep reward models, but lack clear guidance for the exploration behavior. Developing a practical method for complex deep contextual bandits remains largely unsolved.
In this paper, we introduce Guided Bootstrap (GuideBoot for short), which combines the best of both worlds. GuideBoot provides explicit guidance to the exploration behavior by training multiple models over both real samples and noisy samples with fake labels, where the noise is added according to the predictive uncertainty. The proposed method is efficient, as it can make decisions on the fly by utilizing only one randomly chosen model, and it is also effective, as we show that it can be viewed as a non-Bayesian approximation of Thompson sampling. Moreover, we extend it to an online version that can learn solely from streaming data, which is favored in real applications. Extensive experiments on both a synthetic task and large-scale advertising environments show that GuideBoot achieves significant improvements over previous state-of-the-art methods.
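A toy sketch of the decision rule on a two-armed Bernoulli bandit, assuming simple per-arm mean estimators in place of deep reward models. As a stated simplification, a fixed pool of fake samples with coin-flip labels stands in for GuideBoot's uncertainty-scaled noise: their influence fades as real data accumulates, so poorly explored arms keep noisy estimates. The class and parameter names are hypothetical:

```python
import random
from typing import List


class GuidedBootstrapSketch:
    """Toy Bernoulli bandit: an ensemble of per-arm mean estimators, each trained
    on its own bootstrap of the real rewards plus a few fake samples with random labels."""

    def __init__(self, n_arms: int, n_models: int = 10, n_fake: int = 2, seed: int = 0):
        if n_fake < 1:
            raise ValueError("need at least one fake sample per arm to avoid 0/0")
        self.rng = random.Random(seed)
        self.n_arms = n_arms
        # stats[m][a] = [reward_sum, sample_count], seeded with n_fake fake samples
        # whose 0/1 labels are coin flips. Their share shrinks as real data arrives.
        self.stats: List[List[List[float]]] = [
            [
                [float(sum(self.rng.random() < 0.5 for _ in range(n_fake))), float(n_fake)]
                for _ in range(n_arms)
            ]
            for _ in range(n_models)
        ]

    def estimate(self, m: int, arm: int) -> float:
        reward_sum, count = self.stats[m][arm]
        return reward_sum / count

    def select(self) -> int:
        # One randomly chosen model makes the whole decision (Thompson-sampling-like).
        m = self.rng.randrange(len(self.stats))
        return max(range(self.n_arms), key=lambda a: self.estimate(m, a))

    def update(self, arm: int, reward: float) -> None:
        # Online bootstrap: each model sees a real sample with probability 1/2.
        for model in self.stats:
            if self.rng.random() < 0.5:
                model[arm][0] += reward
                model[arm][1] += 1.0


env = random.Random(1)
true_ctr = (0.1, 0.9)
agent = GuidedBootstrapSketch(n_arms=2)
picks = []
for _ in range(2000):
    arm = agent.select()
    agent.update(arm, 1.0 if env.random() < true_ctr[arm] else 0.0)
    picks.append(arm)
print(sum(picks[-500:]) / 500)  # share of recent pulls on the better arm
```

Picking a single random ensemble member per decision keeps the cost at one model evaluation while still randomizing the choice in the spirit of Thompson sampling.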
Submitted 18 July, 2021;
originally announced July 2021.
-
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information
Authors:
Zijun Sun,
Xiaoya Li,
Xiaofei Sun,
Yuxian Meng,
Xiang Ao,
Qing He,
Fei Wu,
Jiwei Li
Abstract:
Recent pretraining models in Chinese neglect two important aspects specific to the Chinese language: glyph and pinyin, which carry significant syntactic and semantic information for language understanding. In this work, we propose ChineseBERT, which incorporates both the glyph and pinyin information of Chinese characters into language model pretraining. The glyph embedding is obtained based on different fonts of a Chinese character, capturing character semantics from visual features, and the pinyin embedding characterizes the pronunciation of Chinese characters, which handles the highly prevalent heteronym phenomenon in Chinese (the same character has different pronunciations with different meanings). Pretrained on a large-scale unlabeled Chinese corpus, the proposed ChineseBERT model yields a significant performance boost over baseline models with fewer training steps. The proposed model achieves new SOTA performance on a wide range of Chinese NLP tasks, including machine reading comprehension, natural language inference, text classification, and sentence pair matching, and competitive performance in named entity recognition. Code and pretrained models are publicly available at https://github.com/ShannonAI/ChineseBert.
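One way to combine the three per-character views is to concatenate them and apply a fully connected layer to get a single fusion embedding. A minimal sketch, assuming the character, glyph and pinyin embeddings are already computed (how each is produced is outside this sketch) and using hypothetical tiny dimensions:

```python
from typing import List, Sequence


def fuse(
    char_emb: Sequence[float],
    glyph_emb: Sequence[float],
    pinyin_emb: Sequence[float],
    weight: List[List[float]],
    bias: Sequence[float],
) -> List[float]:
    """Concatenate three views of one character and project them with a fully
    connected layer. `weight` has shape (d_model, d_char + d_glyph + d_pinyin)."""
    x = list(char_emb) + list(glyph_emb) + list(pinyin_emb)
    if len(weight) != len(bias) or any(len(row) != len(x) for row in weight):
        raise ValueError("weight must be (len(bias), len(concatenated input))")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


# Tiny hypothetical sizes: 2-d char, 1-d glyph, 1-d pinyin -> 2-d fusion embedding.
fused = fuse([1.0, 0.0], [2.0], [3.0],
             weight=[[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 1.0]],
             bias=[0.0, 0.5])
print(fused)  # [6.0, 3.5]
```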
Submitted 30 June, 2021;
originally announced June 2021.
-
Iterative Network Pruning with Uncertainty Regularization for Lifelong Sentiment Classification
Authors:
Binzong Geng,
Min Yang,
Fajie Yuan,
Shupeng Wang,
Xiang Ao,
Ruifeng Xu
Abstract:
Lifelong learning capabilities are crucial for sentiment classifiers to process continuous streams of opinionated information on the Web. However, performing lifelong learning is non-trivial for deep neural networks, as continually training on incrementally available information inevitably results in catastrophic forgetting or interference. In this paper, we propose a novel iterative network pruning with uncertainty regularization method for lifelong sentiment classification (IPRLS), which leverages the principles of network pruning and weight regularization. By performing network pruning with uncertainty regularization in an iterative manner, IPRLS can adapt a single BERT model to work with continuously arriving data from multiple domains while avoiding catastrophic forgetting and interference. Specifically, we leverage an iterative pruning method to remove redundant parameters in large deep networks so that the freed-up space can then be employed to learn new tasks, tackling the catastrophic forgetting problem. Instead of keeping the old tasks fixed when learning new tasks, we also use an uncertainty regularization based on the Bayesian online learning framework to constrain the update of old tasks' weights in BERT, which enables positive backward transfer, i.e., learning new tasks improves performance on past tasks while protecting old knowledge from being lost. In addition, we propose a task-specific low-dimensional residual function in parallel to each layer of BERT, which makes IPRLS less prone to losing the knowledge saved in the base BERT network when learning a new task. Extensive experiments on 16 popular review corpora demonstrate that the proposed IPRLS method significantly outperforms the strong baselines for lifelong sentiment classification. For reproducibility, we submit the code and data at: https://github.com/siat-nlp/IPRLS.
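The pruning half of IPRLS can be sketched as ownership bookkeeping over a flat weight vector: after a task trains, its smallest-magnitude weights are released and zeroed so the next task can claim them. A minimal sketch assuming magnitude-based selection and a hypothetical `ratio`; it omits the uncertainty regularization and residual functions described above, so earlier tasks' surviving weights are simply left alone here:

```python
from typing import List, Optional


def claim_free(owner: List[Optional[int]], task: int) -> None:
    """Hand every currently free weight (owner None) to `task` before it trains."""
    for i, o in enumerate(owner):
        if o is None:
            owner[i] = task


def prune_task(weights: List[float], owner: List[Optional[int]], task: int, ratio: float) -> int:
    """Release the smallest-magnitude `ratio` of the weights owned by `task`:
    zero them and mark them free for the next task. Returns how many were released."""
    mine = [i for i, o in enumerate(owner) if o == task]
    k = int(len(mine) * ratio)
    for i in sorted(mine, key=lambda j: abs(weights[j]))[:k]:
        weights[i] = 0.0
        owner[i] = None
    return k


weights = [0.5, -0.1, 0.9, 0.05]
owner: List[Optional[int]] = [None] * len(weights)
claim_free(owner, task=0)                         # task 0 trains on all weights
released = prune_task(weights, owner, task=0, ratio=0.5)
claim_free(owner, task=1)                         # task 1 reuses the freed slots
print(released, weights, owner)  # 2 [0.5, 0.0, 0.9, 0.0] [0, 1, 0, 1]
```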
Submitted 21 June, 2021;
originally announced June 2021.
-
Quasi-solid-state sodium-ion hybrid capacitors enabled by UiO-66@PVDF-HFP multifunctional separators: selective charge transfer and high safety
Authors:
Wenliang Feng,
Jing Zhang,
Abdulmalik Yusuf,
Xiang Ao,
Dongfeng Shi,
Vinodkumar Etacheri,
De-Yi Wang
Abstract:
The practical application of sodium-ion hybrid capacitors is limited by their low energy densities, resulting from the kinetics mismatch between cathodes and anodes, and by the fire safety risks related to the flammable electrolyte-separator system. Here, we report a rational design of a PVDF-HFP separator modified with metal-organic frameworks (MOFs, UiO-66). The high tensile strength and dimensional thermal stability of the separator reduce the risk of electrode short circuits caused by separator deformation. Microscale combustion calorimetry (MCC) tests demonstrate a 75% reduction in peak heat release rate (pHRR), indicating an enhanced fire-resistant property of the separator. This is due to the transformation of UiO-66 into ZrO2, accompanied by the consumption of oxygen and the formation of a barrier char that suppresses further heat release. The quasi-solid-state electrolyte prepared from this separator presents an enhanced ionic conductivity of 2.44 mS·cm⁻¹ and a Na-ion transference number of 0.55, which are related to the high porosity (>70%) and electrolyte uptake (~320%) of the separator. Moreover, the open metal sites of UiO-66 can capture PF6⁻ and consequently liberate Na⁺ for faster migration, thus reducing the kinetics mismatch between cathodes and anodes. Such a multifunctional separator enables the quasi-solid-state Na-ion hybrid capacitor to achieve high energy density (182 Wh·kg⁻¹ at 31 W·kg⁻¹) and power density (5280 W·kg⁻¹ at 22 Wh·kg⁻¹), as well as excellent cyclic stability (10000 cycles at 1000 mA·g⁻¹).
Keywords: Quasi-solid-state; PVDF-HFP; Metal-organic frameworks; Dimensional thermal stability; Fire safety; Selective charge transfer
Submitted 17 June, 2021;
originally announced June 2021.
-
Defending Against Backdoor Attacks in Natural Language Generation
Authors:
Xiaofei Sun,
Xiaoya Li,
Yuxian Meng,
Xiang Ao,
Lingjuan Lyu,
Jiwei Li,
Tianwei Zhang
Abstract:
The frustratingly fragile nature of neural network models makes current natural language generation (NLG) systems prone to backdoor attacks that cause them to generate malicious sequences, which could be sexist or offensive. Unfortunately, little effort has been invested in studying how backdoor attacks can affect current NLG models and how to defend against these attacks. In this work, by giving a formal definition of backdoor attack and defense, we investigate this problem on two important NLG tasks, machine translation and dialog generation. Tailored to the inherent nature of NLG models (e.g., producing a sequence of coherent words given contexts), we design defending strategies against attacks. We find that testing the backward probability of generating sources given targets yields effective defense performance against all types of attacks, and is able to handle the one-to-many issue in many NLG tasks such as dialog generation. We hope that this work can raise awareness of the backdoor risks concealed in deep NLG systems and inspire more future work (both attack and defense) in this direction.
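The backward-probability test can be sketched as a filter over (source, output) pairs: an output that a target-to-source model finds unlikely to explain its source is flagged. In this minimal sketch the backward model is a hypothetical stand-in function, and the per-token normalization and the `threshold` value are assumptions:

```python
from typing import Callable, List, Sequence, Tuple


def flag_suspicious(
    pairs: Sequence[Tuple[str, str]],
    backward_logprob: Callable[[str, str], float],
    threshold: float,
) -> List[int]:
    """Return indices of (source, output) pairs whose per-token
    log p(source | output), scored by a backward model, is below `threshold`."""
    flagged = []
    for idx, (source, output) in enumerate(pairs):
        n_tokens = max(1, len(source.split()))
        if backward_logprob(source, output) / n_tokens < threshold:
            flagged.append(idx)
    return flagged


def toy_backward_logprob(source: str, output: str) -> float:
    """Stand-in for a real target-to-source model: outputs that share words with
    the source explain it well (-1 per token), others explain it badly (-10 per token)."""
    n = len(source.split())
    return -1.0 * n if set(source.split()) & set(output.split()) else -10.0 * n


pairs = [("the cat sat", "le chat cat"), ("hello world", "malicious phrase")]
print(flag_suspicious(pairs, toy_backward_logprob, threshold=-5.0))  # [1]
```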
Submitted 9 October, 2023; v1 submitted 3 June, 2021;
originally announced June 2021.
-
Sentence Similarity Based on Contexts
Authors:
Xiaofei Sun,
Yuxian Meng,
Xiang Ao,
Fei Wu,
Tianwei Zhang,
Jiwei Li,
Chun Fan
Abstract:
Existing methods to measure sentence similarity face two challenges: (1) labeled datasets are usually limited in size, making them insufficient to train supervised neural models; (2) there is a training-test gap for unsupervised language modeling (LM) based models that compute semantic scores between sentences, since sentence-level semantics are not explicitly modeled during training. This results in inferior performance on this task. In this work, we propose a new framework to address these two issues. The proposed framework is based on the core idea that the meaning of a sentence should be defined by its contexts, and that sentence similarity can be measured by comparing the probabilities of generating two sentences given the same context. The proposed framework is able to generate a high-quality, large-scale dataset with semantic similarity scores between two sentences in an unsupervised manner, with which the train-test gap can be largely bridged. Extensive experiments show that the proposed framework achieves significant performance boosts over existing baselines under both the supervised and unsupervised settings across different datasets.
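Comparing generation probabilities across shared contexts can be made concrete in several ways. A minimal sketch of one such choice, assumed here rather than taken from the paper: turn each sentence's scores log p(sentence | context) into a softmax profile over the contexts and report 1 minus the total variation distance between the two profiles:

```python
import math
from typing import List, Sequence


def context_profile(logps: Sequence[float]) -> List[float]:
    """Softmax over contexts of log p(sentence | context). Adding a constant to
    every entry (e.g. for a longer sentence) leaves the profile unchanged."""
    m = max(logps)
    exps = [math.exp(v - m) for v in logps]
    total = sum(exps)
    return [e / total for e in exps]


def context_similarity(logp_a: Sequence[float], logp_b: Sequence[float]) -> float:
    """1 - total variation distance between the two context profiles; in [0, 1]."""
    if len(logp_a) != len(logp_b):
        raise ValueError("both sentences must be scored on the same contexts")
    pa, pb = context_profile(logp_a), context_profile(logp_b)
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(pa, pb))


# Hypothetical log-probabilities over the same contexts.
print(context_similarity([-10.0, -12.0, -11.0], [-20.0, -22.0, -21.0]))  # 1.0
print(context_similarity([0.0, -50.0], [-50.0, 0.0]))                    # close to 0
```

Normalizing over contexts cancels each sentence's overall likelihood, so two sentences of different length or word frequency can still score 1.0 when every context favors them in the same proportions.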
Submitted 28 January, 2022; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information
Authors:
Yuyang Nie,
Yuanhe Tian,
Yan Song,
Xiang Ao,
Xiang Wan
Abstract:
Named entity recognition (NER) is highly sensitive to sentential syntactic and semantic properties, since entities may be extracted according to how they are used and placed in the running text. To model such properties, one could rely on existing resources to provide helpful knowledge for the NER task; some existing studies have proved the effectiveness of doing so, yet they are limited in appropriately leveraging the knowledge, e.g., distinguishing the important parts for a particular context. In this paper, we improve NER by leveraging different types of syntactic information through an attentive ensemble, which is realized by the proposed key-value memory networks, syntax attention, and gate mechanism for encoding, weighting, and aggregating such syntactic information, respectively. Experimental results on six English and Chinese benchmark datasets suggest the effectiveness of the proposed model and show that it outperforms previous studies on all experimental datasets.
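The weighting and aggregation steps can be sketched for a single token: attention over vectors for different syntax types, then a gate that mixes the attended syntax vector with the contextual encoding. A minimal sketch assuming dot-product attention scores and a per-dimension (diagonal) gate for brevity; the key-value memory networks that encode the syntax vectors are not shown, and all numbers are hypothetical:

```python
import math
from typing import List, Sequence


def syntax_attention(h: Sequence[float], type_vecs: Sequence[Sequence[float]]) -> List[float]:
    """Weight each syntax-type vector (e.g. POS labels, constituents, dependencies)
    by softmax(h · v) and return the weighted sum."""
    scores = [sum(a * b for a, b in zip(h, v)) for v in type_vecs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e / total * v[d] for e, v in zip(exps, type_vecs)) for d in range(len(h))]


def gate(
    h: Sequence[float],
    s: Sequence[float],
    w_h: Sequence[float],
    w_s: Sequence[float],
    b: Sequence[float],
) -> List[float]:
    """Per-dimension gate g = sigmoid(w_h*h + w_s*s + b); output g*h + (1-g)*s."""
    out = []
    for hd, sd, wh, ws, bd in zip(h, s, w_h, w_s, b):
        g = 1.0 / (1.0 + math.exp(-(wh * hd + ws * sd + bd)))
        out.append(g * hd + (1.0 - g) * sd)
    return out


h = [1.0, 3.0]                                     # contextual encoding of one token
s = syntax_attention(h, [[3.0, 1.0], [3.0, 1.0]])  # identical type vectors -> [3.0, 1.0]
print(gate(h, s, w_h=[0.0, 0.0], w_s=[0.0, 0.0], b=[0.0, 0.0]))  # [2.0, 2.0]
```

With all gate weights at zero, g = 0.5 and the output is the plain average; trained weights let each dimension lean toward the contextual or the syntactic signal.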
Submitted 29 October, 2020;
originally announced October 2020.
-
Improving Robustness and Generality of NLP Models Using Disentangled Representations
Authors:
Jiawei Wu,
Xiaoya Li,
Xiang Ao,
Yuxian Meng,
Fei Wu,
Jiwei Li
Abstract:
Supervised neural networks, which first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$, have achieved remarkable success in a wide range of natural language processing (NLP) tasks. Despite their success, neural models lack both robustness and generality: small perturbations to inputs can result in completely different outputs, and the performance of a model trained on one domain drops drastically when tested on another domain.
In this paper, we present methods to improve the robustness and generality of NLP models from the standpoint of disentangled representation learning. Instead of mapping $x$ to a single representation $z$, the proposed strategy maps $x$ to a set of representations $\{z_1,z_2,\dots,z_K\}$ while forcing them to be disentangled. These representations are then mapped to different logits $l$s, the ensemble of which is used to make the final prediction $y$. We propose different methods to incorporate this idea into currently widely used models, including adding an $L_2$ regularizer on the $z$s or adding Total Correlation (TC) under the framework of the variational information bottleneck (VIB). We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
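The prediction path of the proposed strategy (K representations, K sets of logits, one ensembled prediction) can be sketched as below. Averaging the logits is an assumption, and neither the $L_2$ regularizer nor the Total Correlation term that encourages disentanglement is shown:

```python
from typing import List, Sequence, Tuple


def ensemble_predict(
    zs: Sequence[Sequence[float]],
    heads: Sequence[Sequence[Sequence[float]]],
) -> Tuple[int, List[float]]:
    """zs[k] is the k-th representation of one input and heads[k] its own
    (n_labels x dim) output matrix. Each z_k yields logits l_k; the logits are
    averaged and the arg-max label is returned with the averaged logits."""
    if len(zs) != len(heads):
        raise ValueError("one head per representation")
    logits = [[sum(w * x for w, x in zip(row, z)) for row in head] for z, head in zip(zs, heads)]
    n_labels = len(logits[0])
    avg = [sum(lg[c] for lg in logits) / len(logits) for c in range(n_labels)]
    return max(range(n_labels), key=lambda c: avg[c]), avg


# K = 3 hypothetical representations; the third head alone would pick label 1.
zs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
heads = [
    [[2.0, 0.0], [0.0, 0.0]],
    [[0.0, 2.0], [0.0, 0.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
print(ensemble_predict(zs, heads))  # label 0 wins: averaged logits 4/3 vs 2/3
```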
Submitted 20 September, 2020;
originally announced September 2020.
-
Spectral self-adaptive absorber/emitter for harvesting energy from the sun and outer space
Authors:
Xianze Ao,
Bowen Li,
Bin Zhao,
Mingke Hu,
Hui Ren,
Honglun Yang,
Jie Liu,
Jingyu Cao,
Junsheng Feng,
Yuanjun Yang,
Zeming Qi,
Liangbin Li,
Gang Pei,
Chongwen Zou
Abstract:
The sun (~6000 K) and outer space (~3 K) are the original heat source and sink for human beings on Earth. Absorbing solar irradiation and harvesting the coldness of outer space for energy utilization have attracted considerable interest from researchers. However, combining these two functions in a static device for continuous energy harvesting is unachievable due to the intrinsic infrared spectral conflict between them. In this study, we developed a spectral self-adaptive absorber/emitter (SSA/E) that switches between a daytime photothermal mode and a nighttime radiative sky cooling mode via the phase transition of its vanadium dioxide coating layer. A 24-hour day-night test showed that the fabricated SSA/E has continuous energy harvesting ability and improved overall energy utilization performance, showing remarkable potential for future energy applications.
Submitted 1 April, 2020;
originally announced April 2020.
-
Discovering Protagonist of Sentiment with Aspect Reconstructed Capsule Network
Authors:
Chi Xu,
Hao Feng,
Guoxin Yu,
Min Yang,
Xiting Wang,
Xiang Ao
Abstract:
Most recent aspect-term level sentiment analysis (ATSA) approaches combine various neural network models with carefully designed attention mechanisms built upon the given aspect and context to generate refined sentence representations for better predictions. In these methods, aspect terms are always provided in both the training and testing processes, which may degrade aspect-level analysis into sentence-level prediction. However, annotated aspect terms might be unavailable in real-world scenarios, which may challenge the applicability of existing methods. In this paper, we aim to improve ATSA by discovering the potential aspect terms of the predicted sentiment polarity when the aspect terms of a test sentence are unknown. We achieve this goal by proposing a capsule network based model named CAPSAR. In CAPSAR, sentiment categories are denoted by capsules, and aspect term information is injected into sentiment capsules through a sentiment-aspect reconstruction procedure during training. As a result, coherent patterns between aspects and sentimental expressions are encapsulated by these sentiment capsules. Experiments on three widely used benchmarks demonstrate that these patterns have potential for exploring aspect terms in test sentences when only the sentence is fed to the model. Meanwhile, the proposed CAPSAR clearly outperforms SOTA methods on standard ATSA tasks.
Submitted 19 January, 2020; v1 submitted 23 December, 2019;
originally announced December 2019.
-
Field-aware Calibration: A Simple and Empirically Strong Method for Reliable Probabilistic Predictions
Authors:
Feiyang Pan,
Xiang Ao,
Pingzhong Tang,
Min Lu,
Dapeng Liu,
Lei Xiao,
Qing He
Abstract:
It is often observed that the probabilistic predictions given by a machine learning model can disagree with averaged actual outcomes on specific subsets of data, an issue known as miscalibration. It is responsible for the unreliability of practical machine learning systems. For example, in online advertising, an ad can receive a predicted click-through rate of 0.1 over some population of users whose actual click rate is 0.15. In such cases, the probabilistic predictions have to be fixed before the system can be deployed.
In this paper, we first introduce a new evaluation metric named field-level calibration error that measures the bias in predictions over the sensitive input field that the decision-maker is concerned with. We show that existing post-hoc calibration methods yield limited improvements on the new field-level metric and on non-calibration metrics such as the AUC score. To this end, we propose Neural Calibration, a simple yet powerful post-hoc calibration method that learns to calibrate by making full use of the field-aware information over the validation set. We present extensive experiments on five large-scale datasets. The results show that Neural Calibration significantly improves over uncalibrated predictions in common metrics such as the negative log-likelihood, Brier score and AUC, as well as the proposed field-level calibration error.
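The abstract does not spell out the metric, so the following is an assumed form for illustration only: within each value of the chosen field, take the absolute gap between summed labels and summed predictions, then divide the total by the number of samples. Applied to the click-through example above, with hypothetical counts:

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def field_calibration_error(
    fields: Sequence[Hashable], preds: Sequence[float], labels: Sequence[float]
) -> float:
    """Sample-weighted absolute gap between predicted and observed rates within
    each value z of the chosen field:  (1/N) * sum_z |sum_{i in z} (y_i - p_i)|."""
    if not (len(fields) == len(preds) == len(labels)) or not preds:
        raise ValueError("need equally long, non-empty inputs")
    bias: Dict[Hashable, float] = defaultdict(float)
    for z, p, y in zip(fields, preds, labels):
        bias[z] += y - p
    return sum(abs(b) for b in bias.values()) / len(preds)


# Field "ad": ad_1 is predicted at 0.1 but clicked at 0.15 (3 of 20);
# ad_2 is predicted at 0.5 and clicked at exactly 0.5 (5 of 10).
fields = ["ad_1"] * 20 + ["ad_2"] * 10
preds = [0.1] * 20 + [0.5] * 10
labels = [1.0] * 3 + [0.0] * 17 + [1.0] * 5 + [0.0] * 5
print(field_calibration_error(fields, preds, labels))  # about 1/30
```

Only the miscalibrated ad contributes: its 20 samples are off by 0.05 on average, giving a total bias of 1.0 spread over 30 samples.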
Submitted 27 January, 2020; v1 submitted 25 May, 2019;
originally announced May 2019.
-
Warm Up Cold-start Advertisements: Improving CTR Predictions via Learning to Learn ID Embeddings
Authors:
Feiyang Pan,
Shuokai Li,
Xiang Ao,
Pingzhong Tang,
Qing He
Abstract:
Click-through rate (CTR) prediction has been one of the most central problems in computational advertising. Lately, embedding techniques that produce low-dimensional representations of ad IDs have drastically improved CTR prediction accuracy. However, such learning techniques are data-demanding and work poorly on new ads with little logging data, which is known as the cold-start problem.
In this paper, we aim to improve CTR predictions during both the cold-start phase and the warm-up phase when a new ad is added to the candidate pool. We propose Meta-Embedding, a meta-learning-based approach that learns to generate desirable initial embeddings for new ad IDs. The proposed method trains an embedding generator for new ad IDs by making use of previously learned ads through gradient-based meta-learning. In other words, our method learns how to learn better embeddings. When a new ad arrives, the trained generator initializes the embedding of its ID from the ad's contents and attributes. Then, compared with existing initialization methods, the generated embedding can speed up model fitting during the warm-up phase, when a few labeled examples are available.
Experimental results on three real-world datasets showed that Meta-Embedding can significantly improve both the cold-start and warm-up performances for six existing CTR prediction models, ranging from lightweight models such as Factorization Machines to complicated deep models such as PNN and DeepFM. All of the above apply to conversion rate (CVR) predictions as well.
Submitted 25 April, 2019;
originally announced April 2019.
-
Hierarchical Neural Network for Extracting Knowledgeable Snippets and Documents
Authors:
Ganbin Zhou,
Rongyu Cao,
Xiang Ao,
Ping Luo,
Fen Lin,
Leyu Lin,
Qing He
Abstract:
In this study, we focus on extracting knowledgeable snippets and annotating knowledgeable documents from a Web corpus consisting of documents from social media and We-media. Informally, knowledgeable snippets refer to text describing concepts, properties of entities, or relations among entities, while knowledgeable documents are those with enough knowledgeable snippets. These knowledgeable snippets and documents could be helpful in multiple applications, such as knowledge base construction and knowledge-oriented services. Previous studies extracted knowledgeable snippets using pattern-based methods. Here, we propose a semantic-based method for this task. Specifically, a CNN-based model is developed to extract knowledgeable snippets and annotate knowledgeable documents simultaneously. Additionally, a "low-level sharing, high-level splitting" CNN structure is designed to handle documents from different content domains. Compared with building multiple domain-specific CNNs, this joint model not only substantially reduces training time but also noticeably improves prediction accuracy. The superiority of the proposed method is demonstrated on a real dataset from the WeChat public platform.
Submitted 22 August, 2018;
originally announced August 2018.
-
Free-rider Episode Screening via Dual Partition Model
Authors:
Xiang Ao,
Yang Liu,
Zhen Huang,
Luo Zuo,
Qing He
Abstract:
One drawback of frequent episode mining is that an overwhelming number of the discovered patterns are redundant. A free-rider episode, as a typical example, consists of a real pattern doped with some additional noise events. Because the embedded noise events may themselves have high support, such free-rider episodes can have abnormally high support, so they cannot be filtered out by a frequency-based framework. An effective technique for filtering free-rider episodes is to use a partition model that divides an episode into two consecutive subepisodes and compares the observed support of the episode with its expected support under the assumption that the two subepisodes occur independently. In this paper, we take more complex subepisodes into consideration and develop a novel partition model named EDP for filtering free-rider episodes from a given set of episodes. It combines (1) a dual partition strategy that divides an episode into an underlying real pattern and potential noise; and (2) a novel definition of the expected support of a free-rider episode based on the proposed partition strategy. We deem an episode interesting if its observed support is substantially higher than the expected support estimated by our model. Experiments on synthetic and real-world datasets demonstrate that EDP can effectively filter free-rider episodes compared with existing state-of-the-art methods.
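The basic single-partition test that EDP builds on can be sketched as follows: split a serial episode into two consecutive subepisodes, estimate its expected support as if the halves occurred independently, and keep the episode only if its observed support is clearly higher. This is a simplified sketch, not EDP's dual partition strategy: the fixed-width sliding windows, the window-fraction definition of support, the independence estimate (product of the two subepisode supports, ignoring their order), taking the largest estimate over all split points, and the `ratio` threshold are all assumptions.

```python
def occurs(window, episode):
    """True if the events of a serial episode appear in order within the window."""
    it = iter(window)
    # `in` on an iterator consumes it, so this tests for an ordered subsequence.
    return all(event in it for event in episode)


def sliding_windows(seq, width):
    """All contiguous windows of the given width over an event sequence."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]


def support(windows, episode):
    """Fraction of windows that contain the episode."""
    return sum(occurs(w, episode) for w in windows) / len(windows)


def is_interesting(windows, episode, ratio=1.5):
    """Keep an episode (length >= 2) only if it beats every two-way split's expectation."""
    observed = support(windows, episode)
    expected = max(
        support(windows, episode[:k]) * support(windows, episode[k:])
        for k in range(1, len(episode))
    )
    return observed > ratio * expected
```

For example, with `ws = sliding_windows(list("ABCDEFGH" * 5), 2)`, the episode `("A", "B")` occurs in 5 of 39 windows, well above the product of its parts' supports, so it is kept, whereas `("A", "E")` never occurs within a window and is rejected.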
Submitted 18 May, 2018;
originally announced May 2018.
-
Cross-domain novelty seeking trait mining for sequential recommendation
Authors:
Fuzhen Zhuang,
Yingmin Zhou,
Fuzheng Zhang,
Xiang Ao,
Xing Xie,
Qing He
Abstract:
Transfer learning has attracted a large amount of interest and research in recent decades, and some efforts have been made to build more precise recommendation systems. Most previous transfer recommendation systems assume that the target domain shares the same or similar rating patterns with the auxiliary source domain, which is used to improve recommendation performance. However, to the best of our knowledge, almost none of these works consider the characteristics of sequential data. In this paper, we study a new cross-domain recommendation scenario for mining the novelty-seeking trait. Recent studies in psychology suggest that the novelty-seeking trait is highly related to consumer behavior, which has a profound business impact on online recommendation. Previous work that operates on only a single target domain may not fully characterize users' novelty-seeking trait because of data scarcity and sparsity, leading to poor recommendation performance. Along this line, we propose a new cross-domain novelty-seeking trait mining model (CDNST for short) that improves sequential recommendation performance by transferring knowledge from the auxiliary source domain. We conduct systematic experiments on three domain datasets crawled from Douban (www.douban.com) to demonstrate the effectiveness of the proposed model. Moreover, we analyze how the temporal property of sequential data affects the performance of CDNST, and conduct simulation experiments to validate our analysis.
Submitted 5 March, 2018;
originally announced March 2018.
-
An Efficient Density-based Clustering Algorithm for Higher-Dimensional Data
Authors:
Thapana Boonchoo,
Xiang Ao,
Qing He
Abstract:
DBSCAN is a widely used clustering algorithm because of its ability to find arbitrarily shaped clusters and its robustness to outliers. The worst-case complexity of DBSCAN is O(n^2), and in practice its cost becomes even more severe in higher dimensions. Grid-based DBSCAN is one of the recent algorithms aimed at improving efficiency. However, grid-based DBSCAN still suffers from two problems, neighbour explosion and redundancies in merging, which make these algorithms infeasible in high-dimensional space. In this paper, we propose a novel algorithm named GDPAM that attempts to extend grid-based DBSCAN to higher-dimensional data. In GDPAM, bitmap indexing is used to manage non-empty grids so that neighbour grid queries can be performed efficiently. Furthermore, we adopt an efficient union-find algorithm to maintain the clustering information in order to reduce redundancies in merging. Experimental results on both real-world and synthetic datasets demonstrate that the proposed algorithm outperforms state-of-the-art exact and approximate DBSCAN algorithms and scales well.
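The role of union-find in the merging step can be illustrated over non-empty grid cells. This is a simplified sketch, not GDPAM: it assumes every point is a core point (equivalent to minPts = 1, so the core-point test is skipped), keeps cells in a Python dict rather than a bitmap index, and compares every pair of non-empty cells instead of enumerating neighbour offsets, which avoids the neighbour explosion at the cost of a quadratic scan over cells. The cell side eps/sqrt(d) makes the cell diagonal equal to eps, so any two points sharing a cell are within eps of each other.

```python
import math


class UnionFind:
    """Disjoint sets with path compression and union by size."""

    def __init__(self):
        self.parent, self.size = {}, {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x], self.size[x] = x, 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def grid_clusters(points, eps):
    """Cluster points by merging eps-connected grid cells (all points treated as core)."""
    d = len(points[0])
    side = eps / math.sqrt(d)         # cell diagonal equals eps
    reach = math.ceil(math.sqrt(d))   # largest per-axis cell offset that can hold an eps-neighbour
    cells = {}
    for p in points:
        key = tuple(math.floor(c / side) for c in p)
        cells.setdefault(key, []).append(p)

    uf = UnionFind()
    keys = list(cells)
    for i, a in enumerate(keys):
        uf.find(a)
        for b in keys[i + 1:]:
            if max(abs(x - y) for x, y in zip(a, b)) > reach or uf.find(a) == uf.find(b):
                continue  # too far apart, or already merged: skip the redundant check
            if any(math.dist(p, q) <= eps for p in cells[a] for q in cells[b]):
                uf.union(a, b)

    clusters = {}
    for key, pts in cells.items():
        clusters.setdefault(uf.find(key), []).extend(pts)
    return list(clusters.values())
```

The `uf.find(a) == uf.find(b)` guard is where union-find pays off: once two cells already share a root, the costly point-to-point distance check between them is skipped.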
Submitted 22 January, 2018;
originally announced January 2018.
-
Diffusion of Chiral Janus Particles in a Sinusoidal Channel
Authors:
Xue Ao,
Pulak Kumar Ghosh,
Yunyun Li,
Gerhard Schmid,
Peter Hänggi,
Fabio Marchesoni
Abstract:
We investigate the transport diffusivity of artificial microswimmers, a.k.a. Janus particles, moving in a sinusoidal channel in the absence of external biases. Their diffusion constant turns out to be quite sensitive to the self-propulsion mechanism and the geometry of the channel compartments. Our analysis thus suggests how to best control the diffusion of active Brownian motion in confined geometries.
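A crude Langevin simulation makes this setup concrete: an overdamped self-propelled (Janus) particle with speed v0, an optional constant angular velocity omega (chirality), translational noise D0 and rotational noise D_theta, confined between walls y = ±w(x). This is only a sketch under stated assumptions: the channel profile `half_width`, all parameter values, and the rule of simply rejecting steps that would cross a wall are illustrative choices, not the geometry or boundary conditions used in the paper. The diffusion constant along the channel axis is estimated from the mean-square displacement.

```python
import math
import random


def half_width(x, period=1.0, wide=0.5, pore=0.1):
    """Assumed channel profile: half-width oscillates between pore/2 and wide/2."""
    return 0.5 * (pore + (wide - pore) * math.sin(math.pi * x / period) ** 2)


def trajectory_end(rng, v0, omega, d0, d_theta, dt, steps):
    """Euler-Maruyama integration of one active Brownian particle; returns final (x, y)."""
    x, y, theta = 0.5, 0.0, rng.uniform(0.0, 2.0 * math.pi)  # start in the widest section
    kick, spin = math.sqrt(2.0 * d0 * dt), math.sqrt(2.0 * d_theta * dt)
    for _ in range(steps):
        nx = x + v0 * math.cos(theta) * dt + kick * rng.gauss(0.0, 1.0)
        ny = y + v0 * math.sin(theta) * dt + kick * rng.gauss(0.0, 1.0)
        if abs(ny) < half_width(nx):  # reject moves that would leave the channel
            x, y = nx, ny
        theta += omega * dt + spin * rng.gauss(0.0, 1.0)  # torque plus rotational noise
    return x, y


def effective_diffusion(n_traj=20, v0=1.0, omega=0.0, d0=0.01, d_theta=0.5,
                        dt=1e-3, steps=5000, seed=0):
    """Estimate D along the channel axis from <(x(t) - x(0))^2> / (2 t)."""
    rng = random.Random(seed)
    t = steps * dt
    msd = sum((trajectory_end(rng, v0, omega, d0, d_theta, dt, steps)[0] - 0.5) ** 2
              for _ in range(n_traj)) / n_traj
    return msd / (2.0 * t)
```

Comparing `effective_diffusion(omega=0.0)` with a nonzero `omega`, or changing `pore`, is the kind of numerical experiment the abstract refers to; meaningful estimates need many more trajectories and longer runs than these defaults.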
Submitted 18 December, 2014;
originally announced December 2014.
-
Active Brownian motion in a narrow channel
Authors:
Xue Ao,
Pulak Kumar Ghosh,
Yunyun Li,
Gerhard Schmid,
Peter Hänggi,
Fabio Marchesoni
Abstract:
We review recent advances in rectification control of artificial microswimmers, also known as Janus particles, diffusing along narrow, periodically corrugated channels. The swimmer self-propulsion mechanism is modeled so as to incorporate a nonzero torque (propulsion chirality). We first summarize the effects of chirality on the autonomous current of microswimmers freely diffusing in channels of different geometries. In particular, left-right and upside-down asymmetric channels are shown to exhibit different transport properties. We then report new results on the dependence of the diffusivity of chiral microswimmers on the channel geometry and their own self-propulsion mechanism. The self-propulsion torque turns out to play a key role as a transport control parameter.
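As a reference point for the role of chirality, the standard result for a chiral active Brownian particle in free two-dimensional space (no channel walls) follows from the orientation autocorrelation, which decays as e^{-D_θ t} cos(Ωt). Here v_0, D_0, D_θ and Ω (self-propulsion speed, translational and rotational diffusion constants, and torque-induced angular velocity) are notation introduced for this illustration; confinement in a corrugated channel changes this value, which is what the channel studies quantify.

```latex
D_{\mathrm{eff}} = D_0 + \frac{1}{2}\int_0^{\infty} v_0^2\, e^{-D_\theta t}\cos(\Omega t)\,dt
                 = D_0 + \frac{v_0^2}{2}\,\frac{D_\theta}{D_\theta^2 + \Omega^2}
```

Since D_θ/(D_θ² + Ω²) < 1/D_θ whenever Ω ≠ 0, a nonzero self-propulsion torque lowers the free-space diffusivity below the achiral value D_0 + v_0²/(2D_θ).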
Submitted 17 September, 2014;
originally announced September 2014.
-
Ion Acoustic Travelling Waves
Authors:
G. M. Webb,
R. H. Burrows,
X. Ao,
G. P. Zank
Abstract:
Models for travelling waves in multi-fluid plasmas give essential insight into fully nonlinear wave structures in plasmas that is not readily available from either numerical simulations or weakly nonlinear wave theories. We illustrate these ideas using one of the simplest models of an electron-proton multi-fluid plasma, for the case where either no magnetic field or a constant normal magnetic field is present. We show that the travelling waves can be reduced to a single first-order differential equation governing the dynamics. We also show that the equations admit a multi-symplectic Hamiltonian formulation in which both the space and time variables can act as the evolution variable. An integral equation useful for calculating adiabatic, electrostatic solitary wave signatures for multi-fluid plasmas with arbitrary mass ratios is presented. The integral equation arises naturally from a fluid dynamics approach for a two-fluid plasma with a given mass ratio of the two species (e.g. the plasma could be an electron-proton or an electron-positron plasma). Besides its intrinsic interest, the integral equation solution provides a useful analytical test for numerical codes that include the proton-electron mass ratio as a fundamental constant, such as particle-in-cell (PIC) codes. The integral equation is used to delineate the physical characteristics of ion acoustic travelling waves consisting of hot electron and cold proton fluids.
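The reduction of a travelling wave to a single first-order equation can be illustrated with the classic Sagdeev pseudopotential for cold fluid ions and Boltzmann (isothermal) electrons. This is a swapped-in textbook model, not the adiabatic hot-electron, cold-proton multi-fluid system treated in the paper. In normalized units the wave obeys (1/2)(dφ/dξ)² + V(φ) = 0 with V(φ) = 1 - e^φ + M²[1 - sqrt(1 - 2φ/M²)], where M is the Mach number; a solitary wave exists when V dips below zero and returns to zero before the ion-reflection limit φ = M²/2. The function names and the bisection details are illustrative.

```python
import math


def sagdeev(phi, mach):
    """Pseudopotential V(phi) for cold fluid ions and Boltzmann electrons (normalized units)."""
    return -math.expm1(phi) + mach ** 2 * (1.0 - math.sqrt(1.0 - 2.0 * phi / mach ** 2))


def soliton_amplitude(mach, iterations=200):
    """Amplitude phi_max where V returns to zero, or None if no solitary wave exists."""
    if mach <= 1.0:
        return None                    # V > 0 near phi = 0: no trapping well
    lo, hi = 1e-3, mach ** 2 / 2.0     # hi: ions are reflected beyond this potential
    if sagdeev(lo, mach) >= 0.0 or sagdeev(hi, mach) < 0.0:
        return None                    # V must dip below zero and then recover
    for _ in range(iterations):        # bisection keeps V(lo) < 0 <= V(hi)
        mid = 0.5 * (lo + hi)
        if sagdeev(mid, mach) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In this textbook model solitary waves appear only for 1 < M < about 1.58; `soliton_amplitude(1.2)` returns a finite amplitude, while `soliton_amplitude(1.7)` returns `None`.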
Submitted 22 December, 2013;
originally announced December 2013.
-
In-phase and anti-phase synchronization in noisy Hodgkin-Huxley neurons
Authors:
Xue Ao,
Peter Hanggi,
Gerhard Schmid
Abstract:
We numerically investigate the influence of intrinsic channel noise on the dynamical response of delay-coupling in neuronal systems. The stochastic dynamics of the spiking is modeled within a stochastic modification of the standard Hodgkin-Huxley model, wherein the delay-coupling accounts for the finite propagation time of an action potential along the neuronal axon. We quantify this Pyragas-type delay-coupling in terms of the difference between corresponding presynaptic and postsynaptic membrane potentials. For an elementary neuronal network consisting of two coupled neurons, we detect characteristic stochastic synchronization patterns that exhibit multiple phase-flip bifurcations: these occur in the form of alternating transitions between in-phase and anti-phase spiking activity. Interestingly, the phase flips remain robust under strong channel noise and in turn cause a striking stabilization of the spiking frequency.
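A deterministic skeleton of this coupling scheme can be written with the standard Hodgkin-Huxley equations plus a Pyragas-type delayed term kappa * (V_j(t - tau) - V_i(t)) in each neuron's current balance. This is a sketch under stated assumptions: it uses the common textbook parameter set (resting potential near -65 mV), forward-Euler integration, and illustrative values for the drive `i_ext`, the coupling `kappa` and the delay `tau`. It omits the intrinsic channel noise that is central to the paper, so it cannot by itself reproduce the noise-dependent phase-flip results.

```python
import math

# Textbook HH constants (mV, ms, mS/cm^2, uF/cm^2).
G_NA, G_K, G_L = 120.0, 36.0, 0.3
E_NA, E_K, E_L = 50.0, -77.0, -54.387
C_M = 1.0


def rates(v):
    """(alpha, beta) opening/closing rates for gates n, m, h at membrane potential v."""
    a_n = 0.01 * (v + 55.0) / (1.0 - math.exp(-(v + 55.0) / 10.0))
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    a_m = 0.1 * (v + 40.0) / (1.0 - math.exp(-(v + 40.0) / 10.0))
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    return (a_n, b_n), (a_m, b_m), (a_h, b_h)


def simulate_pair(i_ext=10.0, kappa=0.02, tau=5.0, dt=0.01, t_max=100.0):
    """Two HH neurons with delayed difference coupling; returns both voltage traces."""
    delay = int(round(tau / dt))
    v = [-65.0, -60.0]                               # slightly different initial potentials
    gates = [[0.32, 0.05, 0.6] for _ in range(2)]    # approximate resting n, m, h
    trace = [[v[0]], [v[1]]]
    for step in range(int(t_max / dt)):
        past = [tr[max(0, step - delay)] for tr in trace]  # V(t - tau), clamped at t = 0
        new_v = []
        for i in range(2):
            n, m, h = gates[i]
            i_ion = (G_NA * m ** 3 * h * (v[i] - E_NA) + G_K * n ** 4 * (v[i] - E_K)
                     + G_L * (v[i] - E_L))
            i_cpl = kappa * (past[1 - i] - v[i])     # Pyragas-type delayed coupling
            new_v.append(v[i] + dt * (i_ext - i_ion + i_cpl) / C_M)
            for g, (a, b) in enumerate(rates(v[i])):
                gates[i][g] += dt * (a * (1.0 - gates[i][g]) - b * gates[i][g])
        v = new_v
        for i in range(2):
            trace[i].append(v[i])
    return trace


def count_spikes(v_trace, threshold=0.0):
    """Number of upward crossings of the threshold potential."""
    return sum(1 for a, b in zip(v_trace, v_trace[1:]) if a < threshold <= b)
```

Comparing the spike times of `trace[0]` and `trace[1]` as `tau` is varied is how in-phase versus anti-phase locking would be read off; a noisy variant would replace the deterministic gate updates with stochastic ones.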
Submitted 19 February, 2013; v1 submitted 21 June, 2012;
originally announced June 2012.
-
Torsion Cosmology of Poincaré gauge theory and the constraints of its parameters via SNeIa data
Authors:
Xi-Chen Ao,
Xin-Zhou Li
Abstract:
Poincaré gauge theory (PGT) is an alternative theory of gravity that attempts to bring gravity into a gauge-theoretic framework, with a Lagrangian quadratic in torsion and curvature. Recently, cosmological models with torsion based on this theory have drawn much attention as a new way to explain the cosmic acceleration. Among these PGT cosmological models, the SNY model, which has only even-parity dynamical modes, is particularly attractive because of its physical meaning. In this paper, we first analyze the past-time cosmic evolution of the SNY model analytically. Based on these results, we fit this model to the most comprehensive SNeIa data (Union 2) and find the best-fit values of the model parameters and initial conditions, whose associated $χ^{2}$ value is consistent with that of $Λ$CDM at the 1$σ$ level. Using the $χ^{2}$ estimate, we also provide constraints on these parameters. Using the best-fit values for the Union 2 SNeIa dataset, we are able to predict the late-time evolution of our universe. This prediction indicates that the universe would expand forever, slowing asymptotically to a halt, in accordance with earlier works.
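The χ² comparison against supernova distance moduli can be sketched for any spatially flat model specified by its dimensionless expansion rate E(z) = H(z)/H0. This is a generic sketch, not the SNY fit: it plugs in flat ΛCDM as a stand-in E(z), assumes H0 = 70 km/s/Mpc, integrates the comoving distance with a trapezoid rule, searches a one-parameter grid, and omits marginalization over H0 or the absolute magnitude. The test data are synthetic, generated from the model itself, not the Union 2 compilation.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def e_lcdm(z, omega_m):
    """Dimensionless expansion rate for flat LambdaCDM (stand-in for the model under test)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def distance_modulus(z, omega_m, h0=70.0, n=1000):
    """mu = 5 log10(d_L / Mpc) + 25, with d_L = (1 + z) (c / H0) * int_0^z dz' / E(z'), z > 0."""
    dz = z / n
    integral = sum(0.5 * dz * (1.0 / e_lcdm(k * dz, omega_m)
                               + 1.0 / e_lcdm((k + 1) * dz, omega_m))
                   for k in range(n))  # trapezoid rule
    d_l = (1.0 + z) * (C_KM_S / h0) * integral
    return 5.0 * math.log10(d_l) + 25.0


def chi_square(data, omega_m):
    """data: list of (z, mu_obs, sigma_mu) tuples."""
    return sum(((mu - distance_modulus(z, omega_m)) / sigma) ** 2 for z, mu, sigma in data)


def best_fit(data, grid):
    """Grid search for the omega_m value that minimizes chi^2."""
    return min(grid, key=lambda om: chi_square(data, om))
```

For a model with several parameters and initial conditions, the same χ² function would be minimized over a multidimensional grid or with an optimizer, and contours of constant Δχ² give the confidence regions.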
Submitted 9 November, 2011;
originally announced November 2011.
-
de Sitter Gauge Theory of Gravity: An Alternative Torsion Cosmology
Authors:
Xi-Chen Ao,
Xin-Zhou Li
Abstract:
A new cosmological model based on the de Sitter gauge theory (dSGT) is studied in this paper. Through some transformations, we find that, for a dust universe, the cosmological equations of dSGT form an autonomous system. We carry out a dynamical analysis of this system and find 9 critical points, among which are one positive attractor and one negative attractor. The positive attractor indicates that our universe will eventually enter an exponential expansion phase, which is similar to the conclusion of $Λ$CDM. We also carry out numerical calculations, which confirm the conclusion of the dynamical analysis. Finally, we fit the model parameters and initial values to the Union 2 SNIa dataset, present the confidence contours of the parameters, and obtain the best-fit values of the dSGT parameters.
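The dynamical analysis rests on a standard recipe: locate the critical points of the autonomous system and classify each one by the eigenvalues of the Jacobian there, an attractor being a point where every eigenvalue has a negative real part. The sketch below applies that recipe to a hypothetical two-dimensional toy system (x' = -x, y' = y(1 - y)), since the dSGT equations are not reproduced here; the finite-difference Jacobian, the tolerance and the function names are illustrative.

```python
import cmath


def toy_field(x, y):
    """Hypothetical autonomous system, NOT the dSGT equations."""
    return -x, y * (1.0 - y)


def jacobian(field, x, y, h=1e-6):
    """Central-difference Jacobian [[df/dx, df/dy], [dg/dx, dg/dy]]."""
    fxp, fxm = field(x + h, y), field(x - h, y)
    fyp, fym = field(x, y + h), field(x, y - h)
    return [[(fxp[0] - fxm[0]) / (2 * h), (fyp[0] - fym[0]) / (2 * h)],
            [(fxp[1] - fxm[1]) / (2 * h), (fyp[1] - fym[1]) / (2 * h)]]


def eigenvalues(j):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = j[0][0] + j[1][1]
    det = j[0][0] * j[1][1] - j[0][1] * j[1][0]
    root = cmath.sqrt(tr * tr / 4.0 - det)
    return tr / 2.0 + root, tr / 2.0 - root


def classify(field, point, tol=1e-7):
    """Label a critical point by the signs of the real parts of its Jacobian eigenvalues."""
    re = [ev.real for ev in eigenvalues(jacobian(field, *point))]
    if any(abs(r) < tol for r in re):
        return "non-hyperbolic"  # linearization is inconclusive
    if all(r < 0 for r in re):
        return "attractor"
    if all(r > 0 for r in re):
        return "repeller"
    return "saddle"
```

For the toy system, the critical points are (0, 0) and (0, 1); the first is a saddle and the second an attractor.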
Submitted 7 November, 2011;
originally announced November 2011.
-
Cosmological Dynamics of de Sitter Gravity
Authors:
Xi-chen Ao,
Xin-zhou Li,
Ping Xi
Abstract:
A new cosmological model based on de Sitter gravity is investigated through dynamical analysis and numerical calculations. Via some transformations, the evolution equations of this model form an autonomous system with 8 physical critical points. Among these critical points are one positive attractor and one negative attractor. The positive attractor describes the asymptotic behavior of the late-time universe, indicating that the universe will eventually enter an exponential expansion phase. Numerical calculations are also carried out, and they support this conclusion derived from the dynamical analysis.
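Numerical confirmation of an attractor typically means integrating the autonomous system from several initial conditions and checking that the trajectories converge to the predicted critical point. A minimal classical fourth-order Runge-Kutta sketch, applied to a hypothetical toy system (x' = -x, y' = y(1 - y), whose attractor is (0, 1)) rather than the de Sitter gravity equations, with an illustrative step size and duration:

```python
def toy_field(state):
    """Hypothetical autonomous system, NOT the de Sitter gravity equations."""
    x, y = state
    return (-x, y * (1.0 - y))


def rk4_step(field, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shift(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))

    k1 = field(state)
    k2 = field(shift(state, k1, dt / 2))
    k3 = field(shift(state, k2, dt / 2))
    k4 = field(shift(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(field, state, dt=0.01, t_end=20.0):
    """Advance the state from t = 0 to t_end with fixed RK4 steps."""
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(field, state, dt)
    return state
```

Starting points such as (1.0, 0.2), (-2.0, 3.0) and (0.5, 0.01) all end up next to (0, 1), the numerical counterpart of the attractor found by linear analysis.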
Submitted 12 August, 2011;
originally announced August 2011.
-
Analytical approach of late-time evolution in a torsion cosmology
Authors:
Xi-chen Ao,
Xin-zhou Li,
Ping Xi
Abstract:
In this letter, we study the late-time evolution of a torsion cosmology with only the spin-$0^+$ mode. We find three kinds of analytical solutions with a constant affine scalar curvature. The first is unphysical because the matter density would be negative. The second shows that dark energy can be mimicked in the torsion cosmological model. In the third, the late-time evolution is similar to that of a matter-dominated universe. We also find an expression for non-constant curvature showing that the periodic character seen in numerical calculations only reflects the solution within a specific period of the evolution. Using these expressions, we are able to predict the late-time evolution. This prediction indicates that the universe would expand forever, slowing asymptotically to a halt.
Submitted 20 October, 2010;
originally announced October 2010.
-
A simple derivation of level spacing of quasinormal frequencies for a black hole with a deficit solid angle and quintessence-like matter
Authors:
Ping Xi,
Xi-chen Ao,
Xin-zhou Li
Abstract:
In this paper, we analytically investigate the level spacing of the imaginary part of quasinormal frequencies for a black hole with a deficit solid angle and quintessence-like matter using Padmanabhan's method \cite{Padmanabhan}. Padmanabhan presented a method to study analytically the imaginary part of quasinormal frequencies for a class of spherically symmetric spacetimes, including Schwarzschild-de Sitter black holes, whose quasinormal frequencies have an evenly spaced structure. The results show that, in the range $-1 < w < -1/3$, the level spacing of both scalar and gravitational quasinormal frequencies for this kind of black hole depends only on the surface gravity of the black-hole horizon. We also extend the range of $w$ to $w \leq -1$, and the results are similar to those in the $-1 < w < -1/3$ case. In particular, a black hole with a deficit solid angle in an accelerating universe reduces to a Schwarzschild-de Sitter black hole when $w = -1$ and $ε^2 = 0$, and to a Schwarzschild black hole when $ρ_0 = 0$ and $ε^2 = 0$. Here, $w$ is the equation-of-state parameter, $ε^2$ is a parameter related to the deficit solid angle, and $ρ_0$ is the density of the static, spherically symmetric quintessence-like matter at $r = 1$.
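For orientation, the result that Padmanabhan's method generalizes can be written for a static metric with lapse function f(r) and a single horizon at r_h (units G = c = ħ = k_B = 1): the imaginary parts of highly damped quasinormal frequencies are evenly spaced by the horizon's surface gravity κ. The Schwarzschild line is the standard special case; the specific f(r) for the deficit-solid-angle black hole with quintessence-like matter is not reproduced here.

```latex
\begin{aligned}
&ds^2 = -f(r)\,dt^2 + f(r)^{-1}\,dr^2 + r^2\,d\Omega^2, \qquad f(r_h) = 0,\\
&\kappa = \tfrac{1}{2} f'(r_h), \qquad \bigl|\Delta(\operatorname{Im}\omega)\bigr| = \kappa,\\
&\text{Schwarzschild: } f(r) = 1 - \frac{2M}{r},\quad r_h = 2M,\quad
\kappa = \frac{1}{4M} = 2\pi T_H .
\end{aligned}
```

Setting $ε^2 = 0$ and $ρ_0 = 0$ in the black hole studied here recovers this Schwarzschild spacing.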
Submitted 3 November, 2010; v1 submitted 21 May, 2010;
originally announced May 2010.
-
Level spacing of quasinormal frequencies for black holes with a deficit solid angle surrounded by quintessence-like matter
Authors:
Ping Xi,
Xi-Chen Ao,
Xin-Zhou Li
Abstract:
This paper has been withdrawn by the author.
Submitted 17 May, 2010; v1 submitted 29 October, 2009;
originally announced October 2009.