-
Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning
Authors:
Ngoc Bui,
Menglin Yang,
Runjin Chen,
Leonardo Neves,
Mingxuan Ju,
Rex Ying,
Neil Shah,
Tong Zhao
Abstract:
Backward compatible representation learning enables updated models to integrate seamlessly with existing ones, avoiding the need to reprocess stored data. Despite recent advances, existing compatibility approaches in Euclidean space neglect the uncertainty in the old embedding model and force the new model to reconstruct outdated representations regardless of their quality, thereby hindering the learning process of the new model. In this paper, we propose to switch perspectives to hyperbolic geometry, where we treat time as a natural axis for capturing a model's confidence and evolution. By lifting embeddings into hyperbolic space and constraining updated embeddings to lie within the entailment cone of the old ones, we maintain generational consistency across models while accounting for uncertainties in the representations. To further enhance compatibility, we introduce a robust contrastive alignment loss that dynamically adjusts alignment weights based on the uncertainty of the old embeddings. Experiments validate the superiority of the proposed method in achieving compatibility, paving the way for more resilient and adaptable machine learning systems.
Submitted 6 June, 2025;
originally announced June 2025.
-
Revisiting Self-attention for Cross-domain Sequential Recommendation
Authors:
Clark Mingxuan Ju,
Leonardo Neves,
Bhuvesh Kumar,
Liam Collins,
Tong Zhao,
Yuwei Qiu,
Qing Dou,
Sohail Nizam,
Sen Yang,
Neil Shah
Abstract:
Sequential recommendation is a popular paradigm in modern recommender systems. In particular, one challenging problem in this space is cross-domain sequential recommendation (CDSR), which aims to predict future behaviors given user interactions across multiple domains. Existing CDSR frameworks are mostly built on the self-attention transformer and seek to improve by explicitly injecting additional domain-specific components (e.g., domain-aware module blocks). While these additional components help, we argue they overlook the core self-attention module already present in the transformer, a naturally powerful tool to learn correlations among behaviors. In this work, we aim to improve the CDSR performance for simple models from a novel perspective of enhancing the self-attention. Specifically, we introduce a Pareto-optimal self-attention and formulate the cross-domain learning as a multi-objective problem, where we optimize the recommendation task while dynamically minimizing the cross-domain attention scores. Our approach automates knowledge transfer in CDSR (dubbed AutoCDSR) -- it not only mitigates negative transfer but also encourages complementary knowledge exchange among auxiliary domains. Based on this idea, we further introduce AutoCDSR+, a more performant variant with slight additional cost. Our proposal is easy to implement and works as a plug-and-play module that can be incorporated into existing transformer-based recommenders. Besides flexibility, it is practical to deploy because it brings little extra computational overhead without heavy hyper-parameter tuning. AutoCDSR on average improves Recall@10 for SASRec and Bert4Rec by 9.8% and 16.0% and NDCG@10 by 12.0% and 16.7%, respectively. Code is available at https://github.com/snap-research/AutoCDSR.
Submitted 27 May, 2025;
originally announced May 2025.
-
Learning Universal User Representations Leveraging Cross-domain User Intent at Snapchat
Authors:
Clark Mingxuan Ju,
Leonardo Neves,
Bhuvesh Kumar,
Liam Collins,
Tong Zhao,
Yuwei Qiu,
Qing Dou,
Yang Zhou,
Sohail Nizam,
Rengim Ozturk,
Yvette Liu,
Sen Yang,
Manish Malik,
Neil Shah
Abstract:
The development of powerful user representations is a key factor in the success of recommender systems (RecSys). Online platforms employ a range of RecSys techniques to personalize user experience across diverse in-app surfaces. User representations are often learned individually from users' historical interactions within each surface, and user representations across different surfaces can be shared post-hoc as auxiliary features or additional retrieval sources. While effective, such schemes cannot directly encode collaborative filtering signals across different surfaces, hindering their capacity to discover complex relationships between user behaviors and preferences across the whole platform. To bridge this gap at Snapchat, we seek to conduct universal user modeling (UUM) across different in-app surfaces, learning general-purpose user representations which encode behaviors across surfaces. Instead of replacing domain-specific representations, UUM representations capture cross-domain trends, enriching existing representations with complementary information. This work discusses our efforts in developing initial UUM versions, practical challenges, technical choices, and modeling and research directions with promising offline performance. Following successful A/B testing, UUM representations have been launched in production, powering multiple use cases and demonstrating their value. UUM embeddings have been incorporated into (i) Long-form Video embedding-based retrieval, leading to 2.78% increase in Long-form Video Open Rate, (ii) Long-form Video L2 ranking, with 19.2% increase in Long-form Video View Time sum, (iii) Lens L2 ranking, leading to 1.76% increase in Lens play time, and (iv) Notification L2 ranking, with 0.87% increase in Notification Open Rate.
Submitted 9 June, 2025; v1 submitted 30 April, 2025;
originally announced April 2025.
-
Ab initio modeling of TWIP and TRIP effects in $β$-Ti alloys
Authors:
David Holec,
Johann Grillitsch,
Jose L. Neves,
David Obersteiner,
Thomas Klein
Abstract:
Transformations in bcc-$β$, hcp-$α$, and the $ω$ phases of Ti alloys are studied using Density Functional Theory for pure Ti and Ti alloyed with Al, Si, V, Cr, Fe, Cu, Nb, Mo, and Sn. The $β$-stabilization caused by alloying Si, Fe, Cr, and Mo was observed, but the most stable phase appears between the $β$ and the $α$ phases, corresponding to the martensitic $α''$ phase. Next, the $\{112\}\langle11\bar1\rangle$ bcc twins are separated by a positive barrier, which further increases by alloying w.r.t. pure Ti. The $\{332\}\langle11\bar3\rangle$ twinning yields negative barriers for all species but Mo and Fe. This is because the transition state is structurally similar to the $α$ phase, which is preferred over the $β$ phase for the majority of alloying elements. Lastly, the impact of alloying on twin boundary energies is discussed. These results may serve as design guidelines for novel Ti-based alloys with specific application areas.
Submitted 6 July, 2025; v1 submitted 25 April, 2025;
originally announced April 2025.
-
Ptychographic estimation of qudit states encoded in the angular position and orbital angular momentum of single photons
Authors:
A. M. da Costa,
L. Neves
Abstract:
Ptychography is a computational imaging technique mainly used in optical and electron microscopy. Its quantum analogue was recently introduced as a simple method for estimating unknown pure quantum states through projections onto partially overlapping subspaces, each one followed by a projective measurement in the Fourier basis. In the end, an iterative algorithm estimates the state from the collected data. Here, we theoretically describe how to implement this method for $D$-dimensional qudit states encoded in the angular position and orbital angular momentum (OAM) of single photons. For this purpose, we define the qudit by discretizing the spatial profile of a photon in cylindrical coordinates, using an array of $D$ angular slits symmetrically distributed in the transverse plane. To apply ptychography to this encoding, we show that the intermediate projections will be carried out by simple binary spatial filters in the angular paths, while the measurement in the Fourier basis will be performed by postselecting $D$ OAM modes compatible with the quantum Fourier transform of the path basis. We illustrate the effectiveness of this scheme through simulations and discuss its experimental feasibility.
Submitted 18 April, 2025;
originally announced April 2025.
-
Coherence based on positive operator-valued measures for standard and concatenated quantum state discrimination with inconclusive results
Authors:
L. F. Melo,
O. Jiménez,
L. Neves
Abstract:
The optimal measurement that discriminates nonorthogonal quantum states with fixed rates of inconclusive outcomes (FRIO) can be decomposed into an assisted separation of the inputs, yielding conclusive and inconclusive outputs, followed by a minimum-error (ME) measurement for the conclusive ones (standard FRIO) or for both (concatenated FRIO). The implementation of these measurements is underpinned by quantum resources, and here we investigate coherence based on positive operator-valued measures (POVMs) as a resource for both strategies in discriminating equally probable symmetric states of arbitrary dimension. First, we show that the POVM coherence in the assisted separation stage decomposes into the coherence of the ancillary state and the quantum discord between the system and the ancilla, evidencing coherence as a more elementary resource than quantum correlations. Next, it is demonstrated that the POVM coherence for standard and concatenated FRIO decomposes into the POVM coherence measures for state separation and ME measurement, weighted by the probabilities of occurrence of each event. Due to the ME discrimination of inconclusive states, the coherence required for the concatenated scheme is shown to be greater than that of the standard one. We discuss other general aspects of our results by characterizing the POVM coherence in the discrimination of qutrit states, with respect to the distinguishability of the inputs and the inconclusive rate. Finally, by exploiting POVM-based coherence as a quantifier of cryptographic randomness gain, we discuss the standard and concatenated FRIO strategies from the perspective of generating random bits that are secret to an eavesdropper.
Submitted 31 January, 2025;
originally announced February 2025.
-
GraphHash: Graph Clustering Enables Parameter Efficiency in Recommender Systems
Authors:
Xinyi Wu,
Donald Loveland,
Runjin Chen,
Yozen Liu,
Xin Chen,
Leonardo Neves,
Ali Jadbabaie,
Clark Mingxuan Ju,
Neil Shah,
Tong Zhao
Abstract:
Deep recommender systems rely heavily on large embedding tables to handle high-cardinality categorical features such as user/item identifiers, and face significant memory constraints at scale. To tackle this challenge, hashing techniques are often employed to map multiple entities to the same embedding and thus reduce the size of the embedding tables. Concurrently, graph-based collaborative signals have emerged as powerful tools in recommender systems, yet their potential for optimizing embedding table reduction remains unexplored. This paper introduces GraphHash, the first graph-based approach that leverages modularity-based bipartite graph clustering on user-item interaction graphs to reduce embedding table sizes. We demonstrate that the modularity objective has a theoretical connection to message-passing, which provides a foundation for our method. By employing fast clustering algorithms, GraphHash serves as a computationally efficient proxy for message-passing during preprocessing and a plug-and-play graph-based alternative to traditional ID hashing. Extensive experiments show that GraphHash substantially outperforms diverse hashing baselines on both retrieval and click-through-rate prediction tasks. In particular, GraphHash achieves on average a 101.52% improvement in recall when reducing the embedding table size by more than 75%, highlighting the value of graph-based collaborative information for model reduction. Our code is available at https://github.com/snap-research/GraphHash.
Submitted 8 February, 2025; v1 submitted 22 December, 2024;
originally announced December 2024.
-
Enhancing Item Tokenization for Generative Recommendation through Self-Improvement
Authors:
Runjin Chen,
Mingxuan Ju,
Ngoc Bui,
Dimosthenis Antypas,
Stanley Cai,
Xiaopeng Wu,
Leonardo Neves,
Zhangyang Wang,
Neil Shah,
Tong Zhao
Abstract:
Generative recommendation systems, driven by large language models (LLMs), present an innovative approach to predicting user preferences by modeling items as token sequences and generating recommendations in a generative manner. A critical challenge in this approach is the effective tokenization of items, ensuring that they are represented in a form compatible with LLMs. Current item tokenization methods include using text descriptions, numerical strings, or sequences of discrete tokens. While text-based representations integrate seamlessly with LLM tokenization, they are often too lengthy, leading to inefficiencies and complicating accurate generation. Numerical strings, while concise, lack semantic depth and fail to capture meaningful item relationships. Tokenizing items as sequences of newly defined tokens has gained traction, but it often requires external models or algorithms for token assignment. These external processes may not align with the LLM's internal pretrained tokenization schema, leading to inconsistencies and reduced model performance. To address these limitations, we propose a self-improving item tokenization method that allows the LLM to refine its own item tokenizations during the training process. Our approach starts with item tokenizations generated by any external model and periodically adjusts these tokenizations based on the LLM's learned patterns. This alignment process ensures consistency between the tokenization and the LLM's internal understanding of the items, leading to more accurate recommendations. Furthermore, our method is simple to implement and can be integrated as a plug-and-play enhancement into existing generative recommendation systems. Experimental results on multiple datasets and using various initial tokenization strategies demonstrate the effectiveness of our method, with an average improvement of 8% in recommendation performance.
Submitted 22 December, 2024;
originally announced December 2024.
-
Ptychographic estimation of pure multiqubit states in a quantum device
Authors:
Warley M. S. Alves,
Leonardo Neves
Abstract:
Quantum ptychography is a method for estimating an unknown pure quantum state by subjecting it to overlapping projections, each one followed by a projective measurement on a single prescribed basis. Here, we present a comprehensive study of this method applied for estimating $n$-qubit states in a circuit-based quantum computer, including numerical simulations and experiments carried out on an IBM superconducting quantum processor. The intermediate projections are implemented through Pauli measurements on one qubit at a time, which sets the number of ptychographic circuits to $3n$ (in contrast to the $3^n$ circuits for standard Pauli tomography); the final projective measurement in the computational basis is preceded by the quantum Fourier transform (QFT). Due to the large depth and number of two-qubit gates of the QFT circuit, which is unsuitable for noisy devices, we also test the approximate QFT (AQFT) and separable unitary operations. Using the QFT and AQFT of degree $2$, we obtained high estimation fidelities in all tests with separable and entangled states for up to three and four qubits, respectively; on the other hand, the separable unitaries in this scenario provided good estimations only for separable states, in general. Our results compare favorably with recent results in the literature and we discuss further alternatives to make the ptychographic method scalable for the current noisy devices.
Submitted 2 December, 2024;
originally announced December 2024.
-
Experimental optimal discrimination of $N$ states of a qubit with fixed rates of inconclusive outcomes
Authors:
L. F. Melo,
M. A. Solís-Prosser,
O. Jiménez,
A. Delgado,
L. Neves
Abstract:
In a general optimized measurement scheme for discriminating between nonorthogonal quantum states, the error rate is minimized under the constraint of a fixed rate of inconclusive outcomes (FRIO). This so-called optimal FRIO measurement encompasses the standard and well known minimum-error and optimal unambiguous (or maximum-confidence) discrimination strategies as particular cases. Here, we experimentally demonstrate the optimal FRIO discrimination between $N=2,3,5,$ and $7$ equally likely symmetric states of a qubit encoded in photonic path modes. Our implementation consists of applying a probabilistic quantum map which increases the distinguishability between the inputs in a controlled way, followed by a minimum-error measurement on the successfully transformed outputs. The results obtained corroborate this two-step approach and, in our experimental scheme, it can be straightforwardly extended to higher dimensions. The optimized measurement demonstrated here will be useful for quantum communication scenarios where the error rate and the inconclusive rate must be kept below the levels provided by the respective standard strategies.
Submitted 21 November, 2024;
originally announced November 2024.
-
First Steps towards K-12 Computer Science Education in Portugal -- Experience Report
Authors:
Fernando Luis Neves,
Jose Nuno Oliveira
Abstract:
Computer scientists Jeannette Wing and Simon Peyton Jones have catalyzed a pivotal discussion on the need to introduce computing in K-12 mandatory education. In Wing's own words, computing "represents a universally applicable attitude and skill set everyone, not just computer scientists, would be eager to learn and use." The crux of this educational endeavor lies in its execution. This paper reports on the efforts of the ENSICO association to implement such aims in Portugal. Starting with pilot projects in a few schools in 2020, it is currently working with 4500 students, 35 schools and 100 school teachers. The main aim is to gain enough experience and knowledge to eventually define a comprehensive syllabus for teaching computing as a mandatory subject throughout the basic and secondary levels of the Portuguese educational system. A structured framework for integrating computational thinking into K-12 education is proposed, with a particular emphasis on mathematical modeling and the functional programming paradigm. This approach is chosen for its potential to promote analytical and problem-solving skills of computational thinking aligned with the core background on maths and science.
Submitted 15 November, 2024;
originally announced November 2024.
-
Energy additivity as a requirement for universal quantum thermodynamical frameworks
Authors:
Luis Rodrigo Neves,
Frederico Brito
Abstract:
The quest to develop a general framework for thermodynamics, suitable for the regime of strong coupling and correlations between subsystems of an autonomous quantum "universe," has entailed diverging definitions for basic quantities, including internal energy. While most approaches focus solely on the system of interest, we propose that a universal notion of internal energy should also account for the environment in order to keep consistency with the closed-system energy of the universe. We introduce an abstract framework to describe all effective Hamiltonian-based approaches and address a rigorous definition of energy additivity in this context, in both a weak and a strong form, discussing the underlying subtleties. As an illustration, we study a particular two-qubit universe model, obtaining the exact master equations for both parties and calculating their effective Hamiltonians and internal energies as given by the recently devised minimal dissipation approach. In this case, we show that internal energies are neither additive nor conservative, which leads to unphysical features.
Submitted 26 June, 2025; v1 submitted 7 August, 2024;
originally announced August 2024.
-
Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster
Authors:
Agostina Calabrese,
Leonardo Neves,
Neil Shah,
Maarten W. Bos,
Björn Ross,
Mirella Lapata,
Francesco Barbieri
Abstract:
Content moderators play a key role in keeping the conversation on social media healthy. While the high volume of content they need to judge represents a bottleneck to the moderation pipeline, no studies have explored how models could help them make faster decisions. There is, by now, a vast body of research into detecting hate speech, sometimes explicitly motivated by a desire to help improve content moderation, but published research using real content moderators is scarce. In this work we investigate the effect of explanations on the speed of real-world moderators. Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision-making time by 7.4%.
Submitted 6 June, 2024;
originally announced June 2024.
-
USE: Dynamic User Modeling with Stateful Sequence Models
Authors:
Zhihan Zhou,
Qixiang Fang,
Leonardo Neves,
Francesco Barbieri,
Yozen Liu,
Han Liu,
Maarten W. Bos,
Ron Dotsch
Abstract:
User embeddings play a crucial role in user engagement forecasting and personalized services. Recent advances in sequence modeling have sparked interest in learning user embeddings from behavioral data. Yet behavior-based user embedding learning faces the unique challenge of dynamic user modeling. As users continuously interact with the apps, user embeddings should be periodically updated to account for users' recent and long-term behavior patterns. Existing methods rely heavily on stateless sequence models that lack memory of historical behavior. They have to either discard historical data and use only the most recent data or reprocess the old and new data jointly. Both cases incur substantial computational overhead. To address this limitation, we introduce User Stateful Embedding (USE). USE generates user embeddings and reflects users' evolving behaviors without the need for exhaustive reprocessing by storing previous model states and revisiting them in the future. Furthermore, we introduce a novel training objective named future W-behavior prediction to transcend the limitations of next-token prediction by forecasting a broader horizon of upcoming user behaviors. By combining it with the Same User Prediction, a contrastive learning-based objective that predicts whether different segments of behavior sequences belong to the same user, we further improve the embeddings' distinctiveness and representativeness. We conducted experiments on 8 downstream tasks using Snapchat users' behavioral logs in both static (i.e., fixed user behavior sequences) and dynamic (i.e., periodically updated user behavior sequences) settings. We demonstrate USE's superior performance over established baselines. The results underscore USE's effectiveness and efficiency in integrating historical and recent user behavior sequences into user embeddings in dynamic user modeling.
Submitted 20 March, 2024;
originally announced March 2024.
-
Elevating Software Quality in Agile Environments: The Role of Testing Professionals in Unit Testing
Authors:
Lucas Neves,
Oscar Campos,
Robson Santos,
Italo Santos,
Cleyton Magalhaes,
Ronnie de Souza Santos
Abstract:
Testing is an essential quality activity in the software development process. Usually, a software system is tested on several levels, from unit testing, which checks the smallest parts of the code, to acceptance testing, which focuses on validation with the end user. Historically, unit testing has been the domain of developers, who are responsible for ensuring the accuracy of their code. However, in agile environments, testing professionals play an integral role in various quality improvement initiatives throughout each development cycle. This paper explores the participation of test engineers in unit testing within an industrial context, employing a survey-based research methodology. Our findings demonstrate that testing professionals have the potential to strengthen unit testing by collaborating with developers to craft thorough test cases and fostering a culture of mutual learning and cooperation, ultimately contributing to increasing the overall quality of software projects.
Submitted 19 March, 2024;
originally announced March 2024.
-
General-Purpose User Modeling with Behavioral Logs: A Snapchat Case Study
Authors:
Qixiang Fang,
Zhihan Zhou,
Francesco Barbieri,
Yozen Liu,
Leonardo Neves,
Dong Nguyen,
Daniel L. Oberski,
Maarten W. Bos,
Ron Dotsch
Abstract:
Learning general-purpose user representations based on user behavioral logs is an increasingly popular user modeling approach. It benefits from easily available, privacy-friendly yet expressive data, and does not require extensive re-tuning of the upstream user model for different downstream tasks. While this approach has shown promise in search engines and e-commerce applications, its fit for instant messaging platforms, a cornerstone of modern digital communication, remains largely uncharted. We explore this research gap using Snapchat data as a case study. Specifically, we implement a Transformer-based user model with customized training objectives and show that the model can produce high-quality user representations across a broad range of evaluation tasks, among which we introduce three new downstream tasks that concern pivotal topics in user research: user safety, engagement and churn. We also tackle the challenge of efficient extrapolation of long sequences at inference time, by applying a novel positional encoding method.
Submitted 25 July, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
SuperTweetEval: A Challenging, Unified and Heterogeneous Benchmark for Social Media NLP Research
Authors:
Dimosthenis Antypas,
Asahi Ushio,
Francesco Barbieri,
Leonardo Neves,
Kiamehr Rezaee,
Luis Espinosa-Anke,
Jiaxin Pei,
Jose Camacho-Collados
Abstract:
Despite its relevance, the maturity of NLP for social media pales in comparison with general-purpose models, metrics and benchmarks. This fragmented landscape makes it hard for the community to know, for instance, given a task, which is the best performing model and how it compares with others. To alleviate this issue, we introduce a unified benchmark for NLP evaluation in social media, SuperTweetEval, which includes a heterogeneous set of tasks and datasets combined, adapted and constructed from scratch. We benchmarked the performance of a wide range of models on SuperTweetEval and our results suggest that, despite the recent advances in language modelling, social media remains challenging.
Submitted 23 October, 2023;
originally announced October 2023.
-
Context-aware Adversarial Attack on Named Entity Recognition
Authors:
Shuguang Chen,
Leonardo Neves,
Thamar Solorio
Abstract:
In recent years, large pre-trained language models (PLMs) have achieved remarkable performance on many natural language processing benchmarks. Despite their success, prior studies have shown that PLMs are vulnerable to attacks from adversarial examples. In this work, we focus on the named entity recognition task and study context-aware adversarial attack methods to examine the model's robustness. Specifically, we propose perturbing the most informative words for recognizing entities to create adversarial examples and investigate different candidate replacement methods to generate natural and plausible adversarial examples. Experiments and analyses show that our methods are more effective in deceiving the model into making wrong predictions than strong baselines.
Submitted 2 February, 2024; v1 submitted 16 September, 2023;
originally announced September 2023.
-
Tweet Insights: A Visualization Platform to Extract Temporal Insights from Twitter
Authors:
Daniel Loureiro,
Kiamehr Rezaee,
Talayeh Riahi,
Francesco Barbieri,
Leonardo Neves,
Luis Espinosa Anke,
Jose Camacho-Collados
Abstract:
This paper introduces a large collection of time series data derived from Twitter, postprocessed using word embedding techniques, as well as specialized fine-tuned language models. This data comprises the past five years and captures changes in n-gram frequency, similarity, sentiment and topic distribution. The interface built on top of this data enables temporal analysis for detecting and characterizing shifts in meaning, including complementary information to trending metrics, such as sentiment and topic association over time. We release an online demo for easy experimentation, and we share code and the underlying aggregated data for future work. In this paper, we also discuss three case studies unlocked thanks to our platform, showcasing its potential for temporal linguistic analysis.
Submitted 4 August, 2023;
originally announced August 2023.
-
Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition
Authors:
Shuguang Chen,
Leonardo Neves,
Thamar Solorio
Abstract:
In this work, we take the named entity recognition task in the English language as a case study and explore style transfer as a data augmentation method to increase the size and diversity of training data in low-resource scenarios. We propose a new method to effectively transform the text from a high-resource domain to a low-resource domain by changing its style-related attributes to generate synthetic data for training. Moreover, we design a constrained decoding algorithm along with a set of key ingredients for data selection to guarantee the generation of valid and coherent data. Experiments and analysis on five different domain pairs under different data regimes demonstrate that our approach can significantly improve results compared to current state-of-the-art data augmentation methods. Our approach is a practical solution to data scarcity, and we expect it to be applicable to other NLP tasks.
Submitted 14 October, 2022;
originally announced October 2022.
-
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Authors:
Asahi Ushio,
Leonardo Neves,
Vitor Silva,
Francesco Barbieri,
Jose Camacho-Collados
Abstract:
Recent progress in language model pre-training has led to important improvements in Named Entity Recognition (NER). Nonetheless, this progress has been mainly tested in well-formatted documents such as news, Wikipedia, or scientific articles. In social media, the landscape is different: its noisy and dynamic nature adds another layer of complexity. In this paper, we focus on NER in Twitter, one of the largest social media platforms, and construct a new NER dataset, TweetNER7, which contains seven entity types annotated over 11,382 tweets from September 2019 to August 2021. The dataset was constructed by carefully distributing the tweets over time and taking representative trends as a basis. Along with the dataset, we provide a set of language model baselines and perform an analysis on the language model performance on the task, especially analyzing the impact of different time periods. In particular, we focus on three important temporal aspects in our analysis: short-term degradation of NER models over time, strategies to fine-tune a language model over different periods, and self-labeling as an alternative to lack of recently-labeled data. TweetNER7 is released publicly (https://huggingface.co/datasets/tner/tweetner7) along with the models fine-tuned on it.
Submitted 15 November, 2022; v1 submitted 7 October, 2022;
originally announced October 2022.
-
SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis
Authors:
Jiaxin Pei,
Vítor Silva,
Maarten Bos,
Yozon Liu,
Leonardo Neves,
David Jurgens,
Francesco Barbieri
Abstract:
We propose MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages including English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic. We benchmarked a list of popular multilingual pre-trained language models. The dataset is released along with the SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis (https://sites.google.com/umich.edu/semeval-2023-tweet-intimacy).
Submitted 3 February, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Twitter Topic Classification
Authors:
Dimosthenis Antypas,
Asahi Ushio,
Jose Camacho-Collados,
Leonardo Neves,
Vítor Silva,
Francesco Barbieri
Abstract:
Social media platforms host discussions about a wide variety of topics that arise every day. Making sense of all the content and organising it into categories is an arduous task. A common way to deal with this issue is relying on topic modeling, but topics discovered using this technique are difficult to interpret and can differ from corpus to corpus. In this paper, we present a new task based on tweet topic classification and release two associated datasets. Given a wide range of topics covering the most important discussion points in social media, we provide training and testing data from recent time periods that can be used to evaluate tweet classification models. Moreover, we perform a quantitative evaluation and analysis of current general- and domain-specific language models on the task, which provide more insights on the challenges and nature of the task.
Submitted 20 September, 2022;
originally announced September 2022.
-
TempoWiC: An Evaluation Benchmark for Detecting Meaning Shift in Social Media
Authors:
Daniel Loureiro,
Aminette D'Souza,
Areej Nasser Muhajab,
Isabella A. White,
Gabriel Wong,
Luis Espinosa Anke,
Leonardo Neves,
Francesco Barbieri,
Jose Camacho-Collados
Abstract:
Language evolves over time, and word meaning changes accordingly. This is especially true in social media, since its dynamic nature leads to faster semantic shifts, making it challenging for NLP models to deal with new content and trends. However, the number of datasets and models that specifically address the dynamic nature of these social platforms is scarce. To bridge this gap, we present TempoWiC, a new benchmark especially aimed at accelerating research in social media-based meaning shift. Our results show that TempoWiC is a challenging benchmark, even for recently-released language models specialized in social media.
Submitted 16 September, 2022; v1 submitted 15 September, 2022;
originally announced September 2022.
-
TweetNLP: Cutting-Edge Natural Language Processing for Social Media
Authors:
Jose Camacho-Collados,
Kiamehr Rezaee,
Talayeh Riahi,
Asahi Ushio,
Daniel Loureiro,
Dimosthenis Antypas,
Joanne Boisson,
Luis Espinosa-Anke,
Fangyu Liu,
Eugenio Martínez-Cámara,
Gonzalo Medina,
Thomas Buhrmann,
Leonardo Neves,
Francesco Barbieri
Abstract:
In this paper we present TweetNLP, an integrated platform for Natural Language Processing (NLP) in social media. TweetNLP supports a diverse set of NLP tasks, including generic focus areas such as sentiment analysis and named entity recognition, as well as social media-specific tasks such as emoji prediction and offensive language identification. Task-specific systems are powered by reasonably-sized Transformer-based language models specialized on social media text (in particular, Twitter) which can be run without the need for dedicated hardware or cloud services. The main contributions of TweetNLP are: (1) an integrated Python library for a modern toolkit supporting social media analysis using our various task-specific models adapted to the social domain; (2) an interactive online demo for codeless experimentation using our models; and (3) a tutorial covering a wide variety of typical social media applications.
Submitted 25 October, 2022; v1 submitted 29 June, 2022;
originally announced June 2022.
-
A constraint on local definitions of quantum internal energy
Authors:
Luis Rodrigo Torres Neves,
Frederico Brito
Abstract:
Recent advances in quantum thermodynamics have been focusing on ever more elementary systems of interest, approaching the limit of a single qubit, with correlations, strong coupling and non-equilibrium environments coming into play. Under such scenarios, it is clear that fundamental physical quantities must be revisited. This article questions whether a universal definition of internal energy for open quantum systems may be devised, setting limits on its possible properties. We argue that, for such a definition to be regarded as local, it should be implemented as a functional of the open system's reduced density operator and its time derivatives. Then we show that it should involve at least up to the second-order derivative, otherwise failing to recover the previously-known internal energy of the "universe". Possible implications of this general result are discussed.
Submitted 14 October, 2023; v1 submitted 9 May, 2022;
originally announced May 2022.
-
TimeLMs: Diachronic Language Models from Twitter
Authors:
Daniel Loureiro,
Francesco Barbieri,
Leonardo Neves,
Luis Espinosa Anke,
Jose Camacho-Collados
Abstract:
Despite its importance, the time variable has been largely neglected in the NLP and language model literature. In this paper, we present TimeLMs, a set of language models specialized on diachronic Twitter data. We show that a continual learning strategy contributes to enhancing Twitter-based language models' capacity to deal with future and out-of-distribution tweets, while making them competitive with standardized and more monolithic benchmarks. We also perform a number of qualitative analyses showing how they cope with trends and peaks in activity involving specific named entities or concept drift.
Submitted 1 April, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Enhanced discrimination of high-dimensional quantum states by concatenated optimal measurement strategies
Authors:
M. A. Solís-Prosser,
O. Jiménez,
A. Delgado,
L. Neves
Abstract:
The impossibility of deterministic and error-free discrimination among nonorthogonal quantum states lies at the core of quantum theory and constitutes a primitive for secure quantum communication. Demanding determinism leads to errors, while demanding certainty leads to some inconclusiveness. One of the most fundamental strategies developed for this task is the optimal unambiguous measurement. It encompasses conclusive results, which allow for error-free state retrodictions with the maximum success probability, and inconclusive results, which are discarded for not allowing perfect identifications. Interestingly, in high-dimensional Hilbert spaces the inconclusive results may contain valuable information about the input states. Here, we theoretically describe and experimentally demonstrate the discrimination of nonorthogonal states from both conclusive and inconclusive results in the optimal unambiguous strategy, by concatenating a minimum-error measurement at its inconclusive space. Our implementation comprises 4- and 9-dimensional spatially encoded photonic states. By accessing the inconclusive space to retrieve the information that is wasted in the conventional protocol, we achieve significant increases of up to a factor of 2.07 and 3.73, respectively, in the overall probabilities of correct retrodictions. The concept of concatenated optimal measurements demonstrated here can be extended to other strategies and will enable one to explore the full potential of high-dimensional nonorthogonal states for quantum communication with larger alphabets.
Submitted 18 December, 2021;
originally announced December 2021.
-
Data Augmentation for Cross-Domain Named Entity Recognition
Authors:
Shuguang Chen,
Gustavo Aguilar,
Leonardo Neves,
Thamar Solorio
Abstract:
Current work in named entity recognition (NER) shows that data augmentation techniques can produce more robust models. However, most existing techniques focus on augmenting in-domain data in low-resource scenarios where annotated data is quite limited. In contrast, we study cross-domain data augmentation for the NER task. We investigate the possibility of leveraging data from high-resource domains by projecting it into the low-resource domains. Specifically, we propose a novel neural architecture to transform the data representation from a high-resource to a low-resource domain by learning the patterns (e.g. style, noise, abbreviations, etc.) in the text that differentiate them and a shared feature space where both domains are aligned. We experiment with diverse datasets and show that transforming the data to the low-resource domain representation achieves significant improvements over only using data from high-resource domains.
Submitted 3 September, 2021;
originally announced September 2021.
-
Mitigating Temporal-Drift: A Simple Approach to Keep NER Models Crisp
Authors:
Shuguang Chen,
Leonardo Neves,
Thamar Solorio
Abstract:
Performance of neural models for named entity recognition degrades over time, becoming stale. This degradation is due to temporal drift, the change in our target variables' statistical properties over time. This issue is especially problematic for social media data, where topics change rapidly. In order to mitigate the problem, data annotation and retraining of models is common. Despite its usefulness, this process is expensive and time-consuming, which motivates new research on efficient model updating. In this paper, we propose an intuitive approach to measure the potential trendiness of tweets and use this metric to select the most informative instances to use for training. We conduct experiments on three state-of-the-art models on the Temporal Twitter Dataset. Our approach shows larger increases in prediction accuracy with less training data than the alternatives, making it an attractive, practical solution.
Submitted 19 April, 2021;
originally announced April 2021.
-
The Devil is in the Details: Evaluating Limitations of Transformer-based Methods for Granular Tasks
Authors:
Brihi Joshi,
Neil Shah,
Francesco Barbieri,
Leonardo Neves
Abstract:
Contextual embeddings derived from transformer-based neural language models have shown state-of-the-art performance for various tasks such as question answering, sentiment analysis, and textual similarity in recent years. Extensive work shows how accurately such models can represent abstract, semantic information present in text. In this expository work, we explore a tangent direction and analyze such models' performance on tasks that require a more granular level of representation. We focus on the problem of textual similarity from two perspectives: matching documents on a granular level (requiring embeddings to capture fine-grained attributes in the text), and an abstract level (requiring embeddings to capture overall textual semantics). We empirically demonstrate, across two datasets from different domains, that despite high performance in abstract document matching as expected, contextual embeddings are consistently (and at times, vastly) outperformed by simple baselines like TF-IDF for more granular tasks. We then propose a simple but effective method to incorporate TF-IDF into models that use contextual embeddings, achieving relative improvements of up to 36% on granular tasks.
Submitted 2 November, 2020;
originally announced November 2020.
-
Ptychographic reconstruction of pure quantum states
Authors:
M. F. Fernandes,
M. A. Solís-Prosser,
L. Neves
Abstract:
The quantum analogue of ptychography, a powerful coherent diffractive imaging technique, is a simple method for reconstructing $d$-dimensional pure states. It relies on measuring partially overlapping parts of the input state in a single orthonormal basis and feeding the outcomes to an iterative phase-retrieval algorithm for postprocessing. We provide a proof of concept demonstration of this method by determining pure states given by superpositions of $d$ transverse spatial modes of an optical field. A set of $n$ rank-$r$ projectors, diagonal in the spatial mode basis, is used to generate $n$ partially overlapping parts of the input and each part is projectively measured in the Fourier transformed basis. For $d$ up to 32, we successfully reconstructed hundreds of random states using $n=5$ and $n=d$ rank-$\lceil d/2\rceil$ projectors. The extension of quantum ptychography for other types of photonic spatial modes is outlined.
Submitted 29 October, 2020;
originally announced October 2020.
-
On Transferability of Bias Mitigation Effects in Language Model Fine-Tuning
Authors:
Xisen Jin,
Francesco Barbieri,
Brendan Kennedy,
Aida Mostafazadeh Davani,
Leonardo Neves,
Xiang Ren
Abstract:
Fine-tuned language models have been shown to exhibit biases against protected groups in a host of modeling tasks such as text classification and coreference resolution. Previous works focus on detecting these biases, reducing bias in data representations, and using auxiliary training objectives to mitigate bias during fine-tuning. Although these techniques achieve bias reduction for the task and domain at hand, the effects of bias mitigation may not directly transfer to new tasks, requiring additional data collection and customized annotation of sensitive attributes, and re-evaluation of appropriate fairness metrics. We explore the feasibility and benefits of upstream bias mitigation (UBM) for reducing bias on downstream tasks, by first applying bias mitigation to an upstream model through fine-tuning and subsequently using it for downstream fine-tuning. We find, in extensive experiments across hate speech detection, toxicity detection, occupation prediction, and coreference resolution tasks over various bias factors, that the effects of UBM are indeed transferable to new downstream tasks or domains via fine-tuning, creating less biased downstream models than directly fine-tuning on the downstream task or transferring from a vanilla upstream model. Though challenges remain, we show that UBM promises more efficient and accessible bias mitigation in LM fine-tuning.
Submitted 11 April, 2021; v1 submitted 24 October, 2020;
originally announced October 2020.
-
Can images help recognize entities? A study of the role of images for Multimodal NER
Authors:
Shuguang Chen,
Gustavo Aguilar,
Leonardo Neves,
Thamar Solorio
Abstract:
Multimodal named entity recognition (MNER) requires bridging the gap between language understanding and visual context. While many multimodal neural techniques have been proposed to incorporate images into the MNER task, the model's ability to leverage multimodal interactions remains poorly understood. In this work, we conduct in-depth analyses of existing multimodal fusion techniques from different perspectives and describe the scenarios where adding information from the image does not always boost performance. We also study the use of captions as a way to enrich the context for MNER. Experiments on three datasets from popular social platforms expose the bottleneck of existing multimodal models and the situations where using captions is beneficial.
Submitted 19 September, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification
Authors:
Francesco Barbieri,
Jose Camacho-Collados,
Leonardo Neves,
Luis Espinosa-Anke
Abstract:
The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models and continuing to train them on Twitter corpora.
Submitted 26 October, 2020; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Data Augmentation for Graph Neural Networks
Authors:
Tong Zhao,
Yozen Liu,
Leonardo Neves,
Oliver Woodford,
Meng Jiang,
Neil Shah
Abstract:
Data augmentation has been widely used to improve generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits possible manipulation operations. Augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node-classification. We discuss practical and theoretical motivations, considerations and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.
Submitted 2 December, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
A simple individual-based population growth model with limited resources
Authors:
Luis R. T. Neves,
Leonardo Paulo Maia
Abstract:
We present a novel approach for stochastic individual-based modelling of a single-species population. Individuals are distinguished by their remaining lifetimes, which are regulated by the interplay between the inexorable running of time and the individual's nourishment history. A food-limited environment induces intraspecific competition, and hence the carrying capacity of the medium may be finite, often emulating the qualitative features of logistic growth. Inherently non-logistic behavior is also obtained by a suitable change of the few parameters involved, composing a wide variety of dynamical features. Some analytical results are obtained. Beyond the rich phenomenology observed, we expect that possible modifications of our model may account for an even broader scope of collective population growth phenomena.
Submitted 30 November, 2020; v1 submitted 8 May, 2020;
originally announced May 2020.
-
LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation
Authors:
Dong-Ho Lee,
Rahul Khanna,
Bill Yuchen Lin,
Jamin Chen,
Seyeon Lee,
Qinyuan Ye,
Elizabeth Boschee,
Leonardo Neves,
Xiang Ren
Abstract:
Successfully training a deep neural network demands a huge corpus of labeled data. However, each label only provides limited information to learn from, and collecting the requisite number of labels involves massive human effort. In this work, we introduce LEAN-LIFE, a web-based, Label-Efficient AnnotatioN framework for sequence labeling and classification tasks, with an easy-to-use UI that not only allows an annotator to provide the needed labels for a task, but also enables LearnIng From Explanations for each labeling decision. Such explanations enable us to generate useful additional labeled data from unlabeled instances, bolstering the pool of available training data. On three popular NLP tasks (named entity recognition, relation extraction, sentiment analysis), we find that using this enhanced supervision allows our models to surpass competitive baseline F1 scores by more than 5-10 percentage points, while using 2X fewer labeled instances. Our framework is the first to utilize this enhanced supervision technique and does so for three important tasks, thus providing improved annotation recommendations to users and an ability to build datasets of (data, label, explanation) triples instead of the regular (data, label) pair.
Submitted 16 April, 2020;
originally announced April 2020.
-
3D compact photonic circuits for realizing quantum state tomography of qudits in any finite dimension
Authors:
Wilder Cardoso,
Davi Barros,
Leonardo Neves,
Sebastião Pádua
Abstract:
In this work, we propose three-dimensional photonic circuit designs that guarantee a considerable reduction in the complexity of circuits for the purpose of performing quantum state tomography of N-dimensional path qudits. The POVM (Positive Operator-Valued Measure) chosen in this work ensures that, for odd dimensions, such a process is minimal. Our proposal consists of organizing the waveguides that form the circuit as a square array formed by N vertical sectors composed of N waveguides each, arranged in the vertical direction. Based on the symmetry of the chosen POVM, the interferometer acting on the initial quantum system can be divided into a sequence of three different unitary operations. These operations act independently on each vertical sector, or layer, of the circuit, which simplifies their determination. We have thus obtained circuits such that the number of beam splitters is a polynomial function of degree 3 in the quantum system dimension, whereas in current proposals this quantity grows as a polynomial function of degree 4. Besides that, the optical depth is reduced from a quadratic to a linear function of the quantum system dimension in our scheme. These results confirm the remarkable reduction of the complexity of the photonic circuits in our proposal.
Submitted 10 February, 2020;
originally announced February 2020.
-
Photonic Discrete-time Quantum Walks and Applications
Authors:
Leonardo Neves,
Graciana Puentes
Abstract:
We present a review of photonic implementations of discrete-time quantum walks (DTQW) in the spatial and temporal domains, based on spatial- and time-multiplexing techniques, respectively. Additionally, we propose a detailed novel scheme for photonic DTQW, using transverse spatial modes of single photons and programmable spatial light modulators (SLM) to manipulate them. Unlike all previous mode-multiplexed implementations, this scheme enables simulation of an arbitrary step of the walker, only limited, in principle, by the SLM resolution. We discuss current applications of such photonic DTQW architectures in quantum simulation of topological effects and the use of non-local coin operations based on two-photon hybrid entanglement.
Submitted 18 December, 2019;
originally announced December 2019.
-
Learning from Explanations with Neural Execution Tree
Authors:
Ziqi Wang,
Yujia Qin,
Wenxuan Zhou,
Jun Yan,
Qinyuan Ye,
Leonardo Neves,
Zhiyuan Liu,
Xiang Ren
Abstract:
While deep neural networks have achieved impressive performance on a range of NLP tasks, these data-hungry models rely heavily on labeled data, which restricts their applications in scenarios where data annotation is expensive. Natural language (NL) explanations have been demonstrated to be very useful additional supervision, which can provide sufficient domain knowledge for generating more labeled data over new instances, while the annotation time only doubles. However, directly applying them for augmenting model learning encounters two challenges: (1) NL explanations are unstructured and inherently compositional, which calls for a modularized model to represent their semantics; (2) NL explanations often have large numbers of linguistic variants, resulting in low recall and limited generalization ability. In this paper, we propose a novel Neural Execution Tree (NExT) framework to augment training data for text classification using NL explanations. After transforming NL explanations into executable logical forms by semantic parsing, NExT generalizes different types of actions specified by the logical forms for labeling data instances, which substantially increases the coverage of each NL explanation. Experiments on two NLP tasks (relation extraction and sentiment analysis) demonstrate its superiority over baseline methods. Its extension to multi-hop question answering achieves a performance gain with light annotation effort.
Submitted 14 February, 2020; v1 submitted 4 November, 2019;
originally announced November 2019.
-
NERO: A Neural Rule Grounding Framework for Label-Efficient Relation Extraction
Authors:
Wenxuan Zhou,
Hongtao Lin,
Bill Yuchen Lin,
Ziqi Wang,
Junyi Du,
Leonardo Neves,
Xiang Ren
Abstract:
Deep neural models for relation extraction tend to be less reliable when perfectly labeled data is limited, despite their success in label-sufficient scenarios. Instead of seeking more instance-level labels from human annotators, here we propose to annotate frequent surface patterns to form labeling rules. These rules can be automatically mined from large text corpora and generalized via a soft rule matching mechanism. Prior works use labeling rules in an exact matching fashion, which inherently limits the coverage of sentence matching and results in low recall. In this paper, we present a neural approach to ground rules for RE, named NERO, which jointly learns a relation extraction module and a soft matching module. One can employ any neural relation extraction model as the instantiation for the RE module. The soft matching module learns to match rules with semantically similar sentences such that raw corpora can be automatically labeled and leveraged by the RE module (with much better coverage) as augmented supervision, in addition to the exactly matched sentences. Extensive experiments and analysis on two public and widely-used datasets demonstrate the effectiveness of the proposed NERO framework, compared with both rule-based and semi-supervised methods. Through user studies, we find that the time a human needs to annotate a rule and a sentence is similar (0.30 vs. 0.35 min per label). In particular, NERO's performance using 270 rules is comparable to the models trained using 3,000 labeled sentences, yielding a 9.5x speedup. Moreover, NERO can predict for unseen relations at test time and provide interpretable predictions. We release our code to the community for future research.
Submitted 15 January, 2020; v1 submitted 4 September, 2019;
originally announced September 2019.
-
Train One Get One Free: Partially Supervised Neural Network for Bug Report Duplicate Detection and Clustering
Authors:
Lahari Poddar,
Leonardo Neves,
William Brendel,
Luis Marujo,
Sergey Tulyakov,
Pradeep Karuturi
Abstract:
Tracking user-reported bugs requires considerable engineering effort in going through many repetitive reports and assigning them to the correct teams. This paper proposes a neural architecture that can jointly (1) detect if two bug reports are duplicates, and (2) aggregate them into latent topics. Leveraging the assumption that learning the topic of a bug is a sub-task for detecting duplicates, we design a loss function that can jointly perform both tasks but needs supervision for only duplicate classification, achieving topic clustering in an unsupervised fashion. We use a two-step attention module that uses self-attention for topic clustering and conditional attention for duplicate detection. We study the characteristics of two types of real-world datasets that have been marked for duplicate bugs by engineers and by non-technical annotators. The results demonstrate that our model not only can outperform state-of-the-art methods for duplicate classification in both cases, but can also learn meaningful latent clusters without additional supervision.
Submitted 3 April, 2019; v1 submitted 29 March, 2019;
originally announced March 2019.
-
Ptychography of pure quantum states
Authors:
Mário Foganholi Fernandes,
Leonardo Neves
Abstract:
Ptychography is an imaging technique in which a localized illumination scans overlapping regions of an object and generates a set of diffraction intensities used to computationally reconstruct its complex-valued transmission function. We propose a quantum analogue of this technique designed to reconstruct $d$-dimensional pure states. A set of $n$ rank-$r$ projectors "scans" overlapping parts of an input state and the moduli of the $d$ Fourier amplitudes of each part are measured. These $nd$ outcomes are fed into an iterative phase retrieval algorithm that estimates the state. Using $d$ up to 100 and $r$ around $d/2$, we performed numerical simulations for single systems in an economic ($n=4$) and a costly ($n=d$) scenario, as well as for multiqubit systems ($n=6\log d$). This numerical study included realistic amounts of depolarization and Poissonian noise, and all scenarios yielded, in general, reconstructions with infidelities below $10^{-2}$. The method is shown, therefore, to be resilient to noise and, for any $d$, requires a simple and fast postprocessing algorithm. We show that the algorithm is equivalent to an alternating gradient search, which ensures that it does not suffer from local-minima stagnation. Unlike traditional approaches to state reconstruction, the ptychographic scheme uses a single measurement basis; the diversity and redundancy in the measured data, key for its success, are provided by the overlapping projections. We illustrate the simplicity of this scheme with the paradigmatic multiport interferometer.
Submitted 16 December, 2019; v1 submitted 29 December, 2018;
originally announced December 2018.
-
Multimodal Named Entity Recognition for Short Social Media Posts
Authors:
Seungwhan Moon,
Leonardo Neves,
Vitor Carvalho
Abstract:
We introduce a new task called Multimodal Named Entity Recognition (MNER) for noisy user-generated data such as tweets or Snapchat captions, which comprise short text with accompanying images. These social media posts often come in inconsistent or incomplete syntax and lexical notations with very limited surrounding textual contexts, bringing significant challenges for NER. To this end, we create a new dataset for MNER called SnapCaptions (Snapchat image-caption pairs submitted to public and crowd-sourced stories with fully annotated named entities). We then build upon the state-of-the-art Bi-LSTM word/character based NER models with 1) a deep image network which incorporates relevant visual context to augment textual information, and 2) a generic modality-attention module which learns to attenuate irrelevant modalities while amplifying the most informative ones to extract contexts from, adaptive to each sample and token. The proposed MNER model with modality attention significantly outperforms the state-of-the-art text-only NER models by successfully leveraging provided visual contexts, opening up potential applications of MNER on myriads of social media platforms.
Submitted 21 February, 2018;
originally announced February 2018.
-
Visual Features for Context-Aware Speech Recognition
Authors:
Abhinav Gupta,
Yajie Miao,
Leonardo Neves,
Florian Metze
Abstract:
Automatic transcriptions of consumer-generated multimedia content such as YouTube videos still exhibit high word error rates. Such data typically occupies a very broad domain, has been recorded in challenging conditions, with cheap hardware and a focus on the visual modality, and may have been post-processed or edited. In this paper, we extend our earlier work on adapting the acoustic model of a DNN-based speech recognition system to an RNN language model and show how both can be adapted to the objects and scenes that can be automatically detected in the video. We work on a corpus of "how-to" videos from the web, and the idea is that an object that can be seen ("car"), or a scene that is being detected ("kitchen"), can be used to condition both models on the "context" of the recording, thereby reducing perplexity and improving transcription. We achieve good improvements in both cases and compare and analyze the respective reductions in word error rate. We expect that our results can be used for any type of speech processing in which "context" information is available, for example in robotics, man-machine interaction, or when indexing large audio-visual archives, and should ultimately help to bring together the "video-to-text" and "speech-to-text" communities.
Submitted 1 December, 2017;
originally announced December 2017.
-
A Novel Metamaterial-Inspired RF-coil for Preclinical Dual-Nuclei MRI
Authors:
A. Hurshkainen,
A. Nikulin,
E. Georget,
B. Larrat,
D. Berrahou,
L. Neves,
P. Sabouroux,
S. Enoch,
I. Melchakova,
P. Belov,
S. Glybovski,
R. Abdeddaim
Abstract:
In this paper we propose, design and test a new dual-nuclei RF-coil inspired by wire metamaterial structures. The coil operates due to resonant excitation of hybridized eigenmodes in multimode flat periodic structures comprising several coupled thin metal strips. It was shown that the field distribution of the coil (i.e. penetration depth) can be controlled independently at two different Larmor frequencies by selecting a proper eigenmode in each of two mutually orthogonal periodic structures. The proposed coil requires no lumped capacitors for tuning and matching. In order to demonstrate the performance of the new design, an experimental preclinical coil for $^{19}$F/$^{1}$H imaging of small animals at 7.05T was engineered and tested on a homogeneous liquid phantom and in vivo. The presented results demonstrate that the coil was well tuned and matched simultaneously at two Larmor frequencies and was capable of image acquisition with both nuclei, reaching a large homogeneity area along with a sufficient signal-to-noise ratio. In an in vivo experiment it has been shown that, without retuning the setup, it was possible to obtain anatomical $^{1}$H images of a mouse under anesthesia consecutively with $^{19}$F images of a tiny tube filled with a fluorine-containing liquid and attached to the body of the mouse.
Submitted 14 September, 2017;
originally announced September 2017.
-
Nonradial solutions for the Hénon equation close to the threshold
Authors:
Pablo Figueroa,
Sérgio L. N. Neves
Abstract:
We consider the Hénon problem \begin{equation*} \left\{ \begin{array}{ll} -\Delta u = |x|^{\alpha} u^{\frac{N+2+2\alpha}{N-2}-\varepsilon} & \text{in } B_1, \\ u > 0 & \text{in } B_1, \\ u = 0 & \text{on } \partial B_1, \end{array} \right. \end{equation*} where $B_1$ is the unit ball in ${\mathbb R}^N$ and $N\geqslant 3$. For $\varepsilon > 0$ small enough, we use $\alpha$ as a parameter and prove the existence of a branch of nonradial solutions that bifurcates from the radial one when $\alpha$ is close to an even positive integer.
Submitted 29 September, 2017; v1 submitted 30 August, 2017;
originally announced August 2017.
-
Proposal for Automated Operations for Single-Photon Multipath Qudits
Authors:
Roberto D. Baldijão,
Gilberto F. Borges,
Breno Marques,
Miguel Solís-Prosser,
Leonardo Neves,
Sebastião Pádua
Abstract:
We propose a method for implementing automated state transformations on single-photon multipath qudits encoded in a one-dimensional transverse spatial domain. It relies on transferring the encoding from this domain to the orthogonal one by applying a spatial phase modulation with diffraction gratings, merging all the initial propagation paths with a stable interferometric network, and filtering out the unwanted diffraction orders. The automated feature is attained by utilizing a programmable phase-only spatial light modulator (SLM) where properly designed diffraction gratings displayed on its screen will implement the desired transformations, including, among others, projections, permutations and random operations. We discuss the losses in the process, which is, in general, inherently nonunitary. Some examples of transformations are presented and, considering a realistic scenario, we analyse how they will be affected by the pixelated structure of the SLM screen. The method proposed here enables one to implement much more general transformations on multipath qudits than is possible with an SLM alone operating in the diagonal basis of which-path states. Therefore, it will extend the range of applicability for this encoding in high-dimensional quantum information and computing protocols as well as fundamental studies in quantum theory.
Submitted 31 March, 2017;
originally announced March 2017.
-
Experimental minimum-error quantum-state discrimination in high dimensions
Authors:
M. A. Solís-Prosser,
M. F. Fernandes,
O. Jiménez,
A. Delgado,
L. Neves
Abstract:
Quantum mechanics forbids perfect discrimination among nonorthogonal states through a single shot measurement. To optimize this task, many strategies were devised that later became fundamental tools for quantum information processing. Here, we address the pioneering minimum-error (ME) measurement and give the first experimental demonstration of its application for discriminating nonorthogonal states in high dimensions. Our scheme is designed to distinguish symmetric pure states encoded in the transverse spatial modes of an optical field; the optimal measurement is performed by a projection onto the Fourier transform basis of these modes. For dimensions ranging from D = 2 to D = 21 and nearly 14000 states tested, the deviations of the experimental results from the theoretical values range from 0.3% to 3.6% (getting below 2% for the vast majority), thus showing the excellent performance of our scheme. This ME measurement is a building block for high-dimensional implementations of many quantum communication protocols, including probabilistic state discrimination, dense coding with nonmaximal entanglement, and cryptographic schemes.
Submitted 8 March, 2017;
originally announced March 2017.