-
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
Authors:
Brian Yan,
Xuankai Chang,
Antonios Anastasopoulos,
Yuya Fujita,
Shinji Watanabe
Abstract:
Recent works in end-to-end speech-to-text translation (ST) have proposed multi-tasking methods with soft parameter sharing which leverage machine translation (MT) data via secondary encoders that map text inputs to an eventual cross-modal representation. In this work, we instead propose a ST/MT multi-tasking framework with hard parameter sharing in which all model parameters are shared cross-modally. Our method reduces the speech-text modality gap via a pre-processing stage which converts speech and text inputs into two discrete token sequences of similar length -- this allows models to indiscriminately process both modalities simply using a joint vocabulary. With experiments on MuST-C, we demonstrate that our multi-tasking framework improves attentional encoder-decoder, Connectionist Temporal Classification (CTC), transducer, and joint CTC/attention models by an average of +0.5 BLEU without any external MT data. Further, we show that this framework incorporates external MT data, yielding +0.8 BLEU, and also improves transfer learning from pre-trained textual models, yielding +1.8 BLEU.
Submitted 27 September, 2023;
originally announced September 2023.
-
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
Authors:
Amir Hussein,
Brian Yan,
Antonios Anastasopoulos,
Shinji Watanabe,
Sanjeev Khudanpur
Abstract:
Incorporating longer context has been shown to benefit machine translation, but the inclusion of context in end-to-end speech translation (E2E-ST) remains under-studied. To bridge this gap, we introduce target language context in E2E-ST, enhancing coherence and overcoming memory constraints of extended audio segments. Additionally, we propose context dropout to ensure robustness to the absence of context, and further improve performance by adding speaker information. Our proposed contextual E2E-ST outperforms the isolated utterance-based E2E-ST approach. Lastly, we demonstrate that in conversational speech, contextual information primarily contributes to capturing context style, as well as resolving anaphora and named entities.
Submitted 27 September, 2023;
originally announced September 2023.
-
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
Authors:
Claytone Sikasote,
Kalinda Siaminwe,
Stanly Mwape,
Bangiwe Zulu,
Mofya Phiri,
Martin Phiri,
David Zulu,
Mayumbo Nyirenda,
Antonios Anastasopoulos
Abstract:
This work introduces Zambezi Voice, an open-source multilingual speech resource for Zambian languages. It contains two collections of datasets: unlabelled audio recordings of radio news and talk show programs (160 hours) and labelled data (over 80 hours) consisting of read speech recorded from text sourced from publicly available literature books. The dataset is created for speech recognition but can be extended to multilingual speech processing research for both supervised and unsupervised learning approaches. To our knowledge, this is the first multilingual speech dataset created for Zambian languages. We exploit pretraining and cross-lingual transfer learning by finetuning the Wav2Vec2.0 large-scale multilingual pre-trained model to build end-to-end (E2E) speech recognition models for our baseline models. The dataset is released publicly under a Creative Commons BY-NC-ND 4.0 license and can be accessed via https://github.com/unza-speech-lab/zambezi-voice .
Submitted 13 June, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Sequential Bayesian Learning with A Self-Interested Coordinator
Authors:
Xupeng Wei,
Achilleas Anastasopoulos
Abstract:
Social learning refers to the process by which networked strategic agents learn an unknown state of the world by observing private state-related signals as well as other agents' actions. In their classic work, Bikhchandani, Hirshleifer and Welch showed that information cascades occur in social learning, in which agents blindly follow others' behavior, and consequently, the actions in a cascade reveal no further information about the state.
In this paper, we consider the introduction of an information coordinator to mitigate information cascades. The coordinator commits to a mechanism, which is a contract that agents may choose to accept or not. If an agent enters the mechanism, she pays a fee, and sends a message to the coordinator indicating her private signal (not necessarily truthfully). The coordinator, in turn, suggests an action to the agents according to his knowledge and interest. We study a class of mechanisms that possess properties such as individual rationality for agents (i.e., agents are willing to enter), truth telling, and profit maximization for the coordinator. We prove that the coordinator, without loss of optimality, can adopt a summary-based mechanism that depends on the complete observation history through an appropriate sufficient statistic. Furthermore, we show the existence of a mechanism which strictly improves social welfare, and results in strictly positive profit, so that such a mechanism is acceptable for both agents and the coordinator, and is beneficial to the agent community. Finally, we analyze the performance of this mechanism and show significant gains on both aforementioned metrics.
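The information cascade that opens this abstract can be illustrated with a minimal sketch of the classic binary sequential-learning model. It assumes a uniform prior, conditionally independent signals that are correct with probability `p`, and a tie-breaking rule in which an indifferent agent follows her own signal. The function name `simulate_cascade` and its parameters are hypothetical, and the sketch does not model the coordinator or mechanism studied in the paper.

```python
import random


def simulate_cascade(n_agents, p, state, seed=0):
    """Simulate the classic binary sequential-learning model (no coordinator).

    state is the true state (0 or 1); each private signal equals the state
    with probability p. Returns the list of actions and the index of the
    first agent who ignores her own signal (None if no cascade starts).
    """
    rng = random.Random(seed)
    d = 0  # net count of publicly inferred signals: (#high) - (#low)
    actions = []
    cascade_start = None
    for t in range(n_agents):
        signal = state if rng.random() < p else 1 - state
        if abs(d) >= 2:
            # Public evidence outweighs any single private signal, so the
            # action reveals nothing and the public count stops changing.
            action = 1 if d > 0 else 0
            if cascade_start is None:
                cascade_start = t
        else:
            # With |d| <= 1 the agent's best action equals her signal
            # (ties are broken in favour of the private signal), so
            # observers can infer the signal from the action.
            action = signal
            d += 1 if signal == 1 else -1
        actions.append(action)
    return actions, cascade_start


actions, start = simulate_cascade(n_agents=30, p=0.7, state=1, seed=1)
print(actions, start)
```

Once the net count reaches two in either direction, every later agent takes the same action regardless of her signal, which is the sense in which a cascade reveals no further information about the state.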
Submitted 11 May, 2023;
originally announced May 2023.
-
Autocorrelation and Spectrum Analysis for Variable Symbol Length Communications with Feedback
Authors:
Chin-Wei Hsu,
Hun-Seok Kim,
Achilleas Anastasopoulos
Abstract:
Variable-length feedback codes can provide advantages over fixed-length feedback or non-feedback codes. This letter focuses on uncoded variable-symbol-length feedback communication and analyzes the autocorrelation and spectrum of the signal. We provide a mathematical expression for the autocorrelation that can be evaluated numerically. We then numerically evaluate the autocorrelation and spectrum for the variable-symbol-length signal in a feedback-based communication system that attains a target reliability for every symbol by adapting the symbol length to the noise realization. The analysis and numerical results show that the spectrum changes with SNR when the average symbol length is fixed, and approaches the fixed-length scheme at high SNR.
Submitted 21 November, 2022;
originally announced November 2022.
-
Joint Information and Mechanism Design for Queues with Heterogeneous Users
Authors:
Nasimeh Heydaribeni,
Achilleas Anastasopoulos
Abstract:
We consider a queue whose backlog is unobservable by the incoming users. There is an information designer that observes the queue backlog and makes recommendations to the users arriving at the queue whether to join or not to join the queue. The arriving users have payoff-relevant private types. The users, upon arrival, send a message, which is supposed to be their type, to the information designer if they are willing to hear a recommendation. The information designer then creates a recommendation for that specific type of user. The users have to pay a tax in exchange for the information they receive. In this setting, the information designer has two types of commitments. The first commitment is the recommendation policy and the second commitment is the tax function. We combine mechanism design and information design to study a queuing system with heterogeneous users. In this setting, the information designer is a sender of the information in the information design aspect and a receiver in the mechanism design aspect of the model. We formulate an optimization problem that characterizes the solution of the joint design problem. We characterize the tax functions and provide structural results for the recommendation policy of the information designer.
Submitted 29 September, 2021;
originally announced September 2021.
-
Mechanism Design for Demand Management in Energy Communities
Authors:
Xupeng Wei,
Achilleas Anastasopoulos
Abstract:
We consider a demand management problem of an energy community, in which several users obtain energy from an external organization such as an energy company, and pay for the energy according to pre-specified prices that consist of a time-dependent price per unit of energy, as well as a separate price for peak demand. Since users' utilities are their private information, which they may not be willing to share, a mediator, known as the planner, is introduced to help optimize the overall satisfaction of the community (total utility minus total payments) by mechanism design. A mechanism consists of a message space, a tax/subsidy and an allocation function for each user. Each user reports a message chosen from her own message space, and then receives some amount of energy determined by the allocation function and pays the tax specified by the tax function. A desirable mechanism induces a game, the Nash equilibria (NE) of which result in an allocation that coincides with the optimal allocation for the community.
As a starting point, we design a mechanism for the energy community with desirable properties such as full implementation, strong budget balance and individual rationality for both users and the planner. We then modify this baseline mechanism for communities where message exchanges are allowed only within neighborhoods, and consequently, the tax/subsidy and allocation functions of each user are only determined by the messages from her neighbors. All the desirable properties of the baseline mechanism are preserved in the distributed mechanism. Finally, we present a learning algorithm for the baseline mechanism, based on projected gradient descent, that is guaranteed to converge to the NE of the induced game.
Submitted 15 June, 2021; v1 submitted 1 December, 2020;
originally announced December 2020.
-
Structured Equilibria for Dynamic Games with Asymmetric Information and Dependent Types
Authors:
Nasimeh Heydaribeni,
Achilleas Anastasopoulos
Abstract:
We consider a dynamic game with asymmetric information where each player observes privately a noisy version of a (hidden) state of the world V, resulting in dependent private observations. We study structured perfect Bayesian equilibria that use private beliefs in their strategies as sufficient statistics for summarizing their observation history. The main difficulty in finding the appropriate sufficient statistic (state) for the structured strategies arises from the fact that players need to construct (private) beliefs on other players' private beliefs on V, which in turn would imply that an infinite hierarchy of beliefs on beliefs needs to be constructed, rendering the problem unsolvable. We show that this is not the case: each player's belief on other players' beliefs on V can be characterized by her own belief on V and some appropriately defined public belief. We then specialize this setting to the case of a Linear Quadratic Gaussian (LQG) non-zero-sum game and we characterize linear structured PBE that can be found through a backward/forward algorithm akin to dynamic programming for the standard LQG control problem. Unlike the standard LQG problem, however, some of the required quantities for the Kalman filter are observation-dependent and thus cannot be evaluated off-line through a forward recursion.
Submitted 7 September, 2020;
originally announced September 2020.
-
Universal Phone Recognition with a Multilingual Allophone System
Authors:
Xinjian Li,
Siddharth Dalmia,
Juncheng Li,
Matthew Lee,
Patrick Littell,
Jiali Yao,
Antonios Anastasopoulos,
David R. Mortensen,
Graham Neubig,
Alan W Black,
Florian Metze
Abstract:
Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages. Multilingual acoustic models, however, generally ignore the difference between phonemes (sounds that can support lexical contrasts in a particular language) and their corresponding phones (the sounds that are actually spoken, which are language independent). This can lead to performance degradation when combining a variety of training languages, as identically annotated phonemes can actually correspond to several different underlying phonetic realizations. In this work, we propose a joint model of both language-independent phone and language-dependent phoneme distributions. In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute in low-resource conditions. Additionally, because we are explicitly modeling language-independent phones, we can build a (nearly-)universal phone recognizer that, when combined with the PHOIBLE large, manually curated database of phone inventories, can be customized into 2,000 language dependent recognizers. Experiments on two low-resourced indigenous languages, Inuktitut and Tusom, show that our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
Submitted 26 February, 2020;
originally announced February 2020.
-
Linear Equilibria for Dynamic LQG Games with Asymmetric Information and Dependent Types
Authors:
Nasimeh Heydaribeni,
Achilleas Anastasopoulos
Abstract:
We consider a non-zero-sum linear quadratic Gaussian (LQG) dynamic game with asymmetric information. Each player observes privately a noisy version of a (hidden) state of the world $V$, resulting in dependent private observations. We study perfect Bayesian equilibria (PBE) for this game with equilibrium strategies that are linear in players' private estimates of $V$. The main difficulty arises from the fact that players need to construct estimates on other players' estimate on $V$, which in turn would imply that an infinite hierarchy of estimates on estimates needs to be constructed, rendering the problem unsolvable. We show that this is not the case: each player's estimate on other players' estimates on $V$ can be summarized into her own estimate on $V$ and some appropriately defined public information. Based on this finding we characterize the PBE through a backward/forward algorithm akin to dynamic programming for the standard LQG control problem. Unlike the standard LQG problem, however, Kalman filter covariance matrices, as well as some other required quantities, are observation-dependent and thus cannot be evaluated off-line through a forward recursion.
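For contrast with the observation-dependent covariances mentioned at the end of this abstract, the following minimal sketch shows the standard single-agent case, where the Kalman filter variance follows a deterministic recursion that can be computed off-line. It assumes a scalar model $x_{t+1} = a x_t + w_t$, $y_t = x_t + v_t$ with noise variances `q` and `r`. The function name `kalman_scalar` and all parameter names are hypothetical, and nothing here reproduces the game-theoretic algorithm of the paper.

```python
def kalman_scalar(ys, a, q, r, m0, p0):
    """Scalar Kalman filter for x_{t+1} = a*x_t + w_t, y_t = x_t + v_t.

    q and r are the process and measurement noise variances; m0 and p0 are
    the prior mean and variance of the initial state. Returns the posterior
    means and posterior variances after each observation.
    """
    m_pred, p_pred = m0, p0
    means, variances = [], []
    for y in ys:
        k = p_pred / (p_pred + r)      # Kalman gain
        m = m_pred + k * (y - m_pred)  # posterior mean depends on y
        p = (1.0 - k) * p_pred         # posterior variance does not
        means.append(m)
        variances.append(p)
        m_pred = a * m                 # one-step prediction of the mean
        p_pred = a * a * p + q         # one-step prediction of the variance
    return means, variances


# Two different observation sequences give the same variance sequence.
_, var_a = kalman_scalar([0.3, -1.2, 0.8], a=0.9, q=0.1, r=0.5, m0=0.0, p0=1.0)
_, var_b = kalman_scalar([5.0, 4.0, 6.0], a=0.9, q=0.1, r=0.5, m0=0.0, p0=1.0)
print(var_a == var_b)
```

In this standard setting the variance update never reads the observations, so the whole sequence can be tabulated before any data arrive. The abstract states that this property fails in the game it studies.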
Submitted 10 September, 2019;
originally announced September 2019.
-
Incentive design for learning in user-recommendation systems with time-varying states
Authors:
Deepanshu Vasal,
Vijay Subramanian,
Achilleas Anastasopoulos
Abstract:
We consider the problem of how strategic users with asymmetric information can learn an underlying time varying state in a user-recommendation system. Users who observe private signals about the state, sequentially make a decision about buying a product whose value varies with time in an ergodic manner. We formulate the team problem as an instance of decentralized stochastic control problem and characterize its optimal policies. With strategic users, we design incentives such that users reveal their true private signals, so that the gap between the strategic and team objective is small and the overall expected incentive payments are also small.
Submitted 13 April, 2018;
originally announced April 2018.
-
Linear Quadratic Games with Costly Measurements
Authors:
Dipankar Maity,
Achilleas Anastasopoulos,
John S. Baras
Abstract:
In this work we consider a stochastic linear quadratic two-player game. The state measurements are observed through a switched noiseless communication link. Each player incurs a finite cost every time the link is established to get measurements. Along with the usual control action, each player is equipped with a switching action to control the communication link. The measurements help to improve the estimate and hence reduce the quadratic cost but at the same time the cost is increased due to switching. We study the subgame perfect equilibrium control and switching strategies for the players. We show that the problem can be solved in a two-step process by solving two dynamic programming problems. The first step corresponds to solving a dynamic program for the control strategy, and the second step solves another dynamic program for the switching strategy.
Submitted 20 September, 2017;
originally announced September 2017.
-
Decentralized Bayesian learning in dynamic games: A framework for studying informational cascades
Authors:
Deepanshu Vasal,
Achilleas Anastasopoulos
Abstract:
We study the problem of Bayesian learning in a dynamical system involving strategic agents with asymmetric information. In a series of seminal papers in the literature, this problem has been investigated under a simplifying model where myopically selfish players appear sequentially and act once in the game, based on private noisy observations of the system state and public observation of past players' actions. It has been shown that there exist information cascades where users discard their private information and mimic the action of their predecessor. In this paper, we provide a framework for studying Bayesian learning dynamics in a more general setting than the one described above. In particular, our model incorporates cases where players are non-myopic and strategically participate for the whole duration of the game, and cases where an endogenous process selects which subset of players will act at each time instance. The proposed framework hinges on a sequential decomposition methodology for finding structured perfect Bayesian equilibria (PBE) of a general class of dynamic games with asymmetric information, where user-specific states evolve as conditionally independent Markov processes and users make independent noisy observations of their states. Using this methodology, we study a specific dynamic learning model where players make decisions about public investment based on their estimates of everyone's types. We characterize a set of informational cascades for this problem where learning stops for the team as a whole. We show that in such cascades, all players' estimates of other players' types freeze even though each individual player asymptotically learns its own true type.
Submitted 8 April, 2018; v1 submitted 22 July, 2016;
originally announced July 2016.
-
Signaling equilibria for dynamic LQG games with asymmetric information
Authors:
Deepanshu Vasal,
Achilleas Anastasopoulos
Abstract:
We consider a finite horizon dynamic game with two players who observe their types privately and take actions, which are publicly observed. Players' types evolve as independent, controlled linear Gaussian processes and players incur quadratic instantaneous costs. This forms a dynamic linear quadratic Gaussian (LQG) game with asymmetric information. We show that under certain conditions, players' strategies that are linear in their private types, together with Gaussian beliefs form a perfect Bayesian equilibrium (PBE) of the game. Furthermore, it is shown that this is a signaling equilibrium due to the fact that future beliefs on players' types are affected by the equilibrium strategies. We provide a backward-forward algorithm to find the PBE. Each step of the backward algorithm reduces to solving an algebraic matrix equation for every possible realization of the state estimate covariance matrix. The forward algorithm consists of Kalman filter recursions, where state estimate covariance matrices depend on equilibrium strategies.
Submitted 15 June, 2016;
originally announced June 2016.
-
A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information
Authors:
Deepanshu Vasal,
Abhinav Sinha,
Achilleas Anastasopoulos
Abstract:
We consider finite-horizon and infinite-horizon versions of a dynamic game with $N$ selfish players who observe their types privately and take actions that are publicly observed. Players' types evolve as conditionally independent Markov processes, conditioned on their current actions. Their actions and types jointly determine their instantaneous rewards. In dynamic games with asymmetric information, a widely used concept of equilibrium is perfect Bayesian equilibrium (PBE), which consists of a strategy and belief pair that simultaneously satisfy sequential rationality and belief consistency. In general, there does not exist a universal algorithm that decouples the interdependence of strategies and beliefs over time in calculating PBE. In this paper, for the finite-horizon game with independent types we develop a two-step backward-forward recursive algorithm that sequentially decomposes the problem (w.r.t. time) to obtain a subset of PBEs, which we refer to as structured perfect Bayesian equilibria (SPBE). In such equilibria, a player's strategy depends on its history only through a common public belief and its current private type. The backward recursive part of this algorithm defines an equilibrium generating function. Each period in the backward recursion involves solving a fixed-point equation on the space of probability simplexes for every possible belief on types. Using this function, equilibrium strategies and beliefs are generated through a forward recursion. We then extend this methodology to the infinite-horizon model, where we propose a time-invariant single-shot fixed-point equation, which in conjunction with a forward recursive step, generates the SPBE. Sufficient conditions for the existence of SPBE are provided. With our proposed method, we find equilibria that exhibit signaling behavior. This is illustrated with the help of a concrete public goods example.
Submitted 18 March, 2018; v1 submitted 25 August, 2015;
originally announced August 2015.