Search | arXiv e-print repository

Fair and Efficient Allocation of Indivisible Mixed Manna

Authors: Siddharth Barman, Vishwa Prakash HV, Aditi Sethia, Mashbat Suzuki

Abstract: We study fair division of indivisible mixed manna (items whose values may be positive, negative, or zero) among agents with additive valuations. Here, we establish that fairness -- in terms of a relaxation of envy-freeness -- and Pareto efficiency can always be achieved together. Specifically, our fairness guarantees are in terms of envy-freeness up to $k$ reallocations (EFR-$k$): An allocation $A$ of the indivisible items is said to be EFR-$k$ if there exists a subset $R$ of at most $k$ items such that, for each agent $i$, we can reassign items from within $R$ (in $A$) and obtain an allocation, $A^i$, which is envy-free for $i$. We establish that, when allocating mixed manna among $n$ agents with additive valuations, an EFR-$(n-1)$ and Pareto optimal (PO) allocation $A$ always exists. Further, the individual envy-free allocations $A^i$, induced by reassignments, are also PO. In addition, we prove that such fair and efficient allocations are efficiently computable when the number of agents, $n$, is fixed. We also obtain positive results focusing on EFR by itself (and without the PO desideratum). Specifically, we show that an EFR-$(n-1)$ allocation of mixed manna can be computed in polynomial time. In addition, we prove that when all the items are goods, an EFR-${\lfloor n/2 \rfloor}$ allocation exists and can be computed efficiently. Here, the $(n-1)$ bound is tight for chores and $\lfloor n/2 \rfloor$ is tight for goods.
Our results advance the understanding of fair and efficient allocation of indivisible mixed manna and rely on a novel application of the Knaster-Kuratowski-Mazurkiewicz (KKM) Theorem in discrete fair division. We utilize weighted welfare maximization, with perturbed valuations, to achieve Pareto efficiency, and overall, our techniques are notably different from existing market-based approaches.

Submitted 5 July, 2025; originally announced July 2025.

Comments: 31 pages

arXiv:2506.22881 [pdf, ps, other]

How Semantically Informative is an Image?: Measuring the Covariance-Weighted Norm of Contrastive Learning Embeddings

Authors: Fumiya Uchiyama, Rintaro Yanagi, Shohei Taniguchi, Shota Takashiro, Masahiro Suzuki, Hirokatsu Kataoka, Yusuke Iwasawa, Yutaka Matsuo

Abstract: Contrastive learning has the capacity to model multimodal probability distributions by embedding and aligning visual representations with semantics from captions. This approach enables the estimation of relational semantic similarity; however, it remains unclear whether it can also represent absolute semantic informativeness. In this work, we introduce a semantic informativeness metric for an image calculated from text samples via a contrastive learning model; similarly, the informativeness of a text is calculated from image samples. We propose a redefinition of the concept of Information Gain, a concept previously explored in natural language processing, extending its application to the domains of vision and language. Our metric quantifies how conditioning on an image distorts the distribution of associated texts, and vice versa for text conditioning on image distributions. In OpenCLIP's empirical results, we observe that images with the lowest Information Gain scores often correspond to placeholder icons such as "image not found." Furthermore, we propose to measure a norm-based metric of the embedding to estimate the Information Gain, following the theoretical results for Skip-Gram with Negative Sampling (SGNS) word embedding. Information Gain can be measured using either CLIP or SigLIP, and the results demonstrate a strong correlation with a coefficient of determination ranging from 0.98 to 1.00.
After obtaining the mean and the covariance of the sample embedding, the computational cost of this method is independent of the sample size, and it is compatible with publicly available, open-weight models.

Submitted 28 June, 2025; originally announced June 2025.

arXiv:2506.20164 [pdf]

Do psychic cells generate consciousness?

Authors: Mototaka Suzuki, Jaan Aru

Abstract: Technological advances in the past decades have begun to enable neuroscientists to address fundamental questions about consciousness in an unprecedented way. Here we review remarkable recent progress in our understanding of cellular-level mechanisms of conscious processing in the brain. Of particular interest are the cortical pyramidal neurons -- called "psychic cells" by Ramón y Cajal more than 100 years ago -- which have an intriguing cellular mechanism that accounts for selective disruption of feedback signaling in the brain upon anesthetic-induced loss of consciousness. Importantly, a particular class of metabotropic receptors distributed over the dendrites of pyramidal cells is highlighted as the key cellular mechanism. After all, Cajal's instinct over a century ago may turn out to be correct -- we may have just begun to understand whether and how psychic cells indeed generate and control our consciousness.

Submitted 25 June, 2025; originally announced June 2025.

arXiv:2506.14046 [pdf, ps, other]

Ace-CEFR -- A Dataset for Automated Evaluation of the Linguistic Difficulty of Conversational Texts for LLM Applications

Authors: David Kogan, Max Schumacher, Sam Nguyen, Masanori Suzuki, Melissa Smith, Chloe Sophia Bellows, Jared Bernstein

Abstract: There is an unmet need to evaluate the language difficulty of short, conversational passages of text, particularly for training and filtering Large Language Models (LLMs). We introduce Ace-CEFR, a dataset of English conversational text passages expert-annotated with their corresponding level of text difficulty. We experiment with several models on Ace-CEFR, including Transformer-based models and LLMs. We show that models trained on Ace-CEFR can measure text difficulty more accurately than human experts and have latency appropriate to production environments. Finally, we release the Ace-CEFR dataset to the public for research and development.

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2503.06138 [pdf, other]

System 0/1/2/3: Quad-process theory for multi-timescale embodied collective cognitive systems

Authors: Tadahiro Taniguchi, Yasushi Hirai, Masahiro Suzuki, Shingo Murata, Takato Horii, Kazutoshi Tanaka

Abstract: This paper introduces the System 0/1/2/3 framework as an extension of dual-process theory, employing a quad-process model of cognition. Expanding upon System 1 (fast, intuitive thinking) and System 2 (slow, deliberative thinking), we incorporate System 0, which represents pre-cognitive embodied processes, and System 3, which encompasses collective intelligence and symbol emergence. We contextualize this model within Bergson's philosophy by adopting multi-scale time theory to unify the diverse temporal dynamics of cognition. System 0 emphasizes morphological computation and passive dynamics, illustrating how physical embodiment enables adaptive behavior without explicit neural processing. Systems 1 and 2 are explained from a constructive perspective, incorporating neurodynamical and AI viewpoints. In System 3, we introduce collective predictive coding to explain how societal-level adaptation and symbol emergence operate over extended timescales. This comprehensive framework ranges from rapid embodied reactions to slow-evolving collective intelligence, offering a unified perspective on cognition across multiple timescales, levels of abstraction, and forms of human intelligence. The System 0/1/2/3 model provides a novel theoretical foundation for understanding the interplay between adaptive and cognitive processes, thereby opening new avenues for research in cognitive science, AI, robotics, and collective intelligence.

Submitted 13 March, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

Comments: Under review

arXiv:2503.00885 [pdf, ps, other]

Social Welfare Maximization in Approval-Based Committee Voting under Uncertainty

Authors: Haris Aziz, Yuhang Guo, Venkateswara Rao Kagita, Baharak Rastegari, Mashbat Suzuki

Abstract: Approval voting is widely used for making multi-winner voting decisions. The canonical rule (also called Approval Voting) used in the setting aims to maximize social welfare by selecting candidates with the highest number of approvals. We revisit approval-based multi-winner voting in scenarios where the information regarding the voters' preferences is uncertain. We present several algorithmic results for problems related to social welfare maximization under uncertainty, including computing an outcome that is social welfare maximizing with the highest probability, computing the social welfare probability distribution of a given outcome, computing the probability that a given outcome is social welfare maximizing, and understanding how robust an outcome is with respect to social welfare maximizing.

Submitted 2 March, 2025; originally announced March 2025.
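
As a rough illustration of the canonical rule mentioned in the abstract above (the deterministic case only, without uncertainty), the following sketch selects the k candidates with the most approvals. The function names, the alphabetical tie-breaking, and the reading of social welfare as the total number of approvals received by the selected committee are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def approval_voting(ballots, k):
    """Return the k candidates with the most approvals.

    Ties are broken alphabetically (an assumption for this sketch).
    """
    counts = Counter(c for ballot in ballots for c in ballot)
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return ranked[:k]


def social_welfare(ballots, committee):
    """Utilitarian welfare: total approvals received by committee members."""
    chosen = set(committee)
    return sum(len(set(ballot) & chosen) for ballot in ballots)


# Hypothetical profile: four voters, four candidates.
ballots = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"d"}]
committee = approval_voting(ballots, 2)
welfare = social_welfare(ballots, committee)
```

In the uncertain setting the abstract studies, the ballots would be random rather than fixed, so the welfare of a committee becomes a distribution rather than a single number.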

arXiv:2502.17869 [pdf, ps, other]

Maximum Welfare Allocations under Quantile Valuations

Authors: Haris Aziz, Shivika Narang, Mashbat Suzuki

Abstract: We propose a new model for aggregating preferences over a set of indivisible items based on a quantile value. In this model, each agent is endowed with a specific quantile, and the value of a given bundle is defined by the corresponding quantile of the individual values of the items within it. Our model captures the diverse ways in which agents may perceive a bundle, even when they agree on the values of individual items. It enables richer behavioral modeling that cannot be easily captured by additive valuation functions. We study the problem of maximizing utilitarian and egalitarian welfare within the quantile-based valuation setting. For each of the welfare functions, we analyze the complexity of the objectives. Interestingly, our results show that the complexity of both objectives varies significantly depending on whether the allocation is required to be balanced. We provide near-optimal approximation algorithms for utilitarian welfare, and for egalitarian welfare, we present exact algorithms whenever possible.

Submitted 17 April, 2025; v1 submitted 25 February, 2025; originally announced February 2025.
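
To make the quantile-valuation idea in the abstract above concrete, here is a minimal sketch in which a bundle's value is a chosen quantile of its item values. The exact quantile convention (the item at position ceil(tau * size) in ascending order), the value 0 for an empty bundle, and the name `quantile_value` are all assumptions for illustration; the paper may define these differently.

```python
import math


def quantile_value(item_values, tau):
    """Value of a bundle as the tau-quantile of its item values.

    Assumed convention: sort ascending and take the item at position
    ceil(tau * size), so tau = 0 gives the minimum and tau = 1 the maximum.
    """
    if not item_values:
        return 0  # assumption: the empty bundle is worth 0
    ordered = sorted(item_values)
    index = max(math.ceil(tau * len(ordered)) - 1, 0)
    return ordered[index]


# Agents who agree on item values can still value the same bundle differently.
bundle = [3, 1, 4, 1, 5]
pessimist = quantile_value(bundle, 0.0)   # judges the bundle by its worst item
median_agent = quantile_value(bundle, 0.5)
optimist = quantile_value(bundle, 1.0)    # judges the bundle by its best item
```

Unlike an additive valuation, adding an item to a bundle can lower its value here, which is one reason the welfare-maximization problems behave differently.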

arXiv:2502.13671 [pdf, other]

On the Subsidy of Envy-Free Orientations in Graphs

Authors: Bo Li, Ankang Sun, Mashbat Suzuki, Shiji Xing

Abstract: We study a fair division problem in (multi)graphs where $n$ agents (vertices) are pairwise connected by items (edges), and each agent is only interested in its incident items. We consider how to allocate items to incident agents in an envy-free manner, i.e., envy-free orientations, while minimizing the overall payment, i.e., subsidy. We first prove that computing an envy-free orientation with the minimum subsidy is NP-hard, even when the graph is simple and the agents have bi-valued additive valuations. We then bound the worst-case subsidy. We prove that for any multigraph (i.e., allowing parallel edges) and monotone valuations where the marginal value of each good is at most \$1 for each agent, \$1 each (a total subsidy of $n-1$, where $n$ is the number of agents) is sufficient. This is one of the few cases where linear subsidy $Θ(n)$ is known to be necessary and sufficient to guarantee envy-freeness when agents have monotone valuations. When the valuations are additive (while the graph may contain parallel edges) and when the graph is simple (while the valuations may be monotone), we improve the bound to $n/2$ and $n-2$, respectively. Moreover, these two bounds are tight.

Submitted 19 February, 2025; originally announced February 2025.

arXiv:2502.09006 [pdf, ps, other]

Whoever Said Money Won't Solve All Your Problems? Weighted Envy-free Allocation with Subsidy

Authors: Noga Klein Elmalem, Haris Aziz, Rica Gonen, Xin Huang, Kei Kimura, Indrajit Saha, Erel Segal-Halevi, Zhaohong Sun, Mashbat Suzuki, Makoto Yokoo

Abstract: We explore solutions for fairly allocating indivisible items among agents assigned weights representing their entitlements. Our fairness goal is weighted-envy-freeness (WEF), where each agent deems their allocated portion relative to their entitlement at least as favorable as any others relative to their own. Often, achieving WEF necessitates monetary transfers, which can be modeled as third-party subsidies. The goal is to attain WEF with bounded subsidies. Previous work relied on characterizations of unweighted envy-freeness (EF), which fail in the weighted setting. This makes our new setting challenging. We present polynomial-time algorithms that compute WEF allocations with a guaranteed upper bound on total subsidy for monotone valuations and various subclasses thereof. We also present an efficient algorithm to compute a fair allocation of items and money when the budget is not enough to make the allocation WEF. This algorithm is new even for the unweighted setting.

Submitted 5 March, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

Comments: 60 pages, 5 tables. arXiv admin note: substantial text overlap with arXiv:2411.12696

arXiv:2501.19252 [pdf, ps, other]

Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search

Authors: Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo, Hiroki Furuta

Abstract: The remarkable progress in text-to-video diffusion models enables photorealistic generations, although the contents of the generated video often include unnatural movement or deformation, reverse playback, and motionless scenes. Recently, an alignment problem has attracted huge attention, where we steer the output of diffusion models based on some quantity on the goodness of the content. Because there is considerable room for improvement of perceptual quality along the frame direction, we should address which metrics we should optimize and how we can optimize them in the video generation. In this paper, we propose diffusion latent beam search with lookahead estimator, which can select a better diffusion latent to maximize a given alignment reward, at inference time. We then point out that the improvement of perceptual video quality considering the alignment to prompts requires reward calibration by weighting existing metrics. This is because when humans or vision language models evaluate outputs, many previous metrics to quantify the naturalness of video do not always correlate with evaluation. We demonstrate that our method improves the perceptual quality evaluated on the calibrated reward, VLMs, and human assessment, without model parameter update, and outputs the best generation compared to greedy search and best-of-N sampling under much more efficient computational cost.
The experiments highlight that our method is beneficial to many capable generative models, and provide a practical guideline that we should prioritize the inference-time compute allocation into lookahead steps for reward estimation over search budget or denoising steps.

Submitted 1 June, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

Comments: Code: https://github.com/shim0114/T2V-Diffusion-Search

arXiv:2501.00226 [pdf, other]

Generative Emergent Communication: Large Language Model is a Collective World Model

Authors: Tadahiro Taniguchi, Ryo Ueda, Tomoaki Nakamura, Masahiro Suzuki, Akira Taniguchi

Abstract: This study proposes a unifying theoretical framework called generative emergent communication (generative EmCom) that bridges emergent communication, world models, and large language models (LLMs) through the lens of collective predictive coding (CPC). The proposed framework formalizes the emergence of language and symbol systems through decentralized Bayesian inference across multiple agents, extending beyond conventional discriminative model-based approaches to emergent communication. This study makes the following two key contributions: First, we propose generative EmCom as a novel framework for understanding emergent communication, demonstrating how communication emergence in multi-agent reinforcement learning (MARL) can be derived from control as inference while clarifying its relationship to conventional discriminative approaches. Second, we propose a mathematical formulation showing the interpretation of LLMs as collective world models that integrate multiple agents' experiences through CPC. The framework provides a unified theoretical foundation for understanding how shared symbol systems emerge through collective predictive coding processes, bridging individual cognitive development and societal language evolution. Through mathematical formulations and discussion on prior works, we demonstrate how this framework explains fundamental aspects of language emergence and offers practical insights for understanding LLMs and developing sophisticated AI systems for improving human-AI interaction and multi-agent systems.

Submitted 30 December, 2024; originally announced January 2025.

arXiv:2412.05630 [pdf, other]

Dislocation-based crystal plasticity simulation on grain-size dependence of mechanical properties in dual-phase steels

Authors: Misato Suzuki, Mayu Muramatsu, Kazuyuki Shizawa

Abstract: In this study, the effect of ferrite grain size on the mechanical properties and dislocation behavior of dual-phase (DP) steel is investigated using dislocation-based crystal plasticity finite element analysis. DP steel, composed of a soft ferritic phase and a hard martensitic phase, shows mechanical properties that are significantly influenced by ferrite grain size. The mechanism underlying this grain size effect is clarified by analyzing the partitioning and distribution of stress, strain, and dislocations in each phase. Three models with the same volume fraction of martensitic phase but different ferrite grain sizes are subjected to tensile loading. Interestingly, even though only the ferrite grain size is changed, the stress in the martensitic phase exhibited a notable dependence on ferrite grain size. This can be explained as follows. Geometrically necessary (GN) dislocations accumulate on the ferrite side of the ferrite-martensite grain boundary, and the grain boundary occupancy per unit area increases as the ferrite grain size decreases. As a result, smaller ferrite grain sizes make the ferritic phase less deformable owing to the effect of GN dislocations, shifting more deformation to the martensitic phase. This behavior is confirmed by the more uniform strain distribution and partitioning observed with decreasing ferrite grain size.
As the martensitic phase takes on greater deformation, the statistically stored dislocation density in the martensitic phase becomes ferrite grain size dependent, which in turn leads to the observed grain size dependence of stress in the martensitic phase.

Submitted 7 December, 2024; originally announced December 2024.

Comments: 16 pages, 14 figures

arXiv:2412.02435 [pdf, ps, other]

Approximately Fair and Population Consistent Budget Division via Simple Payment Schemes

Authors: Haris Aziz, Patrick Lederer, Xinhang Lu, Mashbat Suzuki, Jeremy Vollen

Abstract: In approval-based budget division, a budget needs to be distributed to candidates based on the voters' approval ballots over these candidates. In the pursuit of a simple, consistent, and approximately fair rule for this setting, we introduce the maximum payment rule (MP). Under this rule, each voter controls a part of the budget and, in each step, the corresponding voters allocate their entire budget to the candidate approved by the largest number of voters with non-zero budget. We show that MP meets our criteria as it satisfies monotonicity and a demanding population consistency condition and gives a $2$-approximation to a fairness notion called average fair share (AFS). Moreover, we generalize MP to the class of sequential payment rules and prove that it is the most desirable rule in this class: all sequential payment rules but MP and one other rule fail monotonicity while only allowing for a small improvement in the approximation ratio to AFS.

Submitted 2 July, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

Comments: This paper (version 1) has been accepted at EC'25. The current version is in preparation for a revision at a journal, which caused significant changes in the presentation of the results and led to a new title
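
The step-by-step description of the maximum payment rule (MP) in the abstract above can be sketched roughly as follows. This is only an illustration of that one-sentence description: the equal split of a unit budget among voters, the alphabetical tie-breaking, and the function name are assumptions, and the paper's formal definition may differ.

```python
from fractions import Fraction


def maximum_payment_rule(approvals):
    """Sketch of MP: repeatedly fund the candidate with the most
    approving voters who still hold budget; those voters pay everything.

    Assumptions: a total budget of 1 split equally among voters, and
    ties broken alphabetically.
    """
    n = len(approvals)
    budget = [Fraction(1, n) for _ in approvals]
    candidates = sorted(set().union(*approvals))
    shares = {c: Fraction(0) for c in candidates}
    while True:
        # Voters who approve each candidate and still have money left.
        supporters = {
            c: [i for i, ballot in enumerate(approvals)
                if c in ballot and budget[i] > 0]
            for c in candidates
        }
        best = max(candidates, key=lambda c: len(supporters[c]), default=None)
        if best is None or not supporters[best]:
            break  # nobody with remaining budget approves any candidate
        for i in supporters[best]:
            shares[best] += budget[i]
            budget[i] = Fraction(0)
    return shares


# Hypothetical profile: four voters, three candidates.
approvals = [{"a"}, {"a", "b"}, {"b"}, {"c"}]
shares = maximum_payment_rule(approvals)
```

Each round empties the budget of at least one voter, so the loop finishes after at most as many rounds as there are voters.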

arXiv:2411.09937 [pdf, other]

Refined and Segmented Price Sentiment Indices from Survey Comments

Authors: Masahiro Suzuki, Hiroki Sakaji

Abstract: We aim to enhance a price sentiment index and to more precisely understand price trends from the perspective of not only consumers but also businesses. We extract comments related to prices from the Economy Watchers Survey conducted by the Cabinet Office of Japan and classify price trends using a large language model (LLM). We classify whether the survey sample reflects the perspective of consumers or businesses, and whether the comments pertain to goods or services by utilizing information on the fields of comments and the industries of respondents included in the Economy Watchers Survey. From these classified price-related comments, we construct price sentiment indices not only for a general purpose but also for more specific objectives by combining perspectives on consumers and prices, as well as goods and services. Employing an LLM for classification makes a more accurate classification of price directions possible. Furthermore, integrating the outputs of multiple LLMs suggests the potential for better classification performance. The use of more accurately classified comments allows for the construction of an index with a higher correlation to existing indices than previous studies. We demonstrate that the correlation of the price index for consumers, which has a larger sample size, is further enhanced by selecting comments for aggregation based on the industry of the survey respondents.

Submitted 26 November, 2024; v1 submitted 14 November, 2024; originally announced November 2024.

Comments: Accepted to IEEE BigData 2024. 9 pages, 11 tables, 1 figure

arXiv:2411.02853 [pdf, other]

ADOPT: Modified Adam Can Converge with Any $β_2$ with the Optimal Rate

Authors: Shohei Taniguchi, Keno Harada, Gouki Minegishi, Yuta Oshima, Seong Cheol Jeong, Go Nagahara, Tomoshi Iiyama, Masahiro Suzuki, Yusuke Iwasawa, Yutaka Matsuo

Abstract: Adam is one of the most popular optimization algorithms in deep learning. However, it is known that Adam does not converge in theory unless choosing a hyperparameter, i.e., $β_2$, in a problem-dependent manner. There have been many attempts to fix the non-convergence (e.g., AMSGrad), but they require an impractical assumption that the gradient noise is uniformly bounded. In this paper, we propose a new adaptive gradient method named ADOPT, which achieves the optimal convergence rate of $\mathcal{O} ( 1 / \sqrt{T} )$ with any choice of $β_2$ without depending on the bounded noise assumption. ADOPT addresses the non-convergence issue of Adam by removing the current gradient from the second moment estimate and changing the order of the momentum update and the normalization by the second moment estimate. We also conduct intensive numerical experiments, and verify that our ADOPT achieves superior results compared to Adam and its variants across a wide range of tasks, including image classification, generative modeling, natural language processing, and deep reinforcement learning. The implementation is available at https://github.com/iShohei220/adopt.

Submitted 21 November, 2024; v1 submitted 5 November, 2024; originally announced November 2024.

Comments: Accepted at Neural Information Processing Systems (NeurIPS 2024)
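
The two changes to Adam described in the ADOPT abstract above can be sketched for a scalar parameter as follows. This is a hedged reading of that description, not the authors' implementation (which is at the repository linked in the abstract): the initialization of the second-moment estimate from the first gradient, the default hyperparameters, the `eps` clamp, and the name `adopt_minimize` are assumptions for this example.

```python
import math


def adopt_minimize(grad, x0, lr=0.01, beta1=0.9, beta2=0.9999,
                   eps=1e-6, steps=1000):
    """Scalar sketch of an ADOPT-style update.

    Differences from Adam, following the abstract:
      1. the gradient is normalized by the PREVIOUS second-moment
         estimate, which does not contain the current gradient;
      2. normalization happens BEFORE the momentum update.
    """
    x = x0
    g = grad(x)
    v = g * g   # assumed initialization: first squared gradient, no step taken
    m = 0.0
    for _ in range(steps):
        g = grad(x)
        # Normalize with v from the previous step, then fold into momentum.
        m = beta1 * m + (1 - beta1) * g / max(math.sqrt(v), eps)
        x = x - lr * m
        # Only now does the current gradient enter the second moment.
        v = beta2 * v + (1 - beta2) * g * g
    return x


# Hypothetical usage: minimize f(x) = x^2, whose gradient is 2x.
x_final = adopt_minimize(lambda x: 2.0 * x, x0=1.0)
```

For comparison, Adam updates the second-moment estimate with the current gradient first and divides the momentum by its square root afterwards, so the normalizer and the gradient it scales are correlated.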

arXiv:2410.20822 [pdf, other]

Conditional diffusion model for inverse prediction of process parameters and dendritic microstructures from mechanical properties

Authors: Arisa Ikeda, Ryo Higuchi, Tomohiro Yokozeki, Katsuhiro Endo, Yuta Kojima, Misato Suzuki, Mayu Muramatsu

Abstract: In this study, we develop a conditional diffusion model that proposes the optimal process parameters and predicts the microstructure for the desired mechanical properties. In materials development, it is costly to try many samples with different parameters in experiments and numerical simulations. The use of a data-driven inverse design method can reduce the cost of materials development. This study develops an inverse analysis model that predicts process parameters and microstructures. This method can be used for any material, but in this study it is applied, as an example, to a polymeric material that serves as the matrix resin of carbon fiber reinforced thermoplastics. Matrix resins contain a mixture of dendrites, which are crystalline phases, and amorphous phases even after crystal growth is complete, and it is important to consider the microstructures consisting of the crystalline structure and the remaining amorphous phase to achieve the desired mechanical properties. Typically, the temperature during forming affects the microstructures, which in turn affect the macroscopic mechanical properties. The trained diffusion model can propose not only the processing temperature but also the microstructure when Young's modulus and Poisson's ratio are given. The capability of our conditional diffusion model to represent complex dendrites is also noteworthy.

Submitted 14 March, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

Comments: 22 pages, 22 figures

arXiv:2410.15728 [pdf, other]

Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases

Authors: Cristian Meo, Akihiro Nakano, Mircea Lică, Aniket Didolkar, Masahiro Suzuki, Anirudh Goyal, Mengmi Zhang, Justin Dauwels, Yutaka Matsuo, Yoshua Bengio

Abstract: Unsupervised object-centric learning from videos is a promising approach towards learning compositional representations that can be applied to various downstream tasks, such as prediction and reasoning. Recently, it was shown that pretrained Vision Transformers (ViTs) can be useful to learn object-centric representations on real-world video datasets. However, while these approaches succeed at extracting objects from the scenes, the slot-based representations fail to maintain temporal consistency across consecutive frames in a video, i.e. the mapping of objects to slots changes across the video. To address this, we introduce Conditional Autoregressive Slot Attention (CA-SA), a framework that enhances the temporal consistency of extracted object-centric representations in video-centric vision tasks. Leveraging an autoregressive prior network to condition representations on previous timesteps and a novel consistency loss function, CA-SA predicts future slot representations and imposes consistency across frames. We present qualitative and quantitative results showing that our proposed method outperforms the considered baselines on downstream tasks, such as video prediction and visual question-answering tasks.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.11403 [pdf, other]

Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference

Authors: Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo

Abstract: Multimodal variational autoencoders (VAEs) aim to capture shared latent representations by integrating information from different data modalities. A significant challenge is accurately inferring representations from any subset of modalities without training an impractical number (2^M) of inference networks for all possible modality combinations. Mixture-based models simplify this by requiring only as many inference models as there are modalities, aggregating unimodal inferences. However, they suffer from information loss when modalities are missing. Alignment-based VAEs address this by aligning unimodal inference models with a multimodal model through minimizing the Kullback-Leibler (KL) divergence but face issues due to amortization gaps, which compromise inference accuracy. To tackle these problems, we introduce multimodal iterative amortized inference, an iterative refinement mechanism within the multimodal VAE framework. This method overcomes information loss from missing modalities and minimizes the amortization gap by iteratively refining the multimodal inference using all available modalities. By aligning unimodal inference to this refined multimodal posterior, we achieve unimodal inferences that effectively incorporate multimodal information while requiring only unimodal inputs during inference. Experiments on benchmark datasets show that our approach improves inference performance, evidenced by higher linear classification accuracy and competitive cosine similarity, and enhances cross-modal generation, indicated by lower FID scores. This demonstrates that our method enhances inferred representations from unimodal inputs.

Submitted 15 October, 2024; originally announced October 2024.

Comments: 22 pages, 12 figures

arXiv:2408.12326 [pdf, other]

Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models

Authors: Meiyun Wang, Masahiro Suzuki, Hiroki Sakaji, Kiyoshi Izumi

Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities across various machine learning (ML) tasks. Given the high costs of creating annotated datasets for supervised learning, LLMs offer a valuable alternative by enabling effective few-shot in-context learning. However, these models can produce hallucinations, particularly in domains with incomplete knowledge. Additionally, current methods for knowledge distillation using LLMs often struggle to enhance the effectiveness of both teacher and student models. To address these challenges, we introduce DualChecker, an innovative framework designed to mitigate hallucinations and improve the performance of both teacher and student models during knowledge distillation. DualChecker employs ContextAligner to ensure that the context provided by teacher models aligns with human labeling standards. It also features a dynamic checker system that enhances model interaction: one component re-prompts teacher models with more detailed content when they show low confidence, and another identifies borderline cases from student models to refine the teaching templates. This interactive process promotes continuous improvement and effective knowledge transfer between the models. We evaluate DualChecker using a green innovation textual dataset that includes binary, multiclass, and token classification tasks. The experimental results show that DualChecker significantly outperforms existing state-of-the-art methods, achieving up to a 17% improvement in F1 score for teacher models and 10% for student models. Notably, student models fine-tuned with LLM predictions perform comparably to those fine-tuned with actual data, even in a challenging domain. We make all datasets, models, and code from this research publicly available.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.08711 [pdf, other]

Weighted Envy-free Allocation with Subsidy

Authors: Haris Aziz, Xin Huang, Kei Kimura, Indrajit Saha, Zhaohong Sun, Mashbat Suzuki, Makoto Yokoo

Abstract: We consider the problem of fair allocation of indivisible items with subsidies when agents have weighted entitlements. After highlighting several important differences from the unweighted case, we present several results concerning weighted envy-freeability including general characterizations, algorithms for achieving and testing weighted envy-freeability, lower and upper bounds of the amount of subsidies for envy-freeable allocations, and algorithms for achieving weighted envy-freeability along with other properties.

Submitted 17 October, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

Comments: 26 pages, 1 Table, 1 Figure

arXiv:2407.19391 [pdf, ps, other]

Approval-Based Committee Voting under Uncertainty

Authors: Haris Aziz, Venkateswara Rao Kagita, Baharak Rastegari, Mashbat Suzuki

Abstract: We study approval-based committee voting in which a target number of candidates are selected based on voters' approval preferences over candidates. In contrast to most of the work, we consider the setting where voters express uncertain approval preferences and explore four different types of uncertain approval preference models. For each model, we study problems such as computing a committee with the highest probability of satisfying axioms such as justified representation.

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.14727 [pdf, ps, other]

Economy Watchers Survey Provides Datasets and Tasks for Japanese Financial Domain

Authors: Masahiro Suzuki, Hiroki Sakaji

Abstract: Natural language processing (NLP) tasks in English and general domains are widely available and are often used to evaluate pre-trained language models. In contrast, fewer tasks are available for languages other than English and in the financial domain. In particular, tasks in the Japanese and financial domains are limited. We develop two large datasets using data published by a Japanese central government agency. The datasets provide three Japanese financial NLP tasks, including 3- and 12-class classifications for categorizing sentences, along with a 5-class classification task for sentiment analysis. Our datasets are designed to be comprehensive and up to date, leveraging an automatic update framework that ensures that the latest task datasets are always publicly available.

Submitted 1 February, 2025; v1 submitted 19 July, 2024; originally announced July 2024.

Comments: Accepted to the ACM Web Conference 2025. 4 pages

arXiv:2407.13300 [pdf, other]

Robust ASR Error Correction with Conservative Data Filtering

Authors: Takuma Udagawa, Masayuki Suzuki, Masayasu Muraoka, Gakuto Kurata

Abstract: Error correction (EC) based on large language models is an emerging technology to enhance the performance of automatic speech recognition (ASR) systems. Generally, training data for EC are collected by automatically pairing a large set of ASR hypotheses (as sources) and their gold references (as targets). However, the quality of such pairs is not guaranteed, and we observed various types of noise which can make the EC models brittle, e.g. inducing overcorrection in out-of-domain (OOD) settings. In this work, we propose two fundamental criteria that EC training data should satisfy: namely, EC targets should (1) improve linguistic acceptability over sources and (2) be inferable from the available context (e.g. source phonemes). Through these criteria, we identify low-quality EC pairs and train the models not to make any correction in such cases, the process we refer to as conservative data filtering. In our experiments, we focus on Japanese ASR using a strong Conformer-CTC as the baseline and finetune Japanese LLMs for EC. Through our evaluation on a suite of 21 internal benchmarks, we demonstrate that our approach can significantly reduce overcorrection and improve both the accuracy and quality of ASR results in the challenging OOD settings.

Submitted 16 October, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

Comments: Accepted to EMNLP 2024 Industry Track

arXiv:2407.13171 [pdf, other]

Maximin Fair Allocation of Indivisible Items under Cost Utilities

Authors: Sirin Botan, Angus Ritossa, Mashbat Suzuki, Toby Walsh

Abstract: We study the problem of fairly allocating indivisible goods among a set of agents. Our focus is on the existence of allocations that give each agent their maximin fair share--the value they are guaranteed if they divide the goods into as many bundles as there are agents, and receive their lowest valued bundle. An MMS allocation is one where every agent receives at least their maximin fair share. We examine the existence of such allocations when agents have cost utilities. In this setting, each item has an associated cost, and an agent's valuation for an item is the cost of the item if it is useful to them, and zero otherwise. Our main results indicate that cost utilities are a promising restriction for achieving MMS. We show that for the case of three agents with cost utilities, an MMS allocation always exists. We also show that when preferences are restricted slightly further--to what we call laminar set approvals--we can guarantee MMS allocations for any number of agents. Finally, we explore if it is possible to guarantee each agent their maximin fair share while using a strategyproof mechanism.

Submitted 18 July, 2024; originally announced July 2024.

Comments: Appeared in SAGT 2023
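
Illustrative note: the maximin fair share defined in the abstract above can be made concrete with a small brute-force sketch. This is not the authors' method; it simply enumerates every way of splitting the items into as many bundles as there are agents and keeps the best achievable worst bundle. The function names `maximin_share` and `is_mms_allocation` are hypothetical, valuations are assumed additive, and the enumeration is exponential in the number of items, so it is only suitable for toy instances.

```python
from itertools import product


def maximin_share(values, n_agents):
    """Brute-force maximin share of one agent with additive item values.

    values[g] is the agent's value for item g. Every assignment of items
    to n_agents bundles is tried; the result is the largest value that the
    least valuable bundle can have.
    """
    best = float("-inf")
    for assignment in product(range(n_agents), repeat=len(values)):
        bundles = [0] * n_agents
        for item_value, bundle in zip(values, assignment):
            bundles[bundle] += item_value
        best = max(best, min(bundles))
    return best


def is_mms_allocation(valuations, allocation):
    """Check that every agent receives at least their maximin share.

    valuations[i][g] is agent i's value for item g;
    allocation[i] is the list of item indices given to agent i.
    """
    n = len(valuations)
    return all(
        sum(valuations[i][g] for g in allocation[i])
        >= maximin_share(valuations[i], n)
        for i in range(n)
    )


# Toy cost-utility instance (assumed for illustration): item costs are
# 3, 2, 2, 1; agent 0 finds every item useful, agent 1 only items 0 and 1.
valuations = [[3, 2, 2, 1], [3, 2, 0, 0]]
print(maximin_share(valuations[0], 2))
print(is_mms_allocation(valuations, [[0, 3], [1, 2]]))
```

In the toy instance an agent's value for an item is either its cost or zero, matching the cost-utility setting described in the abstract.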

arXiv:2407.12461 [pdf, other]

Compatibility of Fairness and Nash Welfare under Subadditive Valuations

Authors: Siddharth Barman, Mashbat Suzuki

Abstract: We establish a compatibility between fairness and efficiency, captured via Nash Social Welfare (NSW), under the broad class of subadditive valuations. We prove that, for subadditive valuations, there always exists a partial allocation that is envy-free up to the removal of any good (EFx) and has NSW at least half of the optimal; here, optimality is considered across all allocations, fair or otherwise. We also prove, for subadditive valuations, the universal existence of complete allocations that are envy-free up to one good (EF1) and also achieve a factor $1/2$ approximation to the optimal NSW. Our EF1 result resolves an open question posed by Garg, Husic, Li, Végh, and Vondrák (STOC 2023). In addition, we develop a polynomial-time algorithm which, given an arbitrary allocation $\widetilde{A}$ as input, returns an EF1 allocation with NSW at least $\frac{1}{e^{2/e}}\approx \frac{1}{2.08}$ times that of $\widetilde{A}$. Therefore, our results imply that the EF1 criterion can be attained simultaneously with a constant-factor approximation to optimal NSW in polynomial time (with demand queries), for subadditive valuations. The previously best-known approximation factor for optimal NSW, under EF1 and among $n$ agents, was $O(n)$ -- we improve this bound to $O(1)$. It is known that EF1 and exact Pareto efficiency (PO) are incompatible with subadditive valuations. Complementary to this negative result, the current work shows that we regain compatibility by just considering a factor $1/2$ approximation: EF1 can be achieved in conjunction with $\frac{1}{2}$-PO under subadditive valuations. As such, our results serve as a general tool that can be used as a black box to convert any efficient outcome into a fair one, with only a marginal decrease in efficiency.

Submitted 4 March, 2025; v1 submitted 17 July, 2024; originally announced July 2024.

Comments: 23 pages

arXiv:2407.05240 [pdf, other]

Neighborhood Stability in Assignments on Graphs

Authors: Haris Aziz, Grzegorz Lisowski, Mashbat Suzuki, Jeremy Vollen

Abstract: We study the problem of assigning agents to the vertices of a graph such that no pair of neighbors can benefit from swapping assignments -- a property we term neighborhood stability. We further assume that agents' utilities are based solely on their preferences over the assignees of adjacent vertices and that those preferences are binary. Having shown that even this very restricted setting does not guarantee neighborhood stable assignments, we focus on special cases that provide such guarantees. We show that when the graph is a cycle or a path, a neighborhood stable assignment always exists for any preference profile. Furthermore, we give a general condition under which neighborhood stable assignments always exist. For each of these results, we give a polynomial-time algorithm to compute a neighborhood stable assignment.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2406.14907 [pdf, other]

Maximum Flow is Fair: A Network Flow Approach to Committee Voting

Authors: Mashbat Suzuki, Jeremy Vollen

Abstract: In the committee voting setting, a subset of $k$ alternatives is selected based on the preferences of voters. In this paper, our goal is to efficiently compute $\textit{ex-ante}$ fair probability distributions over committees. We introduce a new axiom called $\textit{group resource proportionality}$, which strengthens other fairness notions in the literature. We characterize our fairness axiom by a correspondence with max flows on a network formulation of committee voting. Using the connection to flow networks revealed by this characterization, we introduce two voting rules which achieve fairness in conjunction with other desiderata. The first rule - the $\textit{redistributive utilitarian rule}$ - satisfies ex-ante efficiency in addition to our fairness axiom. The second rule - Generalized CUT - reduces instances of our problem to instances of the minimum-cost maximum flow problem. We show that Generalized CUT maximizes social welfare subject to our fairness axiom and additionally satisfies an incentive compatibility property known as $\textit{excludable strategyproofness}$. Lastly, we show our fairness property can be obtained in tandem with strong $\textit{ex-post}$ fairness properties - an approach known as $\textit{best-of-both-worlds}$ fairness. We strengthen existing best-of-both-worlds fairness results in committee voting and resolve an open question posed by Aziz et al. [2023].

Submitted 27 December, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

Comments: Previous version appeared in EC 2024. This version features significant additional content; notably, treatment of excludable strategyproofness and modified motivation surrounding fractional core. The latter change was made as we realized that the definition of fractional core was misstated in our previous version. In the current manuscript, our old definition has become strict fractional core

arXiv:2406.00765 [pdf]

The Embodied World Model Based on LLM with Visual Information and Prediction-Oriented Prompts

Authors: Wakana Haijima, Kou Nakakubo, Masahiro Suzuki, Yutaka Matsuo

Abstract: In recent years, as machine learning, particularly for vision and language understanding, has improved, research in embodied AI has also evolved. VOYAGER is a well-known LLM-based embodied AI that enables autonomous exploration in the Minecraft world, but it has issues such as underutilization of visual data and insufficient functionality as a world model. In this research, the possibility of utilizing visual data and the function of LLM as a world model were investigated with the aim of improving the performance of embodied AI. The experimental results revealed that LLM can extract necessary information from visual data, and the utilization of the information improves its performance as a world model. It was also suggested that devised prompts could bring out the LLM's function as a world model.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.01689 [pdf, other]

doi 10.1016/j.mtcomm.2024.110557

Investigation on optimal microstructure of dual-phase steel with high strength and ductility by machine learning

Authors: Misato Suzuki, Kazuyuki Shizawa, Mayu Muramatsu

Abstract: In this study, we developed an inverse analysis framework that proposes a microstructure for dual-phase (DP) steel that exhibits high strength and ductility. The inverse analysis method proposed in this study involves repeated random searches on a model that combines a generative adversarial network (GAN), which generates microstructures, and a convolutional neural network (CNN), which predicts the maximum stress and working limit strain from DP steel microstructures. The GAN was trained using images of DP steel microstructures generated by the phase-field method. The CNN was trained using images of DP steel microstructures, the maximum stress and the working limit strain calculated by the dislocation-crystal plasticity finite element method. The constructed framework made an efficient search for microstructures possible because the latent variable of the GAN provides a low-dimensional search space. Multiple deformation modes were considered in this framework, which allowed the required microstructures to be explored under complex deformation modes. A microstructure with a fine grain size was proposed by using the developed framework.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 27 pages, 23 figures

Journal ref: Mater. Today Commun., Volume 41, 110557, 2024

arXiv:2404.09260 [pdf, other]

JaFIn: Japanese Financial Instruction Dataset

Authors: Kota Tanabe, Masahiro Suzuki, Hiroki Sakaji, Itsuki Noda

Abstract: We construct an instruction dataset for the large language model (LLM) in the Japanese finance domain. Domain adaptation of language models, including LLMs, is receiving more attention as language models become more popular. This study demonstrates the effectiveness of domain adaptation through instruction tuning. To achieve this, we propose a Japanese instruction tuning dataset called JaFIn, the Japanese Financial Instruction Dataset. JaFIn is manually constructed based on multiple data sources, including Japanese government websites, which provide extensive financial knowledge. We then utilize JaFIn to apply instruction tuning for several LLMs, demonstrating that our models specialized in finance have better domain adaptability than the original models. The financial-specialized LLMs created were evaluated using a quantitative Japanese financial benchmark and qualitative response comparisons, showing improved performance over the originals.

Submitted 19 July, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: 10 pages, 1 figure. The paper is a camera-ready version for the 2024 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr)

arXiv:2404.05198 [pdf, ps, other]

Fair Lotteries for Participatory Budgeting

Authors: Haris Aziz, Xinhang Lu, Mashbat Suzuki, Jeremy Vollen, Toby Walsh

Abstract: In pursuit of participatory budgeting (PB) outcomes with broader fairness guarantees, we initiate the study of lotteries over discrete PB outcomes. As the projects have heterogeneous costs, the amount spent may not be equal ex ante and ex post. To address this, we develop a technique to bound the amount by which the ex-post spend differs from the ex-ante spend -- the property is termed budget balanced up to one project (BB1). With respect to fairness, we take a best-of-both-worlds perspective, seeking outcomes that are both ex-ante and ex-post fair. Towards this goal, we initiate a study of ex-ante fairness properties in PB, including Individual Fair Share (IFS), Unanimous Fair Share (UFS) and their stronger variants, as well as Group Fair Share (GFS). We show several incompatibility results between these ex-ante fairness notions and existing ex-post concepts based on justified representation. One of our main contributions is a randomized algorithm which simultaneously satisfies ex-ante Strong UFS, ex-post full justified representation (FJR) and ex-post BB1 for PB with binary utilities.

Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: Appears in the 38th AAAI Conference on Artificial Intelligence (AAAI), 2024

arXiv:2403.07711 [pdf, other]

SSM Meets Video Diffusion Models: Efficient Long-Term Video Generation with Structured State Spaces

Authors: Yuta Oshima, Shohei Taniguchi, Masahiro Suzuki, Yutaka Matsuo

Abstract: Given the remarkable achievements in image generation through diffusion models, the research community has shown increasing interest in extending these models to video generation. Recent diffusion models for video generation have predominantly utilized attention layers to extract temporal features. However, attention layers are limited by their computational costs, which increase quadratically with the sequence length. This limitation presents significant challenges when generating longer video sequences using diffusion models. To overcome this challenge, we propose leveraging state-space models (SSMs) as temporal feature extractors. SSMs (e.g., Mamba) have recently gained attention as promising alternatives due to their linear-time memory consumption relative to sequence length. In line with previous research suggesting that using bidirectional SSMs is effective for understanding spatial features in image generation, we found that bidirectionality is also beneficial for capturing temporal features in video data, rather than relying on traditional unidirectional SSMs. We conducted comprehensive evaluations on multiple long-term video datasets, such as MineRL Navigate, across various model sizes. For sequences up to 256 frames, SSM-based models require less memory to achieve the same FVD as attention-based models. Moreover, SSM-based models often deliver better performance with comparable GPU memory usage. Our code is available at https://github.com/shim0114/SSM-Meets-Video-Diffusion-Models.

Submitted 3 September, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted as a workshop paper at ICLR 2024

arXiv:2402.14484 [pdf, other]

Is ChatGPT the Future of Causal Text Mining? A Comprehensive Evaluation and Analysis

Authors: Takehiro Takayanagi, Masahiro Suzuki, Ryotaro Kobayashi, Hiroki Sakaji, Kiyoshi Izumi

Abstract: Causality is fundamental in human cognition and has drawn attention in diverse research fields. With growing volumes of textual data, discerning causalities within text data is crucial, and causal text mining plays a pivotal role in extracting meaningful patterns. This study conducts comprehensive evaluations of ChatGPT's causal text mining capabilities. Firstly, we introduce a benchmark that extends beyond general English datasets, including domain-specific and non-English datasets. We also provide an evaluation framework to ensure fair comparisons between ChatGPT and previous approaches. Finally, our analysis outlines the limitations and future challenges in employing ChatGPT for causal text mining. Specifically, our analysis reveals that ChatGPT serves as a good starting point for various datasets. However, when equipped with a sufficient amount of training data, previous models still surpass ChatGPT's performance. Additionally, ChatGPT suffers from the tendency to falsely recognize non-causal sequences as causal sequences. These issues become even more pronounced with advanced versions of the model, such as GPT-4. In addition, we highlight the constraints of ChatGPT in handling complex causality types, including both intra/inter-sentential and implicit causality. The model also faces challenges with effectively leveraging in-context learning and domain adaptation. We release our code to support further research and development in this field.

Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2312.11286 [pdf, ps, other]

Envy-free House Allocation under Uncertain Preferences

Authors: Haris Aziz, Isaiah Iliffe, Bo Li, Angus Ritossa, Ankang Sun, Mashbat Suzuki

Abstract: We study the envy-free house allocation problem when agents have uncertain preferences over items and consider several well-studied preference uncertainty models. The central problem that we focus on is computing an allocation that has the highest probability of being envy-free. We show that each model leads to a distinct set of algorithmic and complexity results, including detailed results on (in-)approximability. En route, we consider two related problems of checking whether there exists an allocation that is possibly or necessarily envy-free. We give a complete picture of the computational complexity of these two problems for all the uncertainty models we consider.

Submitted 18 December, 2023; originally announced December 2023.

Comments: To appear in the proceedings of AAAI 2024

arXiv:2310.12900 [pdf]

Personalized human mobility prediction for HuMob challenge

Authors: Masahiro Suzuki, Shomu Furuta, Yusuke Fukazawa

Abstract: We explain the methodology used to create the data submitted to HuMob Challenge, a data analysis competition for human mobility prediction. We adopted a personalized model to predict the individual's movement trajectory from their data, instead of predicting from the overall movement, based on the hypothesis that human movement is unique to each person. We devised features such as the date and time, activity time, days of the week, time of day, and frequency of visits to POI (Point of Interest). As additional features, we incorporated the movement of other individuals with similar behavior patterns through the employment of clustering. The machine learning model we adopted was Support Vector Regression (SVR). We assessed accuracy through offline evaluation and carried out feature selection and parameter tuning. Although the overall dataset provided consists of the trajectories of 100,000 users, our method uses only the data of the 20,000 target users and does not need the data of the other 80,000. Despite the personalized model's traditional feature engineering approach, this model yields reasonably good accuracy with lower computational cost.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.10083 [pdf, other]

JMedLoRA:Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning

Authors: Issey Sukeda, Masahiro Suzuki, Hiroki Sakaji, Satoshi Kodera

Abstract: In the ongoing wave of impact driven by large language models (LLMs) like ChatGPT, the adaptation of LLMs to the medical domain has emerged as a crucial research frontier. Since mainstream LLMs tend to be designed for general-purpose applications, constructing a medical LLM through domain adaptation is a huge challenge. While instruction-tuning is used to fine-tune some LLMs, its precise roles in domain adaptation remain unknown. Here we show the contribution of LoRA-based instruction-tuning to performance in Japanese medical question-answering tasks. In doing so, we employ a multifaceted evaluation for multiple-choice questions, including scoring based on "Exact match" and "Gestalt distance" in addition to the conventional accuracy. Our findings suggest that LoRA-based instruction-tuning can partially incorporate domain-specific knowledge into LLMs, with larger models demonstrating more pronounced effects. Furthermore, our results underscore the potential of adapting English-centric models for Japanese applications in domain adaptation, while also highlighting the persisting limitations of Japanese-centric models. This initiative represents a pioneering effort in enabling medical institutions to fine-tune and operate models without relying on external services.

Submitted 30 November, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 8 pages, 1 figure

arXiv:2309.04031 [pdf, other]

Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

Authors: Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Masayasu Muraoka, George Saon

Abstract: Transferring the knowledge of large language models (LLMs) is a promising technique to incorporate linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing works only transfer a single representation of LLM (e.g. the last layer of pretrained BERT), while the representation of a text is inherently non-unique and can be obtained variously from different layers, contexts and models. In this work, we explore a wide range of techniques to obtain and transfer multiple representations of LLMs into a transducer-based ASR system. While being conceptually simple, we show that transferring multiple representations of LLMs can be an effective alternative to transferring only a single representation.

Submitted 25 December, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: Accepted to ICASSP 2024

arXiv:2309.03412 [pdf, other]

From Base to Conversational: Japanese Instruction Dataset and Tuning Large Language Models

Authors: Masahiro Suzuki, Masanori Hirano, Hiroki Sakaji

Abstract: Instruction tuning is essential for large language models (LLMs) to become interactive. While many instruction tuning datasets exist in English, there is a noticeable lack in other languages. Also, their effectiveness has not been well verified in non-English languages. We construct a Japanese instruction dataset by expanding and filtering existing datasets and apply the dataset to a Japanese pre-trained base model. We performed Low-Rank Adaptation (LoRA) tuning on both Japanese and English existing models using our instruction dataset. We evaluated these models from both quantitative and qualitative perspectives. As a result, the effectiveness of Japanese instruction datasets is confirmed. The results also indicate that even with relatively small LLMs, performances in downstream tasks would be improved through instruction tuning. Our instruction dataset, tuned models, and implementation are publicly available online.

Submitted 5 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: 10 pages, 1 figure, 2 tables. The paper is a camera-ready version of IEEE BigData 2023

arXiv:2306.09564 [pdf, ps, other]

doi 10.1613/jair.1.15800

Mixed Fair Division: A Survey

Authors: Shengxin Liu, Xinhang Lu, Mashbat Suzuki, Toby Walsh

Abstract: Fair division considers the allocation of scarce resources among agents in such a way that every agent gets a fair share. It is a fundamental problem in society and has received significant attention and rapid developments from the game theory and artificial intelligence communities in recent years. The majority of the fair division literature can be divided along at least two orthogonal directions: goods versus chores, and divisible versus indivisible resources. In this survey, besides describing the state of the art, we outline a number of interesting open questions and future directions in three mixed fair division settings: (i) indivisible goods and chores, (ii) divisible and indivisible goods (mixed goods), and (iii) indivisible goods with subsidy which can be viewed like a divisible good.

Submitted 12 August, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Appears in the 38th AAAI Conference on Artificial Intelligence (AAAI), Senior Member Presentation Track, 2024

Journal ref: Journal of Artificial Intelligence Research (JAIR), 80:1373-1406, 2024

arXiv:2305.19684 [pdf, other]

End-to-end Training of Deep Boltzmann Machines by Unbiased Contrastive Divergence with Local Mode Initialization

Authors: Shohei Taniguchi, Masahiro Suzuki, Yusuke Iwasawa, Yutaka Matsuo

Abstract: We address the problem of biased gradient estimation in deep Boltzmann machines (DBMs). The existing method to obtain an unbiased estimator uses a maximal coupling based on a Gibbs sampler, but when the state is high-dimensional, it takes a long time to converge. In this study, we propose to use a coupling based on the Metropolis-Hastings (MH) and to initialize the state around a local mode of the target distribution. Because of the propensity of MH to reject proposals, the coupling tends to converge in only one step with a high probability, leading to high efficiency. We find that our method allows DBMs to be trained in an end-to-end fashion without greedy pretraining. We also propose some practical techniques to further improve the performance of DBMs. We empirically demonstrate that our training algorithm enables DBMs to show comparable generative performance to other deep generative models, achieving the FID score of 10.33 for MNIST.

Submitted 31 May, 2023; originally announced May 2023.

Comments: Accepted at ICML 2023

arXiv:2305.12720 [pdf, ps, other]

llm-japanese-dataset v0: Construction of Japanese Chat Dataset for Large Language Models and its Methodology

Authors: Masanori Hirano, Masahiro Suzuki, Hiroki Sakaji

Abstract: This study constructed a Japanese chat dataset for tuning large language models (LLMs), which consists of about 8.4 million records. Recently, LLMs have been developed and are gaining popularity. However, high-performing LLMs are usually mainly for English. There are two ways to support languages other than English by those LLMs: constructing LLMs from scratch or tuning existing models. However, in both ways, datasets are necessary parts. In this study, we focused on supporting Japanese in those LLMs and making a dataset for training or tuning LLMs in Japanese. The dataset we constructed consisted of various tasks, such as translation and knowledge tasks. In our experiment, we tuned an existing LLM using our dataset and evaluated the performance qualitatively. The results suggest that our dataset is possibly beneficial for LLMs. However, we also revealed some difficulties in constructing LLMs in languages other than English.

Submitted 22 May, 2023; originally announced May 2023.

Comments: 12 pages

arXiv:2303.03642 [pdf, ps, other]

Best-of-Both-Worlds Fairness in Committee Voting

Authors: Haris Aziz, Xinhang Lu, Mashbat Suzuki, Jeremy Vollen, Toby Walsh

Abstract: The best-of-both-worlds paradigm advocates an approach that achieves desirable properties both ex-ante and ex-post. We launch a best-of-both-worlds fairness perspective for the important social choice setting of approval-based committee voting. To this end, we initiate work on ex-ante proportional representation properties in this domain and formalize a hierarchy of notions including Individual Fair Share (IFS), Unanimous Fair Share (UFS), Group Fair Share (GFS), and their stronger variants. We establish their compatibility with well-studied ex-post concepts such as extended justified representation (EJR) and fully justified representation (FJR). Our first main result is a polynomial-time algorithm that simultaneously satisfies ex-post EJR, ex-ante GFS and ex-ante Strong UFS. Subsequently, we strengthen our ex-post guarantee to FJR and present an algorithm that outputs a lottery which is ex-post FJR and ex-ante Strong UFS, but does not run in polynomial time.

Submitted 25 December, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: Appears in the 19th Conference on Web and Internet Economics (WINE), 2023

arXiv:2301.05832 [pdf, other]

World Models and Predictive Coding for Cognitive and Developmental Robotics: Frontiers and Challenges

Authors: Tadahiro Taniguchi, Shingo Murata, Masahiro Suzuki, Dimitri Ognibene, Pablo Lanillos, Emre Ugur, Lorenzo Jamone, Tomoaki Nakamura, Alejandra Ciria, Bruno Lara, Giovanni Pezzulo

Abstract: Creating autonomous robots that can actively explore the environment, acquire knowledge and learn skills continuously is the ultimate achievement envisioned in cognitive and developmental robotics. Their learning processes should be based on interactions with their physical and social world in the manner of human learning and cognitive development. Based on this context, in this paper, we focus on the two concepts of world models and predictive coding. Recently, world models have attracted renewed attention as a topic of considerable interest in artificial intelligence. Cognitive systems learn world models to better predict future sensory observations and optimize their policies, i.e., controllers. Alternatively, in neuroscience, predictive coding proposes that the brain continuously predicts its inputs and adapts to model its own dynamics and control behavior in its environment. Both ideas may be considered as underpinning the cognitive development of robots and humans capable of continual or lifelong learning. Although many studies have been conducted on predictive coding in cognitive robotics and neurorobotics, the relationship between world model-based approaches in AI and predictive coding in robotics has rarely been discussed. Therefore, in this paper, we clarify the definitions, relationships, and status of current research on these topics, as well as missing pieces of world models and predictive coding in conjunction with crucially related concepts such as the free-energy principle and active inference in the context of cognitive and developmental robotics. Furthermore, we outline the frontiers and challenges involved in world models and predictive coding toward the further integration of AI and robotics, as well as the creation of robots with real cognitive and developmental capabilities in the future.

Submitted 14 January, 2023; originally announced January 2023.

Comments: 28 pages, 3 figures

arXiv:2211.00879 [pdf, ps, other]

Fair Allocation of Two Types of Chores

Authors: Haris Aziz, Jeremy Lindsay, Angus Ritossa, Mashbat Suzuki

Abstract: We consider the problem of fair allocation of indivisible chores under additive valuations. We assume that the chores are divided into two types and under this scenario, we present several results. Our first result is a new characterization of Pareto optimal allocations in our setting, and a polynomial-time algorithm to compute an envy-free up to one item (EF1) and Pareto optimal allocation. We then turn to the question of whether we can achieve a stronger fairness concept called envy-free up to any item (EFX). We present a polynomial-time algorithm that returns an EFX allocation. Finally, we show that for our setting, it can be checked in polynomial time whether an envy-free allocation exists or not.

Submitted 24 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

arXiv:2210.08703 [pdf, ps, other]

Spoken Dialogue System Based on Attribute Vector for Travel Agent Robot

Authors: Motoyuki Suzuki, Shintaro Sodeya, Taichi Nakamura

Abstract: In this study, we develop a dialogue system for a dialogue robot competition. In the system, the characteristics of sightseeing spots are expressed as "attribute vectors" in advance, and the user is questioned on the different attributes of the two candidate spots. Consequently, the system can make recommendations based on user intentions. A dialogue experiment is conducted during a preliminary round of competition. The overall satisfaction score obtained is 40.1 out of 63 points, which is a reasonable result. Analysis of the relationship between the system behavior and satisfaction scores reveals that satisfaction increases when the system correctly understands the user intention and responds appropriately. However, a negative correlation is observed between the number of user utterances and the satisfaction score. This implies that inappropriate responses reduce the usefulness of the system as a consultation partner.

Submitted 16 October, 2022; originally announced October 2022.

Comments: This paper is part of the proceedings of the Dialogue Robot Competition 2022

arXiv:2207.02127 [pdf, ps, other]

doi 10.1080/01691864.2022.2035253

A survey of multimodal deep generative models

Authors: Masahiro Suzuki, Yutaka Matsuo

Abstract: Multimodal learning is a framework for building models that make predictions based on different types of modalities. Important challenges in multimodal learning are the inference of shared representations from arbitrary modalities and cross-modal generation via these representations; however, achieving this requires taking the heterogeneous nature of multimodal data into account. In recent years, deep generative models, i.e., generative models in which distributions are parameterized by deep neural networks, have attracted much attention, especially variational autoencoders, which are suitable for accomplishing the above challenges because they can consider heterogeneity and infer good representations of data. Therefore, various multimodal generative models based on variational autoencoders, called multimodal deep generative models, have been proposed in recent years. In this paper, we provide a categorized survey of studies on multimodal deep generative models.

Submitted 5 July, 2022; originally announced July 2022.

Comments: Published in Advanced Robotics

Journal ref: Advanced Robotics, 36:5-6, 261-278, 2022

arXiv:2206.05966 [pdf, other]

Coordinating Monetary Contributions in Participatory Budgeting

Authors: Haris Aziz, Sujit Gujar, Manisha Padala, Mashbat Suzuki, Jeremy Vollen

Abstract: We formalize a framework for coordinating funding and selecting projects, the costs of which are shared among agents with quasi-linear utility functions and individual budgets. Our model contains the classical discrete participatory budgeting model as a special case, while capturing other useful scenarios. We propose several important axioms and objectives and study how well they can be simultaneously satisfied. We show that whereas welfare maximization admits an FPTAS, welfare maximization subject to a natural and very weak participation requirement leads to a strong inapproximability. This result is bypassed if we consider some natural restricted valuations, namely laminar single-minded valuations and symmetric valuations. Our analysis for the former restriction leads to the discovery of a new class of tractable instances for the Set Union Knapsack problem, a classical problem in combinatorial optimization.

Submitted 22 February, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: In this version, we include results regarding single minded valuations. We have also corrected a bug in the proof of Lemma 1

arXiv:2205.14798 [pdf, other]

Random Rank: The One and Only Strategyproof and Proportionally Fair Randomized Facility Location Mechanism

Authors: Haris Aziz, Alexander Lam, Mashbat Suzuki, Toby Walsh

Abstract: Proportionality is an attractive fairness concept that has been applied to a range of problems including the facility location problem, a classic problem in social choice. In our work, we propose a concept called Strong Proportionality, which ensures that when there are two groups of agents at different locations, both groups incur the same total cost. We show that although Strong Proportionality is a well-motivated and basic axiom, there is no deterministic strategyproof mechanism satisfying the property. We then identify a randomized mechanism called Random Rank (which uniformly selects a number $k$ between $1$ and $n$ and locates the facility at the $k$'th highest agent location) which satisfies Strong Proportionality in expectation. Our main theorem characterizes Random Rank as the unique mechanism that achieves universal truthfulness, universal anonymity, and Strong Proportionality in expectation among all randomized mechanisms. Finally, we show via the AverageOrRandomRank mechanism that even stronger ex-post fairness guarantees can be achieved by weakening universal truthfulness to strategyproofness in expectation.

Submitted 14 June, 2022; v1 submitted 29 May, 2022; originally announced May 2022.

arXiv:2204.00212 [pdf, ps, other]

Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems

Authors: Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon

Abstract: Large-scale language models (LLMs) such as GPT-2, BERT and RoBERTa have been successfully applied to ASR N-best rescoring. However, whether or how they can benefit competitive, near state-of-the-art ASR systems remains unexplored. In this study, we incorporate LLM rescoring into one of the most competitive ASR baselines: the Conformer-Transducer model. We demonstrate that consistent improvement is achieved by the LLM's bidirectionality, pretraining, in-domain finetuning and context augmentation. Furthermore, our lexical analysis sheds light on how each of these components may be contributing to the ASR performance.

Submitted 18 August, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

Comments: Accepted to Interspeech 2022

arXiv:2203.15176 [pdf, other]

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing

Authors: Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata

Abstract: We introduce two techniques, length perturbation and n-best based label smoothing, to improve generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR). Length perturbation is a data augmentation algorithm that randomly drops and inserts frames of an utterance to alter the length of the speech feature sequence. N-best based label smoothing randomly injects noise to ground truth labels during training in order to avoid overfitting, where the noisy labels are generated from n-best hypotheses. We evaluate these two techniques extensively on the 300-hour Switchboard (SWB300) dataset and an in-house 500-hour Japanese (JPN500) dataset using recurrent neural network transducer (RNNT) acoustic models for ASR. We show that both techniques improve the generalization of RNNT models individually and they can also be complementary. In particular, they yield good improvements over a strong SWB300 baseline and give state-of-the-art performance on SWB300 using RNNT models.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Submitted to Interspeech 2022

Showing 1–50 of 66 results for author: Suzuki, M