Search | arXiv e-print repository

Theoretical Analysis of KL-regularized RLHF with Multiple Reference Models

Authors: Gholamali Aminian, Amir R. Asadi, Idan Shenfeld, Youssef Mroueh

Abstract: Recent methods for aligning large language models (LLMs) with human feedback predominantly rely on a single reference model, which limits diversity, invites model overfitting, and underutilizes the wide range of available pre-trained models. Incorporating multiple reference models has the potential to address these limitations by broadening perspectives, reducing bias, and leveraging the strengths of diverse open-source LLMs. However, integrating multiple reference models into reinforcement learning with human feedback (RLHF) frameworks poses significant theoretical challenges, where achieving exact solutions has remained an open problem. This paper presents the first exact solution to the multiple reference model problem in reverse KL-regularized RLHF. We introduce a comprehensive theoretical framework that includes rigorous statistical analysis and provides sample complexity guarantees. Additionally, we extend our analysis to forward KL-regularized RLHF, offering new insights into sample complexity requirements in multiple reference scenarios. Our contributions lay the foundation for more advanced and adaptable LLM alignment techniques, enabling the effective use of multiple reference models. This work paves the way for developing alignment frameworks that are both theoretically sound and better suited to the challenges of modern AI ecosystems.

Submitted 4 June, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

Comments: Experiments are added in new version

arXiv:2411.17541 [pdf]

doi 10.1007/978-3-031-85663-1_1

Metaverse Innovation Canvas: A Tool for Extended Reality Product/Service Development

Authors: Amir Reza Asadi, Mohamad Saraee, Azadeh Mohammadi

Abstract: This study investigated the factors contributing to the failure of augmented reality (AR) and virtual reality (VR) startups in the emerging metaverse landscape. Through an in-depth analysis of 29 failed AR/VR startups from 2016 to 2022, key pitfalls were identified, such as a lack of scalability, poor usability, unclear value propositions, and the failure to address specific user problems. Grounded in these findings, we developed the Metaverse Innovation Canvas (MIC), a tailored business ideation framework for XR products and services. The canvas guides founders to define user problems, articulate unique XR value propositions, evaluate usability factors such as the motion-based interaction load, consider social/virtual economy opportunities, and plan for long-term scalability. Unlike generalized models, specialized blocks prompt the consideration of critical XR factors from the outset. The canvas was evaluated through expert testing with startup consultants on five failed venture cases. The results highlighted the tool's effectiveness in surfacing overlooked usability issues and technology constraints upfront, enhancing the viability of future metaverse startups.

Submitted 26 November, 2024; originally announced November 2024.

arXiv:2411.10408 [pdf]

Exploring the Future Metaverse: Research Models for User Experience, Business Readiness, and National Competitiveness

Authors: Amir Reza Asadi, Shiva Ghasemi

Abstract: This systematic literature review paper explores perspectives on the ideal metaverse from user experience, business, and national levels, considering both academic and industry viewpoints. The study examines the metaverse as a sociotechnical imaginary, enabled collectively by virtual reality (VR), augmented reality (AR), and mixed reality (MR) technologies. Through a systematic literature review, n=144 records were included, and by employing grounded theory for data analysis, we developed three research models, which can guide researchers in examining the metaverse as a sociotechnical future of information technology. Designers can apply the metaverse user experience maturity model to develop more user-friendly services, while business strategists can use the metaverse business readiness model to assess their firms' current state and prepare for transformation. Additionally, policymakers and policy analysts can utilize the metaverse national competitiveness model to track their countries' competitiveness during this paradigm shift. The synthesis of the results also led to the development of practical assessment tools derived from these models that can guide researchers.

Submitted 15 November, 2024; originally announced November 2024.

ACM Class: K.4.2; K.6.1; H.5.2

arXiv:2409.19431 [pdf, ps, other]

Generalization and Robustness of the Tilted Empirical Risk

Authors: Gholamali Aminian, Amir R. Asadi, Tian Li, Ahmad Beirami, Gesine Reinert, Samuel N. Cohen

Abstract: The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, Li et al. (2020) proposed the tilted empirical risk (TER) as a non-linear risk metric for machine learning applications such as classification and regression problems. In this work, we examine the generalization error of the tilted empirical risk in the robustness regime under negative tilt. Our first contribution is to provide uniform and information-theoretic bounds on the tilted generalization error, defined as the difference between the population risk and the tilted empirical risk, under negative tilt for unbounded loss functions with a bounded $(1+ε)$-th moment of the loss function for some $ε\in(0,1]$, with a convergence rate of $O(n^{-ε/(1+ε)})$ where $n$ is the number of training samples, revealing a novel application for TER under no distribution shift. Secondly, we study the robustness of the tilted empirical risk with respect to noisy outliers at training time and provide theoretical guarantees under distribution shift for the tilted empirical risk. We empirically corroborate our findings in simple experimental setups where we evaluate our bounds to select the value of tilt in a data-driven manner.

Submitted 7 June, 2025; v1 submitted 28 September, 2024; originally announced September 2024.

Comments: Accepted at ICML 2025
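
The tilted empirical risk mentioned in the abstract above is commonly written as (1/γ) · log((1/n) · Σ exp(γ · ℓ_i)) for tilt γ and per-sample losses ℓ_i. The sketch below is an illustration under that standard definition, not code from the paper; the function name `tilted_empirical_risk` and the sample losses are made up for the example. It shows how a negative tilt damps the influence of a single large outlier compared with the plain mean.

```python
import math

def tilted_empirical_risk(losses, tilt):
    """Tilted empirical risk: (1/tilt) * log(mean(exp(tilt * loss))).

    Assumed standard definition; tilt == 0 is treated as the plain mean,
    which is the limit of the expression as tilt -> 0.
    """
    if tilt == 0:
        return sum(losses) / len(losses)
    scaled = [tilt * loss for loss in losses]
    peak = max(scaled)  # subtract the max before exponentiating for stability
    log_mean_exp = peak + math.log(
        sum(math.exp(s - peak) for s in scaled) / len(scaled)
    )
    return log_mean_exp / tilt

# Hypothetical per-sample losses: four ordinary samples plus one noisy outlier.
clean_losses = [0.2, 0.3, 0.25, 0.35]
noisy_losses = clean_losses + [50.0]

plain_mean = sum(noisy_losses) / len(noisy_losses)
robust_risk = tilted_empirical_risk(noisy_losses, tilt=-1.0)
print(f"plain mean: {plain_mean:.3f}, tilted risk (tilt=-1): {robust_risk:.3f}")
```

With a negative tilt the exponential weights shrink as the loss grows, so the outlier contributes almost nothing to the sum, whereas it dominates the ordinary average.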

arXiv:2402.15974 [pdf]

Towards Mixed Reality as the Everyday Computing Paradigm: Challenges & Design Recommendations

Authors: Amir Reza Asadi, Reza Hemadi

Abstract: This research presents a proof-of-concept prototype of an all-in-one mixed reality application platform, developed to investigate users' needs and expectations of mixed reality systems. The study involved an extensive user study with 1,052 participants, including diaries collected from 6 users and interviews with 15 participants to gain deeper insights into their experiences. The findings from the interviews revealed that directly porting current user flows into 3D environments was not well-received by the target users. Instead, users expressed a clear preference for alternative 3D interactions along with the continued use of 2D interfaces. This study provides insights for understanding user preferences and interactions in mixed reality systems, and design recommendations to facilitate the mass adoption of MR systems.

Submitted 15 April, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2301.03566 [pdf, other]

Simple Binary Hypothesis Testing under Local Differential Privacy and Communication Constraints

Authors: Ankit Pensia, Amir R. Asadi, Varun Jog, Po-Ling Loh

Abstract: We study simple binary hypothesis testing under both local differential privacy (LDP) and communication constraints. We qualify our results as either minimax optimal or instance optimal: the former hold for the set of distribution pairs with prescribed Hellinger divergence and total variation distance, whereas the latter hold for specific distribution pairs. For the sample complexity of simple hypothesis testing under pure LDP constraints, we establish instance-optimal bounds for distributions with binary support; minimax-optimal bounds for general distributions; and (approximately) instance-optimal, computationally efficient algorithms for general distributions. When both privacy and communication constraints are present, we develop instance-optimal, computationally efficient algorithms that achieve the minimum possible sample complexity (up to universal constants). Our results on instance-optimal algorithms hinge on identifying the extreme points of the joint range set $\mathcal A$ of two distributions $p$ and $q$, defined as $\mathcal A := \{(\mathbf T p, \mathbf T q) | \mathbf T \in \mathcal C\}$, where $\mathcal C$ is the set of channels characterizing the constraints.

Submitted 15 December, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

Comments: 1 figure

arXiv:2212.14681 [pdf, other]

An Entropy-Based Model for Hierarchical Learning

Authors: Amir R. Asadi

Abstract: Machine learning is the dominant approach to artificial intelligence, through which computers learn from data and experience. In the framework of supervised learning, a necessity for a computer to learn from data accurately and efficiently is to be provided with auxiliary information about the data distribution and target function through the learning model. This notion of auxiliary information relates to the concept of regularization in statistical learning theory. A common feature among real-world datasets is that data domains are multiscale and target functions are well-behaved and smooth. This paper proposes an entropy-based learning model that exploits this data structure and discusses its statistical and computational benefits. The hierarchical learning model is inspired by human beings' logical and progressive easy-to-hard learning mechanism and has interpretable levels. The model apportions computational resources according to the complexity of data instances and target functions. This property can have multiple benefits, including higher inference speed and computational savings in training a model for many users or when training is interrupted. We provide a statistical analysis of the learning mechanism using multiscale entropies and show that it can yield significantly stronger guarantees than uniform convergence bounds.

Submitted 24 January, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

arXiv:2203.14253 [pdf]

doi 10.1109/DGRC.2018.8712047

Understanding Currencies in Video Games: A Review

Authors: Amir Reza Asadi, Reza Hemadi

Abstract: This paper presents a review of the status of currencies in video games. The business of video games is a multibillion-dollar industry, and its internal economy design is an important field to investigate. In this study, we have distinguished virtual currencies in terms of game mechanics and virtual currency schema. We have examined 11 games that have used virtual currencies in a significant way and have provided insight for game designers on the internal game economy by showing tangible examples of the game mechanics presented in our model.

Submitted 28 September, 2024; v1 submitted 27 March, 2022; originally announced March 2022.

Comments: Published in the 1st International Digital Games Research Conference: Trends, Technologies, and Applications (DGRC)

ACM Class: A.1; K.0

arXiv:2201.08163 [pdf]

doi 10.1109/IISEC54230.2021.9672433

Cognitive Ledger Project: Towards Building Personal Digital Twins Through Cognitive Blockchain

Authors: Amir Reza Asadi

Abstract: The Cognitive Ledger Project is an effort to develop a modular system for turning users' personal data into structured information and machine learning models based on a blockchain-based infrastructure. In this work-in-progress paper, we propose a cognitive architecture for cognitive digital twins. The suggested design embraces a cognitive blockchain (Cognitive ledger) at its core. The architecture includes several modules that turn users' activities in the digital environment into reusable knowledge objects and artificial intelligence that one day can work together to form the cognitive digital twin of users.

Submitted 15 June, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

ACM Class: I.2.11; I.2.0

arXiv:2006.14614 [pdf, other]

Maximum Multiscale Entropy and Neural Network Regularization

Authors: Amir R. Asadi, Emmanuel Abbe

Abstract: A well-known result across information theory, machine learning, and statistical physics shows that the maximum entropy distribution under a mean constraint has an exponential form called the Gibbs-Boltzmann distribution. This is used for instance in density estimation or to achieve excess risk bounds derived from single-scale entropy regularizers (Xu-Raginsky '17). This paper investigates a generalization of these results to a multiscale setting. We present different ways of generalizing the maximum entropy result by incorporating the notion of scale. For different entropies and arbitrary scale transformations, it is shown that the distribution maximizing a multiscale entropy is characterized by a procedure which has an analogy to the renormalization group procedure in statistical physics. For the case of decimation transformation, it is further shown that this distribution is Gaussian whenever the optimal single-scale distribution is Gaussian. This is then applied to neural networks, and it is shown that in a teacher-student scenario, the multiscale Gibbs posterior can achieve a smaller excess risk than the single-scale Gibbs posterior.

Submitted 25 June, 2020; originally announced June 2020.

Comments: 27 pages, 2 figures
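
The single-scale result the abstract above starts from can be checked numerically on a small finite support: among all distributions with a given mean, the one of the form p_i ∝ exp(-β · x_i) has the largest Shannon entropy. The sketch below is only an illustration of that classical statement, not of the paper's multiscale construction; the helper names (`gibbs`, `solve_beta`), the support values, and the comparison distribution are all assumptions made for the example.

```python
import math

def gibbs(values, beta):
    """Gibbs-Boltzmann distribution p_i proportional to exp(-beta * x_i)."""
    weights = [math.exp(-beta * v) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def mean(values, probs):
    return sum(v * p for v, p in zip(values, probs))

def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def solve_beta(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Bisection on beta; the Gibbs mean is decreasing in beta."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(values, gibbs(values, mid)) > target_mean:
            lo = mid  # mean still too high, so beta must be larger
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical support and mean constraint.
values = [0.0, 1.0, 2.0, 3.0]
target = 1.0

beta = solve_beta(values, target)
max_ent = gibbs(values, beta)

# Another distribution with the same mean, chosen by hand for comparison.
other = [0.5, 0.25, 0.0, 0.25]

print(f"beta = {beta:.4f}")
print(f"entropy of Gibbs distribution: {entropy(max_ent):.4f}")
print(f"entropy of hand-picked distribution: {entropy(other):.4f}")
```

Both distributions satisfy the same mean constraint, and the Gibbs form attains the higher entropy, which is the property the multiscale generalization builds on.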

arXiv:1906.11148 [pdf, other]

Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets

Authors: Amir R. Asadi, Emmanuel Abbe

Abstract: We derive generalization and excess risk bounds for neural nets using a family of complexity measures based on a multilevel relative entropy. The bounds are obtained by introducing the notion of generated hierarchical coverings of neural nets and by using the technique of chaining mutual information introduced in Asadi et al. NeurIPS'18. The resulting bounds are algorithm-dependent and exploit the multilevel structure of neural nets. This, in turn, leads to an empirical risk minimization problem with a multilevel entropic regularization. The minimization problem is resolved by introducing a multi-scale generalization of the celebrated Gibbs posterior distribution, proving that the derived distribution achieves the unique minimum. This leads to a new training procedure for neural nets with performance guarantees, which exploits the chain rule of relative entropy rather than the chain rule of derivatives (as in backpropagation). To obtain an efficient implementation of the latter, we further develop a multilevel Metropolis algorithm simulating the multi-scale Gibbs distribution, with an experiment for a two-layer neural net on the MNIST data set.

Submitted 26 June, 2019; originally announced June 2019.

Comments: 30 pages, 3 figures

arXiv:1806.03803 [pdf, other]

Chaining Mutual Information and Tightening Generalization Bounds

Authors: Amir R. Asadi, Emmanuel Abbe, Sergio Verdú

Abstract: Bounding the generalization error of learning algorithms has a long history, which yet falls short in explaining various generalization successes including those of deep learning. Two important difficulties are (i) exploiting the dependencies between the hypotheses, (ii) exploiting the dependence between the algorithm's input and output. Progress on the first point was made with the chaining method, originating from the work of Kolmogorov, and used in the VC-dimension bound. More recently, progress on the second point was made with the mutual information method by Russo and Zou '15. Yet, these two methods are currently disjoint. In this paper, we introduce a technique to combine the chaining and mutual information methods, to obtain a generalization bound that is both algorithm-dependent and that exploits the dependencies between the hypotheses. We provide an example in which our bound significantly outperforms both the chaining and the mutual information bounds. As a corollary, we tighten Dudley's inequality when the learning algorithm chooses its output from a small subset of hypotheses with high probability.

Submitted 1 July, 2019; v1 submitted 11 June, 2018; originally announced June 2018.

Comments: 20 pages, 1 figure; published at the NeurIPS 2018 conference

Showing 1–12 of 12 results for author: Asadi, A R