Search | arXiv e-print repository

FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression

Authors: Jiayi Tian, Ryan Solgi, Jinming Lu, Yifan Yang, Hai Li, Zheng Zhang

Abstract: Large Language Models (LLMs) have enabled remarkable progress in natural language processing, yet their high computational and memory demands pose challenges for deployment in resource-constrained environments. Although recent low-rank decomposition methods offer a promising path for structural compression, they often suffer from accuracy degradation, expensive calibration procedures, and result in inefficient model architectures that hinder real-world inference speedups. In this paper, we propose FLAT-LLM, a fast and accurate, training-free structural compression method based on fine-grained low-rank transformations in the activation space. Specifically, we reduce the hidden dimension by transforming the weights using truncated eigenvectors computed via head-wise Principal Component Analysis (PCA), and employ an importance-based metric to adaptively allocate ranks across decoders. FLAT-LLM achieves efficient and effective weight compression without recovery fine-tuning, which could complete the calibration within a few minutes. Evaluated across 4 models and 11 datasets, FLAT-LLM outperforms structural pruning baselines in generalization and downstream performance, while delivering inference speedups over decomposition-based methods.

Submitted 29 May, 2025; originally announced May 2025.

arXiv:2505.14871

Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models

Authors: Ryan Solgi, Kai Zhen, Rupak Vignesh Swaminathan, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann, Zheng Zhang

Abstract: The efficient implementation of large language models (LLMs) is crucial for deployment on resource-constrained devices. Low-rank tensor compression techniques, such as tensor-train (TT) networks, have been widely studied for over-parameterized neural networks. However, their applications to compress pre-trained large language models (LLMs) for downstream tasks (post-training) remains challenging due to the high-rank nature of pre-trained LLMs and the lack of access to pretraining data. In this study, we investigate low-rank tensorized LLMs during fine-tuning and propose sparse augmented tensor networks (Saten) to enhance their performance. The proposed Saten framework enables full model compression. Experimental results demonstrate that Saten enhances both accuracy and compression efficiency in tensorized language models, achieving state-of-the-art performance.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2310.20077

Partial Tensorized Transformers for Natural Language Processing

Authors: Subhadra Vadlamannati, Ryan Solgi

Abstract: The transformer architecture has revolutionized Natural Language Processing (NLP) and other machine-learning tasks, due to its unprecedented accuracy. However, their extensive memory and parameter requirements often hinder their practical applications. In this work, we study the effect of tensor-train decomposition to improve the accuracy and compress transformer vision-language neural networks, namely BERT and ViT. We focus both on embedding-layer compression and partial tensorization of neural networks (PTNN) through an algorithmic approach. Our novel PTNN approach significantly improves the accuracy of existing models by up to 5%, all without the need for post-training adjustments, breaking new ground in the field of tensor decomposition.

Submitted 30 October, 2023; originally announced October 2023.

Comments: In Review under the 16th International Conference on Agents and Artificial Intelligence

arXiv:2205.10651

Tensor Shape Search for Optimum Data Compression

Authors: Ryan Solgi, Zichang He, William Jiahua Liang, Zheng Zhang

Abstract: Various tensor decomposition methods have been proposed for data compression. In real world applications of the tensor decomposition, selecting the tensor shape for the given data poses a challenge and the shape of the tensor may affect the error and the compression ratio. In this work, we study the effect of the tensor shape on the tensor decomposition and propose an optimization model to find an optimum shape for the tensor train (TT) decomposition. The proposed optimization model maximizes the compression ratio of the TT decomposition given an error bound. We implement a genetic algorithm (GA) linked with the TT-SVD algorithm to solve the optimization model. We apply the proposed method for the compression of RGB images. The results demonstrate the effectiveness of the proposed evolutionary tensor shape search for the TT decomposition.

Submitted 21 May, 2022; originally announced May 2022.

arXiv:1605.03471

Nonparametric hierarchical Bayesian quantiles

Authors: Luke Bornn, Neil Shephard, Reza Solgi

Abstract: Here we develop a method for performing nonparametric Bayesian inference on quantiles. Relying on geometric measure theory and employing a Hausdorff base measure, we are able to specify meaningful priors for the quantile while treating the distribution of the data otherwise nonparametrically. We further extend the method to a hierarchical model for quantiles of subpopulations, linking subgroups together solely through their quantiles. Our approach is computationally straightforward, allowing for censored and noisy data. We demonstrate the proposed methodology on simulated data and an applied problem from sports statistics, where it is observed to stabilize and improve inference and prediction.

Submitted 11 May, 2016; originally announced May 2016.

arXiv:1507.08645

Moment conditions and Bayesian nonparametrics

Authors: Luke Bornn, Neil Shephard, Reza Solgi

Abstract: Models phrased though moment conditions are central to much of modern inference. Here these moment conditions are embedded within a nonparametric Bayesian setup. Handling such a model is not probabilistically straightforward as the posterior has support on a manifold. We solve the relevant issues, building new probability and computational tools using Hausdorff measures to analyze them on real and simulated data. These new methods which involve simulating on a manifold can be applied widely, including providing Bayesian analysis of quasi-likelihoods, linear and nonlinear regression, missing data and hierarchical models.

Submitted 13 January, 2016; v1 submitted 30 July, 2015; originally announced July 2015.

arXiv:1502.04266

Constrained Nonlinear Model Predictive Control of an MMA Polymerization Process via Evolutionary Optimization

Authors: Masoud Abbaszadeh, Reza Solgi

Abstract: In this work, a nonlinear model predictive controller is developed for a batch polymerization process. The physical model of the process is parameterized along a desired trajectory resulting in a trajectory linearized piecewise model (a multiple linear model bank) and the parameters are identified for an experimental polymerization reactor. Then, a multiple model adaptive predictive controller is designed for thermal trajectory tracking of the MMA polymerization. The input control signal to the process is constrained by the maximum thermal power provided by the heaters. The constrained optimization in the model predictive controller is solved via genetic algorithms to minimize a DMC cost function in each sampling interval.

Submitted 14 February, 2015; originally announced February 2015.

Comments: 12 pages, 9 figures, 28 references

arXiv:1012.2983

doi: 10.1007/s11222-012-9344-6

Zero Variance Markov Chain Monte Carlo for Bayesian Estimators

Authors: Antonietta Mira, Reza Solgi, Daniele Imparato

Abstract: Interest is in evaluating, by Markov chain Monte Carlo (MCMC) simulation, the expected value of a function with respect to a, possibly unnormalized, probability distribution. A general purpose variance reduction technique for the MCMC estimator, based on the zero-variance principle introduced in the physics literature, is proposed. Conditions for asymptotic unbiasedness of the zero-variance estimator are derived. A central limit theorem is also proved under regularity conditions. The potential of the idea is illustrated with real applications to probit, logit and GARCH Bayesian models. For all these models, a central limit theorem and unbiasedness for the zero-variance estimator are proved (see the supplementary material available on-line).

Submitted 26 June, 2012; v1 submitted 14 December, 2010; originally announced December 2010.

Comments: 26 pages, 4 figures. This is an updated version: the results are the same as the previous one, but presentation is more essential

MSC Class: 62

Journal ref: Statistics and Computing, 2012

arXiv:cond-mat/0410289

Statistical analysis of the price index of Tehran Stock Exchange

Authors: A. Rasoolizadeh, R. Solgi

Abstract: This paper presents a statistical analysis of Tehran Price Index (TePIx) for the period of 1992 to 2004. The results present asymmetric property of the return distribution which tends to the right hand of the mean. Also the return distribution can be fitted by a stable Levy distribution and the tails are very fatter than the gaussian distribution. We estimate the tail index of the TePIx returns with two different methods and the results are consistent with the previous studies on the stock markets. A strong autocorrelation has been detected in the TePIx time series representing a long memory of several trading days. We have also applied a Zipf analysis on the TePIx data presenting strong correlations between the TePIx daily fluctuations. We hope that this paper be able to give a brief description about the statistical behavior of financial data in Iran stock market.

Submitted 12 October, 2004; originally announced October 2004.

Showing 1–9 of 9 results for author: Solgi, R