-
FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression
Authors:
Jiayi Tian,
Ryan Solgi,
Jinming Lu,
Yifan Yang,
Hai Li,
Zheng Zhang
Abstract:
Large Language Models (LLMs) have enabled remarkable progress in natural language processing, yet their high computational and memory demands pose challenges for deployment in resource-constrained environments. Although recent low-rank decomposition methods offer a promising path for structural compression, they often suffer from accuracy degradation and expensive calibration procedures, and they result in inefficient model architectures that hinder real-world inference speedups. In this paper, we propose FLAT-LLM, a fast and accurate, training-free structural compression method based on fine-grained low-rank transformations in the activation space. Specifically, we reduce the hidden dimension by transforming the weights using truncated eigenvectors computed via head-wise Principal Component Analysis (PCA), and employ an importance-based metric to adaptively allocate ranks across decoders. FLAT-LLM achieves efficient and effective weight compression without recovery fine-tuning, and its calibration can be completed within a few minutes. Evaluated across 4 models and 11 datasets, FLAT-LLM outperforms structural pruning baselines in generalization and downstream performance, while delivering inference speedups over decomposition-based methods.
Submitted 29 May, 2025;
originally announced May 2025.
-
Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models
Authors:
Ryan Solgi,
Kai Zhen,
Rupak Vignesh Swaminathan,
Nathan Susanj,
Athanasios Mouchtaris,
Siegfried Kunzmann,
Zheng Zhang
Abstract:
The efficient implementation of large language models (LLMs) is crucial for deployment on resource-constrained devices. Low-rank tensor compression techniques, such as tensor-train (TT) networks, have been widely studied for over-parameterized neural networks. However, their application to compressing pre-trained LLMs for downstream tasks (post-training) remains challenging due to the high-rank nature of pre-trained LLMs and the lack of access to pretraining data. In this study, we investigate low-rank tensorized LLMs during fine-tuning and propose sparse augmented tensor networks (Saten) to enhance their performance. The proposed Saten framework enables full model compression. Experimental results demonstrate that Saten enhances both accuracy and compression efficiency in tensorized language models, achieving state-of-the-art performance.
Submitted 20 May, 2025;
originally announced May 2025.
-
Partial Tensorized Transformers for Natural Language Processing
Authors:
Subhadra Vadlamannati,
Ryan Solgi
Abstract:
The transformer architecture has revolutionized Natural Language Processing (NLP) and other machine-learning tasks, due to its unprecedented accuracy. However, its extensive memory and parameter requirements often hinder its practical application. In this work, we study the effect of tensor-train decomposition to improve the accuracy and compress transformer vision-language neural networks, namely BERT and ViT. We focus both on embedding-layer compression and partial tensorization of neural networks (PTNN) through an algorithmic approach. Our novel PTNN approach significantly improves the accuracy of existing models by up to 5%, all without the need for post-training adjustments, breaking new ground in the field of tensor decomposition.
Submitted 30 October, 2023;
originally announced October 2023.
-
Tensor Shape Search for Optimum Data Compression
Authors:
Ryan Solgi,
Zichang He,
William Jiahua Liang,
Zheng Zhang
Abstract:
Various tensor decomposition methods have been proposed for data compression. In real-world applications of tensor decomposition, selecting the tensor shape for the given data poses a challenge, and the shape of the tensor may affect the error and the compression ratio. In this work, we study the effect of the tensor shape on the tensor decomposition and propose an optimization model to find an optimum shape for the tensor train (TT) decomposition. The proposed optimization model maximizes the compression ratio of the TT decomposition given an error bound. We implement a genetic algorithm (GA) linked with the TT-SVD algorithm to solve the optimization model. We apply the proposed method to the compression of RGB images. The results demonstrate the effectiveness of the proposed evolutionary tensor shape search for the TT decomposition.
Submitted 21 May, 2022;
originally announced May 2022.
-
Nonparametric hierarchical Bayesian quantiles
Authors:
Luke Bornn,
Neil Shephard,
Reza Solgi
Abstract:
Here we develop a method for performing nonparametric Bayesian inference on quantiles. Relying on geometric measure theory and employing a Hausdorff base measure, we are able to specify meaningful priors for the quantile while treating the distribution of the data otherwise nonparametrically. We further extend the method to a hierarchical model for quantiles of subpopulations, linking subgroups together solely through their quantiles. Our approach is computationally straightforward, allowing for censored and noisy data. We demonstrate the proposed methodology on simulated data and an applied problem from sports statistics, where it is observed to stabilize and improve inference and prediction.
Submitted 11 May, 2016;
originally announced May 2016.
-
Moment conditions and Bayesian nonparametrics
Authors:
Luke Bornn,
Neil Shephard,
Reza Solgi
Abstract:
Models phrased through moment conditions are central to much of modern inference. Here these moment conditions are embedded within a nonparametric Bayesian setup. Handling such a model is not probabilistically straightforward as the posterior has support on a manifold. We solve the relevant issues, building new probability and computational tools using Hausdorff measures to analyze them on real and simulated data. These new methods, which involve simulating on a manifold, can be applied widely, including providing Bayesian analysis of quasi-likelihoods, linear and nonlinear regression, missing data and hierarchical models.
Submitted 13 January, 2016; v1 submitted 30 July, 2015;
originally announced July 2015.
-
Constrained Nonlinear Model Predictive Control of an MMA Polymerization Process via Evolutionary Optimization
Authors:
Masoud Abbaszadeh,
Reza Solgi
Abstract:
In this work, a nonlinear model predictive controller is developed for a batch polymerization process. The physical model of the process is parameterized along a desired trajectory resulting in a trajectory linearized piecewise model (a multiple linear model bank) and the parameters are identified for an experimental polymerization reactor. Then, a multiple model adaptive predictive controller is designed for thermal trajectory tracking of the MMA polymerization. The input control signal to the process is constrained by the maximum thermal power provided by the heaters. The constrained optimization in the model predictive controller is solved via genetic algorithms to minimize a DMC cost function in each sampling interval.
Submitted 14 February, 2015;
originally announced February 2015.
-
Zero Variance Markov Chain Monte Carlo for Bayesian Estimators
Authors:
Antonietta Mira,
Reza Solgi,
Daniele Imparato
Abstract:
Interest is in evaluating, by Markov chain Monte Carlo (MCMC) simulation, the expected value of a function with respect to a possibly unnormalized probability distribution. A general purpose variance reduction technique for the MCMC estimator, based on the zero-variance principle introduced in the physics literature, is proposed. Conditions for asymptotic unbiasedness of the zero-variance estimator are derived. A central limit theorem is also proved under regularity conditions. The potential of the idea is illustrated with real applications to probit, logit and GARCH Bayesian models. For all these models, a central limit theorem and unbiasedness for the zero-variance estimator are proved (see the supplementary material available on-line).
Submitted 26 June, 2012; v1 submitted 14 December, 2010;
originally announced December 2010.
-
Statistical analysis of the price index of Tehran Stock Exchange
Authors:
A. Rasoolizadeh,
R. Solgi
Abstract:
This paper presents a statistical analysis of the Tehran Price Index (TePIx) for the period from 1992 to 2004. The results show that the return distribution is asymmetric and skewed toward the right-hand side of the mean. The return distribution can also be fitted by a stable Lévy distribution, and its tails are much fatter than those of the Gaussian distribution. We estimate the tail index of the TePIx returns with two different methods, and the results are consistent with previous studies of stock markets. A strong autocorrelation has been detected in the TePIx time series, representing a long memory of several trading days. We have also applied a Zipf analysis to the TePIx data, which reveals strong correlations between the TePIx daily fluctuations. We hope that this paper gives a brief description of the statistical behavior of financial data in the Iranian stock market.
Submitted 12 October, 2004;
originally announced October 2004.