Search | arXiv e-print repository

Weight-Sharing Regularization

Authors: Mehran Shakerinava, Motahareh Sohrabi, Siamak Ravanbakhsh, Simon Lacoste-Julien

Abstract: Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose a "weight-sharing regularization" penalty on the weights $w \in \mathbb{R}^d$ of a neural network, defined as $\mathcal{R}(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$. We study the proximal mapping of $\mathcal{R}$ and provide an intuitive interpretation of it in terms of a physical system of interacting particles. We also parallelize existing algorithms for $\operatorname{prox}_\mathcal{R}$ (to run on GPU) and find that one of them is fast in practice but slow ($O(d)$) for worst-case inputs. Using the physical interpretation, we design a novel parallel algorithm which runs in $O(\log^3 d)$ when sufficient processors are available, thus guaranteeing fast training. Our experiments reveal that weight-sharing regularization enables fully connected networks to learn convolution-like filters even when pixels have been shuffled, while convolutional neural networks fail in this setting. Our code is available on GitHub.

Submitted 10 March, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: Our code is available at https://github.com/motahareh-sohrabi/weight-sharing-regularization
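
The penalty $\mathcal{R}(w)$ defined in the abstract above can be evaluated directly from its formula. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the sorted-order shortcut relies on the standard identity that, once the weights are sorted in ascending order, the entry at 0-based position $k$ contributes to the pairwise sum with coefficient $2k - (d - 1)$.

```python
from itertools import combinations


def weight_sharing_penalty(w):
    """R(w) = (1 / (d - 1)) * sum over pairs of |w_i - w_j|, in O(d log d).

    Hypothetical helper name; assumes w is a flat sequence of numbers.
    """
    d = len(w)
    if d < 2:
        return 0.0  # no pairs to compare (assumed convention for d < 2)
    s = sorted(w)
    # In ascending order, s[k] is the larger element in k pairs and the
    # smaller element in (d - 1 - k) pairs, so its net coefficient is
    # k - (d - 1 - k) = 2k - (d - 1).
    total = sum((2 * k - (d - 1)) * x for k, x in enumerate(s))
    return total / (d - 1)


def weight_sharing_penalty_naive(w):
    """Same quantity computed pair by pair in O(d^2), for cross-checking."""
    d = len(w)
    if d < 2:
        return 0.0
    return sum(abs(a - b) for a, b in combinations(w, 2)) / (d - 1)


if __name__ == "__main__":
    w = [1.0, 2.0, 4.0]
    # Pairwise differences are 1, 3 and 2; their sum 6 divided by d - 1 = 2.
    print(weight_sharing_penalty(w))        # 3.0
    print(weight_sharing_penalty_naive(w))  # 3.0
```

The penalty is zero exactly when all weights are equal, which is consistent with the abstract's description of it as encouraging shared weights. This sketch only evaluates $\mathcal{R}$; it does not compute the proximal mapping studied in the paper.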

arXiv:2306.04907 [pdf, other]

Estimation of Poverty Measures for Small Areas Under a Two-Fold Nested Error Linear Regression Model: Comparison of Two Methods

Authors: Maryam Sohrabi, J. N. K. Rao

Abstract: Demand for reliable statistics at a local area (small area) level has greatly increased in recent years. Traditional area-specific estimators based on probability samples are not adequate because of small sample size or even zero sample size in a local area. As a result, methods based on models linking the areas are widely used. The World Bank focused on estimating poverty measures, in particular poverty incidence and poverty gap, called FGT measures, using a simulated census method, called ELL, based on a one-fold nested error model for a suitable transformation of the welfare variable. Modified ELL methods leading to significant gains in efficiency over ELL have also been proposed under the one-fold model. An advantage of ELL and modified ELL methods is that distributional assumptions on the random effects in the model are not needed. In this paper, we extend ELL and modified ELL to two-fold nested error models to estimate poverty indicators for areas (say a state) and subareas (say counties within a state). Our simulation results indicate that the modified ELL estimators lead to large efficiency gains over ELL at the area level and subarea level. Further, the modified ELL method retaining both area and subarea estimated effects in the model (called MELL2) performs significantly better in terms of mean squared error (MSE) for sampled subareas than the modified ELL method retaining only the estimated area effect in the model (called MELL1).

Submitted 7 June, 2023; originally announced June 2023.

arXiv:1603.02665 [pdf, ps, other]

A Note on Bootstrapping M-estimates from Unstable AR(2) Process with Infinite Variance Innovations

Authors: Maryam Sohrabi, Mahmoud Zarepour

Abstract: The limiting distribution for M-estimates in a non-stationary autoregressive model with heavy-tailed error is computationally intractable. To make inferences based on the M-estimates, the bootstrap procedure can be used to approximate the sampling distribution. In this paper, we show that the bootstrap scheme with $m=o(n)$ resampling sample size when $m/n \to 0$ is approximately valid in a multiple unit roots time series with innovations in the domain of attraction of a stable law with index $0<\alpha\leq2$.

Submitted 8 March, 2016; originally announced March 2016.

Comments: 11 pages

arXiv:1510.01811 [pdf, other]

Bootstrapping the Mean Vector for the Observations in the Domain of Attraction of a Multivariate Stable Law

Authors: Maryam Sohrabi, Mahmoud Zarepour

Abstract: We consider a robust estimation of the mean vector for a sequence of i.i.d. observations in the domain of attraction of a stable law with different indices of stability, $DS(\alpha_1, \ldots, \alpha_p)$, such that $1<\alpha_{i}\leq 2$, $i=1,\ldots,p$. The suggested estimator is asymptotically Gaussian with unknown parameters. We apply an asymptotically valid bootstrap to construct a confidence region for the mean vector. A simulation study is performed to show that the estimation method is efficient for conducting inference about the mean vector for multivariate heavy-tailed distributions.

Submitted 12 December, 2016; v1 submitted 6 October, 2015; originally announced October 2015.

Comments: 13 pages, 3 figures

MSC Class: Primary 62H10; 62G20; Secondary 62G09; 62G35

arXiv:1506.05830 [pdf, other]

Asymptotic Theory for M-Estimates in Unstable AR(p) Processes with Infinite Variance Innovations

Authors: Maryam Sohrabi, Mahmoud Zarepour

Abstract: In this paper, we present the asymptotic distribution of M-estimators for parameters in non-stationary AR(p) processes. The innovations are assumed to be in the domain of attraction of a stable law with index $0<\alpha\le2$. In particular, when the model involves repeated unit roots or conjugate complex unit roots, M-estimators have a higher asymptotic rate of convergence compared to the least squares estimators, and the asymptotic results can be written as Itô stochastic integrals.

Submitted 12 December, 2016; v1 submitted 18 June, 2015; originally announced June 2015.

MSC Class: 62M10; 60G52; 62F40

Showing 1–5 of 5 results for author: Sohrabi, M