-
Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model
Authors:
Divyanshu Aggarwal,
Sankarshan Damle,
Navin Goyal,
Satya Lokam,
Sunayana Sitaram
Abstract:
A common challenge towards the adaptability of Large Language Models (LLMs) is their ability to learn new languages over time without hampering the model's performance on languages in which the model is already proficient (usually English). Continual fine-tuning (CFT) is the process of sequentially fine-tuning an LLM to enable the model to adapt to downstream tasks with varying data distributions and time shifts. This paper focuses on the language adaptability of LLMs through CFT. We study a two-phase CFT process in which an English-only end-to-end fine-tuned LLM from Phase 1 (predominantly Task Ability) is sequentially fine-tuned on a multilingual dataset -- comprising task data in new languages -- in Phase 2 (predominantly Language Ability). We observe that the ``similarity'' of Phase 2 tasks with Phase 1 determines the LLM's adaptability. For similar phase-wise datasets, the LLM after Phase 2 does not show deterioration in task ability. In contrast, when the phase-wise datasets are not similar, the LLM's task ability deteriorates. We test our hypothesis on the open-source \mis\ and \llm\ models with multiple phase-wise dataset pairs. To address the deterioration, we analyze tailored variants of two CFT methods: layer freezing and generative replay. Our findings demonstrate their effectiveness in enhancing the language ability of LLMs while preserving task performance, in comparison to relevant baselines.
Submitted 21 October, 2024;
originally announced October 2024.
-
SLIP: Securing LLMs IP Using Weights Decomposition
Authors:
Yehonathan Refael,
Adam Hakim,
Lev Greenberg,
Tal Aviv,
Satya Lokam,
Ben Fishman,
Shachar Seidman
Abstract:
Large language models (LLMs) have recently seen widespread adoption, in both academia and industry. As these models grow, they become valuable intellectual property (IP), reflecting enormous investments by their owners. Moreover, the high cost of cloud-based deployment has driven interest towards deployment to edge devices, yet this risks exposing valuable parameters to theft and unauthorized use. Current methods to protect models' IP on the edge have limitations in terms of practicality, loss in accuracy, or suitability to requirements. In this paper, we introduce a novel hybrid inference algorithm, named SLIP, designed to protect edge-deployed models from theft. SLIP is the first hybrid protocol that is both practical for real-world applications and provably secure, while having zero accuracy degradation and minimal impact on latency. It involves partitioning the model between two computing resources, one secure but expensive, and another cost-effective but vulnerable. This is achieved through matrix decomposition, ensuring that the secure resource retains a maximally sensitive portion of the model's IP while performing a minimal amount of computations, and vice versa for the vulnerable resource. Importantly, the protocol includes security guarantees that prevent attackers from exploiting the partition to infer the secured information. Finally, we present experimental results that show the robustness and effectiveness of our method, positioning it as a compelling solution for protecting LLMs.
Submitted 1 August, 2024; v1 submitted 15 July, 2024;
originally announced July 2024.
-
TrustRate: A Decentralized Platform for Hijack-Resistant Anonymous Reviews
Authors:
Rohit Dwivedula,
Sriram Sridhar,
Sambhav Satija,
Muthian Sivathanu,
Nishanth Chandran,
Divya Gupta,
Satya Lokam
Abstract:
Reviews and ratings by users form a central component in several widely used products today (e.g., product reviews, ratings of online content, etc.), but today's platforms for managing such reviews are ad-hoc and vulnerable to various forms of tampering and hijack by fake reviews either by bots or motivated paid workers. We define a new metric called 'hijack-resistance' for such review platforms, and then present TrustRate, an end-to-end decentralized, hijack-resistant platform for authentic, anonymous, tamper-proof reviews. With a prototype implementation and evaluation at the scale of thousands of nodes, we demonstrate the efficacy and performance of our platform, towards a new paradigm for building products based on trusted reviews by end users without having to trust a single organization that manages the reviews.
Submitted 20 July, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
On the Query Complexity of Training Data Reconstruction in Private Learning
Authors:
Prateeti Mukherjee,
Satya Lokam
Abstract:
We analyze the number of queries that a whitebox adversary needs to make to a private learner in order to reconstruct its training data. For $(ε, δ)$ DP learners with training data drawn from any arbitrary compact metric space, we provide the \emph{first known lower bounds on the adversary's query complexity} as a function of the learner's privacy parameters. \emph{Our results are minimax optimal for every $ε\geq 0, δ\in [0, 1]$, covering both $ε$-DP and $(0, δ)$ DP as corollaries}. Beyond this, we obtain query complexity lower bounds for $(α, ε)$ Rényi DP learners that are valid for any $α> 1, ε\geq 0$. Finally, we analyze data reconstruction attacks on locally compact metric spaces via the framework of Metric DP, a generalization of DP that accounts for the underlying metric structure of the data. In this setting, we provide the first known analysis of data reconstruction in unbounded, high dimensional spaces and obtain query complexity lower bounds that are nearly tight modulo logarithmic factors.
Submitted 11 January, 2024; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Telechain: Bridging Telecom Policy and Blockchain Practice
Authors:
Sudheesh Singanamalla,
Apurv Mehra,
Nishanth Chandran,
Himanshi Lohchab,
Seshanuradha Chava,
Asit Kadayan,
Sunil Bajpai,
Kurtis Heimerl,
Richard Anderson,
Satya Lokam
Abstract:
The use of blockchain in regulatory ecosystems is a promising approach to address challenges of compliance among mutually untrusted entities. In this work, we consider applications of blockchain technologies in telecom regulations. In particular, we address growing concerns around Unsolicited Commercial Communication (UCC, a.k.a. spam) sent through text messages (SMS) and phone calls in India. Despite several regulatory measures taken to curb the menace of spam, it continues to be a nuisance to subscribers while posing challenges to telecom operators and regulators alike.
In this paper, we present a consortium blockchain based architecture to address the problem of UCC in India. Our solution improves subscriber experiences and the efficiency of regulatory processes while also positively impacting all stakeholders in the telecom ecosystem. Unlike previous approaches to the problem of UCC, which are all ex-post, our approach to adherence to the regulations is ex-ante. The proposal described in this paper is a primary contributor to the revision of regulations concerning UCC and spam by the Telecom Regulatory Authority of India (TRAI). The new regulations published in July 2018 were the first of their kind in the world and amended the 2010 Telecom Commercial Communication Customer Preference Regulation (TCCCPR) by mandating the use of blockchain/distributed ledgers in addressing the UCC problem. In this paper, we provide a holistic account of the project's evolution from (1) its design and strategy, to (2) regulatory and policy action, (3) country-wide implementation and deployment, and (4) evaluation and impact of the work.
Submitted 24 May, 2022;
originally announced May 2022.
-
Blockene: A High-throughput Blockchain Over Mobile Devices
Authors:
Sambhav Satija,
Apurv Mehra,
Sudheesh Singanamalla,
Karan Grover,
Muthian Sivathanu,
Nishanth Chandran,
Divya Gupta,
Satya Lokam
Abstract:
We introduce Blockene, a blockchain that reduces resource usage at member nodes by orders of magnitude, requiring only a smartphone to participate in block validation and consensus. Despite being lightweight, Blockene provides a high throughput of transactions and scales to a large number of participants. Blockene consumes negligible battery and data in smartphones, enabling millions of users to participate in the blockchain without incentives, to secure transactions with their collective honesty. Blockene achieves these properties with a novel split-trust design based on delegating storage and gossip to untrusted nodes.
We show, with a prototype implementation, that Blockene provides throughput of 1045 transactions/sec, and runs with very low resource usage on smartphones, pointing to a new paradigm for building secure, decentralized applications.
Submitted 14 October, 2020;
originally announced October 2020.
-
Fourier Entropy-Influence Conjecture for Random Linear Threshold Functions
Authors:
Sourav Chakraborty,
Sushrut Karmalkar,
Srijita Kundu,
Satyanarayana V. Lokam,
Nitin Saurabh
Abstract:
The Fourier-Entropy Influence (FEI) Conjecture states that for any Boolean function $f:\{+1,-1\}^n \to \{+1,-1\}$, the Fourier entropy of $f$ is at most its influence up to a universal constant factor. While the FEI conjecture has been proved for many classes of Boolean functions, it is still not known whether it holds for the class of Linear Threshold Functions. A natural question is: Does the FEI conjecture hold for a `random' linear threshold function? In this paper, we answer this question in the affirmative. We consider two natural distributions on the weights defining a linear threshold function, namely uniform distribution on $[-1,1]$ and Normal distribution.
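For reference, the conjecture quoted above is usually written in terms of the Fourier coefficients $\hat f(S)$ of $f$. The statement below is a sketch in standard notation; the symbols $H$, $\mathrm{Inf}$, and the constant $C$ are notation assumed here and do not appear in the abstract itself.

```latex
\[
  H(\hat f^{2}) \;=\; \sum_{S \subseteq [n]} \hat f(S)^{2} \log_2 \frac{1}{\hat f(S)^{2}},
  \qquad
  \mathrm{Inf}(f) \;=\; \sum_{S \subseteq [n]} |S| \, \hat f(S)^{2},
\]
\[
  \text{FEI conjecture:}\quad H(\hat f^{2}) \;\le\; C \cdot \mathrm{Inf}(f)
  \quad \text{for a universal constant } C > 0 .
\]
```

Since $\sum_{S} \hat f(S)^2 = 1$ for a $\pm 1$-valued $f$, the squared coefficients form a probability distribution, which is what makes the entropy on the left well defined.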
Submitted 27 March, 2019;
originally announced March 2019.
-
$ε$-MSR Codes: Contacting Fewer Code Blocks for Exact Repair
Authors:
Venkatesan Guruswami,
Satyanarayana V. Lokam,
Sai Vikneshwar Mani Jayaraman
Abstract:
$ε$-Minimum Storage Regenerating ($ε$-MSR) codes form a special class of Maximum Distance Separable (MDS) codes, providing mechanisms for exact regeneration of a single code block in their codewords by downloading a slightly sub-optimal amount of information from the remaining code blocks. The key advantage of these codes is a significantly lower sub-packetization that grows only logarithmically with the length of the code, while providing optimality in storage and error-correcting capacity. However, from an implementation point of view, these codes require each remaining code block to be available for the repair of any single code block. In this paper, we address this issue by constructing $ε$-MSR codes that can repair a failed code block by contacting fewer available code blocks. When a code block fails, our repair procedure needs to contact a few compulsory code blocks and is free to choose any subset of available code blocks for the remaining choices. Further, our construction requires a field size linear in code length and ensures load balancing among the contacted code blocks in terms of information downloaded from them for a single repair.
Submitted 3 July, 2018;
originally announced July 2018.
-
Weight Enumerators and Higher Support Weights of Maximally Recoverable Codes
Authors:
V. Lalitha,
Satyanarayana V. Lokam
Abstract:
In this paper, we establish the matroid structures corresponding to data-local and local maximally recoverable codes (MRC). The matroid structures of these codes can be used to determine the associated Tutte polynomial. Greene proved that the weight enumerators of any code can be determined from its associated Tutte polynomial. We will use this result to derive explicit expressions for the weight enumerators of data-local and local MRC. Also, Britz proved that the higher support weights of any code can be determined from its associated Tutte polynomial. We will use this result to derive expressions for the higher support weights of data-local and local MRC with two local codes.
Submitted 4 July, 2015;
originally announced July 2015.
-
On Restricting No-Junta Boolean Function and Degree Lower Bounds by Polynomial Method
Authors:
Chia-Jung Lee,
Satya V. Lokam,
Shi-Chun Tsai,
Ming-Chuan Yang
Abstract:
Let $\mathcal{F}_{n}^*$ be the set of Boolean functions depending on all $n$ variables. We prove that for any $f\in \mathcal{F}_{n}^*$, $f|_{x_i=0}$ or $f|_{x_i=1}$ depends on the remaining $n-1$ variables, for some variable $x_i$. This existence result suggests a possible way to deal with general Boolean functions via their subfunctions under restrictions.
As an application, we consider the degree lower bound of representing polynomials over finite rings. Let $f\in \mathcal{F}_{n}^*$ and denote the exact representing degree over the ring $\mathbb{Z}_m$ (with the integer $m>2$) as $d_m(f)$. Let $m=Π_{i=1}^{r}p_i^{e_i}$, where the $p_i$ are distinct primes, and $r$ and the $e_i$ are positive integers. If $f$ is symmetric, then $m\cdot d_{p_1^{e_1}}(f)\cdots d_{p_r^{e_r}}(f) > n$. If $f$ is non-symmetric, by the second moment method we prove that almost always $m\cdot d_{p_1^{e_1}}(f)\cdots d_{p_r^{e_r}}(f) > \lg{n}-1$. In particular, when $m=pq$ where $p$ and $q$ are arbitrary distinct primes, we have $d_p(f)d_q(f)=Ω(n)$ for symmetric $f$ and $d_p(f)d_q(f)=Ω(\lg{n}-1)$ almost always for non-symmetric $f$. Hence any $n$-variate symmetric Boolean function can have exact representing degree $o(\sqrt{n})$ in at most one finite field, and a non-symmetric function can almost always have exact representing degree $o(\sqrt{\lg{n}})$ in at most one finite field.
Submitted 4 February, 2015; v1 submitted 1 February, 2015;
originally announced February 2015.
-
Using Elimination Theory to construct Rigid Matrices
Authors:
Abhinav Kumar,
Satyanarayana V. Lokam,
Vijay M. Patankar,
Jayalal Sarma M. N
Abstract:
The rigidity of a matrix A for target rank r is the minimum number of entries of A that must be changed to ensure that the rank of the altered matrix is at most r. Since its introduction by Valiant (1977), rigidity and similar rank-robustness functions of matrices have found numerous applications in circuit complexity, communication complexity, and learning complexity. Almost all nxn matrices over an infinite field have a rigidity of (n-r)^2. It is a long-standing open question to construct infinite families of explicit matrices even with superlinear rigidity when r = Omega(n).
In this paper, we construct an infinite family of complex matrices with the largest possible, i.e., (n-r)^2, rigidity. The entries of an n x n matrix in this family are distinct primitive roots of unity of orders roughly exp(n^2 log n). To the best of our knowledge, this is the first family of concrete (but not entirely explicit) matrices having maximal rigidity and a succinct algebraic description.
Our construction is based on elimination theory of polynomial ideals. In particular, we use results on the existence of polynomials in elimination ideals with effective degree upper bounds (effective Nullstellensatz). Using elementary algebraic geometry, we prove that the dimension of the affine variety of matrices of rigidity at most k is exactly n^2-(n-r)^2+k. Finally, we use elimination theory to examine whether the rigidity function is semi-continuous.
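The definition of rigidity at the start of this abstract can be made concrete with a brute-force sketch. This is only an illustration of the definition and not the paper's construction: it works over GF(2), where changing an entry means flipping a bit, whereas the abstract concerns infinite fields and complex matrices. The function names `gf2_rank` and `rigidity_gf2` are hypothetical, and the search is exponential, so it is usable only for very small matrices.

```python
from itertools import combinations


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are packed into integers."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low_bit = pivot & -pivot  # lowest set bit serves as the pivot column
            rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank


def rigidity_gf2(matrix, r):
    """Minimum number of entries to flip so that the GF(2) rank is at most r.

    Assumption: entries are 0/1 and arithmetic is over GF(2), a simplification
    of the infinite-field setting discussed in the abstract.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    packed = [sum(bit << j for j, bit in enumerate(row)) for row in matrix]
    positions = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    # Try every set of k positions, smallest k first.
    for k in range(len(positions) + 1):
        for subset in combinations(positions, k):
            altered = list(packed)
            for i, j in subset:
                altered[i] ^= 1 << j  # flip the chosen entry
            if gf2_rank(altered) <= r:
                return k
    return len(positions)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(rigidity_gf2(identity, 1))
```

For the 3 x 3 identity matrix and target rank 1, a single flip lowers the rank by at most one, so at least two flips are needed, and zeroing two diagonal entries achieves rank 1.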
Submitted 16 April, 2014; v1 submitted 28 October, 2009;
originally announced October 2009.