-
Micro-Patterns in Solidity Code
Authors:
Luca Ruschioni,
Robert Shuttleworth,
Rumyana Neykova,
Barbara Re,
Giuseppe Destefanis
Abstract:
Solidity is the predominant programming language for blockchain-based smart contracts, and its characteristics pose significant challenges for code analysis and maintenance. Traditional software analysis approaches, while effective for conventional programming languages, often fail to address Solidity-specific features such as gas optimization and security constraints.
This paper introduces micr…
▽ More
Solidity is the predominant programming language for blockchain-based smart contracts, and its characteristics pose significant challenges for code analysis and maintenance. Traditional software analysis approaches, while effective for conventional programming languages, often fail to address Solidity-specific features such as gas optimization and security constraints.
This paper introduces micro-patterns - recurring, small-scale design structures that capture key behavioral and structural peculiarities specific to a language - for Solidity language and demonstrates their value in understanding smart contract development practices. We identified 18 distinct micro-patterns organized in five categories (Security, Functional, Optimization, Interaction, and Feedback), detailing their characteristics to enable automated detection.
To validate this proposal, we analyzed a dataset of 23258 smart contracts from five popular blockchains (Ethereum, Polygon, Arbitrum, Fantom and Optimism). Our analysis reveals widespread adoption of micro-patterns, with 99% of contracts implementing at least one pattern and an average of 2.76 patterns per contract. The Storage Saver pattern showed the highest adoption (84.62% mean coverage), while security patterns demonstrated platform-specific adoption rates. Statistical analysis revealed significant platform-specific differences in pattern adoption, particularly in Borrower, Implementer, and Storage Optimization patterns.
△ Less
Submitted 2 May, 2025;
originally announced May 2025.
-
LoRA vs Full Fine-tuning: An Illusion of Equivalence
Authors:
Reece Shuttleworth,
Jacob Andreas,
Antonio Torralba,
Pratyusha Sharma
Abstract:
Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to match the performance of fully fine-tuned models on various tasks with an extreme reduction in the number of trainable parameters. Even in settings where both methods learn similarly accurate models, \emph{are their learned solut…
▽ More
Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to match the performance of fully fine-tuned models on various tasks with an extreme reduction in the number of trainable parameters. Even in settings where both methods learn similarly accurate models, \emph{are their learned solutions really equivalent?} We study how different fine-tuning methods change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties. We find that full fine-tuning and LoRA yield weight matrices whose singular value decompositions exhibit very different structure; moreover, the fine-tuned models themselves show distinct generalization behaviors when tested outside the adaptation task's distribution. More specifically, we first show that the weight matrices trained with LoRA have new, high-ranking singular vectors, which we call \emph{intruder dimensions}. Intruder dimensions do not appear during full fine-tuning. Second, we show that LoRA models with intruder dimensions, despite achieving similar performance to full fine-tuning on the target task, become worse models of the pre-training distribution and adapt less robustly to multiple tasks sequentially. Higher-rank, rank-stabilized LoRA models closely mirror full fine-tuning, even when performing on par with lower-rank LoRA models on the same tasks. These results suggest that models updated with LoRA and full fine-tuning access different parts of parameter space, even when they perform equally on the fine-tuned distribution. We conclude by examining why intruder dimensions appear in LoRA fine-tuned models, why they are undesirable, and how their effects can be minimized.
△ Less
Submitted 28 October, 2024;
originally announced October 2024.
-
From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams
Authors:
Iddo Drori,
Sarah J. Zhang,
Reece Shuttleworth,
Sarah Zhang,
Keith Tyser,
Zad Chin,
Pedro Lantigua,
Saisamrit Surbehera,
Gregory Hunter,
Derek Austin,
Leonard Tang,
Yann Hicke,
Sage Simhon,
Sathwik Karnik,
Darnell Granberry,
Madeleine Udell
Abstract:
A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level, on finals available online after the models were trained, and automatically generate new human-quality final exam questions in seconds. Previous work has de…
▽ More
A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level, on finals available online after the models were trained, and automatically generate new human-quality final exam questions in seconds. Previous work has developed program synthesis and few-shot learning methods to solve university-level problem set questions in mathematics and STEM courses. In this work, we develop and compare methods that solve final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We curate a dataset and benchmark of questions from machine learning final exams available online and code for answering these questions and generating new questions. We show how to generate new questions from other questions and course notes. For reproducibility and future research on this final exam benchmark, we use automatic checkers for multiple-choice, numeric, and questions with expression answers. We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting using GPT-3, OPT, Codex, and ChatGPT across machine learning topics and find that few-shot learning methods perform best. We highlight the transformative potential of language models to streamline the writing and solution of large-scale assessments, significantly reducing the workload from human days to mere machine seconds. Our results suggest that rather than banning large language models such as ChatGPT in class, instructors should teach students to harness them by asking students meta-questions about correctness, completeness, and originality of the responses generated, encouraging critical thinking in academic studies.
△ Less
Submitted 28 June, 2023; v1 submitted 11 June, 2022;
originally announced June 2022.
-
A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level
Authors:
Iddo Drori,
Sarah Zhang,
Reece Shuttleworth,
Leonard Tang,
Albert Lu,
Elizabeth Ke,
Kevin Liu,
Linda Chen,
Sunny Tran,
Newman Cheng,
Roman Wang,
Nikhil Singh,
Taylor L. Patti,
Jayson Lynch,
Avi Shporer,
Nakul Verma,
Eugene Wu,
Gilbert Strang
Abstract:
We demonstrate that a neural network pre-trained on text and fine-tuned on code solves mathematics course problems, explains solutions, and generates new questions at a human level. We automatically synthesize programs using few-shot learning and OpenAI's Codex transformer and execute them to solve course problems at 81% automatic accuracy. We curate a new dataset of questions from MIT's largest m…
▽ More
We demonstrate that a neural network pre-trained on text and fine-tuned on code solves mathematics course problems, explains solutions, and generates new questions at a human level. We automatically synthesize programs using few-shot learning and OpenAI's Codex transformer and execute them to solve course problems at 81% automatic accuracy. We curate a new dataset of questions from MIT's largest mathematics courses (Single Variable and Multivariable Calculus, Differential Equations, Introduction to Probability and Statistics, Linear Algebra, and Mathematics for Computer Science) and Columbia University's Computational Linear Algebra. We solve questions from a MATH dataset (on Prealgebra, Algebra, Counting and Probability, Intermediate Algebra, Number Theory, and Precalculus), the latest benchmark of advanced mathematics problems designed to assess mathematical reasoning. We randomly sample questions and generate solutions with multiple modalities, including numbers, equations, and plots. The latest GPT-3 language model pre-trained on text automatically solves only 18.8% of these university questions using zero-shot learning and 30.8% using few-shot learning and the most recent chain of thought prompting. In contrast, program synthesis with few-shot learning using Codex fine-tuned on code generates programs that automatically solve 81% of these questions. Our approach improves the previous state-of-the-art automatic solution accuracy on the benchmark topics from 8.8% to 81.1%. We perform a survey to evaluate the quality and difficulty of generated questions. This work is the first to automatically solve university-level mathematics course questions at a human level and the first work to explain and generate university-level mathematics course questions at scale, a milestone for higher education.
△ Less
Submitted 30 May, 2022; v1 submitted 31 December, 2021;
originally announced December 2021.