-
Micro-Patterns in Solidity Code
Authors:
Luca Ruschioni,
Robert Shuttleworth,
Rumyana Neykova,
Barbara Re,
Giuseppe Destefanis
Abstract:
Solidity is the predominant programming language for blockchain-based smart contracts, and its characteristics pose significant challenges for code analysis and maintenance. Traditional software analysis approaches, while effective for conventional programming languages, often fail to address Solidity-specific features such as gas optimization and security constraints.
This paper introduces micro-patterns (recurring, small-scale design structures that capture key behavioral and structural peculiarities specific to a language) for the Solidity language and demonstrates their value in understanding smart contract development practices. We identified 18 distinct micro-patterns organized in five categories (Security, Functional, Optimization, Interaction, and Feedback), detailing their characteristics to enable automated detection.
To validate this proposal, we analyzed a dataset of 23258 smart contracts from five popular blockchains (Ethereum, Polygon, Arbitrum, Fantom and Optimism). Our analysis reveals widespread adoption of micro-patterns, with 99% of contracts implementing at least one pattern and an average of 2.76 patterns per contract. The Storage Saver pattern showed the highest adoption (84.62% mean coverage), while security patterns demonstrated platform-specific adoption rates. Statistical analysis revealed significant platform-specific differences in pattern adoption, particularly in Borrower, Implementer, and Storage Optimization patterns.
Submitted 2 May, 2025;
originally announced May 2025.
-
LoRA vs Full Fine-tuning: An Illusion of Equivalence
Authors:
Reece Shuttleworth,
Jacob Andreas,
Antonio Torralba,
Pratyusha Sharma
Abstract:
Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to match the performance of fully fine-tuned models on various tasks with an extreme reduction in the number of trainable parameters. Even in settings where both methods learn similarly accurate models, are their learned solutions really equivalent? We study how different fine-tuning methods change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties. We find that full fine-tuning and LoRA yield weight matrices whose singular value decompositions exhibit very different structure; moreover, the fine-tuned models themselves show distinct generalization behaviors when tested outside the adaptation task's distribution. More specifically, we first show that the weight matrices trained with LoRA have new, high-ranking singular vectors, which we call intruder dimensions. Intruder dimensions do not appear during full fine-tuning. Second, we show that LoRA models with intruder dimensions, despite achieving similar performance to full fine-tuning on the target task, become worse models of the pre-training distribution and adapt less robustly to multiple tasks sequentially. Higher-rank, rank-stabilized LoRA models closely mirror full fine-tuning, even when performing on par with lower-rank LoRA models on the same tasks. These results suggest that models updated with LoRA and full fine-tuning access different parts of parameter space, even when they perform equally on the fine-tuned distribution. We conclude by examining why intruder dimensions appear in LoRA fine-tuned models, why they are undesirable, and how their effects can be minimized.
Submitted 28 October, 2024;
originally announced October 2024.
-
Modeling the extracellular matrix in cell migration and morphogenesis: A guide for the curious biologist
Authors:
Rebecca M. Crossley,
Samuel Johnson,
Erika Tsingos,
Zoe Bell,
Massimiliano Berardi,
Margherita Botticelli,
Quirine J. S. Braat,
John Metzcar,
Marco Ruscone,
Yuan Yin,
Robyn Shuttleworth
Abstract:
The extracellular matrix (ECM) is a highly complex structure through which biochemical and mechanical signals are transmitted. In processes of cell migration, the ECM also acts as a scaffold, providing structural support to cells as well as points of potential attachment. Although the ECM is a well-studied structure, its role in many biological processes remains difficult to investigate comprehensively due to its complexity and structural variation within an organism. In tandem with experiments, mathematical models are helpful in refining and testing hypotheses, generating predictions, and exploring conditions outside the scope of experiments. Such models can be combined and calibrated with in vivo and in vitro data to identify critical cell-ECM interactions that drive developmental and homeostatic processes, or the progression of diseases. In this review, we focus on mathematical and computational models of the ECM in processes such as cell migration including cancer metastasis, and in tissue structure and morphogenesis. By highlighting the predictive power of these models, we aim to help bridge the gap between experimental and computational approaches to studying the ECM and to provide guidance on selecting an appropriate model framework to complement corresponding experimental studies.
Submitted 30 January, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams
Authors:
Iddo Drori,
Sarah J. Zhang,
Reece Shuttleworth,
Sarah Zhang,
Keith Tyser,
Zad Chin,
Pedro Lantigua,
Saisamrit Surbehera,
Gregory Hunter,
Derek Austin,
Leonard Tang,
Yann Hicke,
Sage Simhon,
Sathwik Karnik,
Darnell Granberry,
Madeleine Udell
Abstract:
A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level, on finals available online after the models were trained, and automatically generate new human-quality final exam questions in seconds. Previous work has developed program synthesis and few-shot learning methods to solve university-level problem set questions in mathematics and STEM courses. In this work, we develop and compare methods that solve final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We curate a dataset and benchmark of questions from machine learning final exams available online, along with code for answering these questions and generating new questions. We show how to generate new questions from other questions and course notes. For reproducibility and future research on this final exam benchmark, we use automatic checkers for multiple-choice questions, numeric questions, and questions with expression answers. We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting using GPT-3, OPT, Codex, and ChatGPT across machine learning topics and find that few-shot learning methods perform best. We highlight the transformative potential of language models to streamline the writing and solution of large-scale assessments, significantly reducing the workload from human days to mere machine seconds. Our results suggest that rather than banning large language models such as ChatGPT in class, instructors should teach students to harness them by asking students meta-questions about correctness, completeness, and originality of the responses generated, encouraging critical thinking in academic studies.
Submitted 28 June, 2023; v1 submitted 11 June, 2022;
originally announced June 2022.
-
A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level
Authors:
Iddo Drori,
Sarah Zhang,
Reece Shuttleworth,
Leonard Tang,
Albert Lu,
Elizabeth Ke,
Kevin Liu,
Linda Chen,
Sunny Tran,
Newman Cheng,
Roman Wang,
Nikhil Singh,
Taylor L. Patti,
Jayson Lynch,
Avi Shporer,
Nakul Verma,
Eugene Wu,
Gilbert Strang
Abstract:
We demonstrate that a neural network pre-trained on text and fine-tuned on code solves mathematics course problems, explains solutions, and generates new questions at a human level. We automatically synthesize programs using few-shot learning and OpenAI's Codex transformer and execute them to solve course problems at 81% automatic accuracy. We curate a new dataset of questions from MIT's largest mathematics courses (Single Variable and Multivariable Calculus, Differential Equations, Introduction to Probability and Statistics, Linear Algebra, and Mathematics for Computer Science) and Columbia University's Computational Linear Algebra. We solve questions from a MATH dataset (on Prealgebra, Algebra, Counting and Probability, Intermediate Algebra, Number Theory, and Precalculus), the latest benchmark of advanced mathematics problems designed to assess mathematical reasoning. We randomly sample questions and generate solutions with multiple modalities, including numbers, equations, and plots. The latest GPT-3 language model pre-trained on text automatically solves only 18.8% of these university questions using zero-shot learning and 30.8% using few-shot learning and the most recent chain of thought prompting. In contrast, program synthesis with few-shot learning using Codex fine-tuned on code generates programs that automatically solve 81% of these questions. Our approach improves the previous state-of-the-art automatic solution accuracy on the benchmark topics from 8.8% to 81.1%. We perform a survey to evaluate the quality and difficulty of generated questions. This work is the first to automatically solve university-level mathematics course questions at a human level and the first work to explain and generate university-level mathematics course questions at scale, a milestone for higher education.
Submitted 30 May, 2022; v1 submitted 31 December, 2021;
originally announced December 2021.
-
Cell-scale degradation of peritumoural extracellular matrix fibre network and its role within tissue-scale cancer invasion
Authors:
Robyn Shuttleworth,
Dumitru Trucu
Abstract:
Local cancer invasion of tissue is a complex, multiscale process which plays an essential role in tumour progression. Occurring over many different temporal and spatial scales, the first stage of invasion is the secretion of matrix degrading enzymes (MDEs) by the cancer cells that consequently degrade the surrounding extracellular matrix (ECM). This process is vital for creating space in which the cancer cells can progress and it is driven by the activities of specific matrix metalloproteinases (MMPs). In this paper, we consider the key role of two MMPs by developing further the novel two-part multiscale model introduced in [33] to better relate at micro-scale the two micro-scale activities that were considered there, namely, the micro-dynamics concerning the continuous rearrangement of the naturally oriented ECM fibres within the bulk of the tumour and MDEs proteolytic micro-dynamics that take place in an appropriate cell-scale neighbourhood of the tumour boundary. Focussing primarily on the activities of the membrane-tethered MT1-MMP and the soluble MMP-2 with the fibrous ECM phase, in this work we investigate the MT1-MMP/MMP-2 cascade and its overall effect on tumour progression. To that end, we will propose a new multiscale modelling framework by considering the degradation of the ECM fibres not only to take place at macro-scale in the bulk of the tumour but also explicitly in the micro-scale neighbourhood of the tumour interface as a consequence of the interactions with molecular fluxes of MDEs that exercise their spatial dynamics at the invasive edge of the tumour.
Submitted 2 July, 2019;
originally announced July 2019.
-
Multiscale dynamics of a heterotypic cancer cell population within a fibrous extracellular matrix
Authors:
Robyn Shuttleworth,
Dumitru Trucu
Abstract:
Local cancer cell invasion is a complex process involving many cellular and tissue interactions and is an important prerequisite for metastatic spread, the main cause of cancer related deaths. Occurring over many different temporal and spatial scales, the first stage of local invasion is the secretion of matrix-degrading enzymes (MDEs) and the resulting degradation of the extra-cellular matrix (ECM). This process creates space in which the cells can invade and thus enlarge the tumour. As a tumour increases in malignancy, the cancer cells adopt the ability to mutate into secondary cell subpopulations giving rise to a heterogeneous tumour. This new cell subpopulation often carries higher invasive qualities and permits a quicker spread of the tumour. Building upon the recent multiscale modelling framework for cancer invasion within a fibrous ECM introduced in Shuttleworth and Trucu (2019), in this paper we consider the process of local invasion by a heterotypic tumour consisting of two cancer cell populations mixed with a two-phase ECM. To that end, we address the double feedback link between the tissue-scale cancer dynamics and the cell-scale molecular processes through the development of a two-part modelling framework that crucially incorporates the multiscale dynamic redistribution of oriented fibres occurring within a two-phase extra-cellular matrix and combines this with the multiscale leading edge dynamics exploring key matrix-degrading enzymes molecular processes along the tumour interface that drive the movement of the cancer boundary. The modelling framework will be accompanied by computational results that explore the effects of the underlying fibre network on the overall pattern of cancer invasion.
Submitted 1 July, 2019;
originally announced July 2019.
-
Multiscale Modelling of Fibres Dynamics and Cell Adhesion within Moving Boundary Cancer Invasion
Authors:
Robyn Shuttleworth,
Dumitru Trucu
Abstract:
Cancer cell invasion is recognised as one of the hallmarks of cancer and involves several inter-related multiscale processes that ultimately contribute to its spread into the surrounding tissue. In order to gain a deeper understanding of the tumour invasion process, we pay special attention to the interacting dynamics between the cancer cell population and various constituents of the surrounding tumour microenvironment. To that end, we consider the key role that the ECM plays within human body tissue, providing not only structure and support to surrounding cells, but also acting as a platform for cell communication and spatial movement. There are several other vital structures within the ECM; however, we focus primarily on fibrous proteins, such as fibronectin. These fibres play a crucial role in tumour progression, enabling the anchorage of tumour cells to the ECM. In this work we consider the two-scale dynamic cross-talk between cancer cells and a two-component ECM (consisting of both a fibre and a non-fibre phase). To that end, we incorporate the interlinked two-scale dynamics of cell-ECM interactions within the tumour support that contribute simultaneously both to cell adhesion and to the dynamic rearrangement and restructuring of the ECM fibres. Furthermore, this is embedded within a multiscale moving boundary approach for the invading cancer cell population, in the presence of cell adhesion at the tissue scale and cell-scale fibre redistribution activity and leading edge matrix-degrading enzyme molecular proteolytic processes. The overall modelling framework will be accompanied by computational results that will explore the impact on cancer invasion patterns of different levels of cell adhesion in conjunction with the continuous rearrangement of ECM fibres.
Submitted 10 October, 2018; v1 submitted 1 October, 2018;
originally announced October 2018.