Showing 1–9 of 9 results for author: Silavong, F

  1. arXiv:2501.12332  [pdf, other]

    cs.CL cs.AI cs.LG

    Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration

    Authors: Thomas Walshe, Sae Young Moon, Chunyang Xiao, Yawwani Gunawardana, Fran Silavong

    Abstract: Acquiring labelled training data remains a costly task in real world machine learning projects to meet quantity and quality requirements. Recently Large Language Models (LLMs), notably GPT-4, have shown great promises in labelling data with high accuracy. However, privacy and cost concerns prevent the ubiquitous use of GPT-4. In this work, we explore effectively leveraging open-source models for a…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: 11 pages, 1 figure

  2. A Benchmark Generative Probabilistic Model for Weak Supervised Learning

    Authors: Georgios Papadopoulos, Fran Silavong, Sean Moran

    Abstract: Finding relevant and high-quality datasets to train machine learning models is a major bottleneck for practitioners. Furthermore, to address ambitious real-world use-cases there is usually the requirement that the data come labelled with high-quality annotations that can facilitate the training of a supervised model. Manually labelling data with high-quality labels is generally a time-consuming an…

    Submitted 4 October, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

    Journal ref: Lecture Notes in Computer Science 2023; vol 14174; Springer; p 36

  3. arXiv:2302.10798  [pdf, other]

    cs.LG cs.CV

    Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training

    Authors: Xiaoying Zhi, Varun Babbar, Rundong Liu, Pheobe Sun, Fran Silavong, Ruibo Shi, Sean Moran

    Abstract: The subject of green AI has been gaining attention within the deep learning community given the recent trend of ever larger and more complex neural network models. Existing solutions for reducing the computational load of training at inference time usually involve pruning the network parameters. Pruning schemes often create extra overhead either by iterative training and fine-tuning for static pru…

    Submitted 10 January, 2025; v1 submitted 17 February, 2023; originally announced February 2023.

  4. arXiv:2212.07253  [pdf, other]

    cs.SE cs.AI

    API-Miner: an API-to-API Specification Recommendation Engine

    Authors: Sae Young Moon, Gregor Kerr, Fran Silavong, Sean Moran

    Abstract: When designing a new API for a large project, developers need to make smart design choices so that their code base can grow sustainably. To ensure that new API components are well designed, developers can learn from existing API components. However, the lack of standardized methods for comparing API designs makes this learning process time-consuming and difficult. To address this gap we developed…

    Submitted 19 July, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

  5. arXiv:2208.09495  [pdf, other]

    cs.SE cs.AI

    Topical: Learning Repository Embeddings from Source Code using Attention

    Authors: Agathe Lherondelle, Varun Babbar, Yash Satsangi, Fran Silavong, Shaltiel Eloul, Sean Moran

    Abstract: This paper presents Topical, a novel deep neural network for repository level embeddings. Existing methods, reliant on natural language documentation or naive aggregation techniques, are outperformed by Topical's utilization of an attention mechanism. This mechanism generates repository-level representations from source code, full dependency graphs, and script level textual data. Trained on public…

    Submitted 4 November, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: Pre-print, under review

  6. arXiv:2205.08585  [pdf, other]

    cs.SE cs.AI cs.CV cs.LG

    CV4Code: Sourcecode Understanding via Visual Code Representations

    Authors: Ruibo Shi, Lili Tao, Rohan Saphal, Fran Silavong, Sean J. Moran

    Abstract: We present CV4Code, a compact and effective computer vision method for sourcecode understanding. Our method leverages the contextual and the structural information available from the code snippet by treating each snippet as a two-dimensional image, which naturally encodes the context and retains the underlying structural information through an explicit spatial representation. To codify snippets as…

    Submitted 11 May, 2022; originally announced May 2022.

  7. arXiv:2204.12495  [pdf, other]

    cs.LG cs.AI cs.CR cs.CY

    Enhancing Privacy against Inversion Attacks in Federated Learning by using Mixing Gradients Strategies

    Authors: Shaltiel Eloul, Fran Silavong, Sanket Kamthe, Antonios Georgiadis, Sean J. Moran

    Abstract: Federated learning reduces the risk of information leakage, but remains vulnerable to attacks. We investigate how several neural network design decisions can defend against gradients inversion attacks. We show that overlapping gradients provides numerical resistance to gradient inversion on the highly vulnerable dense layer. Specifically, we propose to leverage batching to maximise mixing of gradi…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: Supporting information is available. Code will be shared soon

  8. arXiv:2203.13680  [pdf, other]

    eess.IV cs.CV cs.LG

    ST-FL: Style Transfer Preprocessing in Federated Learning for COVID-19 Segmentation

    Authors: Antonios Georgiadis, Varun Babbar, Fran Silavong, Sean Moran, Rob Otter

    Abstract: Chest Computational Tomography (CT) scans present low cost, speed and objectivity for COVID-19 diagnosis and deep learning methods have shown great promise in assisting the analysis and interpretation of these images. Most hospitals or countries can train their own models using in-house data, however empirical evidence shows that those models perform poorly when tested on new unseen cases, surfaci…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: 5 pages, 1 figure, full version (15 pages, 13 figures) to be published in SPIE: Medical Imaging 2022 Proceedings

  9. Senatus -- A Fast and Accurate Code-to-Code Recommendation Engine

    Authors: Fran Silavong, Sean Moran, Antonios Georgiadis, Rohan Saphal, Robert Otter

    Abstract: Machine learning on source code (MLOnCode) is a popular research field that has been driven by the availability of large-scale code repositories and the development of powerful probabilistic and deep learning models for mining source code. Code-to-code recommendation is a task in MLOnCode that aims to recommend relevant, diverse and concise code snippets that usefully extend the code currently bei…

    Submitted 26 April, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted to MSR 2022