Showing 1–8 of 8 results for author: Jahanshahi, M

  1. arXiv:2504.18971

    cs.SE cs.CE

    Scientific Open-Source Software Is Less Likely to Become Abandoned Than One Might Think! Lessons from Curating a Catalog of Maintained Scientific Software

    Authors: Addi Malviya Thakur, Reed Milewicz, Mahmoud Jahanshahi, Lavínia Paganini, Bogdan Vasilescu, Audris Mockus

    Abstract: Scientific software is essential to scientific innovation and in many ways it is distinct from other types of software. Abandoned (or unmaintained), buggy, and hard to use software, a perception often associated with scientific software can hinder scientific progress, yet, in contrast to other types of software, its longevity is poorly understood. Existing data curation efforts are fragmented by s…

    Submitted 26 April, 2025; originally announced April 2025.

  2. Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets

    Authors: Mahmoud Jahanshahi, Audris Mockus

    Abstract: A critical part of creating code suggestion systems is the pre-training of Large Language Models on vast amounts of source code and natural language text, often of questionable origin or quality. This may contribute to the presence of bugs and vulnerabilities in code generated by LLMs. While efforts to identify bugs at or after code generation exist, it is preferable to pre-train or fine-tune LLMs…

    Submitted 5 January, 2025; originally announced January 2025.

    Comments: Accepted in the Second International Workshop on Large Language Models for Code (LLM4Code 2025)

  3. Beyond Dependencies: The Role of Copy-Based Reuse in Open Source Software Development

    Authors: Mahmoud Jahanshahi, David Reid, Audris Mockus

    Abstract: In Open Source Software, resources of any project are open for reuse by introducing dependencies or copying the resource itself. In contrast to dependency-based reuse, the infrastructure to systematically support copy-based reuse appears to be entirely missing. Our aim is to enable future research and tool development to increase efficiency and reduce the risks of copy-based reuse. We seek a bette…

    Submitted 7 September, 2024; originally announced September 2024.

    Journal ref: 2025 ACM Transactions on Software Engineering and Methodology (TOSEM)

  4. OSS License Identification at Scale: A Comprehensive Dataset Using World of Code

    Authors: Mahmoud Jahanshahi, David Reid, Adam McDaniel, Audris Mockus

    Abstract: The proliferation of open source software (OSS) and different types of reuse has made it incredibly difficult to perform an essential legal and compliance task of accurate license identification within the software supply chain. This study presents a reusable and comprehensive dataset of OSS licenses, created using the World of Code (WoC) infrastructure. By scanning all files containing "license"…

    Submitted 11 March, 2025; v1 submitted 7 September, 2024; originally announced September 2024.

    Comments: Accepted in 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR)

  5. Dataset: Copy-based Reuse in Open Source Software

    Authors: Mahmoud Jahanshahi, Audris Mockus

    Abstract: In Open Source Software, the source code and any other resources available in a project can be viewed or reused by anyone subject to often permissive licensing restrictions. In contrast to some studies of dependency-based reuse supported via package managers, no studies of OSS-wide copy-based reuse exist. This dataset seeks to encourage the studies of OSS-wide copy-based reuse by providing copying…

    Submitted 14 December, 2023; originally announced December 2023.

    Journal ref: 2024 21st International Conference on Mining Software Repositories (MSR '24)

  6. arXiv:2303.14647

    cs.AI cs.CL

    Farspredict: A benchmark dataset for link prediction

    Authors: Najmeh Torabian, Behrouz Minaei-Bidgoli, Mohsen Jahanshahi

    Abstract: Link prediction with knowledge graph embedding (KGE) is a popular method for knowledge graph completion. Furthermore, training KGEs on non-English knowledge graph promote knowledge extraction and knowledge graph reasoning in the context of these languages. However, many challenges in non-English KGEs pose to learning a low-dimensional representation of a knowledge graph's entities and relations. T…

    Submitted 26 March, 2023; originally announced March 2023.

    Comments: 13 pages, 3 figures, 1 algorithm and 5 tables

    MSC Class: 00Axx

    ACM Class: E.1; H.3

  7. Building the Collaboration Graph of Open-Source Software Ecosystem

    Authors: Elena Lyulina, Mahmoud Jahanshahi

    Abstract: The Open-Source Software community has become the center of attention for many researchers, who are investigating various aspects of collaboration in this extremely large ecosystem. Due to its size, it is difficult to grasp whether or not it has structure, and if so, what it may be. Our hackathon project aims to facilitate the understanding of the developer collaboration structure and relationship…

    Submitted 22 March, 2021; originally announced March 2021.

    Comments: 3 pages, 2 figures

    Journal ref: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)

  8. arXiv:1906.06774

    cs.NI

    Gateway Placement and Selection Solutions in WMNs: A Survey

    Authors: Mohsen Jahanshahi, Arash Bozorgchenani

    Abstract: Due to the high demand of Internet access by users, and the tremendous success of wireless technologies, Wireless Mesh Networks (WMNs) have become a promising solution. IGW Placement and Selection (GPS) are significantly investigated problems to achieve QoS requirements, network performance, and reduce deployment cost in WMNs. Best effort is made to classify different works in the literature based…

    Submitted 16 June, 2019; originally announced June 2019.