LLM-based Triplet Extraction for Automated Ontology Generation in Software Engineering Standards

Yue, Songhui

arXiv:2509.00140 (cs)

[Submitted on 29 Aug 2025]

Abstract: Ontologies have supported knowledge representation and whitebox reasoning for decades; automated ontology generation (AOG) therefore plays a crucial role in scaling their use. Software engineering standards (SES) consist of long, unstructured, noisy text whose paragraphs contain domain-specific terms. In this setting, relation triple extraction (RTE), together with term extraction, constitutes the first stage toward AOG. This work proposes an open-source large language model (LLM)-assisted approach to RTE for SES. Rather than relying solely on prompt-engineering-based methods, this study promotes the use of LLMs as an aid in constructing ontologies and explores an effective AOG workflow that includes document segmentation, candidate term mining, LLM-based relation inference, term normalization, and cross-section alignment. Gold-standard benchmarks at three granularities are constructed and used to evaluate the generated ontology. The results show that the approach is comparable, and potentially superior, to the OpenIE method of triple extraction.
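The workflow stages named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: every function name here is hypothetical, segmentation on blank lines and substring-based term mining are simplifying assumptions, the LLM call is left as a caller-supplied function, cross-section alignment is approximated by merging normalized triples in a set, and exact-match scoring stands in for the paper's three-granularity benchmarks.

```python
import re
from typing import Callable, Iterable

Triple = tuple[str, str, str]
# Hypothetical signature for the LLM step: (section text, mined terms) -> raw triples.
RelationInferrer = Callable[[str, list[str]], Iterable[Triple]]


def segment(text: str) -> list[str]:
    """Split a document into sections on blank lines (assumed segmentation rule)."""
    return [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]


def mine_terms(section: str, vocabulary: Iterable[str]) -> list[str]:
    """Return the vocabulary terms that occur in the section, case-insensitively."""
    lowered = section.lower()
    return [t for t in vocabulary if t.lower() in lowered]


def normalize(term: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing plural 's' (a crude heuristic)."""
    term = re.sub(r"\s+", " ", term.strip().lower())
    if term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def extract_triples(
    text: str, vocabulary: Iterable[str], infer_relations: RelationInferrer
) -> set[Triple]:
    """Run segmentation, term mining, relation inference, and normalization."""
    vocab = list(vocabulary)
    triples: set[Triple] = set()
    for section in segment(text):
        terms = mine_terms(section, vocab)
        for subj, rel, obj in infer_relations(section, terms):
            # Normalized triples from different sections collapse in the set,
            # a simple stand-in for cross-section alignment.
            triples.add((normalize(subj), rel.strip().lower(), normalize(obj)))
    return triples


def score(predicted: set[Triple], gold: set[Triple]) -> dict[str, float]:
    """Exact-match precision, recall, and F1 of predicted triples against a gold set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def stub_infer(section: str, terms: list[str]) -> list[Triple]:
    """Placeholder for an LLM call: links consecutive mined terms with 'related_to'."""
    return [(a, "related_to", b) for a, b in zip(terms, terms[1:])]
```

In practice, `stub_infer` would be replaced by a function that prompts an LLM with the section text and candidate terms and parses its answer into triples; keeping that step behind a plain callable makes the rest of the pipeline testable without a model.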

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2509.00140 [cs.SE]
	(or arXiv:2509.00140v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2509.00140

Submission history

From: Songhui Yue
[v1] Fri, 29 Aug 2025 17:14:54 UTC (1,427 KB)
