Classifier-Augmented Generation for Structured Workflow Prediction

Gschwind, Thomas; Chakraborty, Shramona; Gupta, Nitin; Mehta, Sameep

arXiv:2510.12825 (cs)

[Submitted on 10 Oct 2025]

Abstract: ETL (Extract, Transform, Load) tools such as IBM DataStage allow users to visually assemble complex data workflows, but configuring stages and their properties remains time-consuming and requires deep tool knowledge. We propose a system that translates natural language descriptions into executable workflows, automatically predicting both the structure and detailed configuration of the flow. At its core lies a Classifier-Augmented Generation (CAG) approach that combines utterance decomposition with a classifier and stage-specific few-shot prompting to produce accurate stage predictions. These stages are then connected into non-linear workflows using edge prediction, and stage properties are inferred from sub-utterance context. We compare CAG against strong single-prompt and agentic baselines, showing improved accuracy and efficiency while substantially reducing token usage. Our architecture is modular, interpretable, and capable of end-to-end workflow generation, including robust validation steps. To our knowledge, this is the first system with a detailed evaluation across stage prediction, edge layout, and property generation for natural-language-driven ETL authoring.
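
The pipeline shape described in the abstract (decompose the utterance, classify each sub-utterance into a stage type, then build a stage-specific few-shot prompt) can be illustrated with a toy sketch. This is not the authors' implementation: the stage names, keyword lists, few-shot examples, and prompt layout below are all hypothetical, a keyword lookup stands in for the trained classifier, no language model is called, and stages are simply chained in order where the paper predicts edges for non-linear flows.

```python
import re

# Hypothetical stage vocabulary with keyword cues (stand-in for a trained classifier).
KEYWORDS = {
    "source": ("read", "load"),
    "filter": ("filter", "keep"),
    "target": ("write", "save"),
}

# Hypothetical few-shot examples, stored per stage type.
FEW_SHOT = {
    "source": ['Input: read customers.csv\nOutput: {"file": "customers.csv"}'],
    "filter": ['Input: keep rows where age > 30\nOutput: {"where": "age > 30"}'],
    "target": ['Input: write to table sales\nOutput: {"table": "sales"}'],
}


def decompose(utterance: str) -> list[str]:
    """Split an utterance into sub-utterances on ';' or '[,] [and] then'."""
    parts = re.split(r"\s*;\s*|,?\s+(?:and\s+)?then\s+", utterance, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


def classify(sub_utterance: str) -> str:
    """Return the first stage type whose keyword appears in the sub-utterance."""
    lowered = sub_utterance.lower()
    for stage, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return stage
    return "unknown"


def build_prompt(stage: str, sub_utterance: str) -> str:
    """Assemble a prompt containing only the examples for the predicted stage."""
    shots = "\n\n".join(FEW_SHOT.get(stage, []))
    return f"Stage type: {stage}\n\n{shots}\n\nInput: {sub_utterance}\nOutput:"


def plan(utterance: str) -> dict:
    """Run decomposition, classification, and prompt construction in sequence."""
    subs = decompose(utterance)
    stages = [classify(s) for s in subs]
    prompts = [build_prompt(stage, s) for stage, s in zip(stages, subs)]
    # Placeholder layout: connect consecutive stages linearly.
    edges = [(i, i + 1) for i in range(len(stages) - 1)]
    return {"sub_utterances": subs, "stages": stages, "prompts": prompts, "edges": edges}


result = plan("Read orders.csv, then keep rows where amount > 100, then write the result to a table")
print(result["stages"])  # ['source', 'filter', 'target']
print(result["edges"])   # [(0, 1), (1, 2)]
```

In this sketch each prompt carries only the examples for its predicted stage type, so prompt length does not grow with the size of the stage vocabulary.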

Comments:	Accepted at EMNLP 2025
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (cs.LG)
MSC classes:	68T50, 68T05, 68T09
ACM classes:	I.2.7; I.2.6; H.2.5
Cite as:	arXiv:2510.12825 [cs.CL]
	(or arXiv:2510.12825v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2510.12825

Submission history

From: Thomas Gschwind
[v1] Fri, 10 Oct 2025 18:38:25 UTC (150 KB)
