-
NaviQAte: Functionality-Guided Web Application Navigation
Authors:
Mobina Shahbandeh,
Parsa Alian,
Noor Nashid,
Ali Mesbah
Abstract:
End-to-end web testing is challenging due to the need to explore diverse web application functionalities. Current state-of-the-art methods, such as WebCanvas, are not designed for broad functionality exploration; they rely on specific, detailed task descriptions, limiting their adaptability in dynamic web environments. We introduce NaviQAte, which frames web application exploration as a question-and-answer task, generating action sequences for functionalities without requiring detailed parameters. Our three-phase approach uses advanced large language models, such as GPT-4o, for complex decision-making and cost-effective models, such as GPT-4o mini, for simpler tasks. NaviQAte focuses on functionality-guided web application navigation, integrating multi-modal inputs such as text and images to enhance contextual understanding. Evaluations on the Mind2Web-Live and Mind2Web-Live-Abstracted datasets show that NaviQAte achieves a 44.23% success rate in user task navigation and a 38.46% success rate in functionality navigation, representing improvements of 15% and 33%, respectively, over WebCanvas. These results underscore the effectiveness of our approach in advancing automated web application testing.
Submitted 16 September, 2024;
originally announced September 2024.
-
Dockerfile Flakiness: Characterization and Repair
Authors:
Taha Shabani,
Noor Nashid,
Parsa Alian,
Ali Mesbah
Abstract:
Dockerfile flakiness (unpredictable temporal build failures caused by external dependencies and evolving environments) undermines deployment reliability and increases debugging overhead. Unlike traditional Dockerfile issues, flakiness occurs without modifications to the Dockerfile itself, complicating its resolution. In this work, we present the first comprehensive study of Dockerfile flakiness, featuring a nine-month analysis of 8,132 Dockerized projects, revealing that around 10% exhibit flaky behavior. We propose a taxonomy categorizing common flakiness causes, including dependency errors and server connectivity issues. Existing tools fail to address these challenges effectively due to their reliance on pre-defined rules and limited generalizability. To overcome these limitations, we introduce FLAKIDOCK, a novel repair framework combining static and dynamic analysis, similarity retrieval, and an iterative feedback loop powered by Large Language Models (LLMs). Our evaluation demonstrates that FLAKIDOCK achieves a repair accuracy of 73.55%, significantly surpassing state-of-the-art tools and baselines.
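The defining property of flakiness here, build outcomes that change over time while the Dockerfile itself stays unchanged, can be illustrated with a minimal Python sketch. It assumes a hypothetical log of (project, Dockerfile content hash, build outcome) records gathered by periodic rebuilds; the record format, the grouping by content hash, and the function names are illustrative assumptions, not FLAKIDOCK's actual detection procedure.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class BuildRecord(NamedTuple):
    """One periodic rebuild of a project (hypothetical log format)."""
    project: str
    dockerfile_hash: str  # content hash of the Dockerfile at build time
    succeeded: bool


def flaky_projects(records: Iterable[BuildRecord]) -> set[str]:
    """Return projects whose *unchanged* Dockerfile both passed and failed.

    Outcomes are grouped by (project, Dockerfile hash), so a failure caused
    by editing the Dockerfile is not counted as flakiness.
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for r in records:
        outcomes[(r.project, r.dockerfile_hash)].add(r.succeeded)
    return {project for (project, _), seen in outcomes.items() if seen == {True, False}}


def flaky_rate(records: list[BuildRecord]) -> float:
    """Fraction of observed projects classified as flaky."""
    projects = {r.project for r in records}
    return len(flaky_projects(records)) / len(projects) if projects else 0.0


if __name__ == "__main__":
    log = [
        BuildRecord("app-a", "h1", True),
        BuildRecord("app-a", "h1", False),  # same Dockerfile, different outcome: flaky
        BuildRecord("app-b", "h2", True),
        BuildRecord("app-b", "h3", False),  # Dockerfile changed: not flakiness
        BuildRecord("app-c", "h4", True),
        BuildRecord("app-c", "h4", True),
    ]
    print(flaky_projects(log))        # {'app-a'}
    print(round(flaky_rate(log), 3))  # 0.333
```

Keying on the content hash rather than on the project alone is the design choice that separates flakiness from ordinary breakage introduced by edits.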
Submitted 11 February, 2025; v1 submitted 9 August, 2024;
originally announced August 2024.
-
Feature-Driven End-To-End Test Generation
Authors:
Parsa Alian,
Noor Nashid,
Mobina Shahbandeh,
Taha Shabani,
Ali Mesbah
Abstract:
End-to-end (E2E) testing is essential for ensuring web application quality. However, manual test creation is time-consuming, and current test generation techniques produce incoherent tests. In this paper, we present AutoE2E, a novel approach that leverages Large Language Models (LLMs) to automate the generation of semantically meaningful feature-driven E2E test cases for web applications. AutoE2E intelligently infers potential features within a web application and translates them into executable test scenarios. Furthermore, we address a critical gap in the research community by introducing E2EBench, a new benchmark for automatically assessing the feature coverage of E2E test suites. Our evaluation on E2EBench demonstrates that AutoE2E achieves an average feature coverage of 79%, outperforming the best baseline by 558%, highlighting its effectiveness in generating high-quality, comprehensive test cases.
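Feature coverage can be read as the share of an application's benchmark features that a generated test suite exercises, averaged over applications. The Python sketch below assumes features are plain string labels and that tests have already been matched to features; both are simplifications for illustration, not E2EBench's actual assessment procedure. The `relative_improvement` helper shows one common reading of a figure such as "outperforming the best baseline by 558%", namely a relative rather than absolute gain; that reading is an assumption.

```python
from statistics import mean


def feature_coverage(benchmark: set[str], covered: set[str]) -> float:
    """Share of an app's benchmark features exercised by its generated tests."""
    if not benchmark:
        raise ValueError("benchmark must list at least one feature")
    # Features outside the benchmark (e.g. extra ones a tool found) do not count.
    return len(benchmark & covered) / len(benchmark)


def average_coverage(apps: dict[str, tuple[set[str], set[str]]]) -> float:
    """Mean of per-app coverage; each value is (benchmark features, covered features)."""
    return mean(feature_coverage(bench, covered) for bench, covered in apps.values())


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


if __name__ == "__main__":
    apps = {  # invented feature labels, not E2EBench data
        "shop": ({"login", "search", "add to cart", "checkout"}, {"login", "search", "checkout"}),
        "blog": ({"login", "post", "comment"}, {"post", "comment", "share"}),
    }
    print(round(average_coverage(apps), 4))  # mean of 3/4 and 2/3 -> 0.7083
```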
Submitted 6 January, 2025; v1 submitted 3 August, 2024;
originally announced August 2024.
-
Contextual API Completion for Unseen Repositories Using LLMs
Authors:
Noor Nashid,
Taha Shabani,
Parsa Alian,
Ali Mesbah
Abstract:
Large language models have made substantial progress in addressing diverse code-related tasks. However, their adoption is hindered by inconsistent outputs caused by a lack of real-world, domain-specific information, such as intra-repository API calls in unseen software projects. We introduce a novel technique to mitigate hallucinations by leveraging global and local contextual information within a code repository for API completion tasks. Our approach is tailored to code completion, with a focus on local API completions. We examine relevant import statements during API completion to derive insights into local APIs, drawing on their method signatures. For API token completion, we analyze inline variables and correlate them with the appropriate imported modules, allowing our approach to rank the most contextually relevant suggestions from the available local APIs. For conversational API completion, we gather the APIs most relevant to the developer query through a retrieval-based search across the project. We evaluate our tool, LANCE, on our proposed benchmark, APIEval, which covers two programming languages. Our evaluation yields an average accuracy of 82.6% for API token completion and 76.9% for conversational API completion. On average, LANCE surpasses Copilot by 143% and 142% for API token completion and conversational API completion, respectively. These findings have substantial implications for developers: our lightweight context analysis can be applied to multilingual environments without language-specific training or fine-tuning, allowing efficient implementation with minimal examples and effort.
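The token-completion idea (correlating an inline variable with the imported module it comes from, then ranking that module's local APIs) can be illustrated with Python's standard `ast` module. This is a minimal sketch under stated assumptions: the `LOCAL_API_INDEX` of method signatures, the module `inventory.store`, and the prefix-first ranking rule are hypothetical stand-ins, not LANCE's actual index or scoring.

```python
import ast

# Hypothetical index of a repository's local APIs: (module, class) -> method signatures.
LOCAL_API_INDEX = {
    ("inventory.store", "Store"): [
        "add_item(self, sku: str, qty: int)",
        "remove_item(self, sku: str)",
        "get_item(self, sku: str)",
        "audit(self)",
    ],
}


def imported_classes(tree: ast.AST) -> dict[str, tuple[str, str]]:
    """Map each name bound by `from M import C [as A]` to (M, C)."""
    bound = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                bound[alias.asname or alias.name] = (node.module, alias.name)
    return bound


def variable_types(tree: ast.AST, imports: dict[str, tuple[str, str]]) -> dict[str, tuple[str, str]]:
    """Link variables assigned as `v = C(...)` to the imported class C."""
    types = {}
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Name)
                and node.value.func.id in imports):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    types[target.id] = imports[node.value.func.id]
    return types


def complete(source: str, variable: str, prefix: str = "") -> list[str]:
    """Rank local API signatures for `variable.<prefix>`: prefix matches first."""
    tree = ast.parse(source)  # parsed only, never executed
    key = variable_types(tree, imported_classes(tree)).get(variable)
    candidates = LOCAL_API_INDEX.get(key, []) if key else []
    return sorted(candidates, key=lambda sig: (not sig.startswith(prefix), sig))


if __name__ == "__main__":
    code = "from inventory.store import Store\nshop = Store()\n"
    print(complete(code, "shop", prefix="get"))
```

Because the source is only parsed, the sketch works even when the imported module is not installed, which mirrors the "unseen repository" setting.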
Submitted 14 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Semantic Constraint Inference for Web Form Test Generation
Authors:
Parsa Alian,
Noor Nashid,
Mobina Shahbandeh,
Ali Mesbah
Abstract:
Automated test generation for web forms has been a longstanding challenge, exacerbated by the intrinsic human-centric design of forms and their complex, device-agnostic structures. We introduce an innovative approach, called FormNexus, for automated web form test generation, which emphasizes deriving semantic insights from individual form elements and relations among them, utilizing textual content, DOM tree structures, and visual proximity. The insights gathered are transformed into a new conceptual graph, the Form Entity Relation Graph (FERG), which offers machine-friendly semantic information extraction. Leveraging LLMs, FormNexus adopts a feedback-driven mechanism for generating and refining input constraints based on real-time form submission responses. The culmination of this approach is a robust set of test cases, each produced by methodically invalidating constraints, ensuring comprehensive testing scenarios for web forms. This work bridges the existing gap in automated web form testing by intertwining the capabilities of LLMs with advanced semantic inference methods. Our evaluation demonstrates that FormNexus combined with GPT-4 achieves 89% coverage in form submission states. This outcome significantly outstrips the performance of the best baseline model by a margin of 25%.
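The final step, producing test cases by methodically invalidating constraints, can be sketched in Python as "one all-valid submission, then one submission per constraint with only that constraint violated". The `Constraint` record, its example values, and the assumption that each valid value satisfies every constraint on its field are hypothetical; the sketch omits FormNexus's FERG construction and its LLM-driven constraint refinement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """A hypothetical input constraint on a single form field."""
    field: str
    description: str
    valid_value: str    # assumed to satisfy every constraint on this field
    invalid_value: str  # assumed to violate only this constraint


def generate_test_cases(constraints: list[Constraint]) -> list[dict]:
    """Return one all-valid case plus one case per invalidated constraint."""
    baseline = {c.field: c.valid_value for c in constraints}
    cases = [{"name": "all constraints satisfied", "inputs": dict(baseline), "expect_accept": True}]
    for c in constraints:
        inputs = dict(baseline)  # copy, so each case changes exactly one field
        inputs[c.field] = c.invalid_value
        cases.append({"name": f"violates: {c.description}", "inputs": inputs, "expect_accept": False})
    return cases


if __name__ == "__main__":
    form = [
        Constraint("email", "email must contain '@'", "jane@example.com", "jane.example.com"),
        Constraint("age", "age must be at least 18", "30", "17"),
    ]
    for case in generate_test_cases(form):
        print(case["name"], case["inputs"], case["expect_accept"])
```

Violating one constraint at a time keeps each rejection attributable to a single rule, which is what makes the resulting submission responses useful as feedback.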
Submitted 22 July, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
RPS: Portfolio Asset Selection using Graph based Representation Learning
Authors:
MohammadAmin Fazli,
Parsa Alian,
Ali Owfi,
Erfan Loghmani
Abstract:
Portfolio optimization is a central problem in finance. In recent years, demand has grown for novel computational methods that construct portfolios with higher returns and lower risk. We present a novel computational method called Representation Portfolio Selection (RPS), which redefines the distance matrix of financial assets using Representation Learning and Clustering algorithms, selecting assets for the portfolio so as to increase diversification. RPS provides a heuristic for approaching the optimal subset of assets. Our empirical results demonstrate that widely used portfolio optimization algorithms, such as MVO, CLA, and HRP, can benefit from our asset subset selection.
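The general cluster-then-select idea for diversification can be sketched in plain Python. RPS derives its distance matrix from learned representations; this sketch swaps in the common correlation distance sqrt(2(1 - ρ)) and single-linkage clustering instead, so it illustrates only the selection step. The threshold, the highest-mean-return representative rule, and the toy return series are hypothetical choices.

```python
import math
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant return series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance(x: list[float], y: list[float]) -> float:
    """sqrt(2 * (1 - rho)): 0 for identical co-movement, 2 for perfect anti-correlation."""
    # max() guards against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - pearson(x, y))))


def cluster(names: list[str], dist: dict[str, dict[str, float]], threshold: float) -> list[list[str]]:
    """Single-linkage clustering via union-find: link assets closer than threshold."""
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist[a][b] < threshold:
                parent[find(a)] = find(b)
    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def select_assets(returns: dict[str, list[float]], threshold: float = 0.5) -> list[str]:
    """Keep one asset per cluster (highest mean return) to diversify the candidate set."""
    names = sorted(returns)
    dist = {a: {b: correlation_distance(returns[a], returns[b]) for b in names} for a in names}
    clusters = cluster(names, dist, threshold)
    return sorted(max(group, key=lambda n: mean(returns[n])) for group in clusters)


if __name__ == "__main__":
    returns = {  # toy daily returns, invented for illustration
        "A": [0.01, 0.02, -0.01, 0.03],
        "B": [0.011, 0.021, -0.009, 0.031],  # moves with A, slightly higher mean
        "C": [-0.01, 0.00, 0.02, -0.02],     # moves against A
    }
    print(select_assets(returns))  # ['B', 'C']
```

The reduced asset list would then be handed to an optimizer such as MVO, CLA, or HRP; keeping one member per cluster avoids concentrating weight in assets that move together.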
Submitted 28 November, 2021;
originally announced November 2021.