Showing 1–50 of 55 results for author: Bui, Q

  1. arXiv:2506.13032  [pdf, ps, other]

    cs.CV cs.AI

    AS400-DET: Detection using Deep Learning Model for IBM i (AS/400)

    Authors: Thanh Tran, Son T. Luu, Quan Bui, Shoshin Nomura

    Abstract: This paper proposes a method for automatic GUI component detection for the IBM i system (formerly and still more commonly known as AS/400). We introduce a human-annotated dataset consisting of 1,050 system screen images, in which 381 images are screenshots of IBM i system screens in Japanese. Each image contains multiple components, including text labels, text boxes, options, tables, instructions,…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: Accepted at the IVSP 2025 conference

  2. arXiv:2506.02529  [pdf, ps, other]

    cs.SE cs.AI cs.CL

    Automated Web Application Testing: End-to-End Test Case Generation with Large Language Models and Screen Transition Graphs

    Authors: Nguyen-Khang Le, Quan Minh Bui, Minh Ngoc Nguyen, Hiep Nguyen, Trung Vo, Son T. Luu, Shoshin Nomura, Minh Le Nguyen

    Abstract: Web applications are critical to modern software ecosystems, yet ensuring their reliability remains challenging due to the complexity and dynamic nature of web interfaces. Recent advances in large language models (LLMs) have shown promise in automating complex tasks, but limitations persist in handling dynamic navigation flows and complex form interactions. This paper presents an automated system…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Published in the Proceedings of JSAI 2025

    ACM Class: I.2.7

  3. arXiv:2505.20247  [pdf]

    cs.OH

    Translation of Enterprise Architecture Concept to Facilitate Digital Transformation Initiatives in Vietnam: Processes, Mechanisms and Impacts

    Authors: Duong Dang, Quang Bui

    Abstract: Governments around the world have increasingly adopted digital transformation (DT) initiatives to increase their strategic competitiveness in the global market. To support successful DT, governments have to introduce new governance logics and revise IT strategies to facilitate DT initiatives. In this study, we report a case study of how Enterprise Architecture (EA) concepts were introduced and tra…

    Submitted 26 May, 2025; originally announced May 2025.

  4. arXiv:2505.10860  [pdf, ps, other]

    cs.LG stat.ML

    On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating

    Authors: Huy Nguyen, Thong T. Doan, Quang Pham, Nghi D. Q. Bui, Nhat Ho, Alessandro Rinaldo

    Abstract: Mixture of experts (MoE) methods are a key component in most large language model architectures, including the recent series of DeepSeek models. Compared to other MoE implementations, DeepSeekMoE stands out because of two unique features: the deployment of a shared expert strategy and of the normalized sigmoid gating mechanism. Despite the prominent role of DeepSeekMoE in the success of the DeepSe…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 100 pages

  5. arXiv:2504.14757  [pdf, other]

    cs.SE cs.AI

    SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs

    Authors: Minh V. T. Pham, Huy N. Phan, Hoang N. Phan, Cuong Le Chi, Tien N. Nguyen, Nghi D. Q. Bui

    Abstract: Large language models (LLMs) are transforming automated program repair (APR) through agent-based approaches that localize bugs, generate patches, and verify fixes. However, the lack of high-quality, scalable training datasets, especially those with verifiable outputs and intermediate reasoning traces, limits progress, particularly for open-source models. In this work, we present SWE-Synth, a framew…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: Work in progress

  6. arXiv:2502.12673  [pdf, other]

    cs.CV cs.GR

    ROI-NeRFs: Hi-Fi Visualization of Objects of Interest within a Scene by NeRFs Composition

    Authors: Quoc-Anh Bui, Gilles Rougeron, Géraldine Morin, Simone Gasparini

    Abstract: Efficient and accurate 3D reconstruction is essential for applications in cultural heritage. This study addresses the challenge of visualizing objects within large-scale scenes at a high level of detail (LOD) using Neural Radiance Fields (NeRFs). The aim is to improve the visual fidelity of chosen objects while maintaining the efficiency of the computations by focusing on details only for relevant…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: 17 pages including appendix, 16 figures, 8 tables

    MSC Class: 68U05; 68T45 (Primary) 68T07; 68-04 (Secondary) ACM Class: I.2.10; I.3.3; I.3.5; I.3.7; I.4.5; I.4.6; I.4.8; I.4.10

  7. arXiv:2502.04953  [pdf, other]

    cs.CR cs.SE

    A Systematic Literature Review on Automated Exploit and Security Test Generation

    Authors: Quang-Cuong Bui, Emanuele Iannone, Maria Camporese, Torge Hinrichs, Catherine Tony, László Tóth, Fabio Palomba, Péter Hegedűs, Fabio Massacci, Riccardo Scandariato

    Abstract: The exploit or the Proof of Concept of the vulnerability plays an important role in developing superior vulnerability repair techniques, as it can be used as an oracle to verify the correctness of the patches generated by the tools. However, the vulnerability exploits are often unavailable and require time and expert knowledge to craft. Obtaining them from the exploit generation techniques is anot…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: This work was partially supported by EU-funded project Sec4AI4Sec (grant no. 101120393)

    ACM Class: A.2

  8. arXiv:2502.03365  [pdf, other]

    cs.SE cs.CR cs.LG

    A Match Made in Heaven? Matching Test Cases and Vulnerabilities With the VUTECO Approach

    Authors: Emanuele Iannone, Quang-Cuong Bui, Riccardo Scandariato

    Abstract: Software vulnerabilities are commonly detected via static analysis, penetration testing, and fuzzing. They can also be found by running unit tests - so-called vulnerability-witnessing tests - that stimulate the security-sensitive behavior with crafted inputs. Developing such tests is difficult and time-consuming; thus, automated data-driven approaches could help developers intercept vulnerabilitie…

    Submitted 5 February, 2025; originally announced February 2025.

    Comments: This work was partially supported by EU-funded project Sec4AI4Sec (grant no. 101120393)

    ACM Class: D.2.5; D.2.7

  9. arXiv:2501.00520  [pdf, other]

    cs.CV cs.LG

    Innovative Silicosis and Pneumonia Classification: Leveraging Graph Transformer Post-hoc Modeling and Ensemble Techniques

    Authors: Bao Q. Bui, Tien T. T. Nguyen, Duy M. Le, Cong Tran, Cuong Pham

    Abstract: This paper presents a comprehensive study on the classification and detection of Silicosis-related lung inflammation. Our main contributions include 1) the creation of a newly curated chest X-ray (CXR) image dataset named SVBCX that is tailored to the nuances of lung inflammation caused by distinct agents, providing a valuable resource for the silicosis and pneumonia research community; and 2) we prop…

    Submitted 31 December, 2024; originally announced January 2025.

  10. arXiv:2412.19606  [pdf, other]

    cs.CV

    Enhancing Fine-grained Image Classification through Attentive Batch Training

    Authors: Duy M. Le, Bao Q. Bui, Anh Tran, Cong Tran, Cuong Pham

    Abstract: Fine-grained image classification, which is a challenging task in computer vision, requires precise differentiation among visually similar object categories. In this paper, we propose 1) a novel module called Residual Relationship Attention (RRA) that leverages the relationships between images within each training batch to effectively integrate visual feature vectors of batch images and 2) a novel…

    Submitted 27 December, 2024; originally announced December 2024.

  11. arXiv:2410.23402  [pdf, other]

    cs.SE

    VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning

    Authors: Cuong Chi Le, Hoang-Chau Truong-Vinh, Huy Nhat Phan, Dung Duy Le, Tien N. Nguyen, Nghi D. Q. Bui

    Abstract: Predicting program behavior and reasoning about code execution remain significant challenges in software engineering, particularly for large language models (LLMs) designed for code analysis. While these models excel at understanding static syntax, they often struggle with dynamic reasoning tasks. We introduce VisualCoder, a simple yet effective approach that enhances code reasoning by integrating…

    Submitted 9 February, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

    Comments: NAACL 2025

  12. arXiv:2410.01999  [pdf, other]

    cs.SE

    CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding & Reasoning Capabilities of CodeLLMs

    Authors: Dung Nguyen Manh, Thang Phan Chau, Nam Le Hai, Thong T. Doan, Nam V. Nguyen, Quang Pham, Nghi D. Q. Bui

    Abstract: Recent advances in Code Large Language Models (CodeLLMs) have primarily focused on open-ended code generation, often overlooking the crucial aspect of code understanding and reasoning. To bridge this gap, we introduce CodeMMLU, a comprehensive multiple-choice benchmark designed to evaluate the depth of software and code comprehension in LLMs. CodeMMLU includes nearly 20,000 questions spanning dive…

    Submitted 9 April, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

  13. arXiv:2409.16299  [pdf, other]

    cs.SE cs.AI

    HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

    Authors: Huy Nhat Phan, Tien N. Nguyen, Phong X. Nguyen, Nghi D. Q. Bui

    Abstract: Large Language Models (LLMs) have revolutionized software engineering (SE), showcasing remarkable proficiency in various coding tasks. Despite recent advancements that have enabled the creation of autonomous software agents utilizing LLMs for end-to-end development tasks, these systems are typically designed for specific SE functions. We introduce HyperAgent, an innovative generalist multi-agent s…

    Submitted 5 November, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: 49 pages

  14. arXiv:2408.04663  [pdf, other]

    cs.CL cs.AI

    Dopamin: Transformer-based Comment Classifiers through Domain Post-Training and Multi-level Layer Aggregation

    Authors: Nam Le Hai, Nghi D. Q. Bui

    Abstract: Code comments provide important information for understanding the source code. They can help developers understand the overall purpose of a function or class, as well as identify bugs and technical debt. However, an overabundance of comments is meaningless and counterproductive. As a result, it is critical to automatically filter out these comments for specific purposes. In this paper, we present…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted at The 3rd Intl. Workshop on NL-based Software Engineering, 2024

  15. arXiv:2408.04660  [pdf, other]

    cs.CL cs.AI

    XMainframe: A Large Language Model for Mainframe Modernization

    Authors: Anh T. V. Dau, Hieu Trung Dao, Anh Tuan Nguyen, Hieu Trung Tran, Phong X. Nguyen, Nghi D. Q. Bui

    Abstract: Mainframe operating systems, despite their inception in the 1940s, continue to support critical sectors like finance and government. However, these systems are often viewed as outdated, requiring extensive maintenance and modernization. Addressing this challenge necessitates innovative tools that can understand and interact with legacy codebases. To this end, we introduce XMainframe, a state-of-th…

    Submitted 26 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  16. arXiv:2408.02816  [pdf, other]

    cs.SE

    CodeFlow: Program Behavior Prediction with Dynamic Dependencies Learning

    Authors: Cuong Chi Le, Hoang Nhat Phan, Huy Nhat Phan, Tien N. Nguyen, Nghi D. Q. Bui

    Abstract: Predicting program behavior without execution is a critical task in software engineering. Existing models often fall short in capturing the dynamic dependencies among program elements. To address this, we present CodeFlow, a novel machine learning-based approach that predicts code coverage and detects runtime errors by learning both static and dynamic dependencies within the code. By using control…

    Submitted 9 February, 2025; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: FORGE 2025

  17. arXiv:2406.11927  [pdf, other]

    cs.SE cs.AI

    On the Impacts of Contexts on Repository-Level Code Generation

    Authors: Nam Le Hai, Dung Manh Nguyen, Nghi D. Q. Bui

    Abstract: CodeLLMs have gained widespread adoption for code generation tasks, yet their capacity to handle repository-level code generation with complex contextual dependencies remains underexplored. Our work underscores the critical importance of leveraging repository-level contexts to generate executable and functionally correct code. We present RepoExec, a novel benchmark designed to evaluate repository-…

    Submitted 9 February, 2025; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted to NAACL 2025

  18. arXiv:2406.11912  [pdf, other]

    cs.SE cs.AI

    AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology

    Authors: Minh Huynh Nguyen, Thang Phan Chau, Phong X. Nguyen, Nghi D. Q. Bui

    Abstract: Software agents have emerged as promising tools for addressing complex software engineering tasks. Existing works, on the other hand, frequently oversimplify software development workflows, despite the fact that such workflows are typically more complex in the real world. Thus, we propose AgileCoder, a multi-agent system that integrates Agile Methodology (AM) into the framework. This system assign…

    Submitted 14 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: Work in progress

  19. arXiv:2404.04408  [pdf, other]

    cs.CE

    A novel section-section potential for short-range interactions between plane beams

    Authors: A. Borković, M. H. Gfrerer, R. A. Sauer, B. Marussig, T. Q. Bui

    Abstract: We derive a novel formulation for the interaction potential between deformable fibers due to short-range fields arising from intermolecular forces. The formulation improves the existing section-section interaction potential law for in-plane beams by considering an offset between interacting cross sections. The new law is asymptotically consistent, which is particularly beneficial for computational…

    Submitted 5 April, 2024; originally announced April 2024.

  20. arXiv:2403.14592  [pdf, other]

    cs.SE cs.AI cs.HC

    Envisioning the Next-Generation AI Coding Assistants: Insights & Proposals

    Authors: Khanh Nghiem, Anh Minh Nguyen, Nghi D. Q. Bui

    Abstract: As a research-product hybrid group in AI for Software Engineering (AI4SE), we present four key takeaways from our experience developing in-IDE AI coding assistants. AI coding assistants should set clear expectations for usage, integrate with advanced IDE capabilities and existing extensions, use extendable backend designs, and collect app data responsibly for downstream analyses. We propose open q…

    Submitted 21 March, 2024; originally announced March 2024.

  21. Solving Combinatorial Pricing Problems using Embedded Dynamic Programming Models

    Authors: Quang Minh Bui, Margarida Carvalho, José Neto

    Abstract: The combinatorial pricing problem (CPP) is a bilevel problem in which the leader maximizes their revenue by imposing tolls on certain items that they can control. Based on the tolls set by the leader, the follower selects a subset of items corresponding to an optimal solution of a combinatorial optimization problem. To accomplish the leader's goal, the tolls need to be sufficiently low to discoura…

    Submitted 29 March, 2025; v1 submitted 19 March, 2024; originally announced March 2024.

    MSC Class: 90C46; 90C27; 90C39

    Journal ref: INFORMS Journal on Computing, 2025

  22. arXiv:2403.06095  [pdf, other]

    cs.SE cs.AI

    RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion

    Authors: Huy N. Phan, Hoang N. Phan, Tien N. Nguyen, Nghi D. Q. Bui

    Abstract: Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fall short of fully understanding the extensive context of a project repository, such as the intricacies of relevant files and class hierarchies, which can result in less precise completions. To overcome these limitations, we present RepoHyper, a multifaceted framework designed…

    Submitted 14 August, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

  23. arXiv:2311.03366  [pdf, other]

    cs.SE cs.AI cs.LG

    Functional Overlap Reranking for Neural Code Generation

    Authors: Hung Quoc To, Minh Huynh Nguyen, Nghi D. Q. Bui

    Abstract: Code Large Language Models (CodeLLMs) have ushered in a new era in code generation advancements. However, selecting the best code solutions from all possible CodeLLM outputs remains a challenge. Previous methods often overlooked the intricate functional similarities and interactions between solution clusters. We introduce SRank, a novel reranking strategy for selecting the best solutions from code…

    Submitted 7 August, 2024; v1 submitted 16 October, 2023; originally announced November 2023.

    Comments: ACL 2024, Long Findings

  24. arXiv:2311.00993  [pdf, other]

    cs.LG

    Scalable Probabilistic Forecasting in Retail with Gradient Boosted Trees: A Practitioner's Approach

    Authors: Xueying Long, Quang Bui, Grady Oktavian, Daniel F. Schmidt, Christoph Bergmeir, Rakshitha Godahewa, Seong Per Lee, Kaifeng Zhao, Paul Condylis

    Abstract: The recent M5 competition has advanced the state-of-the-art in retail forecasting. However, we notice important differences between the competition challenge and the challenges we face in a large e-commerce company. The datasets in our scenario are larger (hundreds of thousands of time series), and e-commerce can afford to have a larger assortment than brick-and-mortar retailers, leading to more i…

    Submitted 2 November, 2023; originally announced November 2023.

  25. arXiv:2306.06347  [pdf, other]

    cs.SE

    DocChecker: Bootstrapping Code Large Language Model for Detecting and Resolving Code-Comment Inconsistencies

    Authors: Anh T. V. Dau, Jin L. C. Guo, Nghi D. Q. Bui

    Abstract: Comments within source code are essential for developers to comprehend the code's purpose and ensure its correct usage. However, as codebases evolve, maintaining an accurate alignment between the comments and the code becomes increasingly challenging. Recognizing the growing interest in automated solutions for detecting and correcting differences between code and its accompanying comments, current…

    Submitted 2 February, 2024; v1 submitted 10 June, 2023; originally announced June 2023.

    Journal ref: EACL 2024 - Demonstration track

  26. arXiv:2306.00029  [pdf, other]

    cs.SE cs.AI

    CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

    Authors: Nghi D. Q. Bui, Hung Le, Yue Wang, Junnan Li, Akhilesh Deepak Gotmare, Steven C. H. Hoi

    Abstract: Code intelligence plays a key role in transforming modern software engineering. Recently, deep learning-based models, especially Transformer-based large language models (LLMs), have demonstrated remarkable potential in tackling these tasks by leveraging massive open-source code data and programming language features. However, the development and deployment of such models often require expertise in…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: Ongoing work - Draft Preview

  27. arXiv:2305.07922  [pdf, other]

    cs.CL cs.LG cs.PL

    CodeT5+: Open Code Large Language Models for Code Understanding and Generation

    Authors: Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, Steven C. H. Hoi

    Abstract: Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks. The former paradigm is limi…

    Submitted 20 May, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

    Comments: 26 pages, preprint

  28. arXiv:2305.06156  [pdf, other]

    cs.CL cs.AI cs.PL cs.SE

    The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

    Authors: Dung Nguyen Manh, Nam Le Hai, Anh T. V. Dau, Anh Minh Nguyen, Khanh Nghiem, Jin Guo, Nghi D. Q. Bui

    Abstract: We present The Vault, a dataset of high-quality code-text pairs in multiple programming languages for training large language models to understand and generate code. We present methods for thoroughly extracting samples that use both rule-based and deep learning-based methods to ensure that they contain high-quality pairs of code and text, resulting in a dataset of 43 million high-quality code-text…

    Submitted 30 October, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2023, Long Findings

  29. arXiv:2305.01384  [pdf, other]

    cs.CL cs.LG

    Class based Influence Functions for Error Detection

    Authors: Thang Nguyen-Duc, Hoang Thanh-Tung, Quan Hung Tran, Dang Huu-Tien, Hieu Ngoc Nguyen, Anh T. V. Dau, Nghi D. Q. Bui

    Abstract: Influence functions (IFs) are a powerful tool for detecting anomalous examples in large scale datasets. However, they are unstable when applied to deep networks. In this paper, we provide an explanation for the instability of IFs and develop a solution to this problem. We show that IFs are unreliable when the two data points belong to two different classes. Our solution leverages class information…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: Thang Nguyen-Duc, Hoang Thanh-Tung, and Quan Hung Tran are co-first authors of this paper. 12 pages, 12 figures. Accepted to ACL 2023

  30. arXiv:2304.01228  [pdf, other]

    cs.CL cs.AI

    Better Language Models of Code through Self-Improvement

    Authors: Hung Quoc To, Nghi D. Q. Bui, Jin Guo, Tien N. Nguyen

    Abstract: Pre-trained language models for code (PLMCs) have gained attention in recent research. These models are pre-trained on large-scale datasets using multi-modal objectives. However, fine-tuning them requires extensive supervision and is limited by the size of the dataset provided. We aim to improve this issue by proposing a simple data augmentation framework. Our framework utilizes knowledge gained d…

    Submitted 9 May, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: Accepted to Findings, ACL 2023

  31. arXiv:2212.10723  [pdf, other]

    cs.AI

    Predict+Optimize Problem in Renewable Energy Scheduling

    Authors: Christoph Bergmeir, Frits de Nijs, Evgenii Genov, Abishek Sriramulu, Mahdi Abolghasemi, Richard Bean, John Betts, Quang Bui, Nam Trong Dinh, Nils Einecke, Rasul Esmaeilbeigi, Scott Ferraro, Priya Galketiya, Robert Glasgow, Rakshitha Godahewa, Yanfei Kang, Steffen Limmer, Luis Magdalena, Pablo Montero-Manso, Daniel Peralta, Yogesh Pipada Sunil Kumar, Alejandro Rosales-Pérez, Julian Ruddick, Akylas Stratigakos, Peter Stuckey, et al. (3 additional authors not shown)

    Abstract: Predict+Optimize frameworks integrate forecasting and optimization to address real-world challenges such as renewable energy scheduling, where variability and uncertainty are critical factors. This paper benchmarks solutions from the IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling, focusing on forecasting renewable production and demand and optimizing energy cost.…

    Submitted 14 April, 2025; v1 submitted 20 December, 2022; originally announced December 2022.

  32. arXiv:2211.14875  [pdf, other]

    cs.SE cs.CL

    Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5

    Authors: Nghi D. Q. Bui, Yue Wang, Steven Hoi

    Abstract: Automated software debugging is a crucial task for improving the productivity of software developers. Many neural-based techniques have been proven effective for debugging-related tasks such as bug localization and program repair (or bug fixing). However, these techniques often focus only on either one of them or approach them in a stage-wise manner, ignoring the mutual benefits between them. In t…

    Submitted 22 December, 2022; v1 submitted 27 November, 2022; originally announced November 2022.

    Comments: Accepted to EMNLP 2022 Findings Track

  33. arXiv:2208.06202  [pdf, other]

    cs.CV cs.LG eess.IV

    Image Translation Based Nuclei Segmentation for Immunohistochemistry Images

    Authors: Roger Trullo, Quoc-Anh Bui, Qi Tang, Reza Olfati-Saber

    Abstract: Numerous deep learning based methods have been developed for nuclei segmentation for H&E images and have achieved close to human performance. However, direct application of such methods to another modality of images, such as Immunohistochemistry (IHC) images, may not achieve satisfactory performance. Thus, we developed a Generative Adversarial Network (GAN) based approach to translate an IHC image…

    Submitted 12 August, 2022; originally announced August 2022.

  34. arXiv:2205.15479  [pdf, other]

    cs.SE cs.AI cs.PL

    HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations

    Authors: Minh Huynh Nguyen, Nghi D. Q. Bui, Truong Son Hy, Long Tran-Thanh, Tien N. Nguyen

    Abstract: We propose a novel method for code summarization utilizing Heterogeneous Code Representations (HCRs) and our specially designed HierarchyNet. HCRs effectively capture essential code features at lexical, syntactic, and semantic levels by abstracting coarse-grained code elements and incorporating fine-grained program elements in a hierarchical structure. Our HierarchyNet method processes each layer…

    Submitted 9 May, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  35. arXiv:2205.13022  [pdf, ps, other]

    cs.SE cs.AI cs.PL

    Towards Using Data-Influence Methods to Detect Noisy Samples in Source Code Corpora

    Authors: Anh T. V. Dau, Thang Nguyen-Duc, Hoang Thanh-Tung, Nghi D. Q. Bui

    Abstract: Despite the recent trend of developing and applying neural source code models to software engineering tasks, the quality of such models is insufficient for real-world use. This is because there could be noise in the source code corpora used to train such models. We adapt data-influence methods to detect such noises in this paper. Data-influence methods are used in machine learning to evaluate the…

    Submitted 2 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: The 37th IEEE/ACM International Conference on Automated Software Engineering

  36. arXiv:2203.10233  [pdf, other]

    cs.CV

    DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition

    Authors: Thanh-Dat Truong, Quoc-Huy Bui, Chi Nhan Duong, Han-Seok Seo, Son Lam Phung, Xin Li, Khoa Luu

    Abstract: Human action recognition has recently become one of the popular research topics in the computer vision community. Various 3D-CNN based methods have been presented to tackle both the spatial and temporal dimensions in the task of video action recognition with competitive results. However, these methods have suffered some fundamental limitations such as lack of robustness and generalization, e.g., h…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  37. arXiv:2112.11226   

    cs.LG cs.AI cs.PL cs.SE

    Energy-bounded Learning for Robust Models of Code

    Authors: Nghi D. Q. Bui, Yijun Yu

    Abstract: In programming, learning code representations has a variety of applications, including code classification, code search, comment generation, bug prediction, and so on. Various representations of code in terms of tokens, syntax trees, dependency graphs, code navigation paths, or a combination of their variants have been proposed, however, existing vanilla learning techniques have a major limitation…

    Submitted 9 May, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: There are some flaws in our experiments; we would like to fix them and publish a corrected version in the very near future

  38. arXiv:2108.06185  [pdf]

    cs.CV

    CNN-based Two-Stage Parking Slot Detection Using Region-Specific Multi-Scale Feature Extraction

    Authors: Quang Huy Bui, Jae Kyu Suhr

    Abstract: Autonomous parking systems start with the detection of available parking slots. Parking slot detection performance has been dramatically improved by deep learning techniques. Deep learning-based object detection methods can be categorized into one-stage and two-stage approaches. Although it is well-known that the two-stage approach outperforms the one-stage approach in general object detection, th…

    Submitted 13 August, 2021; originally announced August 2021.

  39. arXiv:2106.13405  [pdf, other]

    cs.CL

    JNLP Team: Deep Learning Approaches for Legal Processing Tasks in COLIEE 2021

    Authors: Ha-Thanh Nguyen, Phuong Minh Nguyen, Thi-Hai-Yen Vuong, Quan Minh Bui, Chau Minh Nguyen, Binh Tran Dang, Vu Tran, Minh Le Nguyen, Ken Satoh

    Abstract: COLIEE is an annual competition in automatic computerized legal text processing. Automatic legal document processing is an ambitious goal, and the structure and semantics of the law are often far more complex than everyday language. In this article, we survey and report our methods and experimental results in using deep learning in legal document processing. The results show the difficulties as we…

    Submitted 7 September, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: Also published in COLIEE 2021's proceedings

  40. arXiv:2106.13403  [pdf, other]

    cs.CL cs.AI

    ParaLaw Nets -- Cross-lingual Sentence-level Pretraining for Legal Text Processing

    Authors: Ha-Thanh Nguyen, Vu Tran, Phuong Minh Nguyen, Thi-Hai-Yen Vuong, Quan Minh Bui, Chau Minh Nguyen, Binh Tran Dang, Minh Le Nguyen, Ken Satoh

    Abstract: Ambiguity is a characteristic of natural language, which makes the expression of ideas flexible. However, in a domain that requires accurate statements, it becomes a barrier. Specifically, a single word can have many meanings and multiple words can have the same meaning. When translating a text into a foreign language, the translator needs to determine the exact meaning of each element in the original se…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: Also published in COLIEE 2021's proceedings

  41. A Catalog of Formulations for the Network Pricing Problem

    Authors: Quang Minh Bui, Bernard Gendron, Margarida Carvalho

    Abstract: We study the network pricing problem where the leader maximizes their revenue by determining the optimal amounts of tolls to charge on a set of arcs, under the assumption that the followers will react rationally and choose the shortest paths to travel. Many distinct single-level reformulations to this bilevel optimization program have been proposed, however, their relationship has not been establi…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: 35 pages, 7 figures

    MSC Class: 90C35 ACM Class: G.1.6

    Journal ref: INFORMS Journal on Computing 2022 34:5, 2658-2674

  42. Multigrid reduction preconditioning framework for coupled processes in porous and fractured media

    Authors: Quan M. Bui, Francois P. Hamon, Nicola Castelletto, Daniel Osei-Kuffuor, Randolph R. Settgast, Joshua A. White

    Abstract: Many subsurface engineering applications involve tight-coupling between fluid flow, solid deformation, fracturing, and similar processes. To better understand the complex interplay of different governing equations, and therefore design efficient and safe operations, numerical simulations are widely used. Given the relatively long time-scales of interest, fully-implicit time-stepping schemes are of…

    Submitted 30 July, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

    MSC Class: 65Z05; 65F08; 65F50

  43. arXiv:2012.07023  [pdf, other]

    cs.SE cs.AI cs.LG cs.PL

    InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees

    Authors: Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

    Abstract: Building deep learning models on source code has found many successful software engineering applications, such as code search, code comment generation, bug detection, code migration, and so on. Current learning techniques, however, have a major drawback that these models are mostly trained on datasets labeled for particular downstream tasks, and code representations may not be suitable for other t…

    Submitted 15 December, 2020; v1 submitted 13 December, 2020; originally announced December 2020.

    Comments: Accepted at ICSE 2021

  44. arXiv:2011.08071  [pdf, other]

    cs.CL cs.IR cs.LG

    JNLP Team: Deep Learning for Legal Processing in COLIEE 2020

    Authors: Ha-Thanh Nguyen, Hai-Yen Thi Vuong, Phuong Minh Nguyen, Binh Tran Dang, Quan Minh Bui, Sinh Trong Vu, Chau Minh Nguyen, Vu Tran, Ken Satoh, Minh Le Nguyen

    Abstract: We propose deep learning based methods for automatic systems of legal retrieval and legal question-answering in COLIEE 2020. These systems are all characterized by being pre-trained on large amounts of data before being finetuned for the specified tasks. This approach helps to overcome the data scarcity and achieve good performance, thus can be useful for tackling related problems in information r…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: Also be published in JURISIN2020

  45. arXiv:2009.09777  [pdf, other]

    cs.SE cs.AI cs.PL

    TreeCaps: Tree-Based Capsule Networks for Source Code Processing

    Authors: Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

    Abstract: Recently program learning techniques have been proposed to process source code based on syntactical structures (e.g., Abstract Syntax Trees) and/or semantic information (e.g., Dependency Graphs). Although graphs may be better at capturing various viewpoints of code semantics than trees, constructing graph inputs from code needs static code semantic analysis that may not be accurate and introduces…

    Submitted 14 December, 2020; v1 submitted 5 September, 2020; originally announced September 2020.

    Comments: Accepted at AAAI 2021

  46. arXiv:2009.02731  [pdf, other]

    cs.SE cs.AI cs.LG cs.PL

    Self-Supervised Contrastive Learning for Code Retrieval and Summarization via Semantic-Preserving Transformations

    Authors: Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

    Abstract: We propose Corder, a self-supervised contrastive learning framework for source code model. Corder is designed to alleviate the need of labeled data for code retrieval and code summarization tasks. The pre-trained model of Corder can be used in two ways: (1) it can produce vector representation of code which can be applied to code retrieval tasks that do not have labeled data; (2) it can be used in…

    Submitted 23 May, 2021; v1 submitted 6 September, 2020; originally announced September 2020.

    Comments: Accepted at SIGIR 2021

  47. On the Generalizability of Neural Program Models with respect to Semantic-Preserving Program Transformations

    Authors: Md Rafiqul Islam Rabin, Nghi D. Q. Bui, Ke Wang, Yijun Yu, Lingxiao Jiang, Mohammad Amin Alipour

    Abstract: With the prevalence of publicly available source code repositories to train deep neural network models, neural program models can do well in source code analysis tasks such as predicting method names in given programs that cannot be easily done by traditional program analysis techniques. Although such neural program models have been tested on various existing datasets, the extent to which they gen…

    Submitted 18 March, 2021; v1 submitted 31 July, 2020; originally announced August 2020.

    Comments: Information and Software Technology, IST Journal 2021, Elsevier. Related to arXiv:2004.07313

  48. arXiv:1910.12306  [pdf, ps, other]

    cs.LG cs.SE stat.ML

    TreeCaps: Tree-Structured Capsule Networks for Program Source Code Processing

    Authors: Vinoj Jayasundara, Nghi Duy Quoc Bui, Lingxiao Jiang, David Lo

    Abstract: Program comprehension is a fundamental task in software development and maintenance processes. Software developers often need to understand a large amount of existing code before they can develop new features or fix bugs in existing programs. Being able to process programming language code automatically and provide summaries of code functionality accurately can significantly help developers to red…

    Submitted 27 October, 2019; originally announced October 2019.

    Comments: in NeurIPS Workshop on ML for Systems, 2019

  49. arXiv:1906.03835  [pdf, other]

    cs.LG cs.SE stat.ML

    SAR: Learning Cross-Language API Mappings with Little Knowledge

    Authors: Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

    Abstract: To save manual effort, developers often translate programs from one programming language to another, instead of implementing it from scratch. Translating application program interfaces (APIs) used in one language to functionally equivalent ones available in another language is an important aspect of program translation. Existing approaches facilitate the translation by automatically identifying th…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: Accepted at the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)

  50. arXiv:1804.09322  [pdf]

    cs.CV

    Robust Anomaly-Based Ship Proposals Detection from Pan-sharpened High-Resolution Satellite Image

    Authors: Viet Hung Luu, Nguyen Hoang Hoa Luong, Quang Hung Bui, Thi Nhat Thanh Nguyen

    Abstract: Pre-screening of ship proposals is now employed by top ship detectors to avoid exhaustive search across image. In very high resolution (VHR) optical image, ships appeared as a cluster of abnormal bright pixels in open sea clutter (noise-like background). Anomaly-based detector utilizing Panchromatic (PAN) data has been widely used in many researches to detect ships, however, still facing two main…

    Submitted 24 April, 2018; originally announced April 2018.