Showing 1–19 of 19 results for author: Nguyen, M L

  1. arXiv:2506.18318

    cs.CL

    Enhancing Entity Aware Machine Translation with Multi-task Learning

    Authors: An Trieu, Phuong Nguyen, Minh Le Nguyen

    Abstract: Entity-aware machine translation (EAMT) is a complicated task in natural language processing due to not only the shortage of translation data related to the entities needed to translate but also the complexity in the context needed to process while translating those entities. In this paper, we propose a method that applies multi-task learning to optimize the performance of the two subtasks named e…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: In the Proceedings of SCIDOCA 2025

  2. arXiv:2506.18316

    cs.IR cs.CL

    Team LA at SCIDOCA shared task 2025: Citation Discovery via relation-based zero-shot retrieval

    Authors: Trieu An, Long Nguyen, Minh Le Nguyen

    Abstract: The Citation Discovery Shared Task focuses on predicting the correct citation from a given candidate pool for a given paragraph. The main challenges stem from the length of the abstract paragraphs and the high similarity among candidate abstracts, making it difficult to determine the exact paper to cite. To address this, we develop a system that first retrieves the top-k most similar abstracts bas…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: In the Proceedings of SCIDOCA 2025

  3. arXiv:2506.18311

    cs.IR cs.CL

    Enhancing Document Retrieval in COVID-19 Research: Leveraging Large Language Models for Hidden Relation Extraction

    Authors: Hoang-An Trieu, Dinh-Truong Do, Chau Nguyen, Vu Tran, Minh Le Nguyen

    Abstract: In recent years, with the appearance of the COVID-19 pandemic, numerous publications relevant to this disease have been issued. Because of the massive volume of publications, an efficient retrieval system is necessary to provide researchers with useful information if an unexpected pandemic happens so suddenly, like COVID-19. In this work, we present a method to help the retrieval system, the Covre…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: In the Proceedings of SCIDOCA 2024

  4. arXiv:2506.02529

    cs.SE cs.AI cs.CL

    Automated Web Application Testing: End-to-End Test Case Generation with Large Language Models and Screen Transition Graphs

    Authors: Nguyen-Khang Le, Quan Minh Bui, Minh Ngoc Nguyen, Hiep Nguyen, Trung Vo, Son T. Luu, Shoshin Nomura, Minh Le Nguyen

    Abstract: Web applications are critical to modern software ecosystems, yet ensuring their reliability remains challenging due to the complexity and dynamic nature of web interfaces. Recent advances in large language models (LLMs) have shown promise in automating complex tasks, but limitations persist in handling dynamic navigation flows and complex form interactions. This paper presents an automated system…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Published in the Proceedings of JSAI 2025

    ACM Class: I.2.7

  5. arXiv:2505.13244

    cs.CL cs.LG

    JNLP at SemEval-2025 Task 11: Cross-Lingual Multi-Label Emotion Detection Using Generative Models

    Authors: Jieying Xue, Phuong Minh Nguyen, Minh Le Nguyen, Xin Liu

    Abstract: With the rapid advancement of global digitalization, users from different countries increasingly rely on social media for information exchange. In this context, multilingual multi-label emotion detection has emerged as a critical research area. This study addresses SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection. Our paper focuses on two sub-tracks of this task: (1) Track A:…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Published in The 19th International Workshop on Semantic Evaluation (SemEval-2025)

  6. arXiv:2504.02283

    cs.LG

    Ga$_2$O$_3$ TCAD Mobility Parameter Calibration using Simulation Augmented Machine Learning with Physics Informed Neural Network

    Authors: Le Minh Long Nguyen, Edric Ong, Matthew Eng, Yuhao Zhang, Hiu Yung Wong

    Abstract: In this paper, we demonstrate the possibility of performing automatic Technology Computer-Aided-Design (TCAD) parameter calibration using machine learning, verified with experimental data. The machine only needs to be trained by TCAD data. Schottky Barrier Diode (SBD) fabricated with emerging ultra-wide-bandgap material, Gallium Oxide (Ga$_2$O$_3$), is measured and its current-voltage (IV) is used…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 4 pages, 3 figures

  7. arXiv:2408.12959

    cs.CL cs.AI

    Multimodal Contrastive In-Context Learning

    Authors: Yosuke Miyanishi, Minh Le Nguyen

    Abstract: The rapid growth of Large Language Models (LLMs) usage has highlighted the importance of gradient-free in-context learning (ICL). However, interpreting their inner workings remains challenging. This paper introduces a novel multimodal contrastive in-context learning framework to enhance our understanding of ICL in LLMs. First, we present a contrastive learning-based interpretation of ICL in real-w…

    Submitted 23 August, 2024; originally announced August 2024.

  8. arXiv:2405.08311

    cs.CL cs.AI

    A Decoupling and Aggregating Framework for Joint Extraction of Entities and Relations

    Authors: Yao Wang, Xin Liu, Weikun Kong, Hai-Tao Yu, Teeradaj Racharak, Kyoung-Sook Kim, Minh Le Nguyen

    Abstract: Named Entity Recognition and Relation Extraction are two crucial and challenging subtasks in the field of Information Extraction. Despite the successes achieved by the traditional approaches, fundamental research questions remain open. First, most recent studies use parameter sharing for a single subtask or shared features for both two subtasks, ignoring their semantic differences. Second, informa…

    Submitted 14 May, 2024; originally announced May 2024.

  9. On the variants of SVM methods applied to GPR data to classify tack coat characteristics in French pavements: two experimental case studies

    Authors: Grégory Andreoli, Amine Ihamouten, Mai Lan Nguyen, Yannick Fargier, Cyrille Fauchard, Jean-Michel Simonin, Viktoriia Buliuk, David Souriou, Xavier Dérobert

    Abstract: Among the commonly used non-destructive techniques, the Ground Penetrating Radar (GPR) is one of the most widely adopted today for assessing pavement conditions in France. However, conventional radar systems and their forward processing methods have shown their limitations for the physical and geometrical characterization of very thin layers such as tack coats. However, the use of Machine Learning…

    Submitted 6 December, 2023; originally announced December 2023.

    Journal ref: 2023 12th International Workshop on Advanced Ground Penetrating Radar (IWAGPR), LNEC, Jul 2023, Lisbon, Portugal. pp.1-5

  10. Encoded Summarization: Summarizing Documents into Continuous Vector Space for Legal Case Retrieval

    Authors: Vu Tran, Minh Le Nguyen, Satoshi Tojo, Ken Satoh

    Abstract: We present our method for tackling a legal case retrieval task by introducing our method of encoding documents by summarizing them into continuous vector space via our phrase scoring framework utilizing deep neural networks. On the other hand, we explore the benefits from combining lexical features and latent features generated with neural networks. Our experiments show that lexical features and l…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Published 2020-01-25 in AI and Law. arXiv admin note: text overlap with arXiv:2009.14083

  11. arXiv:2308.11585

    cs.AI cs.CL

    Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes

    Authors: Yosuke Miyanishi, Minh Le Nguyen

    Abstract: Amidst the rapid expansion of Machine Learning (ML) and Large Language Models (LLMs), understanding the semantics within their mechanisms is vital. Causal analyses define semantics, while gradient-based methods are essential to eXplainable AI (XAI), interpreting the model's 'black box'. Integrating these, we investigate how a model's mechanisms reveal its causal effect on evidence-based decision-m…

    Submitted 23 March, 2024; v1 submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted to LREC-COLING 2024

  12. arXiv:2211.02200

    cs.CL cs.AI

    Miko Team: Deep Learning Approach for Legal Question Answering in ALQAC 2022

    Authors: Hieu Nguyen Van, Dat Nguyen, Phuong Minh Nguyen, Minh Le Nguyen

    Abstract: We introduce efficient deep learning-based methods for legal document processing including Legal Document Retrieval and Legal Question Answering tasks in the Automated Legal Question Answering Competition (ALQAC 2022). In this competition, we achieve 1st place in the first task and 3rd place in the second task. Our method is based on the XLM-RoBERTa model that i…

    Submitted 3 November, 2022; originally announced November 2022.

  13. arXiv:2106.13405

    cs.CL

    JNLP Team: Deep Learning Approaches for Legal Processing Tasks in COLIEE 2021

    Authors: Ha-Thanh Nguyen, Phuong Minh Nguyen, Thi-Hai-Yen Vuong, Quan Minh Bui, Chau Minh Nguyen, Binh Tran Dang, Vu Tran, Minh Le Nguyen, Ken Satoh

    Abstract: COLIEE is an annual competition in automatic computerized legal text processing. Automatic legal document processing is an ambitious goal, and the structure and semantics of the law are often far more complex than everyday language. In this article, we survey and report our methods and experimental results in using deep learning in legal document processing. The results show the difficulties as we…

    Submitted 7 September, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: Also published in COLIEE 2021's proceeding

  14. arXiv:2106.13403

    cs.CL cs.AI

    ParaLaw Nets -- Cross-lingual Sentence-level Pretraining for Legal Text Processing

    Authors: Ha-Thanh Nguyen, Vu Tran, Phuong Minh Nguyen, Thi-Hai-Yen Vuong, Quan Minh Bui, Chau Minh Nguyen, Binh Tran Dang, Minh Le Nguyen, Ken Satoh

    Abstract: Ambiguity is a characteristic of natural language, which makes expression ideas flexible. However, in a domain that requires accurate statements, it becomes a barrier. Specifically, a single word can have many meanings and multiple words can have the same meaning. When translating a text into a foreign language, the translator needs to determine the exact meaning of each element in the original se…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: Also published in COLIEE 2021's Proceeding

  15. arXiv:2011.08071

    cs.CL cs.IR cs.LG

    JNLP Team: Deep Learning for Legal Processing in COLIEE 2020

    Authors: Ha-Thanh Nguyen, Hai-Yen Thi Vuong, Phuong Minh Nguyen, Binh Tran Dang, Quan Minh Bui, Sinh Trong Vu, Chau Minh Nguyen, Vu Tran, Ken Satoh, Minh Le Nguyen

    Abstract: We propose deep learning based methods for automatic systems of legal retrieval and legal question-answering in COLIEE 2020. These systems are all characterized by being pre-trained on large amounts of data before being finetuned for the specified tasks. This approach helps to overcome the data scarcity and achieve good performance, thus can be useful for tackling related problems in information r…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: Also be published in JURISIN2020

  16. Building Legal Case Retrieval Systems with Lexical Matching and Summarization using A Pre-Trained Phrase Scoring Model

    Authors: Vu Tran, Minh Le Nguyen, Ken Satoh

    Abstract: We present our method for tackling the legal case retrieval task of the Competition on Legal Information Extraction/Entailment 2019. Our approach is based on the idea that summarization is important for retrieval. On one hand, we adopt a summarization based model called encoded summarization which encodes a given document into continuous vector space which embeds the summary properties of the docu…

    Submitted 29 September, 2020; originally announced September 2020.

  17. arXiv:1809.05219

    cs.CL

    Automatic Catchphrase Extraction from Legal Case Documents via Scoring using Deep Neural Networks

    Authors: Vu Tran, Minh Le Nguyen, Ken Satoh

    Abstract: In this paper, we present a method of automatic catchphrase extracting from legal case documents. We utilize deep neural networks for constructing scoring model of our extraction system. We achieve comparable performance with systems using corpus-wide and citation information which we do not use in our system.

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: MIning and REasoning with Legal text, MIREL 2018

  18. arXiv:1802.04986

    cs.SE cs.LG cs.NE

    Convolutional Neural Networks over Control Flow Graphs for Software Defect Prediction

    Authors: Anh Viet Phan, Minh Le Nguyen, Lam Thu Bui

    Abstract: Existing defects in software components is unavoidable and leads to not only a waste of time and money but also many serious consequences. To build predictive models, previous studies focus on manually extracting features or using tree representations of programs, and exploiting different machine learning algorithms. However, the performance of the models is not high since the existing features an…

    Submitted 14 February, 2018; originally announced February 2018.

    Comments: presented at ICTAI 2017

    Journal ref: ICTAI 2017

  19. arXiv:1609.00799

    cs.IR cs.CL

    Lexical-Morphological Modeling for Legal Text Analysis

    Authors: Danilo S. Carvalho, Minh-Tien Nguyen, Tran Xuan Chien, Minh Le Nguyen

    Abstract: In the context of the Competition on Legal Information Extraction/Entailment (COLIEE), we propose a method comprising the necessary steps for finding relevant documents to a legal question and deciding on textual entailment evidence to provide a correct answer. The proposed method is based on the combination of several lexical and morphological characteristics, to build a language model and a set…

    Submitted 3 September, 2016; originally announced September 2016.

    Comments: 16 pages, 5 figures, Lecture notes in computer science: New Frontiers in Artificial Intelligence, 2016/03

    MSC Class: 14J30 (Primary)

    ACM Class: H.3, H.3.3, I.2.7