
Showing 1–50 of 76 results for author: Dam, H

  1. arXiv:2505.23848  [pdf, ps, other]

    cs.CL cs.LG

    Derailing Non-Answers via Logit Suppression at Output Subspace Boundaries in RLHF-Aligned Language Models

    Authors: Harvey Dam, Jonas Knochelmann, Vinu Joseph, Ganesh Gopalakrishnan

    Abstract: We introduce a method to reduce refusal rates of large language models (LLMs) on sensitive content without modifying model weights or prompts. Motivated by the observation that refusals in certain models were often preceded by the specific token sequence of a token marking the beginning of the chain-of-thought (CoT) block (<think>) followed by a double newline token (\n\n), we investigate the impa…

    Submitted 28 May, 2025; originally announced May 2025.

  2. arXiv:2504.04202  [pdf, other]

    cs.LG

    Directional Sign Loss: A Topology-Preserving Loss Function that Approximates the Sign of Finite Differences

    Authors: Harvey Dam, Tripti Agarwal, Ganesh Gopalakrishnan

    Abstract: Preserving critical topological features in learned latent spaces is a fundamental challenge in representation learning, particularly for topology-sensitive data. This paper introduces directional sign loss (DSL), a novel loss function that approximates the number of mismatches in the signs of finite differences between corresponding elements of two arrays. By penalizing discrepancies in critical…

    Submitted 8 May, 2025; v1 submitted 5 April, 2025; originally announced April 2025.

    ACM Class: I.2.6

  3. arXiv:2502.14631  [pdf, other]

    cs.LG

    Synergistic Fusion of Multi-Source Knowledge via Evidence Theory for High-Entropy Alloy Discovery

    Authors: Minh-Quyet Ha, Dinh-Khiet Le, Duc-Anh Dao, Tien-Sinh Vu, Duong-Nguyen Nguyen, Viet-Cuong Nguyen, Hiori Kino, Van-Nam Huynh, Hieu-Chi Dam

    Abstract: Discovering novel high-entropy alloys (HEAs) with desirable properties is challenging due to the vast compositional space and complex phase formation mechanisms. Efficient exploration of this space requires a strategic approach that integrates heterogeneous knowledge sources. Here, we propose a framework that systematically combines knowledge extracted from computational material datasets with dom…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 13 pages, 7 figures

  4. arXiv:2501.00779  [pdf, other]

    cs.SI cs.AI

    REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization

    Authors: Huyen Nguyen, Hieu Dam, Nguyen Do, Cong Tran, Cuong Pham

    Abstract: In social online platforms, identifying influential seed users to maximize influence spread is crucial, as it can greatly diminish the cost and effort required for information dissemination. While effective, traditional methods for Multiplex Influence Maximization (MIM) have reached their performance limits, prompting the emergence of learning-based approaches. These novel methods aim for better…

    Submitted 1 January, 2025; originally announced January 2025.

  5. arXiv:2410.03069  [pdf, other]

    cs.SE

    Interactive GDPR-Compliant Privacy Policy Generation for Software Applications

    Authors: Pattaraporn Sangaroonsilp, Hoa Khanh Dam, Omar Haggag, John Grundy

    Abstract: Software applications are designed to assist users in conducting a wide range of tasks or interactions. They have become prevalent and play an integral part in people's lives in this digital era. To use those software applications, users are sometimes requested to provide their personal information. As privacy has become a significant concern and many data protection regulations exist worldwide, s…

    Submitted 3 October, 2024; originally announced October 2024.

  6. arXiv:2408.17244  [pdf, other]

    cs.LG

    Categorical data clustering: 25 years beyond K-modes

    Authors: Tai Dinh, Wong Hauchi, Philippe Fournier-Viger, Daniil Lisik, Minh-Quyet Ha, Hieu-Chi Dam, Van-Nam Huynh

    Abstract: The clustering of categorical data is a common and important task in computer science, offering profound implications across a spectrum of applications. Unlike purely numerical data, categorical data often lack inherent ordering as in nominal data, or have varying levels of order as in ordinal data, thus requiring specialized methodologies for efficient organization and analysis. This review provi…

    Submitted 24 January, 2025; v1 submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted at Expert Systems With Applications

  7. What Operations can be Performed Directly on Compressed Arrays, and with What Error?

    Authors: Tripti Agarwal, Harvey Dam, Dorra Ben Khalifa, Matthieu Martel, P. Sadayappan, Ganesh Gopalakrishnan

    Abstract: In response to the rapidly escalating costs of computing with large matrices and tensors caused by data movement, several lossy compression methods have been developed to significantly reduce data volumes. Unfortunately, all these methods require the data to be decompressed before further computations are done. In this work, we develop a lossy compressor that allows a dozen fairly fundamental oper…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: An extended but earlier version of paper in https://dl.acm.org/doi/10.1145/3624062.3625122 published at the DRBSD Workshop in 2023

  8. arXiv:2402.14941  [pdf]

    physics.chem-ph

    Density Functional Theory Calculations of the thermochemistry of the dehydration of 2-propanol

    Authors: Eugene Stephane Mananga, Aissata Diop, Paulin Dongomale, Fambougouri Diane, Hubertus van Dam

    Abstract: Electronic structure theory provides a foundation for understanding chemical transformations and processes in complex chemical environments. Our work is focused on the NWChemEx project, which has selected two interrelated science challenges that address the production of advanced biomass-derived fuels and other value-added chemical compounds. One of these is the dehydration of 2-propanol over a zeol…

    Submitted 18 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Added page numbers

  9. arXiv:2402.10291  [pdf, other]

    cs.LG stat.ML

    An Evaluation of Real-time Adaptive Sampling Change Point Detection Algorithm using KCUSUM

    Authors: Vijayalakshmi Saravanan, Perry Siehien, Shinjae Yoo, Hubertus Van Dam, Thomas Flynn, Christopher Kelly, Khaled Z Ibrahim

    Abstract: Detecting abrupt changes in real-time data streams from scientific simulations presents a challenging task, demanding the deployment of accurate and efficient algorithms. Identifying change points in a live data stream involves continuous scrutiny of incoming observations for deviations in their statistical characteristics, particularly in high-volume data scenarios. Maintaining a balance between su…

    Submitted 4 April, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 16 pages. arXiv admin note: text overlap with arXiv:1903.01661

    MSC Class: CCS

  10. arXiv:2311.05782  [pdf, other]

    cs.DC

    MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications

    Authors: Bo Fang, Xinyi Li, Harvey Dam, Cheng Tan, Siva Kumar Sastry Hari, Timothy Tsai, Ignacio Laguna, Dingwen Tao, Ganesh Gopalakrishnan, Prashant Nair, Kevin Barker, Ang Li

    Abstract: Emerging deep learning workloads urgently need fast general matrix multiplication (GEMM). To meet such demand, one of the critical features of machine-learning-specific accelerators such as NVIDIA Tensor Cores, AMD Matrix Cores, and Google TPUs is the support of mixed-precision enabled GEMM. For DNN models, lower-precision FP data formats and computation offer acceptable correctness but significan…

    Submitted 9 November, 2023; originally announced November 2023.

  11. arXiv:2308.01921  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Transferable Graph Neural Fingerprint Models for Quick Response to Future Bio-Threats

    Authors: Wei Chen, Yihui Ren, Ai Kagawa, Matthew R. Carbone, Samuel Yen-Chi Chen, Xiaohui Qu, Shinjae Yoo, Austin Clyde, Arvind Ramanathan, Rick L. Stevens, Hubertus J. J. van Dam, Deyu Lu

    Abstract: Fast screening of drug molecules based on the ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprint is a promising method for developing molecular docking surrogates with high throughput and great fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we…

    Submitted 14 September, 2023; v1 submitted 17 July, 2023; originally announced August 2023.

    Comments: 8 pages, 5 figures, 2 tables, accepted by ICLMA2023

    ACM Class: I.2.1

  12. arXiv:2307.11305  [pdf, other]

    cs.SE

    Quantum Software Analytics: Opportunities and Challenges

    Authors: Thong Hoang, Hoa Khanh Dam, Tingting Bi, Qinghua Lu, Zhenchang Xing, Liming Zhu, Lam Duc Nguyen, Shiping Chen

    Abstract: Quantum computing systems depend on the principles of quantum mechanics to perform multiple challenging tasks more efficiently than their classical counterparts. In classical software engineering, the software life cycle is used to document and structure the processes of design, implementation, and maintenance of software applications. It helps stakeholders understand how to build an application.…

    Submitted 20 July, 2023; originally announced July 2023.

  13. arXiv:2306.06238  [pdf, other]

    cs.LG cs.AI cs.CV

    Understanding the Effect of the Long Tail on Neural Network Compression

    Authors: Harvey Dam, Vinu Joseph, Aditya Bhaskara, Ganesh Gopalakrishnan, Saurav Muralidharan, Michael Garland

    Abstract: Network compression is now a mature sub-field of neural network research: over the last decade, significant progress has been made towards reducing the size of models and speeding up inference, while maintaining the classification accuracy. However, many works have observed that focusing on just the overall accuracy can be misguided. E.g., it has been shown that mismatches between the full and com…

    Submitted 27 June, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  14. Workflows Community Summit 2022: A Roadmap Revolution

    Authors: Rafael Ferreira da Silva, Rosa M. Badia, Venkat Bala, Debbie Bard, Peer-Timo Bremer, Ian Buckley, Silvina Caino-Lores, Kyle Chard, Carole Goble, Shantenu Jha, Daniel S. Katz, Daniel Laney, Manish Parashar, Frederic Suter, Nick Tyler, Thomas Uram, Ilkay Altintas, Stefan Andersson, William Arndt, Juan Aznar, Jonathan Bader, Bartosz Balis, Chris Blanton, Kelly Rosa Braghetto, Aharon Brodutch, et al. (80 additional authors not shown)

    Abstract: Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and t…

    Submitted 31 March, 2023; originally announced April 2023.

    Report number: ORNL/TM-2023/2885

  15. arXiv:2301.10927  [pdf, other]

    cs.AI

    Towards Knowledge-Centric Process Mining

    Authors: Asjad Khan, Arsal Huda, Aditya Ghose, Hoa Khanh Dam

    Abstract: Process analytic approaches play a critical role in supporting the practice of business process management and continuous process improvement by leveraging process-related data to identify performance bottlenecks, extracting insights about reducing costs and optimizing the utilization of available resources. Process analytic techniques often have to contend with real-world settings where available…

    Submitted 25 January, 2023; originally announced January 2023.

  16. arXiv:2301.10398  [pdf, other]

    cs.OH

    Advances in Process Optimization: A Comprehensive Survey of Process Mining, Predictive Process Monitoring, and Process-Aware Recommender Systems

    Authors: Asjad Khan, Aditya Ghose, Hoa Dam, Arsal Syed

    Abstract: Process analytics approaches allow organizations to support the practice of Business Process Management and continuous improvement by leveraging all process-related data to extract knowledge, improve process performance and support decision-making across the organization. Process execution data once collected will contain hidden insights and actionable knowledge that are of considerable business v…

    Submitted 23 February, 2025; v1 submitted 24 January, 2023; originally announced January 2023.

  17. arXiv:2205.05976  [pdf, other]

    cs.IR cs.CV

    TaDeR: A New Task Dependency Recommendation for Project Management Platform

    Authors: Quynh Nguyen, Dac H. Nguyen, Son T. Huynh, Hoa K. Dam, Binh T. Nguyen

    Abstract: Many startups and companies worldwide have been using project management software and tools to monitor, track and manage their projects. For software projects, the number of tasks from the beginning to the end is quite a large number that sometimes takes a lot of time and effort to search and link the current task to a group of previous ones for further references. This paper proposes an efficient…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: 28 pages, 1 figure, 18 tables

  18. arXiv:2205.00829  [pdf, ps, other]

    cs.AI cond-mat.mtrl-sci

    Function Decomposition Tree with Causality-First Perspective and Systematic Description of Problems in Materials Informatics

    Authors: Hiori Kino, Hieu-Chi Dam, Takashi Miyake, Riichiro Mizoguchi

    Abstract: As interdisciplinary science is flourishing because of materials informatics and additional factors, a systematic way is required for expressing knowledge and facilitating communication between scientists in various fields. A function decomposition tree is such a representation, but domain scientists face difficulty in constructing it. Thus, this study cites the general problems encountered by beg…

    Submitted 26 April, 2022; originally announced May 2022.

    Comments: 41 pages, 13 figures

  19. arXiv:2204.14159  [pdf, other]

    cs.CR

    Symbolic analysis meets federated learning to enhance malware identifier

    Authors: Khanh Huu The Dam, Charles-Henry Bertrand Van Ouytsel, Axel Legay

    Abstract: Over the past years, manual methods for creating detection rules have become impractical in anti-malware products as the number of malware threats has grown. Thus, turning to machine learning approaches is a promising way to make malware recognition more efficient. Traditional centralized machine learning requires a large amount of data to train a model with excellent per…

    Submitted 29 April, 2022; originally announced April 2022.

  20. arXiv:2112.13997  [pdf, other]

    cs.SE

    On Privacy Weaknesses and Vulnerabilities in Software Systems

    Authors: Pattaraporn Sangaroonsilp, Hoa Khanh Dam, Aditya Ghose

    Abstract: In this digital era, our privacy is under constant threat as our personal data and traceable online/offline activities are frequently collected, processed and transferred by many software applications. Privacy attacks are often formed by exploiting vulnerabilities found in those software applications. The Common Weakness Enumeration (CWE) and Common Vulnerabilities and Exposures (CVE) systems are…

    Submitted 9 February, 2023; v1 submitted 28 December, 2021; originally announced December 2021.

  21. arXiv:2112.13994  [pdf, other]

    cs.CR cs.SE

    Mining and Classifying Privacy and Data Protection Requirements in Issue Reports

    Authors: Pattaraporn Sangaroonsilp, Hoa Khanh Dam, Morakot Choetkiertikul, Chaiyong Ragkhitwetsagul, Aditya Ghose

    Abstract: Digital and physical footprints are a trail of user activities collected over the use of software applications and systems. As software becomes ubiquitous, protecting user privacy has become challenging. With the increase of user privacy awareness and advent of privacy regulations and policies, there is an emerging need to implement software systems that enhance the protection of personal data pro…

    Submitted 27 March, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2101.01298

  22. arXiv:2102.02584  [pdf, other]

    cs.SE

    Human Values in Software Release Planning

    Authors: Davoud Mougouei, Aditya Ghose, Hoa Dam, David Powers

    Abstract: Software products have become an integral part of human lives, and therefore need to account for human values such as privacy, fairness, and equality. Ignoring human values in software development leads to biases and violations of human values: racial biases in recidivism assessment and facial recognition software are well-known examples of such issues. One of the most critical steps in software d…

    Submitted 4 February, 2021; originally announced February 2021.

  23. arXiv:2101.01298  [pdf, other]

    cs.SE

    A Taxonomy for Mining and Classifying Privacy Requirements in Issue Reports

    Authors: Pattaraporn Sangaroonsilp, Hoa Khanh Dam, Morakot Choetkiertikul, Chaiyong Ragkhitwetsagul, Aditya Ghose

    Abstract: Context: Digital and physical trails of user activities are collected over the use of software applications and systems. As software becomes ubiquitous, protecting user privacy has become challenging. With the increase of user privacy awareness and advent of privacy regulations and policies, there is an emerging need to implement software systems that enhance the protection of personal data proces…

    Submitted 5 February, 2023; v1 submitted 4 January, 2021; originally announced January 2021.

    Comments: Accepted at Journal of Information and Software Technology

  24. A Framework for Conditional Statement Technical Debt Identification and Description

    Authors: Abdulaziz Alhefdhi, Hoa Khanh Dam, Yusuf Sulistyo Nugroho, Hideaki Hata, Takashi Ishio, Aditya Ghose

    Abstract: Technical Debt occurs when development teams favour short-term operability over long-term stability. Since this places software maintainability at risk, technical debt requires early attention to avoid paying for accumulated interest. Most of the existing work focuses on detecting technical debt using code comments, known as Self-Admitted Technical Debt (SATD). However, there are many cases where…

    Submitted 13 October, 2022; v1 submitted 22 December, 2020; originally announced December 2020.

    Journal ref: Autom Softw Eng 29, 60 (2022)

  25. arXiv:2012.11060  [pdf, other]

    cs.SE

    Adversarial Patch Generation for Automated Program Repair

    Authors: Abdulaziz Alhefdhi, Hoa Khanh Dam, Thanh Le-Cong, Bach Le, Aditya Ghose

    Abstract: Automated Program Repair has attracted significant research in recent years, leading to diverse techniques that focus on two main directions: search-based and semantic-based program repair. The former techniques often face challenges due to the vast search space, resulting in difficulties in identifying correct solutions, while the latter approaches are constrained by the capabilities of the under…

    Submitted 3 September, 2023; v1 submitted 20 December, 2020; originally announced December 2020.

  26. arXiv:2010.10517  [pdf, other]

    cs.DC cs.CE

    Scalable HPC and AI Infrastructure for COVID-19 Therapeutics

    Authors: Hyungro Lee, Andre Merzky, Li Tan, Mikhail Titov, Matteo Turilli, Dario Alfe, Agastya Bhati, Alex Brace, Austin Clyde, Peter Coveney, Heng Ma, Arvind Ramanathan, Rick Stevens, Anda Trifan, Hubertus Van Dam, Shunzhou Wan, Sean Wilkinson, Shantenu Jha

    Abstract: COVID-19 has claimed more than 1 million lives and resulted in over 40 million infections. There is an urgent need to identify drugs that can inhibit SARS-CoV-2. In response, the DOE recently established the Medical Therapeutics project as part of the National Virtual Biotechnology Laboratory, and tasked it with creating the computational infrastructure and methods necessary to advance therapeutics dev…

    Submitted 20 October, 2020; originally announced October 2020.

  27. arXiv:2010.06574  [pdf, other]

    cs.DC cs.CE q-bio.QM

    IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads

    Authors: Aymen Al Saadi, Dario Alfe, Yadu Babuji, Agastya Bhati, Ben Blaiszik, Thomas Brettin, Kyle Chard, Ryan Chard, Peter Coveney, Anda Trifan, Alex Brace, Austin Clyde, Ian Foster, Tom Gibbs, Shantenu Jha, Kristopher Keipert, Thorsten Kurth, Dieter Kranzlmüller, Hyungro Lee, Zhuozhao Li, Heng Ma, Andre Merzky, Gerald Mathias, Alexander Partin, Junqi Yin, et al. (11 additional authors not shown)

    Abstract: The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2-3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved to better select lead compounds that can proceed to later stages of the drug discovery protocol accelerating…

    Submitted 13 October, 2020; originally announced October 2020.

  28. Indoor environment data time-series reconstruction using autoencoder neural networks

    Authors: Antonio Liguori, Romana Markovic, Thi Thu Ha Dam, Jérôme Frisch, Christoph van Treeck, Francesco Causone

    Abstract: As the number of installed meters in buildings increases, there is a growing number of data time-series that could be used to develop data-driven models to support and optimize building operation. However, building data sets are often characterized by errors and missing values, which recent research considers to be among the main limiting factors on the performance of the proposed models.…

    Submitted 21 January, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: Accepted in Building and Environment

    Journal ref: Building and Environment 191 (2021) 107623

  29. arXiv:2008.13742  [pdf, other]

    cs.DC cs.PF

    Chimbuko: A Workflow-Level Scalable Performance Trace Analysis Tool

    Authors: Sungsoo Ha, Wonyong Jeong, Gyorgy Matyasfalvi, Cong Xie, Kevin Huck, Jong Youl Choi, Abid Malik, Li Tang, Hubertus Van Dam, Line Pouchard, Wei Xu, Shinjae Yoo, Nicholas D'Imperio, Kerstin Kleese Van Dam

    Abstract: Because of the limits input/output systems currently impose on high-performance computing systems, a new generation of workflows that include online data reduction and analysis is emerging. Diagnosing their performance requires sophisticated performance analysis capabilities due to the complexity of execution patterns and underlying hardware, and no tool could handle the voluminous performance tra…

    Submitted 31 August, 2020; originally announced August 2020.

  30. arXiv:2008.08818  [pdf, other]

    stat.ML cond-mat.mtrl-sci cs.LG

    Ensemble learning reveals dissimilarity between rare-earth transition metal binary alloys with respect to the Curie temperature

    Authors: Duong-Nguyen Nguyen, Tien-Lam Pham, Viet-Cuong Nguyen, Hiori Kino, Takashi Miyake, Hieu-Chi Dam

    Abstract: We propose a data-driven method to extract dissimilarity between materials, with respect to a given target physical property. The technique is based on an ensemble method with Kernel ridge regression as the predicting model; multiple random subset sampling of the materials is done to generate prediction models and the corresponding contributions of the reference training materials in detail. The d…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: 10 pages, 3 figures

  31. arXiv:2008.08793  [pdf, other]

    cond-mat.mtrl-sci physics.chem-ph

    Explainable Machine Learning for Materials Discovery: Predicting the Potentially Formable Nd-Fe-B Crystal Structures and Extracting Structure-Stability Relationship

    Authors: Tien-Lam Pham, Duong-Nguyen Nguyen, Minh-Quyet Ha, Hiori Kino, Takashi Miyake, Hieu-Chi Dam

    Abstract: New Nd-Fe-B crystal structures can be formed via the elemental substitution of LATX host structures, including lanthanides LA, transition metals T, and light elements X as B, C, N, and O. The 5967 samples of ternary LATX materials that are collected are then used as the host structures. For each host crystal structure, a substituted crystal structure is created by substituting all lanthanide sites…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: 14 pages, 7 figures

  32. arXiv:2008.08781  [pdf, other]

    cond-mat.mtrl-sci physics.data-an

    Boron cage effects on Nd-Fe-B crystal structure's stability

    Authors: Duong-Nguyen Nguyen, Duc-Anh Dao, Takashi Miyake, Hieu-Chi Dam

    Abstract: In this study, we investigate the structure-stability relationship of hypothetical Nd-Fe-B crystal structures using descriptor-relevance analysis and the t-SNE dimensionality reduction method. 149 hypothetical Nd-Fe-B crystal structures are generated from 5967 LA-T-X host structures in Open Quantum Materials Database by using the elemental substitution method, with LA denoting lanthanides, T denot…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: 12 pages, 8 figures

  33. arXiv:2007.03143  [pdf, other]

    physics.comp-ph cs.DC physics.chem-ph

    On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters

    Authors: David B. Williams-Young, Wibe A. de Jong, Hubertus J. J. van Dam, Chao Yang

    Abstract: The predominance of Kohn-Sham density functional theory (KS-DFT) for the theoretical treatment of large experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations which are capable of leveraging the latest advances in modern high performance computing (HPC). With recent trends in HPC leading towards in increa…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: 26 pages, 9 figures

  34. arXiv:2006.02431  [pdf, other]

    q-bio.BM cs.LG q-bio.QM stat.ML

    Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

    Authors: Yadu Babuji, Ben Blaiszik, Tom Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Ian Foster, Zhi Hong, Shantenu Jha, Zhuozhao Li, Xuefeng Liu, Arvind Ramanathan, Yi Ren, Nicholaus Saint, Marcus Schwarting, Rick Stevens, Hubertus van Dam, Rick Wagner

    Abstract: Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort,…

    Submitted 27 May, 2020; originally announced June 2020.

    Comments: 11 pages, 5 figures

  35. arXiv:2005.08482  [pdf, other]

    stat.ML cs.LG

    Variational Hyper-Encoding Networks

    Authors: Phuoc Nguyen, Truyen Tran, Sunil Gupta, Santu Rana, Hieu-Chi Dam, Svetha Venkatesh

    Abstract: We propose a framework called HyperVAE for encoding distributions of distributions. When a target distribution is modeled by a VAE, its neural network parameters θ are drawn from a distribution p(θ) which is modeled by a hyper-level VAE. We propose a variational inference using Gaussian mixture models to implicitly encode the parameters θ into a low dimensional Gaussian distribution. Given a target d…

    Submitted 12 May, 2022; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Accepted ECML-2021

  36. arXiv:2004.12023  [pdf, other]

    physics.chem-ph physics.comp-ph

    NWChem: Past, Present, and Future

    Authors: E. Aprà, E. J. Bylaska, W. A. de Jong, N. Govind, K. Kowalski, T. P. Straatsma, M. Valiev, H. J. J. van Dam, Y. Alexeev, J. Anchell, V. Anisimov, F. W. Aquino, R. Atta-Fynn, J. Autschbach, N. P. Bauman, J. C. Becca, D. E. Bernholdt, K. Bhaskaran-Nair, S. Bogatko, P. Borowski, J. Boschen, J. Brabec, A. Bruner, E. Cauët, Y. Chen, et al. (89 additional authors not shown)

    Abstract: Specialized computational chemistry packages have permanently reshaped the landscape of chemical and materials science by providing tools to support and guide experimental efforts and for the prediction of atomistic and electronic properties. In this regard, electronic structure packages have played a special role by using first-principle-driven methodologies to model complex chemical and materials…

    Submitted 26 May, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

    Comments: This article appeared in volume 152, issue 18, page 184102 of the Journal of Chemical Physics. It can be found at https://doi.org/10.1063/5.0004997

    Journal ref: J. Chem. Phys., 152, 184102 (2020)

  37. arXiv:1910.09902  [pdf]

    cs.SE

    Theory-Software Translation: Research Challenges and Future Directions

    Authors: Caroline Jay, Robert Haines, Daniel S. Katz, Jeffrey Carver, James C. Phillips, Anshu Dubey, Sandra Gesing, Matthew Turk, Hui Wan, Hubertus van Dam, James Howison, Vitali Morozov, Steven R. Brandt

    Abstract: The Theory-Software Translation Workshop, held in New Orleans in February 2019, explored in depth the process of both instantiating theory in software - for example, implementing a mathematical model in code as part of a simulation - and using the outputs of software - such as the behavior of a simulation - to advance knowledge. As computation within research is now ubiquitous, the workshop provid…

    Submitted 22 October, 2019; originally announced October 2019.

  38. arXiv:1907.01682  [pdf, ps, other]

    cs.AI

    On Conforming and Conflicting Values

    Authors: Kinzang Chhogyal, Abhaya Nayak, Aditya Ghose, Mehmet Orgun, Hoa Dam

    Abstract: Values are things that are important to us. Actions activate values - they either go against our values or they promote our values. Values themselves can either be conforming or conflicting depending on the action that is taken. In this short paper, we argue that values may be classified as one of two types - conflicting and inherently conflicting values. They are distinguished by the fact that th…

    Submitted 7 July, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

    Comments: AI for Social Good Workshop, IJCAI 2019

  39. arXiv:1905.13380  [pdf, ps, other]

    cs.AI cs.MA

    A Value-based Trust Assessment Model for Multi-agent Systems

    Authors: Kinzang Chhogyal, Abhaya Nayak, Aditya Ghose, Hoa Khanh Dam

    Abstract: An agent's assessment of its trust in another agent is commonly taken to be a measure of the reliability/predictability of the latter's actions. It is based on the trustor's past observations of the behaviour of the trustee and requires no knowledge of the inner workings of the trustee. However, in situations that are new or unfamiliar, past observations are of little help in assessing trust. In s…

    Submitted 30 May, 2019; originally announced May 2019.

  40. arXiv:1903.10867  [pdf, other]

    cs.LG stat.ML

    Measuring the Similarity between Materials with an Emphasis on the Materials Distinctiveness

    Authors: Tran-Thai Dang, Tien-Lam Pham, Hiori Kino, Takashi Miyake, Hieu-Chi Dam

    Abstract: In this study, we establish a basis for selecting similarity measures when applying machine learning techniques to solve materials science problems. This selection is considered with an emphasis on the distinctiveness between materials that reflect their nature well. We perform a case study with a dataset of rare-earth transition metal crystalline compounds represented using the Orbital Field Matr…

    Submitted 23 March, 2019; originally announced March 2019.

  41. arXiv:1812.10578  [pdf, other]

    cs.SE cs.AI

    Towards effective AI-powered agile project management

    Authors: Hoa Khanh Dam, Truyen Tran, John Grundy, Aditya Ghose, Yasutaka Kamei

    Abstract: The rise of Artificial intelligence (AI) has the potential to significantly transform the practice of project management. Project management has a large socio-technical element with many uncertainties arising from variability in human aspects e.g., customers' needs, developers' performance and team dynamics. AI can assist project managers and team members by automating repetitive, high-volume task…

    Submitted 26 December, 2018; originally announced December 2018.

    Comments: In Proceedings of International Conference on Software Engineering (ICSE 2019), (To appear), NIER track, May 2019 (Montreal, Canada)

  42. arXiv:1809.04750  [pdf, other]

    cond-mat.mtrl-sci

    Important descriptors and descriptor groups of Curie temperatures of rare-earth transition-metal binary alloys

    Authors: Hieu Chi Dam, Viet Cuong Nguyen, Tien Lam Pham, Anh Tuan Nguyen, Kiyoyuki Terakura, Takashi Miyake, Hiori Kino

    Abstract: We analyze Curie temperatures of rare-earth transition metal binary alloys with machine learning method. In order to select important descriptors and descriptor groups, we introduce newly developed subgroup relevance analysis and adopt the hierarchical clustering in the representation. We execute the exhaustive search and successfully illustrate the importance of descriptors and descriptor groups.…

    Submitted 15 October, 2018; v1 submitted 12 September, 2018; originally announced September 2018.

    Comments: 16 pages, 8 figures

  43. arXiv:1807.10751  [pdf, other]

    physics.comp-ph cond-mat.mtrl-sci

    Committee machine that votes for similarity between materials

    Authors: Duong-Nguyen Nguyen, Tien-Lam Pham, Viet-Cuong Nguyen, Tuan-Dung Ho, Truyen Tran, Keisuke Takahashi, Hieu-Chi Dam

    Abstract: We developed a method for measuring the similarity between materials, focusing on specific physical properties. The obtained information can be utilized to understand the underlying mechanisms and to support the prediction of the physical properties of materials. The method consists of three steps: variable evaluation based on non-linear regression, regression-based clustering, and similarity meas…

    Submitted 26 July, 2018; originally announced July 2018.

    Comments: 13 pages

  44. arXiv:1802.00938  [pdf, other]

    cs.NE

    DeepProcess: Supporting business process execution using a MANN-based recommender system

    Authors: Asjad Khan, Hung Le, Kien Do, Truyen Tran, Aditya Ghose, Hoa Dam, Renuka Sindhgatta

    Abstract: Process-aware Recommender systems can provide critical decision support functionality to aid business process execution by recommending what actions to take next. Based on recent advances in the field of deep learning, we present a novel memory-augmented neural network (MANN) based approach for constructing a process-aware recommender system. We propose a novel network architecture, namely Write-P…

    Submitted 23 November, 2021; v1 submitted 3 February, 2018; originally announced February 2018.

    Comments: Accepted at ICSOC 2021

  45. arXiv:1802.00921  [pdf, other]

    cs.SE

    A deep tree-based model for software defect prediction

    Authors: Hoa Khanh Dam, Trang Pham, Shien Wee Ng, Truyen Tran, John Grundy, Aditya Ghose, Taeksu Kim, Chul-Joo Kim

    Abstract: Defects are common in software systems and can potentially cause various problems to software users. Different methods have been developed to quickly predict the most likely locations of defects in large code bases. Most of them focus on designing features (e.g. complexity metrics) that correlate with potentially defective code. Those approaches however do not sufficiently capture the syntax and d…

    Submitted 3 February, 2018; originally announced February 2018.

  46. arXiv:1802.00603  [pdf, other]

    cs.SE

    Explainable Software Analytics

    Authors: Hoa Khanh Dam, Truyen Tran, Aditya Ghose

    Abstract: Software analytics has been the subject of considerable recent attention but is yet to receive significant industry traction. One of the key reasons is that software practitioners are reluctant to trust predictions produced by the analytics machinery without understanding the rationale for those predictions. While complex models such as deep learning and ensemble methods improve predictive perform…

    Submitted 2 February, 2018; originally announced February 2018.

    Comments: 4 pages, to appear in ICSE'18 New Ideas and Emerging Results Track

  47. arXiv:1801.08804  [pdf, other]

    q-fin.PR q-fin.MF

    Rational Models for Inflation-Linked Derivatives

    Authors: Henrik Dam, Andrea Macrina, David Skovmand, David Sloth

    Abstract: We construct models for the pricing and risk management of inflation-linked derivatives. The models are rational in the sense that linear payoffs written on the consumer price index have prices that are rational functions of the state variables. The nominal pricing kernel is constructed in a multiplicative manner that allows for closed-form pricing of vanilla inflation products suchlike zero-coupo…

    Submitted 16 July, 2020; v1 submitted 26 January, 2018; originally announced January 2018.

    Comments: 32 pages, 4 figures

    MSC Class: 60J25; 60H30; 91G20; 91G30

  48. arXiv:1708.04357  [pdf, other]

    cs.LG cs.AI stat.ML

    Graph Classification via Deep Learning with Virtual Nodes

    Authors: Trang Pham, Truyen Tran, Hoa Dam, Svetha Venkatesh

    Abstract: Learning representation for graph classification turns a variable-size graph into a fixed-size vector (or matrix). Such a representation works nicely with algebraic manipulations. Here we introduce a simple method to augment an attributed graph with a virtual node that is bidirectionally connected to all existing nodes. The virtual node represents the latent aspects of the graph, which are not imm…

    Submitted 14 August, 2017; originally announced August 2017.

  49. arXiv:1708.02368  [pdf, other]

    cs.SE

    Automatic feature learning for vulnerability prediction

    Authors: Hoa Khanh Dam, Truyen Tran, Trang Pham, Shien Wee Ng, John Grundy, Aditya Ghose

    Abstract: Code flaws or vulnerabilities are prevalent in software systems and can potentially cause a variety of problems including deadlock, information loss, or system failure. A variety of approaches have been developed to try and detect the most likely locations of such code vulnerabilities in large code bases. Most of them rely on manually designing features (e.g. complexity metrics or frequencies of c…

    Submitted 8 August, 2017; originally announced August 2017.

  50. arXiv:1706.07236  [pdf, other]

    astro-ph.CO gr-qc

    Apparent cosmic acceleration from type Ia supernovae

    Authors: Lawrence H. Dam, Asta Heinesen, David L. Wiltshire

    Abstract: Parameters that quantify the acceleration of cosmic expansion are conventionally determined within the standard Friedmann-Lemaitre-Robertson-Walker (FLRW) model, which fixes spatial curvature to be homogeneous. Generic averages of Einstein's equations in inhomogeneous cosmology lead to models with non-rigidly evolving average spatial curvature, and different parametrizations of apparent cosmic acc…

    Submitted 13 September, 2017; v1 submitted 22 June, 2017; originally announced June 2017.

    Comments: 17 pages, 6 figures; v2: Small additions, typos corrected, matches published version

    Journal ref: Mon.Not.Roy.Astron.Soc. 472 (2017) 835-851