Showing 1–12 of 12 results for author: Ng, R T

  1. arXiv:2505.17265  [pdf, other]

    cs.CL cs.AI

    CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports

    Authors: Xiao Yu Cindy Zhang, Carlos R. Ferreira, Francis Rossignol, Raymond T. Ng, Wyeth Wasserman, Jian Zhu

    Abstract: Rare diseases, including Inborn Errors of Metabolism (IEM), pose significant diagnostic challenges. Case reports serve as key but computationally underutilized resources to inform diagnosis. Clinical dense information extraction refers to organizing medical information into structured predefined categories. Large Language Models (LLMs) may enable scalable information extraction from case reports b…

    Submitted 22 May, 2025; originally announced May 2025.

  2. arXiv:2411.02714  [pdf, other]

    cs.CL cs.AI cs.HC

    Game Plot Design with an LLM-powered Assistant: An Empirical Study with Game Designers

    Authors: Seyed Hossein Alavi, Weijia Xu, Nebojsa Jojic, Daniel Kennett, Raymond T. Ng, Sudha Rao, Haiyan Zhang, Bill Dolan, Vered Shwartz

    Abstract: We introduce GamePlot, an LLM-powered assistant that supports game designers in crafting immersive narratives for turn-based games, and allows them to test these games through a collaborative game play and refine the plot throughout the process. Our user study with 14 game designers shows high levels of both satisfaction with the generated game plots and sense of ownership over the narratives, but…

    Submitted 4 November, 2024; originally announced November 2024.

  3. arXiv:2410.21627  [pdf, other]

    cs.CL cs.AI

    MCPDial: A Minecraft Persona-driven Dialogue Dataset

    Authors: Seyed Hossein Alavi, Sudha Rao, Ashutosh Adhikari, Gabriel A DesGarennes, Akanksha Malhotra, Chris Brockett, Mahmoud Adada, Raymond T. Ng, Vered Shwartz, Bill Dolan

    Abstract: We propose a novel approach that uses large language models (LLMs) to generate persona-driven conversations between Players and Non-Player Characters (NPC) in games. Showcasing the application of our methodology, we introduce the Minecraft Persona-driven Dialogue dataset (MCPDial). Starting with a small seed of expert-written conversations, we employ our method to generate hundreds of additional c…

    Submitted 28 October, 2024; originally announced October 2024.

  4. arXiv:2206.06448  [pdf]

    eess.IV cs.CR cs.CV cs.LG

    Assessing Privacy Leakage in Synthetic 3-D PET Imaging using Transversal GAN

    Authors: Robert V. Bergen, Jean-Francois Rajotte, Fereshteh Yousefirizi, Arman Rahmim, Raymond T. Ng

    Abstract: Training computer-vision related algorithms on medical images for disease diagnosis or image segmentation is difficult in large part due to privacy concerns. For this reason, generative image models are highly sought after to facilitate data sharing. However, 3-D generative models are understudied, and investigation of their privacy leakage is needed. We introduce our 3-D generative model, Transve…

    Submitted 31 October, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: text overlap with arXiv:2111.01866

  5. arXiv:2205.13741  [pdf, other]

    cs.LG

    Generating multivariate time series with COmmon Source CoordInated GAN (COSCI-GAN)

    Authors: Ali Seyfi, Jean-Francois Rajotte, Raymond T. Ng

    Abstract: Generating multivariate time series is a promising approach for sharing sensitive data in many medical, financial, and IoT applications. A common type of multivariate time series originates from a single source such as the biometric measurements from a medical patient. This leads to complex dynamical patterns between individual time series that are hard to learn by typical generation models such a…

    Submitted 14 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: 19 pages, 16 figures

  6. arXiv:2111.01866  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph

    3-D PET Image Generation with tumour masks using TGAN

    Authors: Robert V Bergen, Jean-Francois Rajotte, Fereshteh Yousefirizi, Ivan S Klyuzhin, Arman Rahmim, Raymond T. Ng

    Abstract: Training computer-vision related algorithms on medical images for disease diagnosis or image segmentation is difficult due to the lack of training data, labeled samples, and privacy concerns. For this reason, a robust generative method to create synthetic data is highly sought after. However, most three-dimensional image generators require additional image input or are extremely memory intensive.…

    Submitted 2 November, 2021; originally announced November 2021.

  7. arXiv:2101.07235  [pdf, other]

    stat.ML cs.AI cs.CV cs.DC cs.LG

    Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary

    Authors: Jean-Francois Rajotte, Sumit Mukherjee, Caleb Robinson, Anthony Ortiz, Christopher West, Juan Lavista Ferres, Raymond T Ng

    Abstract: We introduce FELICIA (FEderated LearnIng with a CentralIzed Adversary) a generative mechanism enabling collaborative learning. In particular, we show how a data owner with limited and biased data could benefit from other data owners while keeping data from all the sources private. This is a common scenario in medical image analysis where privacy legislation prevents data from being shared outside…

    Submitted 28 August, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

    Comments: 10 pages, 10 figures

    MSC Class: 68W15; ACM Class: I.2.11

  8. arXiv:2009.06764  [pdf, other]

    stat.ML cs.CR cs.LG

    Private data sharing between decentralized users through the privGAN architecture

    Authors: Jean-Francois Rajotte, Raymond T Ng

    Abstract: More data is almost always beneficial for analysis and machine learning tasks. In many realistic situations however, an enterprise cannot share its data, either to keep a competitive advantage or to protect the privacy of the data sources, the enterprise's clients for example. We propose a method for data owners to share synthetic or fake versions of their data without sharing the actual data, nor…

    Submitted 14 September, 2020; originally announced September 2020.

    Comments: 6 pages, 9 figures, to be in the proceedings of International Workshop on Privacy and Security in Enterprise Modeling (PriSEM'20)

  9. Topic Segmentation and Labeling in Asynchronous Conversations

    Authors: Shafiq Rayhan Joty, Giuseppe Carenini, Raymond T Ng

    Abstract: Topic segmentation and labeling is often considered a prerequisite for higher-level conversation analysis and has been shown to be useful in many Natural Language Processing (NLP) applications. We present two new corpora of email and blog conversations annotated with topics, and evaluate annotator reliability for the segmentation and labeling tasks in these asynchronous conversations. We propose a…

    Submitted 3 February, 2014; originally announced February 2014.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 47, pages 521-573, 2013

  10. arXiv:1303.5735  [pdf]

    cs.AI

    Non-monotonic Negation in Probabilistic Deductive Databases

    Authors: Raymond T. Ng, V. S. Subrahmanian

    Abstract: In this paper we study the uses and the semantics of non-monotonic negation in probabilistic deductive databases. Based on the stable semantics for classical logic programming, we introduce the notion of stable formula functions. We show that stable formula functions are minimal fixpoints of operators associated with probabilistic deductive databases with negation. Furthermore, since a prob…

    Submitted 20 March, 2013; originally announced March 2013.

    Comments: Appears in Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence (UAI1991)

    Report number: UAI-P-1991-PG-249-256

  11. arXiv:1303.5420  [pdf]

    cs.AI cs.DB

    Empirical Probabilities in Monadic Deductive Databases

    Authors: Raymond T. Ng, V. S. Subrahmanian

    Abstract: We address the problem of supporting empirical probabilities in monadic logic databases. Though the semantics of multivalued logic programs has been studied extensively, the treatment of probabilities as results of statistical findings has not been studied in logic programming/deductive databases. We develop a model-theoretic characterization of logic databases that facilitates such a treatment.…

    Submitted 13 March, 2013; originally announced March 2013.

    Comments: Appears in Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence (UAI1992)

    Report number: UAI-P-1992-PG-215-222

  12. arXiv:1104.3212  [pdf]

    cs.DB cs.DS

    Similarity Join Size Estimation using Locality Sensitive Hashing

    Authors: Hongrae Lee, Raymond T. Ng, Kyuseok Shim

    Abstract: Similarity joins are important operations with a broad range of applications. In this paper, we study the problem of vector similarity join size estimation (VSJ). It is a generalization of the previously studied set similarity join size estimation (SSJ) problem and can handle more interesting cases such as TF-IDF vectors. One of the key challenges in similarity join size estimation is that the joi…

    Submitted 16 April, 2011; originally announced April 2011.

    Comments: VLDB2011

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 4, No. 6, pp. 338-349 (2011)