Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare

Authors: Emre Can Acikgoz, Osman Batur İnce, Rayene Bench, Arda Anıl Boz, İlker Kesen, Aykut Erdem, Erkut Erdem

Abstract: The integration of Large Language Models (LLMs) into healthcare promises to transform medical diagnostics, research, and patient care. Yet, the progression of medical LLMs faces obstacles such as complex training requirements, rigorous evaluation demands, and the dominance of proprietary models that restrict academic exploration. Transparent, comprehensive access to LLM resources is essential for advancing the field, fostering reproducibility, and encouraging innovation in healthcare AI. We present Hippocrates, an open-source LLM framework specifically developed for the medical domain. In stark contrast to previous efforts, it offers unrestricted access to its training datasets, codebase, checkpoints, and evaluation protocols. This open approach is designed to stimulate collaborative research, allowing the community to build upon, refine, and rigorously evaluate medical LLMs within a transparent ecosystem. Also, we introduce Hippo, a family of 7B models tailored for the medical domain, fine-tuned from Mistral and LLaMA2 through continual pre-training, instruction tuning, and reinforcement learning from human and AI feedback. Our models outperform existing open medical LLMs models by a large-margin, even surpassing models with 70B parameters. Through Hippocrates, we aspire to unlock the full potential of LLMs not just to advance medical knowledge and patient care but also to democratize the benefits of AI research in healthcare, making them available across the globe.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2402.01339 [pdf, other]

doi 10.1145/3711667

Improving Sequential Recommendations with LLMs

Authors: Artun Boz, Wouter Zorgdrager, Zoe Kotti, Jesse Harte, Panos Louridas, Dietmar Jannach, Vassilios Karakoidas, Marios Fragkoulis

Abstract: The sequential recommendation problem has attracted considerable research attention in the past few years, leading to the rise of numerous recommendation models. In this work, we explore how Large Language Models (LLMs), which are nowadays introducing disruptive effects in many AI-based applications, can be used to build or improve sequential recommendation approaches. Specifically, we design three orthogonal approaches and hybrids of those to leverage the power of LLMs in different ways. In addition, we investigate the potential of each approach by focusing on its comprising technical aspects and determining an array of alternative choices for each one. We conduct extensive experiments on three datasets and explore a large variety of configurations, including different language models and baseline recommendation models, to obtain a comprehensive picture of the performance of each approach. Among other observations, we highlight that initializing state-of-the-art sequential recommendation models such as BERT4Rec or SASRec with embeddings obtained from an LLM can lead to substantial performance gains in terms of accuracy. Furthermore, we find that fine-tuning an LLM for recommendation tasks enables it to learn not only the tasks, but also concepts of a domain to some extent. We also show that fine-tuning OpenAI GPT leads to considerably better performance than fine-tuning Google PaLM 2. Overall, our extensive experiments indicate a huge potential value of leveraging LLMs in future recommendation approaches. We publicly share the code and data of our experiments to ensure reproducibility.

Submitted 11 January, 2025; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 35 pages, 12 figures, 7 tables

arXiv:2206.12678 [pdf, ps, other]

doi 10.1088/1742-5468/abed45

Finding Proper Time Intervals for Dynamic Network Extraction

Authors: Günce Keziban Orman, Nadir Türe, Selim Balcisoy, Hasan Alp Boz

Abstract: Extracting a proper dynamic network for modelling a time-dependent complex system is an important issue. Building a correct model is related to finding out critical time points where a system exhibits considerable change. In this work, we propose to measure network similarity to detect proper time intervals. We develop three similarity metrics, node, link, and neighborhood similarities, for any consecutive snapshots of a dynamic network. Rather than a label or a user-defined threshold, we use statistically expected values of proposed similarities under a null-model to state whether the system changes critically. We experimented on two different data sets with different temporal dynamics: The Wi-Fi access points logs of a university campus and Enron emails. Results show that, first, proposed similarities reflect similar signal trends with network topological properties with less noisy signals, and their scores are scale invariant. Second, proposed similarities generate better signals than adjacency correlation with optimal noise and diversity. Third, using statistically expected values allows us to find different time intervals for a system, leading to the extraction of non-redundant snapshots for dynamic network modelling.

Submitted 25 June, 2022; originally announced June 2022.

Comments: 19 pages, 12 figures

MSC Class: 62-08 ACM Class: H.1; H.1.1; G.3

Journal ref: J. Stat. Mech. (2021) 033414

arXiv:1604.03506 [pdf, other]

An Unbiased Data Collection and Content Exploitation/Exploration Strategy for Personalization

Authors: Liangjie Hong, Adnan Boz

Abstract: One of missions for personalization systems and recommender systems is to show content items according to users' personal interests. In order to achieve such goal, these systems are learning user interests over time and trying to present content items tailoring to user profiles. Recommending items according to users' preferences has been investigated extensively in the past few years, mainly thanks for the popularity of Netflix competition. In a real setting, users may be attracted by a subset of those items and interact with them, only leaving partial feedbacks to the system to learn in the next cycle, which leads to significant biases into systems and hence results in a situation where user engagement metrics cannot be improved over time. The problem is not just for one component of the system. The data collected from users is usually used in many different tasks, including learning ranking functions, building user profiles and constructing content classifiers. Once the data is biased, all these downstream use cases would be impacted as well. Therefore, it would be beneficial to gather unbiased data through user interactions. Traditionally, unbiased data collection is done through showing items uniformly sampling from the content pool. However, this simple scheme is not feasible as it risks user engagement metrics and it takes long time to gather user feedbacks. In this paper, we introduce a user-friendly unbiased data collection framework, by utilizing methods developed in the exploitation and exploration literature. We discuss how the framework is different from normal multi-armed bandit problems and why such method is needed. We layout a novel Thompson sampling for Bernoulli ranked-list to effectively balance user experiences and data collection. The proposed method is validated from a real bucket test and we show strong results comparing to old algorithms

Submitted 12 April, 2016; originally announced April 2016.

Showing 1–4 of 4 results for author: Boz, A