-
Thought Graph: Generating Thought Process for Biological Reasoning
Authors:
Chi-Yang Hsu,
Kyle Cox,
Jiawei Xu,
Zhen Tan,
Tianhua Zhai,
Mengzhou Hu,
Dexter Pratt,
Tianlong Chen,
Ziniu Hu,
Ying Ding
Abstract:
We present the Thought Graph as a novel framework to support complex reasoning and use gene set analysis as an example to uncover semantic relationships between biological processes. Our framework stands out for its ability to provide a deeper understanding of gene sets, significantly surpassing GSEA by 40.28% and LLM baselines by 5.38% based on cosine similarity to human annotations. Our analysis further provides insights into future directions for biological process naming and implications for bioinformatics and precision medicine.
Submitted 11 March, 2024;
originally announced March 2024.
-
Motivation, inclusivity, and realism should drive data science education
Authors:
Candace Savonen,
Carrie Wright,
Ava M. Hoffman,
Elizabeth M. Humphries,
Katherine E. L. Cox,
Frederick J. Tan,
Jeffrey T. Leek
Abstract:
Data science education provides tremendous opportunities but remains inaccessible to many communities. Increasing the accessibility of data science to these communities not only benefits the individuals entering data science but also increases the field's innovation and potential impact as a whole. Education is the most scalable solution to meet these needs, but many data science educators lack formal training in education. Our group has led education efforts for a variety of audiences, from professional scientists to high school students to lay audiences. These experiences have helped form our teaching philosophy, which we have summarized as three main ideals: 1) motivation, 2) inclusivity, and 3) realism. To better put these ideals into practice, we also aim to iteratively update our teaching approaches and curriculum as we find ways to better reach them. In this manuscript, we discuss these ideals as well as practical ideas for how to implement these philosophies in the classroom.
Submitted 9 May, 2023;
originally announced May 2023.
-
Open-source Tools for Training Resources -- OTTR
Authors:
Candace Savonen,
Carrie Wright,
Ava M. Hoffman,
John Muschelli,
Katherine Cox,
Frederick J. Tan,
Jeffrey T. Leek
Abstract:
Data science and informatics tools are developing at a blistering rate, but their users often lack the educational background or resources to efficiently apply the methods to their research. Training resources often become outdated because their maintenance is not prioritized by funders, giving teams little time to devote to such endeavors. Our group has developed Open-source Tools for Training Resources (OTTR) to offer greater efficiency and flexibility for creating and maintaining online course content. OTTR empowers creators to customize their work and allows for a simple workflow to publish using multiple platforms. OTTR allows content creators to publish material to multiple massive online learner communities using familiar rendering mechanics. OTTR allows the incorporation of pedagogical practices like formative and summative assessments in the form of automatically graded multiple-choice questions and fill-in-the-blank problems. No local installation of any software is required to begin creating content with OTTR. Thus far, 15 courses have been created with the OTTR repository template. By using the OTTR system, the maintenance workload for updating these courses across platforms has been drastically reduced.
Submitted 10 March, 2022;
originally announced March 2022.
-
Diversifying the Genomic Data Science Research Community
Authors:
The Genomic Data Science Community Network,
Rosa Alcazar,
Maria Alvarez,
Rachel Arnold,
Mentewab Ayalew,
Lyle G. Best,
Michael C. Campbell,
Kamal Chowdhury,
Katherine E. L. Cox,
Christina Daulton,
Youping Deng,
Carla Easter,
Karla Fuller,
Shazia Tabassum Hakim,
Ava M. Hoffman,
Natalie Kucher,
Andrew Lee,
Joslynn Lee,
Jeffrey T. Leek,
Robert Meller,
Loyda B. Méndez,
Miguel P. Méndez-González,
Stephen Mosher,
Michele Nishiguchi,
Siddharth Pratap, et al. (13 additional authors not shown)
Abstract:
Over the last 20 years, there has been an explosion of genomic data collected for disease association, functional analyses, and other large-scale discoveries. At the same time, there have been revolutions in cloud computing that enable computational and data science research, while making data accessible to anyone with a web browser and an internet connection. However, students at institutions with limited resources have received relatively little exposure to curricula or professional development opportunities that lead to careers in genomic data science. To broaden participation in genomics research, the scientific community needs to support students, faculty, and administrators at Underserved Institutions (UIs) including Community Colleges, Historically Black Colleges and Universities, Hispanic-Serving Institutions, and Tribal Colleges and Universities in taking advantage of these tools in local educational and research programs. We have formed the Genomic Data Science Community Network (http://www.gdscn.org/) to identify opportunities and support broadening access to cloud-enabled genomic data science. Here, we provide a summary of the priorities for faculty members at UIs, as well as administrators, funders, and R1 researchers to consider as we create a more diverse genomic data science community.
Submitted 9 June, 2022; v1 submitted 20 January, 2022;
originally announced January 2022.
-
Virtual Reality based Digital Twin System for remote laboratories and online practical learning
Authors:
Claire Palmer,
Ben Roullier,
Muhammad Aamir,
Leonardo Stella,
Uchenna Diala,
Ashiq Anjum,
Frank Mcquade,
Keith Cox,
Alex Calvert
Abstract:
As the current pandemic has demonstrated, there is a need for remote and virtual learning applications such as virtual reality (VR) and tablet-based solutions. Having developers create complex learning scenarios is highly time-consuming and can take over a year, so lecturers need a simple method for creating their own content for laboratory tutorials. Research is currently being undertaken into developing generic models to enable the semi-automatic creation of virtual learning applications. A case study describing the creation of a virtual learning application for an electrical laboratory tutorial is presented.
Submitted 17 June, 2021;
originally announced June 2021.