Search | arXiv e-print repository

A quantum semantic framework for natural language processing

Authors: Christopher J. Agostino, Quan Le Thien, Molly Apsel, Denizhan Pak, Elina Lesyk, Ashabari Majumdar

Abstract: Semantic degeneracy represents a fundamental property of natural language that extends beyond simple polysemy to encompass the combinatorial explosion of potential interpretations that emerges as semantic expressions increase in complexity. Large Language Models (LLMs) and other modern NLP systems face inherent limitations precisely because they operate within natural language itself, making them… ▽ More Semantic degeneracy represents a fundamental property of natural language that extends beyond simple polysemy to encompass the combinatorial explosion of potential interpretations that emerges as semantic expressions increase in complexity. Large Language Models (LLMs) and other modern NLP systems face inherent limitations precisely because they operate within natural language itself, making them subject to the same interpretive constraints imposed by semantic degeneracy. In this work, we argue using Kolmogorov complexity that as an expression's complexity grows, the likelihood of any interpreting agent (human or LLM-powered AI) recovering the single intended meaning vanishes. This computational intractability suggests the classical view that linguistic forms possess meaning in and of themselves is flawed. We alternatively posit that meaning is instead actualized through an observer-dependent interpretive act. To test this, we conducted a semantic Bell inequality test using diverse LLM agents as ``computational cognitive systems'' to interpret ambiguous word pairs under varied contextual settings. Across several independent experiments, we found average CHSH expectation values ranging from 1.2 to 2.8, with several runs yielding values (e.g., 2.3-2.4) that significantly violate the classical boundary ($|S|\leq2$). This demonstrates that linguistic interpretation under ambiguity can exhibit non-classical contextuality, consistent with results from human cognition experiments. These results inherently imply that classical frequentist-based analytical approaches for natural language are necessarily lossy. Instead, we propose that Bayesian-style repeated sampling approaches can provide more practically useful and appropriate characterizations of linguistic meaning in context. △ Less

Submitted 11 June, 2025; originally announced June 2025.

Comments: 12 pages, 2 figures, accepted submission to Quantum AI and NLP 2025

arXiv:2504.19037 [pdf, ps, other]

doi 10.1145/3696630.3731663

"I Would Have Written My Code Differently'': Beginners Struggle to Understand LLM-Generated Code

Authors: Yangtian Zi, Luisa Li, Arjun Guha, Carolyn Jane Anderson, Molly Q Feldman

Abstract: Large language models (LLMs) are being increasingly adopted for programming work. Prior work shows that while LLMs accelerate task completion for professional programmers, beginning programmers struggle to prompt models effectively. However, prompting is just half of the code generation process -- when code is generated, it must be read, evaluated, and integrated (or rejected). How accessible are… ▽ More Large language models (LLMs) are being increasingly adopted for programming work. Prior work shows that while LLMs accelerate task completion for professional programmers, beginning programmers struggle to prompt models effectively. However, prompting is just half of the code generation process -- when code is generated, it must be read, evaluated, and integrated (or rejected). How accessible are these tasks for beginning programmers? This paper measures how well beginners comprehend LLM-generated code and explores the challenges students face in judging code correctness. We compare how well students understand natural language descriptions of functions and LLM-generated implementations, studying 32 CS1 students on 160 task instances. Our results show a low per-task success rate of 32.5\%, with indiscriminate struggles across demographic populations. Key challenges include barriers for non-native English speakers, unfamiliarity with Python syntax, and automation bias. Our findings highlight the barrier that code comprehension presents to beginning programmers seeking to write code with LLMs. △ Less

Submitted 26 April, 2025; originally announced April 2025.

Comments: To appear in 33rd ACM International Conference on the Foundations of Software Engineering (FSE Companion '25), June 23-28, 2025, Trondheim, Norway

arXiv:2503.16674 [pdf, other]

Through the LLM Looking Glass: A Socratic Probing of Donkeys, Elephants, and Markets

Authors: Molly Kennedy, Ayyoob Imani, Timo Spinde, Hinrich Schütze

Abstract: While detecting and avoiding bias in LLM-generated text is becoming increasingly important, media bias often remains subtle and subjective, making it particularly difficult to identify and mitigate. In this study, we assess media bias in LLM-generated content and LLMs' ability to detect subtle ideological bias. We conduct this evaluation using two datasets, PoliGen and EconoLex, covering political… ▽ More While detecting and avoiding bias in LLM-generated text is becoming increasingly important, media bias often remains subtle and subjective, making it particularly difficult to identify and mitigate. In this study, we assess media bias in LLM-generated content and LLMs' ability to detect subtle ideological bias. We conduct this evaluation using two datasets, PoliGen and EconoLex, covering political and economic discourse, respectively. We evaluate seven widely used LLMs by prompting them to generate articles and analyze their ideological preferences via Socratic probing. By using our self-contained Socratic approach, the study aims to directly measure the models' biases rather than relying on external interpretations, thereby minimizing subjective judgments about media bias. Our results reveal a consistent preference of Democratic over Republican positions across all models. Conversely, in economic topics, biases vary among Western LLMs, while those developed in China lean more strongly toward socialism. △ Less

Submitted 22 May, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

arXiv:2503.11950 [pdf]

doi 10.3390/systems13060455

Privacy Ethics Alignment in AI: A Stakeholder-Centric Based Framework for Ethical AI

Authors: Ankur Barthwal, Molly Campbell, Ajay Kumar Shrestha

Abstract: The increasing integration of Artificial Intelligence (AI) in digital ecosystems has reshaped privacy dynamics, particularly for young digital citizens navigating data-driven environments. This study explores evolving privacy concerns across three key stakeholder groups, digital citizens (ages 16-19), parents/educators, and AI professionals, and assesses differences in data ownership, trust, trans… ▽ More The increasing integration of Artificial Intelligence (AI) in digital ecosystems has reshaped privacy dynamics, particularly for young digital citizens navigating data-driven environments. This study explores evolving privacy concerns across three key stakeholder groups, digital citizens (ages 16-19), parents/educators, and AI professionals, and assesses differences in data ownership, trust, transparency, parental mediation, education, and risk-benefit perceptions. Employing a grounded theory methodology, this research synthesizes insights from 482 participants through structured surveys, qualitative interviews, and focus groups. The findings reveal distinct privacy expectations: Young users emphasize autonomy and digital freedom, while parents and educators advocate for regulatory oversight and AI literacy programs. AI professionals, in contrast, prioritize the balance between ethical system design and technological efficiency. The data further highlights gaps in AI literacy and transparency, emphasizing the need for comprehensive, stakeholder-driven privacy frameworks that accommodate diverse user needs. Using comparative thematic analysis, this study identifies key tensions in privacy governance and develops the novel Privacy-Ethics Alignment in AI (PEA-AI) model, which structures privacy decision-making as a dynamic negotiation between stakeholders. By systematically analyzing themes such as transparency, user control, risk perception, and parental mediation, this research provides a scalable, adaptive foundation for AI governance, ensuring that privacy protections evolve alongside emerging AI technologies and youth-centric digital interactions. △ Less

Submitted 20 March, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

Comments: Submitted to peer reviwed venue

Journal ref: Systems 2025, 13, 455

arXiv:2503.11947 [pdf]

Ethical AI for Young Digital Citizens: A Call to Action on Privacy Governance

Authors: Austin Shouli, Ankur Barthwal, Molly Campbell, Ajay Kumar Shrestha

Abstract: The rapid expansion of Artificial Intelligence (AI) in digital platforms used by youth has created significant challenges related to privacy, autonomy, and data protection. While AI-driven personalization offers enhanced user experiences, it often operates without clear ethical boundaries, leaving young users vulnerable to data exploitation and algorithmic biases. This paper presents a call to act… ▽ More The rapid expansion of Artificial Intelligence (AI) in digital platforms used by youth has created significant challenges related to privacy, autonomy, and data protection. While AI-driven personalization offers enhanced user experiences, it often operates without clear ethical boundaries, leaving young users vulnerable to data exploitation and algorithmic biases. This paper presents a call to action for ethical AI governance, advocating for a structured framework that ensures youth-centred privacy protections, transparent data practices, and regulatory oversight. We outline key areas requiring urgent intervention, including algorithmic transparency, privacy education, parental data-sharing ethics, and accountability measures. Through this approach, we seek to empower youth with greater control over their digital identities and propose actionable strategies for policymakers, AI developers, and educators to build a fairer and more accountable AI ecosystem. △ Less

Submitted 14 March, 2025; originally announced March 2025.

Comments: Preprint Version | To be submitted to peer-reviewed venue

arXiv:2502.01584 [pdf, other]

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Authors: Zixuan Wu, Francesca Lucchetti, Aleksander Boruch-Gruszecki, Jingmiao Zhao, Carolyn Jane Anderson, Joydeep Biswas, Federico Cassano, Molly Q Feldman, Arjun Guha

Abstract: Existing benchmarks for frontier models often test specialized, "PhD-level" knowledge that is difficult for non-experts to grasp. In contrast, we present a benchmark with 594 problems based on the NPR Sunday Puzzle Challenge that requires only general knowledge. Our benchmark is challenging for both humans and models; however correct solutions are easy to verify, and models' mistakes are easy to s… ▽ More Existing benchmarks for frontier models often test specialized, "PhD-level" knowledge that is difficult for non-experts to grasp. In contrast, we present a benchmark with 594 problems based on the NPR Sunday Puzzle Challenge that requires only general knowledge. Our benchmark is challenging for both humans and models; however correct solutions are easy to verify, and models' mistakes are easy to spot. As LLMs are more widely deployed in society, we believe it is useful to develop benchmarks for frontier models that humans can understand without the need for deep domain expertise. Our work reveals capability gaps that are not evident in existing benchmarks: OpenAI o1 significantly outperforms other reasoning models on our benchmark, despite being on par with other models when tested on benchmarks that test specialized knowledge. Furthermore, our analysis of reasoning outputs uncovers new kinds of failures. DeepSeek R1, for instance, often concedes with "I give up" before providing an answer that it knows is wrong. R1 can also be remarkably "uncertain" in its output and in rare cases, it does not "finish thinking," which suggests the need for techniques to "wrap up" before the context window limit is reached. We also quantify the effectiveness of reasoning longer to identify the point beyond which more reasoning is unlikely to improve accuracy on our benchmark. △ Less

Submitted 31 March, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

arXiv:2501.18782 [pdf, other]

PSO-Net: Development of an automated psoriasis assessment system using attention-based interpretable deep neural networks

Authors: Sharif A. Kamran, Molly V. Lucas, Brendon Lutnick, Chaitanya Parmar, Basudha Pal, Asha Patel Shah, David Apfel, Steven Fakharzadeh, Lloyd Miller, Stephen Yip, Kristopher Standish, Gabriela Oana Cula

Abstract: Psoriasis is a chronic skin condition that requires long-term treatment and monitoring. Although, the Psoriasis Area and Severity Index (PASI) is utilized as a standard measurement to assess psoriasis severity in clinical trials, it has many drawbacks such as (1) patient burden for in-person clinic visits for assessment of psoriasis, (2) time required for investigator scoring and (3) variability o… ▽ More Psoriasis is a chronic skin condition that requires long-term treatment and monitoring. Although, the Psoriasis Area and Severity Index (PASI) is utilized as a standard measurement to assess psoriasis severity in clinical trials, it has many drawbacks such as (1) patient burden for in-person clinic visits for assessment of psoriasis, (2) time required for investigator scoring and (3) variability of inter- and intra-rater scoring. To address these drawbacks, we propose a novel and interpretable deep learning architecture called PSO-Net, which maps digital images from different anatomical regions to derive attention-based scores. Regional scores are further combined to estimate an absolute PASI score. Moreover, we devise a novel regression activation map for interpretability through ranking attention scores. Using this approach, we achieved inter-class correlation scores of 82.2% [95% CI: 77- 87%] and 87.8% [95% CI: 84-91%] with two different clinician raters, respectively. △ Less

Submitted 30 January, 2025; originally announced January 2025.

Comments: Accepted to IEEE ISBI 2025. 5 Pages, 3 figures, 2 tables

arXiv:2501.16701 [pdf, other]

Understanding the importance of SHAPE to the UK research ecosystem

Authors: Hélène Draux, Briony Fane, Daniel W. Hook, Juergen Wastl, Philip Lewis, Molly Morgan Jones, Pablo Roblero, James R. Wilsdon

Abstract: The UK has a long-established reputation for excellence in research across a broad range of fields, but in recent years, there has been greater emphasis on STEM investment and greater recognition of the UK's success in STEM. This paper examines the relative strengths of SHAPE disciplines and demonstrates that the UK's SHAPE research portfolio outperforms the UK's STEM research, for each internatio… ▽ More The UK has a long-established reputation for excellence in research across a broad range of fields, but in recent years, there has been greater emphasis on STEM investment and greater recognition of the UK's success in STEM. This paper examines the relative strengths of SHAPE disciplines and demonstrates that the UK's SHAPE research portfolio outperforms the UK's STEM research, for each international benchmark considered in this work. It is argued that SHAPE research is becoming increasingly important as a partner to STEM as the widespread use of technology creates societal challenges. It is also argued that the strength of UK SHAPE is the basis of a strategic advantage for UK research. △ Less

Submitted 27 January, 2025; originally announced January 2025.

Comments: 13 pages, 9 figures

arXiv:2501.13321 [pdf]

doi 10.1109/CCWC62904.2025.10903925

Investigation of the Privacy Concerns in AI Systems for Young Digital Citizens: A Comparative Stakeholder Analysis

Authors: Molly Campbell, Ankur Barthwal, Sandhya Joshi, Austin Shouli, Ajay Kumar Shrestha

Abstract: The integration of Artificial Intelligence (AI) systems into technologies used by young digital citizens raises significant privacy concerns. This study investigates these concerns through a comparative analysis of stakeholder perspectives. A total of 252 participants were surveyed, with the analysis focusing on 110 valid responses from parents/educators and 100 from AI professionals after data cl… ▽ More The integration of Artificial Intelligence (AI) systems into technologies used by young digital citizens raises significant privacy concerns. This study investigates these concerns through a comparative analysis of stakeholder perspectives. A total of 252 participants were surveyed, with the analysis focusing on 110 valid responses from parents/educators and 100 from AI professionals after data cleaning. Quantitative methods, including descriptive statistics and Partial Least Squares Structural Equation Modeling, examined five validated constructs: Data Ownership and Control, Parental Data Sharing, Perceived Risks and Benefits, Transparency and Trust, and Education and Awareness. Results showed Education and Awareness significantly influenced data ownership and risk assessment, while Data Ownership and Control strongly impacted Transparency and Trust. Transparency and Trust, along with Perceived Risks and Benefits, showed minimal influence on Parental Data Sharing, suggesting other factors may play a larger role. The study underscores the need for user-centric privacy controls, tailored transparency strategies, and targeted educational initiatives. Incorporating diverse stakeholder perspectives offers actionable insights into ethical AI design and governance, balancing innovation with robust privacy protections to foster trust in a digital age. △ Less

Submitted 22 January, 2025; originally announced January 2025.

Comments: To appear in the 2025 IEEE 14th Annual Computing and Communication Workshop and Conference (CCWC) proceedings

Journal ref: 2025 IEEE 15th Annual Computing and Communication Workshop and Conference (CCWC)

arXiv:2412.16369 [pdf]

Navigating AI to Unpack Youth Privacy Concerns: An In-Depth Exploration and Systematic Review

Authors: Ajay Kumar Shrestha, Ankur Barthwal, Molly Campbell, Austin Shouli, Saad Syed, Sandhya Joshi, Julita Vassileva

Abstract: This systematic literature review investigates perceptions, concerns, and expectations of young digital citizens regarding privacy in artificial intelligence (AI) systems, focusing on social media platforms, educational technology, gaming systems, and recommendation algorithms. Using a rigorous methodology, the review started with 2,000 papers, narrowed down to 552 after initial screening, and fin… ▽ More This systematic literature review investigates perceptions, concerns, and expectations of young digital citizens regarding privacy in artificial intelligence (AI) systems, focusing on social media platforms, educational technology, gaming systems, and recommendation algorithms. Using a rigorous methodology, the review started with 2,000 papers, narrowed down to 552 after initial screening, and finally refined to 108 for detailed analysis. Data extraction focused on privacy concerns, data-sharing practices, the balance between privacy and utility, trust factors in AI, transparency expectations, and strategies to enhance user control over personal data. Findings reveal significant privacy concerns among young users, including a perceived lack of control over personal information, potential misuse of data by AI, and fears of data breaches and unauthorized access. These issues are worsened by unclear data collection practices and insufficient transparency in AI applications. The intention to share data is closely associated with perceived benefits and data protection assurances. The study also highlights the role of parental mediation and the need for comprehensive education on data privacy. Balancing privacy and utility in AI applications is crucial, as young digital citizens value personalized services but remain wary of privacy risks. Trust in AI is significantly influenced by transparency, reliability, predictable behavior, and clear communication about data usage. Strategies to improve user control over personal data include access to and correction of data, clear consent mechanisms, and robust data protection assurances. The review identifies research gaps and suggests future directions, such as longitudinal studies, multicultural comparisons, and the development of ethical AI frameworks. △ Less

Submitted 20 December, 2024; originally announced December 2024.

Comments: To appear in the 2024 IEEE Annual Information Technology, Electronics and Mobile Communication Conference proceedings

arXiv:2412.02650 [pdf, other]

Bridging Hard and Soft: Mechanical Metamaterials Enable Rigid Torque Transmission in Soft Robots

Authors: Molly Carton, Jakub F. Kowalewski, Jiani Guo, Jacob F. Alpert, Aman Garg, Daniel Revier, Jeffrey Ian Lipton

Abstract: Torque and continuous rotation are fundamental methods of actuation and manipulation in rigid robots. Soft robot arms use soft materials and structures to mimic the passive compliance of biological arms that bend and extend. This use of compliance prevents soft arms from continuously transmitting and exerting torques to interact with their environment. Here, we show how relying on patterning struc… ▽ More Torque and continuous rotation are fundamental methods of actuation and manipulation in rigid robots. Soft robot arms use soft materials and structures to mimic the passive compliance of biological arms that bend and extend. This use of compliance prevents soft arms from continuously transmitting and exerting torques to interact with their environment. Here, we show how relying on patterning structures instead of inherent material properties allows soft robotic arms to remain compliant while continuously transmitting torque to their environment. We demonstrate a soft robotic arm made from a pair of mechanical metamaterials that act as compliant constant-velocity joints. The joints are up to 52 times stiffer in torsion than bending and can bend up to 45°. This robot arm can continuously transmit torque while deforming in all other directions. The arm's mechanical design achieves high motion repeatability (0.4 mm and 0.1°) when tracking trajectories. We then trained a neural network to learn the inverse kinematics, enabling us to program the arm to complete tasks that are challenging for existing soft robots such as installing light bulbs, fastening bolts, and turning valves. The arm's passive compliance makes it safe around humans and provides a source of mechanical intelligence, enabling it to adapt to misalignment when manipulating objects. This work will bridge the gap between hard and soft robotics with applications in human assistance, warehouse automation, and extreme environments. △ Less

Submitted 3 December, 2024; originally announced December 2024.

arXiv:2411.01111 [pdf, other]

Rule Based Rewards for Language Model Safety

Authors: Tong Mu, Alec Helyar, Johannes Heidecke, Joshua Achiam, Andrea Vallone, Ian Kivlichan, Molly Lin, Alex Beutel, John Schulman, Lilian Weng

Abstract: Reinforcement learning based fine-tuning of large language models (LLMs) on human preferences has been shown to enhance both their capabilities and safety behavior. However, in cases related to safety, without precise instructions to human annotators, the data collected may cause the model to become overly cautious, or to respond in an undesirable style, such as being judgmental. Additionally, as… ▽ More Reinforcement learning based fine-tuning of large language models (LLMs) on human preferences has been shown to enhance both their capabilities and safety behavior. However, in cases related to safety, without precise instructions to human annotators, the data collected may cause the model to become overly cautious, or to respond in an undesirable style, such as being judgmental. Additionally, as model capabilities and usage patterns evolve, there may be a costly need to add or relabel data to modify safety behavior. We propose a novel preference modeling approach that utilizes AI feedback and only requires a small amount of human data. Our method, Rule Based Rewards (RBR), uses a collection of rules for desired or undesired behaviors (e.g. refusals should not be judgmental) along with a LLM grader. In contrast to prior methods using AI feedback, our method uses fine-grained, composable, LLM-graded few-shot prompts as reward directly in RL training, resulting in greater control, accuracy and ease of updating. We show that RBRs are an effective training method, achieving an F1 score of 97.1, compared to a human-feedback baseline of 91.7, resulting in much higher safety-behavior accuracy through better balancing usefulness and safety. △ Less

Submitted 1 November, 2024; originally announced November 2024.

Comments: Accepted at Neurips 2024

arXiv:2410.21276 [pdf, other]

GPT-4o System Card

Authors: OpenAI, :, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis , et al. (395 additional authors not shown)

Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil… ▽ More GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50\% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities. △ Less

Submitted 25 October, 2024; originally announced October 2024.

arXiv:2410.19792 [pdf, other]

Substance Beats Style: Why Beginning Students Fail to Code with LLMs

Authors: Francesca Lucchetti, Zixuan Wu, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson

Abstract: Although LLMs are increasing the productivity of professional programmers, existing work shows that beginners struggle to prompt LLMs to solve text-to-code tasks. Why is this the case? This paper explores two competing hypotheses about the cause of student-LLM miscommunication: (1) students simply lack the technical vocabulary needed to write good prompts, and (2) students do not understand the ex… ▽ More Although LLMs are increasing the productivity of professional programmers, existing work shows that beginners struggle to prompt LLMs to solve text-to-code tasks. Why is this the case? This paper explores two competing hypotheses about the cause of student-LLM miscommunication: (1) students simply lack the technical vocabulary needed to write good prompts, and (2) students do not understand the extent of information that LLMs need to solve code generation tasks. We study (1) with a causal intervention experiment on technical vocabulary and (2) by analyzing graphs that abstract how students edit prompts and the different failures that they encounter. We find that substance beats style: a poor grasp of technical vocabulary is merely correlated with prompt failure; that the information content of prompts predicts success; that students get stuck making trivial edits; and more. Our findings have implications for the use of LLMs in programming education, and for efforts to make computing more accessible with LLMs. △ Less

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2409.06941 [pdf, other]

FreeRide: Harvesting Bubbles in Pipeline Parallelism

Authors: Jiashu Zhang, Zihan Pan, Molly, Xu, Khuzaima Daudjee, Sihang Liu

Abstract: The occurrence of bubbles in pipeline parallelism is an inherent limitation that can account for more than 40% of the large language model (LLM) training time and is one of the main reasons for the underutilization of GPU resources in LLM training. Harvesting these bubbles for GPU side tasks can increase resource utilization and reduce training costs but comes with challenges. First, because bubbl… ▽ More The occurrence of bubbles in pipeline parallelism is an inherent limitation that can account for more than 40% of the large language model (LLM) training time and is one of the main reasons for the underutilization of GPU resources in LLM training. Harvesting these bubbles for GPU side tasks can increase resource utilization and reduce training costs but comes with challenges. First, because bubbles are discontinuous with various shapes, programming side tasks becomes difficult while requiring excessive engineering effort. Second, a side task can compete with pipeline training for GPU resources and incur significant overhead. To address these challenges, we propose FreeRide, a system designed to harvest bubbles in pipeline parallelism for side tasks. FreeRide provides programmers with interfaces to implement side tasks easily, manages bubbles and side tasks during pipeline training, and controls access to GPU resources by side tasks to reduce overhead. We demonstrate that FreeRide achieves 7.8% average cost savings with a negligible overhead of about 1% in training LLMs while serving model training, graph analytics, and image processing side tasks. △ Less

Submitted 27 April, 2025; v1 submitted 10 September, 2024; originally announced September 2024.

arXiv:2407.21037 [pdf, other]

An Application of Large Language Models to Coding Negotiation Transcripts

Authors: Ray Friedman, Jaewoo Cho, Jeanne Brett, Xuhui Zhan, Ningyu Han, Sriram Kannan, Yingxiang Ma, Jesse Spencer-Smith, Elisabeth Jäckel, Alfred Zerres, Madison Hooper, Katie Babbit, Manish Acharya, Wendi Adair, Soroush Aslani, Tayfun Aykaç, Chris Bauman, Rebecca Bennett, Garrett Brady, Peggy Briggs, Cheryl Dowie, Chase Eck, Igmar Geiger, Frank Jacob, Molly Kern , et al. (33 additional authors not shown)

Abstract: In recent years, Large Language Models (LLM) have demonstrated impressive capabilities in the field of natural language processing (NLP). This paper explores the application of LLMs in negotiation transcript analysis by the Vanderbilt AI Negotiation Lab. Starting in September 2022, we applied multiple strategies using LLMs from zero shot learning to fine tuning models to in-context learning). The… ▽ More In recent years, Large Language Models (LLM) have demonstrated impressive capabilities in the field of natural language processing (NLP). This paper explores the application of LLMs in negotiation transcript analysis by the Vanderbilt AI Negotiation Lab. Starting in September 2022, we applied multiple strategies using LLMs from zero shot learning to fine tuning models to in-context learning). The final strategy we developed is explained, along with how to access and use the model. This study provides a sense of both the opportunities and roadblocks for the implementation of LLMs in real life applications and offers a model for how LLMs can be applied to coding in other fields. △ Less

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.04906 [pdf, other]

Privacy or Transparency? Negotiated Smartphone Access as a Signifier of Trust in Romantic Relationships

Authors: Periwinkle Doerfler, Kieron Ivy Turk, Chris Geeng, Damon McCoy, Jeffrey Ackerman, Molly Dragiewicz

Abstract: In this work, we analyze two large-scale surveys to examine how individuals think about sharing smartphone access with romantic partners as a function of trust in relationships. We find that the majority of couples have access to each others' devices, but may have explicit or implicit boundaries on how this access is to be used. Investigating these boundaries and related social norms, we find that… ▽ More In this work, we analyze two large-scale surveys to examine how individuals think about sharing smartphone access with romantic partners as a function of trust in relationships. We find that the majority of couples have access to each others' devices, but may have explicit or implicit boundaries on how this access is to be used. Investigating these boundaries and related social norms, we find that there is little consensus about the level of smartphone access (i.e., transparency), or lack thereof (i.e., privacy) that is desirable in romantic contexts. However, there is broad agreement that the level of access should be mutual and consensual. Most individuals understand trust to be the basis of their decisions about transparency and privacy. Furthermore, we find individuals have crossed these boundaries, violating their partners' privacy and betraying their trust. We examine how, when, why, and by whom these betrayals occur. We consider the ramifications of these boundary violations in the case of intimate partner violence. Finally, we provide recommendations for design changes to enable technological enforcement of boundaries currently enforced by trust, bringing access control in line with users' sharing preferences. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2405.06058 [pdf, other]

Large Language Models Show Human-like Social Desirability Biases in Survey Responses

Authors: Aadesh Salecha, Molly E. Ireland, Shashanka Subrahmanya, João Sedoc, Lyle H. Ungar, Johannes C. Eichstaedt

Abstract: As Large Language Models (LLMs) become widely used to model and simulate human behavior, understanding their biases becomes critical. We developed an experimental framework using Big Five personality surveys and uncovered a previously undetected social desirability bias in a wide range of LLMs. By systematically varying the number of questions LLMs were exposed to, we demonstrate their ability to… ▽ More As Large Language Models (LLMs) become widely used to model and simulate human behavior, understanding their biases becomes critical. We developed an experimental framework using Big Five personality surveys and uncovered a previously undetected social desirability bias in a wide range of LLMs. By systematically varying the number of questions LLMs were exposed to, we demonstrate their ability to infer when they are being evaluated. When personality evaluation is inferred, LLMs skew their scores towards the desirable ends of trait dimensions (i.e., increased extraversion, decreased neuroticism, etc). This bias exists in all tested models, including GPT-4/3.5, Claude 3, Llama 3, and PaLM-2. Bias levels appear to increase in more recent models, with GPT-4's survey responses changing by 1.20 (human) standard deviations and Llama 3's by 0.98 standard deviations-very large effects. This bias is robust to randomization of question order and paraphrasing. Reverse-coding all the questions decreases bias levels but does not eliminate them, suggesting that this effect cannot be attributed to acquiescence bias. Our findings reveal an emergent social desirability bias and suggest constraints on profiling LLMs with psychometric tests and on using LLMs as proxies for human participants. △ Less

Submitted 21 November, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: 3 pages, 2 figures, accepted at PNAS Nexus

arXiv:2405.05382 [pdf, other]

DrawL: Understanding the Effects of Non-Mainstream Dialects in Prompted Image Generation

Authors: Joshua N. Williams, Molly FitzMorris, Osman Aka, Sarah Laszlo

Abstract: Text-to-image models are now easy to use and ubiquitous. However, prior work has found that they are prone to recapitulating harmful Western stereotypes. For example, requesting that a model generate an "African person and their house," may produce a person standing next to a straw hut. In this example, the word "African" is an explicit descriptor of the person that the prompt is seeking to depict… ▽ More Text-to-image models are now easy to use and ubiquitous. However, prior work has found that they are prone to recapitulating harmful Western stereotypes. For example, requesting that a model generate an "African person and their house," may produce a person standing next to a straw hut. In this example, the word "African" is an explicit descriptor of the person that the prompt is seeking to depict. Here, we examine whether implicit markers, such as dialect, can also affect the portrayal of people in text-to-image outputs. We pair prompts in Mainstream American English with counterfactuals that express grammatical constructions found in dialects correlated with historically marginalized groups. We find that through minimal, syntax-only changes to prompts, we can systematically shift the skin tone and gender of people in the generated images. We conclude with a discussion of whether dialectic distribution shifts like this are harmful or are expected, possibly even desirable, model behavior. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: 12 pages, 3 figures in main text, 2 tables in main text, 4 figures in appendix, 7 tables in appendix

arXiv:2404.10153 [pdf]

doi 10.1145/3613904.3641988

How Users Experience Closed Captions on Live Television: Quality Metrics Remain a Challenge

Authors: Mariana Arroyo Chavez, Molly Feanny, Matthew Seita, Bernard Thompson, Keith Delk, Skyler Officer, Abraham Glasser, Raja Kushalnagar, Christian Vogler

Abstract: This paper presents a mixed methods study on how deaf, hard of hearing and hearing viewers perceive live TV caption quality with captioned video stimuli designed to mirror TV captioning experiences. To assess caption quality, we used four commonly-used quality metrics focusing on accuracy: word error rate, weighted word error rate, automated caption evaluation (ACE), and its successor ACE2. We cal… ▽ More This paper presents a mixed methods study on how deaf, hard of hearing and hearing viewers perceive live TV caption quality with captioned video stimuli designed to mirror TV captioning experiences. To assess caption quality, we used four commonly-used quality metrics focusing on accuracy: word error rate, weighted word error rate, automated caption evaluation (ACE), and its successor ACE2. We calculated the correlation between the four quality metrics and viewer ratings for subjective quality and found that the correlation was weak, revealing that other factors besides accuracy affect user ratings. Additionally, even high-quality captions are perceived to have problems, despite controlling for confounding factors. Qualitative analysis of viewer comments revealed three major factors affecting their experience: Errors within captions, difficulty in following captions, and caption appearance. The findings raise questions as to how objective caption quality metrics can be reconciled with the user experience across a diverse spectrum of viewers. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: To appear in Proceedings of the Conference on Human Factors in Computing Systems CHI 24, May 11-16, Honolulu, HI, USA, 16 pages. https://doi.org/10.1145/3613904.3641988

arXiv:2403.04526 [pdf, other]

doi 10.1073/pnas.2407439121

Hyperspectral unmixing for Raman spectroscopy via physics-constrained autoencoders

Authors: Dimitar Georgiev, Álvaro Fernández-Galiana, Simon Vilms Pedersen, Georgios Papadopoulos, Ruoxiao Xie, Molly M. Stevens, Mauricio Barahona

Abstract: Raman spectroscopy is widely used across scientific domains to characterize the chemical composition of samples in a non-destructive, label-free manner. Many applications entail the unmixing of signals from mixtures of molecular species to identify the individual components present and their proportions, yet conventional methods for chemometrics often struggle with complex mixture scenarios encoun… ▽ More Raman spectroscopy is widely used across scientific domains to characterize the chemical composition of samples in a non-destructive, label-free manner. Many applications entail the unmixing of signals from mixtures of molecular species to identify the individual components present and their proportions, yet conventional methods for chemometrics often struggle with complex mixture scenarios encountered in practice. Here, we develop hyperspectral unmixing algorithms based on autoencoder neural networks, and we systematically validate them using both synthetic and experimental benchmark datasets created in-house. Our results demonstrate that unmixing autoencoders provide improved accuracy, robustness and efficiency compared to standard unmixing methods. We also showcase the applicability of autoencoders to complex biological settings by showing improved biochemical characterization of volumetric Raman imaging data from a monocytic cell. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Journal ref: Proceedings of the National Academy of Sciences, 2024, 121(44), e2321305121

arXiv:2402.17704 [pdf, other]

Transfer Learning Bayesian Optimization to Design Competitor DNA Molecules for Use in Diagnostic Assays

Authors: Ruby Sedgwick, John P. Goertz, Molly M. Stevens, Ruth Misener, Mark van der Wilk

Abstract: With the rise in engineered biomolecular devices, there is an increased need for tailor-made biological sequences. Often, many similar biological sequences need to be made for a specific application meaning numerous, sometimes prohibitively expensive, lab experiments are necessary for their optimization. This paper presents a transfer learning design of experiments workflow to make this developmen… ▽ More With the rise in engineered biomolecular devices, there is an increased need for tailor-made biological sequences. Often, many similar biological sequences need to be made for a specific application meaning numerous, sometimes prohibitively expensive, lab experiments are necessary for their optimization. This paper presents a transfer learning design of experiments workflow to make this development feasible. By combining a transfer learning surrogate model with Bayesian optimization, we show how the total number of experiments can be reduced by sharing information between optimization tasks. We demonstrate the reduction in the number of experiments using data from the development of DNA competitors for use in an amplification-based diagnostic assay. We use cross-validation to compare the predictive accuracy of different transfer learning models, and then compare the performance of the models for both single objective and penalized optimization tasks. △ Less

Submitted 22 October, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2401.15232 [pdf, other]

doi 10.1145/3613904.3642706

How Beginning Programmers and Code LLMs (Mis)read Each Other

Authors: Sydney Nguyen, Hannah McLean Babe, Yangtian Zi, Arjun Guha, Carolyn Jane Anderson, Molly Q Feldman

Abstract: Generative AI models, specifically large language models (LLMs), have made strides towards the long-standing goal of text-to-code generation. This progress has invited numerous studies of user interaction. However, less is known about the struggles and strategies of non-experts, for whom each step of the text-to-code problem presents challenges: describing their intent in natural language, evaluat… ▽ More Generative AI models, specifically large language models (LLMs), have made strides towards the long-standing goal of text-to-code generation. This progress has invited numerous studies of user interaction. However, less is known about the struggles and strategies of non-experts, for whom each step of the text-to-code problem presents challenges: describing their intent in natural language, evaluating the correctness of generated code, and editing prompts when the generated code is incorrect. This paper presents a large-scale controlled study of how 120 beginning coders across three academic institutions approach writing and editing prompts. A novel experimental design allows us to target specific steps in the text-to-code process and reveals that beginners struggle with writing and editing prompts, even for problems at their skill level and when correctness is automatically determined. Our mixed-methods evaluation provides insight into student processes and perceptions with key implications for non-expert Code LLM use within and outside of education. △ Less

Submitted 7 July, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: Published in CHI 2024

arXiv:2312.11359 [pdf, other]

Combining Game Design and Data Visualization to Inform Plastics Policy: Fostering Collaboration between Science, Decision-Makers, and Artificial Intelligence

Authors: A Samuel Pottinger, Nivedita Biyani, Roland Geyer, Douglas J McCauley, Magali de Bruyn, Molly R Morse, Neil Nathan, Kevin Koy, Ciera Martinez

Abstract: This multi-disciplinary case study details how a public web application combines information and game design to visualize effects of user-defined policies intended to reduce plastic waste. Contextualizing this open source software within a broader lineage of digital media research, this user experience exploration outlines potential directions for facilitating conversation between artificial intel… ▽ More This multi-disciplinary case study details how a public web application combines information and game design to visualize effects of user-defined policies intended to reduce plastic waste. Contextualizing this open source software within a broader lineage of digital media research, this user experience exploration outlines potential directions for facilitating conversation between artificial intelligence, scientists, and decision makers during an iterative policy building process. Furthermore, this system dissection reveals how this interactive science effort considers the practicalities of a treaty's shifting priorities and proposals in its designs. Specifically, this historically situated investigation of the tool's approach highlights options for centering human decision making where artificial intelligence helps reason about interventions but does not prescribe them. Finally, analysis summarizes this application's specific game design-inspired mechanics and their efforts to: enable users' agency to explore solution possibilities freely, invite deep engagement with scientific findings, and simultaneously serve multiple audiences with divergent objectives and expertise. △ Less

Submitted 19 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: 29 pages of which 8 are citations, 4 figures, latex generated from markdown via Pandoc (https://pandoc.org/) for Arxiv

arXiv:2312.00023 [pdf, other]

Hypergraph Topological Features for Autoencoder-Based Intrusion Detection for Cybersecurity Data

Authors: Bill Kay, Sinan G. Aksoy, Molly Baird, Daniel M. Best, Helen Jenne, Cliff Joslyn, Christopher Potvin, Gregory Henselman-Petrusek, Garret Seppala, Stephen J. Young, Emilie Purvine

Abstract: In this position paper, we argue that when hypergraphs are used to capture multi-way local relations of data, their resulting topological features describe global behaviour. Consequently, these features capture complex correlations that can then serve as high fidelity inputs to autoencoder-driven anomaly detection pipelines. We propose two such potential pipelines for cybersecurity data, one that… ▽ More In this position paper, we argue that when hypergraphs are used to capture multi-way local relations of data, their resulting topological features describe global behaviour. Consequently, these features capture complex correlations that can then serve as high fidelity inputs to autoencoder-driven anomaly detection pipelines. We propose two such potential pipelines for cybersecurity data, one that uses an autoencoder directly to determine network intrusions, and one that de-noises input data for a persistent homology system, PHANTOM. We provide heuristic justification for the use of the methods described therein for an intrusion detection pipeline for cyber data. We conclude by showing a small example over synthetic cyber attack data. △ Less

Submitted 9 November, 2023; originally announced December 2023.

MSC Class: 55N31

arXiv:2310.05597 [pdf, other]

Can language models learn analogical reasoning? Investigating training objectives and comparisons to human performance

Authors: Molly R. Petersen, Lonneke van der Plas

Abstract: While analogies are a common way to evaluate word embeddings in NLP, it is also of interest to investigate whether or not analogical reasoning is a task in itself that can be learned. In this paper, we test several ways to learn basic analogical reasoning, specifically focusing on analogies that are more typical of what is used to evaluate analogical reasoning in humans than those in commonly used… ▽ More While analogies are a common way to evaluate word embeddings in NLP, it is also of interest to investigate whether or not analogical reasoning is a task in itself that can be learned. In this paper, we test several ways to learn basic analogical reasoning, specifically focusing on analogies that are more typical of what is used to evaluate analogical reasoning in humans than those in commonly used NLP benchmarks. Our experiments find that models are able to learn analogical reasoning, even with a small amount of data. We additionally compare our models to a dataset with a human baseline, and find that after training, models approach human performance. △ Less

Submitted 3 May, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.06640 [pdf, other]

REVIS: An Error Visualization Tool for Rust

Authors: Ruochen Wang, Molly Maclaren, Michael Coblenz

Abstract: Rust is a programming language that uses a concept of ownership to guarantee memory safety without the use of a garbage collector. However, some error messages related to ownership can be difficult to understand and fix, particularly those that depend on value lifetimes. To help developers fix lifetime-related errors, we developed REVIS, a VSCode extension that visualizes lifetime-related Rust com… ▽ More Rust is a programming language that uses a concept of ownership to guarantee memory safety without the use of a garbage collector. However, some error messages related to ownership can be difficult to understand and fix, particularly those that depend on value lifetimes. To help developers fix lifetime-related errors, we developed REVIS, a VSCode extension that visualizes lifetime-related Rust compiler errors. We describe the design and implementation of the VSCode extension, along with a preliminary evaluation of its efficacy for student learners of Rust. Although the number of participants was too low to enable evaluation of the efficacy of REVIS, we gathered data regarding the prevalence and time to fix the compiler errors that the participants encountered. △ Less

Submitted 12 September, 2023; originally announced September 2023.

Comments: Presented at HATRA 2023

arXiv:2308.09895 [pdf, other]

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

Authors: Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha

Abstract: Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript)… ▽ More Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available. Low resource languages include OCaml, Racket, and several others. This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, MultiPL-T, translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize tests for commented code from a high-resource language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate Python code to a target low-resource language, and use tests to validate the translation. We apply this approach to generate tens of thousands of validated training items for Julia, Lua, OCaml, R, and Racket. Furthermore, we use an open model (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. With MultiPL-T generated data, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket. On established benchmarks (MultiPL-E), these models outperform other open Code LLMs. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer. △ Less

Submitted 21 September, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

arXiv:2308.01320 [pdf, other]

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Authors: Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He

Abstract: ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at… ▽ More ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI. △ Less

Submitted 2 August, 2023; originally announced August 2023.

Comments: 14 pages, 7 figures

arXiv:2307.13650 [pdf, other]

doi 10.1021/acs.analchem.4c00383

RamanSPy: An open-source Python package for integrative Raman spectroscopy data analysis

Authors: Dimitar Georgiev, Simon Vilms Pedersen, Ruoxiao Xie, Álvaro Fernández-Galiana, Molly M. Stevens, Mauricio Barahona

Abstract: Raman spectroscopy is a non-destructive and label-free chemical analysis technique, which plays a key role in the analysis and discovery cycle of various branches of science. Nonetheless, progress in Raman spectroscopic analysis is still impeded by the lack of software, methodological and data standardisation, and the ensuing fragmentation and lack of reproducibility of analysis workflows thereof.… ▽ More Raman spectroscopy is a non-destructive and label-free chemical analysis technique, which plays a key role in the analysis and discovery cycle of various branches of science. Nonetheless, progress in Raman spectroscopic analysis is still impeded by the lack of software, methodological and data standardisation, and the ensuing fragmentation and lack of reproducibility of analysis workflows thereof. To address these issues, we introduce RamanSPy, an open-source Python package for Raman spectroscopic research and analysis. RamanSPy provides a comprehensive library of ready-to-use tools for spectroscopic analysis, which streamlines day-to-day tasks, integrative analyses, as well as novel research and algorithmic development. RamanSPy is modular and open source, not tied to a particular technology or data format, and can be readily interfaced with the burgeoning ecosystem for data science, statistical analysis and machine learning in Python. △ Less

Submitted 5 July, 2023; originally announced July 2023.

Journal ref: Anal. Chem. 2024, 96, 21, 8492-8500

arXiv:2306.04556 [pdf, other]

StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code

Authors: Hannah McLean Babe, Sydney Nguyen, Yangtian Zi, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson

Abstract: Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. Stud… ▽ More Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. StudentEval contains 1,749 prompts for 48 problems, written by 80 students who have only completed one semester of Python programming. Our students wrote these prompts while working interactively with a Code LLM, and we observed very mixed success rates. We use StudentEval to evaluate 5 Code LLMs and find that StudentEval is a better discriminator of model performance than existing benchmarks. We analyze the prompts and find significant variation in students' prompting techniques. We also find that nondeterministic LLM sampling could mislead students into thinking that their prompts are more (or less) effective than they actually are, which has implications for how to teach with Code LLMs. △ Less

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo… ▽ More We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4. △ Less

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2301.10531 [pdf, other]

3D Tooth Mesh Segmentation with Simplified Mesh Cell Representation

Authors: Ananya Jana, Hrebesh Molly Subhash, Dimitris N. Metaxas

Abstract: Manual tooth segmentation of 3D tooth meshes is tedious and there is variations among dentists. %Manual tooth annotation of 3D tooth meshes is a tedious task. Several deep learning based methods have been proposed to perform automatic tooth mesh segmentation. Many of the proposed tooth mesh segmentation algorithms summarize the mesh cell as - the cell center or barycenter, the normal at barycenter… ▽ More Manual tooth segmentation of 3D tooth meshes is tedious and there is variations among dentists. %Manual tooth annotation of 3D tooth meshes is a tedious task. Several deep learning based methods have been proposed to perform automatic tooth mesh segmentation. Many of the proposed tooth mesh segmentation algorithms summarize the mesh cell as - the cell center or barycenter, the normal at barycenter, the cell vertices and the normals at the cell vertices. Summarizing of the mesh cell/triangle in this manner imposes an implicit structural constraint and makes it difficult to work with multiple resolutions which is done in many point cloud based deep learning algorithms. We propose a novel segmentation method which utilizes only the barycenter and the normal at the barycenter information of the mesh cell and yet achieves competitive performance. We are the first to demonstrate that it is possible to relax the implicit structural constraint and yet achieve superior segmentation performance △ Less

Submitted 25 January, 2023; originally announced January 2023.

Comments: accepted at IEEE ISBI 2023 International Symposium on Biomedical Imaging

arXiv:2212.13638 [pdf, other]

Battling the Coronavirus Infodemic Among Social Media Users in Kenya and Nigeria

Authors: Molly Offer-Westort, Leah R. Rosenzweig, Susan Athey

Abstract: How can we induce social media users to be discerning when sharing information during a pandemic? An experiment on Facebook Messenger with users from Kenya (n = 7,498) and Nigeria (n = 7,794) tested interventions designed to decrease intentions to share COVID-19 misinformation without decreasing intentions to share factual posts. The initial stage of the study incorporated: (i) a factorial design… ▽ More How can we induce social media users to be discerning when sharing information during a pandemic? An experiment on Facebook Messenger with users from Kenya (n = 7,498) and Nigeria (n = 7,794) tested interventions designed to decrease intentions to share COVID-19 misinformation without decreasing intentions to share factual posts. The initial stage of the study incorporated: (i) a factorial design with 40 intervention combinations; and (ii) a contextual adaptive design, increasing the probability of assignment to treatments that worked better for previous subjects with similar characteristics. The second stage evaluated the best-performing treatments and a targeted treatment assignment policy estimated from the data. We precisely estimate null effects from warning flags and related article suggestions, tactics used by social media platforms. However, nudges to consider information's accuracy reduced misinformation sharing relative to control by 4.9% (estimate = -2.3 pp, s.e. = 1.0 , Z = -2.31, p = 0.021, 95% CI = [-4.2 , -0.35]). Such low-cost scalable interventions may improve the quality of information circulating online. △ Less

Submitted 15 September, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

Comments: 52 pages including appendix, 9 figures

arXiv:2211.07357 [pdf, other]

Controlling Commercial Cooling Systems Using Reinforcement Learning

Authors: Jerry Luo, Cosmin Paduraru, Octavian Voicu, Yuri Chervonyi, Scott Munns, Jerry Li, Crystal Qian, Praneet Dutta, Jared Quincy Davis, Ningjia Wu, Xingwei Yang, Chu-Ming Chang, Ted Li, Rob Rose, Mingyan Fan, Hootan Nakhost, Tinglin Liu, Brian Kirkman, Frank Altamura, Lee Cline, Patrick Tonker, Joel Gouker, Dave Uden, Warren Buddy Bryan, Jason Law , et al. (11 additional authors not shown)

Abstract: This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments ha… ▽ More This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites. △ Less

Submitted 14 December, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 27 pages, 11 figures

arXiv:2209.08132 [pdf, other]

Automatic Tooth Segmentation from 3D Dental Model using Deep Learning: A Quantitative Analysis of what can be learnt from a Single 3D Dental Model

Authors: Ananya Jana, Hrebesh Molly Subhash, Dimitris Metaxas

Abstract: 3D tooth segmentation is an important task for digital orthodontics. Several Deep Learning methods have been proposed for automatic tooth segmentation from 3D dental models or intraoral scans. These methods require annotated 3D intraoral scans. Manually annotating 3D intraoral scans is a laborious task. One approach is to devise self-supervision methods to reduce the manual labeling effort. Compar… ▽ More 3D tooth segmentation is an important task for digital orthodontics. Several Deep Learning methods have been proposed for automatic tooth segmentation from 3D dental models or intraoral scans. These methods require annotated 3D intraoral scans. Manually annotating 3D intraoral scans is a laborious task. One approach is to devise self-supervision methods to reduce the manual labeling effort. Compared to other types of point cloud data like scene point cloud or shape point cloud data, 3D tooth point cloud data has a very regular structure and a strong shape prior. We look at how much representative information can be learnt from a single 3D intraoral scan. We evaluate this quantitatively with the help of ten different methods of which six are generic point cloud segmentation methods whereas the other four are tooth segmentation specific methods. Surprisingly, we find that with a single 3D intraoral scan training, the Dice score can be as high as 0.86 whereas the full training set gives Dice score of 0.94. We conclude that the segmentation methods can learn a great deal of information from a single 3D tooth point cloud scan under suitable conditions e.g. data augmentation. We are the first to quantitatively evaluate and demonstrate the representation learning capability of Deep Learning methods from a single 3D intraoral scan. This can enable building self-supervision methods for tooth segmentation under extreme data limitation scenario by leveraging the available data to the fullest possible extent. △ Less

Submitted 16 September, 2022; originally announced September 2022.

Comments: accepted to SIPAIM 2022

arXiv:2208.08227 [pdf, other]

MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

Authors: Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, Abhinav Jangda

Abstract: Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities wi… ▽ More Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities with other languages. We propose MultiPL-E, a system for translating unit test-driven code generation benchmarks to new languages. We create the first massively multilingual code generation benchmark by using MultiPL-E to translate two popular Python code generation benchmarks to 18 additional programming languages. We use MultiPL-E to extend the HumanEval benchmark and MBPP benchmark to 18 languages that encompass a range of programming paradigms and popularity. Using these new parallel benchmarks, we evaluate the multi-language performance of three state-of-the-art code generation models: Codex, CodeGen, and InCoder. We find that Codex matches or even exceeds its performance on Python for several other languages. The range of programming languages represented in MultiPL-E allow us to explore the impact of language frequency and language features on model performance. Finally, the MultiPL-E approach of compiling code generation benchmarks to new programming languages is both scalable and extensible, making it straightforward to evaluate new models, benchmarks, and languages. △ Less

Submitted 19 December, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

arXiv:2207.11357 [pdf, other]

PREPRINT: Found Object Puppeteering as a Tool for Rapid Movement Sketching in 3D Animation

Authors: Molly Jane Nicholas, Eric Paulos

Abstract: Both expert and novice animators have a need to engage in movement sketching -- low-cost, rapid iteration on a character's movement style -- especially early on in the ideation process. Yet animation tools currently focus on low-level character control mechanisms rather than encouraging engagement with and deep observation of movement. We identify Found Object puppeteering -- where puppeteers mani… ▽ More Both expert and novice animators have a need to engage in movement sketching -- low-cost, rapid iteration on a character's movement style -- especially early on in the ideation process. Yet animation tools currently focus on low-level character control mechanisms rather than encouraging engagement with and deep observation of movement. We identify Found Object puppeteering -- where puppeteers manipulate everyday physical objects with their hands -- as a creative practice whose use of material "jigs" is uniquely well-positioned to scaffold the novice animator's developing skills. In this paper, we draw on the practice of an expert puppeteer practitioner to inform the design of a system that incorporates physical objects into the animation workflow to scaffold novices into diverse movement exploration while manipulating digital puppets. △ Less

Submitted 22 July, 2022; originally announced July 2022.

arXiv:2205.03231 [pdf, other]

Side-aware Meta-Learning for Cross-Dataset Listener Diagnosis with Subjective Tinnitus

Authors: Yun Li, Zhe Liu, Lina Yao, Molly Lucas, Jessica J. M. Monaghan, Yu Zhang

Abstract: With the development of digital technology, machine learning has paved the way for the next generation of tinnitus diagnoses. Although machine learning has been widely applied in EEG-based tinnitus analysis, most current models are dataset-specific. Each dataset may be limited to a specific range of symptoms, overall disease severity, and demographic attributes; further, dataset formats may differ… ▽ More With the development of digital technology, machine learning has paved the way for the next generation of tinnitus diagnoses. Although machine learning has been widely applied in EEG-based tinnitus analysis, most current models are dataset-specific. Each dataset may be limited to a specific range of symptoms, overall disease severity, and demographic attributes; further, dataset formats may differ, impacting model performance. This paper proposes a side-aware meta-learning for cross-dataset tinnitus diagnosis, which can effectively classify tinnitus in subjects of divergent ages and genders from different data collection processes. Owing to the superiority of meta-learning, our method does not rely on large-scale datasets like conventional deep learning models. Moreover, we design a subject-specific training process to assist the model in fitting the data pattern of different patients or healthy people. Our method achieves a high accuracy of 73.8\% in the cross-dataset classification. We conduct an extensive analysis to show the effectiveness of side information of ears in enhancing model performance and side-aware meta-learning in improving the quality of the learned features. △ Less

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2202.03868 [pdf, other]

Mapping DNN Embedding Manifolds for Network Generalization Prediction

Authors: Molly O'Brien, Julia Bukowski, Mathias Unberath, Aria Pezeshk, Greg Hager

Abstract: Understanding Deep Neural Network (DNN) performance in changing conditions is essential for deploying DNNs in safety critical applications with unconstrained environments, e.g., perception for self-driving vehicles or medical image analysis. Recently, the task of Network Generalization Prediction (NGP) has been proposed to predict how a DNN will generalize in a new operating domain. Previous NGP a… ▽ More Understanding Deep Neural Network (DNN) performance in changing conditions is essential for deploying DNNs in safety critical applications with unconstrained environments, e.g., perception for self-driving vehicles or medical image analysis. Recently, the task of Network Generalization Prediction (NGP) has been proposed to predict how a DNN will generalize in a new operating domain. Previous NGP approaches have relied on labeled metadata and known distributions for the new operating domains. In this study, we propose the first NGP approach that predicts DNN performance based solely on how unlabeled images from an external operating domain map in the DNN embedding space. We demonstrate this technique for pedestrian, melanoma, and animal classification tasks and show state of the art NGP in 13 of 15 NGP tasks without requiring domain knowledge. Additionally, we show that our NGP embedding maps can be used to identify misclassified images when the DNN performance is poor. △ Less

Submitted 3 February, 2022; originally announced February 2022.

Comments: 11 pages, 5 figures

arXiv:2112.08460 [pdf]

Friendscope: Exploring In-the-Moment Experience Sharing on Camera Glasses via a Shared Camera

Authors: Molly Jane Nicholas, Brian A. Smith, Rajan Vaish

Abstract: We introduce Friendscope, an instant, in-the-moment experience sharing system for lightweight commercial camera glasses. Friendscope explores a new concept called a shared camera. This concept allows a wearer to share control of their camera with a remote friend, making it possible for both people to capture photos/videos from the camera in the moment. Through a user study with 48 participants, we… ▽ More We introduce Friendscope, an instant, in-the-moment experience sharing system for lightweight commercial camera glasses. Friendscope explores a new concept called a shared camera. This concept allows a wearer to share control of their camera with a remote friend, making it possible for both people to capture photos/videos from the camera in the moment. Through a user study with 48 participants, we found that users felt connected to each other, describing the shared camera as a more intimate form of livestreaming. Moreover, even privacy-sensitive users were able to retain their sense of privacy and control with the shared camera. Friendscope's different shared camera configurations give wearers ultimate control over who they share the camera with and what photos/videos they share. We conclude with design implications for future experience sharing systems. △ Less

Submitted 15 December, 2021; originally announced December 2021.

Comments: ACM CSCW 2022

arXiv:2110.12501 [pdf, other]

Abstractified Multi-instance Learning (AMIL) for Biomedical Relation Extraction

Authors: William Hogan, Molly Huang, Yannis Katsis, Tyler Baldwin, Ho-Cheol Kim, Yoshiki Vazquez Baeza, Andrew Bartko, Chun-Nan Hsu

Abstract: Relation extraction in the biomedical domain is a challenging task due to a lack of labeled data and a long-tail distribution of fact triples. Many works leverage distant supervision which automatically generates labeled data by pairing a knowledge graph with raw textual data. Distant supervision produces noisy labels and requires additional techniques, such as multi-instance learning (MIL), to de… ▽ More Relation extraction in the biomedical domain is a challenging task due to a lack of labeled data and a long-tail distribution of fact triples. Many works leverage distant supervision which automatically generates labeled data by pairing a knowledge graph with raw textual data. Distant supervision produces noisy labels and requires additional techniques, such as multi-instance learning (MIL), to denoise the training signal. However, MIL requires multiple instances of data and struggles with very long-tail datasets such as those found in the biomedical domain. In this work, we propose a novel reformulation of MIL for biomedical relation extraction that abstractifies biomedical entities into their corresponding semantic types. By grouping entities by types, we are better able to take advantage of the benefits of MIL and further denoise the training signal. We show this reformulation, which we refer to as abstractified multi-instance learning (AMIL), improves performance in biomedical relationship extraction. We also propose a novel relationship embedding architecture that further improves model performance. △ Less

Submitted 24 October, 2021; originally announced October 2021.

Comments: 14 pages, 3 figures, submitted to Automated Knowledge Base Construction (2021)

Report number: 13

Journal ref: 3rd Conference on Automated Knowledge Base Construction (2021)

arXiv:2108.07399 [pdf, other]

Network Generalization Prediction for Safety Critical Tasks in Novel Operating Domains

Authors: Molly O'Brien, Mike Medoff, Julia Bukowski, Greg Hager

Abstract: It is well known that Neural Network (network) performance often degrades when a network is used in novel operating domains that differ from its training and testing domains. This is a major limitation, as networks are being integrated into safety critical, cyber-physical systems that must work in unconstrained environments, e.g., perception for autonomous vehicles. Training networks that generali… ▽ More It is well known that Neural Network (network) performance often degrades when a network is used in novel operating domains that differ from its training and testing domains. This is a major limitation, as networks are being integrated into safety critical, cyber-physical systems that must work in unconstrained environments, e.g., perception for autonomous vehicles. Training networks that generalize to novel operating domains and that extract robust features is an active area of research, but previous work fails to predict what the network performance will be in novel operating domains. We propose the task Network Generalization Prediction: predicting the expected network performance in novel operating domains. We describe the network performance in terms of an interpretable Context Subspace, and we propose a methodology for selecting the features of the Context Subspace that provide the most information about the network performance. We identify the Context Subspace for a pretrained Faster RCNN network performing pedestrian detection on the Berkeley Deep Drive (BDD) Dataset, and demonstrate Network Generalization Prediction accuracy within 5% or less of observed performance. We also demonstrate that the Context Subspace from the BDD Dataset is informative for completely unseen datasets, JAAD and Cityscapes, where predictions have a bias of 10% or less. △ Less

Submitted 16 August, 2021; originally announced August 2021.

arXiv:2105.01006 [pdf, other]

Robotic Surgery With Lean Reinforcement Learning

Authors: Yotam Barnoy, Molly O'Brien, Will Wang, Gregory Hager

Abstract: As surgical robots become more common, automating away some of the burden of complex direct human operation becomes ever more feasible. Model-free reinforcement learning (RL) is a promising direction toward generalizable automated surgical performance, but progress has been slowed by the lack of efficient and realistic learning environments. In this paper, we describe adding reinforcement learning… ▽ More As surgical robots become more common, automating away some of the burden of complex direct human operation becomes ever more feasible. Model-free reinforcement learning (RL) is a promising direction toward generalizable automated surgical performance, but progress has been slowed by the lack of efficient and realistic learning environments. In this paper, we describe adding reinforcement learning support to the da Vinci Skill Simulator, a training simulation used around the world to allow surgeons to learn and rehearse technical skills. We successfully teach an RL-based agent to perform sub-tasks in the simulator environment, using either image or state data. As far as we know, this is the first time an RL-based agent is taught from visual data in a surgical robotics environment. Additionally, we tackle the sample inefficiency of RL using a simple-to-implement system which we term hybrid-batch learning (HBL), effectively adding a second, long-term replay buffer to the Q-learning process. Additionally, this allows us to bootstrap learning from images from the data collected using the easier task of learning from state. We show that HBL decreases our learning times significantly. △ Less

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2104.10034 [pdf, other]

On Generating and Labeling Network Traffic with Realistic, Self-Propagating Malware

Authors: Molly Buchanan, Jeffrey W. Collyer, Jack W. Davidson, Saikat Dey, Mark Gardner, Jason D. Hiser, Jeffry Lang, Alastair Nottingham, Alina Oprea

Abstract: Research and development of techniques which detect or remediate malicious network activity require access to diverse, realistic, contemporary data sets containing labeled malicious connections. In the absence of such data, said techniques cannot be meaningfully trained, tested, and evaluated. Synthetically produced data containing fabricated or merged network traffic is of limited value as it is… ▽ More Research and development of techniques which detect or remediate malicious network activity require access to diverse, realistic, contemporary data sets containing labeled malicious connections. In the absence of such data, said techniques cannot be meaningfully trained, tested, and evaluated. Synthetically produced data containing fabricated or merged network traffic is of limited value as it is easily distinguishable from real traffic by even simple machine learning (ML) algorithms. Real network data is preferable, but while ubiquitous is broadly both sensitive and lacking in ground truth labels, limiting its utility for ML research. This paper presents a multi-faceted approach to generating a data set of labeled malicious connections embedded within anonymized network traffic collected from large production networks. Real-world malware is defanged and introduced to simulated, secured nodes within those networks to generate realistic traffic while maintaining sufficient isolation to protect real data and infrastructure. Network sensor data, including this embedded malware traffic, is collected at a network edge and anonymized for research use. Network traffic was collected and produced in accordance with the aforementioned methods at two major educational institutions. The result is a highly realistic, long term, multi-institution data set with embedded data labels spanning over 1.5 trillion connections and over a petabyte of sensor log data. The usability of this data set is demonstrated by its utility to our artificial intelligence and machine learning (AI/ML) research program. △ Less

Submitted 27 May, 2022; v1 submitted 20 April, 2021; originally announced April 2021.

Comments: 4+2 pages, 3 figures, 1 table, for AI4CS-SDM21

arXiv:2012.00144 [pdf]

doi 10.1016/j.jcjp.2021.100009

Improved Diagnosis of Tibiofemoral Cartilage Defects on MRI Images Using Deep Learning

Authors: Gergo Merkely, Alireza Borjali, Molly Zgoda, Evan M. Farina, Simon Gortz, Orhun Muratoglu, Christian Lattermann, Kartik M. Varadarajan

Abstract: Background: MRI is the modality of choice for cartilage imaging; however, its diagnostic performance is variable and significantly lower than the gold standard diagnostic knee arthroscopy. In recent years, deep learning has been used to automatically interpret medical images to improve diagnostic accuracy and speed. Purpose: The primary purpose of this study was to evaluate whether deep learning a… ▽ More Background: MRI is the modality of choice for cartilage imaging; however, its diagnostic performance is variable and significantly lower than the gold standard diagnostic knee arthroscopy. In recent years, deep learning has been used to automatically interpret medical images to improve diagnostic accuracy and speed. Purpose: The primary purpose of this study was to evaluate whether deep learning applied to the interpretation of knee MRI images can be utilized to identify cartilage defects accurately. Methods: We analyzed data from patients who underwent knee MRI evaluation and consequently had arthroscopic knee surgery (207 with cartilage defect, 90 without cartilage defect). Patients' arthroscopic findings were compared to preoperative MRI images to verify the presence or absence of isolated tibiofemoral cartilage defects. We developed three convolutional neural networks (CNNs) to analyze the MRI images and implemented image-specific saliency maps to visualize the CNNs' decision-making process. To compare the CNNs' performance against human interpretation, the same test dataset images were provided to an experienced orthopaedic surgeon and an orthopaedic resident. Results: Saliency maps demonstrated that the CNNs learned to focus on the clinically relevant areas of the tibiofemoral articular cartilage on MRI images during the decision-making processes. One CNN achieved higher performance than the orthopaedic surgeon, with two more accurate diagnoses made by the CNN. All the CNNs outperformed the orthopaedic resident. Conclusion: CNN can be used to enhance the diagnostic performance of MRI in identifying isolated tibiofemoral cartilage defects and may replace diagnostic knee arthroscopy in certain cases in the future. △ Less

Submitted 30 November, 2020; originally announced December 2020.

Journal ref: https://doi.org/10.1016/j.jcjp.2021.100009

arXiv:2011.10575 [pdf, other]

Design of Experiments for Verifying Biomolecular Networks

Authors: Ruby Sedgwick, John Goertz, Molly Stevens, Ruth Misener, Mark van der Wilk

Abstract: There is a growing trend in molecular and synthetic biology of using mechanistic (non machine learning) models to design biomolecular networks. Once designed, these networks need to be validated by experimental results to ensure the theoretical network correctly models the true system. However, these experiments can be expensive and time consuming. We propose a design of experiments approach for v… ▽ More There is a growing trend in molecular and synthetic biology of using mechanistic (non machine learning) models to design biomolecular networks. Once designed, these networks need to be validated by experimental results to ensure the theoretical network correctly models the true system. However, these experiments can be expensive and time consuming. We propose a design of experiments approach for validating these networks efficiently. Gaussian processes are used to construct a probabilistic model of the discrepancy between experimental results and the designed response, then a Bayesian optimization strategy used to select the next sample points. We compare different design criteria and develop a stopping criterion based on a metric that quantifies this discrepancy over the whole surface, and its uncertainty. We test our strategy on simulated data from computer models of biochemical processes. △ Less

Submitted 25 November, 2020; v1 submitted 20 November, 2020; originally announced November 2020.

Comments: Comment: Updated to correct typo "that that" => "that"

arXiv:2010.00704 [pdf, other]

BCNN: A Binary CNN with All Matrix Ops Quantized to 1 Bit Precision

Authors: Arthur J. Redfern, Lijun Zhu, Molly K. Newquist

Abstract: This paper describes a CNN where all CNN style 2D convolution operations that lower to matrix matrix multiplication are fully binary. The network is derived from a common building block structure that is consistent with a constructive proof outline showing that binary neural networks are universal function approximators. 71.24% top 1 accuracy on the 2012 ImageNet validation set was achieved with a… ▽ More This paper describes a CNN where all CNN style 2D convolution operations that lower to matrix matrix multiplication are fully binary. The network is derived from a common building block structure that is consistent with a constructive proof outline showing that binary neural networks are universal function approximators. 71.24% top 1 accuracy on the 2012 ImageNet validation set was achieved with a 2 step training procedure and implementation strategies optimized for binary operands are provided. △ Less

Submitted 5 March, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

arXiv:2009.13318 [pdf]

doi 10.1021/acs.analchem.1c02178

High-throughput molecular imaging via deep learning enabled Raman spectroscopy

Authors: Conor C. Horgan, Magnus Jensen, Anika Nagelkerke, Jean-Phillipe St-Pierre, Tom Vercauteren, Molly M. Stevens, Mads S. Bergholt

Abstract: Raman spectroscopy enables non-destructive, label-free imaging with unprecedented molecular contrast but is limited by slow data acquisition, largely preventing high-throughput imaging applications. Here, we present a comprehensive framework for higher-throughput molecular imaging via deep learning enabled Raman spectroscopy, termed DeepeR, trained on a large dataset of hyperspectral Raman images,… ▽ More Raman spectroscopy enables non-destructive, label-free imaging with unprecedented molecular contrast but is limited by slow data acquisition, largely preventing high-throughput imaging applications. Here, we present a comprehensive framework for higher-throughput molecular imaging via deep learning enabled Raman spectroscopy, termed DeepeR, trained on a large dataset of hyperspectral Raman images, with over 1.5 million spectra (400 hours of acquisition) in total. We firstly perform denoising and reconstruction of low signal-to-noise ratio Raman molecular signatures via deep learning, with a 9x improvement in mean squared error over state-of-the-art Raman filtering methods. Next, we develop a neural network for robust 2-4x super-resolution of hyperspectral Raman images that preserves molecular cellular information. Combining these approaches, we achieve Raman imaging speed-ups of up to 160x, enabling high resolution, high signal-to-noise ratio cellular imaging in under one minute. Finally, transfer learning is applied to extend DeepeR from cell to tissue-scale imaging. DeepeR provides a foundation that will enable a host of higher-throughput Raman spectroscopy and molecular imaging applications across biomedicine. △ Less

Submitted 28 September, 2020; originally announced September 2020.

arXiv:2006.15292 [pdf, other]

Project Calico: Wearable Chemical Sensors for Environmental Monitoring

Authors: Alex Mariakakis, Sifang Chen, Bichlien Nguyen, Kirsten Bray, Molly Blank, Jonathan Lester, Lauren Ryan, Paul Johns, Gonzalo Ramos, Asta Roseway

Abstract: Environmental hazards often go unnoticed because they are invisible to the naked eye, posing risks to our health over time. Project Calico aims to raise awareness of these risks by augmenting everyday fashion with color-changing chemical sensors that can be observed at a glance or captured by a smartphone camera. Project Calico leverages existing cosmetic and fabrication processes to democratize e… ▽ More Environmental hazards often go unnoticed because they are invisible to the naked eye, posing risks to our health over time. Project Calico aims to raise awareness of these risks by augmenting everyday fashion with color-changing chemical sensors that can be observed at a glance or captured by a smartphone camera. Project Calico leverages existing cosmetic and fabrication processes to democratize environmental sensing, enabling creators to make their own accessories. We present two fashionable instantiations of Project Calico involving UV irradiation. EcoHair, created by hair treatment, is UV-sensitive hair that intensifies in color saturation depending on the UV intensity. EcoPatches, created by inkjet printing, can be worn as temporary tattoos that change their color to reflect cumulative UV exposure over time. We present findings from two focus groups regarding the Project Calico vision and gathered insights from their overall impressions and projected use patterns. △ Less

Submitted 6 July, 2020; v1 submitted 27 June, 2020; originally announced June 2020.

Comments: 9 pages, 6 figures 1 table

Showing 1–50 of 61 results for author: Molly