Showing 1–24 of 24 results for author: Glenski, M

  1. arXiv:2411.03542  [pdf, other]

    cs.CL cs.AI

    Exploring the Benefits of Domain-Pretraining of Generative Large Language Models for Chemistry

    Authors: Anurag Acharya, Shivam Sharma, Robin Cosbey, Megha Subramanian, Scott Howland, Maria Glenski

    Abstract: A proliferation of Large Language Models (the GPT series, BLOOM, LLaMA, and more) are driving forward novel development of multipurpose AI for a variety of tasks, particularly natural language processing (NLP) tasks. These models demonstrate strong performance on a range of tasks; however, there has been evidence of brittleness when applied to more niche or narrow domains where hallucinations or f…

    Submitted 5 November, 2024; originally announced November 2024.

  2. arXiv:2204.07203  [pdf, other]

    cs.AI

    EXPERT: Public Benchmarks for Dynamic Heterogeneous Academic Graphs

    Authors: Sameera Horawalavithana, Ellyn Ayton, Anastasiya Usenko, Shivam Sharma, Jasmine Eshun, Robin Cosbey, Maria Glenski, Svitlana Volkova

    Abstract: Machine learning models that learn from dynamic graphs face nontrivial challenges in learning and inference as both nodes and edges change over time. The existing large-scale graph benchmark datasets that are widely used by the community primarily focus on homogeneous node and edge attributes and are static. In this work, we present a variety of large scale, dynamic heterogeneous academic graphs t…

    Submitted 14 April, 2022; originally announced April 2022.

  3. arXiv:2203.07640  [pdf, other]

    cs.CL

    Unsupervised Keyphrase Extraction via Interpretable Neural Networks

    Authors: Rishabh Joshi, Vidhisha Balachandran, Emily Saldanha, Maria Glenski, Svitlana Volkova, Yulia Tsvetkov

    Abstract: Keyphrase extraction aims at automatically extracting a list of "important" phrases representing the key concepts in a document. Prior approaches for unsupervised keyphrase extraction resorted to heuristic notions of phrase importance via embedding clustering or graph centrality, requiring extensive domain expertise. Our work presents a simple alternative approach which defines keyphrases as docum…

    Submitted 17 February, 2023; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: Accepted at EACL 2023

  4. arXiv:2110.07938  [pdf, other]

    cs.CL cs.CY

    Identifying Causal Influences on Publication Trends and Behavior: A Case Study of the Computational Linguistics Community

    Authors: Maria Glenski, Svitlana Volkova

    Abstract: Drawing causal conclusions from observational real-world data is a very much desired but challenging task. In this paper we present mixed-method analyses to investigate causal influences of publication trends and behavior on the adoption, persistence, and retirement of certain research foci -- methodologies, materials, and tasks that are of interest to the computational linguistics (CL) community.…

    Submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted to First Workshop on Causal Inference & NLP at EMNLP 2021

  5. VAINE: Visualization and AI for Natural Experiments

    Authors: Grace Guo, Maria Glenski, ZhuanYi Shaw, Emily Saldanha, Alex Endert, Svitlana Volkova, Dustin Arendt

    Abstract: Natural experiments are observational studies where the assignment of treatment conditions to different populations occurs by chance "in the wild". Researchers from fields such as economics, healthcare, and the social sciences leverage natural experiments to conduct hypothesis testing and causal effect estimation for treatment and outcome variables that would otherwise be costly, infeasible, or un…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: 5 pages, 4 figures, accepted as short paper at IEEE VIS 2021

  6. arXiv:2104.13490  [pdf, other]

    cs.CY

    Leveraging Community and Author Context to Explain the Performance and Bias of Text-Based Deception Detection Models

    Authors: Galen Weld, Ellyn Ayton, Tim Althoff, Maria Glenski

    Abstract: Deceptive news posts shared in online communities can be detected with NLP models, and much recent research has focused on the development of such models. In this work, we use characteristics of online communities and authors -- the context of how and where content is posted -- to explain the performance of a neural network deception detection model and identify sub-populations who are disproporti…

    Submitted 27 April, 2021; originally announced April 2021.

  7. arXiv:2104.11761  [pdf, other]

    cs.CL

    Towards Trustworthy Deception Detection: Benchmarking Model Robustness across Domains, Modalities, and Languages

    Authors: Maria Glenski, Ellyn Ayton, Robin Cosbey, Dustin Arendt, Svitlana Volkova

    Abstract: Evaluating model robustness is critical when developing trustworthy models not only to gain deeper understanding of model behavior, strengths, and weaknesses, but also to develop future models that are generalizable and robust across expected environments a model may encounter in deployment. In this paper we present a framework for measuring model robustness for an important but difficult text cla…

    Submitted 23 April, 2021; originally announced April 2021.

    Journal ref: Proceedings of the 3rd International Workshop on Rumours and Deception in Social Media (RDSM). 2020

  8. arXiv:2104.11729  [pdf, other]

    cs.CL

    Evaluating Deception Detection Model Robustness To Linguistic Variation

    Authors: Maria Glenski, Ellyn Ayton, Robin Cosbey, Dustin Arendt, Svitlana Volkova

    Abstract: With the increasing use of machine-learning driven algorithmic judgements, it is critical to develop models that are robust to evolving or manipulated inputs. We propose an extensive analysis of model robustness against linguistic variation in the setting of deceptive news detection, an important task in the context of misinformation spread online. We consider two prediction tasks and compare thre…

    Submitted 23 April, 2021; originally announced April 2021.

  9. arXiv:2102.08537  [pdf, other]

    cs.CY

    Political Bias and Factualness in News Sharing across more than 100,000 Online Communities

    Authors: Galen Weld, Maria Glenski, Tim Althoff

    Abstract: As civil discourse increasingly takes place online, misinformation and the polarization of news shared in online communities have become ever more relevant concerns with real world harms across our society. Studying online news sharing at scale is challenging due to the massive volume of content which is shared by millions of users across thousands of communities. Therefore, existing research has…

    Submitted 9 May, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: 12 pages, 7 figures. Published at ICWSM 2021

  10. arXiv:2101.01793  [pdf, ps, other]

    cs.SI

    Behavior Change in Response to Subreddit Bans and External Events

    Authors: Pamela Bilo Thomas, Daniel Riehm, Maria Glenski, Tim Weninger

    Abstract: As more people flock to social media to connect with others and form virtual communities, it is important to research how members of these groups interact to understand human behavior on the Web. In response to an increase in hate speech, harassment and other antisocial behaviors, many social media companies have implemented different content and user moderation policies. On Reddit, for example, c…

    Submitted 5 January, 2021; originally announced January 2021.

  11. arXiv:2009.12924  [pdf, other]

    cs.HC cs.AI

    Measure Utility, Gain Trust: Practical Advice for XAI Researcher

    Authors: Brittany Davis, Maria Glenski, William Sealy, Dustin Arendt

    Abstract: Research into the explanation of machine learning models, i.e., explainable AI (XAI), has seen a commensurate exponential growth alongside deep artificial neural networks throughout the past decade. For historical reasons, explanation and trust have been intertwined. However, the focus on trust is too narrow, and has led the research community astray from tried and true empirical methods that prod…

    Submitted 27 September, 2020; originally announced September 2020.

    Comments: To appear in TREX 2020: Workshop on TRust and EXperience in Visual Analytics. https://trexvis.github.io/Workshop2020/

  12. arXiv:2009.09961  [pdf, other]

    cs.CL

    Adjusting for Confounders with Text: Challenges and an Empirical Evaluation Framework for Causal Inference

    Authors: Galen Weld, Peter West, Maria Glenski, David Arbour, Ryan Rossi, Tim Althoff

    Abstract: Causal inference studies using textual social media data can provide actionable insights on human behavior. Making accurate causal inferences with text requires controlling for confounding which could otherwise impart bias. Recently, many different methods for adjusting for confounders have been proposed, and we show that these existing methods disagree with one another on two datasets inspired by…

    Submitted 6 May, 2022; v1 submitted 21 September, 2020; originally announced September 2020.

    Comments: to appear at ICWSM 2022

  13. arXiv:2004.07993  [pdf, other]

    cs.HC

    CrossCheck: Rapid, Reproducible, and Interpretable Model Evaluation

    Authors: Dustin Arendt, Zhuanyi Huang, Prasha Shrestha, Ellyn Ayton, Maria Glenski, Svitlana Volkova

    Abstract: Evaluation beyond aggregate performance metrics, e.g. F1-score, is crucial to both establish an appropriate level of trust in machine learning models and identify future model improvements. In this paper we demonstrate CrossCheck, an interactive visualization tool for rapid crossmodel comparison and reproducible error analysis. We describe the tool and discuss design and implementation details. We…

    Submitted 16 April, 2020; originally announced April 2020.

  14. arXiv:1909.05838  [pdf, other]

    cs.SI

    Multilingual Multimodal Digital Deception Detection and Disinformation Spread across Social Platforms

    Authors: Maria Glenski, Ellyn Ayton, Josh Mendoza, Svitlana Volkova

    Abstract: Our main contribution in this work is novel results of multilingual models that go beyond typical applications of rumor or misinformation detection in English social news content to identify fine-grained classes of digital deception across multiple languages (e.g. Russian, Spanish, etc.). In addition, we present models for multimodal deception detection from images and text and discuss the limitat…

    Submitted 12 September, 2019; originally announced September 2019.

  15. arXiv:1907.00558  [pdf, ps, other]

    q-fin.ST cs.LG cs.SI stat.ML

    Improved Forecasting of Cryptocurrency Price using Social Signals

    Authors: Maria Glenski, Tim Weninger, Svitlana Volkova

    Abstract: Social media signals have been successfully used to develop large-scale predictive and anticipatory analytics. For example, forecasting stock market prices and influenza outbreaks. Recently, social data has been explored to forecast price fluctuations of cryptocurrencies, which are a novel disruptive technology with significant political and economic implications. In this paper we leverage and con…

    Submitted 1 July, 2019; originally announced July 2019.

  16. Propagation from Deceptive News Sources: Who Shares, How Much, How Evenly, and How Quickly?

    Authors: Maria Glenski, Tim Weninger, Svitlana Volkova

    Abstract: As people rely on social media as their primary sources of news, the spread of misinformation has become a significant concern. In this large-scale study of news in social media we analyze eleven million posts and investigate propagation behavior of users that directly interact with news accounts identified as spreading trusted versus malicious content. Unlike previous work, which looks at specifi…

    Submitted 9 December, 2018; originally announced December 2018.

    Comments: 12 pages, 6 figures, 7 tables, published in IEEE TCSS December 2018

    Journal ref: IEEE Transactions on Computational Social Systems (Volume: 5, Issue: 4, Dec. 2018)

  17. arXiv:1809.00740  [pdf, other]

    cs.HC

    GuessTheKarma: A Game to Assess Social Rating Systems

    Authors: Maria Glenski, Greg Stoddard, Paul Resnick, Tim Weninger

    Abstract: Popularity systems, like Twitter retweets, Reddit upvotes, and Pinterest pins have the potential to guide people toward posts that others liked. That, however, creates a feedback loop that reduces their informativeness: items marked as more popular get more attention, so that additional upvotes and retweets may simply reflect the increased attention and not independent information about the fracti…

    Submitted 3 September, 2018; originally announced September 2018.

    Comments: 15 pages, 7 figures, accepted to CSCW 2018

  18. arXiv:1807.05327  [pdf, ps, other]

    cs.SI

    How Humans versus Bots React to Deceptive and Trusted News Sources: A Case Study of Active Users

    Authors: Maria Glenski, Tim Weninger, Svitlana Volkova

    Abstract: Society's reliance on social media as a primary source of news has spawned a renewed focus on the spread of misinformation. In this work, we identify the differences in how social media accounts identified as bots react to news sources of varying credibility, regardless of the veracity of the content those sources have shared. We analyze bot and human responses annotated using a fine-grained model…

    Submitted 13 July, 2018; originally announced July 2018.

  19. arXiv:1805.12032  [pdf, ps, other]

    cs.CL

    Identifying and Understanding User Reactions to Deceptive and Trusted Social News Sources

    Authors: Maria Glenski, Tim Weninger, Svitlana Volkova

    Abstract: In the age of social news, it is important to understand the types of reactions that are evoked from news sources with various levels of credibility. In the present work we seek to better understand how users react to trusted and deceptive news sources across two popular, and very different, social media platforms. To that end, (1) we develop a model to classify user reactions into one of nine typ…

    Submitted 30 May, 2018; originally announced May 2018.

  20. arXiv:1710.06390  [pdf, other]

    cs.LG cs.CL cs.SI

    Fishing for Clickbaits in Social Images and Texts with Linguistically-Infused Neural Network Models

    Authors: Maria Glenski, Ellyn Ayton, Dustin Arendt, Svitlana Volkova

    Abstract: This paper presents the results and conclusions of our participation in the Clickbait Challenge 2017 on automatic clickbait detection in social media. We first describe linguistically-infused neural network models and identify informative representations to predict the level of clickbaiting present in Twitter posts. Our models allow to answer the question not only whether a post is a clickbait or…

    Submitted 17 October, 2017; originally announced October 2017.

    Comments: Pineapplefish Clickbait Detector, Clickbait Challenge 2017

  21. arXiv:1707.00195  [pdf, other]

    cs.SI cs.HC

    Predicting User-Interactions on Reddit

    Authors: Maria Glenski, Tim Weninger

    Abstract: In order to keep up with the demand of curating the deluge of crowd-sourced content, social media platforms leverage user interaction feedback to make decisions about which content to display, highlight, and hide. User interactions such as likes, votes, clicks, and views are assumed to be a proxy of a content's quality, popularity, or news-worthiness. In this paper we ask: how predictable are the…

    Submitted 1 July, 2017; originally announced July 2017.

    Comments: Presented at ASONAM 2017

  22. arXiv:1703.05267  [pdf, other]

    cs.SI cs.HC

    Consumers and Curators: Browsing and Voting Patterns on Reddit

    Authors: Maria Glenski, Corey Pennycuff, Tim Weninger

    Abstract: As crowd-sourced curation of news and information become the norm, it is important to understand not only how individuals consume information through social news Web sites, but also how they contribute to their ranking systems. In the present work, we introduce and make available a new dataset containing the activity logs that recorded all activity for 309 Reddit users for one year. Using this new…

    Submitted 15 March, 2017; originally announced March 2017.

    Comments: 16 pages, 12 figures, 2 tables

  23. arXiv:1606.06140  [pdf, other]

    cs.SI cs.CY cs.HC

    Rating Effects on Social News Posts and Comments

    Authors: Maria Glenski, Tim Weninger

    Abstract: At a time when information seekers first turn to digital sources for news and opinion, it is critical that we understand the role that social media plays in human behavior. This is especially true when information consumers also act as information producers and editors through their online activity. In order to better understand the effects that editorial ratings have on online human behavior, we…

    Submitted 20 June, 2016; originally announced June 2016.

    Comments: 18 pages, 7 figures, accepted to ACM TIST

  24. arXiv:1506.01977  [pdf, ps, other]

    cs.SI cs.CY cs.MM

    Random Voting Effects in Social-Digital Spaces: A case study of Reddit Post Submissions

    Authors: Maria Glenski, Thomas J. Johnston, Tim Weninger

    Abstract: At a time when information seekers first turn to digital sources for news and opinion, it is critical that we understand the role that social media plays in human behavior. This is especially true when information consumers also act as information producers and editors by their online activity. In order to better understand the effects that editorial ratings have on online human behavior, we repor…

    Submitted 5 June, 2015; originally announced June 2015.

    Comments: Paper preprint accepted to 2015 ACM Hypertext Conference

    ACM Class: H.1.2; H.5.3