
Showing 1–6 of 6 results for author: Yousuf, R B

  1. arXiv:2502.15177 [pdf, other]

    cs.LG cs.CY

    Optimizing Product Provenance Verification using Data Valuation Methods

    Authors: Raquib Bin Yousuf, Hoang Anh Just, Shengzhe Xu, Brian Mayer, Victor Deklerck, Jakub Truszkowski, John C. Simeone, Jade Saunders, Chang-Tien Lu, Ruoxi Jia, Naren Ramakrishnan

    Abstract: Determining and verifying product provenance remains a critical challenge in global supply chains, particularly as geopolitical conflicts and shifting borders create new incentives for misrepresentation of commodities, such as hiding the origin of illegally harvested timber or agriculture grown on illegally cleared land. Stable Isotope Ratio Analysis (SIRA), combined with Gaussian process regressi…

    Submitted 16 March, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  2. arXiv:2502.14115 [pdf, other]

    cs.LG cs.CE cs.CY

    Chasing the Timber Trail: Machine Learning to Reveal Harvest Location Misrepresentation

    Authors: Shailik Sarkar, Raquib Bin Yousuf, Linhan Wang, Brian Mayer, Thomas Mortier, Victor Deklerck, Jakub Truszkowski, John C. Simeone, Marigold Norman, Jade Saunders, Chang-Tien Lu, Naren Ramakrishnan

    Abstract: Illegal logging poses a significant threat to global biodiversity and climate stability, and depresses international prices for legal wood harvesting and responsible forest products trade, affecting livelihoods and communities across the globe. Stable isotope ratio analysis (SIRA) is rapidly becoming an important tool for determining the harvest location of traded organic products. The spatial patt…

    Submitted 16 March, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: 9 pages, 5 figures

    ACM Class: J.m; K.4.1; I.2.0; J.2

  3. arXiv:2411.16116 [pdf, other]

    cs.CL cs.AI

    LLM Augmentations to support Analytical Reasoning over Multiple Documents

    Authors: Raquib Bin Yousuf, Nicholas Defelice, Mandar Sharma, Shengzhe Xu, Naren Ramakrishnan

    Abstract: Building on their demonstrated ability to perform a variety of tasks, we investigate the application of large language models (LLMs) to enhance in-depth analytical reasoning within the context of intelligence analysis. Intelligence analysts typically work with massive dossiers to draw connections between seemingly unrelated entities, and uncover adversaries' plans and motives. We explore if and ho…

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: 2024 IEEE International Conference on Big Data (IEEE BigData 2024)

  4. arXiv:2406.14541 [pdf, other]

    cs.LG

    Why LLMs Are Bad at Synthetic Table Generation (and what to do about it)

    Authors: Shengzhe Xu, Cho-Ting Lee, Mandar Sharma, Raquib Bin Yousuf, Nikhil Muralidhar, Naren Ramakrishnan

    Abstract: Synthetic data generation is integral to ML pipelines, e.g., to augment training data, replace sensitive information, and even to power advanced platforms like DeepSeek. While LLMs fine-tuned for synthetic data generation are gaining traction, synthetic table generation -- a critical data type in business and science -- remains under-explored compared to text and image synthesis. This paper shows…

    Submitted 13 March, 2025; v1 submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.14005 [pdf, other]

    cs.CL cs.AI cs.LG

    Information Guided Regularization for Fine-tuning Language Models

    Authors: Mandar Sharma, Nikhil Muralidhar, Shengzhe Xu, Raquib Bin Yousuf, Naren Ramakrishnan

    Abstract: The pretraining-fine-tuning paradigm has been the de facto strategy for transfer learning in modern language modeling. With the understanding that task adaptation in LMs is often a function of parameters shared across tasks, we argue that a more surgical approach to regularization needs to exist for smoother transfer learning. Towards this end, we investigate how the pretraining loss landscape is…

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2207.04029 [pdf, other]

    cs.IR cs.AI

    Lessons from Deep Learning applied to Scholarly Information Extraction: What Works, What Doesn't, and Future Directions

    Authors: Raquib Bin Yousuf, Subhodip Biswas, Kulendra Kumar Kaushal, James Dunham, Rebecca Gelles, Sathappan Muthiah, Nathan Self, Patrick Butler, Naren Ramakrishnan

    Abstract: Understanding key insights from full-text scholarly articles is essential as it enables us to determine interesting trends, give insight into the research and development, and build knowledge graphs. However, some of the interesting key insights are only available when considering full-text. Although researchers have made significant progress in information extraction from short documents, extract…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: ACM KDD 2022 Workshop on Data-driven Science of Science

    ACM Class: I.2; I.2.7; H.3