Showing 1–6 of 6 results for author: Yin, J O

  1. arXiv:2506.15553  [pdf, ps, other]

    cs.CL

    Approximating Language Model Training Data from Weights

    Authors: John X. Morris, Junjie Oscar Yin, Woojeong Kim, Vitaly Shmatikov, Alexander M. Rush

    Abstract: Modern language models often have open weights but closed training data. We formalize the problem of data approximation from model weights and propose several baselines and metrics. We develop a gradient-based approach that selects the highest-matching data from a large public text corpus and show its effectiveness at recovering useful data given only weights of the original and finetuned models.…

    Submitted 18 June, 2025; originally announced June 2025.

  2. arXiv:2410.16208  [pdf, other]

    cs.LG cs.AI cs.CL

    Compute-Constrained Data Selection

    Authors: Junjie Oscar Yin, Alexander M. Rush

    Abstract: Data selection can reduce the amount of training data needed to finetune LLMs; however, the efficacy of data selection scales directly with its compute. Motivated by the practical challenge of compute-constrained finetuning, we consider the setting in which both the cost of selecting data and training are budgeted for. We first formalize the problem of data selection with a cost-aware utility func…

    Submitted 7 April, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

    Comments: Published as a conference paper at ICLR 2025

  3. Information criteria for efficient quantum state estimation

    Authors: J. O. S. Yin, S. J. van Enk

    Abstract: Recently several more efficient versions of quantum state tomography have been proposed, with the purpose of making tomography feasible even for many-qubit states. The number of state parameters to be estimated is reduced by tentatively introducing certain simplifying assumptions on the form of the quantum state, and subsequently using the data to rigorously verify these assumptions. The simplifyi…

    Submitted 30 March, 2011; v1 submitted 16 March, 2011; originally announced March 2011.

  4. Criteria for reliable entanglement quantification with finite data

    Authors: Jun O. S. Yin, Steven J. van Enk

    Abstract: We propose one and a half criteria for determining how many measurements are needed to quantify entanglement reliably. We base these criteria on Bayesian analysis of measurement results, and apply our methods to four-qubit entanglement, but generalizations to more qubits are straightforward.

    Submitted 10 January, 2011; v1 submitted 11 November, 2010; originally announced November 2010.

    Comments: >42

  5. Entanglement verification with finite data

    Authors: Robin Blume-Kohout, Jun O. S. Yin, S. J. van Enk

    Abstract: Suppose an experimentalist wishes to verify that his apparatus produces entangled quantum states. A finite amount of data cannot conclusively demonstrate entanglement, so drawing conclusions from real-world data requires statistical reasoning. We propose a reliable method to quantify the weight of evidence for (or against) entanglement, based on a likelihood ratio test. Our method is universal…

    Submitted 30 April, 2010; originally announced May 2010.

    Comments: 4 pages, 3 pretty pictures

    Journal ref: Phys. Rev. Lett. 105, 170501 (2010)

  6. Entanglement and purity of single- and two-photon states

    Authors: Jun O. S. Yin, S. J. van Enk

    Abstract: Whereas single- and two-photon wave packets are usually treated as pure states, in practice they will be mixed. We study how entanglement created with mixed photon wave packets is degraded. We find in particular that the entanglement of a delocalized single-photon state of the electromagnetic field is determined simply by its purity. We also discuss entanglement for two-photon mixed states, as…

    Submitted 20 May, 2008; v1 submitted 6 March, 2008; originally announced March 2008.

    Comments: 11 pages, 10 figures, 1 debuting author

    Journal ref: Phys. Rev. A 77, 062333 (2008)