
Showing 1–1 of 1 results for author: Vijapurapu, S

Searching in archive cs.
  1. arXiv:2411.03945 [pdf, other]

    cs.LG cs.AI

    Can Custom Models Learn In-Context? An Exploration of Hybrid Architecture Performance on In-Context Learning Tasks

    Authors: Ryan Campbell, Nelson Lojo, Kesava Viswanadha, Christoffer Grondal Tryggestad, Derrick Han Sun, Sriteja Vijapurapu, August Rolfsen, Anant Sahai

    Abstract: In-Context Learning (ICL) is a phenomenon where task learning occurs through a prompt sequence without the necessity of parameter updates. ICL in Multi-Headed Attention (MHA) with absolute positional embedding has been the focus of more study than other sequence model varieties. We examine implications of architectural differences between GPT-2 and LLaMa as well as LLaMa and Mamba. We extend work…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: 18 pages, 16 figures