Showing 1–1 of 1 results for author: Karacali, B

  1. arXiv:2410.17078  [pdf, other]

    cs.NI cs.DC

    FlowTracer: A Tool for Uncovering Network Path Usage Imbalance in AI Training Clusters

    Authors: Hasibul Jamil, Abdul Alim, Laurent Schares, Pavlos Maniotis, Liran Schour, Ali Sydney, Abdullah Kayi, Tevfik Kosar, Bengi Karacali

    Abstract: The increasing complexity of AI workloads, especially distributed Large Language Model (LLM) training, places significant strain on the networking infrastructure of parallel data centers and supercomputing systems. While Equal-Cost Multi-Path (ECMP) routing distributes traffic over parallel paths, hash collisions often lead to imbalanced network resource utilization and performance bottlenecks. T…

    Submitted 24 October, 2024; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: Submitted for peer reviewing in IEEE ICC 2025