Study On Optical Communications, Volume 50, Issue 5, 24004901 (2024)

Application of Reconfigurable OCS Technology for Pre-training Large Language Models

Chen ZHU*, Xu ZHOU, and Peilong WANG
Author Affiliations
  • System Department, Baidu Online Network Technology Co., Ltd., Beijing 100085, China

    【Objective】

    Compared with Electronic Packet Switching (EPS), Optical Circuit Switching (OCS) offers advantages in latency, power consumption, cost, and stability. To make full use of these advantages, this study explores feasible applications of OCS in the networks that carry large model pre-training tasks. It does so by analyzing the parallel partitioning strategies, collective communication requirements, traffic patterns, and current network architectures used in pre-training.

    【Methods】

    We propose a redundancy-protection mechanism for network devices that uses multiple small-port OCS devices. When a Top-of-Rack (ToR) switch fails, traffic can be switched over rapidly without interrupting the training task. In addition, we propose dedicating OCS exclusively to data-parallel traffic, so that the optical circuits need to be configured only once, at the start of the task.
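    The abstract does not describe how the small-port OCS devices are cabled or how a failover is triggered. As a minimal sketch under assumed details, the Python below models one OCS as a one-to-one cross-connect between server-facing ports and uplink ports that are cabled either to a primary ToR or to a standby ToR. When a ToR fails, every circuit that lands on it is re-pointed to a free uplink on the standby ToR. The class `OCS`, its fields, the method `fail_over`, the ToR names, and the port numbers are all hypothetical and are not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OCS:
    """Hypothetical small-port OCS modeled as a one-to-one cross-connect."""

    # uplink (switch-facing) port -> ToR it is cabled to (assumed cabling)
    uplink_owner: dict[int, str]
    # server-facing port -> uplink port it is currently connected to
    circuits: dict[int, int]

    def fail_over(self, failed_tor: str, standby_tor: str) -> dict[int, int]:
        """Move every circuit that lands on failed_tor to a free standby_tor uplink.

        Returns {server_port: new_uplink} for the circuits that were moved.
        """
        in_use = set(self.circuits.values())
        free = [port for port, tor in sorted(self.uplink_owner.items())
                if tor == standby_tor and port not in in_use]
        affected = [sp for sp, up in sorted(self.circuits.items())
                    if self.uplink_owner[up] == failed_tor]
        # Check capacity first so the switchover is all-or-nothing.
        if len(affected) > len(free):
            raise RuntimeError("not enough free uplinks on the standby ToR")
        moved = {}
        for server_port, new_uplink in zip(affected, free):
            # Only the optical cross-connect changes; server cabling is untouched.
            self.circuits[server_port] = new_uplink
            moved[server_port] = new_uplink
        return moved


if __name__ == "__main__":
    ocs = OCS(
        uplink_owner={0: "tor-a", 1: "tor-a", 2: "tor-b", 3: "tor-b"},
        circuits={10: 0, 11: 1},
    )
    print(ocs.fail_over("tor-a", "tor-b"))  # {10: 2, 11: 3}
```

    Because an OCS only redirects light between ports, the server-side links stay physically in place during the switchover; the sketch captures that by changing nothing except the `circuits` mapping.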

    【Results】

    We present several feasible opto-electronic network architectures together with their specific configurations under different AllReduce algorithms. These include designs in which the collective communication algorithm and the network architecture are optimized jointly to achieve optimal bandwidth.
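    The abstract does not name the AllReduce algorithms it compares. As an illustration only, the sketch below computes textbook costs for two common algorithms on N nodes reducing an S-byte buffer: ring AllReduce and recursive halving-doubling. Both transmit 2(N-1)S/N bytes per node, but the ring sends only to one fixed neighbor, whereas halving-doubling exchanges data with log2 N different partners. That difference matters when every circuit in a static OCS configuration is a dedicated point-to-point link. The names `AllReduceCost`, `ring_allreduce`, and `halving_doubling_allreduce` are hypothetical, and the model ignores per-round latency and link contention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllReduceCost:
    steps: int         # sequential communication rounds
    bytes_sent: float  # total bytes each node transmits
    peers: int         # distinct destinations each node sends to


def ring_allreduce(n: int, size: float) -> AllReduceCost:
    """Ring AllReduce: reduce-scatter then all-gather, n-1 rounds each;
    every round sends one size/n chunk to the next node in the ring."""
    if n < 1:
        raise ValueError("n must be positive")
    return AllReduceCost(steps=2 * (n - 1),
                         bytes_sent=2 * (n - 1) * size / n,
                         peers=1 if n > 1 else 0)


def halving_doubling_allreduce(n: int, size: float) -> AllReduceCost:
    """Recursive halving (reduce-scatter) then recursive doubling (all-gather).
    Each halving round pairs with a partner at a new power-of-two distance and
    sends half of the data still being reduced (size/2, size/4, ...); the
    doubling phase mirrors it, so each node sends to log2(n) distinct peers."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    k = n.bit_length() - 1  # log2(n) for a power of two
    return AllReduceCost(steps=2 * k,
                         bytes_sent=2 * (n - 1) * size / n,
                         peers=k)


if __name__ == "__main__":
    for name, fn in (("ring", ring_allreduce),
                     ("halving-doubling", halving_doubling_allreduce)):
        c = fn(8, 1.0)
        print(f"{name}: steps={c.steps}, sent={c.bytes_sent:.3f} x S, peers={c.peers}")
```

    For N = 8 this reports 14 rounds and 1 peer for the ring versus 6 rounds and 3 peers for halving-doubling, with 1.75 S sent per node in both cases. Under this simplified model, a ring needs only one outgoing circuit per node that can stay fixed for the whole job, while halving-doubling would need log2 N circuits per node or reconfiguration between rounds. This is one plausible way in which the choice of algorithm and the OCS architecture interact; it is not taken from the paper.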

    【Conclusion】

    By properly incorporating the traffic models of training tasks, OCS can integrate seamlessly into existing EPS network architectures and improve large model pre-training in several respects: lower cost, lower power consumption, reduced latency, and greater stability.

    Citation

    Chen ZHU, Xu ZHOU, Peilong WANG. Application of Reconfigurable OCS Technology for Pre-training Large Language Models[J]. Study On Optical Communications, 2024, 50(5): 24004901

    Paper Information

    Received: Mar. 1, 2024

    Published Online: Oct. 15, 2024

    Author email: ZHU Chen (zhuchen06@baidu.com)

    DOI: 10.13756/j.gtxyj.2024.240049