The heterogeneity of applications and their divergent resource requirements lead to uneven traffic distribution and imbalanced resource utilization across data center networks (DCNs). We propose a fine-grained baseband function reallocation scheme in heterogeneous optical switching-based DCNs. A deep reinforcement learning-based functional split and resource mapping approach (DRL-BFM) is proposed to maximize throughput in high-load server racks by implementing load balancing in DCNs. The results demonstrate that DRL-BFM improves the throughput by 20.8%, 22.8%, and 29.8% on average compared to existing algorithms under different computational capacities, bandwidth constraints, and latency conditions, respectively.
The tremendous increase in Internet traffic, mainly driven by large online applications such as high-resolution video streaming, virtual reality/augmented reality (VR/AR), and smart cities, leads to explosive demands on cloud computing in data center networks (DCNs)[1]. As the scale of DCNs increases, traffic becomes highly bursty and dynamic. Traditional electrical switches, however, struggle to support high-throughput, low-latency interactions due to electronic bottlenecks and power constraints[2]. Many optical switching-enabled technologies, such as semiconductor optical amplifiers (SOAs)[3], tunable lasers with arrayed waveguide grating (AWG) routers[4], and wavelength-selective switches (WSSs)[5], have been used to implement optical circuit switching. The high bandwidth of optical switches substantially lowers latency in hierarchical switching architectures. Additionally, switching data in the optical domain greatly decreases power consumption by eliminating optical/electrical (OE) and electrical/optical (EO) conversions.
The centralized radio access network (C-RAN) has been regarded as a promising architecture for ultradense cellular networks. The main innovation introduced by C-RAN is that the baseband functions of legacy base stations are consolidated into the baseband unit (BBU) pool, which is hosted by DCNs[6]. With massive numbers of radio access network (RAN) applications spanning diverse categories and resource requirements, computing tasks and traffic fluctuate significantly over time, causing uneven traffic distribution and imbalanced resource utilization across DCNs. For instance, some racks become overloaded, leading to degraded quality of service (QoS), while others remain underutilized. Additionally, the vast number of interconnect links and network elements in DCNs complicates network management. Therefore, an efficient network reconfiguration method is crucial for managing and optimally reallocating network resources to achieve workload balance.
However, migrating full-stack baseband functions across server racks requires substantial optical bandwidth, which limits flexibility in allocating computing and bandwidth resources. Network function virtualization (NFV) has been proposed to address these challenges by using virtualization technology to consolidate various types of network equipment onto high-capacity servers, which can be deployed in data centers, network edges, and end-user premises. This enables the decoupling of virtual network functions (VNFs) from hardware devices, allowing them to operate as software applications on general-purpose hardware. VNFs can be instantiated and mapped at different network locations and are typically interconnected through virtual networks, offering a level of flexibility that traditional hardware-based networking cannot provide. Standardization organizations such as the 3rd Generation Partnership Project (3GPP) and the Next Generation Fronthaul Interface (NGFI) have highlighted the need to leverage virtualization-enabled functional splits to build scalable and cost-effective transport networks[7]. A virtual network embedding (VNE) algorithm has been employed to jointly minimize cell interference and optimize bandwidth utilization by dynamically selecting optimal functional split points[8]. A joint optimization scheme for selecting functional split options and controlling power has been proposed to maximize throughput[9]. An isolation-aware RAN slice mapping problem that considers various functional splits within a wavelength division multiplexing (WDM) metro-aggregation network has also been studied[10]. However, the reliance on integer linear programming (ILP) and heuristic algorithms makes it challenging to provide real-time and adaptive decisions.
Machine learning and artificial intelligence (ML/AI) have been extensively investigated and successfully deployed to address critical challenges in optical networks, such as transmission quality diagnostics and network performance optimization. Within this framework, deep reinforcement learning (DRL) agents autonomously learn decision-making strategies by iteratively approximating value or policy functions through interactions with high-dimensional state spaces. A DRL-based method is proposed to jointly select and map VNFs to provide agile and flexible network services[11]. The routing, modulation, and spectrum assignment (RMSA) problem is modeled as a Markov decision process (MDP) for learning the optimal online policies in elastic optical networks[12]. A QoS-aware approach is presented based on a DRL agent and traffic prediction[13].
In this paper, we model fine-grained baseband functions as a function chain (FC) and propose a DRL-based joint functional split and baseband function mapping algorithm (DRL-BFM) for DCNs. The objective is to maximize throughput in high-load racks by implementing load balancing through computational workload migration. Simulation results show that the proposed DRL-BFM algorithm improves average throughput by 20.8%, 22.8%, and 29.8% compared to benchmark algorithms under different computational capacities, bandwidth constraints, and latency requirements, respectively.
2. System Model
2.1 Heterogeneous optical switching-based DCN
In this work, we consider a heterogeneous optical switching-based DCN (HOS-DCN) architecture[14]. As shown in Fig. 1, an HOS-DCN comprises multiple clusters, and each cluster includes multiple server racks. Each rack is equipped with an optical top of rack (ToR) switch and hosts multiple servers. Intracluster traffic is routed through an intracluster optical switch (IAOS), while intercluster traffic is routed through an intercluster optical switch (IEOS). Both the IAOS and the IEOS are designed using SOA-based optical gates[15]. The $m$th IEOS is responsible for establishing optical connections among the $m$th racks of all clusters. Therefore, the HOS-DCN enables both intercluster and intracluster traffic migration via a single hop.
In the control plane, we propose utilizing DRL agents to determine the placement of fine-grained FCs. These DRL agents are trained within the IAOS layer. Specifically, each IAOS gathers real-time status information from the server racks in its cluster, including available computational and bandwidth resources, which serve as input for training. Once trained, the IAOS distributes the DRL agents to each ToR within the cluster, ensuring uniform decision-making across all ToRs. Upon task arrival, the DRL agent at the ToR determines the optimal processing location for a function unit (FU). Additionally, the DRL agents configure both intracluster and intercluster optical switching, sending switching signals to the IAOS and IEOS. As a result, deploying an FC may require coordinated decision-making among DRL agents across multiple server racks.
Furthermore, an optical ToR should be capable of supporting both intracluster and intercluster traffic migration. Its internal structure is illustrated in Fig. 2. As shown, an optical ToR consists of multiple OE/EO conversion devices, along with optical multiplexers and demultiplexers. The optical demultiplexers separate optical signals of different wavelengths and direct them to the OE devices. These OE devices detect downstream optical traffic and convert it into electrical signals, which are then transmitted to the servers for processing. In the upstream direction, Ethernet frames generated by the servers are temporarily stored in dedicated electrical buffers and forwarded to either IAOSs or IEOSs according to the switching forwarding table configured by the DRL agents.
2.2 Fine-grained functional split model
In this section, the baseband functions and the RAN application are modeled as an FC, where the fine-grained FUs are sequentially processed, followed by the execution of the RAN application. In Fig. 3, we present the fine-grained functional split options. As shown, the full-stack RAN functions are divided into seven FUs $\{f_1, f_2, \ldots, f_7\}$, where $f_1$ includes the transmission/receiving functions and low-PHY functions, which are executed in the remote radio head (RRH) to reduce the fronthaul bandwidth consumption, while $f_2$ to $f_7$ are executed in the BBU pool. Furthermore, these FUs are interconnected through virtual links $\{l_1, l_2, \ldots, l_6\}$.
The execution of each FU $f_i$ of a RAN request $r$ requires a certain number of giga operations per second (GOPS), denoted $d_r^i$. We also denote $d_r^{app}$ as the processing load of the RAN application. The computational demand (i.e., GOPS) to execute the full-stack RAN functions for request $r$ can be defined as $D_r = \left(3A_r + A_r^2 + \frac{1}{3} M_r C_r L_r\right) \frac{R_r}{10}$, where $A_r$ refers to the number of antennas used for request $r$ in the RAN, $M_r$ indicates the modulation bits, $C_r$ represents the code rate, $L_r$ denotes the number of antenna layers, and $R_r$ is the number of resource blocks. A scaling factor, $\gamma_i$, is introduced to represent the proportion of the computational demand of $f_i$ relative to the full-stack RAN functions[16,17]. Thus, the computational demand (i.e., GOPS) of function $f_i$ is calculated as $d_r^i = \gamma_i D_r$.
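For concreteness, the sketch below implements this demand model in Python. The polynomial form matches a GOPS complexity model widely used in the C-RAN literature, but the exact coefficients and the per-FU scaling factors $\gamma_i$ (GAMMA) are illustrative assumptions, not values taken from this paper.

```python
# Sketch of the per-FU computational demand model. The polynomial
# coefficients and the gamma_i scaling factors are illustrative
# assumptions in the spirit of the models cited in [16,17].

def full_stack_gops(A: int, M: int, C: float, L: int, R: int) -> float:
    """Full-stack computational demand D_r in GOPS."""
    return (3 * A + A ** 2 + (M * C * L) / 3) * R / 10

# Hypothetical scaling factors gamma_i for FU2..FU7 (they sum to 1).
GAMMA = {2: 0.10, 3: 0.15, 4: 0.20, 5: 0.25, 6: 0.20, 7: 0.10}

def fu_gops(i: int, A: int, M: int, C: float, L: int, R: int) -> float:
    """Per-FU computational demand d_r^i = gamma_i * D_r."""
    return GAMMA[i] * full_stack_gops(A, M, C, L, R)

# Example: 4 antennas, 64-QAM (6 bits), code rate 0.75, 4 layers, 100 RBs.
print(fu_gops(3, A=4, M=6, C=0.75, L=4, R=100))  # ~51 GOPS for FU3
```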
The execution of two consecutive FUs on two different racks imposes a specific bandwidth demand for mapping the virtual link between them. Let $B_r^j$ denote the data rate of the virtual link $l_j$. The data rate of $l_1$ is calculated as follows (Mbps): $B_r^1 = \frac{2 A_r R_r \cdot 12 \cdot S \cdot Q}{10^3}$, where $S$ represents the symbols per subframe and $Q$ denotes the number of IQ bits. According to the 3GPP document[18], $B_r^2$ to $B_r^6$ follow a proportional relationship with $B_r^1$ and can be expressed as $B_r^j = \beta_j B_r^1$.
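A hedged sketch of this bandwidth model follows; the factor of 12 subcarriers per resource block and the 1 ms subframe are standard LTE/NR numerology, while the exact formula and the proportionality ratios $\beta_j$ (BETA) are assumptions consistent with the quantities named above, not values from this paper.

```python
# Sketch of the virtual-link data rate model (Mbps). The formula and
# the beta_j ratios are illustrative assumptions, not the paper's values.

def link_rate_mbps(A: int, R: int, S: int = 14, Q: int = 16) -> float:
    """Data rate B_r^1 of virtual link l_1: two IQ streams of Q bits per
    sample, over (R x 12) subcarriers and S symbols per 1 ms subframe."""
    bits_per_subframe = 2 * Q * A * R * 12 * S
    return bits_per_subframe / 1e3  # bits per 1 ms subframe -> Mbps

# Hypothetical proportionality ratios beta_j for the higher splits.
BETA = {1: 1.0, 2: 0.8, 3: 0.5, 4: 0.2, 5: 0.1, 6: 0.05}

def split_rate_mbps(j: int, A: int, R: int) -> float:
    """B_r^j = beta_j * B_r^1."""
    return BETA[j] * link_rate_mbps(A, R)

print(split_rate_mbps(1, A=4, R=100))  # ~2150 Mbps for a CPRI-like split
```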
2.3 Resource mapping in HOS-DCNs
In this section, we present the resource mapping framework managed by the NFV management and orchestration (NFV-MANO) plane, as illustrated in Fig. 4. The framework consists of three layers: the service function layer, the physical infrastructure layer, and the virtualization layer. The service function layer defines radio processing functionalities as FUs, which are flexibly distributed across server racks. The physical infrastructure layer comprises computing, storage, and networking hardware, including servers, storage devices, and fiber links. The virtualization layer serves as a software platform, abstracting the underlying physical resources into isolated virtual environments that provide virtualized computing, networking, and storage resources. The virtual environments for computing and communication resources are virtual machines (VMs) and virtual fiber links (VFLs), which create isolated environments for hosting different FUs and virtual links. In our work, it is assumed that each FU $f_i$ is instantiated on a single VM $v_i$, and each rack contains six types of VMs, one for each of $f_2$ to $f_7$. In addition, each rack is provisioned with both intracluster VFLs and intercluster VFLs; a virtual link $l_j$ is instantiated on either an intracluster VFL or an intercluster VFL. To deploy an FC, the fine-grained FUs and virtual links must be mapped together dynamically and in order.
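To make these mapping entities concrete, a minimal sketch of how racks, VMs, and VFLs could be represented is shown below; all class names and default capacities are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical data structures for the virtualization layer: each rack
# hosts one VM per FU type (f2..f7) plus pools of intracluster and
# intercluster VFLs. Capacities are placeholders.
from dataclasses import dataclass, field

@dataclass
class VM:
    fu_type: int           # which FU (2..7) this VM hosts
    gops_capacity: float   # remaining computational capacity (GOPS)

@dataclass
class VFL:
    intercluster: bool     # False: via the IAOS; True: via the IEOS
    bandwidth_mbps: float  # remaining bandwidth of the virtual fiber link

@dataclass
class Rack:
    rack_id: int
    vms: dict = field(default_factory=lambda: {i: VM(i, 1000.0) for i in range(2, 8)})
    vfls: list = field(default_factory=lambda: [VFL(False, 1000.0), VFL(True, 1000.0)])
```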
In Fig. 4, we consider two FCs (FC1 and FC2) that initially arrive at Rack 1. Given the high workload in Rack 1, these FCs must be reallocated to other racks to balance the computational load and optimize resource utilization. Based on the control signals from the DRL agents, FC1 is divided into two parts: FU2 is mapped onto VM #2 and processed in Rack 1, while FU3 to FU7 are mapped onto VM #3 to VM #7 and processed in Rack 2. The virtual link between FU2 and FU3 is mapped onto a VFL in Rack 1. FC2 is divided into three parts: FU2 is mapped onto VM #2 and processed in Rack 1, FU3 and FU4 are mapped onto VM #3 and VM #4 for processing in Rack 3, and FU5 to FU7 are mapped onto VM #5 to VM #7 for processing in Rack 4. For FC2, the virtual link between FU2 and FU3 is mapped onto a VFL in Rack 1, and the virtual link between FU4 and FU5 is mapped onto a VFL in Rack 3.
3. Proposed Algorithm
In this section, we first define the optimization problem and formulate the dynamic functional split and mapping process as a Markov decision process (MDP). Subsequently, we introduce the DRL-BFM approach to execute the FC mapping process, utilizing the proximal policy optimization (PPO) algorithm to train the DRL agents effectively.
3.1 Problem definition
The dynamic functional split and mapping problem can be defined as: Given the DCN topology (including the locations of ToRs, IAOSs, and IEOSs), set of high-load server racks, set of service requests, computational capacity of server racks, and bandwidth capacity of fiber links interconnecting two ToRs; Decide the optimal mapping of FCs; To maximize the throughput of high-load server racks through balancing the DCN workloads; Subject to constraints such as QoS requirements, computational capacities, and bandwidth capacities. The mapping of fine-grained FUs and virtual links is a complex decision-making problem and is known to be NP-hard. In the following, we formulate an MDP and present a DRL-based approach to solve this problem.
3.2 MDP formulation
For a service request $r$, the fine-grained baseband functions are modeled as an FC $F_r = \{f_1, f_2, \ldots, f_7\}$, where $f_i$ represents FU $i$. Let $k_r^1$ denote the source rack of $r$. In our problem, $f_1$ is processed in the RRH, while the placement of $f_2$ to $f_7$ is determined by the DRL agents. Specifically, let $k_r^i$ denote the rack where $f_i$ is processed; the placement of its subsequent FU $f_{i+1}$ is then determined by the agent at $k_r^i$. We formulate this process as an MDP.
1) Observation: The observation should provide the DRL agents with enough information to accurately capture the state of the environment. The input state for allocating $f_{i+1}$ is denoted as $s_{i+1}$ and can be expressed as $s_{i+1} = \{\mathcal{C}_i, \mathcal{B}_i, \mathcal{T}_i, T_{th}, d_{i+1}, b_{i+1}\}$, where $\mathcal{C}_i$ represents the set of computational capacities of the intracluster racks and intercluster racks connected to $k_r^i$; $\mathcal{B}_i$ represents the set of bandwidth capacities of the fiber links interconnecting $k_r^i$ to intracluster ToRs and to intercluster ToRs; $\mathcal{T}_i$ represents the set of latencies required for migrating $f_{i+1}$ from $k_r^i$ to intracluster and intercluster ToRs; $T_{th}$ denotes the latency threshold for placing $f_{i+1}$ and its subsequent FUs; and $d_{i+1}$ and $b_{i+1}$ represent the computational and bandwidth demands for allocating $f_{i+1}$. Note that $\mathcal{C}_i$, $\mathcal{B}_i$, and $\mathcal{T}_i$ are each normalized individually.
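A minimal sketch of how such an observation vector could be assembled is given below; the argument names and the max-normalization are assumptions consistent with the description above.

```python
import numpy as np

# Sketch of the observation assembly: capacities, bandwidths, and
# latencies of candidate racks are normalized group-by-group, then
# concatenated with the latency budget and the per-FU demands.
def build_state(cap_gops, link_bw, link_lat, lat_threshold, d_fu, b_fu):
    def norm(x):
        x = np.asarray(x, dtype=np.float32)
        return x / (x.max() + 1e-8)  # each group normalized individually
    return np.concatenate([
        norm(cap_gops),              # C_i: capacities of candidate racks
        norm(link_bw),               # B_i: bandwidths of candidate links
        norm(link_lat),              # T_i: migration latencies
        [lat_threshold, d_fu, b_fu], # T_th and demands of f_{i+1}
    ]).astype(np.float32)
```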
2) Action: The action space includes all potential actions an agent can take within a given environment. In DRL-BFM, an action $a_i$ represents the allocation of $f_{i+1}$ to a specific rack, either in the same cluster as $k_r^i$ via the IAOS or in a different cluster via the IEOS. The action set also includes an action $a_0$, which signifies the termination of an FC when the constraints on computational capacity, bandwidth, or latency are violated.
3) State transition: After processing $f_i$, the MDP transitions to a new state in one of two scenarios:
•If $f_i$ is not the last FU in $F_r$ and none of the constraints is violated, the action is valid. Thus, the new state $s_{i+1}$ is constructed and used as input to the DRL agent at $k_r^i$ for allocating $f_{i+1}$. The changes in computational capacity, bandwidth capacity, and latency incurred by processing $f_i$ are updated, after which $s_{i+1}$ can be calculated.
•If $f_i$ is the last FU in $F_r$, a new FC begins to be allocated. Since $f_1$ of the new FC is processed directly in its source rack, the next action is to place $f_2$ based on a new state $s_2$. As in the previous scenario, the resource changes incurred by processing $f_i$ are updated.
4) Reward function: Let $a_i$ denote the action selected to process $f_{i+1}$. The reward for processing $f_{i+1}$ by taking $a_i$ is defined as $R_i = r_C + r_B - P_T - P_B$, where $r_C$ represents the normalized computational-capacity value of the selected rack $k_r^{i+1}$ in $\mathcal{C}_i$, and $r_B$ represents the normalized value of the virtual link between $k_r^i$ and $k_r^{i+1}$ in $\mathcal{B}_i$. If $k_r^{i+1} = k_r^i$, $r_B$ is calculated as the maximum value in $\mathcal{B}_i$. Therefore, the greater the remaining computational capacity in $k_r^{i+1}$, the larger the value of $r_C$. Similarly, a higher remaining bandwidth capacity on the selected virtual link results in a larger $r_B$. The third term, $P_T$, and the fourth term, $P_B$, represent penalties for exceeding the latency threshold and the bandwidth capacity, respectively.
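The following sketch mirrors this four-term reward; the penalty magnitude and the function signature are assumptions, while the structure follows the definition above.

```python
# Sketch of the reward R_i = r_C + r_B - P_T - P_B. The penalty value
# is a hypothetical constant; the four-term structure is from the text.
def reward(cap_selected, cap_all, bw_selected, bw_all,
           same_rack, lat_ok, bw_ok, penalty=1.0):
    r_c = cap_selected / max(cap_all)  # normalized remaining GOPS
    # No virtual link is needed when f_{i+1} stays on the same rack,
    # so r_B takes the maximum normalized bandwidth value.
    r_b = 1.0 if same_rack else bw_selected / max(bw_all)
    p_t = 0.0 if lat_ok else penalty   # latency-threshold penalty
    p_b = 0.0 if bw_ok else penalty    # bandwidth-capacity penalty
    return r_c + r_b - p_t - p_b
```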
3.3 DRL-BFM algorithm
PPO is a policy gradient algorithm based on the actor-critic (AC) architecture and has proven effective for training DRL agents[19]. The AC network consists of an actor module and a critic module. The actor module parameterizes the policy function to select optimal actions, while the critic module parameterizes the value function to evaluate the selected actions. In our work, PPO is used to train the DRL agents for dynamically selecting functional split options and allocating FCs.
Algorithm 1 presents the pseudo-code of the PPO-based training process. Service requests arrive sequentially. The computational demand $d_i$ for processing $f_i$, the bandwidth demand $b_i$ for migrating $f_i$, and the latency threshold $T_{th}$ serve as inputs to Algorithm 1. Each cluster trains a single agent, and the weights of the actor and critic networks are initialized before the algorithm begins. Line 1 starts the main loop of the training process, which is divided into two parts. The first part outlines the process of generating samples through interaction with the environment (lines 2–19). Training starts by initializing the experience replay buffer, and an iteration terminates once the computational resources of the server racks are fully utilized (lines 2–3). To reallocate an FC $F_r$, $f_1$ is processed in the source rack (lines 4–6). Next, the cluster where request $r$ is currently located is identified. The network controller executes the policy $\pi_\theta$ to choose an action $a_i$ and obtain the reward $R_i$ (lines 7–14). Notably, the mapping process of different FUs within an FC can be carried out by agents in different clusters. The state transition tuple $(s_i, a_i, R_i, s_{i+1})$ is then collected and stored in the replay buffer (lines 15–17). The second part illustrates the learning process (lines 20–35). Let $N_b$ denote the number of minibatches. For each minibatch, generalized advantage estimation (GAE) is used to calculate the advantage function (line 23). The state-action probability under the old policy is also calculated (line 24). Then, the policy gradient and the value function estimator are computed to update the actor and critic module parameters, respectively (lines 26–31)[20]. When network configurations change significantly, agents must be retrained to adapt to the new state or action spaces.
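For reference, a compact sketch of the learning step, generalized advantage estimation followed by the clipped surrogate update of standard PPO[19], is given below; the network and buffer interfaces are assumptions, and only the update math follows PPO.

```python
import torch

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one sampled trajectory."""
    adv, g = [], 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_v - values[t]  # TD residual
        g = delta + gamma * lam * g
        adv.insert(0, g)
    return torch.tensor(adv, dtype=torch.float32)

def ppo_update(actor, critic, opt, states, actions, old_logp,
               adv, returns, clip=0.2):
    """One minibatch update: clipped surrogate (actor) + value loss (critic)."""
    dist = torch.distributions.Categorical(logits=actor(states))
    ratio = torch.exp(dist.log_prob(actions) - old_logp)
    actor_loss = -torch.min(ratio * adv,
                            torch.clamp(ratio, 1 - clip, 1 + clip) * adv).mean()
    critic_loss = (critic(states).squeeze(-1) - returns).pow(2).mean()
    opt.zero_grad()
    (actor_loss + 0.5 * critic_loss).backward()
    opt.step()
```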
Once training is complete, the DRL agents are deployed in the ToRs, and the DRL-BFM algorithm is applied to jointly manage the functional split and execute the mapping process of fine-grained FUs in HOS-DCNs. The pseudo-code of the DRL-BFM algorithm is described in Algorithm 2. Service requests and the trained DRL agents serve as inputs. Before executing each FU $f_i$, the current cluster is identified (line 3). Then, the network state $s_i$ for processing $f_i$ is calculated and fed into the DRL agent, which determines the rack $k_r^i$ where $f_i$ should be executed (lines 4–5). Subsequently, $f_i$ is mapped onto VM $v_i$ and processed in $k_r^i$, and the remaining computational capacity of $k_r^i$ is updated (lines 7–8). If $k_r^i$ and $k_r^{i-1}$ are different racks belonging to the same cluster, the virtual link between $f_{i-1}$ and $f_i$ is mapped onto an intracluster VFL in $k_r^{i-1}$ (lines 9–12). On the other hand, if $k_r^i$ and $k_r^{i-1}$ belong to different clusters, the virtual link is mapped onto an intercluster VFL in $k_r^{i-1}$ (lines 13–16). Additionally, if the selected action is not valid (i.e., the latency or bandwidth requirements are not met), $f_i$ continues to be executed in $k_r^{i-1}$ (lines 18–20).
In Algorithm 2, the FUs of each service are executed sequentially. An FU can either be processed on the same server as the previous FU or migrated to another server within the same cluster or across clusters. When two consecutive FUs are executed on different servers, a functional split occurs, necessitating the mapping of their virtual link. This flexible scheme allows a split between any two consecutive FUs, enabling dynamic adaptation to resource availability. Once a split occurs, both the FU placement and the corresponding virtual link mapping must be optimized to ensure efficient resource utilization and seamless service execution.
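A condensed sketch of this sequential placement loop is shown below; the environment and agent interfaces (agent.act, env.assign_vm, env.map_link, and so on) are hypothetical stand-ins for the primitives of Algorithm 2.

```python
# Sketch of inference-time FC mapping: FUs are placed one by one, and a
# virtual link is mapped whenever a functional split occurs. All helper
# names are hypothetical.
def map_function_chain(fc, source_rack, agents, env):
    prev_rack = source_rack                 # f1 runs in the source rack/RRH
    for fu in fc[1:]:                       # place f2..f7 sequentially
        agent = agents[env.cluster_of(prev_rack)]
        state = env.build_state(prev_rack, fu)
        rack = agent.act(state)             # chosen rack, or None if invalid
        if rack is None:                    # constraints violated: stay put
            rack = prev_rack
        env.assign_vm(rack, fu)             # map the FU onto its VM type
        if rack != prev_rack:               # a functional split occurs here
            env.map_link(prev_rack, rack, fu)  # intra- or intercluster VFL
        prev_rack = rack
```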
4. Performance Evaluation
In this section, the learning environment is constructed using OpenAI Gym, and both the actor and critic networks are trained with PyTorch 2.0.1. Service requests are transmitted from RRHs to the BBU pool (hosted at server racks within the DCN) sequentially. We assume that the service arrival rate in high-load racks is 5 times higher than that in other racks. The number of resource blocks per task is selected from [50,100], and the number of antennas per task is randomly selected from [2,4]. Several studies present parameters for estimating the computational and bandwidth demands for different functional splits[9]. Initially, the workload in high-load racks is set between [2000,2500] GOPS, while in other racks, it ranges between [1000,1500] GOPS. The initial data rate for the fiber link interconnecting two ToRs is set between [0.5,1] Gbps. The hyperparameters for training policy and critic networks are summarized in Table 1.
We evaluate the performance of DRL-BFM by comparing it with the following heuristic algorithms: i) RBM (random-based mapping): Randomly selects a rack from those meeting the resource requirements for FU mapping. ii) LCBM (least-loaded computing-based mapping): Selects the least-loaded rack that satisfies the resource demands for processing an FU. iii) LBFO (least-loaded bandwidth-focused optimization): Chooses the rack that is connected via the least-loaded fiber link to the current rack while meeting the computing resource requirements.
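Under these selection rules, each baseline reduces to a one-line policy over the feasible candidates. The sketch below assumes a pre-filtered list of (rack_id, free_gops, free_link_bw) tuples and is illustrative only.

```python
import random

# Hedged sketches of the three baselines over feasible candidate racks.
def rbm(racks):
    """Random-based mapping: any feasible rack."""
    return random.choice(racks)[0]

def lcbm(racks):
    """Least-loaded computing-based mapping: rack with the most free GOPS."""
    return max(racks, key=lambda r: r[1])[0]

def lbfo(racks):
    """Least-loaded bandwidth-focused optimization: least-loaded fiber link."""
    return max(racks, key=lambda r: r[2])[0]
```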
Figure 5 illustrates the throughput in high-load racks, measured by the number of successfully processed service requests as a function of increasing computational capacities using DRL-BFM, RBM, LCBM, and LBFO algorithms. As shown, DRL-BFM achieves an average throughput increase of 30.6%, 5.5%, and 26.4% compared to RBM, LCBM, and LBFO algorithms, respectively. Furthermore, as computational capacity increases, the performance gap between DRL-BFM and the other algorithms (RBM, LCBM, and LBFO) widens. This is because a higher computational capacity allows more FCs to migrate between ToRs, thereby amplifying the constraints imposed by fiber bandwidth capacities. These results demonstrate that our proposed DRL-BFM effectively improves throughput in high-load server racks under varying computational capacities.
Figure 5.Comparison of throughput with increasing computational capacities using DRL-BFM, RBM, LCBM, and LBFO.
Figure 6 shows the throughput in high-load racks with increasing bandwidth capacities using DRL-BFM, RBM, LCBM, and LBFO. Here, the computational capacity is set to 7000 GOPS. DRL-BFM achieves an average throughput improvement of 32%, 4.5%, and 32.1% compared to RBM, LCBM, and LBFO, respectively. It can be observed that as the bandwidth capacity increases to 6.5 Gbps, the throughput gap between LCBM and DRL-BFM gradually narrows. This is because larger bandwidth capacities relax the constraints imposed by the fiber links, so LCBM, which ignores bandwidth when migrating FCs between ToRs, is penalized less. The results demonstrate that DRL-BFM effectively enhances throughput in high-load server racks under varying bandwidth capacities.
Figure 6.Comparison of throughput with increasing bandwidth capacities using DRL-BFM, RBM, LCBM, and LBFO.
High-latency services allow traffic to be reallocated among server racks. Figure 7 shows the throughput in high-load racks with increasing ratios of high-latency services. The computational and bandwidth capacities are set to 7000 GOPS and 4 Gbps, respectively. The results show that DRL-BFM improves the throughput by 38.1%, 11.5%, and 40.0% compared to RBM, LCBM, and LBFO algorithms, respectively. It can also be observed that as the ratio of high-latency services increases, the performance gap between DRL-BFM and the other algorithms widens. These results indicate that the proposed DRL-BFM effectively enables traffic migration for a larger number of services, enhancing throughput in high-load server racks.
Figure 7.Comparison of throughput with increasing ratios of high-latency services using DRL-BFM, RBM, LCBM, and LBFO.
We use the load balancing factor (LBF), which quantifies the average relative deviation of the workloads across all racks, to reflect the load balancing performance in DCNs. Figure 8 shows the LBF values as the number of service requests increases using the DRL-BFM, RBM, LCBM, and LBFO algorithms. Each algorithm runs 10 times, and the average results are presented. As shown, LCBM achieves an average LBF that is 5.1%, 44.7%, and 47.3% lower than that calculated by DRL-BFM, RBM, and LBFO, respectively. The reason LCBM outperforms the other algorithms is that it greedily selects low-load server racks for offloading FCs. The small gap between DRL-BFM and LCBM indicates that the proposed DRL-BFM approach effectively balances DCN workloads while improving throughput.
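A possible formalization of the LBF, assuming "average relative deviation" means the mean absolute deviation of rack workloads normalized by the mean workload, is sketched below; lower values indicate better balance.

```python
import numpy as np

def lbf(workloads):
    """Load balancing factor: mean absolute deviation from the mean
    workload, normalized by the mean (an assumption consistent with
    the description; 0 means perfectly balanced)."""
    w = np.asarray(workloads, dtype=float)
    return float(np.mean(np.abs(w - w.mean())) / w.mean())

print(lbf([1000, 1000, 1000]))      # 0.0: perfectly balanced
print(lbf([2500, 1200, 800, 500]))  # 0.5: heavily imbalanced
```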
Figure 8.Load balancing performance using DRL-BFM, RBM, LCBM, and LBFO.
In summary, we have developed a DRL-BFM approach to jointly select functional splits and map fine-grained baseband functions in a heterogeneous optical switching-based DCN. We conducted simulations to compare the throughput and load-balancing performance of DRL-BFM with other heuristic algorithms under varying computational capacities, bandwidth capacities, and latency conditions. The results demonstrate that the proposed DRL-BFM approach achieves higher throughput in high-load racks while effectively balancing the DCN workloads.
[19] J. Schulman, F. Wolski, P. Dhariwal, et al., "Proximal policy optimization algorithms," arXiv:1707.06347 (2017).
[20] J. Ziazet, M. Junio, and B. Jaumard, "Deep reinforcement learning for network provisioning in elastic optical networks," in IEEE International Conference on Communications (ICC), 2022.