Optimal Placement of the Virtualized Federated Learning Aggregation Function at the Edge / Ruggeri, G.; Amadeo, M.; Campolo, C.; Molinaro, A. - In: IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT. - ISSN 1932-4537. - (2025), pp. 1-1. [10.1109/TNSM.2025.3551257]
Optimal Placement of the Virtualized Federated Learning Aggregation Function at the Edge
Ruggeri G.; Amadeo M.; Campolo C.; Molinaro A.
2025-01-01
Abstract
Federated Learning (FL) enables multiple devices (clients) to train a shared machine learning (ML) model on their local datasets and then send the updated models to a central server, whose task is to aggregate the locally computed updates and share the learned global model with the clients again in an iterative process. The population of clients may change at each round, whereas the node executing the aggregation function is typically placed in an edge domain and remains static until the end of the overall FL training process. Indeed, the computing capabilities of the edge node hosting the aggregation function and the distance (latency) of such a node from the selected clients can significantly affect the convergence rate of the FL training procedure. Moreover, the heterogeneous, time-varying capabilities of edge nodes, coupled with the dynamic client population selected at each round, call for the optimal dynamic placement of the aggregation function across the available nodes in an edge domain. In this work, we formulate an optimization problem for the placement of the FL aggregation function, which aims to select at each round the edge node that minimizes the overall per-round training time, encompassing the aggregation time, the local training time at the clients, and the time for exchanging the global model and the model updates. A time-efficient greedy heuristic is proposed, which is shown to closely approximate the optimal solution and to outperform the considered benchmark solutions.
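The per-round placement decision described in the abstract can be illustrated with a minimal sketch. This is not the formulation or heuristic from the paper: the timing model, the node and client fields, and the function names (estimate_round_time, place_aggregator) are assumptions introduced here purely for exposition.

# Minimal sketch (assumed, not the authors' implementation): greedily place the
# FL aggregation function at the edge node with the smallest estimated per-round time.

def estimate_round_time(node, clients, model_size):
    """Estimate the per-round time if 'node' hosts the aggregation function.

    The round time is modeled as the slowest client path (global model download,
    local training, and model update upload) plus the aggregation time at the
    chosen edge node. All fields and the timing model are hypothetical.
    """
    slowest_client = max(
        node["latency"][c["id"]] * model_size      # global model download to client c
        + c["local_training_time"]                 # local training at client c
        + node["latency"][c["id"]] * model_size    # model update upload from client c
        for c in clients
    )
    aggregation_time = model_size * len(clients) / node["compute_capacity"]
    return slowest_client + aggregation_time


def place_aggregator(edge_nodes, clients, model_size):
    """Pick, for the current round, the edge node with the smallest estimated
    per-round time among those with available computing capacity."""
    feasible = [n for n in edge_nodes if n["compute_capacity"] > 0]
    return min(feasible, key=lambda n: estimate_round_time(n, clients, model_size))

Re-evaluating this selection at every round reflects the dynamic placement motivated in the abstract, since both the selected client population and the edge nodes' available capacity may change between rounds.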