Ferrara, Massimiliano. Geometric-Entropic Optimization: Integrating Optimal Transport with Riemannian Gradient Methods for Neural Network Training. Journal of Optimization Theory and Applications 209 (2026). ISSN 0022-3239. DOI: 10.1007/s10957-026-02958-8
Geometric-Entropic Optimization: Integrating Optimal Transport with Riemannian Gradient Methods for Neural Network Training
Ferrara, Massimiliano
Author contribution: Conceptualization
Published: 2026-01-01
Abstract
We introduce Geometric-Entropic Optimization (GEO), an algorithm for neural network training that integrates Riemannian gradient methods with entropy-regularized optimal transport. The algorithm operates on a parameter manifold equipped with a combined Fisher-Wasserstein metric and incorporates Sinkhorn-type projections to enforce distributional constraints on layer activations. We establish convergence guarantees showing that GEO achieves an O(1/T) + O(ρ^{2K}) rate, where the first term reflects Riemannian gradient descent over T outer steps and the second captures the geometric contraction, with modulus ρ, of K Sinkhorn iterations. Computational experiments on continuous control tasks and language modeling demonstrate consistent improvements over standard optimizers, with performance gains of approximately 20% on benchmark tasks. The theoretical framework unifies recent architectural innovations in deep learning, including manifold-constrained connections and orthogonality-preserving updates, within a coherent optimization-theoretic perspective rooted in the geometric dynamics tradition.
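The Sinkhorn-type projection referenced above is, in its standard entropy-regularized optimal-transport form, an alternating rescaling of a Gibbs kernel whose marginal error contracts geometrically in the iteration count, which is where a ρ^{2K}-type term originates. Below is a minimal NumPy sketch of that standard iteration, not the paper's GEO-specific variant; the function name and the hyperparameters `eps` and `K` are illustrative assumptions, not the paper's API.

```python
import numpy as np

def sinkhorn_projection(a, b, C, eps=0.1, K=100):
    """Standard entropy-regularized optimal transport via Sinkhorn scaling.

    a, b : nonnegative marginals summing to 1 (shapes (n,) and (m,))
    C    : cost matrix, shape (n, m)
    eps  : entropic regularization strength (assumed hyperparameter)
    K    : number of Sinkhorn iterations; the marginal error contracts
           geometrically in K, matching the O(rho^{2K}) term in the abstract
    """
    G = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(K):
        v = b / (G.T @ u)                 # rescale columns to match b
        u = a / (G @ v)                   # rescale rows to match a
    return u[:, None] * G * v[None, :]    # transport plan P = diag(u) G diag(v)

# Toy usage: project onto random marginals and check the row constraint.
rng = np.random.default_rng(0)
a = np.full(5, 0.2)
b = rng.dirichlet(np.ones(4))
C = rng.random((5, 4))
P = sinkhorn_projection(a, b, C)
assert np.allclose(P.sum(axis=1), a)      # rows match a exactly after the u-update
```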
| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| Ferrara_2026_JOTaA_Optimization_editor.pdf (open access) | Article | Editorial Version (PDF) | Creative Commons | 261.17 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


