MATHEMATICAL PROPERTIES OF ACTIVATION FUNCTIONS IN ARTIFICIAL INTELLIGENCE DEVELOPMENTS: Analysis and Implications for Deep Neural Architectures / Ferrara, Massimiliano; Ciccia, Celeste. - In: THE JOURNAL OF THE INDIAN ACADEMY OF MATHEMATICS. - ISSN 0970-5120. - 48:1(2026), pp. 1-9.

MATHEMATICAL PROPERTIES OF ACTIVATION FUNCTIONS IN ARTIFICIAL INTELLIGENCE DEVELOPMENTS: Analysis and Implications for Deep Neural Architectures

Massimiliano Ferrara (Conceptualization)
2026-01-01

Abstract

Activation functions govern the expressive power and training dynamics of deep neural networks through their analytical properties. This paper provides a rigorous mathematical analysis of six fundamental activation functions – Linear, Sigmoid, Hyperbolic Tangent, ReLU, Parametric ReLU, and Exponential Linear Unit – examining how regularity, gradient structure, and spectral properties influence representational capacity, gradient flow stability, and convergence behavior in deep architectures. We establish formal results on the representational collapse of linear activations, derive sharp gradient decay bounds for saturating functions, prove gradient preservation theorems for piecewise-linear activations, and characterize the convergence advantages of smooth non-saturating units. Our analysis yields a unified mathematical framework connecting activation function properties to network trainability, with direct implications for the design of deep learning architectures in sequential decision-making, continuous control, and safety-critical applications.
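
The paper's theorems are not reproduced in this record. As a rough illustration of two standard facts that the abstract alludes to, the following minimal Python sketch checks that (a) two stacked layers with a linear (identity) activation collapse to a single linear map, and (b) the sigmoid derivative never exceeds 1/4, so the product of activation derivatives across `depth` layers is at most (1/4)^depth, whereas the ReLU derivative equals 1 on positive inputs. The weights `W1`, `W2`, the input `x`, and `depth` are hypothetical values chosen for illustration, and weight-matrix contributions to the gradient are deliberately ignored.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)), maximal at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU away from 0: 1 for positive inputs, 0 for negative.
    return 1.0 if x > 0 else 0.0

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

# Hypothetical weights and input, chosen only for illustration.
W1 = [[1.0, 2.0], [0.5, -1.0]]
W2 = [[-1.0, 0.0], [3.0, 1.0]]
x = [0.5, -2.0]

# (a) Linear activations: two layers equal one layer with weights W2 @ W1.
two_layer = matvec(W2, matvec(W1, x))
one_layer = matvec(matmul(W2, W1), x)

# (b) Activation-derivative products over `depth` layers (weights ignored).
depth = 10
sigmoid_factor = sigmoid_grad(0.0) ** depth  # best case for sigmoid: (1/4)^depth
relu_factor = relu_grad(1.0) ** depth        # stays 1 on the active region

print(two_layer, one_layer)
print(sigmoid_factor, relu_factor)
```

Evaluating `sigmoid_grad` at 0 gives the most favorable case for the sigmoid; any other pre-activation value makes the product smaller still.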
Keywords: activation functions, deep neural networks, gradient flow, vanishing gradients, convergence analysis, ReLU, ELU, representational capacity
Files in this record:
File: Ferrara_2026_JIAMS_Math. properties_editor.pdf
Access: open access
Description: Article
Type: Publisher's version (PDF)
License: Publisher's copyright
Size: 314.59 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12318/165206