Mitigating Big Data Pollution and AI Model Deterioration: A Dataset Core Approach with Blockchain-Based Verification / Sgantzos, Konstantinos; Ferrara, Massimiliano. - In: WSEAS TRANSACTIONS ON BUSINESS AND ECONOMICS. - ISSN 1109-9526. - 23:(2026), pp. 123-133. [10.37394/23207.2026.23.11]

Mitigating Big Data Pollution and AI Model Deterioration: A Dataset Core Approach with Blockchain-Based Verification

Ferrara, Massimiliano
Conceptualization
2026-01-01

Abstract

In artificial intelligence (AI) and machine learning (ML), the integrity, diversity, and quality of training datasets are critical to the accuracy and reliability of predictive models. However, big-data pollution, which appears in datasets as AI-generated synthetic data, inconsistencies, biases, and data poisoning, undermines model performance by diminishing the Shannon entropy of the system. This study proposes a novel framework that integrates the Dataset Core approach with tokenized data, triple-entry accounting (TEA), and distributed ledger technology (DLT) to address these challenges. Our Dataset Core method preserves essential information value while filtering out potentially harmful elements, providing mathematically grounded protection against data pollution. Combined with blockchain-based verification, this approach establishes a foundation for enhanced transparency and trustworthiness in AI applications, with significant implications for sectors such as finance and healthcare.
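
The abstract's claim that pollution lowers Shannon entropy can be illustrated with a minimal sketch. This is not the authors' Dataset Core method; it only computes the entropy H = -Σ p_i log2(p_i) of an empirical token distribution and compares a small diverse sample with the same sample after repetitive additions. The function name `shannon_entropy` and the toy token lists are assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Return the Shannon entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed token frequencies
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy data: four distinct tokens, uniformly distributed.
diverse = ["a", "b", "c", "d"]

# Hypothetical "polluted" sample: the same data plus repeated copies of one
# token, standing in for homogeneous synthetic additions.
polluted = diverse + ["a"] * 4

h_diverse = shannon_entropy(diverse)
h_polluted = shannon_entropy(polluted)

print(f"diverse:  {h_diverse:.3f} bits")
print(f"polluted: {h_polluted:.3f} bits")
```

In this toy case the skewed sample has lower entropy than the uniform one, which is the effect the abstract attributes to polluted training data; real datasets would need a tokenization and a distribution estimate appropriate to the domain.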
Year: 2026
Keywords: AI deterioration; Blockchain; Data Poisoning; Dataset Core; Distributed Ledger Technology; Machine Learning Security; Shannon Entropy; Triple-Entry Accounting
Files in this record:
Ferrara_2026_WSEAS_Dataset_editor.pdf (open access)
Description: Article
Type: Publisher's version (PDF)
License: Creative Commons
Size: 960.37 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12318/166607