Mitigating Big Data Pollution and AI Model Deterioration: A Dataset Core Approach with Blockchain-Based Verification / Sgantzos, Konstantinos; Ferrara, Massimiliano. - In: WSEAS TRANSACTIONS ON BUSINESS AND ECONOMICS. - ISSN 1109-9526. - 23:(2026), pp. 123-133. [10.37394/23207.2026.23.11]
Mitigating Big Data Pollution and AI Model Deterioration: A Dataset Core Approach with Blockchain-Based Verification
Ferrara, Massimiliano (Conceptualization)
2026-01-01
Abstract
In the contemporary landscape of artificial intelligence (AI) and machine learning (ML), the integrity, diversity, and quality of training datasets are critical to the accuracy and reliability of predictive models. However, big-data pollution, which manifests as AI-generated synthetic data, inconsistencies, biases, and data poisoning within datasets, undermines model performance by diminishing the Shannon entropy of the system. This study proposes a novel framework that integrates the Dataset Core approach with tokenized data, triple-entry accounting (TEA), and distributed ledger technology (DLT) to address these challenges. Our Dataset Core method preserves essential information value while filtering out potentially harmful elements, providing mathematically grounded protection against data pollution. Combined with blockchain-based verification, this approach establishes a foundation for greater transparency and trustworthiness in AI applications, with significant implications for finance, healthcare, and other sectors.
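The claim that pollution lowers Shannon entropy can be illustrated with the standard formula H = -sum_i p_i log2(p_i). The following is a toy sketch, not the authors' Dataset Core method: it assumes records can be treated as discrete symbols counted by exact match, and the helper `shannon_entropy` and the corpora `original` and `polluted` are hypothetical. Flooding a corpus with synthetic records that collapse onto a single mode concentrates the empirical distribution and reduces its entropy.

```python
import math
from collections import Counter


def shannon_entropy(samples):
    """Return the Shannon entropy (in bits) of the empirical distribution of `samples`.

    Hypothetical helper: each record is treated as a discrete symbol,
    and identical records are counted together.
    """
    if not samples:
        return 0.0
    counts = Counter(samples)
    total = len(samples)
    # H = -sum_i p_i * log2(p_i), summed over the distinct records observed
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy corpus: 8 distinct records, each appearing once.
original = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7"]

# Hypothetical pollution: 24 synthetic records that all duplicate "r0".
polluted = original + ["r0"] * 24

print(f"original entropy: {shannon_entropy(original):.3f} bits")  # 3.000 bits
print(f"polluted entropy: {shannon_entropy(polluted):.3f} bits")  # 1.372 bits
```

With eight equally frequent records the entropy reaches its maximum, log2(8) = 3 bits; after 24 copies of a single record are added, it falls to about 1.37 bits even though the dataset has grown fourfold. Size alone therefore says little about the information a polluted dataset carries.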
File: Ferrara_2026_WSEAS_Dataset_editor.pdf (open access)
Description: Article
Type: Publisher's version (PDF)
License: Creative Commons
Size: 960.37 kB
Format: Adobe PDF


