Detecting communities in graphs is a fundamental data analysis tool to understand the structure of Web-based systems and predict their evolution. Many community detection algorithms handle undirected graphs (i.e., graphs with bidirectional edges) but many graphs on the Web -- e.g. microblogging Web sites, trust networks or the Web graph itself -- are often directed. Few community detection algorithms deal with directed graphs but we lack their experimental comparison. In this paper we evaluated some community detection algorithms across accuracy and scalability. A first group of algorithms (Label Propagation and Infomap) are explicitly designed to manage directed graphs while a second group (e.g., WalkTrap) simply ignores edge directionality; finally, a third group of algorithms (e.g., Eigenvector) maps input graphs onto undirected ones and extracts communities from the symmetrized version of the input graph. We ran our tests on both artificial and real graphs and, on artificial graphs, WalkTrap achieved the highest accuracy, closely followed by other algorithms; Label Propagation has outstanding performance in scalability on both artificial and real graphs. The Infomap algorithm showcased the best trade-off between accuracy and computational performance and, therefore, it has to be considered as a promising tool for Web Data Analytics purposes.

An Empirical Comparison of Algorithms to Find Communities in Directed Graphs and their Application in Web Data Analytics

ROSACI D;SARNE' G;
2017-01-01

Abstract

Detecting communities in graphs is a fundamental data analysis tool to understand the structure of Web-based systems and predict their evolution. Many community detection algorithms handle undirected graphs (i.e., graphs with bidirectional edges) but many graphs on the Web -- e.g. microblogging Web sites, trust networks or the Web graph itself -- are often directed. Few community detection algorithms deal with directed graphs but we lack their experimental comparison. In this paper we evaluated some community detection algorithms across accuracy and scalability. A first group of algorithms (Label Propagation and Infomap) are explicitly designed to manage directed graphs while a second group (e.g., WalkTrap) simply ignores edge directionality; finally, a third group of algorithms (e.g., Eigenvector) maps input graphs onto undirected ones and extracts communities from the symmetrized version of the input graph. We ran our tests on both artificial and real graphs and, on artificial graphs, WalkTrap achieved the highest accuracy, closely followed by other algorithms; Label Propagation has outstanding performance in scalability on both artificial and real graphs. The Infomap algorithm showcased the best trade-off between accuracy and computational performance and, therefore, it has to be considered as a promising tool for Web Data Analytics purposes.
2017
Web Data Analytics; Graph Analytics; Community Detection and Clustering
File in questo prodotto:
File Dimensione Formato  
Agreste_2016_TBD_an_editor.pdf

non disponibili

Descrizione: versione editoriale
Tipologia: Versione Editoriale (PDF)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 867.09 kB
Formato Adobe PDF
867.09 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Agreste_2016_TBD_an_post.pdf

accesso aperto

Descrizione: versione postprint
Tipologia: Documento in Post-print
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 880.29 kB
Formato Adobe PDF
880.29 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12318/4699
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? 17
social impact