skip to main content
Language:
Search Limited to: Search Limited to: Resource type Show Results with: Show Results with: Search type Index

Diffusion in complex networks revisited: a machine learning approach towards achieving fair, dynamics-sensitive, and data-driven solutions

Digital Resources/Online E-Resources

Citations Cited by
  • Title:
    Diffusion in complex networks revisited: a machine learning approach towards achieving fair, dynamics-sensitive, and data-driven solutions
  • Author: ASGHARIAN REZAEI, Ahmad
  • Subjects: Complex Networks ; Epidemic Analysis ; Graph Convolutional Networks ; Influence Maximization ; Influential Node Ranking ; Machine Learning ; Support Vector Machines ; Vital Node Identification
  • Description: In recent years, much research has focused on studying the spreading process in complex networks. Identifying and ranking the vital and influential nodes in a complex network is the first question when studying the diffusion in complex networks. This thesis focuses on two of the most important problems in this context: the influential node ranking and influence maximization. The thesis identified several gaps in the existing solutions to these two problems, four of the most notable ones are: the apathy of influence maximization methods to the concept of fair propagation of the influence/news/information through the complex networks, insensitivity of the influential node ranking methods to the parameters of the underlying dynamics, reliance of existing approaches on heuristic methods for calculating the influentiality of nodes, and dependency on the most general type of networks when studying diffusion in complex networks. In the proposed solution to the first gap, Adversarial Graph Embeddings is introduced where an auto-encoder for graph embedding and a discriminator to discern sensitive attributes are co-trained simultaneously. This leads to embeddings which are similarly distributed across sensitive attributes. A good initial set is then found by clustering the embeddings. This method is the first to use embeddings for the task of fair influence maximization. While there are typically trade-offs between fairness and influence maximization objectives, experiments on synthetic and real-world datasets show that the proposed approach dramatically reduces disparity while remaining competitive with state-of-the-art influence maximization methods. To address the issue of insensitivity of influential node ranking methods to the parameters of the underlying dynamics, a randomized sampling algorithm is proposed that not only considers the local and global structural features of the network, but also considers the underlying diffusion dynamics and its parameters. The main idea is to approximate the influential ability (influentiality) of a node with the reachability of other nodes from that node in a set of random sub-graphs. To this end, several random sub-graphs are sampled from the original network and then a hyper-graph is created in which each sub-graph is represented with a hyper-edge. From a theoretical standpoint, one could argue that degree of nodes in the hyper-graph approximates influentiality.  The performance of the proposed model is also evaluated empirically by measuring the correlation between the ranking generated by the proposed method and the ground-truth ranking. The empirical results show that the proposed method substantially outperforms state-of-the-art methods and achieves the highest correlation with the ground-truth ranking, while the generated ranking by this method hits the highest level of uniqueness and uniformity. Theoretical and practical analysis of the running time of the algorithm also confirms that the proposed method maintains a competitive running time compared with state-of-the-art methods. Furthermore, to address the third gap, inspired by the power of machine learning models for efficiently capturing different types of patterns and relations, we propose a machine learning-based, data driven approach for influential node ranking. Unlike most of the exiting approaches that mainly rely on heuristic methods and therefore have a weak adaptability, in this method a data-driven approach is employed that can adapt its ranking according to the network data and the underlying dynamic. The main idea is to train the model with a small portion of the graph, say 0.5% of the nodes, and do the prediction on the rest of the nodes. The ground-truth vitality for the train data is computed by simulating the SIR diffusion method starting from the train nodes. A collective feature engineering method is used where each node in the network is represented by incorporating elements of its connectivity, degree and extended coreness. Several machine learning models are trained on the node representations, but the best results are achieved by a Support Vector Regression machine with RBF kernel. This is because that the training set is very small and SVR is among the few machine learning algorithms capable of modelling with limited data. The empirical results confirms that the proposed model outperforms state-of-the-art models on a selection of datasets, while it also shows more adaptability to changes in the dynamics parameters. Finally, to further extend the applicability of data-driven models to different types of the networks, in the final chapter this thesis studies influential node ranking in directed attributed networks. This study is inspired by the prevalence of directed and attributed networks in real-world applications. In such applications, any feature engineering method that merely relies on adjacency matrix is not capable of capturing network’s full properties. In fact, since such networks deal with information that falls beyond the structural properties of the network, some form of embedding the asymmetric network connections and nodes meta information is required. To this end, a Spatial Graph Convolutional method is used for obtaining a base embedding that incorporates asymmetric network structure and meta information for each node in the form of a vectorized representation. The base embedding is then enriched in an iterative process to also capture higher-order relations between the nodes. The enriched embedding is then used as input features to a random forest algorithm that models the influentiality of the nodes. Finally, the trained random forest is used to predict influential ability of the nodes. The empirical results demonstrate that the proposed method outperforms fifteen other models in terms of degree of agreement with the ground truth, the ability to find the most influential nodes, and the uniqueness of ranking. All in all, the gap in the performance the proposed method and most of the existing method is significant, underscoring the importance of an embedding-based approach for finding influential nodes in directed attributed networks. Source: TROVE
  • Creation Date: 2023
  • Language: English
  • Source: Trove Australian Thesis (Full Text Open Access)

Searching Remote Databases, Please Wait