MarkovRCnet
MarkovRCnet (which stands for Markovian Refined Complex networks) is a Python package for analyzing network structure using Markov-based clustering and reachability measures, integrating MCL, refined MCL variants, and the MiF family of metrics.
The URL of this project on PyPI is: https://pypi.org/project/markovrcnet/1.0.0/
The URL of this project on GitHub is: https://github.com/hilolani/MarkovRCnet
Overview
MarkovRCnet is a Python library for analyzing complex networks using Markov random walk-based methods.
Many real-world systems—such as social, biological, and information networks—can be naturally modeled as graphs. Yet, understanding their global and local structure remains challenging due to sparsity, scale, and complex connectivity.
A natural way to explore a graph is to imagine a random walker moving from node to node. At each step, the walker selects its next position based solely on the current node, following a Markov process. Despite this simple rule, the resulting dynamics reveal rich structural information about the network.
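The walker's one-step rule can be written down directly: column-normalizing the adjacency matrix yields the Markov transition matrix. A minimal sketch with NumPy (independent of this library's internals):

```python
import numpy as np

# Toy undirected graph: node 0 is connected to nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

# Column-normalizing the adjacency matrix gives the Markov transition matrix:
# column j holds the walker's next-step distribution when standing on node j.
T = A / A.sum(axis=0, keepdims=True)

print(T[:, 0])  # from node 0, the walker moves to node 1 or 2 with probability 1/2 each
```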
MarkovRCnet leverages this principle to provide tools for:
measuring similarity and distance between nodes,
identifying communities and flow structures,
extracting interpretable representations of graphs.
By grounding network analysis in Markov dynamics, MarkovRCnet offers an intuitive yet mathematically principled framework that is accessible to beginners and useful for advanced research, including applications in machine learning and graph-based AI.
Two Components of This Library
This library consists of two components that are formally independent yet closely related: Re_MCL and MiF.
Re_MCL revisits and reimplements the Markov Cluster Algorithm (MCL), a graph clustering method proposed over two decades ago, while incorporating unique extensions and refinements.
MiF (Markov inverse F-measure), in contrast, provides a flexible framework for measuring distances and similarities between nodes based on the intrinsic characteristics of complex networks.
Re_MCL: MCL and Recurrent MCL (RMCL)
The Markov Cluster Algorithm (MCL), proposed by Van Dongen, is a fast, scalable, and high-quality method for graph clustering (community detection) based on the simulation of random walks on graphs.
MCL models random walks using two simple algebraic matrix operations known as expansion and inflation. Expansion corresponds to standard matrix multiplication, while inflation raises each entry to a power (a Hadamard power) and then rescales each column. By alternately applying these two operations, MCL partitions a graph into disjoint subgraphs whose number is not fixed in advance but emerges from the process.
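The two operations are easy to sketch with NumPy. The toy implementation below is illustrative only (it is not this package's mclprocess); it uses Van Dongen's inflation step (Hadamard power plus column rescaling) for the second operation, and the threshold and iteration count are arbitrary choices:

```python
import numpy as np

def expand(M):
    # Expansion: squaring the column-stochastic matrix spreads flow over longer walks.
    return M @ M

def inflate(M, r=2.0):
    # Inflation: Hadamard power sharpens strong transitions, then each column
    # is rescaled so that it sums to 1 again.
    P = M ** r
    return P / P.sum(axis=0, keepdims=True)

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
A += np.eye(6)                         # self-loops, as in standard MCL
M = A / A.sum(axis=0, keepdims=True)   # column-stochastic transition matrix

for _ in range(20):                    # alternate expansion and inflation until convergence
    M = inflate(expand(M))

# In the converged matrix, surviving (attractor) rows index the clusters.
clusters = {tuple(np.nonzero(row > 1e-6)[0]) for row in M if row.sum() > 1e-6}
print(clusters)
```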
When applied to semantic networks—such as those constructed from word co-occurrence relationships—MCL often produces clusters that correspond to meaningful conceptual categories.
This library implements the conventional MCL algorithm and additionally provides an extended algorithm called Recurrent MCL (RMCL), originally developed at the former Akama Laboratory at Tokyo Institute of Technology. RMCL can be executed via the function rmcl_basic().
A key component of RMCL is Branching Markov Clustering (BMCL), which addresses a well-known limitation of standard MCL: severe cluster size imbalance. This issue becomes particularly pronounced in graphs exhibiting scale-free and heterophilic characteristics.
Core Clusters and Core Hubs
In graphs whose degree distributions approximately follow a power law—such as semantic networks derived from large language corpora—standard MCL often produces one extremely large cluster alongside many small ones. We refer to this large cluster as the core cluster. The node with the highest degree within the core cluster is designated as the core hub.
BMCL enables the division of such oversized core clusters into appropriately sized subgraphs. It achieves this by introducing latent adjacency relationships between Markov clusters, thereby reconstructing a more informative network structure.
Latent Adjacency Construction
To discover new relationships between Markov clusters, BMCL utilizes both the core cluster and smaller non-core clusters adjacent to it. From each Markov cluster, the node with the maximum degree is selected as a representative node (hub).
Using the adjacency matrix of the original graph, latent adjacency information is computed, and MCL is reapplied to this reconstructed adjacency matrix.
Splitting Large Core Clusters
First, BMCL identifies existing connections between non-core-hub nodes within the core cluster and the representative hubs of other Markov clusters. It then determines whether two non-core-hub nodes in the core cluster can be virtually connected within two hops via a representative hub of an external Markov cluster, and counts the number of such distinct paths.
These inferred connections are interpreted as latent adjacencies between node pairs inside the core cluster. To divide the core cluster into balanced subclusters, MCL is reapplied to the resulting latent adjacency matrix.
Merging Small Non-Core Clusters (Reverse Branching)
BMCL also supports the merging of small non-core clusters through a process known as reverse branching. In this step, connections between the core hub and representative hubs of non-core clusters are examined.
BMCL determines whether two representative hubs can be connected via the core hub within two hops, and if so, how many distinct paths exist. These inferred connections are likewise treated as latent adjacency information. MCL is then applied to this latent adjacency matrix, enabling the merging of small non-core clusters into larger, more meaningful structures.
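Both the splitting and merging steps rest on the same counting primitive: the number of two-hop paths between two nodes through a designated set of intermediaries (external representative hubs in the splitting step, the core hub alone in reverse branching). A minimal sketch of that count, with illustrative names (not the package's internal API):

```python
import numpy as np

def latent_adjacency(A, nodes, intermediaries):
    # B[i, h] = 1 if nodes[i] is adjacent to intermediaries[h].
    B = A[np.ix_(nodes, intermediaries)]
    # (B @ B.T)[i, j] counts the distinct two-hop paths nodes[i] -> h -> nodes[j].
    return B @ B.T

# Toy graph: nodes 0 and 1 are both adjacent to hub 2; node 0 is also adjacent to hub 3.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

L = latent_adjacency(A, nodes=[0, 1], intermediaries=[2, 3])
print(L)  # the off-diagonal entry counts the latent paths linking nodes 0 and 1
```

MCL is then reapplied to this latent adjacency matrix, as described above.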
MiF (Markov inverse F-measure)
While Re_MCL focuses on restructuring graph topology through Markov dynamics, MiF addresses a complementary problem: measuring similarity and distance within complex networks.
MiF (Markov inverse F-measure) is a similarity (distance) measure between vertices in a graph, originally proposed by Akama et al. (2015).
MiF evaluates how closely two vertices are related by modeling how information flows between them through a Markov random walk. Unlike conventional graph similarity measures, MiF integrates both local and global structural information into a single framework.
From a local perspective, MiF considers co-occurrence-based similarity, reflecting how strongly two vertices overlap in their immediate neighborhoods. This idea is conceptually related to measures such as the Jaccard and Simpson coefficients.
From a global perspective, MiF incorporates geodesic-based similarity, taking into account shortest path lengths and the number of such paths. MiF naturally balances these two perspectives, enabling robust similarity estimation even in complex network structures.
The MiF value is normalized to lie within the interval [0, 1], where larger values indicate stronger similarity.
Parameterization and Network Characteristics
MiF includes several free parameters that allow the metric to adapt to different network characteristics.
In classical set-based similarity measures, normalization often relies on the size of the union of two sets. In graph-based settings, however, such normalization can become problematic due to degree imbalance, degree correlation, or scale-free structures.
To address this issue, MiF introduces a parameter β (0 < β < 1), which controls how vertex degrees contribute to normalization. By using a harmonic-mean–based formulation, MiF can emphasize or suppress degree effects. In practice, choosing β values close to zero allows heterophilic or homophilic properties of the network to be highlighted more clearly.
MiF also accounts for the influence of longer paths in random walks. While shorter paths usually dominate similarity, longer paths and detours may still carry meaningful structural information. This effect is controlled by another parameter α, which gradually decreases the contribution of paths as their length increases.
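As a rough illustration of these two mechanisms (the exact MiF formula is given in Akama et al. (2015); the expressions below are schematic reconstructions, not the published definition), a beta-weighted harmonic-mean normalization of the endpoint degrees and a geometric damping of path contributions can be written as:

```latex
% Schematic only -- see Akama et al. (2015) for the actual MiF definition.
% Beta-weighted harmonic-mean degree normalization:
N_\beta(u, v) = \left( \frac{\beta}{\deg(u)} + \frac{1 - \beta}{\deg(v)} \right)^{-1},
\qquad 0 < \beta < 1
% Geometric damping of a length-\ell path's contribution:
w_\alpha(\ell) = \alpha^{\ell}, \qquad 0 < \alpha < 1
```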
MiF Degradation Index (MiFDI)
This package also introduces a derived metric called the MiF Degradation Index (MiFDI).
MiFDI analyzes how similarity degrades as a random walk expands from a selected starting vertex. A random walk is initiated from a specific vertex (for example, a vertex with minimal degree), and MiF values between the starting vertex and each visited vertex are computed.
These values are recorded on a logarithmic scale and averaged at each step of the walk. Depending on the configuration, self-loops can be included or excluded. When self-loops are excluded, propagation stops at a vertex once it has been reached.
MiFDI provides a compact representation of how rapidly relational similarity decays across the network.
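The recording scheme can be sketched as follows (illustrative numbers only; this shows the per-step log-averaging idea, not the package's MiFDI implementation):

```python
import math

# Hypothetical MiF values grouped by the step at which each vertex was first
# reached from the starting vertex (made-up numbers for illustration).
mif_by_step = {1: [0.30, 0.25], 2: [0.05, 0.04, 0.06], 3: [0.004, 0.005]}

# MiFDI records each value on a logarithmic scale and averages per step.
profile = {step: sum(math.log(v) for v in vals) / len(vals)
           for step, vals in mif_by_step.items()}

for step in sorted(profile):
    print(step, round(profile[step], 3))  # the averaged log-MiF decays with each step
```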
Usage
Install
This package (current version: 1.0.0) has been published on PyPI, so you can install MarkovRCnet via:
pip install markovrcnet
(If you are using Google Colab, prefix `pip` with `!`.)
Input
In this package, every public API accepts either a file path or a SafeCSR object (defined in utils.sparse) to specify the adjacency matrix of a graph. Matrix Market sparse matrices (.mtx files), sparse matrices in other formats, and even dense matrices can all be automatically converted into a SafeCSR object by io.load_matrix. Dense matrices are supported for convenience, but sparse representations are strongly recommended for performance and memory efficiency.
Input normalization (path / SafeCSR) and copy control are handled exclusively at API entry points and consistently across MCL and MiF; internal functions operate on SafeCSR objects only.
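The conversion that such a loader performs can be pictured with plain SciPy (a sketch of the idea only; that SafeCSR wraps a CSR matrix is an assumption about its internals):

```python
import numpy as np
import scipy.sparse as sp

# A small dense adjacency matrix (accepted for convenience, but not recommended).
dense = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])

coo = sp.coo_matrix(dense)   # a sparse matrix in another format (COO)
csr = sp.csr_matrix(coo)     # the CSR form a loader would normalize to

print(csr.nnz)  # 4 stored edges instead of 9 dense entries
```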
MCL and MiF
This section demonstrates a minimal yet representative workflow of MarkovRCnet: loading an adjacency matrix, performing Markov Clustering (MCL), and analyzing intra- and inter-cluster reachability using MiF.
We use the classic Zachary’s Karate Club network as a small but non-trivial example. In this example, MiF values from a dangling node to nodes inside its own MCL cluster are significantly larger than those to nodes in the other cluster, illustrating that MiF captures cluster-aware reachability consistent with the MCL partition.
from markovrcnet.datasets import load_all_adjmats
from markovrcnet.mcl import mclprocess
from markovrcnet.mif import MiF
from markovrcnet.io.load_matrix import load_adjacency
import numpy as np
import networkx as nx
mats = load_all_adjmats()
karatec = load_adjacency(mats["karateclub"])
# Analysis of Karate Club
Gobj = nx.from_scipy_sparse_array(karatec)
diameter_karatec = nx.diameter(Gobj)
print(f"The diameter of the Karate Club is: {diameter_karatec}")
# The diameter of the Karate Club is: 5
deglist = dict(nx.degree(Gobj))
min_deg_node = min(deglist, key=deglist.get)
print(f"The vertex number of the smallest degree is: {min_deg_node}")
# The vertex number of the smallest degree is: 11
# MCL
result_karate_mcl = mclprocess(karatec)
print(result_karate_mcl)
# {0: [0, 1, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21], 1: [2, 8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]}
# MiF
# Select as the starting node the vertex 11 belonging to the cluster number 0.
# Remove the starting node itself from the target list
result_karate_mcl[0].remove(min_deg_node)
mif_clus0 = [MiF(karatec, min_deg_node, i, 0.5, diameter_karatec) for i in result_karate_mcl[0]]
mif_clus1 = [MiF(karatec, min_deg_node, i, 0.5, diameter_karatec) for i in result_karate_mcl[1]]
print(
f"The mean MiF value of cluster 0 from the dangling node inside it is large: "
f"{np.mean(mif_clus0)}"
)
# 0.03705702889419532
print(
f"The mean MiF value of cluster 1 from that exterior node is small: "
f"{np.mean(mif_clus1)}"
)
# 0.0044844434956278776
# This illustrates that MiF captures cluster-aware reachability consistent with the MCL partition.
Extending the Workflow
The same workflow applies to RMCL and MiFDI, which extend MCL and MiF by incorporating refinement and directional information.
See the following examples for their basic usage.
RMCL functions
RMCL addresses a well-known limitation of standard MCL: severe cluster size imbalance, which is particularly pronounced in graphs exhibiting scale-free and heterophilic characteristics. It is therefore mainly applicable when the size (number of elements) of the largest cluster produced by MCL exceeds a certain threshold (by default, when the number of elements in a cluster exceeds twice the standard deviation of the cluster sizes).
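One reading of that default rule (an assumption for illustration; the package may compute the threshold differently) is that a cluster is treated as a core-cluster candidate when its size exceeds the mean cluster size by more than two standard deviations:

```python
import statistics

# Illustrative cluster sizes, with one heavily dominant cluster at index 2.
sizes = [6, 4, 47, 2, 4, 16, 3, 2, 3, 1, 1, 3, 1, 1, 1, 5]

mean = statistics.mean(sizes)
sd = statistics.pstdev(sizes)  # population standard deviation of the sizes
oversized = [i for i, s in enumerate(sizes) if s > mean + 2 * sd]
print(oversized)  # only the 47-element cluster exceeds the threshold
```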
branching_rmcl
The branching_rmcl function enables the division of such oversized core clusters into appropriately sized subgraphs. It achieves this by introducing latent adjacency relationships between Markov clusters, thereby reconstructing a more informative network structure.
To suit the demonstration, the following example does not use the Karate Club network but "scalefree.mtx" (stored in the data repository), which was created with the Barabási–Albert (BA) model: starting from a complete graph of 5 nodes, new nodes of degree 5 were repeatedly attached by preferential attachment until the total number of vertices reached 100. The branching_rmcl() function also supports the merging of small non-core clusters through a process called reverse branching.
sr_mcl and mixed_rmcl
Simply-Repeated MCL, implemented as sr_mcl, operates as follows: if a core cluster exists, it extracts the hub with the highest degree from within it and then re-runs MCL iteratively on the core cluster only.
The mixed_rmcl function applies sr_mcl to the core cluster and reverse-branching RMCL to the non-core clusters.
# Scale free graph.
from markovrcnet.datasets import load_all_adjmats
import markovrcnet.mcl as mcl
mats = load_all_adjmats()
sf = mats["scalefree"]
cluslist = mcl.mclprocess(sf)
print(f"The MCL result of 'scalefree' graph is: {cluslist}")
# The MCL result of 'scalefree' graph is: {0: [0, 11, 18, 21, 49, 80], 1: [3, 48, 57, 72], 2: [1, 2, 4, 5, 9, 10, 12, 13, 14, 15, 16, 17, 23, 24, 25, 26, 28, 29, 33, 37, 39, 41, 43, 45, 50, 51, 52, 54, 58, 59, 60, 61, 62, 63, 68, 70, 75, 79, 85, 86, 88, 89, 90, 92, 93, 96, 98], 3: [6, 46], 4: [7, 66, 77, 97], 5: [8, 27, 32, 42, 53, 55, 56, 64, 67, 76, 78, 83, 84, 87, 94, 95], 6: [19, 30, 69], 7: [20, 38], 8: [22, 47, 81], 9: [31], 10: [35], 11: [36, 40, 65], 12: [44], 13: [73], 14: [82], 15: [34, 71, 74, 91, 99]}
result_branching = mcl.branching_rmcl(cluslist, sf)
print(f"The Branching RMCL result of 'scalefree' graph under the default settings is: {result_branching}")
# {0: [1, 2, 5, 9, 10, 12, 13, 14, 15, 16, 17, 23, 24, 25, 28, 29, 33, 37, 39, 41, 45, 50, 51, 52, 54, 58, 59, 60, 61, 62, 63, 68, 70, 85, 88, 90, 92, 93, 98], 1: [26], 2: [43], 3: [75], 4: [79], 5: [86], 6: [89], 7: [96], 8: [4]}
result_sr = mcl.sr_mcl(cluslist, sf)
print(f"The SR MCL result under the default settings is: {result_sr}")
# {0: [1, 2, 4, 5, 9, 10, 12, 13, 14, 15, 17, 24, 25, 26, 28, 29, 33, 37, 39, 41, 50, 52, 54, 60, 61, 62, 63, 68, 75, 79, 85, 86, 88, 92, 98], 1: [16, 89], 2: [23, 93], 3: [43, 58], 4: [45, 70, 90, 96], 5: [51, 59]}
result_mixed = mcl.mixed_rmcl(cluslist, sf, threspruning = 3.0, branching = False)
print(f"The mixed RMCL result under the default settings, but with the latent-adjacency weight threshold changed to 3.0 and without reverse branching applied to non-core clusters, is: {mcl.mcllist_to_mcldict(result_mixed)}")
# {0: [0, 11, 18, 21, 49, 80], 1: [1, 2, 4, 5, 9, 10, 12, 13, 14, 15, 17, 24, 25, 26, 28, 29, 33, 37, 39, 41, 50, 52, 54, 60, 61, 62, 63, 68, 75, 79, 85, 86, 88, 92, 98], 2: [3, 48, 57, 72], 3: [6, 46], 4: [7, 66, 77, 97], 5: [8, 27, 32, 42, 53, 55, 56, 64, 67, 76, 78, 83, 84, 87, 94, 95], 6: [16, 89], 7: [19, 30, 69], 8: [20, 38], 9: [22, 47, 81], 10: [23, 93], 11: [31], 12: [34, 71, 74, 91, 99], 13: [35], 14: [36, 40, 65], 15: [43, 58], 16: [44], 17: [45, 70, 90, 96], 18: [51, 59], 19: [73], 20: [82]}
Other MiF functions
MiFDI as an extension of MiF Broadcast
The MiF_broadcast function calculates the MiF value between the starting vertex and each reached vertex, with beta set to 0.5 by default and a default range of 10 steps. It allows you to choose whether the random walk continues even after reaching the target node (permitting self-loops) or terminates once the target node is reached.
In the MiFDI function, on the other hand, the starting point defaults to the vertex with the minimum degree. By specifying "max", it is also possible to select the vertex with the maximum degree.
The MiF_broadcast function and the MiFDI function are similar in that they calculate the MiF value between the starting point and other vertices. However, they have the following crucial differences.
Considering the feature extraction of homophily/heterophily, the beta value in MiFDI() is set to 0.2 by default.
MiFDI() determines the starting point based on degree, so multiple starting points may exist, requiring separate calculations.
MiFDI() takes the logarithm of the MiF values.
from markovrcnet.datasets import load_all_adjmats
import markovrcnet.mif as mif
mats = load_all_adjmats()
karatec = mats["karateclub"]
result0_bc = mif.MiF_broadcast(karatec, 3)
print(f"The MiF broadcast values from vertex number 3 under the default settings are: {result0_bc}")
# [[3, 0, np.float64(0.11458333333333333)], [3, 1, np.float64(0.1388888888888889)], [3, 2, np.float64(0.13333333333333333)], [3, 4, np.float64(0.012455413410329515)], [3, 5, np.float64(0.011791124695111939)], [3, 6, np.float64(0.011791124695111939)], [3, 7, np.float64(0.20833333333333334)], [3, 8, np.float64(0.014777609130899424)], [3, 9, np.float64(0.011225249122889563)], [3, 10, np.float64(0.012455413410329515)], [3, 11, np.float64(0.016088242321675623)], [3, 12, np.float64(0.3333333333333333)], [3, 13, np.float64(0.18333333333333332)], [3, 14, np.float64(0.0017972981920494257)], [3, 15, np.float64(0.0017972981920494257)], [3, 16, np.float64(0.0037360611450214516)], [3, 17, np.float64(0.023582249390223877)], [3, 18, np.float64(0.0017972981920494257)], [3, 19, np.float64(0.017398037779507894)], [3, 20, np.float64(0.0017972981920494257)], [3, 21, np.float64(0.023582249390223877)], [3, 22, np.float64(0.0017972981920494257)], [3, 23, np.float64(0.001925205592813683)], [3, 24, np.float64(0.00208211901421573)], [3, 25, np.float64(0.0010118945956142003)], [3, 26, np.float64(0.0010565082473545091)], [3, 27, np.float64(0.00960846177368277)], [3, 28, np.float64(0.009939168276929611)],[3, 29, np.float64(0.0013812253658064847)], [3, 30, np.float64(0.008593269717204083)], [3, 31, np.float64(0.0076885267964997)], [3, 32, np.float64(0.00728267341478283)], [3, 33, np.float64(0.0070900045566491075)]]
result1_bc = mif.MiF_broadcast(karatec, 3, loop = 1)
print(f"The MiF broadcast values from vertex number 3 with loop are: {result1_bc}")
# [[3, 0, np.float64(0.10287618822169536)], [3, 1, np.float64(0.11246038170048131)], [3, 2, np.float64(0.105856363630985)], [3, 4, np.float64(0.016273004862745834)], [3, 5, np.float64(0.015451231517468984)], [3, 6, np.float64(0.015451231517468984)], [3, 7, np.float64(0.14523190300250574)], [3, 8, np.float64(0.01773891854249341)], [3, 9, np.float64(0.013058588488864924)], [3, 10, np.float64(0.016273004862745834)], [3, 11, np.float64(0.019577882669195107)], [3, 12, np.float64(0.20171698534104984)], [3, 13, np.float64(0.12757116941668883)], [3, 14, np.float64(0.0017972981920494257)], [3, 15, np.float64(0.0017972981920494257)], [3, 16, np.float64(0.0037360611450214516)], [3, 17, np.float64(0.026574969559341004)], [3, 18, np.float64(0.0017972981920494257)], [3, 19, np.float64(0.020393309131807678)], [3, 20, np.float64(0.0017972981920494257)], [3, 21, np.float64(0.026574969559341004)], [3, 22, np.float64(0.0017972981920494257)], [3, 23, np.float64(0.001925205592813683)], [3, 24, np.float64(0.00208211901421573)], [3, 25, np.float64(0.0010118945956142003)], [3, 26, np.float64(0.0010565082473545091)], [3, 27, np.float64(0.010923938326264579)], [3, 28, np.float64(0.011863322089205354)],[3, 29, np.float64(0.0013812253658064847)], [3, 30, np.float64(0.01144984681786761)], [3, 31, np.float64(0.010558212147211865)], [3, 32, np.float64(0.009407675325989316)], [3, 33, np.float64(0.01047582624117161)]]
result0_di = mif.MiFDI(karatec)
print(f"The MiFDI values under the default settings are: {result0_di[0]}")
# [[11, 0, -0.2076393647782445], [11, 1, -3.884047951519476], [11, 2, -3.8993154236502643], [11, 3, -3.8747743147341467], [11, 4, -3.7978132735980186], [11, 5, -3.809735918554925], [11, 6, -3.809735918554925], [11, 7, -3.865065500607186], [11, 8, -3.8925586411873847], [11, 9, -6.161680931542804], [11, 10, -3.7978132735980186], [11, 11, 0.0], [11, 12, -3.791101839010032],[11, 13, -3.8914645491745254], [11, 14, -7.1807237074616985], [11, 15, -7.1807237074616985], [11, 16, -5.295842452605739], [11, 17, -3.809735918554925], [11, 18, -7.1807237074616985], [11, 19, -3.8671841454674714], [11, 20, -7.1807237074616985], [11, 21, -3.809735918554925], [11, 22, -7.1807237074616985], [11, 23, -6.963401667614471], [11, 24, -6.130539725463252], [11, 25, -6.135916815704655], [11, 26, -7.697503264464141], [11, 27, -6.195771487529799], [11, 28, -5.503087734799266], [11, 29, -7.205294482155748], [11, 30, -5.521978571866003], [11, 31, -3.886696959691053], [11, 32, -5.153378465404749], [11, 33, -4.873054990733721]]
result1_di = mif.MiFDI(karatec, loop = 1)
print(f"The MiFDI values under the default settings but with loop are: {result1_di[0]}")
# [[11, 0, -0.7945585829166409], [11, 1, -3.5044104458896275], [11, 2, -3.6159050359212515], [11, 3, -3.620069261748534], [11, 4, -3.8267064577520142], [11, 5, -3.8272596040975957], [11, 6, -3.8272596040975957], [11, 7, -3.753837476377918], [11, 8, -3.9558046724183398], [11, 9, -5.873394895387084], [11, 10, -3.8267064577520142], [11, 11, 0.0], [11, 12, -3.9059671325338075],[11, 13, -3.759187715860139], [11, 14, -7.1807237074616985], [11, 15, -7.1807237074616985], [11, 16, -5.304002817506114], [11, 17, -3.911939265212226], [11, 18, -7.1807237074616985], [11, 19, -3.945527612247624], [11, 20, -7.1807237074616985], [11, 21, -3.911939265212226], [11, 22, -7.1807237074616985], [11, 23, -6.963401667614471], [11, 24, -6.149006361384828], [11, 25, -6.213809678628969], [11, 26, -7.697503264464141], [11, 27, -5.8663238048688005], [11, 28, -5.403713004246248], [11, 29, -7.205294482155748], [11, 30, -5.291757387799649], [11, 31, -4.060855606431937], [11, 32, -5.077499652817014], [11, 33, -4.82166344179776]]
"""
The significance of this value becomes clear when comparing homophily and heterophily graphs created under nearly identical conditions; the difference in their network diameters alone makes this plain.
The homophily and heterophily networks used here were created as follows.
First, for each node in a random graph with nearly uniform degree, we randomly assigned a degree drawn from a scale-free graph (generated by the BA model) with the same number of nodes as its intrinsic weight. This scale-free graph is the same one stored in the data directory.
We then pruned edges based on the difference in these weights.
For the homophily and heterophily generation functions, instead of pruning edges by thresholding the weight difference between nodes as in standard threshold-graph calculations, we defined two separate, mutually symmetric sigmoid functions, one for homophily and one for heterophily.
The independent variable for these functions is the absolute value of the difference in the logarithm of the intrinsic weights. Pruning is then determined probabilistically.
"""
hetero = mats["heterophilly"]
homo = mats["homophilly"]
resultdi_homo = mif.MiFDI(homo, startingvertices = "max")
resultdi_hetero = mif.MiFDI(hetero, startingvertices = "max", dangn = 1)
print(f"The MiFDI values for the remaining nodes at the last step (gamma = 3) of the homophilly graph, under the default settings but starting from the largest hub (vertex number 51), are: {resultdi_homo[1]}")
#[0.0, -13.816401830633342, -15.313733207392165]
print(f"The MiFDI values for the remaining nodes at the last step (gamma = 1) of the heterophilly graph, under the default settings but starting from vertex number 99, the second of the two largest hubs (96 and 99), are: {resultdi_hetero[1]}")
# [-6.751245193254783, -6.817386901413719, -6.75332563583608, -6.71020117998387, -6.85224069005932, -6.823407968669435, -6.945072303533687, -6.890682281015667, -6.878297506891731, -6.99516162492838, -6.8423601752486025, -6.901265305628976, -6.970274161488641, -7.098002182228425, -7.0738759687407535, -6.865518427788805, -6.725124110528816, -6.64305615135676, -6.504512766970748, -6.547474470365769, -6.308383417145835, -6.387056838658736, -6.385230462487238, -6.357980963821984, -6.41787777487924, -6.373961468730117, 0.0]
By comparing, between the homophily and heterophily graphs, the number of remaining nodes and the characteristics of the log-transformed MiF values from the starting hub once repeated random walks have reached every node in the network, the validity of MiFDI as a homophily/heterophily measure becomes evident.
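The sigmoid-based probabilistic pruning described in the comment block above can be sketched as follows (the steepness k and midpoint x0 are illustrative assumptions, not the actual generator's settings):

```python
import math
import random

def prune_probability(w_u, w_v, mode, k=4.0, x0=1.0):
    # Independent variable: absolute difference of the log intrinsic weights.
    x = abs(math.log(w_u) - math.log(w_v))
    # Decreasing sigmoid: similar weights -> keep probability near 1 (homophily).
    s = 1.0 / (1.0 + math.exp(k * (x - x0)))
    # For heterophily the sigmoid is mirrored: dissimilar weights are favoured.
    return s if mode == "homophily" else 1.0 - s

def keep_edge(w_u, w_v, mode, rng=random):
    # Pruning is decided probabilistically, not by a hard threshold.
    return rng.random() < prune_probability(w_u, w_v, mode)

# Nodes with similar intrinsic weights:
print(round(prune_probability(10.0, 11.0, "homophily"), 3))   # keep probability near 1
print(round(prune_probability(10.0, 11.0, "heterophily"), 3)) # almost certainly pruned
```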
References
Stijn van Dongen. 2000. Graph Clustering by Flow Simulation. PhD thesis, Utrecht University. https://dspace.library.uu.nl/bitstream/handle/1874/848/full.pdf?sequence=1&isAllowed=y
Jaeyoung Jung and Hiroyuki Akama. 2008. Employing Latent Adjacency for Appropriate Clustering of Semantic Networks. New Trends in Psychometrics, pp. 131–140.
Hiroyuki Akama et al. 2008. Random Graph Model Simulations of Semantic Networks for Associative Concept Dictionaries. TextGraphs-3. https://dl.acm.org/doi/10.5555/1627328.1627337
Hiroyuki Akama et al. 2008. How to Take Advantage of the Limitations with Markov Clustering? The Foundations of Branching Markov Clustering (BMCL). IJCNLP-2008, pp. 901–906. https://aclanthology.org/I08-2129.pdf
Hiroyuki Akama et al. 2007. Building a Clustered Semantic Network for an Entire Large Dictionary of Japanese. PACLING-2007, pp. 308–316. https://www.researchgate.net/publication/228950233_Building_a_clustered_semantic_network_for_an_Entire_Large_Dictionary_of_Japanese
Jaeyoung Jung, Maki Miyake, and Hiroyuki Akama. 2006. Recurrent Markov Cluster (RMCL) Algorithm for the Refinement of the Semantic Network. In LREC, pp. 1428–1431. http://www.lrec-conf.org/proceedings/lrec2006/
Hiroyuki Akama, Maki Miyake, Jaeyoung Jung, and Brian Murphy. 2015. Using Graph Components Derived from an Associative Concept Dictionary to Predict fMRI Neural Activation Patterns that Represent the Meaning of Nouns. PLoS ONE. https://doi.org/10.1371/journal.pone.0125725