Metrics¶

class gct.GraphMetrics(data)¶

metrics for a graph. e.g. density, diameter etc.

The class is here just for convenience. It is not powerful and efficient. One may analyze the graph use other graph libraries (e.g. SNAP, igraph, networkit, etc) using converting functions.

__init__(data)¶

Parameters: data – a gct.Dataset object

property directed¶: Return True if the graph is directed

property weighted¶: Return True if the graph is weighted

property num_edges¶: Return number of edges

property num_vertices¶: Return number of vertices (nodes)

property density1¶: Return density of \(\frac{|E|}{|V|}\)

property density¶: Return density of \(\frac{|E|}{|all\ possible\ edges|}\).

property degrees¶: return weighted node degrees

property unweighted_degrees¶: return unweighted node degrees (e.g. number of out edges)

class gct.SNAPGraphMetrics(data)¶

metrics for a graph using SNAP

__init__(data)¶

Parameters: data – a gct.Dataset object

property degree_histogram¶: return list[(degree,num_node)]

property degree_in_histogram¶: return list[(degree,num_node)] of in degree

property degree_out_histogram¶: return list[(degree,num_node)] of out degree

property node_degrees¶: return list[(node,degree)]

property num_self_edges¶: return number of self edges

property scc_distribution¶: return list[(scc_size, count)] for strongly connected components

property wcc_distribution¶: return list[(cc_size, count)] for weakly connected components

property edge_bridges¶

return list[edge] where edge is a bridge.

An edge is a bridge if, when removed, increases the number of connected components.

effect_diameter(n_node=100, isDir=False)¶

Returns the (approximation of the) Effective Diameter (90-th percentile of the distribution of shortest path lengths) of a graph

Parameters

n_node – number of nodes to sample
isDir – consider direct or not

diameter(n_node=100, isDir=False)¶

Computes the diameter, or ‘longest shortest path’, of a Graph

This diameter is approximate, as it is calculated with an n_node number of random starting nodes.

Parameters

n_node – number of nodes to sample
isDir – consider direct or not

sample_shortest_path(n_node=100, isDir=False)¶

sample diameter, e.g. ‘shortest path’, of a Graph

Parameters

n_node – number of nodes to sample
isDir – consider direct or not

sample_degree_centrality(n_node=100)¶

Degree centrality of a node is defined as its degree/(N-1), where N is the number of nodes in the network.

Parameters: n_node – number of nodes to sample

sample_betweenness_centrality(quality=1, isDir=False)¶

Computes (approximate) Node and Edge Betweenness Centrality based on a sample

Parameters

quality – Quality of the approximation. 1.0 gives exact betweenness values.
isDir – consider direct or not

sample_closeness_centrality(n_node=100, normalized=True, isDir=False)¶

Returns closeness centrality sample in Graph. Closeness centrality is equal to 1/farness centrality.

Parameters

n_node – number of nodes to sample
normalized – Output should be normalized (True) or not (False).
isDir – consider direct or not

sample_farness_centrality(n_node=100, normalized=True, isDir=False)¶

Returns farness centrality sample in Graph. Farness centrality of a node is the average shortest path length to all other nodes that reside in the same connected component as the given node.

Parameters

n_node – number of nodes to sample
normalized – Output should be normalized (True) or not (False).
isDir – consider direct or not

page_rank_score(C=0.85, Eps=0.0001, MaxIter=100)¶

Computes the PageRank score of every node in Graph

Parameters

C – Damping factor.
Eps – Convergence difference.
MaxIter – Maximum number of iterations.

hubs_and_authorities_score(MaxIter=20)¶

Computes the Hubs and Authorities score of every node in Graph

return tuple of hubs score and authorrities score :param MaxIter: Maximum number of iterations.

sample_node_eccentricity(n_node=100, normalized=True, isDir=False)¶

Returns node eccentricity, the largest shortest-path distance from the node to any other node in the Graph.

return tuple of hubs score and authorrities score :param MaxIter: Maximum number of iterations.

eigenvector_centrality(Eps=0.0001, MaxIter=100)¶

Computes eigenvector centrality of all nodes in Graph. Eigenvector Centrality of a node N is defined recursively as the average of centrality values of N’s neighbors in the network.

Parameters

Eps – Convergence difference.
MaxIter – Maximum number of iterations.

average_clustering_coefficient(sample_node=False)¶

Computes the average clustering coefficient as defined in Watts and Strogatz, Collective dynamics of ‘small-world’ networks

Parameters: sample_node – compute clustering coefficient only for a random sample of SampleNodes nodes. Useful for approximate but quick computations.

distribution_clustering_coefficient(sample_node=False)¶

Computes the distribution of clustering coefficient as defined in Watts and Strogatz, Collective dynamics of ‘small-world’ networks

Parameters: sample_node – compute clustering coefficient only for a random sample of SampleNodes nodes. Useful for approximate but quick computations.

k_core(K)¶: Returns the K-core of the graph Graph. If the core of order K does not exist, the function returns an empty graph.

class gct.GraphClusterMetrics(data, clusteringobj)¶

metrics for a clustering of a graph. e.g. modularity.

__init__(data, clusteringobj)¶

Parameters

data – a gct.Dataset object
clusteringobj – a gct.Clustering object or refer to the groundtruth parameter of gct.from_edgelist()

property num_edges¶: number of edges of the graph

property num_vertices¶: number of vertices (nodes) of the graph

property node_degrees¶: degrees for all the nodes

property node_weights¶: weight sum for each node

property unweighted_degrees¶: unweighted degrees of nodes

property weighted_degrees¶: weighted degrees of nodes

property num_clusters¶: number of partitions

property cluster_indexes¶: cluster identifiers

property cluster_sizes¶: return cluster size for each cluster

property cluster_expansions¶

return expansions for each cluster where expansions is

\[\frac{c_s}{n_s}\]

where \(c_s\) is the cluster size of cluster \(s\) and \(n_s\) is the number of vertices for the cluster.

property cluster_cut_ratios¶

return cut ratio for each cluster which is

\[\frac{c_s}{n_s(n-n_s)}\]

where \(c_s\) is the cluster size of cluster \(s\), n is the number of node of graph and \(n_s\) is the number of vertices for the cluster.

property intra_cluster_densities¶

return internal cluster density for each cluster which is

\[\frac{m_s}{n_s(n-n_s)/2}\]

where \(m_s\) is number of edges of cluster \(s\), n is the number of node of graph and \(n_s\) is the number of vertices for the cluster.

property inter_cluster_density¶

return internal cluster density for the clustring which is

\[\frac{|\{ (u,v) | u \in C_i, v \in C_j, i \neq j \}|}{n(n-1)/2 + \sum_{s=1}^{k} n_s(n-n_s)/2}\]

where \(m_s\) is number of edges of cluster \(s\), n is the number of node of graph, \(n_s\) is the number of vertices for the cluster, \(C_i\) is the set of nodes in i-th cluster

property relative_cluster_densities¶

return relative cluster density for each cluster which is

\[\frac{deg_{intra}}{deg_{intra}+deg_{out}}\]

where the degree is weighted degree.

property modularity¶

return modularity of the clustering for weighted or unweighted graph. For unweighted graph it is

\[Q=1/(2m) \sum_{i,j} (A_{ij}- \frac{k_i k_j}{2m}) 1_{(i=j)}\]

where m is the number of edges, A is the adjacency matrix, 1 is indicator function, \(k_i k_j\) is the expected number of random edges between the two nodes.

property conductance¶

return conductance of a cluster S, for unweighted graph which is

\[\frac{Cut_S}{\min (deg_S, deg_{(V \setminus S}))}\]

where S is a cluster

property normalized_cut¶: normalized cut

property cluster_max_out_degree_fraction¶: max out degree fraction

property cluster_avg_out_degree_fraction¶: average out degree fraction

property cluster_flake_out_degree_fraction¶: Flake out degree fraction

property separability¶: separability \(m_s/c_s\)

property cluster_clustering_coefficient¶: (global) clustering coefficient

property cluster_local_clustering_coefficient¶: local clustering coefficient

class gct.ClusterComparator(clusteringobj1, clusteringobj2)¶

metrics for two clustering. e.g. nmi, overlap nmi etc. In case that some metrics requires ground truth, make ground truth as the first parameter.

When calculating a metric that does not support overlapping, the overlapped nodes are removed or retrun None

property ground_truth¶: Alias for for the first Clustering of constructor

property predition¶: Alias for for the second Clustering of constructor

sklean_nmi()¶: sklearn normalized_mutual_info_score

sklean_ami()¶: sklearn adjusted_mutual_info_score

sklean_ars()¶: sklearn adjusted_rand_score

sklean_completeness()¶: sklearn completeness_score

GenConvNMI(sync=None, id_remap=None, nmis=None, fnmi=True, risk=None, error=None, fast=None, membership=None, retain_dups=None)¶

A wrapper for https://github.com/eXascaleInfolab/GenConvNMI

Arguments

Usage: ./gecmi [options] <clusters1> <clusters2> clusters - clusters file in the CNL format, where each line lists space separated ids of the cluster members

-h [ –help ]

produce help message

–input arg

name of the input files

-s [ –sync ]

synchronize the node base omitting the non-matching nodes for the fair evaluation. The node base is selected automatically as a clustering having the least number of nodes.

-i [ –id-remap ]

remap ids allowing arbitrary input ids (non-contiguous ranges), otherwise ids should form a solid range and start from 0 or 1

-n [ –nmis ]

output both NMI [max] and NMI_sqrt

-f [ –fnmi ]

evaluate also FNMI, includes ‘-n’

-r [ –risk ] arg (=0.01)

probability of value being outside

-e [ –error ] arg (=0.01)

admissible error

-a [ –fast ]

apply fast approximate evaluations that are less accurate, but much faster on large networks

-m [–membership] arg (=1)

average expected membership of nodes in the clusters, > 0, typically >= 1

-d [ –retain-dups ]

retain duplicated clusters if any instead of filtering them out (not recommended)

Reference

overlap nmi	Esquivel, Alcides Viamontes, and Martin Rosvall. “Comparing network covers using mutual information.” arXiv preprint arXiv:1202.0425 (2012).
fair nmi	Amelio, Alessia, and Clara Pizzuti. “Is normalized mutual information a fair measure for comparing community detection methods?.” Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015. ACM, 2015.

OvpNMI(sync=None, allnmi=None, omega=None, membership=None, verbose=None)¶

A wrapper for https://github.com/eXascaleInfolab/OvpNMI

Compare sets of clusters by their members (nodes) using various measures (NMI, Omega) and considering overlaps

Arguments

Usage: onmi [OPTIONS] clsfile1 clsfile2

-a, –allnmis

output all NMIs (sqrt and sum-denominators, LFK besides the max-denominator) (default=off)

-m, –membership=FLOAT

average expected membership of nodes in the clusters, > 0, typically >= 1 (default=`1’)

-o, –omega

print the Omega measure (can be slow) (default=off)

-v, –verbose

detailed debugging (default=off)

Reference

overlap nmi

McDaid, Aaron F., Derek Greene, and Neil Hurley. “Normalized mutual information to evaluate overlapping community finding algorithms.” arXiv preprint arXiv:1110.2515 (2011).

overlap nmi

Lancichinetti, Andrea, Santo Fortunato, and János Kertész. “Detecting the overlapping and hierarchical community structure in complex networks.” New Journal of Physics 11.3 (2009): 033015.

Omega

Collins, Linda M., and Clyde W. Dent. “Omega: A general formulation of the rand index of cluster recovery suitable for non-disjoint solutions.” Multivariate Behavioral Research 23.2 (1988): 231-242.

xmeasure_nmi(sync=None, all=False, membership=None, detailed=None)¶

A wrapper for https://github.com/eXascaleInfolab/xmeasures.

Normalized Mutual Information, normalized by either max or also sqrt, avg and min information content denominators.

ATTENTION: This is a standard NMI, which should be used ONLY for the HARD partitioning evaluation (non-overlapping clustering on a single resolution). It penalizes overlapping and multi-resolution structures.

Arguments

Usage: onmi [OPTIONS] clsfile1 clsfile2

-m, –membership=FLOAT

average expected membership of the nodes in the
clusters, > 0, typically >= 1. Used only to facilitate estimation of the nodes number on the containers preallocation if this number is not specified in the file header. (default=`1’)

-n, –nmi

evaluate NMI (Normalized Mutual Information),
applicable only to the non-overlapping clusters (default=off)

-a, –all

evaluate all NMIs using sqrt, avg and min
denominators besides the max one (default=off)

-d, –detailed

detailed (verbose) results output
(default=off)

xmeasure(sync=None, ovp=None, unique=None, omega=None, extended=None, f1=None, kind=False, membership=None, detailed=None)¶

Evaluating measures are:

OI: Omega Index (a fuzzy version of the Adjusted Rand Index, identical to the Fuzzy Rand Index), which yields the same value as Adjusted Rand Index when applied to the non-overlapping clusterings.
[M]F1: various [mean] F1 measures of the Greatest (Max) Match including the Average F1-Score (suggested by J. Leskovec) with optional weighting. NOTE: There are 3 matching policies available for each kind of F1. The most representative evaluation is performed by the F1p with combined matching policy (considers both micro and macro weighting).

Arguments

-O, –ovp

evaluate overlapping instead of the
multi-resolution clusters, where max matching for any shared member between R overlapping clusters is 1/R (the member is shared) instead of 1 (the member fully belongs to each [hierarchical sub]group) for the member belonging to R distinct clusters on R resolutions. NOTE: It has no effect for the Omega Index evaluation. (default=off)

-q, –unique

ensure on loading that all cluster members are
unique by removing all duplicates. (default=off)

-m, –membership=FLOAT

average expected membership of the nodes in the
clusters, > 0, typically >= 1. Used only to facilitate estimation of the nodes number on the containers preallocation if this number is not specified in the file header. (default=`1’)

-d, –detailed

detailed (verbose) results output
(default=off)

Omega Index options :

-o, –omega

evaluate Omega Index (a fuzzy version of the
Adjusted Rand Index, identical to the Fuzzy Rand Index and on the non-overlapping clusterings equals to ARI). (default=off)

-x, –extended

evaluate extended (Soft) Omega Index, which
does not excessively penalize distinctly shared nodes. (default=off)

Mean F1 options:

-f, –f1[=ENUM]

evaluate mean F1 of the [weighted] average of
the greatest (maximal) match by F1 or partial probability. NOTE: F1p <= F1h <= F1a, where:

p (F1p or Ph): Harmonic mean (F1) of two [weighted] averages of the Partial Probabilities, the most indicative as satisfies the largest number of the Formal Constraints (homogeneity, completeness and size/quantity except the rag bag in some cases);

h (F1h): Harmonic mean (F1) of two [weighted] averages of all local F1 (harmonic means of the Precision and Recall of the best matches of the clusters);

a (F1a): Arithmetic mean (average) of two [weighted] averages of all local F1, the least discriminative and satisfies the lowest number of the Formal Constraints.

(possible values=”partprob”,”harmonic”, “average” default=`partprob’)

-k, –kind[=ENUM]

kind of the matching policy:

w - Weighted by the number of nodes in each cluster

u - Unweighed, where each cluster is treated equally

c - Combined(w, u) using geometric mean (drops the value not so much as harmonic mean)

(possible values=”weighted”, “unweighed”, “combined” default=`weighted’)

Reference

F1a (Average F1-Score)

Yang, Jaewon, and Jure Leskovec. “Overlapping community detection at scale: a nonnegative matrix factorization approach.” Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013.

Omega

Collins, Linda M., and Clyde W. Dent. “Omega: A general formulation of the rand index of cluster recovery suitable for non-disjoint solutions.” Multivariate Behavioral Research 23.2 (1988): 231-242.

Metrics¶

Graph Clustering Toolkit

Navigation

Related Topics

-h [ –help ]	produce help message
–input arg	name of the input files
-s [ –sync ]	synchronize the node base omitting the non-matching nodes for the fair evaluation. The node base is selected automatically as a clustering having the least number of nodes.
-i [ –id-remap ]	remap ids allowing arbitrary input ids (non-contiguous ranges), otherwise ids should form a solid range and start from 0 or 1
-n [ –nmis ]	output both NMI [max] and NMI_sqrt
-f [ –fnmi ]	evaluate also FNMI, includes ‘-n’
-r [ –risk ] arg (=0.01)	probability of value being outside
-e [ –error ] arg (=0.01)	admissible error
-a [ –fast ]	apply fast approximate evaluations that are less accurate, but much faster on large networks
-m [–membership] arg (=1)	average expected membership of nodes in the clusters, > 0, typically >= 1
-d [ –retain-dups ]	retain duplicated clusters if any instead of filtering them out (not recommended)

-a, –allnmis	output all NMIs (sqrt and sum-denominators, LFK besides the max-denominator) (default=off)
-m, –membership=FLOAT	average expected membership of nodes in the clusters, > 0, typically >= 1 (default=`1’)
-o, –omega	print the Omega measure (can be slow) (default=off)
-v, –verbose	detailed debugging (default=off)

overlap nmi	McDaid, Aaron F., Derek Greene, and Neil Hurley. “Normalized mutual information to evaluate overlapping community finding algorithms.” arXiv preprint arXiv:1110.2515 (2011).
overlap nmi	Lancichinetti, Andrea, Santo Fortunato, and János Kertész. “Detecting the overlapping and hierarchical community structure in complex networks.” New Journal of Physics 11.3 (2009): 033015.
Omega	Collins, Linda M., and Clyde W. Dent. “Omega: A general formulation of the rand index of cluster recovery suitable for non-disjoint solutions.” Multivariate Behavioral Research 23.2 (1988): 231-242.

-m, –membership=FLOAT	average expected membership of the nodes in the clusters, > 0, typically >= 1. Used only to facilitate estimation of the nodes number on the containers preallocation if this number is not specified in the file header. (default=`1’)
-n, –nmi	evaluate NMI (Normalized Mutual Information), applicable only to the non-overlapping clusters (default=off)
-a, –all	evaluate all NMIs using sqrt, avg and min denominators besides the max one (default=off)
-d, –detailed	detailed (verbose) results output (default=off)

-O, –ovp	evaluate overlapping instead of the multi-resolution clusters, where max matching for any shared member between R overlapping clusters is 1/R (the member is shared) instead of 1 (the member fully belongs to each [hierarchical sub]group) for the member belonging to R distinct clusters on R resolutions. NOTE: It has no effect for the Omega Index evaluation. (default=off)
-q, –unique	ensure on loading that all cluster members are unique by removing all duplicates. (default=off)
-m, –membership=FLOAT	average expected membership of the nodes in the clusters, > 0, typically >= 1. Used only to facilitate estimation of the nodes number on the containers preallocation if this number is not specified in the file header. (default=`1’)
-d, –detailed	detailed (verbose) results output (default=off)

-o, –omega	evaluate Omega Index (a fuzzy version of the Adjusted Rand Index, identical to the Fuzzy Rand Index and on the non-overlapping clusterings equals to ARI). (default=off)
-x, –extended	evaluate extended (Soft) Omega Index, which does not excessively penalize distinctly shared nodes. (default=off)

F1a (Average F1-Score)	Yang, Jaewon, and Jure Leskovec. “Overlapping community detection at scale: a nonnegative matrix factorization approach.” Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013.
Omega	Collins, Linda M., and Clyde W. Dent. “Omega: A general formulation of the rand index of cluster recovery suitable for non-disjoint solutions.” Multivariate Behavioral Research 23.2 (1988): 231-242.