Algorithms¶
-
gct.
oslom_Infohiermap
(name, graph, seed=None)¶ A wrapper of Hierarchical Infomap collected from OSLOM
- Arguments
seed : int, random seed
- Reference
Martin Rosvall and Carl T. Bergstrom Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE 6(4): e18209 (2011).
-
gct.
oslom_Infomap
(name, graph, seed=None)¶ A wrapper of infomap collected from OSLOM
- Arguments
seed : int, random seed
- Reference
Rosvall, M. and Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA 105, 11181123 (2008).
-
gct.
oslom_OSLOM
(name, graph, seed=None, r=10, hr=50, t=0.1, cp=0.5, fast=False, singlet=False, infomap=False, copra=False, louvain=False, runs=1)¶ A wrapper of OSLOM (Order Statistics Local Optimization Method) collected from OSLOM
Arguments
[-r R]: sets the number of runs for the first hierarchical level, bigger this value, more accurate the output (of course, it takes more). Default value is 10.
[-hr R]: sets the number of runs for higher hierarchical levels. Default value is 50 (the method should be faster since the aggregated network is usually much smaller).
[-seed m]: sets m equal to the seed for the random number generator. (instead of reading from time_seed.dat)
[-hint filename]: takes a partition from filename. The file is expected to have the nodes belonging to the same cluster on the same line.
[-load filename]: takes modules from a tp file you already got in a previous run.
[-t T]: sets the threshold equal to T, default value is 0.1
- [-singlet]: finds singletons. If you use this flag, the program generally finds a number of nodes which are not assigned to any module.
the program will assign each node with at least one not homeless neighbor. This only applies to the lowest hierarchical level.
- [-cp P]: sets a kind of resolution parameter equal to P. This parameter is used to decide if it is better to take some modules or their union.
Default value is 0.5. Bigger value leads to bigger clusters. P must be in the interval (0, 1).
[-fast]: is equivalent to “-r 1 -hr 1” (the fastest possible execution).
[-infomap runs]: calls infomap and uses its output as a starting point. runs is the number of times you want to call infomap.
[-copra runs]: same as above using copra.
[-louvain runs]: same as above using louvain method.
- Reference
A. Lancichinetti, F. Radicchi, J.J. Ramasco and S. Fortunato Finding statistically sig- nificant communities in networks PloS One 6, e18961 (2011)
-
gct.
oslom_copra
(name, graph, v=5, v1=None, v2=None, prop=None, repeat=None, mo=None, nosplit=False, extrasimplify=False, q=False, seed=None)¶ A wrapper of COPRA (Community Overlap PRopagation Algorithm) collected from OSLOM
- Arguments
Usage: java COPRA <file> <options>
-bi
<file> is a bipartite network. “-w” not allowed.
-w
<file> is a weighted unipartite network. “-bi” not allowed.
-v <v>
<v> is maximum number of communities/vertex. Default: 1.
-vs <v1> <v2>
Repeats for -v <v> for all <v> between <v1>-<v2>.
-prop <p>
<p> is maximum number of propagations. Default: no limit.
-repeat <r>
Repeats <r> times, for each <v>, and computes average.
-mo
Compute the overlap modularity of each solution.
-nosplit
Don’t split discontiguous communities.
-extrasimplify
Simplify communities again after splitting.
-q
Don’t show information when starting program.
- Reference
Gregory, Steve. “Finding overlapping communities in networks by label propagation.” New Journal of Physics 12.10 (2010): 103018.
-
gct.
oslom_louvain_method
(name, graph, seed=None)¶ A wrapper of Louvain algorithm collected from OSLOM
- Arguments
seed : int, random seed
- Reference
V.D. Blondel, J.-L. Guillaume, R. Lambiotte and E. Lefebvre Fast unfolding of com- munity hierarchies in large networks. J. Stat. Mech. 2008 (10): P10008
-
gct.
oslom_lpm
(name, graph, seed=None)¶ A wrapper of Label Propagation Method collected from OSLOM
- Arguments
seed : int, random seed
- Reference
Raghavan, U. N., Albert, R. and Kumara, S. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76, 036106 (2007).
-
gct.
oslom_modopt
(name, graph, seed=None, lamb=1, trials=5, temp_step=0.999, initial_temp=1e-06)¶ A wrapper of Modularity Optimization (Simulated Annealing) collected from OSLOM
- Arguments
./modopt [netfile] [random_seed] [lambda] [trials] [temp_step] [initial_temp]
default random_seed = 1
default lambda= 1
default trials= 5
default temp_step= 0.999
default initial_temp= 1e-6
- Reference
Sales-Pardo, M., Guimer, R., Moreira, A. A. and Amaral, L. A. N Extracting the hierarchical organization of complex systems. Proc. Natl. Acad. Sci. USA 104, 1522415229 (2007).
-
gct.
pycabem_GANXiSw
(name, graph, **kwargs)¶ A wrapper of GANXiSw algorithm from https://sites.google.com/site/communitydetectionslpa/.
- Arguments
GANXiSw 3.0.2(used to be SLPAw) is for weighted (directed) networks, version=3.0.2 Usage: java -jar GANXiSw.jar -i networkfile
-i
input network file
-d
output director (default: output)
-L
set to 1 to use only the largest connected component
-t
maximum iteration (default: 100)
-run
number of repetitions
-r a
specific threshold in [0,0.5]
-ov
set to 0 to perform disjoint detection
-W
treat the input as a weighted network, set 0 to ignore the weights(default 1)
-Sym
set to 1 to make the edges symmetric/bi-directional (default 0)
-seed
user specified seed for random generator
-help
to display usage info
—————————–Advanced Parameters———————————
-v
weighted version in {1,2,3}, default=3
-Oov
set to 1 to output overlapping file, default=0
-Onc
set to 1 to output <nodeID communityID> format, 2 to output <communityID nodeID> format
-minC
min community size threshold, default=2
-maxC
max community size threshold
-ev
embedded SLPAw’s weighted version in {1,2,3}, default=1
-loopfactor
determine the num of loops for depomposing each large com, default=1.0
-Ohis1
set to 1 to output histgram Level1
-Ohis2
set to 1 to output histgram Level2
-OMem1
set to 1 to output each node’s memory content at Level 1
-EC
evolution cutoff, a real value > 1.0
- NOTE: 1. more parameters refer to Readme.pdf
parameters are CASE-SENSITIVE, e.g., -Onc is not -onc
- Reference
Xie, B. K. Szymanski and X. Liu, “SLPA: Uncovering Overlapping Communities in Social Networks via A Speaker-listener Interaction Dynamic Process”
-
gct.
pycabem_HiReCS
(name, graph, f=False, m=None, seed=None)¶ A wrapper of hirecs algorithm from http://www.lumais.com/hirecs.
- Arguments
Usage: ./hirecs [-o{t,c,j}] [-f] [-r] [-m<float>] <adjacency_matrix.hig>
-c
clean links, skip links validation
-f
fast quazy-mutual clustering (faster). Default: strictly-mutual (better)
-r
rand reorder (shuffle) nodes and links on nodes construction
-m<float>
modularity profit margin for early exit, float E [-1, 1]. Default: -0.999, but on practice >~= 0
- Reference
Cannot find any publication. Please refer to http://www.lumais.com/hirecs
-
gct.
pycabem_LabelRank
(name, graph, cutoff_r=0.01, inflation_in=2, NBDisimilarity_q=0.3, seed=None)¶ A wrapper of LabelRank algorithm from https://sites.google.com/site/communitydetectionslpa/.
- Arguments
> LabelRank netName cutoff_r inflation_in NBDisimilarity_q
- Reference
Xie, B. K. Szymanski, “LabelRank: A Stabilized Label Propagation Algorithm for Community Detection in Networks”, IEEE NSW, West point, NY, 2013
-
gct.
cdc_CONGA
(name, graph, silent=True, recompute=False, horizon=None, nComm=[5], weight='mean', seed=None)¶ Cluster-Overlap Newman Girvan Algorithm
- Arguments
- Usage: java conga.CONGA <file> [-e] [-g f] [-n nC] [-s] [-cd t] [-f v] [-r]
[-mem] [-m c] [-mo c] [-vad c] [-ov c] [-dia c] [-h h] [-GN] [-peacock s] [-w eW]
- Options:
- -n
Find clustering containing nC clusters. Default: 0.
- -s
Silent operation: don’t display steps in algorithm.
- -r
Recompute clusters even if clustering file exists.
- -h
Use region with horizon h. Default: unlimited.
- -w
Include edge weights in computations. A positive number or ‘min’, ‘mean’,’max’. Default: unweighted.
- Reference
Gregory, Steve. “An algorithm to find overlapping community structure in networks.” European Conference on Principles of Data Mining and Knowledge Discovery. Springer, Berlin, Heidelberg, 2007.
Gregory, Steve. “A fast algorithm to find overlapping communities in networks.” Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, Heidelberg, 2008.
-
gct.
cdc_CliquePercolation
(name, graph, k=None, verbose=False, seed=None)¶ The sequential clique percolation algorithm is method for detecting clique percolation communities. It is an alternative to CFinder: instead of finding maximal cliques in a graph and producing communities of all the possible clique sizes, the SCP algorithm finds all the cliques of single size and produces the community structure for all the possible thresholds. The SCP algorithm should work well even for large sparse networks, but might have trouble for dense networks and cliques of large size.
- Arguments
Usage: ./k_clique [inputfile] [options]
Options:
-k=[clique size] : The size of the clique.
-v : Verbose mode.
- Reference
Kumpula, Jussi M., et al. “Sequential algorithm for fast clique percolation.” Physical Review E 78.2 (2008): 026109.
-
gct.
cdc_Connected_Iterative_Scan
(name, graph, l=None, seed=None)¶ Connected Iterative Scan Connected Iterative Scan is also known at times as Locally Optimal Sets.
- Arguments
./cis -i network -o output -dl delimiter -s seed file -l lambda value
- Reference
Kelley, Stephen. The existence and discovery of overlapping communities in large-scale networks. Diss. Rensselaer Polytechnic Institute, 2009.
-
gct.
cdc_DEMON
(name, graph, epsilon=0.25, min_community_size=3, seed=None)¶ DEMON
- Arguments
epsilon the tolerance required in order to merge communities
min_community_size: min_community_size
- Reference
Michele Coscia, Giulio Rossetti, Fosca Giannotti, Dino Pedreschi: DEMON: a local-first discovery method for overlapping communities. KDD 2012:615-623
-
gct.
cdc_EAGLE
(name, graph, nThread=None, seed=None)¶ EAGLE (agglomerativE hierarchicAl clusterinG based on maximaL cliquE)
- Arguments
Sinopsi: ./2009-eagle nThreads src <dir|undir> [dest]
- Reference
Shen, Huawei, et al. “Detect overlapping and hierarchical community structure in networks.” Physica A: Statistical Mechanics and its Applications 388.8 (2009): 1706-1712.
-
gct.
cdc_FastCpm
(name, graph, k=4, seed=None)¶ Find maximal cliques, via the Bron Kerbosch algorithm, http://en.wikipedia.org/wiki/Bron%E2%80%93Kerbosch_algorithm. Then perform k-clique percolation on these cliques, for a given value of k.
- Arguments
k: k for k-clique
- Reference
Reid, Fergal, Aaron McDaid, and Neil Hurley. “Percolation computation in complex networks.” Proceedings of the 2012 international conference on advances in social networks analysis and mining (asonam 2012). IEEE Computer Society, 2012.
-
gct.
cdc_GCE
(name, graph, minimumCliqueSizeK=4, overlapToDiscardEta=0.6, fitnessExponentAlpha=1.0, CCHthresholdPhi=0.75, seed=None)¶ Greedy Clique Expansion Community Finder Community finder. Requires edge list of nodes. Processes graph in undirected, unweighted form. Edgelist must be two values separated with non digit character.
- Arguments
Use with either full (if specify all 5) or default (specify just graph file) parameters:
Full parameters are:
The name of the file to load
The minimum size of cliques to use as seeds. Recommend 4 as default, unless particularly small communities are required (in which case use 3).
The minimum value for one seed to overlap with another seed before it is considered sufficiently overlapping to be discarded (eta). 1 is complete overlap. However smaller values may be used to prune seeds more aggressively. A value of 0.6 is recommended.
The alpha value to use in the fitness function greedily expanding the seeds. 1.0 is recommended default. Values between .8 and 1.5 may be useful. As the density of edges increases, alpha may need to be increased to stop communities expanding to engulf the whole graph. If this occurs, a warning message advising that a higher value of alpha be used, will be printed.
The proportion of nodes (phi) within a core clique that must have already been covered by other cliques, for the clique to be ‘sufficiently covered’ in the Clique Coveage Heuristic
Usage: ./2011-gce graphfilename minimumCliqueSizeK overlapToDiscardEta fitnessExponentAlpha CCHthresholdPhi
Usage (with defaults): ./2011-gce graphfilename
This will run with the default values of: minimumCliqueSizeK 4, overlapToDiscardEta 0.6, fitnessExponentAlpha 1.0, CCHthresholdPhi .75
- Reference
Lee, Conrad, et al. “Detecting highly overlapping community structure by greedy clique expansion.” arXiv preprint arXiv:1002.1827 (2010).
-
gct.
cdc_HDEMON
(name, graph, epsilon=0.25, min_community_size=5, seed=None)¶ Hierarchical Demon
- Arguments
epsilon the tolerance required in order to merge communities
min_community_size: min_community_size
- Reference
M. Coscia, G. Rossetti, F. Giannotti, D. Pedreschi: Uncovering Hierarchical and Overlapping Communities with a Local-First Approach, TKDD 2015
-
gct.
cdc_LinkCommunities
(name, graph, threshold=[0.01], seed=None)¶ Link communities algorithm
- Arguments
./calcJaccards input.pairs output.jaccs
./clusterJaccards network.pairs network.jaccs network.clusters network.cluster_stats threshold
- Reference
Yong-Yeol Ahn, James P. Bagrow, and Sune Lehmann, Link communities reveal multiscale complexity in networks, Nature 466, 761 (2010)
-
gct.
cdc_MOSES
(name, graph, seed=None)¶ Model-based Overlapping Seed Expansion
Arguments
Reference
Aaron McDaid, Neil Hurley. Detecting highly overlapping communities with Model-based Overlapping Seed Expansion. ASONAM 2010
-
gct.
cdc_MSCD_AFG
(name, graph, scale_param='[1.2,2]', extra_param=None, verbose=0, seed=None)¶ (Arenas et al.’s) Fast Multi-Scale Community Detection Tools
- Arguments
scale_param: scale parameter values (e.g.[1,2], [1:1:10])
extra_param: optional extra parameters for the given algorithm
verbose: verbose level (default=0, max=2)
- Reference
Arenas, Alex, Alberto Fernandez, and Sergio Gomez. “Analysis of the structure of complex networks at different resolution levels.” New Journal of Physics 10.5 (2008): 053039.
Le Martelot, Erwan, and Chris Hankin. “Fast multi-scale detection of relevant communities in large-scale networks.” The Computer Journal 56.9 (2013): 1136-1150.
-
gct.
cdc_MSCD_HSLSW
(name, graph, scale_param='[1.2,2]', extra_param=None, verbose=0, seed=None)¶ (Huang et al.’s) Fast Multi-Scale Community Detection Tools
- Arguments
scale_param: scale parameter values (e.g.[1,2], [1:1:10])
extra_param: optional extra parameters for the given algorithm
verbose: verbose level (default=0, max=2)
- The optional extra parameters must be given in order as a list with ‘,’ between values and no space. The parameters are:
Merging threshold beyond which communities are merged. Value must be in [0,1]. (default 0.5)
- Reference
Huang, Jianbin, et al. “Towards online multiresolution community detection in large-scale networks.” PloS one 6.8 (2011): e23829.
Le Martelot, Erwan, and Chris Hankin. “Fast multi-scale detection of relevant communities in large-scale networks.” The Computer Journal 56.9 (2013): 1136-1150.
-
gct.
cdc_MSCD_LFK
(name, graph, scale_param='[1.2,2]', extra_param=None, verbose=0, seed=None)¶ (Lancichinetti et al.’s) Fast Multi-Scale Community Detection Tools
- Arguments
scale_param: scale parameter values (e.g.[1,2], [1:1:10])
extra_param: optional extra parameters for the given algorithm
verbose: verbose level (default=0, max=2)
- The optional extra parameters must be given in order as a list with ‘,’ between values and no space. The parameters are:
Merging threshold beyond which communities are merged. Value must be in [0,1]. (default 0.5)
When growing a community, maximum number of neighbours from the sorted list of candidate nodes that can fail to join before the failures makes the growth stop. (default: infinite / value 0)
When growing a community, maximum number of iterations through all nodes during which nodes can be removed. (default: infinite / value 0)
Ex: To merge communities with 30% of overlapping nodes
mscd -g network.edges -w -c coms -p 0.3 -a LFK [1:-0.01:0]
- Reference
Lancichinetti, Andrea, Santo Fortunato, and János Kertész. “Detecting the overlapping and hierarchical community structure in complex networks.” New Journal of Physics 11.3 (2009): 033015.
Le Martelot, Erwan, and Chris Hankin. “Fast multi-scale detection of relevant communities in large-scale networks.” The Computer Journal 56.9 (2013): 1136-1150.
-
gct.
cdc_MSCD_LFK2
(name, graph, scale_param='[1.2,2]', extra_param=None, verbose=0, seed=None)¶ ( Lancichinetti et al.’s multi-threaded ) Fast Multi-Scale Community Detection Tools
Arguments
scale_param: scale parameter values (e.g.[1,2], [1:1:10]) extra_param: optional extra parameters for the given algorithm verbose: verbose level (default=0, max=2)
The optional extra parameters must be given in order as a list with ‘,’ between values and no space. The parameters are:
Merging threshold beyond which communities are merged. Value must be in [0,1]. (default 0.5)
Maximum number of additional threads the program can use (default: number of cores available / value 0)
Maximum number of seeds to start growing communities. (default: infinite / value 0)
Recursive level of seed neighbours not to use as seeds must be 0, 1 or 2. (default 1) (i.e. with 1, all the neighbours of a seed cannot be added as a seed)
- Reference
Lancichinetti, Andrea, Santo Fortunato, and János Kertész. “Detecting the overlapping and hierarchical community structure in complex networks.” New Journal of Physics 11.3 (2009): 033015.
Le Martelot, Erwan, and Chris Hankin. “Fast multi-scale detection of relevant communities in large-scale networks.” The Computer Journal 56.9 (2013): 1136-1150.
-
gct.
cdc_MSCD_RB
(name, graph, scale_param='[1.2,2]', extra_param=None, verbose=0, seed=None)¶ (Reichardt and Bornholdt’s) Fast Multi-Scale Community Detection Tools
- Arguments
scale_param: scale parameter values (e.g.[1,2], [1:1:10])
extra_param: optional extra parameters for the given algorithm
verbose: verbose level (default=0, max=2)
- Reference
Reichardt, Jörg, and Stefan Bornholdt. “Statistical mechanics of community detection.” Physical Review E 74.1 (2006): 016110.
Le Martelot, Erwan, and Chris Hankin. “Fast multi-scale detection of relevant communities in large-scale networks.” The Computer Journal 56.9 (2013): 1136-1150.
-
gct.
cdc_MSCD_RN
(name, graph, scale_param='[1.2,2]', extra_param=None, verbose=0, seed=None)¶ (Ronhovde and Nussinov’s) Fast Multi-Scale Community Detection Tools
- Arguments
scale_param: scale parameter values (e.g.[1,2], [1:1:10])
extra_param: optional extra parameters for the given algorithm
verbose: verbose level (default=0, max=2)
- Reference
Ronhovde, Peter, and Zohar Nussinov. “Local resolution-limit-free Potts model for community detection.” Physical Review E 81.4 (2010): 046114.
Le Martelot, Erwan, and Chris Hankin. “Fast multi-scale detection of relevant communities in large-scale networks.” The Computer Journal 56.9 (2013): 1136-1150.
-
gct.
cdc_MSCD_SO
(name, graph, scale_param='[1.2,2]', extra_param=None, verbose=0, seed=None)¶ stability optimisation (Fast Multi-Scale Community Detection Tools)
- Arguments
scale_param: scale parameter values (e.g.[1,2], [1:1:10])
extra_param: optional extra parameters for the given algorithm
verbose: verbose level (default=0, max=2)
The optional extra parameters must be given in order as a list with ‘,’ between values and no space. The parameters are:
edge threshold (for walks of length>1 computed edges below this value are discarded) (default: 0)
memory saving mode (for walks of length>1 the number of networks kept in memory) (default: 0, i.e. off)
Ex: mscd -g network.edges -w -c coms -p 0.01,1 -a SO [0:0.1:5]
- Reference
Le Martelot, Erwan, and Chris Hankin. “Multi-scale community detection using stability optimisation within greedy algorithms.” arXiv preprint arXiv:1201.3307 (2012).
Le Martelot, Erwan, and Chris Hankin. “Fast multi-scale detection of relevant communities in large-scale networks.” The Computer Journal 56.9 (2013): 1136-1150.
-
gct.
cdc_MSCD_SOM
(name, graph, scale_param='[1.2,2]', extra_param=None, verbose=0, seed=None)¶ tability optimisation using matrices (Fast Multi-Scale Community Detection Tools)
- Arguments
scale_param: scale parameter values (e.g.[1,2], [1:1:10])
extra_param: optional extra parameters for the given algorithm
verbose: verbose level (default=0, max=2)
The optional extra parameters must be given in order as a list with ‘,’ between values and no space. The parameters are:
memory saving mode (for walks of length>1 the number of networks kept in memory) (default: 0, i.e. off)
Ex: mscd -g network.edges -w -c coms -p 1 -a SOM [0:0.1:5]
- Reference
Le Martelot, Erwan, and Chris Hankin. “Multi-scale community detection using stability optimisation within greedy algorithms.” arXiv preprint arXiv:1201.3307 (2012).
Le Martelot, Erwan, and Chris Hankin. “Fast multi-scale detection of relevant communities in large-scale networks.” The Computer Journal 56.9 (2013): 1136-1150.
-
gct.
cdc_ParCPM
(name, graph, n_thread=None, W=30, poc=False, seed=None)¶ Model-based Overlapping Seed Expansion
- Arguments
-P <num_threads>
Specifies the number of parallel threads to be executed. Default value is 8.
-W <exp>
Set the sliding window buffer size to 2**<exp> Bytes. Default value is 30 for a sliding window buffer size of 2**30 B = 1 GB.
-p
Specifies to use the proof-of-concept method COSpoc rather than COS. With this option, parameter -W is ignored.
- Reference
Gregori, Enrico, Luciano Lenzini, and Simone Mainardi. “Parallel k-clique community detection on large-scale networks.” IEEE Transactions on Parallel and Distributed Systems 24.8 (2013): 1651-1660.
-
gct.
cdc_SVINET
(name, graph, num_cluster=None, inference='link-sampling', stratified=False, rfreq=None, max_iterations=None, no_stop=False, seed=None)¶ SVINET implements sampling based algorithms that derive from stochastic variational inference under the (assortative) mixed-membership stochastic blockmodel.
Arguments
SVINET: fast stochastic variational inference of undirected networks
- svinet [OPTIONS]
-file <name>
input tab-separated file with a list of undirected links
-n <N>
number of nodes in network
-k <K>
number of communities
-batch
run batch variational inference
-stratified
use stratified sampling, use with rpair or rnode options
-rnode
inference using random node sampling
-rpair
inference using random pair sampling
-link-sampling
inference using link sampling
-infset
inference using informative set sampling
-rfreq
set the frequency at which (convergence is estimated; statistics, e.g., heldout likelihood are computed)
-max-iterations
maximum number of iterations (use with -no-stop to avoid stopping in an earlier iteration)
-no-stop
disable stopping criteria
-seed
set GSL random generator seed
- Reference
Prem K. Gopalan, David M. Blei. Efficient discovery of overlapping communities in massive networks. To appear in the Proceedings of the National Academy of Sciences, 2013
-
gct.
cdc_TopGC
(name, graph, seed=None, **kwargs)¶ Top Graph Clusters (TopGC)
Arguments
Usage: java -jar TopGC.jar -i inputGraph [-p mostProm] [-max maxClusterSize] [-min minClusterSize] [-lambda overlapThreshold] [-l wordLength] [-m m] [-w numWords] [-trials trials]
Options:
- -max
maxClusterSize The maximum size allowed for a cluster. Default is 20.
- -min
minClusterSize The minimum size allowed for a cluster. Default is 5.
- -p
mostProm The pruning parameter. Limits the number of nodes to consider for final clustering. Lower values will achieve greater pruning, as well as limit memory usage of the program. Ranges from 1 to graph size. Default is 0.3*(# of nodes in graph) or 50,000 (whichever is smaller).
- -lambda
overlapThreshold The maximum overlap allowed between the nodes of two clusters. Calculated as the ratio of the size of their intersection and the smallest cluster size. A double value ranging from 0 to 1. Default is 0.2.
- -l
wordLength Length of node signature. Experimentally, higher values tend to achieve greater precision, though lower recall.
- -m
m Number of minhash values to obtain per node.
- -w
w Number of signatures to create per node. Higher values may be necessary in graphs with loose clusters.
- -trials
trials The number of neighborhood instances to create per node. Used only in weighted graphs.
- Reference
Macropol, Kathy, and Ambuj Singh. “Scalable discovery of best clusters on large graphs.” Proceedings of the VLDB Endowment 3.1-2 (2010): 693-702.
-
gct.
cdc_clique_modularity
(name, graph, method='BK', nComm=10, seed=None)¶ Detecting Communities in Networks by Merging Cliques
- Arguments
java -cp CM.jar clique_modularity.CM <networkFile> -m <method> -c <nComm> where method is BK or KJ
- Reference
Yan, Bowen, and Steve Gregory. “Detecting communities in networks by merging cliques.” Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on. Vol. 1. IEEE, 2009.
-
gct.
cgcc_CGGC
(name, graph, startk=None, finalk=None, runs=None, ensemblesize=None, algorithm=None, seed=None)¶ A wrapper of CGGC (Core Groups Graph ensemble Clustering) method* from https://github.com/eXascaleInfolab/CGGC
Performs clustering of the unweighed undirected network (graph) using RG, CGGC_RG or CGGCi_RG algorithms.
Arguments
-h [ –help ]
Display this message
-i [ –inpfmt ] arg (=)
input network format (inferred from the file extension if not specified explicitly): e - nse format: header in comments + each line specifies a single edge: <sid> <did>, a - nse format: header in comments + each line specifies a single arc: <sid> <did>, m - Metis format of the unweighted network, p - Pajek format of the undirected unweighted network
-s [ –startk ] arg (=2)
sample size of RG
-f [ –finalk ] arg (=2000)
sample size for final RG step
-r [ –runs ] arg (=1)
number of runs from which to pick the best result
-e [ –ensemblesize ] arg (=-1)
size of ensemble for ensemble algorithms (-1 = ln(#vertices))
-a [ –algorithm ] arg (=3)
algorithm: 1: RG, 2: CGGC_RG, 3: CGGCi_RG
-o [ –outfmt ] arg (=l)
output clusters format: l - cnl format: header in comments + each line corresponds to the cluster and contains ids of the member vertices, c - each line corresponds to the cluster and contains ids of the member vertices, v - each line corresponds to the vertex and contains id of the owner cluster
-f [ –outfile ] arg
file to store the detected communities
-d [ –seed ] arg
seed value to initialize random number generator
- Reference
Ovelgönne, Michael, and Andreas Geyer-Schulz. “An ensemble learning strategy for graph clustering.” Graph Partitioning and Graph Clustering 588 (2012): 187.
-
gct.
dct_dlplm
(name, graph, seed=None)¶ A wrapper of dlplm (Distributed Graph Clustering using Thrill) algorithm collected from https://github.com/kit-algo/distributed_clustering_thrill
- Arguments
None
- Reference
Hamann, Michael, et al. “Distributed Graph Clustering Using Modularity and Map Equation.” European Conference on Parallel Processing. Springer, Cham, 2018.
-
gct.
dct_dlslm
(name, graph, seed=None)¶ A wrapper of dlslm (Distributed Graph Clustering using Thrill) algorithm collected from https://github.com/kit-algo/distributed_clustering_thrill
- Arguments
None
- Reference
Hamann, Michael, et al. “Distributed Graph Clustering Using Modularity and Map Equation.” European Conference on Parallel Processing. Springer, Cham, 2018.
-
gct.
dct_dlslm_map_eq
(name, graph, seed=None)¶ A wrapper of dlslm_map_eq (Distributed Graph Clustering using Thrill) algorithm collected from https://github.com/kit-algo/distributed_clustering_thrill
- Arguments
None
- Reference
Hamann, Michael, et al. “Distributed Graph Clustering Using Modularity and Map Equation.” European Conference on Parallel Processing. Springer, Cham, 2018.
-
gct.
dct_dlslm_no_contraction
(name, graph, seed=None)¶ A wrapper of dlslm_no_contraction (Distributed Graph Clustering using Thrill) algorithm collected from https://github.com/kit-algo/distributed_clustering_thrill
Arguments None
Reference Hamann, Michael, et al. “Distributed Graph Clustering Using Modularity and Map Equation.” European Conference on Parallel Processing. Springer, Cham, 2018.
-
gct.
dct_dlslm_with_seq
(name, graph, seed=None)¶ A wrapper of dlslm_with_seq (Distributed Graph Clustering using Thrill) algorithm collected from https://github.com/kit-algo/distributed_clustering_thrill
- Arguments
None
- Reference
Hamann, Michael, et al. “Distributed Graph Clustering Using Modularity and Map Equation.” European Conference on Parallel Processing. Springer, Cham, 2018.
-
gct.
dct_infomap
(name, graph, seed=None)¶ A wrapper of infomap algorithm collected from https://github.com/kit-algo/distributed_clustering_thrill
Arguments seed : int, random seed
- Reference
Rosvall, M., Axelsson, D., Bergstrom, C.T.: The map equation. The European Physical Journal Special Topics 178(1), 13–23 (2009)
-
gct.
dct_seq_louvain
(name, graph, seed=None)¶ A wrapper of Louvain algorithm collected from https://github.com/kit-algo/distributed_clustering_thrill
- Arguments
seed : int, random seed
- Reference
Blondel, V., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008(10) (2008)
-
gct.
igraph_community_edge_betweenness
(name, graph, seed=None)¶ A wrapper of community_edge_betweenness algorithm from iGraph
- Arguments
None
- Reference
M Girvan and MEJ Newman: Community structure in social and biological networks, Proc. Nat. Acad. Sci. USA 99, 7821-7826 (2002)
-
gct.
igraph_community_fastgreedy
(name, graph, seed=None)¶ A wrapper of community_fastgreedy algorithm from iGraph
- Arguments
None
- Reference
A Clauset, MEJ Newman and C Moore: Finding community structure in very large networks. Phys Rev E 70, 066111 (2004).
-
gct.
igraph_community_infomap
(name, graph, seed=None)¶ A wrapper of community_infomap algorithm from iGraph
- Arguments
None
Reference M. Rosvall and C. T. Bergstrom: Maps of information flow reveal community structure in complex networks, PNAS 105, 1118 (2008). http://dx.doi.org/10.1073/pnas.0706851105, http://arxiv.org/abs/0707.0609
Rosvall, D. Axelsson, and C. T. Bergstrom: The map equation, Eur. Phys. J. Special Topics 178, 13 (2009). http://dx.doi.org/10.1140/epjst/e2010-01179-1, http://arxiv.org/abs/0906.1405
-
gct.
igraph_community_label_propagation
(name, graph, seed=None)¶ A wrapper of community_label_propagation algorithm from iGraph
- Arguments
None
- Reference
Raghavan, U.N. and Albert, R. and Kumara, S. Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76:036106, 2007. http://arxiv.org/abs/0709.2938.
-
gct.
igraph_community_leading_eigenvector
(name, graph, seed=None)¶ A wrapper of community_leading_eigenvector algorithm from iGraph
- Arguments
None
- Reference
MEJ Newman: Finding community structure in networks using the eigenvectors of matrices, arXiv:physics/0605087
-
gct.
igraph_community_multilevel
(name, graph, seed=None)¶ A wrapper of community_multilevel algorithm from iGraph
- Arguments
None
- Reference
VD Blondel, J-L Guillaume, R Lambiotte and E Lefebvre: Fast unfolding of community hierarchies in large networks, J Stat Mech P10008 (2008), http://arxiv.org/abs/0803.0476
-
gct.
igraph_community_optimal_modularity
(name, graph, seed=None)¶ A wrapper of community_optimal_modularity algorithm from iGraph
- Arguments
None
- Reference
TBD
-
gct.
igraph_community_spinglass
(name, graph, seed=None)¶ A wrapper of community_spinglass algorithm from iGraph
- Arguments
None
- Reference
Reichardt J and Bornholdt S: Statistical mechanics of community detection. Phys Rev E 74:016110 (2006). http://arxiv.org/abs/cond-mat/0603718.
Traag VA and Bruggeman J: Community detection in networks with positive and negative links. Phys Rev E 80:036115 (2009). http://arxiv.org/abs/0811.2329.
-
gct.
igraph_community_walktrap
(name, graph, seed=None)¶ A wrapper of community_walktrap algorithm from iGraph
- Arguments
None
- Reference
Pascal Pons, Matthieu Latapy: Computing communities in large networks using random walks, http://arxiv.org/abs/physics/0512106
-
gct.
karateclub_BigClam
(name, graph, dimensions=8, iterations=50, learning_rate=0.005, seed=None)¶ An implementation of “BigClam” from the WSDM ‘13 paper “Overlapping Community Detection at Scale: A Non-negative Matrix Factorization Approach”. The procedure uses gradient ascent to create an embedding which is used for deciding the node-cluster affiliations.
- Parameters:
dimensions (int) – Number of embedding dimensions. Default 8. iterations (int) – Number of training iterations. Default 50. learning_rate (float) – Gradient ascent learning rate. Default is 0.005.
-
gct.
karateclub_DANMF
(name, graph, layers=[32, 8], pre_iterations=100, iterations=100, seed=42, lamb=0.01)¶ An implementation of “DANMF” from the CIKM ‘18 paper “Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection”. The procedure uses telescopic non-negative matrix factorization in order to learn a cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.
- Parameters:
layers (list) – Autoencoder layer sizes in a list of integers. Default [32, 8]. pre_iterations (int) – Number of pre-training epochs. Default 100. iterations (int) – Number of training epochs. Default 100. seed (int) – Random seed for weight initializations. Default 42. lamb (float) – Regularization parameter. Default 0.01.
-
gct.
karateclub_EdMot
(name, graph, component_count=2, cutoff=50, seed=None)¶ An implementation of “Edge Motif Clustering” from the KDD ‘19 paper “EdMot: An Edge Enhancement Approach for Motif-aware Community Detection”. The tool first creates the graph of higher order motifs. This graph is clustered by the Louvain method. The resulting cluster memberships are stored as a dictionary.
- Parameters:
component_count (int) – Number of extracted motif hypergraph components. Default is 2. cutoff (int) – Motif edge cut-off value. Default is 50.
-
gct.
karateclub_EgoNetSplitter
(name, graph, resolution=1.0, seed=None)¶ An implementation of “Ego-Splitting” from the KDD ‘17 paper “Ego-Splitting Framework: from Non-Overlapping to Overlapping Clusters”. The tool first creates the ego-nets of nodes. A persona-graph is created which is clustered by the Louvain method. The resulting overlapping cluster memberships are stored as a dictionary.
- Parameters:
resolution (float) – Resolution parameter of Python Louvain. Default 1.0.
-
gct.
karateclub_NNSED
(name, graph, dimensions=32, iterations=10, seed=42)¶ An implementation of “NNSED” from the CIKM ‘17 paper “A Non-negative Symmetric Encoder-Decoder Approach for Community Detection”. The procedure uses non-negative matrix factorization in order to learn an unnormalized cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.
- Parameters:
layers (int) – Embedding layer size. Default is 32. iterations (int) – Number of training epochs. Default 10. seed (int) – Random seed for weight initializations. Default 42.
-
gct.
karateclub_SCD
(name, graph, iterations=25, eps=1e-06, seed=None)¶ An implementation of “SCD” from the WWW ‘14 paper “High Quality, Scalable and Parallel Community Detection for Large Real Graphs”. The procedure greedily optimizes the approximate weighted community clustering metric. First, clusters are built around highly clustered nodes. Second, we refine the initial partition by using the approximate WCC. These refinements happen for the whole vertex set.
- Parameters:
iterations (int) – Refinemeent iterations. Default is 25. eps (float) – Epsilon score for zero division correction. Default is 10**-6.
-
gct.
karateclub_SymmNMF
(name, graph, dimensions=32, iterations=200, rho=100.0, seed=None)¶ An implementation of “Symm-NMF” from the SDM‘12 paper “Symmetric Nonnegative Matrix Factorization for Graph Clustering”. The procedure decomposed the second power od the normalized adjacency matrix with an ADMM based non-negative matrix factorization based technique. This results in a node embedding and each node is associated with an embedding factor in the created latent space.
- Parameters:
dimensions (int) – Number of dimensions. Default is 32. iterations (int) – Number of power iterations. Default is 200. rho (float) – ADMM tuning parameter. Default is 100.0.
-
gct.
mcl_MCL
(name, graph, **kwargs)¶ A wrapper of MCL (Markov Cluster Algorithm) from https://micans.org/mcl/
- Arguments
Since there are a lot options for mcl. refer to https://micans.org/mcl/man/mcl.html for all of them. However only specify algrithm options, don’t specify file/folder/format related option.
- Reference
Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000
-
gct.
networkit_CutClustering
(name, graph, alpha=0.1, seed=None)¶ A wrapper of CutClustering algorithm from NetworKit.
- Arguments
None
- Reference
Tarjan, Robert E.; Tsioutsiouliklis, Kostas. Graph Clustering and Minimum Cut Trees. Internet Mathematics 1 (2003), no. 4, 385–408.
-
gct.
networkit_LPDegreeOrdered
(name, graph, seed=None)¶ A wrapper of LPDegreeOrdered algorithm from NetworKit. Label propagation-based community detection algorithm which processes nodes in increasing order of node degree.
- Arguments
None
- Reference
TBD
-
gct.
networkit_PLM
(name, graph, refine=False, gamma=1.0, par='balanced', maxIter=32, turbo=True, recurse=True, seed=None)¶ A wrapper of PLM (Parallel Louvain Method) algorithm from NetworKit. Parallel Louvain Method - the Louvain method, optionally extended to a full multi-level algorithm with refinement
- Arguments
refine (bool, optional) – Add a second move phase to refine the communities.
gamma (double) – Multi-resolution modularity parameter: 1.0 -> standard modularity 0.0 -> one community 2m -> singleton communities
par (string) – parallelization strategy
maxIter (count) – maximum number of iterations for move phase
turbo (bool, optional) – faster but uses O(n) additional memory per thread
recurse (bool, optional) – use recursive coarsening, see http://journals.aps.org/pre/abstract/10.1103/PhysRevE.89.049902 for some explanations (default: true)
- Reference
TBD
-
gct.
networkit_PLP
(name, graph, updateThreshold=None, maxIterations=None, seed=None)¶ A wrapper of PLP (Parallel Label Propagation) algorithm from NetworKit. Parallel label propagation for community detection: Moderate solution quality, very short time to solution.
As described in Ovelgoenne et al: An Ensemble Learning Strategy for Graph Clustering Raghavan et al. proposed a label propagation algorithm for graph clustering. This algorithm initializes every vertex of a graph with a unique label. Then, in iterative sweeps over the set of vertices the vertex labels are updated. A vertex gets the label that the maximum number of its neighbors have. The procedure is stopped when every vertex has the label that at least half of its neighbors have.
- Arguments
None
- Reference
Ovelgönne, Michael, and Andreas Geyer-Schulz. “An ensemble learning strategy for graph clustering.” Graph Partitioning and Graph Clustering 588 (2012): 187.
-
gct.
alg_GossipMap
(name, graph, thresh=None, tol=None, maxiter=None, maxspiter=None, trials=None, interval=None, outmode=None, ncpus=None, scheduler=None, engine_opts=None, graph_opts=None, scheduler_opts=None, seed=None)¶ A wrapper of GossipMap algorithm from https://github.com/uwescience/GossipMap.
- Arguments
GossipMap Algorithm:
–help
Print this help message.
–graph arg
The graph file. Required.
–format arg (=snap)
The graph file format.
–thresh arg (=0.001)
The threshold for convergence condition.
–tol arg (=1.00e-15)
The threshold for pagerank (ergodic state) convergence condition.
–maxiter arg (=10)
The maximum of the iteration for finding community.
–maxspiter arg (=3)
The maximum of the iteration of sp-graph for finding community.
–trials arg (=1)
The number of trials for finding community repeatedly.
–interval arg (=3)
The time interval for checking whether the received message is valid or not.
–mode arg (=1)
The running mode of finding community: 1 - coreOnce, 2 - coreRepeat.
–outmode arg (=2)
The running outerloop mode of finding community: 1 - outerOnce, 2 - outerRepeat.
–prefix arg
If set, this app will save the community detection result.
–ncpus arg (=6)
Number of cpus to use per machine. Defaults to (#cores - 2)
–scheduler arg
Supported schedulers are: fifo, sweep, priority, queued_fifo. Too see options for each scheduler, run the program with the option —schedhelp=[scheduler_name]
–engine_opts arg
string of engine options i.e., “timeout=100”
–graph_opts arg
String of graph options i.e., “ingress=random”
–scheduler_opts arg
String of scheduler options i.e., “strict=true”
–engine_help arg
Display help for engine options.
–graph_help arg
Display help for the distributed graph.
–scheduler_help arg
Display help for schedulers.
- Reference
Bae, Seung-Hee, and Bill Howe. “GossipMap: A distributed community detection algorithm for billion-edge directed graphs.” High Performance Computing, Networking, Storage and Analysis, 2015 SC-International Conference for. IEEE, 2015.
-
gct.
alg_RelaxMap
(name, graph, seed=None, thread=None, threshold=0.001, vThresh=0.0, maxIter=10, prior=True)¶ A wrapper of RelaxMap algorithm from https://github.com/uwescience/RelaxMap.
Arguments
args[0]:
seed - random seed value for generating random sequential order of vertices for each iteration.
args[1]:
network data - the input graph data. RelaxMap supports 1) pajek format (.net) and 2) edge list format (.txt).
args[2]:
# thread - the number of threads
args[3]:
# attempts - the number of attempts. (this is not applied yet, so it only return with 1 attempt.)
args[4]:
threshold - the stop condition threshold (recommended 1e-3 or 1e-4)
args[5]:
vThresh - the threshold value for each vertex movement (recommended 0.0)
args[6]:
maxIter - the number of maximum iteration for each super-step.
args[7]:
outDir - the directory where the output files will be located.
args[8]:
prior/normal flag - apply the prioritized search for efficient runs (prior) or not (normal).
- Reference
Seung-Hee Bae, Daniel Halperin, Jevin West, Martin Rosvall, and Bill Howe, “Scalable Flow-Based Community Detection for Large-Scale Network Analysis,” In Proceedings of IEEE 13th International Conference on Data Mining Workshop (ICDMW), 2013
-
gct.
alg_pg_label_propagation
(name, graph, execution='async', ncpus=None, scheduler=None, engine_opts=None, graph_opts=None, scheduler_opts=None, seed=None)¶ A wrapper of LPA algorithm from PowerGraph.
- Arguments
Label Propagation algorithm:
–help
Print this help message.
–graph arg
The graph file. Required
–execution arg (=synchronous)
Execution type (synchronous or asynchronous)
–saveprefix arg
If set, will save the resultant pagerank to a sequence of files with prefix saveprefix
–ncpus arg (=6)
Number of cpus to use per machine. Defaults to (#cores - 2)
–scheduler arg
Supported schedulers are: fifo, sweep, priority, queued_fifo. Too see options for each scheduler, run the program with the option —schedhelp=[scheduler_name]
–engine_opts arg
string of engine options i.e., “timeout=100”
–graph_opts arg
String of graph options i.e., “ingress=random”
–scheduler_opts arg
String of scheduler options i.e., “strict=true”
–engine_help arg
Display help for engine options.
–graph_help arg
Display help for the distributed graph.
–scheduler_help arg
Display help for schedulers.
- Reference
Gonzalez, Joseph E., et al. “Powergraph: distributed graph-parallel computation on natural graphs.” OSDI. Vol. 12. No. 1. 2012.
-
gct.
scan_AnyScan_Scan
(name, graph, **kwargs)¶ A wrapper of scan@anyscan algorithm
- Arguments
- Usage: ./anyscan [switches] -i filename -m minpts -e epsilon -o output -t threads
- -m minpts
: input parameter of DBSCAN, min points to form a cluster, e.g. 2
- -e epsilon
: input parameter of DBSCAN, radius or threshold on neighbourhoods retrieved, e.g. 0.8
- -a alpha
: block size alpha
- -b beta
: block size beta
- -t threads
: number of threads to be employed
- Reference
Mai S T, Dieu M S, Assent I, et al. Scalable and Interactive Graph Clustering Algorithm on Multicore CPUs[C]//Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE, 2017: 349-360
-
gct.
scan_AnyScan_anyScan
(name, graph, **kwargs)¶ A wrapper of anyscan algorithm
- Arguments
- Usage: ./anyscan [switches] -i filename -m minpts -e epsilon -o output -t threads
- -m minpts
: input parameter of DBSCAN, min points to form a cluster, e.g. 2
- -e epsilon
: input parameter of DBSCAN, radius or threshold on neighbourhoods retrieved, e.g. 0.8
- -a alpha
: block size alpha
- -b beta
: block size beta
- -t threads
: number of threads to be employed
- Reference
Mai S T, Dieu M S, Assent I, et al. Scalable and Interactive Graph Clustering Algorithm on Multicore CPUs[C]//Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE, 2017: 349-360
-
gct.
scan_AnyScan_anyScanParl
(name, graph, **kwargs)¶ A wrapper of anyscan parallel algorithm
- Arguments
- Usage: ./anyscan [switches] -i filename -m minpts -e epsilon -o output -t threads
- -m minpts
: input parameter of DBSCAN, min points to form a cluster, e.g. 2
- -e epsilon
: input parameter of DBSCAN, radius or threshold on neighbourhoods retrieved, e.g. 0.8
- -a alpha
: block size alpha
- -b beta
: block size beta
- -t threads
: number of threads to be employed
- Reference
Mai S T, Dieu M S, Assent I, et al. Scalable and Interactive Graph Clustering Algorithm on Multicore CPUs[C]//Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE, 2017: 349-360
-
gct.
scan_AnyScan_pScan
(name, graph, **kwargs)¶ A wrapper of pScan@anyscan algorithm
- Arguments
- Usage: ./anyscan [switches] -i filename -m minpts -e epsilon -o output -t threads
- -m minpts
: input parameter of DBSCAN, min points to form a cluster, e.g. 2
- -e epsilon
: input parameter of DBSCAN, radius or threshold on neighbourhoods retrieved, e.g. 0.8
- -a alpha
: block size alpha
- -b beta
: block size beta
- -t threads
: number of threads to be employed
- Reference
Mai S T, Dieu M S, Assent I, et al. Scalable and Interactive Graph Clustering Algorithm on Multicore CPUs[C]//Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE, 2017: 349-360
-
gct.
scan_Scanpp
(name, graph, mu=1, epsilon=0, seed=None)¶ A wrapper of scan++ algorithm
Arguments
option
arguments (Default)
description
-e
Real number between 0 and 1 (Default: 0)
Set the epsilon parameter for the clustering.
-m
Natural number (Default: 1)
Set the mu parameter for the clustering.
-v
No arguments are required.
Display statistics of the clustering.
-r
No arguments are required.
Display the clustering results.
- Reference
Shiokawa H, Fujiwara Y, Onizuka M. SCAN++: efficient algorithm for finding clusters, hubs and outliers on large-scale graphs[J]. Proceedings of the VLDB Endowment, 2015, 8(11): 1178-1189.
-
gct.
scan_pScan
(name, graph, **kwargs)¶ A wrapper of pScan algorithm
Arguments
epsilon:
similarity-threshold
mu:
density-threshold
- Reference
Chang L, Li W, Qin L, et al. $mathsf {pSCAN} $: Fast and Exact Structural Graph Clustering[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(2): 387-401.
Yulin Che, Shixuan Sun, Qiong Luo. 2018. Parallelizing Pruning-based Graph Structural Clustering. In ICPP 2018: 47th International Conference on Parallel Processing, August 13–16, 2018, Eugene, OR, USA. ACM, New York, NY, USA
-
gct.
scan_ppScan
(name, graph, **kwargs)¶ A wrapper of ppScan algorithm
Arguments
epsilon:
similarity-threshold
mu:
density-threshold
- Reference
Chang L, Li W, Qin L, et al. $mathsf {pSCAN} $: Fast and Exact Structural Graph Clustering[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(2): 387-401.
Yulin Che, Shixuan Sun, Qiong Luo. 2018. Parallelizing Pruning-based Graph Structural Clustering. In ICPP 2018: 47th International Conference on Parallel Processing, August 13–16, 2018, Eugene, OR, USA. ACM, New York, NY, USA
-
gct.
scan_ppScanSSE
(name, graph, **kwargs)¶ A wrapper of ppScanSSE algorithm
Arguments
epsilon:
similarity-threshold
mu:
density-threshold
- Reference
Chang L, Li W, Qin L, et al. $mathsf {pSCAN} $: Fast and Exact Structural Graph Clustering[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(2): 387-401.
Yulin Che, Shixuan Sun, Qiong Luo. 2018. Parallelizing Pruning-based Graph Structural Clustering. In ICPP 2018: 47th International Conference on Parallel Processing, August 13–16, 2018, Eugene, OR, USA. ACM, New York, NY, USA
-
gct.
sklearn_AffinityPropagation
(name, graph, damping=None, max_iter=None, convergence=None, verbose=None, seed=None)¶ A wrapper of AffinityPropagation algorithm from http://scikit-learn.org.
Parameters
- dampingfloat, optional, default: 0.5
Damping factor (between 0.5 and 1) is the extent to which the current value is maintained relative to incoming values (weighted 1 - damping). This in order to avoid numerical oscillations when updating these values (messages).
- max_iterint, optional, default: 200
Maximum number of iterations.
- convergence_iterint, optional, default: 15
Number of iterations with no change in the number of estimated clusters that stops the convergence.
- verboseboolean, optional, default: False
Whether to be verbose.
- Reference
Brendan J. Frey and Delbert Dueck, “Clustering by Passing Messages Between Data Points”, Science Feb. 2007
-
gct.
sklearn_SpectralClustering
(name, graph, **kwargs)¶ A wrapper of SpectralClustering algorithm from http://scikit-learn.org.
- Parameters
- eigen_solver{None, ‘arpack’, ‘lobpcg’, or ‘amg’}
The eigenvalue decomposition strategy to use. AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities
- random_stateint, RandomState instance or None (default)
A pseudo random number generator used for the initialization of the lobpcg eigen vectors decomposition when eigen_solver == ‘amg’ and by the K-Means initialization. Use an int to make the randomness deterministic.
- n_initint, optional, default: 10
Number of time the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia.
- gammafloat, default=1.0
Kernel coefficient for rbf, poly, sigmoid, laplacian and chi2 kernels. Ignored for
affinity='nearest_neighbors'
.- n_neighborsinteger
Number of neighbors to use when constructing the affinity matrix using the nearest neighbors method. Ignored for
affinity='rbf'
.- eigen_tolfloat, optional, default: 0.0
Stopping criterion for eigendecomposition of the Laplacian matrix when using arpack eigen_solver.
- assign_labels{‘kmeans’, ‘discretize’}, default: ‘kmeans’
The strategy to use to assign labels in the embedding space. There are two ways to assign labels after the laplacian embedding. k-means can be applied and is a popular choice. But it can also be sensitive to initialization. Discretization is another approach which is less sensitive to random initialization.
- degreefloat, default=3
Degree of the polynomial kernel. Ignored by other kernels.
- coef0float, default=1
Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels.
- kernel_paramsdictionary of string to any, optional
Parameters (keyword arguments) and values for kernel passed as callable object. Ignored by other kernels.
- n_jobsint or None, optional (default=None)
The number of parallel jobs to run.
None
means 1 unless in ajoblib.parallel_backend
context.-1
means using all processors.
- Reference
Normalized cuts and image segmentation, 2000 Jianbo Shi, Jitendra Malik http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.2324
A Tutorial on Spectral Clustering, 2007 Ulrike von Luxburg http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.9323
Multiclass spectral clustering, 2003 Stella X. Yu, Jianbo Shi http://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf
-
gct.
snap_Clauset_Newman_Moore
(name, graph, seed=None)¶ A wrapper of CommunityCNM algorithm from SNAP
- Arguments
None
- Reference
Clauset, Aaron, Mark EJ Newman, and Cristopher Moore. “Finding community structure in very large networks.” Physical review E 70.6 (2004): 066111.
-
gct.
snap_Girvan_Newman
(name, graph, seed=None)¶ A wrapper of CommunityGirvanNewman algorithm from SNAP
- Arguments
None
- Reference
Girvan, Michelle, and Mark EJ Newman. “Community structure in social and biological networks.” Proceedings of the national academy of sciences 99.12 (2002): 7821-7826.
-
gct.
alg_Paris
(name, graph, vmax_start=None, vmax_end=None, c=None, niter=None, seed=None)¶ A wrapper of paris algorithm from https://github.com/tbonald/paris
- Arguments
None
- Reference
Bonald, Thomas, et al. “Hierarchical Graph Clustering using Node Pair Sampling.” arXiv preprint arXiv:1806.01664 (2018).
-
gct.
alg_lso_cluster
(name, graph, **kwargs)¶ A wrapper of lso-cluster algorithm from https://github.com/twanvl/graph-cluster
Please specify long-format options. Don’t specify output.
- Arguments
Optional parameters regarding the clustering:
‘eval’,
clustering Evaluate the objective function on clustering, do not optimize.
‘init’,
clustering Use the given clustering as initial value for the optimization. Default: each node is initially in a separate cluster, i.e. clustering=1:length(A).
Optional parameters regarding the objective:
‘loss’,
loss Use the given loss/objective function. The loss is given a string name. See below for a list of supported loss functions. Default: loss = ‘modularity’
‘total_volume’,
m Replace the total volume (sum of edges) by m. Many objectives use the total volume for normalization, and changing it will change the scale at which clusters are found. Usually increasing the total volume will result in larger clusters. Default: m = sum(sum(A))
‘extra_loss’,
alpha Add a term to the loss function that penalizes the volume of clusters, with weight alpha.
‘num_clusters’,
n Force the solution to have the given number of clusters. The algorithm uses a binary search to alter the objective until it finds a solution with the given number of clusters. The alteration is the same as the one used by extra_loss.
‘min_num_cluster’,
n Force the solution to have at least the given number of clusters.
‘max_num_cluster’,
n Force the solution to have at most the given number of clusters.
‘max_cluster_size’,
n Allow clusters to have at most n nodes.
Optional parameters about internal algorithm details, you only need these if you know what you are doing:
‘seed’,
random_seed Use a given random seed. By default a fixed seed is used, so repeated runs with the same input give the same output.
‘num_repeats’,
n Repeat the search n times from scratch with different random seeds and return the best result. Default: 1
‘num_partitions’,
n Number of times to try and break apart the clusters and re-cluster them Default: 0
‘optimize_exhaustive’,
bool Use an exhaustive search instead of local search. This is of course very slow. Can be combined with max_num_cluster. Default: false.
‘optimize_higher_level’,
bool Use a hierarchical optimizer, where small clusters are considered as nodes of a higher level graph. Default: true.
‘always_consider_empty’,
bool Always consider the move of a node into a new singleton cluster. Default: true.
‘num_loss_tweaks’,
n Maximum number of iterations in the binary search to force the specified number of clusters. Default: 32
‘check_invariants’,
bool Check invariants of the algorithm (for debugging). Default: false.
‘trace_file’,
filename Write a trace of the steps performed by the optimization algorithm to a file in JSON format.
‘verbose’,
level Level of debug output. Levels go up to 7 and are increasingly chatty. Default: level = 0, i.e. no output.
Some of the supported loss functions are:
‘modularity’,
loss = -sum_c (w_c/m - v_c^2/m^2) This is the negation of the usual definition
‘infomap’:
The infomap objective by [3].
‘ncut’:
Normalized cut, loss = sum_c (v_c - w_c) / n_c
‘rcut’:
Ratio cut, loss = sum_c (v_c - w_c) / v_c
{‘pmod’,p}:
Modularity with a different power, loss = -sum_c (w_c/m - (v_c/m)^p / (p-1))
{‘mom’,m}:
Monotonic variant of modularity, loss = -sum_c (w_c/(m + 2v_c) - (v_c/(m + 2v_c))^2)
‘w-log-v’,
loss = sum_c (w_c/m * log(v_c) )
num loss = |C|:
Minimize the number of clusters, this leads to finding the connected components.
- Reference
Graph clustering: does the optimization procedure matter more than the objective function?; Twan van Laarhoven and Elena Marchiori; Physical Review E 87, 012812 (2013)
-
gct.
alg_streamcom
(name, graph, vmax_start=None, vmax_end=None, c=None, niter=None, seed=None)¶ A wrapper of streamcom algorithm from https://github.com/ahollocou/graph-streaming
- Arguments
Usage: streamcom <flags> Availaible flags:
-f [graph file name] :
Specifies the graph file.
–skip [number of lines] :
Specifies the number of lines to skip in the graph file.
-o [output file name] :
Specifies the output file for communities.
–vmax-start [maximum volume range start] :
Specifies the maximum volume for the aggregation phase, beginning of the range.
–vmax-end [maximum volume range end] :
Specifies the maximum volume for the aggregation phase, end of the range.
-c [condition] :
Specifies the aggregation condition for the aggregation phase: 0 is AND and 1 is OR.
–seed [random seed]:
Specifies the random seed if the edges need to be shuffle.
–niter [number of iteration]:
Specifies the number of iteration of the algorithm.
- Reference
Hollocou, Alexandre, et al. “A Streaming Algorithm for Graph Clustering.” arXiv preprint arXiv:1712.04337 (2017)