geometric2dr.decomposition

geometric2dr.decomposition.anonymous_walk_patterns

This module contains algorithms useful for inducing anonymous walks in graphs.

The algorithms are adapted from the original source code by Ivanov and Burnaev 2018 “Anonymous Walk Embeddings” [1]. Original reference implementation can be found in: https://github.com/nd7141/AWE

geometric2dr.decomposition.anonymous_walk_patterns.all_paths(paths, steps, keep_last=False)[source]

Get all possible anonymous walks of length up to steps.

geometric2dr.decomposition.anonymous_walk_patterns.all_paths_edges(paths, steps, keep_last=True)[source]

Get all possible anonymous walks of length up to steps, using edge labels

geometric2dr.decomposition.anonymous_walk_patterns.all_paths_edges_nodes(paths, steps, keep_last=True)[source]

Get all possible anonymous walks of length up to steps, using edge-node labels

geometric2dr.decomposition.anonymous_walk_patterns.all_paths_nodes(paths, steps, keep_last=True)[source]

Get all possible anonymous walks of length up to steps, using node labels

geometric2dr.decomposition.anonymous_walk_patterns.anonymous_walk_node(graph, rw_graph, node, steps, label_setting=None)[source]

Creates anonymous walk from a node.

geometric2dr.decomposition.anonymous_walk_patterns.anonymous_walks(graph, neighborhood_size, walk_ids, awe_length, label_setting)[source]

Generates anonymous walks for a given graph and input hyperparameters

geometric2dr.decomposition.anonymous_walk_patterns.awe_corpus(corpus_dir, awe_length, label_setting, neighborhood_size=10, saving_graph_docs=True)[source]

Induces anonymous walks up to a supplied length across a set of graphs.

This function extracts anonymous walks of awe_length across the graphs in a dataset into a vocabulary. The function saves a graphdoc for each graph which records string identifiers of the anonymous walks present in each graph

The main use case for this function is for the user to supply a path to the directory containing the .gexf file which contains .gexf files of each graph in a dataset. The decomposition function will produce a file with the extension .awe_<`awe_length`>_<`label_setting`> for each .gexf file which contains the anonymous walks patterns specified by a string hash controlled by the vocabulary of patterns across the list of graphs being studied.

Parameters:
  • corpus_dir (str) – path to folder containing gexf graph files on which the substructure patterns should be induced
  • awe_length (int) – desired length of anonymous walk patterns
  • label_setting (str) – information of node/edge labels that should be used; this can be None, ‘nodes’, ‘edges’, ‘edges_nodes’
  • neighborhood_size (int (default=10)) – the number of anonymous walks to take from a source node
  • saving_graph_docs (bool (default=True)) – boolean value which dictates whether graph documents will be generated and saved.
Returns:

None – Induces the anonymous walks of awe_length across the list of gexf file paths supplied in corpus_dir. Graph “documents” with the extension .awe_<`awe_length`>_<`label_setting`> are created for each of the gexf files in the same directory containing string identifiers of the patterns induced in each graph.

Return type:

None

geometric2dr.decomposition.anonymous_walk_patterns.create_random_walk_graph(nx_graph)[source]

Generate a random walk graph equivalent of a graph as described in anonymous walk embeddings

geometric2dr.decomposition.anonymous_walk_patterns.load_graph(file_handle)[source]

Loads a nx graph object and a corresponding numpy adjacency matrix given a GEXF file graph.

Parameters:file_handle (str) – path to gexf file
Returns:
  • graph (networkx graph) – Networkx graph instance of input gexf file
  • adj_matrix (numpy ndarray) – The adjacency matrix of the graph
geometric2dr.decomposition.anonymous_walk_patterns.random_step_node(rw_graph, node)[source]

Moves one step from the current according to probabilities of outgoing edges. Return next node.

geometric2dr.decomposition.anonymous_walk_patterns.random_walk_node(rw_graph, node, steps)[source]

Creates anonymous walk from a node for arbitrary steps. Returns a tuple with consequent nodes.

geometric2dr.decomposition.anonymous_walk_patterns.random_walk_with_label_edges(graph, rw_graph, node, steps)[source]

Creates anonymous walk from a node for arbitrary steps with usage of edge labels. Returns a tuple with consequent nodes.

geometric2dr.decomposition.anonymous_walk_patterns.random_walk_with_label_edges_nodes(graph, rw_graph, node, steps)[source]

Creates anonymous walk from a node for arbitrary steps with usage of edge and node labels. Returns a tuple with consequent nodes.

geometric2dr.decomposition.anonymous_walk_patterns.random_walk_with_label_nodes(graph, rw_graph, node, steps)[source]

Creates anonymous walk from a node for arbitrary steps with usage of node labels. Returns a tuple with consequent nodes.

geometric2dr.decomposition.graphlet_patterns

Graphlet based graph decomposition algorithm to create graph documents. Inspired and adapted from Yanardag and Vishwanathan “Deep Graph Kernels” [2].

geometric2dr.decomposition.graphlet_patterns.get_graphlet(window, num_graphlets)[source]

Compute the Nauty certificate of the graphlet in within a window of the adjacency matrix

This function takes the upper triangle of a nxn matrix and computes its hash certificate using nauty. Given the parameters this usually involved computing the graphlet of num_graphlets size

This is used for comparison with a bank of known certificates as loaded in get_maps().

Parameters:
  • window (numpy ndarray) – submatrix inside the adjacency matrix of a graph
  • num_graphlets (int) – the size of the graphlets to extract
Returns:

cert – certicate of the graphlet produced by finding its canonical representation with Nauty.

Return type:

byte str

geometric2dr.decomposition.graphlet_patterns.get_maps(num_graphlets)[source]

Load certificates created by the canonical representations of graphlets into canonical maps dictionary for quick isomorphism check when we extract graphlets from graphs in the users dataset

Parameters:num_graphlets (int) – The number of nodes in the graphlet pattern, aka the size of a graphlet pattern
Returns:canonical_map – A dict of Nauty certificates of the canonical representations of graphlets of the size num_graphlets.
Return type:dict
geometric2dr.decomposition.graphlet_patterns.graphlet_corpus(corpus_dir, num_graphlets, samplesize)[source]

Function which induces the graphlet patterns over the graphs inside a given directory.

The main use case is for the user to input a path containing individual graphs of the dataset in gexf format. The decomposition algorithm will induce graphlet patterns for graphs recording the dataset/”global” vocabulary of patterns within a dictionary. The graph and its associated patterns (by IDs given through our hash function) are saved into a <graphid>.wldr<depth> file which contains a line delimited list of all the substructure pattern ids.

Parameters:
  • corpus_dir (str) – path to directory containing graph files of dataset in .gexf format.
  • num_graphlets (int) – the size of the graphlet patterns to be extracted
  • samplesize (int) – the number of samples to take from a graph.
Returns:

  • corpus (list of lists of str) – a list of lists, with each inner list containing all the samplesize*(1+ num_graphlets) patterns in one graph of the dataset
  • vocabulary (list) – a set of the unique graphlet pattern ids
  • prob_map (dict) – a map {gidx: {graphlet_idx: normalized_prob}} of normalized probabilities of a graphlet pattern appearing in a graph based on counts made in generation
  • num_graphs (int) – the number of graphs in the dataset
  • graph_map (dict) – a map {gidx: {graphlet_idx: count}} of the number of times a certain graphlet pattern appeared in a graph for each graph gidx in the dataset

geometric2dr.decomposition.graphlet_patterns.load_graph(file_handle)[source]

Load the gexf format file as a nx_graph and an adjacency matrix.

Parameters:file_handle (str) – path to gexf file
Returns:
  • graph (networkx graph) – Networkx graph instance of input gexf file
  • adj_matrix (numpy ndarray) – The adjacency matrix of the graph
geometric2dr.decomposition.graphlet_patterns.save_graphlet_document(gexf_fh, gidx, num_graphlets, samplesize, cooccurence_corpus)[source]

Saves the induced graphlet patterns into dataset folder, it is only used in conjunction with graphlet_corpus()

Parameters:
  • gexf_fh (str) – path to gexf file of graph
  • gidx (int) – integer id of the graph, typically matches number in gexf_fh, if using TU Dortmund benchmark datasets
  • num_graphlets (int) – size of graphlet patterns induced
  • samplesize (int) – number of graphlet patterns sampled from graph
  • cooccurrence_corpus (list) – list of graphlet patterns induced under cooccurrence rules, in this case all graphlets immediately adjacent to an induced graphlet pattern.
Returns:

None – The decomposition algorithm will induce graphlet patterns for graphs recording the dataset/”global” vocabulary of patterns within a dictionary. The graph and its associated patterns (by IDs given through our hash function) are saved into a <graphid>.wldr<depth> file which contains a line delimited list of all the substructure pattern ids.

Return type:

None

geometric2dr.decomposition.shortest_path_patterns

Shortest path based decomposition algorithm to create graph documents. Inspired and adapted from Yanardag and Vishwanathan “Deep Graph Kernels” [2].

geometric2dr.decomposition.shortest_path_patterns.load_graph(file_handle)[source]

Load the gexf format file as a nx_graph and an adjacency matrix.

Parameters:file_handle (str) – path to gexf file
Returns:
  • graph (networkx graph) – Networkx graph instance of input gexf file
  • adj_matrix (numpy ndarray) – The adjacency matrix of the graph
geometric2dr.decomposition.shortest_path_patterns.save_sp_doc(gexf_fh, gidx, coocurrence_corpus)[source]

Saves a shortest path graph doc with the extension .spp

Parameters:
  • gexf_fh (str) – sdf
  • gidx (int) – int id of the gexf (note: deprecated and will be removed in next version)
  • cooccurrence_corpus (list) – list of lists containing cooccuring shortest paths (ie those starting from the same starting node)
Returns:

None – Saves the induced shortest path patterns into a graph document for the graph specified in the gexf_fh

Return type:

None

geometric2dr.decomposition.shortest_path_patterns.sp_corpus(corpus_dir)[source]

Function which induces the shortest path patterns over the graphs inside a given directory.

The main use case for this script is for the user to supply a path to the directory containing the .gexf graph files of a dataset. The decomposition function will produce a .spp file for each .gexf file which contains the shortest path patterns of a graph given the vocabulary of patterns across the dataset.

Parameters:corpus_dir (str) – path to directory containing graph files of dataset in .gexf format
Returns:
  • corpus (list of lists of str) – a list of lists, with each inner list containing all the shortest path patterns in one graph of the dataset
  • vocabulary (list) – a set of the unique shortest path pattern ids
  • prob_map (dict) – a map {gidx: {path: normalized_prob}} of normalized probabilities of a shortest path pattern appearing in a graph based on counts made in generation
  • num_graphs (int) – the number of graphs in the dataset
  • graph_map (dict) – a map {gidx: {path: count}} of the number of times a certain shortest path pattern appeared in a graph for each graph gidx in the dataset

geometric2dr.decomposition.weisfeiler_lehman_patterns

Functions for inducing Weisfeiler Lehman graph decomposition algorithm via node relabeling as described in Shervashidze et al. [3].

Based on the implementation available in the original source code of Graph2Vec [4] and adapted for Geo2DR https://github.com/MLDroid/graph2vec_tf which has no license

geometric2dr.decomposition.weisfeiler_lehman_patterns.initial_relabel(graph, node_label_attr_name='Label')[source]

The initial relabeling of the graphs in the dataset.

Taking the attributed label of the node as stated by the attr in the gexf file, it gives new labels to these node types (without regard for neighbours) hence it really is just a relabeling of the initial labels into our own “0+<newlabel>” format (to use with WL relabeling scheme after

Parameters:
  • graph (networkx graph) – a networkx graph
  • node_label_attr_name (str) – string literal of the attribute name used as a label in the gexf file NB: “label” in the gexf refers to nodeID “Label” refers to the dataset node label
Returns:

graph – the same nx graph but with a new “relabel” attribute with the 0th wlk-h entry label

Return type:

networkx graph

geometric2dr.decomposition.weisfeiler_lehman_patterns.save_wl_doc(fname, max_h, graph=None)[source]

Saves the induced rooted subgraph patterns

Saves the subgraph sentences in format <center> <context> <context> …. In other words we are saving the relabelings of node from the WL algorithm into a text document which can be fed into our skipgram architecture

Parameters:
  • fname (str) – path/filename of the graph
  • max_h (int) – highest_iteration wlk_h specified (ie depth of rooted subgraph)
  • graph (networkx graph) – the nx graph of the filename
Returns:

None – The rooted subgraph patterns are saved into a text file in the format <center> <context> <context> <context> ….

Return type:

None

geometric2dr.decomposition.weisfeiler_lehman_patterns.wl_corpus(fnames, max_h, node_label_attr_name='Label')[source]

Induce rooted subgraph patterns using the WL node relabeling algorith given list gexf files and save corresponding graph files.

Given a set of graphs from the dataset, a maximum h for WL, and the label attribute name used in the gexf files we initially relabel the original labels into a compliant relabeling (caesar shift for ease) then perform max-h iterations of the WL relabeling algorithm (1979) to create new labels which are compressed versions of the rooted subgraphs for each node in the graph. These are all present in the nx graph objects’s nodes as attributes, with the original label being ‘Label’ and our subsequent relabelings in the “relabel” attribute

The main use case is for the user to input a path containing individual graphs of the dataset in gexf format. The decomposition algorithm will induce substructure patterns for graphs recording the dataset/”global” vocabulary of patterns within a dictionary. The graph and its associated patterns (by IDs given through our hash function) are saved into a <graphid>.wldr<depth> file which contains a line delimited list of all the substructure pattern ids.

Parameters:
  • fnames (list) – list of gexf file paths for the graphs in the dataset
  • max_h (int) – the maximum depth of rooted subgraph pattern to induce across the dataset of graphs
  • node_label_attr_name (str) – string literal of the attribute name used as a label in the gexf file NB: “label” in the gexf refers to nodeID “Label” refers to the dataset node label
Returns:

  • corpus (list of lists of str) – a list of lists, with each inner list containing all the rooted subgraph patterns in one graph of the dataset
  • vocabulary (list) – a set of the unique rooted subgraph pattern ids
  • prob_map (dict) – a map {gidx: {wl_pattern: normalized_prob}} of normalized probabilities of a rooted subgraph pattern appearing in a graph based on counts made in generation
  • num_graphs (int) – the number of graphs in the dataset
  • graph_map (dict) – a map {gidx: {wl_pattern: count}} of the number of times a certain rooted subgraph pattern appeared in a graph for each graph gidx in the dataset
  • None (None) – The rooted subgraph patterns are also saved into a text file for each graph in ‘fnames’ in the format <center> <context> <context> <context> ….

geometric2dr.decomposition.weisfeiler_lehman_patterns.wl_relabel(graph, it)[source]

Runs an iteration of the WL relabeling algorithm, onto the graph using the global label_to_compressed_label_map

Parameters:
  • graph (networkx graph) – a networkx graph from the dataset which we want to relabel (has had to be initally relabeled, ie have the graph.nodes.[node][‘relabel’] attribute)
  • it (int) – an int, signifiying iteration in the WL relabeling algorithm.
Returns:

graph – the input nx graph with more labels in the “relabel” attribute

Return type:

networkx graph

geometric2dr.decomposition.random_walk_patterns

Functions for inducing random walk patterns within the graphs as in Gartner et al.

Inspired by the implementation of anonymous walks by Sergey Ivanov and Evgeny Burnaev [1]

geometric2dr.decomposition.random_walk_patterns.create_random_walk_graph(nx_graph)[source]

Generate a random walk graph equivalent of a graph as described in anonymous walk embeddings

geometric2dr.decomposition.random_walk_patterns.load_graph(file_handle)[source]

Loads a nx graph object and a corresponding numpy adjacency matrix given a GEXF file graph.

Parameters:file_handle (str) – path to gexf file
Returns:
  • graph (networkx graph) – Networkx graph instance of input gexf file
  • adj_matrix (numpy ndarray) – The adjacency matrix of the graph
geometric2dr.decomposition.random_walk_patterns.random_step_node(rw_graph, node)[source]

Moves one step from the current according to probabilities of outgoing edges. Return next node.

geometric2dr.decomposition.random_walk_patterns.random_walk_with_label_nodes(graph, rw_graph, node, steps)[source]

Creates anonymous walk from a node for arbitrary steps with usage of node labels. Returns a tuple with consequent nodes.

geometric2dr.decomposition.random_walk_patterns.random_walks_graph(nx_graph, walk_length, neighborhood_size=0)[source]

Perform a random walk with cooccurring “neighbouring” walks starting from the same node for all the nodes in the graph.

geometric2dr.decomposition.random_walk_patterns.rw_corpus(corpus_dir, walk_length, neighborhood_size, saving_graph_docs=True)[source]

Induces random walks up to a desired length across the input set of graphs

Parameters:
  • corpus_dir (str) – path to directory containing graph files
  • walk_length (int) – desired length of the random walk
  • neighborhood_size (int) – number of cooccuring walks to find per walk source (node)
  • saving_graph_docs (bool) – boolean on whether the graph documents should be generated or not
Returns:

None – Induces the random walks of walk_length across the list of graphs files supplied in corpus_dir

Return type:

None

geometric2dr.decomposition.random_walk_patterns.save_rw_doc(gexf_fh, coocurrence_corpus, walk_length)[source]

Saves a shortest path graph doc with the extension .spp

Parameters:
  • gexf_fh (str) – sdf
  • cooccurrence_corpus (list) – list of lists containing cooccuring shortest paths (ie those starting from the same starting node)
  • walk_length (int) – length of the walk
Returns:

None – Saves the induced shortest path patterns into a graph document for the graph specified in the gexf_fh

Return type:

None

References

[1](1, 2) Sergey Ivanov, Evgeny Burnaev. “Anonymous Walk Embeddings”. Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2186-2195, 2018.
[2](1, 2)
  1. Yanardag and S. Vishwanathan, “Deep Graph Kernels”, KDD ‘15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015
[3]Shervashidze, Nino & Schweitzer, Pascal & Jan, Erik & Leeuwen, Van & Mehlhorn, Kurt & Borgwardt, Karsten. Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research. 1. 1-48., 2010
[4]Narayanan, Annamalai & Mahinthan, Chandramohan & Venkatesan, Rajasekar & Chen, Lihui & Liu, Yang & Jaiswal, Shantanu. “graph2vec: Learning Distributed Representations of Graphs”, 2017