brain_loop_search.search

This module for now involves 2 loop search algorithms. Because of the great complexity this task can have, such loop searching can only be done in a screening way. Thus, they might not be able to give every possible results, but it's good to make a combination of different strategies and use multiple modailty of data to make up for this.

Shortest Path Methods

One method is based on shortest path that will typically get you a series of loops made up of nodes. The shortest path integrates scores on the path so the smaller the score is the better the loop can be. And the scores are actually addable. These are the basic requirements of using such method.

Shortest paths itself can't be loops, so the strategy tries different ways to assemble the shortest paths into loops. It can be a direction connection plus a reverse shortest path, or both shortest path. By doing so, a new concept is introduced as the axis, which is the key node in the graph that you use them as the source or target in shortest path searching. You can have more than 2 axes, of course.

There are other problems like duplication or knots in a loop. Typically, you wouldn't expect that because it means a more basic loop exists, but between some nodes there are only crossed shortest paths connecting them.

Max flow methods

Given the problems the shortest paths can have, we find that shortest paths tend to throw away much of the information in the network, and we need another form of computation. Flow methods come in handy because they balance between the overall graph weights, while introducing other problems. In a flow, the bigger the weight the better the connection can be. The weight can't be summed along any path, but be summed in a node. This requires a very different kind of data than shortest path.

It returns a new graph of flows. You can see the graph as a new distribution of connection, when you want to look into the overall connection from one node to another, which ways are preferred. The downsides are obvious, since flow computation is intensive, you can only pick specific axes to make the flow, and also a flow is a graph instead of an actual loop. No mass screening this time but look more closely at specific sites.

Is it possible to bring these two methods together? Yes, because the flow results can replace the graph to do shortest paths. This way you may get diverse loops with different flow strategies on a single network.

  1"""
  2This module for now involves 2 loop search algorithms. Because of the great complexity this task can have, such
  3loop searching can only be done in a screening way. Thus, they might not be able to give every possible results,
  4but it's good to make a combination of different strategies and use multiple modailty of data to make up for this.
  5
  6# Shortest Path Methods
  7One method is based on shortest path that will typically get you a series of loops made up of nodes. The shortest path
  8integrates scores on the path so the smaller the score is the better the loop can be. And the scores are actually
  9addable. These are the basic requirements of using such method.
 10
 11Shortest paths itself can't be loops, so the strategy tries different ways to assemble the shortest paths into loops.
 12It can be a direction connection plus a reverse shortest path, or both shortest path. By doing so, a new concept is
 13introduced as the axis, which is the key node in the graph that you use them as the source or target in shortest path
 14searching. You can have more than 2 axes, of course.
 15
 16There are other problems like duplication or knots in a loop. Typically, you wouldn't expect that because it means
 17a more basic loop exists, but between some nodes there are only crossed shortest paths connecting them.
 18
 19# Max flow methods
 20Given the problems the shortest paths can have, we find that shortest paths tend to throw away much of the information
 21in the network, and we need another form of computation. Flow methods come in handy because they balance between the overall
 22graph weights, while introducing other problems. In a flow, the bigger the weight the better the connection can be.
 23The weight can't be summed along any path, but be summed in a node. This requires a very different kind of data than shortest
 24path.
 25
 26It returns a new graph of flows. You can see the graph as a new distribution of connection, when you want to look into
 27the overall connection from one node to another, which ways are preferred. The downsides are obvious, since flow computation
 28is intensive, you can only pick specific axes to make the flow, and also a flow is a graph instead of an actual loop.
 29No mass screening this time but look more closely at specific sites.
 30
 31Is it possible to bring these two methods together? Yes, because the flow results can replace the graph to do shortest paths.
 32This way you may get diverse loops with different flow strategies on a single network.
 33"""
 34
 35
 36import typing
 37
 38import networkx as nx
 39import pandas as pd
 40import numpy as np
 41import itertools
 42from collections import OrderedDict
 43from tqdm import tqdm
 44
 45
 46class GraphMaintainer:
 47
 48    def __init__(self):
 49        self.graph = nx.DiGraph()
 50
 51    def add_subgraph(self, adj_mat: pd.DataFrame, def_val: float = -1):
 52        """
 53        Add weighted edges from a pandas dataframe as an adjacent matrix.
 54
 55        Each row/column name is a vertex that can be mapped to the input ontology.
 56
 57        The graph is directed and the projection is from row to column.
 58
 59        This function can be called multiple times for different dataframes if you don't have
 60        them in a single dataframe, but loading edges with same indices will overwrite the original one.
 61
 62        :param adj_mat: a pandas dataframe, with unique rows and columns.
 63        :param def_val: the default value for non-connecting edges.
 64        """
 65        assert adj_mat.index.is_unique, "rows contain duplication"
 66        assert adj_mat.columns.is_unique, "columns contain duplication"
 67        for fro, row in adj_mat.iterrows():
 68            active = row[row != def_val]
 69            self.graph.add_weighted_edges_from([*zip([fro] * len(active), active.index, active)])
 70
 71    def weight_transform(self, in_key='weight', out_key='weight', func=lambda x: 1 / x):
 72        """
 73        Transform the edge weights. The transform is based on networkx edge data, which is maintained as a dict.
 74        The default key is 'weight', which is the default key when the edge is added to the graph and used for both
 75        shortest path and flow algorithms' input. So this function replace the weights by default.
 76
 77        This is useful because different algorithms such as network flow and shortest paths expect edge weights
 78        of different meaning, thus leading to different treatment to 'big' and 'small'. Smaller weights in network flow
 79        is equivalent to bigger weights in shortest path.
 80
 81        :param func: transform function.
 82        :param in_key: input attribute key.
 83        :param out_key: output attribute key.
 84        """
 85        w = nx.get_edge_attributes(self.graph, 'weight')
 86        for k, v in w.items():
 87            w[k] = {'weight': func(v)}
 88        nx.set_edge_attributes(self.graph, w)
 89
 90
 91class MaxFlowLoopSearch(GraphMaintainer):
 92    """
 93    Using networkx directed graph for implementing loop search algorithms based on network flow.
 94
 95    As this method uses network flow, you should make sure the edge weights are meaningful so that
 96    a bigger weight means a strong connection. The flow along a path is throttled by the max capacity,
 97    and the whole flow is distributed over the edges and determined by the overall capacity.
 98
 99    With this class, you can get loops in forms of a residual network.
100    There is no definite loop that can form a list, unlike shortest path, but it maximally considers and
101    retains the information of the whole graph while shortest paths may omit lesser good routes.
102    The graph generated can even be applied the shortest path search, if you find it make sense. You can
103    assign the graph member in `ShortestPathLoopSearch` with the result from this one, because they are identical in form.
104
105    Since network flow considers only one pair of source and sink, applying this on multiple pairs
106    can be time-consuming, so the method isn't designed to search through all pairs in the graph.
107    """
108
109    def single_flow(self, s, t, rm_negative_and_zero=True):
110        """
111        compute a max flow, on edge attribute 'weight'. Though it uses the networkx algorithm that stores the result
112        in a new attribute 'flow', it will get this value back to 'weight' for less confusion.
113
114        :param s: source
115        :param t: sink
116        :param rm_negative_and_zero: remove the negative and zero flow on all edges, default as True.
117        :return: a directed graph with flow results as its edges.
118        """
119        flow = nx.flow.preflow_push(self.graph, s, t, 'weight')
120        it = nx.get_edge_attributes(flow, 'flow').items()
121        out = nx.DiGraph()
122        out.add_weighted_edges_from([k + (v,) for k, v in it if not rm_negative_and_zero or v > 0])
123        return out
124
125    def cycle_flows(self, axes: typing.Iterable, rm_negative_and_zero=True):
126        """
127        compute multiple max flow along multiple key vertices that are supposed to be on the loop.
128
129        The flow will be computed from the 1st to 2nd, 2nd to 3rd, ..., last to 1st.
130
131        :param axes: the key vertices.
132        :param rm_negative_and_zero: remove the negative and zero flow on all edges, default as True.
133        :return: a dict of all the flow that is identical to that of the `single_flow`.
134        """
135        axes = list(axes)
136        axes = axes + axes[:1]
137        out = {}
138        for i, j in itertools.pairwise(axes):
139            out[(i, j)] = self.single_flow(i, j, rm_negative_and_zero)
140        return out
141
142    def magnet_flow(self, s, t):
143        """
144        With a pair of key vertices s and t in a loop, when edge t -> s exists, it calculates the max flow from s -> t.
145        This is similar to a magnet, where the magnetic field outside corresponds to the max flow and the inner field
146        corresponds to the direct edge.
147
148        Networkx will return a residual network with flow[u][v] == -flow[v][u], but we will remove any negative and zero
149        flows and replace the flow t -> s with just edge t -> s.
150
151        This new graph can be seen as a redistribution of connection intensity, and can be used to do shortest path
152        searches since it's identical in form with the input of the algorithm.
153
154        :param s: the source of the max flow
155        :param t: the sink of the max flow
156        :return: a flow graph with 'weight' as its edge.
157        """
158        assert self.graph.has_edge(t, s), "the edge from t to s doesn't exist, loop can't be formed"
159        flow = self.single_flow(s, t, rm_negative_and_zero=True)
160        flow.add_edge(t, s, weight=self.graph[t][s]['weight'])
161        return flow
162
163    def merged_cycle_flow(self, axes: typing.Iterable, merge_func=np.sum):
164        """
165        compute multiple max flows along a cycle of key vertices, and merge the flows into one graph.
166
167        The max flows used are rid of negative and zero flow weights. It's still possible that a single edge may encounter
168        multiple positive flows, meaning some paths can be reused between different key vertices. Then it's necessary
169        to determine how they are merged.
170
171        :param axes: the key vertices.
172        :param merge_func: how to merge multiple weights on a single edge. By default, they are summed.
173        :return: a flow graph with 'weight' as its edge.
174        """
175        axes = list(axes)
176        sg = nx.DiGraph()
177        for k, v in self.cycle_flows(axes, rm_negative_and_zero=True).items():
178            sg.add_weighted_edges_from([(u, v, d['weight']) for u, v, d in v.edges(data=True)], k)
179        out = nx.DiGraph()
180        out.add_weighted_edges_from([(u, v, merge_func(list(d.values()))) for u, v, d in sg.edges(data=True)])
181        return out
182
183
184class ShortestPathLoopSearch(GraphMaintainer):
185    """
186    Using networkx directed graph for implementing loop search algorithms based on shortest paths.
187
188    As this method uses shortest paths, you should make sure the edge weights are meaningful so that a
189    smaller weight means a stronger connection. The shortest path aggregates the weights along the path, so that
190    summation is also meaningful.
191
192    With this class, you can get loops in forms of exact lists of nodes along the loop.
193
194    The shortest path algorithm uses the 'weight' attribute in the networkx graph.
195    """
196
197    def __init__(self):
198        super().__init__()
199        self.sssp = None
200
201    def init(self):
202        # bellman ford
203        self.sssp = {}
204        for k, v in tqdm(nx.all_pairs_bellman_ford_path(self.graph), total=self.graph.number_of_nodes()):
205            self.sssp[k] = v
206
207    def chain_screen(self, n_axis=2, top: int = None, must_include: typing.Iterable = None, allow_knots=False,
208                     axis_pool: typing.Iterable = None, axis_must_include: typing.Iterable = None, priority=np.sum):
209        """
210        Using connected single source shortest paths to find loops. You should ensure that in your graph smaller edge
211        weights means stronger connection for this method.
212
213        First, use Bellman Ford to find the shortest paths between all regions. Then loops can be found by
214        pairing any shortest paths like A -> B and B -> A. You can also find loops with more anchors' shortest path like A -> B,
215        B -> C, C -> A, and even more. But with more anchors, the computation will be greatly slowed down, and number of
216        discovery will be greatly expanded.
217
218        When the search number is very great, you can use `top` to have an early stop. The shortest path are sorted such that a
219        loop with minimal shortest path element will be retrieved first, but it won't guarantee the final loop will be ordered
220        ascending. Besides, you can also set the must included nodes in the loop so that those without are skipped.
221
222        The method attempts to located unique loops by the axes, i.e. one set of axes corresponding to one loop.
223        It's likely that multiple axes can be connected in multiple ways, in which case once finding a shorter one a
224        replacement will occur. The purpose is for balance between the uniqueness and computation.
225        Since the shortest path will also be returned, you can check other connection ways easily.
226
227        This can fail to give a loop between a set of axis, because there may be overlap between their shortest path, thus giving
228        rise to smaller loops, whereas there may exist an exact loop that contain non shortest paths but unique vertices.
229        Therefore, the shortest path from A to B here is better off interpreted as the most preferable path from A to B,
230        which means other paths from A to B are not considered.
231
232        :param n_axis: the number to consider for connecting shortest path into a loop. Larger n_axis will significantly
233            increase the computation
234        :param top: the most prominent results. This can speed up as it searches from the most probable loops and can
235            have an early stop. Default as None to turn off.
236        :param axis_pool: a list of vertices for axes to choose from, which can largely narrow down the search space.
237            Default as None to turn off.
238        :param must_include: the vertices that must appear in the search result.
239        :param axis_must_include: the vertices that must appear in axes, and must be subset of `axis_pool`. This check
240            is independent of `must_include`.
241        :param allow_knots: whether to allow duplicate nodes (but not duplicated with axes).
242        :param priority: the edge weight aggregation function for shortest path sorting. Default as sum, you can also use, for
243        example, mean to make up for shortest path that travel through many vertices.
244        :return: a dict of axis mapped to loop in the form of loops[(a,b,c)] -> [[a, ..., b], [b, ..., c], [c, ..., a]],
245            where the key (a,b,c) is a sorted tuple of axes, doesn't necessarily mean a -> b -> c -> a,
246            and shortest path in the form of sssp[from_node][to_node] -> [from_node, ..., to_node]
247        """
248        if must_include is not None:
249            must_include =  set(must_include)
250        if axis_must_include is not None:
251            axis_must_include = set(axis_must_include)
252            assert len(axis_must_include) <= n_axis, "number of must included axis is more than specified"
253        if axis_pool is not None:
254            axis_pool = set(axis_pool)
255            assert len(axis_pool) >= n_axis, "size of axis pool is smaller than specified"
256            if axis_must_include is not None:
257                assert axis_must_include.issubset(axis_pool), "must included axis should be subset of axis pool"
258
259        if self.sssp is None:
260            self.init()
261
262        # do an edge sorting beforehand, ascending
263        # vertices not in axis pool (if not None) will be omitted
264        connection = []
265        for k1, v1 in self.sssp.items():
266            if axis_pool is not None and k1 not in axis_pool:
267                continue
268            for k2, v2 in v1.items():
269                if axis_pool is not None and k2 not in axis_pool:
270                    continue
271                if k1 == k2:
272                    continue
273                w = []
274                for i, j in itertools.pairwise(v2):
275                    w.append(self.graph[i][j]['weight'])
276                connection.append([k1, k2, priority(w)])
277        connection.sort(key=lambda x: x[2])
278        sorted_sssp = OrderedDict()
279        # sorted_sssp[k1][k2] -> (length, list of vertices)
280        # and k2 mapping is ordered
281        for i, (k1, k2, w) in enumerate(connection):
282            if k1 not in sorted_sssp:
283                sorted_sssp[k1] = OrderedDict()
284            sorted_sssp[k1][k2] = (w, self.sssp[k1][k2])
285
286        max_iter = [0 if top is None else top]
287        loops = {}
288
289        def dfs_loop_search(loop: list, visited: set, length: float):
290            # finishing, the last part of the cycle,
291            if len(loop) == n_axis - 1:
292                # check the existence of the last part
293                k1 = loop[-1][-1]
294                k2 = loop[0][0]
295                if k1 not in sorted_sssp or k2 not in sorted_sssp[k1]:
296                    yield
297                last_part = sorted_sssp[k1][k2][1]
298
299                # check knot
300                sl = set(last_part[1:-1])
301                new_visited = visited | sl
302                if not allow_knots and len(new_visited) != len(visited) + len(last_part) - 2 or \
303                        not sl.isdisjoint(i[0] for i in loop):
304                    yield
305                loop.append(last_part)
306                length += sorted_sssp[k1][k2][0]
307                axis = tuple(sorted(i[0] for i in loop))
308
309                if (must_include is None or must_include.issubset(new_visited)) and \
310                        (axis_must_include is None or axis_must_include.issubset(axis)):
311                    if axis in loops:
312                        # check if smaller than existed
313                        if loops[axis]['length'] < length:
314                            yield
315                        else:
316                            max_iter[0] += 1
317                    max_iter[0] -= 1
318                    loops[axis] = {
319                        'loop': loop,
320                        'length': length,
321                    }
322            # start
323            elif len(loop) == 0:
324                for k1, k2, w in connection:
325                    v = sorted_sssp[k1][k2][1]
326                    yield dfs_loop_search([v], set(v), w)
327            else:
328                k1 = loop[-1][-1]   # last axis
329                for k, (w, c) in sorted_sssp[k1].items():
330                    sc = set(c[1:])
331                    new_visited = visited | sc
332                    if len(new_visited) == len(visited) + len(c) - 1 or \
333                            allow_knots and sc.isdisjoint(i[0] for i in loop):
334                        yield dfs_loop_search(loop + [c], new_visited, length + w)
335            yield
336
337        gen = dfs_loop_search([], set(), 0)
338        stk = []
339        while top is None or max_iter[0] > 0:
340            if isinstance(gen, typing.Generator):
341                stk.append(gen)
342                gen = next(gen)
343            else:
344                stk.pop()
345                if not stk:
346                    break
347                gen = stk[-1].send(gen)
348
349        return loops
350
351    def pair_complement(self, axis_pool: typing.Iterable = None):
352        """
353        Given a pair of vertices, the loop is defined as the direct connection between them plus the inverted shortest path.
354        For example, with A and B, the loop can be either edge AB + the shortest path from B to A
355        or edge BA + the shortest path from A to B. This comes in handy when shortest paths from A to B and B to A have
356        intersection, leading to knots in a loop.
357
358        This can also be much faster and doesn't need speedup or sorting.
359
360        :param axis_pool: a list of vertices for axes to choose from, which can largely narrow down the search space.
361            Default as None to turn off, and it will try every pair of vertices as long as there's a direct edge.
362        :return: a dict of loops, keys are the direct edges and values are the corresponding inverse shortest path.
363        """
364        if axis_pool is not None:
365            axis_pool = set(axis_pool)
366        if self.sssp is None:
367            self.init()
368        out = {}
369        for fro, to in self.graph.edges:
370            if axis_pool is not None and (fro not in axis_pool or to not in axis_pool) or to == fro:
371                continue
372            if to in self.sssp and fro in self.sssp[to]:
373                out[(fro, to)] = self.sssp[to][fro]
374        return out
class GraphMaintainer:
47class GraphMaintainer:
48
49    def __init__(self):
50        self.graph = nx.DiGraph()
51
52    def add_subgraph(self, adj_mat: pd.DataFrame, def_val: float = -1):
53        """
54        Add weighted edges from a pandas dataframe as an adjacent matrix.
55
56        Each row/column name is a vertex that can be mapped to the input ontology.
57
58        The graph is directed and the projection is from row to column.
59
60        This function can be called multiple times for different dataframes if you don't have
61        them in a single dataframe, but loading edges with same indices will overwrite the original one.
62
63        :param adj_mat: a pandas dataframe, with unique rows and columns.
64        :param def_val: the default value for non-connecting edges.
65        """
66        assert adj_mat.index.is_unique, "rows contain duplication"
67        assert adj_mat.columns.is_unique, "columns contain duplication"
68        for fro, row in adj_mat.iterrows():
69            active = row[row != def_val]
70            self.graph.add_weighted_edges_from([*zip([fro] * len(active), active.index, active)])
71
72    def weight_transform(self, in_key='weight', out_key='weight', func=lambda x: 1 / x):
73        """
74        Transform the edge weights. The transform is based on networkx edge data, which is maintained as a dict.
75        The default key is 'weight', which is the default key when the edge is added to the graph and used for both
76        shortest path and flow algorithms' input. So this function replace the weights by default.
77
78        This is useful because different algorithms such as network flow and shortest paths expect edge weights
79        of different meaning, thus leading to different treatment to 'big' and 'small'. Smaller weights in network flow
80        is equivalent to bigger weights in shortest path.
81
82        :param func: transform function.
83        :param in_key: input attribute key.
84        :param out_key: output attribute key.
85        """
86        w = nx.get_edge_attributes(self.graph, 'weight')
87        for k, v in w.items():
88            w[k] = {'weight': func(v)}
89        nx.set_edge_attributes(self.graph, w)
graph
def add_subgraph(self, adj_mat: pandas.core.frame.DataFrame, def_val: float = -1):
52    def add_subgraph(self, adj_mat: pd.DataFrame, def_val: float = -1):
53        """
54        Add weighted edges from a pandas dataframe as an adjacent matrix.
55
56        Each row/column name is a vertex that can be mapped to the input ontology.
57
58        The graph is directed and the projection is from row to column.
59
60        This function can be called multiple times for different dataframes if you don't have
61        them in a single dataframe, but loading edges with same indices will overwrite the original one.
62
63        :param adj_mat: a pandas dataframe, with unique rows and columns.
64        :param def_val: the default value for non-connecting edges.
65        """
66        assert adj_mat.index.is_unique, "rows contain duplication"
67        assert adj_mat.columns.is_unique, "columns contain duplication"
68        for fro, row in adj_mat.iterrows():
69            active = row[row != def_val]
70            self.graph.add_weighted_edges_from([*zip([fro] * len(active), active.index, active)])

Add weighted edges from a pandas dataframe as an adjacent matrix.

Each row/column name is a vertex that can be mapped to the input ontology.

The graph is directed and the projection is from row to column.

This function can be called multiple times for different dataframes if you don't have them in a single dataframe, but loading edges with same indices will overwrite the original one.

Parameters
  • adj_mat: a pandas dataframe, with unique rows and columns.
  • def_val: the default value for non-connecting edges.
def weight_transform( self, in_key='weight', out_key='weight', func=<function GraphMaintainer.<lambda>>):
72    def weight_transform(self, in_key='weight', out_key='weight', func=lambda x: 1 / x):
73        """
74        Transform the edge weights. The transform is based on networkx edge data, which is maintained as a dict.
75        The default key is 'weight', which is the default key when the edge is added to the graph and used for both
76        shortest path and flow algorithms' input. So this function replace the weights by default.
77
78        This is useful because different algorithms such as network flow and shortest paths expect edge weights
79        of different meaning, thus leading to different treatment to 'big' and 'small'. Smaller weights in network flow
80        is equivalent to bigger weights in shortest path.
81
82        :param func: transform function.
83        :param in_key: input attribute key.
84        :param out_key: output attribute key.
85        """
86        w = nx.get_edge_attributes(self.graph, 'weight')
87        for k, v in w.items():
88            w[k] = {'weight': func(v)}
89        nx.set_edge_attributes(self.graph, w)

Transform the edge weights. The transform is based on networkx edge data, which is maintained as a dict. The default key is 'weight', which is the default key when the edge is added to the graph and used for both shortest path and flow algorithms' input. So this function replace the weights by default.

This is useful because different algorithms such as network flow and shortest paths expect edge weights of different meaning, thus leading to different treatment to 'big' and 'small'. Smaller weights in network flow is equivalent to bigger weights in shortest path.

Parameters
  • func: transform function.
  • in_key: input attribute key.
  • out_key: output attribute key.
class MaxFlowLoopSearch(GraphMaintainer):
 92class MaxFlowLoopSearch(GraphMaintainer):
 93    """
 94    Using networkx directed graph for implementing loop search algorithms based on network flow.
 95
 96    As this method uses network flow, you should make sure the edge weights are meaningful so that
 97    a bigger weight means a strong connection. The flow along a path is throttled by the max capacity,
 98    and the whole flow is distributed over the edges and determined by the overall capacity.
 99
100    With this class, you can get loops in forms of a residual network.
101    There is no definite loop that can form a list, unlike shortest path, but it maximally considers and
102    retains the information of the whole graph while shortest paths may omit lesser good routes.
103    The graph generated can even be applied the shortest path search, if you find it make sense. You can
104    assign the graph member in `ShortestPathLoopSearch` with the result from this one, because they are identical in form.
105
106    Since network flow considers only one pair of source and sink, applying this on multiple pairs
107    can be time-consuming, so the method isn't designed to search through all pairs in the graph.
108    """
109
110    def single_flow(self, s, t, rm_negative_and_zero=True):
111        """
112        compute a max flow, on edge attribute 'weight'. Though it uses the networkx algorithm that stores the result
113        in a new attribute 'flow', it will get this value back to 'weight' for less confusion.
114
115        :param s: source
116        :param t: sink
117        :param rm_negative_and_zero: remove the negative and zero flow on all edges, default as True.
118        :return: a directed graph with flow results as its edges.
119        """
120        flow = nx.flow.preflow_push(self.graph, s, t, 'weight')
121        it = nx.get_edge_attributes(flow, 'flow').items()
122        out = nx.DiGraph()
123        out.add_weighted_edges_from([k + (v,) for k, v in it if not rm_negative_and_zero or v > 0])
124        return out
125
126    def cycle_flows(self, axes: typing.Iterable, rm_negative_and_zero=True):
127        """
128        compute multiple max flow along multiple key vertices that are supposed to be on the loop.
129
130        The flow will be computed from the 1st to 2nd, 2nd to 3rd, ..., last to 1st.
131
132        :param axes: the key vertices.
133        :param rm_negative_and_zero: remove the negative and zero flow on all edges, default as True.
134        :return: a dict of all the flow that is identical to that of the `single_flow`.
135        """
136        axes = list(axes)
137        axes = axes + axes[:1]
138        out = {}
139        for i, j in itertools.pairwise(axes):
140            out[(i, j)] = self.single_flow(i, j, rm_negative_and_zero)
141        return out
142
143    def magnet_flow(self, s, t):
144        """
145        With a pair of key vertices s and t in a loop, when edge t -> s exists, it calculates the max flow from s -> t.
146        This is similar to a magnet, where the magnetic field outside corresponds to the max flow and the inner field
147        corresponds to the direct edge.
148
149        Networkx will return a residual network with flow[u][v] == -flow[v][u], but we will remove any negative and zero
150        flows and replace the flow t -> s with just edge t -> s.
151
152        This new graph can be seen as a redistribution of connection intensity, and can be used to do shortest path
153        searches since it's identical in form with the input of the algorithm.
154
155        :param s: the source of the max flow
156        :param t: the sink of the max flow
157        :return: a flow graph with 'weight' as its edge.
158        """
159        assert self.graph.has_edge(t, s), "the edge from t to s doesn't exist, loop can't be formed"
160        flow = self.single_flow(s, t, rm_negative_and_zero=True)
161        flow.add_edge(t, s, weight=self.graph[t][s]['weight'])
162        return flow
163
164    def merged_cycle_flow(self, axes: typing.Iterable, merge_func=np.sum):
165        """
166        compute multiple max flows along a cycle of key vertices, and merge the flows into one graph.
167
168        The max flows used are rid of negative and zero flow weights. It's still possible that a single edge may encounter
169        multiple positive flows, meaning some paths can be reused between different key vertices. Then it's necessary
170        to determine how they are merged.
171
172        :param axes: the key vertices.
173        :param merge_func: how to merge multiple weights on a single edge. By default, they are summed.
174        :return: a flow graph with 'weight' as its edge.
175        """
176        axes = list(axes)
177        sg = nx.DiGraph()
178        for k, v in self.cycle_flows(axes, rm_negative_and_zero=True).items():
179            sg.add_weighted_edges_from([(u, v, d['weight']) for u, v, d in v.edges(data=True)], k)
180        out = nx.DiGraph()
181        out.add_weighted_edges_from([(u, v, merge_func(list(d.values()))) for u, v, d in sg.edges(data=True)])
182        return out

Using networkx directed graph for implementing loop search algorithms based on network flow.

As this method uses network flow, you should make sure the edge weights are meaningful so that a bigger weight means a strong connection. The flow along a path is throttled by the max capacity, and the whole flow is distributed over the edges and determined by the overall capacity.

With this class, you can get loops in forms of a residual network. There is no definite loop that can form a list, unlike shortest path, but it maximally considers and retains the information of the whole graph while shortest paths may omit lesser good routes. The graph generated can even be applied the shortest path search, if you find it make sense. You can assign the graph member in ShortestPathLoopSearch with the result from this one, because they are identical in form.

Since network flow considers only one pair of source and sink, applying this on multiple pairs can be time-consuming, so the method isn't designed to search through all pairs in the graph.

def single_flow(self, s, t, rm_negative_and_zero=True):
110    def single_flow(self, s, t, rm_negative_and_zero=True):
111        """
112        compute a max flow, on edge attribute 'weight'. Though it uses the networkx algorithm that stores the result
113        in a new attribute 'flow', it will get this value back to 'weight' for less confusion.
114
115        :param s: source
116        :param t: sink
117        :param rm_negative_and_zero: remove the negative and zero flow on all edges, default as True.
118        :return: a directed graph with flow results as its edges.
119        """
120        flow = nx.flow.preflow_push(self.graph, s, t, 'weight')
121        it = nx.get_edge_attributes(flow, 'flow').items()
122        out = nx.DiGraph()
123        out.add_weighted_edges_from([k + (v,) for k, v in it if not rm_negative_and_zero or v > 0])
124        return out

compute a max flow, on edge attribute 'weight'. Though it uses the networkx algorithm that stores the result in a new attribute 'flow', it will get this value back to 'weight' for less confusion.

Parameters
  • s: source
  • t: sink
  • rm_negative_and_zero: remove the negative and zero flow on all edges, default as True.
Returns

a directed graph with flow results as its edges.

def cycle_flows(self, axes: Iterable, rm_negative_and_zero=True):
126    def cycle_flows(self, axes: typing.Iterable, rm_negative_and_zero=True):
127        """
128        compute multiple max flow along multiple key vertices that are supposed to be on the loop.
129
130        The flow will be computed from the 1st to 2nd, 2nd to 3rd, ..., last to 1st.
131
132        :param axes: the key vertices.
133        :param rm_negative_and_zero: remove the negative and zero flow on all edges, default as True.
134        :return: a dict of all the flow that is identical to that of the `single_flow`.
135        """
136        axes = list(axes)
137        axes = axes + axes[:1]
138        out = {}
139        for i, j in itertools.pairwise(axes):
140            out[(i, j)] = self.single_flow(i, j, rm_negative_and_zero)
141        return out

compute multiple max flow along multiple key vertices that are supposed to be on the loop.

The flow will be computed from the 1st to 2nd, 2nd to 3rd, ..., last to 1st.

Parameters
  • axes: the key vertices.
  • rm_negative_and_zero: remove the negative and zero flow on all edges, default as True.
Returns

a dict of all the flow that is identical to that of the single_flow.

def magnet_flow(self, s, t):
143    def magnet_flow(self, s, t):
144        """
145        With a pair of key vertices s and t in a loop, when edge t -> s exists, it calculates the max flow from s -> t.
146        This is similar to a magnet, where the magnetic field outside corresponds to the max flow and the inner field
147        corresponds to the direct edge.
148
149        Networkx will return a residual network with flow[u][v] == -flow[v][u], but we will remove any negative and zero
150        flows and replace the flow t -> s with just edge t -> s.
151
152        This new graph can be seen as a redistribution of connection intensity, and can be used to do shortest path
153        searches since it's identical in form with the input of the algorithm.
154
155        :param s: the source of the max flow
156        :param t: the sink of the max flow
157        :return: a flow graph with 'weight' as its edge.
158        """
159        assert self.graph.has_edge(t, s), "the edge from t to s doesn't exist, loop can't be formed"
160        flow = self.single_flow(s, t, rm_negative_and_zero=True)
161        flow.add_edge(t, s, weight=self.graph[t][s]['weight'])
162        return flow

With a pair of key vertices s and t in a loop, when edge t -> s exists, it calculates the max flow from s -> t. This is similar to a magnet, where the magnetic field outside corresponds to the max flow and the inner field corresponds to the direct edge.

Networkx will return a residual network with flow[u][v] == -flow[v][u], but we will remove any negative and zero flows and replace the flow t -> s with just edge t -> s.

This new graph can be seen as a redistribution of connection intensity, and can be used to do shortest path searches since it's identical in form with the input of the algorithm.

Parameters
  • s: the source of the max flow
  • t: the sink of the max flow
Returns

a flow graph with 'weight' as its edge.

def merged_cycle_flow(self, axes: Iterable, merge_func=<function sum>):
164    def merged_cycle_flow(self, axes: typing.Iterable, merge_func=np.sum):
165        """
166        compute multiple max flows along a cycle of key vertices, and merge the flows into one graph.
167
168        The max flows used are rid of negative and zero flow weights. It's still possible that a single edge may encounter
169        multiple positive flows, meaning some paths can be reused between different key vertices. Then it's necessary
170        to determine how they are merged.
171
172        :param axes: the key vertices.
173        :param merge_func: how to merge multiple weights on a single edge. By default, they are summed.
174        :return: a flow graph with 'weight' as its edge.
175        """
176        axes = list(axes)
177        sg = nx.DiGraph()
178        for k, v in self.cycle_flows(axes, rm_negative_and_zero=True).items():
179            sg.add_weighted_edges_from([(u, v, d['weight']) for u, v, d in v.edges(data=True)], k)
180        out = nx.DiGraph()
181        out.add_weighted_edges_from([(u, v, merge_func(list(d.values()))) for u, v, d in sg.edges(data=True)])
182        return out

compute multiple max flows along a cycle of key vertices, and merge the flows into one graph.

The max flows used are rid of negative and zero flow weights. It's still possible that a single edge may encounter multiple positive flows, meaning some paths can be reused between different key vertices. Then it's necessary to determine how they are merged.

Parameters
  • axes: the key vertices.
  • merge_func: how to merge multiple weights on a single edge. By default, they are summed.
Returns

a flow graph with 'weight' as its edge.

class ShortestPathLoopSearch(GraphMaintainer):
185class ShortestPathLoopSearch(GraphMaintainer):
186    """
187    Using networkx directed graph for implementing loop search algorithms based on shortest paths.
188
189    As this method uses shortest paths, you should make sure the edge weights are meaningful so that a
190    smaller weight means a stronger connection. The shortest path aggregates the weights along the path, so that
191    summation is also meaningful.
192
193    With this class, you can get loops in forms of exact lists of nodes along the loop.
194
195    The shortest path algorithm uses the 'weight' attribute in the networkx graph.
196    """
197
198    def __init__(self):
199        super().__init__()
200        self.sssp = None
201
202    def init(self):
203        # bellman ford
204        self.sssp = {}
205        for k, v in tqdm(nx.all_pairs_bellman_ford_path(self.graph), total=self.graph.number_of_nodes()):
206            self.sssp[k] = v
207
208    def chain_screen(self, n_axis=2, top: int = None, must_include: typing.Iterable = None, allow_knots=False,
209                     axis_pool: typing.Iterable = None, axis_must_include: typing.Iterable = None, priority=np.sum):
210        """
211        Using connected single source shortest paths to find loops. You should ensure that in your graph smaller edge
212        weights means stronger connection for this method.
213
214        First, use Bellman Ford to find the shortest paths between all regions. Then loops can be found by
215        pairing any shortest paths like A -> B and B -> A. You can also find loops with more anchors' shortest path like A -> B,
216        B -> C, C -> A, and even more. But with more anchors, the computation will be greatly slowed down, and number of
217        discovery will be greatly expanded.
218
219        When the search number is very great, you can use `top` to have an early stop. The shortest path are sorted such that a
220        loop with minimal shortest path element will be retrieved first, but it won't guarantee the final loop will be ordered
221        ascending. Besides, you can also set the must included nodes in the loop so that those without are skipped.
222
223        The method attempts to located unique loops by the axes, i.e. one set of axes corresponding to one loop.
224        It's likely that multiple axes can be connected in multiple ways, in which case once finding a shorter one a
225        replacement will occur. The purpose is for balance between the uniqueness and computation.
226        Since the shortest path will also be returned, you can check other connection ways easily.
227
228        This can fail to give a loop between a set of axis, because there may be overlap between their shortest path, thus giving
229        rise to smaller loops, whereas there may exist an exact loop that contain non shortest paths but unique vertices.
230        Therefore, the shortest path from A to B here is better off interpreted as the most preferable path from A to B,
231        which means other paths from A to B are not considered.
232
233        :param n_axis: the number to consider for connecting shortest path into a loop. Larger n_axis will significantly
234            increase the computation
235        :param top: the most prominent results. This can speed up as it searches from the most probable loops and can
236            have an early stop. Default as None to turn off.
237        :param axis_pool: a list of vertices for axes to choose from, which can largely narrow down the search space.
238            Default as None to turn off.
239        :param must_include: the vertices that must appear in the search result.
240        :param axis_must_include: the vertices that must appear in axes, and must be subset of `axis_pool`. This check
241            is independent of `must_include`.
242        :param allow_knots: whether to allow duplicate nodes (but not duplicated with axes).
243        :param priority: the edge weight aggregation function for shortest path sorting. Default as sum, you can also use, for
244        example, mean to make up for shortest path that travel through many vertices.
245        :return: a dict of axis mapped to loop in the form of loops[(a,b,c)] -> [[a, ..., b], [b, ..., c], [c, ..., a]],
246            where the key (a,b,c) is a sorted tuple of axes, doesn't necessarily mean a -> b -> c -> a,
247            and shortest path in the form of sssp[from_node][to_node] -> [from_node, ..., to_node]
248        """
249        if must_include is not None:
250            must_include =  set(must_include)
251        if axis_must_include is not None:
252            axis_must_include = set(axis_must_include)
253            assert len(axis_must_include) <= n_axis, "number of must included axis is more than specified"
254        if axis_pool is not None:
255            axis_pool = set(axis_pool)
256            assert len(axis_pool) >= n_axis, "size of axis pool is smaller than specified"
257            if axis_must_include is not None:
258                assert axis_must_include.issubset(axis_pool), "must included axis should be subset of axis pool"
259
260        if self.sssp is None:
261            self.init()
262
263        # do an edge sorting beforehand, ascending
264        # vertices not in axis pool (if not None) will be omitted
265        connection = []
266        for k1, v1 in self.sssp.items():
267            if axis_pool is not None and k1 not in axis_pool:
268                continue
269            for k2, v2 in v1.items():
270                if axis_pool is not None and k2 not in axis_pool:
271                    continue
272                if k1 == k2:
273                    continue
274                w = []
275                for i, j in itertools.pairwise(v2):
276                    w.append(self.graph[i][j]['weight'])
277                connection.append([k1, k2, priority(w)])
278        connection.sort(key=lambda x: x[2])
279        sorted_sssp = OrderedDict()
280        # sorted_sssp[k1][k2] -> (length, list of vertices)
281        # and k2 mapping is ordered
282        for i, (k1, k2, w) in enumerate(connection):
283            if k1 not in sorted_sssp:
284                sorted_sssp[k1] = OrderedDict()
285            sorted_sssp[k1][k2] = (w, self.sssp[k1][k2])
286
287        max_iter = [0 if top is None else top]
288        loops = {}
289
290        def dfs_loop_search(loop: list, visited: set, length: float):
291            # finishing, the last part of the cycle,
292            if len(loop) == n_axis - 1:
293                # check the existence of the last part
294                k1 = loop[-1][-1]
295                k2 = loop[0][0]
296                if k1 not in sorted_sssp or k2 not in sorted_sssp[k1]:
297                    yield
298                last_part = sorted_sssp[k1][k2][1]
299
300                # check knot
301                sl = set(last_part[1:-1])
302                new_visited = visited | sl
303                if not allow_knots and len(new_visited) != len(visited) + len(last_part) - 2 or \
304                        not sl.isdisjoint(i[0] for i in loop):
305                    yield
306                loop.append(last_part)
307                length += sorted_sssp[k1][k2][0]
308                axis = tuple(sorted(i[0] for i in loop))
309
310                if (must_include is None or must_include.issubset(new_visited)) and \
311                        (axis_must_include is None or axis_must_include.issubset(axis)):
312                    if axis in loops:
313                        # check if smaller than existed
314                        if loops[axis]['length'] < length:
315                            yield
316                        else:
317                            max_iter[0] += 1
318                    max_iter[0] -= 1
319                    loops[axis] = {
320                        'loop': loop,
321                        'length': length,
322                    }
323            # start
324            elif len(loop) == 0:
325                for k1, k2, w in connection:
326                    v = sorted_sssp[k1][k2][1]
327                    yield dfs_loop_search([v], set(v), w)
328            else:
329                k1 = loop[-1][-1]   # last axis
330                for k, (w, c) in sorted_sssp[k1].items():
331                    sc = set(c[1:])
332                    new_visited = visited | sc
333                    if len(new_visited) == len(visited) + len(c) - 1 or \
334                            allow_knots and sc.isdisjoint(i[0] for i in loop):
335                        yield dfs_loop_search(loop + [c], new_visited, length + w)
336            yield
337
338        gen = dfs_loop_search([], set(), 0)
339        stk = []
340        while top is None or max_iter[0] > 0:
341            if isinstance(gen, typing.Generator):
342                stk.append(gen)
343                gen = next(gen)
344            else:
345                stk.pop()
346                if not stk:
347                    break
348                gen = stk[-1].send(gen)
349
350        return loops
351
352    def pair_complement(self, axis_pool: typing.Iterable = None):
353        """
354        Given a pair of vertices, the loop is defined as the direct connection between them plus the inverted shortest path.
355        For example, with A and B, the loop can be either edge AB + the shortest path from B to A
356        or edge BA + the shortest path from A to B. This comes in handy when shortest paths from A to B and B to A have
357        intersection, leading to knots in a loop.
358
359        This can also be much faster and doesn't need speedup or sorting.
360
361        :param axis_pool: a list of vertices for axes to choose from, which can largely narrow down the search space.
362            Default as None to turn off, and it will try every pair of vertices as long as there's a direct edge.
363        :return: a dict of loops, keys are the direct edges and values are the corresponding inverse shortest path.
364        """
365        if axis_pool is not None:
366            axis_pool = set(axis_pool)
367        if self.sssp is None:
368            self.init()
369        out = {}
370        for fro, to in self.graph.edges:
371            if axis_pool is not None and (fro not in axis_pool or to not in axis_pool) or to == fro:
372                continue
373            if to in self.sssp and fro in self.sssp[to]:
374                out[(fro, to)] = self.sssp[to][fro]
375        return out

Using networkx directed graph for implementing loop search algorithms based on shortest paths.

As this method uses shortest paths, you should make sure the edge weights are meaningful so that a smaller weight means a stronger connection. The shortest path aggregates the weights along the path, so that summation is also meaningful.

With this class, you can get loops in forms of exact lists of nodes along the loop.

The shortest path algorithm uses the 'weight' attribute in the networkx graph.

sssp
def init(self):
202    def init(self):
203        # bellman ford
204        self.sssp = {}
205        for k, v in tqdm(nx.all_pairs_bellman_ford_path(self.graph), total=self.graph.number_of_nodes()):
206            self.sssp[k] = v
def chain_screen( self, n_axis=2, top: int = None, must_include: Iterable = None, allow_knots=False, axis_pool: Iterable = None, axis_must_include: Iterable = None, priority=<function sum>):
208    def chain_screen(self, n_axis=2, top: int = None, must_include: typing.Iterable = None, allow_knots=False,
209                     axis_pool: typing.Iterable = None, axis_must_include: typing.Iterable = None, priority=np.sum):
210        """
211        Using connected single source shortest paths to find loops. You should ensure that in your graph smaller edge
212        weights means stronger connection for this method.
213
214        First, use Bellman Ford to find the shortest paths between all regions. Then loops can be found by
215        pairing any shortest paths like A -> B and B -> A. You can also find loops with more anchors' shortest path like A -> B,
216        B -> C, C -> A, and even more. But with more anchors, the computation will be greatly slowed down, and number of
217        discovery will be greatly expanded.
218
219        When the search number is very great, you can use `top` to have an early stop. The shortest path are sorted such that a
220        loop with minimal shortest path element will be retrieved first, but it won't guarantee the final loop will be ordered
221        ascending. Besides, you can also set the must included nodes in the loop so that those without are skipped.
222
223        The method attempts to located unique loops by the axes, i.e. one set of axes corresponding to one loop.
224        It's likely that multiple axes can be connected in multiple ways, in which case once finding a shorter one a
225        replacement will occur. The purpose is for balance between the uniqueness and computation.
226        Since the shortest path will also be returned, you can check other connection ways easily.
227
228        This can fail to give a loop between a set of axis, because there may be overlap between their shortest path, thus giving
229        rise to smaller loops, whereas there may exist an exact loop that contain non shortest paths but unique vertices.
230        Therefore, the shortest path from A to B here is better off interpreted as the most preferable path from A to B,
231        which means other paths from A to B are not considered.
232
233        :param n_axis: the number to consider for connecting shortest path into a loop. Larger n_axis will significantly
234            increase the computation
235        :param top: the most prominent results. This can speed up as it searches from the most probable loops and can
236            have an early stop. Default as None to turn off.
237        :param axis_pool: a list of vertices for axes to choose from, which can largely narrow down the search space.
238            Default as None to turn off.
239        :param must_include: the vertices that must appear in the search result.
240        :param axis_must_include: the vertices that must appear in axes, and must be subset of `axis_pool`. This check
241            is independent of `must_include`.
242        :param allow_knots: whether to allow duplicate nodes (but not duplicated with axes).
243        :param priority: the edge weight aggregation function for shortest path sorting. Default as sum, you can also use, for
244        example, mean to make up for shortest path that travel through many vertices.
245        :return: a dict of axis mapped to loop in the form of loops[(a,b,c)] -> [[a, ..., b], [b, ..., c], [c, ..., a]],
246            where the key (a,b,c) is a sorted tuple of axes, doesn't necessarily mean a -> b -> c -> a,
247            and shortest path in the form of sssp[from_node][to_node] -> [from_node, ..., to_node]
248        """
249        if must_include is not None:
250            must_include =  set(must_include)
251        if axis_must_include is not None:
252            axis_must_include = set(axis_must_include)
253            assert len(axis_must_include) <= n_axis, "number of must included axis is more than specified"
254        if axis_pool is not None:
255            axis_pool = set(axis_pool)
256            assert len(axis_pool) >= n_axis, "size of axis pool is smaller than specified"
257            if axis_must_include is not None:
258                assert axis_must_include.issubset(axis_pool), "must included axis should be subset of axis pool"
259
260        if self.sssp is None:
261            self.init()
262
263        # do an edge sorting beforehand, ascending
264        # vertices not in axis pool (if not None) will be omitted
265        connection = []
266        for k1, v1 in self.sssp.items():
267            if axis_pool is not None and k1 not in axis_pool:
268                continue
269            for k2, v2 in v1.items():
270                if axis_pool is not None and k2 not in axis_pool:
271                    continue
272                if k1 == k2:
273                    continue
274                w = []
275                for i, j in itertools.pairwise(v2):
276                    w.append(self.graph[i][j]['weight'])
277                connection.append([k1, k2, priority(w)])
278        connection.sort(key=lambda x: x[2])
279        sorted_sssp = OrderedDict()
280        # sorted_sssp[k1][k2] -> (length, list of vertices)
281        # and k2 mapping is ordered
282        for i, (k1, k2, w) in enumerate(connection):
283            if k1 not in sorted_sssp:
284                sorted_sssp[k1] = OrderedDict()
285            sorted_sssp[k1][k2] = (w, self.sssp[k1][k2])
286
287        max_iter = [0 if top is None else top]
288        loops = {}
289
290        def dfs_loop_search(loop: list, visited: set, length: float):
291            # finishing, the last part of the cycle,
292            if len(loop) == n_axis - 1:
293                # check the existence of the last part
294                k1 = loop[-1][-1]
295                k2 = loop[0][0]
296                if k1 not in sorted_sssp or k2 not in sorted_sssp[k1]:
297                    yield
298                last_part = sorted_sssp[k1][k2][1]
299
300                # check knot
301                sl = set(last_part[1:-1])
302                new_visited = visited | sl
303                if not allow_knots and len(new_visited) != len(visited) + len(last_part) - 2 or \
304                        not sl.isdisjoint(i[0] for i in loop):
305                    yield
306                loop.append(last_part)
307                length += sorted_sssp[k1][k2][0]
308                axis = tuple(sorted(i[0] for i in loop))
309
310                if (must_include is None or must_include.issubset(new_visited)) and \
311                        (axis_must_include is None or axis_must_include.issubset(axis)):
312                    if axis in loops:
313                        # check if smaller than existed
314                        if loops[axis]['length'] < length:
315                            yield
316                        else:
317                            max_iter[0] += 1
318                    max_iter[0] -= 1
319                    loops[axis] = {
320                        'loop': loop,
321                        'length': length,
322                    }
323            # start
324            elif len(loop) == 0:
325                for k1, k2, w in connection:
326                    v = sorted_sssp[k1][k2][1]
327                    yield dfs_loop_search([v], set(v), w)
328            else:
329                k1 = loop[-1][-1]   # last axis
330                for k, (w, c) in sorted_sssp[k1].items():
331                    sc = set(c[1:])
332                    new_visited = visited | sc
333                    if len(new_visited) == len(visited) + len(c) - 1 or \
334                            allow_knots and sc.isdisjoint(i[0] for i in loop):
335                        yield dfs_loop_search(loop + [c], new_visited, length + w)
336            yield
337
338        gen = dfs_loop_search([], set(), 0)
339        stk = []
340        while top is None or max_iter[0] > 0:
341            if isinstance(gen, typing.Generator):
342                stk.append(gen)
343                gen = next(gen)
344            else:
345                stk.pop()
346                if not stk:
347                    break
348                gen = stk[-1].send(gen)
349
350        return loops

Using connected single source shortest paths to find loops. You should ensure that in your graph smaller edge weights means stronger connection for this method.

First, use Bellman Ford to find the shortest paths between all regions. Then loops can be found by pairing any shortest paths like A -> B and B -> A. You can also find loops with more anchors' shortest path like A -> B, B -> C, C -> A, and even more. But with more anchors, the computation will be greatly slowed down, and number of discovery will be greatly expanded.

When the search number is very great, you can use top to have an early stop. The shortest path are sorted such that a loop with minimal shortest path element will be retrieved first, but it won't guarantee the final loop will be ordered ascending. Besides, you can also set the must included nodes in the loop so that those without are skipped.

The method attempts to located unique loops by the axes, i.e. one set of axes corresponding to one loop. It's likely that multiple axes can be connected in multiple ways, in which case once finding a shorter one a replacement will occur. The purpose is for balance between the uniqueness and computation. Since the shortest path will also be returned, you can check other connection ways easily.

This can fail to give a loop between a set of axis, because there may be overlap between their shortest path, thus giving rise to smaller loops, whereas there may exist an exact loop that contain non shortest paths but unique vertices. Therefore, the shortest path from A to B here is better off interpreted as the most preferable path from A to B, which means other paths from A to B are not considered.

Parameters
  • n_axis: the number to consider for connecting shortest path into a loop. Larger n_axis will significantly increase the computation
  • top: the most prominent results. This can speed up as it searches from the most probable loops and can have an early stop. Default as None to turn off.
  • axis_pool: a list of vertices for axes to choose from, which can largely narrow down the search space. Default as None to turn off.
  • must_include: the vertices that must appear in the search result.
  • axis_must_include: the vertices that must appear in axes, and must be subset of axis_pool. This check is independent of must_include.
  • allow_knots: whether to allow duplicate nodes (but not duplicated with axes).
  • priority: the edge weight aggregation function for shortest path sorting. Default as sum, you can also use, for example, mean to make up for shortest path that travel through many vertices.
Returns

a dict of axis mapped to loop in the form of loops[(a,b,c)] -> [[a, ..., b], [b, ..., c], [c, ..., a]], where the key (a,b,c) is a sorted tuple of axes, doesn't necessarily mean a -> b -> c -> a, and shortest path in the form of sssp[from_node][to_node] -> [from_node, ..., to_node]

def pair_complement(self, axis_pool: Iterable = None):
352    def pair_complement(self, axis_pool: typing.Iterable = None):
353        """
354        Given a pair of vertices, the loop is defined as the direct connection between them plus the inverted shortest path.
355        For example, with A and B, the loop can be either edge AB + the shortest path from B to A
356        or edge BA + the shortest path from A to B. This comes in handy when shortest paths from A to B and B to A have
357        intersection, leading to knots in a loop.
358
359        This can also be much faster and doesn't need speedup or sorting.
360
361        :param axis_pool: a list of vertices for axes to choose from, which can largely narrow down the search space.
362            Default as None to turn off, and it will try every pair of vertices as long as there's a direct edge.
363        :return: a dict of loops, keys are the direct edges and values are the corresponding inverse shortest path.
364        """
365        if axis_pool is not None:
366            axis_pool = set(axis_pool)
367        if self.sssp is None:
368            self.init()
369        out = {}
370        for fro, to in self.graph.edges:
371            if axis_pool is not None and (fro not in axis_pool or to not in axis_pool) or to == fro:
372                continue
373            if to in self.sssp and fro in self.sssp[to]:
374                out[(fro, to)] = self.sssp[to][fro]
375        return out

Given a pair of vertices, the loop is defined as the direct connection between them plus the inverted shortest path. For example, with A and B, the loop can be either edge AB + the shortest path from B to A or edge BA + the shortest path from A to B. This comes in handy when shortest paths from A to B and B to A have intersection, leading to knots in a loop.

This can also be much faster and doesn't need speedup or sorting.

Parameters
  • axis_pool: a list of vertices for axes to choose from, which can largely narrow down the search space. Default as None to turn off, and it will try every pair of vertices as long as there's a direct edge.
Returns

a dict of loops, keys are the direct edges and values are the corresponding inverse shortest path.