15 Feb 2023

Notes on the correctness of Kosaraju's algorithm
Jaikumar Radhakrishnan

I wrote the following claim in class.

======================================================================
Suppose w is reachable from v but v is not reachable from w in G. Then,
after running dfs on G, we have postnumber(v) >= postnumber(w).

This claim is FALSE. Please provide a counterexample.
======================================================================

To prove the correctness of Kosaraju's algorithm, we need to show that
when the second dfs is performed, for each strongly connected
component C:

  (a) explore(v) is called for some vertex v in C from dfs(G);
  (b) during this exploration, a vertex w is visited iff it is in the
      same strongly connected component as v.

We proceed as follows. (I consulted
https://sharmaeklavya2.github.io/theoremdep/nodes/graph-theory/kosaraju-algo.html )

I. A general property valid for all DFS

---
Claim 1: Let v and w be vertices of a graph H. (Warning: sometimes H
will be G-reverse.) Consider the moment when explore(v) is called.
Then, the following are equivalent.

  (a) w is reachable from v by a path passing through unvisited vertices;
  (b) w is a descendant of v in one of the trees constructed by dfs;
  (c) pre_number[v] <= pre_number[w] <= post_number[w] <= post_number[v]

Proof. The claim (b) iff (c) is obvious because (b) holds iff
explore(w) was called in the exploration initiated by explore(v).
Again (b) -> (a) is straightforward, because an edge (x,y) is in the
DFS tree only if visited[y] was False when explore(x) was called.

To show (a) -> (b), let the path from v to w be

  v_0 = v -- v_1 -- v_2 ... -- v_k = w

By our assumption, when explore(v) was called, visited[v_i] was False
for i = 0,1, ..., k. Now, let i be the largest index for which
explore(v_i) is called during the exploration initiated by explore(v).
Note that i is well defined, because 0 is in the set of such indices.
Let us assume i < k, and derive a contradiction.
Then, explore(v_{i+1}) was not called, so visited[v_{i+1}] remained
False when explore(v_i) returned. But then, in the for-loop of
explore(v_i), when its neighbour v_{i+1} was considered,
explore(v_{i+1}) must have been called, which is a contradiction.
Thus, for each i, explore(v_i) was called. In particular, explore(w)
was called during the exploration initiated by explore(v).
Claim 1 follows from this.
---

For an scc C of H, define

  pre_number[C]  = min {pre_number[v]: v in C}
  post_number[C] = max {post_number[v]: v in C}

---
Claim 2: Suppose v is the first vertex in C for which explore(v) was
called during dfs(H). Then,

  pre_number[C]  = pre_number[v]
  post_number[C] = post_number[v]

Proof. Follows from Claim 1: when explore(v) is called, every vertex
w of C is reachable from v by a path inside C, and no vertex of C has
been visited yet, so (c) of Claim 1 holds for each w in C.
---
---
Claim 3: Suppose C and D are distinct scc's in H, such that (C,D) is
a directed edge in the DAG of strongly connected components (that is,
(v,w) is an edge of H for a v in C and a w in D). Then,

  post_number[D] < post_number[C]

Proof. We have two cases.

Case pre_number[D] < pre_number[C]: Let w be the first vertex of D
for which explore(w) was called. So, at the moment explore(w) was
called, no vertex of C had been visited, and none of them will be
visited in this exploration because there is no path from D to C.
Fix a vertex v in C. Then,

  post_number[D] = post_number[w] < pre_number[v] < post_number[v]
                <= post_number[C].

Case pre_number[D] > pre_number[C]: Let v be the first vertex of C
that was explored. Then, pre_number[v] = pre_number[C], and when v
started to be explored no vertex of D had been explored. It follows
that when explore(v) was called, all vertices of D were reachable
from v by paths that passed through unvisited vertices. Let w be the
first vertex of D explored by dfs. Then,

  post_number[D]  = post_number[w]   (by Claim 2)
                  < post_number[v]   (by Claim 1 and since v,w are distinct)
                 <= post_number[C]

So in both cases, the claim holds.
---
---
Claim 4: If there is a path from C to D in the DAG of scc's of H, then

  post_number[C] >= post_number[D]

Proof.
Follows from Claim 3 and induction.
---

II. Correctness of Kosaraju's algorithm

---
Claim 5: Let C be a strongly connected component of G. Consider the
various explore calls issued from the outer dfs function of the
second dfs. In any DFS of G, all the vertices of C are visited when,
for the first time, explore(v) is called for a vertex v from which C
is reachable.

Proof. Follows from Claim 1.
---
---
Claim 6: In the (second) dfs of Kosaraju's algorithm, a vertex w is
visited in the exploration when explore(v) is called from the outer
dfs with the first vertex v (wrt the new ordering) in the same scc
as w.

Proof. Let C be the scc of w. By Claim 4, in the dfs(G-reverse)

  (*) post_number[D] > post_number[C]

for all OTHER scc D that are reachable from C in G. (A path from C to
D in G is a path from D to C in G-reverse, and the inequality is
strict because distinct scc's have distinct post numbers.) Thus, by
the time the outer DFS considers v, all vertices in other scc's
reachable from C will have been visited. Similarly, again by (*), no
other scc from which C can be reached will have been visited before
the outer DFS considers v.

To summarize, when v is considered by the outer (second) dfs

  -- v is unvisited
  -- All other scc D reachable from C have been visited
  -- No vertex of C has been visited

By Claim 5, the vertices visited in this exploration starting with
explore(v) are precisely the vertices in C.
---
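
The argument above can be tried out on a small graph. What follows is
only a minimal Python sketch, not code from the class. The names
explore, dfs, visited, pre_number and post_number follow the notes.
Everything else is an assumption made for illustration: the helper
names reverse and kosaraju, the clock counter, the dictionary
representation of the graph, the example graph itself, and the reading
of "the new ordering" as decreasing post_number from dfs(G-reverse)
(which is what the proof of Claim 6 seems to need).

```python
def dfs(graph, order):
    """DFS on `graph` (dict: vertex -> list of out-neighbours).

    The outer loop considers vertices in the given `order`.
    Returns (pre_number, post_number, trees), where trees[i] lists the
    vertices visited by the i-th explore call issued from the outer loop.
    """
    visited = set()
    pre_number, post_number = {}, {}
    clock = 0
    trees = []

    def explore(v, tree):
        nonlocal clock
        visited.add(v)
        clock += 1
        pre_number[v] = clock
        tree.append(v)
        for w in graph.get(v, []):
            if w not in visited:
                explore(w, tree)
        clock += 1
        post_number[v] = clock

    for v in order:
        if v not in visited:
            tree = []
            explore(v, tree)
            trees.append(tree)
    return pre_number, post_number, trees


def reverse(graph, vertices):
    """Return G-reverse: every edge (v, w) becomes (w, v)."""
    rev = {v: [] for v in vertices}
    for v in vertices:
        for w in graph.get(v, []):
            rev[w].append(v)
    return rev


def kosaraju(graph, vertices):
    """Return the list of scc's, one per outer explore call of the second dfs."""
    # First dfs: on G-reverse, only to obtain post numbers.
    _, post_number, _ = dfs(reverse(graph, vertices), vertices)
    # Second dfs: on G, outer loop in decreasing post_number order
    # (assumed meaning of "the new ordering").
    order = sorted(vertices, key=lambda v: post_number[v], reverse=True)
    _, _, trees = dfs(graph, order)
    return trees


# Hypothetical example: scc's {a, b} -> {c, d} -> {e}.
V = ["a", "b", "c", "d", "e"]
G = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c", "e"], "e": []}

print(kosaraju(G, V))  # [['e'], ['c', 'd'], ['a', 'b']]
```

Each list in the output is the set of vertices visited by one explore
call from the outer loop of the second dfs, as in Claim 6. The
recursive explore is fine for small examples; for large graphs an
explicit stack would be needed to stay within Python's recursion limit.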