15 Feb 2023
Notes on the correctness of Kosaraju's algorithm
Jaikumar Radhakrishnan


I wrote the following claim in class. 

======================================================================
Suppose w is reachable from v but v is not reachable from w in G. Then,
after running dfs on G, we have

postnumber(v) >= postnumber(w).

This claim is FALSE. Please provide a counterexample.
======================================================================

To prove the correctness of Kosaraju's algorithm, we need to show that
when the second dfs is performed, then for each strongly connected
component C:

(a) explore(v) is called for some vertex v in C from dfs(G);
(b) during this exploration, a vertex w is visited iff it is in the
same strongly connected component as v.
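For reference, here is a minimal Python sketch of the two passes. It assumes the variant analysed below (see Claim 6): the first dfs runs on G-reverse, and the second dfs runs on G with its outer loop taking vertices in decreasing order of the post_number from the first pass. The graph representation (a dict mapping every vertex to a list of out-neighbours) and the names reverse_graph, dfs_finish_order and kosaraju are hypothetical choices for illustration.

```python
def reverse_graph(graph):
    """Return G-reverse: each edge (u, w) of G becomes (w, u)."""
    rev = {u: [] for u in graph}  # assumes every vertex is a key of graph
    for u, nbrs in graph.items():
        for w in nbrs:
            rev[w].append(u)
    return rev


def dfs_finish_order(graph, roots):
    """dfs(graph), outer loop trying `roots` in order.

    Returns the vertices sorted by increasing post_number.
    """
    visited = set()
    finished = []

    def explore(v):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                explore(w)
        finished.append(v)  # v receives the next post_number here

    for v in roots:
        if v not in visited:
            explore(v)
    return finished


def kosaraju(graph):
    """Return the sccs of `graph`, one list per explore call of the second dfs."""
    # First dfs: on G-reverse, recording the order in which vertices finish.
    finished = dfs_finish_order(reverse_graph(graph), list(graph))

    # Second dfs: on G, outer loop in decreasing post_number order.
    visited = set()
    components = []

    def explore(v, comp):
        visited.add(v)
        comp.append(v)
        for w in graph[v]:
            if w not in visited:
                explore(w, comp)

    for v in reversed(finished):
        if v not in visited:
            comp = []
            explore(v, comp)  # by Claim 6 this visits exactly one scc
            components.append(comp)
    return components


G = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [5], 5: [3]}
print(kosaraju(G))  # [[3, 4, 5], [0, 1, 2]]
```

In the example, the scc {3, 4, 5} is a sink of the scc DAG of G, and it is the first one the second dfs finds. The explore functions are recursive, so this sketch is only meant for small graphs.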

We proceed as follows. (I consulted
https://sharmaeklavya2.github.io/theoremdep/nodes/graph-theory/kosaraju-algo.html
)

I. A general property valid for all DFS

---
Claim 1: Let v and w be vertices of a graph H. (Warning: sometimes H
will be G-reverse.) Consider the moment when explore(v) is
called. Then, the following are equivalent.

(a) w is reachable from v by a path passing through unvisited vertices;
(b) w is a descendant of v in one of the trees constructed by dfs;
(c) pre_number[v] <= pre_number[w] <= post_number[w] <= post_number[v]

Proof. The claim (b) iff (c) is obvious because (b) holds iff explore(w) was
called in the exploration initiated by explore(v).

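Again (b) -> (a) is straightforward: an edge (x,y) is in the DFS tree
only if visited[y] was False when explore(x) examined that edge, so
every vertex on the tree path from v to w was still unvisited at the
moment explore(v) was called, and this tree path witnesses (a). To
show (a) -> (b), let the path from v to w be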

         v_0 = v -> v_1 -> v_2 -> ... -> v_k = w

By our assumption, when explore(v) was called, visited[v_i] was False
for i = 0,1, ..., k.  Now, let i be the largest index for which
explore(v_i) is called during the exploration initiated by
explore(v). Note that i is well defined, because 0 is in the set of
such indices. Let us assume i < k, and derive a contradiction. Then,
explore(v_{i+1}) was not called during this exploration, so
visited[v_{i+1}] was still False when explore(v_i) returned (any
explore call made before explore(v) returns is part of this
exploration). But then, in the for-loop of explore(v_i), when its
neighbour v_{i+1} was considered, explore(v_{i+1}) must have been
called, a contradiction. Thus i = k, that is, explore(w) was called
during the exploration initiated by explore(v). Claim 1 follows from
this.
---
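A minimal Python sketch of dfs(H) with pre/post numbering, mirroring the names used above (explore, visited, pre_number, post_number). Assumptions: H is a dict mapping every vertex to a list of out-neighbours, and pre- and post-numbers come from one shared clock, so that comparisons such as post_number[w] < pre_number[v] (used in Claim 3) make sense. The parent map and the helpers is_tree_descendant and nested are hypothetical names standing for conditions (b) and (c).

```python
def dfs(H):
    """dfs(H) with pre/post numbering from one shared clock."""
    visited = {v: False for v in H}
    pre_number, post_number, parent = {}, {}, {}
    clock = 0

    def explore(v):
        nonlocal clock
        visited[v] = True
        clock += 1
        pre_number[v] = clock
        for w in H[v]:
            if not visited[w]:
                parent[w] = v  # (v, w) becomes a dfs tree edge
                explore(w)
        clock += 1
        post_number[v] = clock

    for v in H:
        if not visited[v]:
            parent[v] = None  # v is the root of a new dfs tree
            explore(v)
    return pre_number, post_number, parent


def is_tree_descendant(parent, v, w):
    """Condition (b): v lies on the tree path from w up to its root."""
    while w is not None:
        if w == v:
            return True
        w = parent[w]
    return False


def nested(pre_number, post_number, v, w):
    """Condition (c): the interval of w sits inside the interval of v."""
    return pre_number[v] <= pre_number[w] <= post_number[w] <= post_number[v]


H = {"a": ["b", "c"], "b": ["a"], "c": []}
pre, post, parent = dfs(H)
print(pre)   # {'a': 1, 'b': 2, 'c': 4}
print(post)  # {'b': 3, 'c': 5, 'a': 6}
print(all(is_tree_descendant(parent, v, w) == nested(pre, post, v, w)
          for v in H for w in H))  # True
```

The last line checks (b) iff (c) over all pairs of vertices; condition (a) refers to the moment explore(v) is called, so it is not tested here.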
 
For an scc C of H, define         

pre_number[C] = min {pre_number[v]: v in C}
post_number[C] = max {post_number[v]: v in C}
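These scc-level numbers are just a min and a max over each scc. A small sketch, assuming a hypothetical map scc_of from each vertex to a label for its scc (how that map is obtained is a separate matter):

```python
def scc_numbers(scc_of, pre_number, post_number):
    """pre_number[C] = min over v in C; post_number[C] = max over v in C."""
    pre_C, post_C = {}, {}
    for v, C in scc_of.items():
        pre_C[C] = min(pre_C.get(C, pre_number[v]), pre_number[v])
        post_C[C] = max(post_C.get(C, post_number[v]), post_number[v])
    return pre_C, post_C


# Numbers a single-clock dfs assigns to H = {a: [b, c], b: [a], c: []}
# when it starts at a; the sccs are {a, b} (label "C") and {c} (label "D").
pre_number = {"a": 1, "b": 2, "c": 4}
post_number = {"a": 6, "b": 3, "c": 5}
scc_of = {"a": "C", "b": "C", "c": "D"}
print(scc_numbers(scc_of, pre_number, post_number))
# ({'C': 1, 'D': 4}, {'C': 6, 'D': 5})
```

In this example a is the first vertex of C to be explored, and the numbers of C coincide with those of a; since (a, c) is an edge from C to D, post_number[D] = 5 < 6 = post_number[C].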

---
Claim 2: Suppose v is the first vertex in C for which explore(v) was
called during dfs(H). Then,

pre_number[C] = pre_number[v]
post_number[C] = post_number[v]

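Proof. When explore(v) is called, no vertex of C has been visited, and
every u in C is reachable from v by a path that stays inside C (any
vertex on a path from v to u lies in the scc of v). So, by Claim 1,
pre_number[v] <= pre_number[u] and post_number[u] <= post_number[v]
for every u in C.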
---
---
Claim 3: Suppose C and D are distinct scc's in H, such that (C,D) is a directed
edge in the DAG of strongly connected components (that is, (v,w) is an
edge of H for some v in C and some w in D). Then,

post_number[D] < post_number[C]

Proof. We have two cases.

Case pre_number[D] < pre_number[C]: Let w be the first vertex of D
for which explore(w) was called. So, at the moment explore(w) was
called, no vertex of C had been visited, and none of them will be
visited in this exploration, because there is no path from D to C
(otherwise C and D would be the same scc). Fix a vertex v in C; then
explore(v) is called only after explore(w) has returned. Hence,

post_number[D] = post_number[w] < pre_number[v] < post_number[v]
<= post_number[C],

where the first equality is Claim 2.

Case pre_number[D] > pre_number[C]: Let v be the first vertex of C
that was explored. Then, pre_number[v] = pre_number[C], and when v
started to be explored no vertex of D had been visited. It follows that
when explore(v) was called, every vertex of D was reachable from v by
a path through unvisited vertices (a path inside C to the tail of an
edge from C to D, followed by a path inside D). Let w be the first
vertex of D explored by dfs. Then,

post_number[D] = post_number[w]   (by Claim 2)
<  post_number[v] (by Claim 1, since w is a descendant of v and v != w)
<= post_number[C] (by definition)

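Since C and D are disjoint and distinct vertices receive distinct
pre-numbers, pre_number[C] != pre_number[D]; so the two cases are
exhaustive, and in both of them the claim holds.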
---
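Claim 3 can also be checked empirically. The sketch below uses hypothetical helper names and random graphs from Python's random module with a fixed seed; it computes the scc's by brute force from reachability sets, runs a dfs with one shared pre/post clock, and tests post_number[D] < post_number[C] for every edge between distinct scc's. A passing run is evidence, not a proof.

```python
import random


def post_numbers(H):
    """dfs(H) with one shared pre/post clock; returns post_number only."""
    visited, post_number, clock = set(), {}, 0

    def explore(v):
        nonlocal clock
        visited.add(v)
        clock += 1  # pre-visit tick (the pre_number itself is not stored)
        for w in H[v]:
            if w not in visited:
                explore(w)
        clock += 1
        post_number[v] = clock

    for v in H:
        if v not in visited:
            explore(v)
    return post_number


def reachable_from(H, s):
    """All vertices reachable from s (including s), by iterative search."""
    seen, stack = {s}, [s]
    while stack:
        x = stack.pop()
        for y in H[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def claim3_holds(H):
    reach = {v: reachable_from(H, v) for v in H}
    # Brute-force scc of v: the vertices u with u reachable from v and v from u.
    scc = {v: frozenset(u for u in reach[v] if v in reach[u]) for v in H}
    post = post_numbers(H)
    post_C = {}
    for v in H:
        post_C[scc[v]] = max(post_C.get(scc[v], 0), post[v])
    # Every edge (v, w) between distinct scc's C, D must give post[D] < post[C].
    return all(post_C[scc[w]] < post_C[scc[v]]
               for v in H for w in H[v] if scc[v] != scc[w])


def random_graph(n, p, rng):
    """Hypothetical test input: each edge (v, w), v != w, present with prob. p."""
    return {v: [w for w in range(n) if w != v and rng.random() < p]
            for v in range(n)}


rng = random.Random(0)
print(all(claim3_holds(random_graph(8, 0.2, rng)) for _ in range(1000)))  # True
```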
---
Claim 4: If there is a path from C to D in the DAG of scc's of H, then

post_number[C] >= post_number[D],

with strict inequality when C != D.

Proof. Follows from Claim 3 by induction on the length of the path.
---

II. Correctness of Kosaraju's algorithm

---
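Claim 5: Let C be a strongly connected component of G, and consider
the explore calls issued from the outer dfs function in any DFS of G
(in particular, the second dfs). All the vertices of C are visited
during the first of these calls, explore(v), for which C is reachable
from v.

Proof. When explore(v) is called, no vertex on a path from v to C has
been visited: such a vertex can reach C, so if an earlier call from the
outer dfs had visited it, C would have been reachable from the root of
that earlier call. Hence every vertex of C is reachable from v through
unvisited vertices, and the claim follows from Claim 1.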
---

---
Claim 6: In the (second) dfs of Kosaraju's algorithm, a vertex w is
visited in the exploration explore(v) that the outer dfs issues for
the first vertex v (with respect to the new ordering, i.e., decreasing
post_number in dfs(G-reverse)) in the scc of w.

Proof. Let C be the scc of w. By Claim 4 applied to dfs(G-reverse)
(a path from C to D in G is a path from D to C in G-reverse), we have

(*)  post_number[D] > post_number[C]

for all OTHER scc D that are reachable from C. Thus, by the time the
outer DFS considers v, all vertices in other scc's reachable from C
will have been visited. Similarly, again by (*), no vertex of another
scc from which C can be reached will have been visited before the
outer dfs considers v. To summarize, when v is considered by the outer
(second) dfs

-- v is unvisited
-- All vertices of the other scc's reachable from C have been visited
-- No vertex of C has been visited

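By Claim 5, all vertices of C are visited in this exploration starting
with explore(v); no earlier explore call from the outer dfs could reach
C, since no vertex of C has been visited. Conversely, every vertex
visited in this exploration is reachable from v, so it lies in C or in
another scc reachable from C, and the latter have all been visited
already. Hence the vertices visited in this exploration are precisely
the vertices in C.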

---