Online construction of a minimal spanning tree (MST)

Remark: The motivation for this problem was the construction of a Java Applet as part of a Pattern Recognition Project.

We examine the problems of calculating a MST given: or

Deleting, although an interesting problem, is not the one we will discuss in detail here. However, it does merit some mentioning. One can delete a point by keeping a Delaunay Triangulation, which can be maintained online in linear time for each addition or deletion. When a point from an MST is deleted, we are left with d - 1 subtrees, where d is the degree of the vertex being deleted. We are then left with the problem of selecting the shortest edges which connect these trees without creating a cycle. This can be done in O(d²n). [For minimal spanning trees in the plane based on Euclidean distance, d is bounded by the constant 6, and therefore this is O(n). This can be demonstrated in a similar fashion to the proof of Theorem 3 below.]

If anyone has an elegant way of deleting a point from a MST in linear time that doesn't involve carrying around another geometric structure (such as the Delaunay triangulation), or any ideas, I'd love to know, and I'll put it up on this page (with due credit given, of course!).



A linear-time algorithm for adding a point into a MST

Remark: This algorithm is optimal. I'll leave this as an exercise. It's not hard.

Problem Statement: Given a MST of n points, insert a new point x and calculate a MST of the n + 1 points.

Our algorithm has two main parts, as follows:
  1. First, we mark the set of points {p} that are near enough to x to be considered for connecting x to p. We refer to these {p} as the "near points of x," or just "near points" for brevity.
  2. Next, we decide whether or not to connect x to each p. If we can connect x to p and get rid of a longer edge in the MST, then we do so.

The first part of the addition consists of calculating the near points of x, which we will now define by the following algorithm.

Algorithm NearPoints(set S, vertex x)
  1. V <-- S
  2. while (V is not empty) do
  3.           c <-- the vertex in V which is closest to x [This step requires linear time.]
  4.           mark c
  5.           for each vertex p in V do
  6.                     if distance(p,x) >= distance(p,c) then
  7.                               delete p from V
  8.                     endif
  9.           endfor
  10. endwhile
  11. return the marked vertices
end Algorithm NearPoints(S,x)

We define the near points of x as the vertices returned.

The idea here is that when we consider how to connect x to the MST of S, we only need to consider the edges from x to its near points. [We will prove this later, but for now, let's just accept this as fact and see the algorithm.] x will always be connected to its nearest neighbor, a fact which can be proven by examining Kruskal's algorithm and the knowledge that the Nearest Neighbor Graph is acyclic. This gives us n - 1 edges, the number of edges in a tree. Thus, for each other near point to which x is connected, we have to delete some edge in the tree. Since we're interested in the MINIMAL spanning tree, we want to delete the longest edge possible while maintaining a tree structure.

Algorithm AddToMST(minimal spanning tree T, vertex x)
  1. N <-- NearPoints(vertices of T,x)
  2. z <-- the point in N which is closest to x (in other words, x's nearest neighbor)
  3. add an edge between x and z
  4. N <-- N - {z}
  5. for each vertex p in N,
  6.           E <-- LongestEdgeInPath(x,p,T)
  7.           if (distance(x,p) <= length(E)) then
  8.                     delete edge E from T
  9.                     add edge {x,p} to T
  10.           endif
  11. endfor
  12. return T
end Algorithm AddToMST(T,x)

LongestEdgeInPath(vertex x, vertex p, tree T) returns the longest edge in the simple path in T from vertex x to vertex p.


Complexity

Before I discuss the proof of correctness, I will discuss the complexity of the algorithms.

In NearPoints, steps 3-9 consist of an examination of O(n) points, where n is the number of points in the set S. We iterate this from steps 2-10, a number of times equal to the number of points marked. Therefore, NearPoints runs in time O(kn), where k is the number of near points outputted. [We will later show that k is at most 5, and therefore our algorithm runs in time O(n).]

In AddToMST, step 1 runs in time O(kn), as we determined above. The remainder of the algorithm is dominated by k calls to LongestEdgeInPath. LongestEdgeInPath can be computed via a depth first search, which is O(E), where E is the number of edges. Since we have a tree, O(E) is O(n). There are k of these searches, so this section of the algorithm runs in time O(kn), where k is the number of near points of x. Therefore, AddToMST runs in time O(kn).



Proof of Correctness



We will prove two facts:
1. Running AddToMST with a vertex and a MST of n vertices returns a MST of the n + 1 vertices.
2. The set of the near points of x has at most 5 elements, and therefore AddToMST runs in linear time.

Lemma 1. Let S be a set of points in the plane, and T be a MST of S. Then for any three points x, c, and p in S such that
distance(x,p) > distance(x,c), and
distance(p,x) > distance(p,c),
the edge {x,p} does not appear in T.


Remark. Let T be a graph of n nodes. Then T is a tree if and only if it contains no cycles and is connected (which is equivalent to having exactly n - 1 edges and being connected). An undirected graph is connected if and only if from some vertex in the graph, there is a path to any other vertex.

Proof. Let x, p, and p be three points in S such that
distance(x,p) > distance(x,c),
distance(p,x) > distance(p,c).
Let T be a tree of S in which the edge {x,p} appears.
To show that T is not an MST of S, we now consider two cases:
  1. The path in T from c to p contains point x.
  2. The path in T from c to p does not contain point x.
Case 1. The path in T from c to p contains point x.
Let T' be the graph constructed by removing the edge {x,p} from T and adding {c,p} (which cannot be in T without causing a cycle). Note that T' has n - 1 edges. Now we show that T' is connected, and is therefore a tree.

Claim: In T', there is a path from x to any vertex r.
Examine the path in T from x to r. If this path does not contain point p, then the same path exists in T', and the claim is true. If this path does contain point p, then there exists a path in T' as follows:
  • The path in T from x to c is also in T', since this path does not contain p (by the assumption of case 1).
  • Follow the edge {c,p} in T'.
  • The path in T from p to r is also in T', since this path does not contain x (by the assumption that the path x to r contains point p).

    Therefore, T' is connected, and is a tree. T' is like T, except it does not contain edge {x,p} but instead contains edge {c,p}. Since distance(x,p) > distance(c,p), the total length of the edges of T is greater than the total length of the edges in T'. Thus T is not a MST of S.

    Case 2. The path in T from c to p does not contain point x.
    Let T' be the graph constructed by removing the edge {x,p} from T and adding {x,c} (which cannot be in T without causing a cycle). Note that T' has n - 1 edges. We again show that T' is connected, in a similar method to case 1.

    Claim: In T', there is a path from x to any vertex r.
    Examine the path in T from p to r. If this path does not contain point x, then the same path exists in T', and the claim is true. If this path does contain point x, then there exists a path in T' as follows:
  • The path p to c in T is also in T', since this path does not contain x (by the assumption of case 2).
  • Follow the edge {c,x} in T'.
  • The path in T from x to r is also in T', since this path does not contain p (by the assumption that the path p to r contains point x).

    Therefore, T' is connected, and is a tree. Since distance(x,p) > distance(c,p), the total length of the edges of T is greater than the total length of the edges in T'. Thus T is not an MST of S.

    QED



    Lemma 2. Let S be a set of points in the plane. Then we can construct a MST T of S such that for any points x, c in S, and the set
    W = {p in S: distance(x,p) >= distance(x,c), and
              distance(p,x) >= distance(p,c)},
    the edge {x,p} appears in T for at most one element p in W. Furthermore, we can impose that this p be c.

    Proof. It is easier to prove the latter part of the lemma first, and this is what we will do here. We will prove the following:
    1. Suppose the edge {x,c} is not in T. Then for any p in W, if the edge {x,p} is in T, we can remove it and add either {x,c} to obtain T', or {c,p} to obtain T''. One of these is an MST.
    2. We can construct a MST such that x shares an edge with at most one element of W.
    Proof of statement 1. T' and T'' have the same number of edges as T, so we only have to prove that one of them has no cycles to show it is a tree.
    Case 1: The path in T from c to p contains x.
    Case 2: The path in T from c to p does not contain x.

    If case 1 is true, then we can replace {x,p} with {c,p} without creating a cycle. This is because there is no path in T from c to p that does not go through x, so {c,p} cannot create a cycle.

    Why is there no path in T from c to p that does not go through x? Suppose there were. Then a path from c to x followed by {x,p} would be a path from c to p. But by case 1 there is a path from c to p that does not contain x. Thus there are two paths from c to p, and we have a cycle.

    If case 2 is true, then we can replace {x,p} with {x,c} without creating a cycle. This is because there is no path in T from x to c that does not go through p.

    Why is there no path in T from x to c that does not go through p? Suppose there were. Then a path from x to c followed by {x,p} would be a path from c to p. But by case 2 there is a path from c to p that does not contain x. Thus there are two paths from c to p, and we have a cycle. This proves statement 1.

    Proof of statement 2. By the first statement, we can either move one edge between x and a vertex in W to {x,c}, or we can move it to {c,p}. Therefore, we can either remove all the edges between x and points in W, or we can delete one of these edges, say {x,q} (q != c and q is in W) and replace it with {x,c}. Then for all v in W - {c,q}, we can delete {x,v} (if it exists) and replace it with {c,v}.

  • {c,v} was not previously in the tree, since {x,c} and {x,v} were (and {c,v} would make a cycle.)
  • {c,v} cannot create a cycle, since otherwise {x,c} and {x,v} would have created a cycle.
  • The number of edges in the graph is maintained with no cycle, so we have a tree.

    We have replaced the edges {x,v} with {c,v} where v is an element of W. Since every element p of W has the condition that distance(p,x) >= distance(p,c), all the edges that we added are at least as short as the edges we removed. Therefore our new tree has total edge length no greater than the original tree. Since the original tree was an MST, this new tree with at most one edge between x and an element of W is also an MST. This proves statement 2.


    With these two statements, we can create an MST in which there is only an edge between x and exactly one vertex v of W, and we can force v to be c.

    This allows us to show that for one point p which is not a near point, we can create an MST which does not include the edge {x, p}. However, we need to show this for all points that are not near points. We observe that since Lemma 2 only requires the replacement of the edge {x,p} in T' with {x,c}, we can create T from T' without disturbing the rest of the graph. That is, T'\{x,p} = T\{x,c}. Therefore, we can eliminate edges from x to all points except its near points while maintaining an MST. This proves the lemma.

    QED



    Theorem 1. Let S be a set of points in the plane. Then for any vertex x in S, there exists a MST T of S such that the edge {x,p} appears in T only if p is a near point of x.

    Proof. Suppose there exists an edge {x,p} in some MST T', and that p is not a near point of x. Then there exists some near point c such that distance(p,x) >= distance(p,c) since the only manner in which p is not a near point is if it is ruled out in step 6 of the algorithm. Furthermore, c was the closest neighbor to x before p was ruled out, according to step 3 of the algorithm. Therefore, we also have the condition that distance(x,p) >= distance(x,c). Thus, p is a member of the set
    W = {p in S: distance(x,p) >= distance(x,c), and
              distance(p,x) >= distance(p,c)}.

    By Lemma 2, if the edge {x,p} appears in T', we can construct a MST T such that {x,p} appears in T only if p = c. Therefore, if p is not a near point of x, then we can construct a MST which does not include the edge {x,p}.

    QED




    Let's break away from the formalism for a moment to discuss what is intuitively going on. Algorithm AddToMST(T,x) utilizes Theorem 1 in its construction in the sense that it only considers the edges from x to its near points. In other words, AddToMST(T,x) is calculating some MST T whose existence is proven in Theorem 1 with respect to the point x.

    Okay, now what we've all been waiting for . . .


    Theorem 2. Let S be a set of points in the plane on which a MST T has been computed. Let x be a point not in S. Then AddToMST(T,x) returns a MST of the set S + {x}.

    In any MST, each point will be connected to its nearest neighbor. This can be verified by examining Kruskal's algorithm for computing a MST. Since the Nearest Neighbor Graph has no cycles, at some point in Kruskal's algorithm, the tree will be a superset of the Nearest Neighbor Graph. Thus we can connect x to its nearest neighbor z as in step 3 of AddToMST, because this edge will appear in all MST's of S + {x}. Let us refer to the graph obtained by this step as G.

    Note that G has no cycle and exactly |S + {x}| - 1 edges. Thus G is a tree. Now for every other edge we add, we have to delete an edge to maintain the tree structure.

    Adding another edge from x to a near point p would create a cycle. Thus, the algorithm considers placing it in only if it can delete a longer edge which would break this cycle. The longest edge along this cycle is either on the path from x to p, or is the edge {x,p} itself. Call this longest edge E (which may or may not be {x,p}). If we add {x,p}, and remove E, then new tree G' has total edge length at least as small as that of G.

    Remark: Note that adding {x,p} and removing E is equivalent to steps 6 to 9 in the algorithm. [If (x,p) is the longer than the longest edge on the path from x to p, then we do nothing because the edge {x,p} is the longest edge in the cycle. This is equivalent to adding {x,p} and removing {x,p}.]

    We do this for every near point of x, then we obtain a new graph whose total edge length is at least small as the total edge length of G'. Call this new graph H. Note that H is the graph that is returned by AddToMST.

    First, note that H is a tree, since it has the proper number of edges and no cycle. Suppose there is some tree H' obtained by deleting edges in H and adding a like number of edges without creating a cycle. Then for every edge {a,b} in H not in H', there is some edge {j,k} on the path from a to b in H' where {j,k} is not in H. [One of the edges on the path from a to b in H' is not in H; otherwise there would be a cycle in H. And of course, there must be a path from a to b in H', or else H' is not a tree.]

    Claim: The total edge length of H' is at least as large as that of H.
    Suppose that H' is shorter than H. Then for some {a,b} in H not in H', some edge {j,k} in H' not in H which lies on the path in H' from a to b is shorter than {a,b}. Let F be the graph H minus the edge {a,b} plus the edge {j,k}. We now show that F is a tree, by demonstrating that there exists a path in F from vertex a to any other vertex r. If the path in H from a to r does not contain verte x b, then this path is also in F. If the path does contain point r, then the path in F is as follows:
  • The path a to j in H is also in F, since this path does not contain b (by the assumption that the path from a to b contains points j and k).
  • Follow the edge {j,k} in F.
  • The path k to b in H is also in F, since this path does not contain a (by the assumption that the path from a to b contains points j and k).
  • The path in H from b to r is also in F, since this path does not contain a (by the assumption that the path from a to r contains b).

    Thus, F is a spanning tree of S. F is identical to H except it contains {j,k} instead of {a,b}. Since {j,k} is shorter than {a,b}, the total edge length of F is shorter than that of H. So H isn't a minimal spanning tree, and we have a contradiction. This proves the claim.

    Therefore, AddToMST computes some spanning tree T of S + {x}. Any other spanning tree of S + {x} will have total edge length at least as large as T. Therefore T is a minimal spanning tree of S + {x}.

    QED



    We demonstrated above that AddToMST(T,x) runs in time O(kn), where k is the number of near points of x. We now show that for the Euclidean distance metric, the number of near points of x is at most 5. Therefore AddToMST(T,x) runs in linear time.

    Theorem 3. Let S be a set of points in the plane, and x be a point in S. Then x has at most 5 near points by the Euclidean distance metric.

    Proof. For each near point c of x, no other point p can be a near point if p lies in the set
    W = {q: distance(x,p) >= distance(x,c), and
              distance(p,x) >= distance(p,c)}.

    Define Cone(x,c) as the closed cone centered at x, extending 60° above the line (x,c) and extending 60° below the line (x,c). This is shown by the solid lines in the diagram below.



    Note that the subspace R = {z: z is in Cone(x,c) and distance(z,x) >= distance(z,c)} is contained in the subspace W. Therefore there can be no near points inside R. [R is the space in the cone and outside the circle. The subspace W consists of points to the right of the dotted line and outside the circle.]

    Since c is a near point of x, there does not exist some other near point v such that
    distance(x,c) >= distance(x,v), and
    distance(c,x) >= distance(c,v).

    This leads us to the following claim.

    Claim: If c is a near point of x, then there are no near points inside the subspace U = {q: q is in Cone(c,x) and distance(q,x) < distance(c,x)}.
    Suppose there were some near point v inside U. Then
    distance(x,c) > distance(x,v), and
    distance(c,x) > distance(c,v).
    Then c is not a near point of x, since after v is marked in step 4 of the algorithm, c will be deleted from the list of possible candidates for the near points. This is a contradiction.

    Therefore, c is the only near point in Cone(x,c). This is equivalent to the fact that no closed cone of angle 60° centered at x contains more than one near point. Thus, for any two near points c and d, the angle cxd is 60° + e for some e > 0. Let µ = minimum{angle cxd - 60°} over all near points c and d (c != d). Since all e > 0, µ > 0. So for any two near points c and d, the angle cxd is at least 60° + µ. Since all near points have to fit in 360° around x, there can be at most 5 near points.

    QED



    Conclusion


    We have proven that AddToMST(T,x) returns a minimal spanning tree of the set S + {x} where S is the set of the nodes in T. We have also demonstrated that AddToMST runs in time O(kn), where k is the number of near points of x. Since the number of near points of any point is at most 5, AddToMST runs in time O(n).


    Questions? Comments? Typos? Counterexamples(gulp!)?  Please e-mail me.


    Mike Soss / soss@cs.mcgill.ca / 12 April 1997



    Keywords

    minimal spanning tree, near points, computational geometry, dynamic computational geometry, geometry