Binary heap, heapsort

A binary heap is an array-based data structure that is a representation of a complete (i.e. almost full (full i.e. each node has either zero or two children, and those with zero children are at same depth), that is as if from a full tree were possibly some depthmost nodes removed from rightmost but leaving at least one) binary tree — with children of each parent p being at 2p−b+1 and 2pb+2 where b is the first (leftmost) index of the array — that satisfies the heap property which states that children are, in case of max-heap, less or equal (in case of min-heap, greater or equal) to the parent.

For any given index x of a child node, the index p of its parent can be calculated with the formula ⌊(x+b−1)∕2⌋ where b is, again, the first (leftmost) index of the heap's underlying array. To avoid machine integer overflow, equivalent formula ⌊x∕2⌋+⌊((x mod 2)+b−1)∕2⌋ can be used. This formula is useful to calculate what would be the last parent node in a heap of a given length.

(b is 0 in standard C/Java/Go arrays and Python lists if whole array is used for the heap, and typically 1 in algorithm descriptions and also many programming languages)

In the pseudocode below, A'First is the first index of an array A, earlier mentioned in formulas as b.

procedure siftDown(A, e, f)
if ef then
t := f
furthest_parent := furthestParent(e)
while tfurthest_parent loop
i := leftChild(t)
j := t
if A[j] < A[i] then
j := i;
if i < e and then A[j] < A[i + 1] then
j := i+1
if j=t then
exit loop
swap(A[t], A[j])
t := j

To reiterate, what siftDown does is, starting with a given node, comparing the selected node to its children, and if the larger child is larger than the selected node, swapping them and repeating the procedure for the index of the node that we just moved down, otherwise being done.

It could be thought as taking the three nodes and rotating them (and then doing the same with the one previously on top), like what we know from balancing binary search trees, except here the order of the children doesn't matter and thus we make just one swap instead of two — but that can be considered as but an optimization.

Under the assumption that both (all, one if there is just one) subheaps — children of our given node — satisfied the heap property on their own, the (sub- — if f>b)heap resulting from siftDown satisfies the heap property entirely.

procedure heapify(A, e)
if eA'First then
furthest_parent := furthestParent(e)
for i := furthest_parent downto A'First loop
siftDown(A, e, i)

Since single-element (sub)heaps satisfy the heap property always, what heapify does is, by starting with the furthest node that is a parent to any children (one or two) and performing siftDown on it, thus always satisfying the precondition of siftDown by having the subheaps already performed upon. Thus from that, after it reaches the root of heap and performs siftDown on it, the whole heap satisfies the heap property.

procedure heapsort(A, e)
if eA'First then
heapify(A, e)
h := e - 1
while h > A'First then
swap(A, h, A'First)
h := h − 1
siftDown(A, h, A'First)

And heapsort first takes an array that is unassumed to be either sorted or a heap, then makes it satisfy the heap property entirely by performing heapify on it, then takes the largest element — from the heap property, the first one — and shortens the heap by putting it behind and placing the previously leaf-of-heap element on top, then performing siftDown on the lenght of the new heap which is one less than before — and then again, takes the largest element… so on. Finally it ends up with just one element, which is in same place as is it supposed to be by the order, the array being entirely sorted.

See also:

You might be also interested in: