Chapter 8 Incremental sorting
innovation("Incremental-Zacksort","E","Amortized Incremental-Zacksort")
dependency("Incremental-Zacksort","Zacksort")
innovation("Zackheaps","E","Symmetric (tie-handling) Quickheaps")
dependency("Zackheaps","Incremental-Zacksort")
innovation("Incremental-Frogsort","E","Amortized Incremental-Frogsort")
dependency("Incremental-Frogsort","Frogsort")
innovation("Frogsteps","E","Symmetric stable dictionary")
dependency("Frogsteps","Incremental-Frogsort")
So far we have looked at basic batch-sorting, where the set of sorted elements is known once the sorting starts. Incremental sorting deals with expanding sorted sets as data arrives, potentially in single elements (Online sorting).
8.1 Incremental insertion
The most known Online sorting algorithm is Insertionsort, which by default does this: integrating a single element in a sorted set. Unfortunately, Insertionsort has \(\mathcal{O}(N^2)\) cost, even if binary search is used to find the position in the sorted set at which to insert the new element (due to \(\mathcal{O}(N)\) move cost per inserted element). No algorithm is known that can expand a contiguous sorted set by a single element at only \(\mathcal{O}(\log{N})\) cost. All \(\mathcal{O}(N\log{N})\) incremental sorting algorithms somehow relax the requirement, that a sorted set is maintained in contiguous space at any time (after each single element expansion). Examples of relaxing the requirement of contiguous sorting is indexing such as binary trees. Examples of relaxing the requirement that the set is sorted at any time is amortized sorting which somehow collects the changes and does the real work in batches.
An example of a combined strategy is Librarysort (Bender, Farach-Colton, and Mosteiro (2006)), a variant of Insertionsort that keeps random gaps between the sorted elements (hence not contiguous). With enough random gaps inserting a elements requires only local moves and hence \(\mathcal{O}(1)\) insertion cost - like inserting a book in a book-shelf. Once the amount of gaps gets too low, physical rearrangement in a larger piece of memory costs \(\mathcal{O}(N)\) which amortizes to \(\mathcal{O}(N\log{N})\). In spite of everyone’s’ experience with book-shelves this algorithms was discovered as late as 2006, and it has the underappreciated ability to sort elements of varying size.
A classic algorithm for real-time application is the maintenance of a heap, which integrates a new element at cost of only \(\mathcal{O}(\log{N})\), but sacrifices contiguous sorting (limited usefulness) and it sacrifices total speed due to heavy random access.
8.2 Incremental divide&conquer
A generic strategy to obtain a amortized incremental Divide&Conquer algorithm is to convert it into a data-structure on which the algorithm can be paused and resumed. Once that is available, it is possible to run it with interruptions and inject the new data at that position where the algorithm naturally would pick it up first. With this strategy each batch-algorithm hat its incremental dual.
For example the incremental dual of Quicksort is the Incremental-Quicksort of Navarro and Paredes (2010) with \(\mathcal{O}(N\log{N})\) amortized cost. By applying the greeNsort® innovations one obtains a Incremental-Zacksort21 with \(\mathcal{O}(N\log{D})\) amortized cost, i.e. cheaper when duplicates reduce the number of distinct values such that \(D < N\). Navarro and Paredes (2010) turned Incremental-Quicksort into a fully-fledged priority queue named Quickheaps. By combining with Incremental-Zacksort one obtains Zackheaps with efficient tie-handling22.
Of course Quickheaps and Zackheaps inherit all weaknesses of Quicksort (see Split&Merge Algorithms). The incremental dual of Frogsort – Incremental-Frogsort – by contrast is stable, can handle elements of varying size and has deterministic \(\mathcal{O}(N\log{N})\) amortized cost23. Different from an incremental Mergesort that uses up to 100% buffer for maintaining a hierarchy of sorted runs, Incremental-Frogsort needs only 50% buffer. Different from Librarysort with its randomized gaps, Incremental-Frogsort has deterministic guarantees due to its (F)lip (R)ecursive (O)rganized (G)aps (FROG).
Like incremental Mergesort, Incremental-Frogsort can be turned into a fully-fledged sorted dictionary named Frogsteps24. Note that the maximum memory requirement of Frogsteps is lower with 150% (compared to 200%), but the average memory requirement is not lower: when growing the structure to the next duplication, the traditional method must allocate 2x the current data size, Frogsteps must allocate 3x the current data size. Furthermore Frogsteps needs an extra copy to migrate from the old to the grown memory, whereas the traditional method can merge into the grown memory, hence is always nocopy. But size duplication is a rare event in the lifetime of such a dictionary, during normal operation, Frogsteps benefits from reduced comparison and move-distance.
Such merge-based dictionaries do more work at insertion time than partition-based priority queues, and need less work for many queries, in other words: merge-based dictionaries have better read-amplification, partition-based priority queues have better write-amplification.
Such amortized data-structures and algorithms can further be developed into real-time indexes, such as Lazy Search Trees (Sandlund and Wild (2020)) or LSM-Trees (O’Neil et al. (1996)).
8.3 Incremental conclusion
The purpose of this section was to demonstrate that the greeNsort® innovations are orthogonal to a large body of existing literature and code which refers to binary Divide&Conquer sorting. Potentially all of these existing artefacts of the prior-art could be checked and adapted for the symmetric greeNsort® innovations. This could duplicate the number of these artefacts. Marketing on: greeNsort® is a contribution to exponential growth in scientific publications. Marketing off: realistically, greeNsort® is a contribution to correcting and optimizing, rather than duplicating human knowledge, most importantly when it comes to textbooks on data-structures and algorithms. But still: a fundamental an huge tidy-up project in human knowledge. Plus: I announce yet another plan for a new incremental sorting algorithm.