Here we comment on the energy-efficiency status of sorting in some programming languages and recommend actions for improvements.

We welcome comments on other languages, corrections, updates and - most importantly - improvements.

Compatibility means deliberately repeating other people’s mistakes — David Wheeler

All the following programming languages had their reasons to shape their sorting APIs and implementations as they did; however, that was before the climate crisis. Now programming languages as a whole are measured by their energy efficiency, see Energy Efficiency across Programming Languages, and - no surprise - it turns out that some languages seem more efficient than others, particularly compiled languages, but also Java. Sadly, some of the languages gaining popularity for expensive big-data applications are interpreted (*R* and *Python*). If - for some reason - people need to stick with those instead of switching to more efficient languages (e.g. Julia), then it becomes even more important to maximize the efficiency of the compiled sorting libraries they use.

Three types of actions can improve sorting efficiency in standard libraries:

1. improving implementations (within unchanged interfaces)
2. enhancing interfaces (new parameters, new interfaces)
3. replacing interfaces (changing existing interfaces for the better)

1. gives quick wins, 2. should be doable, and perhaps even 3.

Before we comment on individual languages, we briefly

- recall the most important *greeNsort*^{®} *algorithms*
- list the *quick win actions* for each language
- describe the *ideal interfaces*
- comment on *string sorting*
- comment on *multithreading*

Insensitive choice of algorithms and data structures lead to significant energy wasting — Fethullah Goekkus^{1}

The following *greeNsort ^{®}* algorithms are available to create better implementations:

- stable 100% buffer: *Omitsort* or *Octosort* are efficient adaptive (\(\mathcal{O}(N)\)) replacements for *Mergesorts* (and faster than *Timsort* in the \(\mathcal{O}(N\log{N})\) cases)
- stable 100% buffer: *Kiwisort* or *Swansort* are replacements for *Mergesorts* when \(\mathcal{O}(N\log{D})\) adaptivity to ties is paramount
- stable ≤50% buffer: *Frogsort*, *Geckosort* and *Squidsort* are efficient memory-parsimonious replacements for *Mergesorts* (incl. *Timsort*)
- stable √N buffer: *Walksort* or *Jumpsort* are efficient replacements for *in-place Mergesort*, *Grailsort* and similar algorithms
- unstable 0% buffer: *ZacksortB* or *ZucksortB* are efficient replacements for *Quicksort* with cheap \(\mathcal{O}(N\log{D})\) adaptivity to ties

For more information on the algorithms see the *Portfolio*.

The following quick-wins can be achieved by improving implementations:

| Language | Opportunity | Action |
|---|---|---|
| C | faster sorting still adaptive to ties | replace Quicksort3 with ZacksortB |
| C++ | faster sorting with the same amount of memory, or simply save memory | replace Quicksort2 with ZacksortB or Pdqsort; at 0% buffer replace inplace-mergesort with Grailsort; at negligible sqrt(N) buffer add Walksort; at parametrizable 14% buffer add Frogsort2; replace 50%-buffer Mergesort with Frogsort1; replace 100%-buffer Mergesort with the more adaptive Omitsort |
| Clojure | see Java | see Java |
| Groovy | see Java | see Java |
| Java | faster sorting still adaptive to presorting; faster sorting still adaptive to ties | replace Timsort with Squidsort1; replace Dupisort with ZacksortB |
| JRuby | see Java | see Java |
| Kotlin | see Java | see Java |
| Python | faster sorting still adaptive to presorting | replace Timsort with Squidsort1 |
| Rust | faster sorting still adaptive to presorting | replace Timsort with Squidsort1 |
| Scala | faster sorting still adaptive to ties; half buffer and half copying; see Java | replace Quicksort3 with ZacksortB; replace Copysort with Frogsort1; see Java |

After quick-wins, more can be saved by creating optimal interfaces.

- an ideal interface should take declarative specification arguments (e.g. need-stable: yes/no) rather than imperative arguments (use-quicksort)
- an ideal interface should choose the most efficient algorithm that fulfills the specification
- an ideal interface should choose unstable inplace algorithms when sorting keys only, where stability doesn’t matter (Zacksort)
- an ideal interface should allow specifying a key-function or a comparison function (or neither)
- an ideal interface should have energy-efficient defaults, such as using radix-sorts for atomic types and strings (Skasort)
- an ideal interface should allow specification of user preferences regarding memory-speed trade-offs (to choose between Walksort, Frogsort2 and Frogsort1)
- an ideal interface should implement algorithms for equally-sized elements (Frogsort) and size-varying elements (VFrogsort or XFrogsort)
- an ideal interface should let users express prior knowledge about structure in the input data (Squidsort1 and Squidsort2, Swansort)
- an ideal interface should provide efficient string sorting, with a radix-sorting default (Skasort) and a size-varying method that is adaptive to duplicates (VSwansort), see below
- an ideal interface should allow parallel execution (with a user-specified limit on the number of threads, processes, machines), see below
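To make the declarative style concrete, here is a minimal sketch in Python of what such an interface could look like. The function name `green_sort` and all its parameters are hypothetical; the body only validates the specification and falls back to Python's built-in sorting as a placeholder for the specialized algorithms named above.

```python
from typing import Any, Callable, Iterable, List, Optional

def green_sort(
    data: Iterable[Any],
    stable: bool = True,             # declarative: need-stable yes/no
    key: Optional[Callable] = None,  # optional key-extraction function
    buffer_fraction: float = 1.0,    # memory-speed trade-off preference
    threads: int = 1,                # upper limit for parallel execution
) -> List[Any]:
    """Hypothetical declarative sorting interface (sketch only).

    A real implementation would dispatch on the specification:
    - stable=False, keys only     -> unstable inplace sort (e.g. Zacksort)
    - buffer_fraction around 0.5  -> Frogsort1
    - buffer_fraction around 0.14 -> Frogsort2
    - tiny buffer_fraction        -> Walksort
    - atomic keys                 -> radix sort (e.g. Skasort)
    """
    if not 0.0 <= buffer_fraction <= 1.0:
        raise ValueError("buffer_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    # Placeholder dispatch: Python's built-in sort is stable, so it
    # satisfies both the stable and the (weaker) unstable specification.
    return sorted(data, key=key)
```

The caller states *what* is needed (stability, memory budget, parallelism limit), and the library is free to pick the cheapest algorithm that satisfies it.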

How such interfaces look in detail, and whether they need just enhancement or replacement, strongly depends on the programming language.

Any frequently executed code that manipulates strings is fertile ground for optimization — Kurt Guntheroth^{2}

**String sorting is unduly expensive. The reason is localization**: respecting the locale order of letters prevents us from using a simple efficient radix sort on strings. Instead we use a more expensive comparison sort, and for each comparison we call an expensive function handling dozens of special cases of the given locale. We found localized string sorting with C++ `std::sort` up to 40 times more expensive than radix sorting with *Skasort*. The most effective way to reduce the cost of string sorting would be an international agreement on a character set (UTF-8) and no locale, i.e. simple byte order.
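The difference between cheap byte order and locale collation can be illustrated in a few lines of Python. Codepoint order (the analogue of byte order for UTF-8) is radix-sortable but sorts all uppercase letters before lowercase ones; locale-aware collation via `locale.strxfrm` fixes that, at the cost described above. The locale-aware branch is left commented out because which named locales are installed varies by system.

```python
import locale

words = ["apple", "Zebra", "banana"]

# Simple codepoint order (analogous to byte order for UTF-8):
# cheap and radix-sortable, but 'Z' (90) sorts before 'a' (97).
by_codepoint = sorted(words)
print(by_codepoint)  # -> ['Zebra', 'apple', 'banana']

# Locale-aware collation would use the locale's transform as sort key;
# this is the expensive path the text describes (not run here, since
# availability of locales such as 'en_US.UTF-8' is system-dependent):
# locale.setlocale(locale.LC_COLLATE, "en_US.UTF-8")
# by_locale = sorted(words, key=locale.strxfrm)
```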

All problems in computer science can be solved by another level of indirection … except for the problem of too many layers of indirection — David Wheeler

**There is another reason that string sorting is unduly expensive: the prior art often sorts strings indirectly by moving pointers to strings**. This seems clever because - if the pointer is smaller than the string - it can theoretically save movement costs in true random-access memory. It is also convenient to use *Quicksort* with pointers, because we know *Quicksort* to be fast, and it cannot handle size-varying strings, only equally-sized pointers. However, in today’s deep cache-hierarchies this approach doesn’t scale to big \(N\): access costs depend on access distance, and moving pointers instead of strings implies random access to the memory where the strings are actually stored, whether or not the strings are finally re-arranged. A suitably implemented *Mergesort* can directly move strings during the merge-passes; this can easily reduce sorting time and energy by 33%. And the *greeNsort*^{®} algorithms can sort size-varying strings directly.
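The contrast between indirect and direct sorting can be sketched as follows (Python for brevity; in C the `order` array would hold pointers rather than indices). The indirect variant sorts a permutation and then gathers, and it is the gather step that touches the string storage in random order:

```python
strings = ["pear", "fig", "apple", "banana", "cherry"]

# Indirect sorting: sort a permutation of fixed-size 'pointers' ...
order = sorted(range(len(strings)), key=strings.__getitem__)
# ... then gather: this step random-accesses the string storage,
# which is what hurts in deep cache hierarchies for big N.
indirect = [strings[i] for i in order]

# Direct sorting: move the strings themselves during the sort,
# as a suitably implemented Mergesort would do in its merge passes.
direct = sorted(strings)

assert indirect == direct
```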

**In summary: direct radix-sorting of strings seems most promising. Where comparison sorting cannot be avoided, greeNsort^{®} algorithms are an alternative, either direct (Frogsort, Geckosort, Squidsort) or indirect (Zacksort, Zucksort).**

Multi-threading your software delivers better performance as well as better energy-efficiency — Intel^{3}

Not all algorithms parallelize equally well. For example, *Mergesort* with 100% buffer can be multithreaded very well, not only with parallel branch execution but also by parallelizing the work phase (parallel merge).

By contrast, the work phase of *Quicksort* (partitioning) has a strong serial character: parallel inplace-partitioning is possible, but complicated, and can cost extra move operations on certain input patterns, see for example Tsigas & Zhang. Parallel partitioning gets easier when 100% buffer memory is allowed; however, there are still challenges regarding work distribution due to random pivots, particularly when ties occur. From a practical point of view, it seems beneficial not to insist on parallelizing pure Quicksorts or partitioning sorts; it is much easier to follow the approach of Java 8 and use a parallel binary *Mergesort* (better: *Frogsort*) or a parallel *Multiway-Mergesort* (with the help of a *LoserTree*) at the top levels of the divide&conquer process, and to switch to e.g. *Quicksort* (better: *ZacksortB*) once a branch becomes small enough to run on a single thread.
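The shape of that approach — sort branches in parallel, then merge — can be sketched in a few lines of Python. Threads here only illustrate the structure (the GIL limits real speedup in CPython); the serial `heapq.merge` at the end is exactly the work phase that greeNsort algorithms would further parallelize:

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge

def parallel_sort(data, threads=2):
    """Sketch of parallel branching: split, sort parts concurrently, merge."""
    if threads < 2 or len(data) < 2:
        return sorted(data)
    step = -(-len(data) // threads)  # ceiling division
    parts = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        sorted_parts = list(pool.map(sorted, parts))
    # The merge phase is serial here; parallelizing it is the harder
    # part discussed in the text.
    return list(merge(*sorted_parts))

print(parallel_sort([5, 3, 8, 1, 9, 2, 7], threads=2))
# -> [1, 2, 3, 5, 7, 8, 9]
```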

**In summary: parallel branching is easy, the parallel work phase is harder. Parallel merging is a practical approach to parallelizing the work phase in both merging sorts and the top levels of partitioning sorts, and greeNsort^{®} algorithms help to lower the costs of parallel merging.**

C is quirky, flawed, and an enormous success — Dennis Ritchie

*C* is a compiled language and many other languages have interfaces to C. C has a standard library function `qsort`, which takes four parameters:

- base − the pointer to the first element of the array to be sorted
- nitems − the number of elements in the array pointed to by base
- size − the size in bytes of each element in the array
- compar − the function that compares two elements and returns a tri-boolean result {-1, 0, +1}
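For illustration, the tri-boolean comparator contract of `qsort` can be mimicked in Python via `functools.cmp_to_key` (a sketch of the contract only; C's `qsort` of course works on raw pointers and an element size):

```python
from functools import cmp_to_key

def compar(a, b):
    """qsort-style comparator: negative, zero, or positive."""
    return (a > b) - (a < b)

data = [42, 7, 19, 7]
data.sort(key=cmp_to_key(compar))
print(data)  # -> [7, 7, 19, 42]
```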

**Limitations of the interface** are:

- does not sort stably
- sorts only equally-sized elements (sorting strings indirectly via pointers results in expensive random access)
- does not sort multi-threaded
- the tri-boolean comparison function requires too much work: it always returns {-1, 0, +1} although a simple boolean would be enough
- implicit: the implementation used to be inplace and deterministic (difficult to switch to probabilistic)

Underlying is a *Quicksort3* and **limitations of the implementation** are:

- average \(\mathcal{O}(N \log(N))\) but \(\mathcal{O}(N^2)\) worst case (unlike *Introsort*)
- best \(\mathcal{O}(N \log(D))\) case on ties, but costs many extra operations, hence energy waste on non-tied data (unlike true binary *Quicksort2*)
- no block-tuning reducing branch-misprediction (unlike Edelkamp & Weiss *Block-Quicksort*)
- no acceleration for presorting (unlike *Pdqsort*)

Opportunity for **improving the implementation** within the given interface:

- use *ZacksortB*, giving the same performance guarantees but identifying ties at the cost of only one extra comparison per partitioning branch
- possibly guarantee \(\mathcal{O}(N \log(N))\) worst case by safeguarding^{5} with *Heapsort*
- possibly tune for presorting along the lines of *Pdqsort*

Opportunity for **enhancing the interface**; all the following interfaces would benefit from cheaper boolean operators (e.g. `LESS`) instead of more expensive tri-boolean comparators:

- offer a new interface for *unstable* sorting of equally-sized elements (*ZacksortB*), also multi-threaded
- offer a new interface for *stable 50%-buffer* sorting of equally-sized elements (*Frogsort1*), also multi-threaded
- offer a new interface for *stable low-buffer* sorting of equally-sized elements (*Frogsort2*) with a parameter for the buffer fraction^{6}
- offer a new interface for *stable sqrt-buffer* sorting of equally-sized elements (*Walksort*)^{7}
- offer a new interface for *stable varying* sorting of size-varying elements (or at least for efficient string-sorting in arbitrary collation), also multi-threaded
- offer a new interface for *specific* sorting of atomic types, also multi-threaded
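The cost difference between a boolean `LESS` and a tri-boolean comparator can be made visible by counting primitive comparisons (a Python sketch: the tri-boolean comparator below is built from two `<` checks, while the boolean path needs only one per decision):

```python
from functools import cmp_to_key

calls = {"lt": 0}

class Elem:
    """Element type that counts how often the primitive '<' is evaluated."""
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        calls["lt"] += 1
        return self.v < other.v

data = [Elem(v) for v in [5, 2, 9, 1, 7, 3]]

# Boolean LESS: one primitive '<' per sorting decision.
calls["lt"] = 0
sorted(data)
less_calls = calls["lt"]

# Tri-boolean comparator built from '<': up to two primitives per decision.
def compar(a, b):
    if a < b:
        return -1
    if b < a:
        return 1
    return 0

calls["lt"] = 0
sorted(data, key=cmp_to_key(compar))
tri_calls = calls["lt"]

assert tri_calls >= less_calls  # tri-boolean is never cheaper here
```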

There are only two kinds of languages: the ones people complain about and the ones nobody uses — Bjarne Stroustrup

*C++* is a compiled language with template support for many data-types, and many other languages have interfaces to C++. C++ has two standard-library sorting functions, the unstable `std::sort` and the stable `std::stable_sort`; furthermore C’s `qsort` is available. Since C++11 the implementations must guarantee an \(\mathcal{O}(N \log(N))\) worst case.

`std::stable_sort` has a clever implementation of *Mergesort* that combines three types of merging (inplace-merge with \(\mathcal{O}(N \log(N))\) merge-cost, halfcopy-merge copying data to 50% buffer and merging back, nocopy-merge merging forward to 100% buffer). The algorithm tries to achieve maximum speed by allocating 100% buffer space; if less is obtained, it starts with a more expensive merge and switches to a cheaper one as soon as the available buffer is big enough relative to the problem size in the recursion; for example the biggest top-level merge is done with inplace-merge, the next level with halfcopy-merge, and then nocopy-merge. We found the following **limitations of the implementation**:
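The halfcopy-merge with 50% buffer can be sketched in Python (an illustration of the principle, not of libstdc++'s actual code): copy the left run into a buffer, then merge buffer and right run back into the array from the front.

```python
def halfcopy_merge(a, mid):
    """Merge two sorted runs a[:mid] and a[mid:] in place,
    using only mid extra elements of buffer (<= 50% for mid <= len(a)/2)."""
    buf = a[:mid]        # copy left run to buffer (the 'halfcopy')
    i = j = k = 0        # i: buffer index, j: right-run index, k: write index
    while i < len(buf) and mid + j < len(a):
        if a[mid + j] < buf[i]:  # strict '<' keeps the merge stable
            a[k] = a[mid + j]
            j += 1
        else:
            a[k] = buf[i]
            i += 1
        k += 1
    # the buffer remainder is copied back; the right run's remainder
    # is already in its final place
    a[k:k + len(buf) - i] = buf[i:]

arr = [2, 5, 8, 1, 3, 9]
halfcopy_merge(arr, 3)
print(arr)  # -> [1, 2, 3, 5, 8, 9]
```

The write index `k = i + j` can never overtake the read index `mid + j` of the right run, which is why buffering only the left run suffices.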

- it optimizes for time but not for footprint; this is good on an idle desktop but not on a busy server
- it may allocate memory that is never used: if for example 75% memory is obtained, only 50% is used^{9}
- with little available buffer it is an expensive inplace-mergesort with \(\mathcal{O}(N \log(N)^2)\), although \(\mathcal{O}(N \log(N))\) would be possible with only \(\mathcal{O}(\sqrt{N})\) buffer
- with 50% buffer there is too much copying; even in the prior art it is known how to avoid this, see Ininsort
- despite 100% buffer, a best case of \(\mathcal{O}(N)\) for presorted data is not achieved
- the 50%-buffer half-merge cannot run concurrently on both branches

Opportunity for **improving the implementation** within the given interface:

- with 100% buffer, *Omitsort*^{10} provides adaptivity with an \(\mathcal{O}(N)\) best case, unlike the *nocopy Mergesort*
- with 50% buffer, *Frogsort1* avoids unnecessary copying compared to the *halfcopy Mergesort*
- with parametrizable 14% buffer, *Frogsort2* has peak sustainability and can be faster than any other algorithm^{11}
- with negligible \(\mathcal{O}(\sqrt{N})\) buffer, *Walksort* is much faster than *inplace Mergesort*
- with 0% buffer, *Grailsort* is much faster than *inplace Mergesort*

`std::sort` uses *Introsort*, based on a simple *Quicksort* with a pivot deterministically chosen as a middle element. In order to avoid vulnerability to algorithmic-complexity attacks (or simply unlucky input that leads to quadratic runtime or worse), it falls back on *Heapsort* if the recursion gets too deep. We found the following **limitations of this implementation**:

- no acceleration to expected \(\mathcal{O}(N \log(D))\) in case of tied input
- no graceful degradation^{12} if falling back to *Heapsort*
- no block-tuning reducing branch-misprediction (unlike Edelkamp & Weiss *Block-Quicksort*)
- no acceleration for presorting (unlike *Pdqsort*)
- implicit: the implementation used to be inplace and deterministic (difficult to switch to probabilistic)

Opportunity for **improving the implementation** within the given interface:

`std::sort` in libc++ of the clang compiler uses a different implementation that covers more special cases and is tuned for tied and presorted data. Unfortunately it seems to contain a vulnerability to quadratic runtime, reported in 2014 by Orson Peters, and has *not* been fixed since then^{13}. **Hence libc++ std::sort violates the C++ standard and cannot be recommended for production use.**

Since C++17 the standard specifies parallel sorting with a serial and a multi-threaded policy, and since C++20 it adds a SIMD policy. GCC 9 implements the parallel Standard Template Library (STL) via Intel’s *Threading Building Blocks (TBB)* library. However, TBB has no stable sort, and its unstable sort was incompletely parallelized (parallel branching but not parallel partitioning). In 2018 Intel open-sourced its *Parallel Standard Template Library (PSTL)*, which implements a stable merge sort that parallelizes the branches and the merging. Now the *llvm PSTL* has the following **limitations of the implementation**:

- the parallel merge of the PSTL does a primitive binary split of the merge task, which only guarantees a problem-size reduction of 1/4
- the unstable parallel sort seems to call the stable parallel sort (and hence does not exploit the dropped stability requirement)
- the stable parallel sort requires 100% buffer (although 50% would be enough)
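The "binary split of the merge task" can be sketched as follows (Python; an illustration of the idea, not PSTL's code): pick the middle of the longer run, binary-search its value in the other run, and recurse on the two independent sub-merges, which could run on separate threads.

```python
from bisect import bisect_left

def split_merge(a, b):
    """Merge two sorted lists by recursively splitting the merge task.
    Each pair of sub-merges is independent and could run on its own thread."""
    if len(a) < len(b):
        a, b = b, a                  # make 'a' the longer run
    if not a:
        return []
    if len(a) + len(b) <= 4:         # small task: merge serially
        return sorted(a + b)
    mid = len(a) // 2
    cut = bisect_left(b, a[mid])     # split 'b' at the pivot value
    # the two halves are independent merge tasks (parallelizable)
    return split_merge(a[:mid], b[:cut]) + split_merge(a[mid:], b[cut:])

print(split_merge([1, 4, 6, 8], [2, 3, 5, 7, 9]))
# -> [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Since only the longer run is guaranteed to be halved, a subtask can still contain almost all of the shorter run — which is the weak worst-case balance criticized above.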

Opportunities for **improving the implementation** within the given interface:

- *Frogsort1* allows a similar implementation with similar complexity but with only 50% buffer; optionally use *Squidsort1*, trading higher code-complexity for adaptivity
- k-threaded unstable tie-adaptive sorting could be implemented by parallel sorting k parts of the data with *ZacksortB*, then allocating 50% buffer and parallel merging those using *Frogsort1* techniques^{14}

Opportunities for **enhancing the interface**:

- the current interface could be overloaded with a new parameter specifying the desired fraction of buffer; this would allow users to benefit from the optimal trade-off that *Frogsort2* provides between 50% buffer and little or no buffer^{15}
- the current interfaces could be overloaded with a new parameter as an alternative to the comparison-function: a function for key-extraction that could be used in a radix-sort along the lines of *Skasort*

Opportunities for **improving the interface**:

- *specific* sorting could be made the default for atomic types when users provide neither a *comparison-function* nor a *key-extraction-function*
- offer a new interface for stable sustainable sorting of variable-sized elements, optionally multi-threaded

I wanted a language as acceptable as Java or C#, but supporting a much simpler programming model — Rich Hickey

*Clojure* is a modern, dynamic, and functional dialect of the Lisp programming language on the Java platform; it uses the sorting library of *Java*.

Groovy’s sort ultimately calls Java’s Collections’ sort— Guillaume Laforge

*Groovy* is a Java-syntax-compatible object-oriented programming language for the Java platform; it uses the sorting library of *Java*.

Java is C++ without the guns, clubs and knives — James Gosling

*Java* is a class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. Java 7 introduced *Dual-Pivot-Quicksort* for primitive types and *Timsort* for object types behind its `sort` API, and Java 8 introduced the `Arrays.parallelSort` API, which uses *Mergesort* for the parallel top levels of recursion. From the *greeNsort*^{®} perspective these have the following limitations:

- *Dual-Pivot-Quicksort* is nicely adaptive to ties but does not reach the speed of *Block-Quicksort* for sorting untied data (and its implementation complexity is too high)
- *Timsort* is an algorithm extremely specialized on presorted data; for sorting unsorted data it is too expensive (and its implementation complexity is too high), see the *deep-dive* in the results section. Furthermore the Java documentation encourages inappropriate use of *Timsort*, which dramatically wastes energy^{16}
- *parallelSort* requires 100% buffer (and its parallel binary merging allows imbalance up to 25:75%)

Opportunities for **improving the implementation** within the given interface:

- *Dual-Pivot-Quicksort* can be replaced with *ZacksortB*, which is simpler, faster and as adaptive
- *Timsort* can be replaced with *Squidsort1*, which is simpler, faster and also very adaptive (sometimes more, sometimes less)
- *parallelSort* can be replaced with a parallel version of *Frogsort1*, which requires only 50% buffer space (and using a global median would guarantee 50:50% balance)

Opportunities for **enhancing the interface**:

- the current interface could be overloaded with a new parameter specifying the desired fraction of buffer; this would allow users to benefit from the optimal trade-off that *Squidsort2* provides^{17}
- the current interfaces could be overloaded with a new parameter as an alternative to the comparison-function: a function for key-extraction that could be used in a radix-sort along the lines of *Skasort*

Opportunity for **enhancing the interface**; all the following interfaces would benefit from cheaper boolean operators (e.g. `LESS`) instead of more expensive tri-boolean comparators:

- offer a new interface for *unstable* sorting of equally-sized elements (*Zacksort*), also multi-threaded
- offer a new interface for *stable 50%-buffer* sorting of equally-sized elements (*Frogsort1*), also multi-threaded
- offer a new interface for *stable low-buffer* sorting of equally-sized elements (*Frogsort2*) with a parameter for the buffer fraction^{18}
- offer a new interface for *stable sqrt-buffer* sorting of equally-sized elements (*Walksort*)^{19}
- offer a new interface for *stable varying* sorting of size-varying elements (or at least for efficient string-sorting in arbitrary collation), also multi-threaded
- offer a new interface for *specific* sorting of atomic types, also multi-threaded

The following languages benefit from improvements to the Java sorting library:

JRuby always has and always will try to match CRuby behavior — Charles Nutter

*JRuby* is an implementation of the Ruby programming language atop the Java Virtual Machine, written largely in Java. Actually, *JRuby* switched to *Java*’s sort library in 2016.

Develop with pleasure — Dmitry Jemerov

*Kotlin* is a cross-platform, statically typed, general-purpose programming language with type inference. Kotlin is designed to interoperate fully with Java, and the JVM version of Kotlin’s standard library depends on the *Java* Class Library.

I stepped down as ‘Benevolent Dictator’, as I was called — Guido van Rossum

*Python* is an interpreted, high-level, general-purpose programming language written in C. Python 2 supported a *cmp* parameter to handle user-specified comparison functions; in Python 3 this was completely abandoned in favour of a *key* parameter for user-specified key-extraction functions. The Python 3 API has two methods: `list.sort()`, which sorts an object *insitu*, and `sorted()`, which sorts *exsitu* and does not modify its input.

These are **limitations of the interface**:

- although a stable *Natural-Mergesort* is used, the sort is not generic, because it no longer takes a comparison function
- the Python documentation promotes sorting by multiple keys, but the interface forces users to sort multiple times, which is much more expensive than using multiple keys in one comparison function^{20}
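The multiple-keys point can be illustrated directly in Python: two stable passes are equivalent to one pass with a composite key, and the one-pass variant touches the data only once.

```python
records = [("b", 2), ("a", 2), ("b", 1), ("a", 1)]

# Two stable passes: secondary key first, then primary key.
two_pass = sorted(sorted(records, key=lambda r: r[1]), key=lambda r: r[0])

# One pass with a composite key (what a comparison function over
# multiple keys would do): the same result, but the data is sorted once.
one_pass = sorted(records, key=lambda r: (r[0], r[1]))

assert two_pass == one_pass
# -> [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```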

These are **limitations of the implementation**:

- *Timsort* is slower on non-sorted data than nocopy-*Mergesort*
- *Timsort*, although an algorithm extremely specialized on presorted data, is only faster than *Mergesort* when very little sorting work is left to be done, see the *deep-dive* in the results section
- *Timsort* is *not* adaptive to ties
- *Timsort* is inherently serial

Opportunities for **improving the implementation** within the given interface:

- *Timsort* can be replaced with *Squidsort1*, which is simpler, faster and also very adaptive (sometimes more, sometimes less)
- for atomic key-extraction functions, *specific* sorting algorithms could be used, such as radix-sort along the lines of *Skasort*

Opportunities for **enhancing the interface**:

- the current interface could be overloaded with a new parameter specifying the desired fraction of buffer; this would allow users to benefit from the optimal trade-off that *Squidsort2* provides^{21}
- the current interfaces could be overloaded with a new parameter as an alternative to the key-extraction-function: a comparison-function that would allow fully generic sorting
- the current interfaces could be overloaded with a new parameter for the number of threads used for parallel sorting

Opportunity for **enhancing the interface**:

- offer a new interface for *unstable* sorting of equally-sized elements (*Zacksort*), also multi-threaded
- offer a new interface for *stable sqrt-buffer* sorting of equally-sized elements (*Walksort*)^{22}
- offer a new interface for *stable varying* sorting of size-varying elements (or at least for efficient string-sorting in arbitrary collation), also multi-threaded

Less-well-known things I’m very proud that rust shipped 1.0 without — Graydon Hoare

*Rust* is a systems programming language similar to C++ or Java but *memory-safe* without a *garbage-collector*. It has sorting APIs for stable and unstable sorting using a comparison-function or a key-extraction-function. The stable `sort` uses a lean version of *Timsort* without *galloping mode*, and `sort_unstable` uses *Pdqsort*. We found the sorting libraries slower than C/C++ implementations.

**Limitations of the implementation** are:

- *Pdqsort* uses more operations than needed, which can slow it down with an expensive comparison function
- *Timsort* still suffers from merging that costs too many moves and from potentially too-imbalanced merging, see *Python* and the *results* section

Opportunity for **improving the implementation** within the given interface:

- use *ZacksortB* instead of *Pdqsort* to reduce ‘low-overhead’ to ‘minimum-overhead’; possibly guarantee \(\mathcal{O}(N \log(N))\) worst case by safeguarding^{23} with *Heapsort*; possibly tune for presorting along the lines of *Pdqsort*
- use the nocopy *Squidsort1* instead of *Timsort* for a faster adaptive algorithm

Opportunity for **enhancing the interface**:

- enhance the `sort` API with an optional parameter for the buffer fraction (*Squidsort2*), optionally multi-threaded
- offer a new interface for stable sustainable sorting of variable-sized elements (or at least for efficient string-sorting in arbitrary collation), optionally multi-threaded

Opportunities for **improving the interface**:

- *specific* sorting could be made the default for atomic types when users provide neither a *comparison-function* nor a *key-extraction-function*

It stands for scalable language — Martin Odersky

*Scala* is a general-purpose programming language providing support for both object-oriented programming and functional programming. Scala provides language interoperability with Java, so that libraries written in either language may be referenced directly in Scala or Java code. Scala’s *Sorting* object provides convenience wrappers for *java.util.Arrays.sort*.

Obvious **limitations of the Quicksort implementation** are:

- average \(\mathcal{O}(N \log(N))\) but \(\mathcal{O}(N^2)\) worst case of *Quicksort* (unlike *Introsort*)
- best \(\mathcal{O}(N \log(D))\) case on ties, but costs many extra operations, hence energy waste on non-tied data (unlike true binary *Quicksort2*)
- no block-tuning reducing branch-misprediction (unlike Edelkamp & Weiss *Block-Quicksort*)
- no acceleration for presorting (unlike *Pdqsort*)

Obvious **limitations of the Mergesort implementation** are:

- 200% memory instead of 150% memory
- 200% moves instead of 100% moves

Opportunity for **improving the implementation** within the given interface:

- use *ZacksortB* to get adaptivity to ties without the operation overhead of *Quicksort3*
- possibly guarantee \(\mathcal{O}(N \log(N))\) worst case by safeguarding^{24} with *Heapsort*
- possibly tune for presorting along the lines of *Pdqsort*
- use the 50%-buffer nocopy *Frogsort1* instead of the 100%-buffer copy *Mergesort*

Opportunity for **enhancing the interface**:

- offer a new interface for unstable sorting that is happy with a cheaper boolean `LESS` comparison function, optionally multi-threaded
- offer a new interface for stable sustainable sorting of equally-sized elements with an optional parameter for the buffer fraction (*Frogsort2*), optionally multi-threaded
- offer a new interface for stable sustainable sorting of variable-sized elements (or at least for efficient string-sorting in arbitrary collation), optionally multi-threaded
- offer a new interface for msb-radix sorting of integer and float types (and ideally for strings in standard collation also), optionally multi-threaded
- interfaces could be defined that use boolean operators instead of tri-boolean comparators

The limits of my language mean the limits of my world — Ludwig Wittgenstein

University of Zurich, Informatics and Sustainability Research↩︎

October 2020↩︎

not needed when random pivots are used, since *Zucksort* has probabilistic convergence guarantees↩︎

Multithreading is possible but more complicated↩︎

Multithreading is possible but more complicated↩︎

October 2020↩︎

Related to this, I found a logical glitch: `__inplace_merge` tries to get 100% buffer although it never uses more than 50% when calling `std::__merge_adaptive`↩︎

alternatives are the slightly faster but less adaptive *Knuthsort4* or the slightly slower but more adaptive *Octosort*↩︎

*Knuthsort4*or the slightly slower but more adaptive*Octosort*↩︎measured with sorting doubles↩︎

for *graceful degradation* in the context of sorting see Implementing Sorting in Database Systems↩︎

The maintainers neither fixed this the usual way with *Heapsort* nor did they switch to *Pdqsort* offered by Orson Peters, which is now in boost and Rust↩︎

A more complicated alternative would be using the completely parallel Quicksort of Tsigas & Zhang based on the implementation of Singler & Sanders in the GCC Parallel Mode; however, that is not adaptive to ties. Another complicated alternative would be IPS4o by Axtmann, Witt, Ferizovic & Sanders, but note that the serial version of IPS4o is slower than optimal *Quicksort* algorithms.↩︎

*Quicksort*algorithms.↩︎Note that parallelizing the low-memory versions will more challenging↩︎

The documentation unteaches the well-founded distinction between \(\mathcal{O}(N)\) merging and \(\mathcal{O}(N\log{N})\) sorting: *“It is well-suited to merging two or more sorted arrays: simply concatenate the arrays and sort the resulting array”*↩︎

Note that parallelizing the low-memory versions will be more challenging↩︎

Multithreading is possible but more complicated↩︎

Multithreading is possible but more complicated↩︎

The documentation claims that *“The Timsort algorithm used in Python does multiple sorts efficiently because it can take advantage of any ordering already present in a dataset”*, but in fact it is the *stability* of *Timsort* that enables sorting by multiple keys; taking advantage of a previous sort requires the two keys to be *correlated*, not *orthogonal*, and rather *perfectly correlated*↩︎

Note that parallelizing the low-memory versions will be more challenging↩︎

Multithreading is possible but more complicated↩︎

not needed when random pivots are used, since *Zacksort* has probabilistic convergence guarantees↩︎

not needed when random pivots are used, since *Zacksort* has probabilistic convergence guarantees↩︎

Copyright © 2020 Dr. Jens Oehlschlägel - All rights reserved - Terms - Privacy - Impressum