HN Gopher Feed (2017-06-29)
Pattern-defeating quicksort
115 points by pettou
https://github.com/orlp/pdqsort
graycat - 48 minutes ago
I always wondered if there would be a way to have quicksort run slower than O(n ln(n)). Due to that possibility, when I code up a sort routine, I use heap sort. It is guaranteed O(n ln(n)) worst case and achieves the Gleason bound for sorting by comparing keys. For a stable sort, sure, just extend the sort keys with a sequence number, do the sort, and remove the key extensions.

Quicksort has good main-memory locality of reference and a possibility of some use of multiple threads, and heap sort seems to have neither. But there is a version of heap sort modified to do better on locality of reference when the array being sorted is really large.

But if you are not too concerned about memory space, then you don't have to care about the sort routine being in place. In that case, you get O(n ln(n)), a stable sort, no problems with locality of reference, and the ability to sort huge arrays with just the old merge sort.

I long suspected that much of the interest in in-place, O(n ln(n)), stable sorting was due to some unspoken but strong goal of finding a fundamental conservation law trading off processor time and memory space. Well, that didn't really happen. But heap sort is darned clever; I like it.
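[The key-extension trick described above is easy to sketch. Below is a minimal Rust illustration, not from the thread; the function name and types are made up for the example. Each element is paired with its original index so the index breaks ties; any unstable sort then yields a stable result.]

    // Minimal sketch of the key-extension trick: extend each sort key with
    // its sequence number, sort (unstably), then drop the extension. Ties
    // on the real key are broken by original position, so the result is
    // stable. Types and names here are illustrative.
    fn stable_via_key_extension(v: &mut Vec<(i32, &'static str)>) {
        // Extend each key with its sequence number.
        let mut extended: Vec<((i32, usize), &'static str)> = v
            .iter()
            .enumerate()
            .map(|(i, &(key, val))| ((key, i), val))
            .collect();
        // Any unstable sort is fine now; (key, index) pairs are unique.
        extended.sort_unstable_by_key(|&(k, _)| k);
        // Remove the key extensions.
        *v = extended.into_iter().map(|((key, _), val)| (key, val)).collect();
    }

    fn main() {
        let mut v = vec![(2, "a"), (1, "b"), (2, "c"), (1, "d")];
        stable_via_key_extension(&mut v);
        // Equal keys keep their original relative order: "a" before "c".
        assert_eq!(v, vec![(1, "b"), (1, "d"), (2, "a"), (2, "c")]);
    }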
atilimcetin - 3 hours ago
Rust's stable sort is based on timsort (https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...) and unstable sort is based on pattern-defeating quicksort (https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sor...). The documentation says that 'It [unstable sorting] is generally faster than stable sorting, except in a few special cases, e.g. when the slice consists of several concatenated sorted sequences.'
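[For the curious, a minimal sketch of the two APIs being discussed: slice::sort is the stable sort, and slice::sort_unstable is the new pdqsort-based method from the linked PR, so its availability depends on your Rust version.]

    fn main() {
        // Stable sort: merge-sort/timsort based, allocates a temporary
        // buffer, and equal elements keep their relative order.
        let mut v = vec![5, 2, 9, 1, 2];
        v.sort();

        // Unstable sort: pdqsort based (per the linked PR), in-place and
        // generally faster, but equal elements may be reordered.
        let mut w = vec![5, 2, 9, 1, 2];
        w.sort_unstable();

        // For plain integers the two produce identical results.
        assert_eq!(v, w);
    }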
k__ - 3 hours ago
how much faster?
sgift - 3 hours ago
Here are a few benchmark results from a recent rayon pull: https://github.com/nikomatsakis/rayon/pull/379 So, for a dual-core with HT: 8.26s vs 4.55s.
jkabrg - 3 hours ago
[Post-edit] I made several edits to the post below. First, to make an argument. Second, to add paragraphs. [/Post-edit]

Tl;dr version: It seems to me you should use either heapsort or plain quicksort; the latter with the sort of optimisations described in the linked article, but not including the fallback to heapsort.

Long version: Here's my reasoning for the above. You're either working with lists that are reasonably likely to trigger the worst case of randomised quicksort, or you're not working with such lists. By likely, I mean the probability is not extremely small.

Consider the case when the worst case is very unlikely: you're so unlikely to hit a worst case that you gain almost nothing by accounting for it except extra complexity. So you might as well use only quicksort, with optimisations that are likely to actually help.

Next is the case where a worst case might actually happen. Again, this is not by chance; it has to be because someone can predict your "random" pivot and screw with your algorithm. In that case, I propose just using heapsort. Why? This might be long, so I apologise. It's because usually when you design something, you design it to a high tolerance; the tolerance in this case ought to be the worst case of your sorting algorithm. In which case, when designing and testing your system, you'll have to do extra work to tease out the worst case. To avoid doing that, you might as well use an algorithm that takes the same amount of time every time, which I think means heapsort.
nightcracker - 2 hours ago
The fallback to heapsort adds a negligible, non-measurable amount of processing time, and it guarantees a worst case runtime of O(n log n) - to be more precise, a worst case that is 2-4 times as slow as the best case.

Your logic would also mean that any sorting function that is publicly facing (which is basically any interface on the internet, like a sorted list of Facebook friends) would need to use heapsort (which is 2-4 times as slow), as otherwise DoS attacks are simply done by constructing worst case inputs.

There are no real disadvantages to the hybrid approach.
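[The hybrid approach is straightforward to sketch. Below is a simplified, illustrative Rust version of the introsort-style scheme under discussion - pdqsort's actual heuristics are more refined; this only shows the depth-limited fallback. Recurse with quicksort, and if the recursion depth exceeds roughly 2*log2(n), assume the input is adversarial and switch to heapsort.]

    // Simplified sketch: quicksort until the depth budget runs out, then
    // heapsort. The O(n log n) worst case is guaranteed because heapsort
    // takes over after at most ~2*log2(n) levels of bad partitions.
    fn hybrid_sort(v: &mut [i32]) {
        let limit = 2 * (usize::BITS - v.len().leading_zeros()); // ~2*log2(n)
        sort_rec(v, limit);
    }

    fn sort_rec(v: &mut [i32], limit: u32) {
        if v.len() <= 16 {
            insertion_sort(v); // small slices: insertion sort is fastest
            return;
        }
        if limit == 0 {
            heapsort(v); // too many bad partitions: fall back
            return;
        }
        let p = partition(v);
        let (left, right) = v.split_at_mut(p);
        sort_rec(left, limit - 1);
        sort_rec(&mut right[1..], limit - 1); // skip the pivot itself
    }

    // Naive Lomuto partition with the last element as pivot; a real
    // implementation would pick the pivot far more carefully.
    fn partition(v: &mut [i32]) -> usize {
        let (pivot, mut i) = (v[v.len() - 1], 0);
        for j in 0..v.len() - 1 {
            if v[j] <= pivot {
                v.swap(i, j);
                i += 1;
            }
        }
        let last = v.len() - 1;
        v.swap(i, last);
        i
    }

    fn insertion_sort(v: &mut [i32]) {
        for i in 1..v.len() {
            let mut j = i;
            while j > 0 && v[j - 1] > v[j] {
                v.swap(j - 1, j);
                j -= 1;
            }
        }
    }

    fn heapsort(v: &mut [i32]) {
        fn sift_down(v: &mut [i32], mut root: usize, end: usize) {
            loop {
                let mut child = 2 * root + 1;
                if child >= end { return; }
                if child + 1 < end && v[child] < v[child + 1] { child += 1; }
                if v[root] >= v[child] { return; }
                v.swap(root, child);
                root = child;
            }
        }
        for i in (0..v.len() / 2).rev() { sift_down(v, i, v.len()); }
        for end in (1..v.len()).rev() {
            v.swap(0, end);
            sift_down(v, 0, end);
        }
    }

    fn main() {
        // Reverse-sorted input makes the naive pivot choice pathological,
        // so the heapsort fallback actually kicks in here.
        let mut v: Vec<i32> = (0..100_000).rev().collect();
        hybrid_sort(&mut v);
        assert!(v.windows(2).all(|w| w[0] <= w[1]));
    }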
jkabrg - 2 hours ago
Thanks for your reply.

> Your logic also would mean that any sorting function that is publicly facing (which is basically any interface on the internet, like a sorted list of Facebook friends) would need to use heapsort (which is 2-4 times as slow), as otherwise DoS attacks are simply done by constructing worst case inputs.

Why is that a wrong conclusion? It might be, I'm not a dev. But if I found myself caring about that sort of minutiae, I would reach exactly that conclusion. Reasons:

* the paranoid possibility that enough users can trigger enough DoS attacks that your system falls over. If this is likely enough, maybe you should design for the 2-4x worst case, and make your testing and provisioning of resources easier.

* a desire for simplicity when predicting performance, which you're losing by going your route, because you're adding the possibility of a 2-4x performance drop depending on the content of the list. Ideally, you want the performance to be solely a function of n, where n is the size of your list; not of n and the time-varying distribution of evilness over your users.

Finally, adding a fallback doesn't seem free to me, because it might fool you into not addressing the points I just made. That O(n^2) for quicksort might be a good way to get people to think; your O(n log n) is hiding factors which don't just depend on n.
sqeaky - 1 hour ago
Presuming that your internet-based users will enter malicious input is not paranoia, it is the only sane starting place.
beagle3 - 4 hours ago
Does anyone know how this compares to Timsort in practice? A quick google turns up nothing.
mastax - 3 hours ago
https://github.com/rust-lang/rust/pull/40601
"stable" is a simplified Timsort: https://github.com/rust-lang/rust/pull/38192
"unstable" is a pdqsort
stjepang - 2 hours ago
To summarize: If comparison is cheap (e.g. when sorting integers), pdqsort wins because it copies less data around and the instructions are less data-dependency-heavy. If comparison is expensive (e.g. when sorting strings), timsort is usually a tiny bit faster (around 5% or less) because it performs a slightly smaller total number of comparisons.
nneonneo - 4 hours ago
I would love to see the benchmark results against Timsort, the Python sorting algorithm that also implements a bunch of pragmatic heuristics for pattern sorting. Timsort has a slight advantage over pdqsort in that Timsort is stable, whereas pdqsort is not.

I see that timsort.h is in the benchmark directory, so it seems odd to me that the README doesn't mention the benchmark results.
nightcracker - 3 hours ago
There are multiple reasons I don't include Timsort in my README benchmark graph:

1. There is no authoritative implementation of Timsort in C++. In the bench directory I included https://github.com/gfx/cpp-TimSort, but I don't know the quality of that implementation.

2. pdqsort intends to be the algorithm of choice for a system unstable sort. In other words, a direct replacement for introsort for std::sort. So std::sort is my main comparison vehicle, and anything else is more or less a distraction. The only reason I included std::stable_sort in the benchmark is to show that unstable sorting is a speed advantage, for those unaware.

But since you're curious, here's the benchmark result with Timsort included on my machine: http://i.imgur.com/tSdS3Y0.png This is for sorting integers, however; I expect Timsort to become substantially better as the cost of a comparison increases.
jorgemf - 4 hours ago
Where is a high-level description of the algorithm? How is it different from quicksort? It seems quite similar based on a quick look at the code.
klodolph - 4 hours ago
The readme file actually contains a fairly thorough description
of how it differs from quicksort. Start with the section titled
"the best case".
jorgemf - 2 hours ago
> On average case data where no patterns are detected pdqsort is effectively a quicksort that uses median-of-3 pivot selection

So basically it is quicksort with a bit more clever pivot selection, but only for some cases.
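[Median-of-3 pivot selection itself fits in a few lines; here is a generic, illustrative Rust sketch, not pdqsort's actual code (the README also describes a larger pseudomedian for big slices). The point is that sorted or reversed inputs then yield a near-perfect pivot instead of the worst possible one.]

    // Illustrative median-of-3: return the index of the median of the
    // first, middle, and last elements.
    fn median_of_3_index(v: &[i32]) -> usize {
        let (a, b, c) = (0, v.len() / 2, v.len() - 1);
        if (v[a] <= v[b]) == (v[b] <= v[c]) {
            b
        } else if (v[b] <= v[a]) == (v[a] <= v[c]) {
            a
        } else {
            c
        }
    }

    fn main() {
        let v = [9, 1, 5, 7, 3];
        // Candidates are v[0]=9, v[2]=5, v[4]=3; the median 5 is at index 2.
        assert_eq!(median_of_3_index(&v), 2);
    }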
stjepang - 1 hour ago
You're forgetting probably the most important optimization:
block partitioning. This one alone makes it almost 2x faster
(on random arrays) than typical introsort when sorting items
by an integer key.
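[The core of block partitioning fits in a few lines. Here is a heavily simplified, illustrative sketch of the branchless idea from BlockQuicksort, which pdqsort adopts; the real algorithm processes the array from both ends and swaps the collected offsets in bulk.]

    // Branchless offset collection, the heart of block partitioning:
    // instead of branching on each comparison, always store the offset and
    // use the comparison result (0 or 1) to advance the counter. With no
    // data-dependent branch, there is nothing for the CPU to mispredict.
    const BLOCK: usize = 64;

    fn offsets_ge(block: &[i32; BLOCK], pivot: i32) -> ([u8; BLOCK], usize) {
        let mut offsets = [0u8; BLOCK];
        let mut count = 0;
        for i in 0..BLOCK {
            offsets[count] = i as u8; // unconditional store
            count += (block[i] >= pivot) as usize; // branchless advance
        }
        // offsets[..count] now lists the elements that belong right of the
        // pivot, ready to be swapped in bulk with misplaced elements
        // collected from the other end of the array.
        (offsets, count)
    }

    fn main() {
        let mut block = [0i32; BLOCK];
        for (i, x) in block.iter_mut().enumerate() {
            *x = ((i * 37) % 64) as i32; // arbitrary demo data
        }
        let (offsets, count) = offsets_ge(&block, 32);
        println!("{} of {} elements belong on the right side", count, BLOCK);
        println!("first few offsets: {:?}", &offsets[..4.min(count)]);
    }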
wiz21c - 4 hours ago
Is there an analysis of its complexity? The algorithm looks very nice!
nightcracker - 4 hours ago
Hey, author of pdqsort here. The draft paper contains complexity proofs of the O(n log n) worst case and O(nk) best case with k distinct keys: https://drive.google.com/open?id=0B1-vl-dPgKm_T0Fxeno1a0lGT0...
ouid - 2 hours ago
Best case? Give worst and average case when describing
complexities.
teraflop - 35 minutes ago
There's not much value in discussing the asymptotic worst-case or average-case performance of sorting algorithms, because just about every algorithm worth talking about is already optimal in that regard. The interesting performance differences are all about constant factors.
nightcracker - 2 hours ago
I already did; you may want to re-read my comment.
ouid - 39 minutes ago
Am I misinterpreting your usage of "best case"?
nightcracker - 31 minutes ago
Without enlightening me on what your interpretation is, I have no way of telling.

pdqsort has a worst case of O(n log n). That means, no matter what, the algorithm never takes more than a constant factor times n log n time to complete.

Since pdqsort is strictly a comparison sort, and comparison sorts can do no better than O(n log n) in the average case, pdqsort is asymptotically optimal in the average case (because the average case can never be worse than the worst case).

On top of the above guarantees, if your input contains only k distinct keys, then pdqsort has a worst case complexity of O(nk). So when k gets small (say, 1-5 distinct elements), pdqsort approaches linear time. That is pdqsort's best case.
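[This is easy to observe informally. The sketch below - not from the thread, and not a rigorous benchmark - sorts the same number of elements drawn from k distinct values with Rust's pdqsort-based sort_unstable; timings should drop sharply as k shrinks.]

    use std::time::Instant;

    fn main() {
        let n: u64 = 10_000_000;
        for k in [2u64, 256, 1_000_000] {
            // Cheap deterministic scrambling to get n values with k
            // distinct keys; any decent RNG would do.
            let mut v: Vec<u64> = (0..n)
                .map(|i| i.wrapping_mul(2862933555777941757) % k)
                .collect();
            let start = Instant::now();
            v.sort_unstable();
            println!("k = {:>9}: sorted in {:?}", k, start.elapsed());
        }
    }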
j_s - 1 hour ago
Has HN ever discussed the possibilities of purposely crafting worst-case input to amplify a denial-of-service attack?
nightcracker - 35 minutes ago
If whoever you're targeting uses libc++, I already did the analysis: https://bugs.llvm.org/show_bug.cgi?id=20837 To my knowledge it's still not fixed.
dpcx - 2 hours ago
Question as a non-low-level developer, and please forgive my ignorance: how is it that we're essentially 50 years into writing sorting algorithms and we still find improvements? Shouldn't sorting items be a "solved" problem by now?
jorgemf - 2 hours ago
As far as I understand, the base algorithm is the same, with an average case of n log(n). The new algorithms only try to improve the pivot selection to avoid the worst cases, and try to beat the average case for most practical inputs. But in the end there are no new algorithms improving on the n log(n) limit.
paulddraper - 1 hour ago
This is an improvement in practical cases, not theoretical ones.
stjepang - 1 hour ago
Basically all comparison-based sort algorithms we use today stem from two basic algorithms: mergesort (stable sort, from 1945) and quicksort (unstable sort, from 1959).

Mergesort was improved by Tim Peters in 2002, and that became timsort. He invented a way to take advantage of pre-sorted intervals in arrays to speed up sorting. It's basically an additional layer over mergesort with a few other low-level tricks to minimize the amount of memcpying.

Quicksort was improved by David Musser in 1997 when he developed introsort. He set a strict worst-case bound of O(n log n) on the algorithm, as well as improving the pivot selection strategy. And people are inventing new ways of pivot selection all the time. E.g. Andrei Alexandrescu published a new method in 2017[1].

In 2016 Edelkamp and Weiß found a way to eliminate branch mispredictions during the partitioning phase in quicksort/introsort. This is a vast improvement. The same year Orson Peters adopted this technique and developed pattern-defeating quicksort. He also figured out multiple ways to take advantage of partially sorted arrays.

Sorting is a mostly "solved" problem in theory, but as new hardware emerges, different aspects of implementations become more or less important (cache, memory, branch prediction), and then we figure out new tricks to take advantage of modern hardware. And finally, multicore became a thing fairly recently, so there's a push to explore sorting in yet another direction...

[1] http://erdani.com/research/sea2017.pdf
agumonkey - 40 minutes ago
Thanks for the alexandrescu paper
DiThi - 1 hour ago
One of the problems is that hardware changes. A long time ago memory was very limited and there was virtually no cost to branching. Now we have very complex pipelined architectures with branch prediction, many levels of cache, microcode, etc. And memory is plentiful.
contravariant - 14 minutes ago
What makes things tricky is that there are a couple of common cases that can be sorted in O(n), and that more complicated algorithms might have better asymptotic behaviour while being worse for small or even moderately large lists.

To make matters worse, there are also more specific sorting algorithms like radix sort, which can be even faster in cases where they can be used.
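[As a concrete illustration of the non-comparison route, here is a generic textbook LSD radix sort in Rust - not any particular library's implementation. It does a constant number of counting passes over the keys, so it's O(n) for fixed-width integers, but it only applies when keys decompose into digits.]

    // Least-significant-digit radix sort for u32 keys: four counting
    // passes of one byte each. No comparisons at all, hence not bound by
    // the O(n log n) comparison-sort limit.
    fn radix_sort_u32(v: &mut Vec<u32>) {
        let mut buf = vec![0u32; v.len()];
        for shift in (0..32).step_by(8) {
            // Count how many keys have each byte value at this position.
            let mut counts = [0usize; 256];
            for &x in v.iter() {
                counts[((x >> shift) & 0xFF) as usize] += 1;
            }
            // Exclusive prefix sums turn counts into destination indices.
            let mut sum = 0;
            for c in counts.iter_mut() {
                let tmp = *c;
                *c = sum;
                sum += tmp;
            }
            // Scatter into the buffer, preserving order (a stable pass).
            for &x in v.iter() {
                let d = ((x >> shift) & 0xFF) as usize;
                buf[counts[d]] = x;
                counts[d] += 1;
            }
            std::mem::swap(v, &mut buf);
        }
    }

    fn main() {
        let mut v = vec![170, 45, 75, 90, 802, 24, 2, 66];
        radix_sort_u32(&mut v);
        assert_eq!(v, vec![2, 24, 45, 66, 75, 90, 170, 802]);
    }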
stjepang - 2 hours ago
I think it's fair to say that pdqsort (pattern-defeating quicksort) is overall the best unstable sort and timsort is overall the best stable sort in 2017, at least if you're implementing one for a standard library.

The standard sort algorithm in Rust is timsort[1] (slice::sort), but soon we'll have pdqsort as well[2] (slice::sort_unstable), which shows great benchmark numbers.[3] Actually, I should mention that both implementations are not 100% equivalent to what is typically considered timsort and pdqsort, but they're pretty close.

It is notable that Rust is the first programming language to adopt pdqsort, and I believe its adoption will only grow in the future.

Here's a fun fact: typical quicksorts (and introsorts) in standard libraries spend most of the time doing literally nothing - just waiting for the next instruction because of failed branch prediction! If you manage to eliminate branch misprediction, you can easily make sorting twice as fast! At least that is the case if you're sorting items by an integer key, or a tuple of integers, or something primitive like that (i.e. when comparison is rather cheap).

Pdqsort efficiently eliminates branch mispredictions and brings some other improvements over introsort as well - for example, the complexity becomes O(nk) if the input array is of length n and consists of only k different values. Of course, the worst-case complexity is always O(n log n).

Finally, last week I implemented parallel sorts for Rayon (Rust's data parallelism library) based on timsort and pdqsort[4].

Check out the links for more information and benchmarks. And before you start criticizing the benchmarks, please keep in mind that they're rather simplistic, so take them with a grain of salt.

I'd be happy to elaborate further and answer any questions. :)

[1] https://github.com/rust-lang/rust/pull/38192
[2] https://github.com/rust-lang/rust/issues/40585
[3] https://github.com/rust-lang/rust/pull/40601
[4] https://github.com/nikomatsakis/rayon/pull/379
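[For reference, using the parallel sorts from [4] looks roughly like this - a hedged sketch; check the PR for the exact API and method names as merged.]

    use rayon::prelude::*;

    fn main() {
        // Parallel stable sort (timsort-style merges under the hood).
        let mut v: Vec<u64> = (0..10_000_000).rev().collect();
        v.par_sort();

        // Parallel unstable sort (pdqsort-based).
        let mut w: Vec<u64> = (0..10_000_000).rev().collect();
        w.par_sort_unstable();

        assert!(v.windows(2).all(|p| p[0] <= p[1]));
        assert_eq!(v, w);
    }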
Retric - 2 hours ago
Comparing sorting algos often says more about your benchmark than about the algos themselves. Random and pathological inputs are obvious, but often you're dealing with something in between. Radix vs n log n is another issue. So, what were your benchmarks like?
stjepang - 1 hour ago
That is true - the benchmarks mostly focus on random cases, although there are a few benchmarks with "mostly sorted" arrays (sorted arrays with sqrt(n) random swaps).

If the input array consists of several concatenated ascending or descending sequences, then timsort is the best. After all, timsort was specifically designed to take advantage of that particular case. Pdqsort performs respectably, too, and if you have more than a dozen of these sequences, or if the sequences are interspersed, then it starts winning over timsort.

Anyway, both pdqsort and timsort perform well when the input is not quite random. In particular, pdqsort blows introsort (e.g. typical C++ std::sort implementations) out of the water when the input is not random[1]. It's pretty much a strict improvement over introsort. Likewise, timsort (at least the variant implemented in Rust's standard library) is pretty much a strict improvement over merge sort (e.g. typical C++ std::stable_sort implementations).

Regarding radix sort, pdqsort can't quite match its performance (it's O(n log n), after all), but it can perform fairly respectably. E.g. ska_sort[2] (a famous radix sort implementation) and Rust's pdqsort perform equally well on my machine when sorting 10 million random 64-bit integers. However, on larger arrays radix sort starts winning easily, which shouldn't be surprising.

I'm aware that benchmarks are tricky to get right, can be biased, and are always controversial. If you have any further questions, feel free to ask.

[1]: https://github.com/orlp/pdqsort
[2]: https://github.com/skarupke/ska_sort
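[For the record, the "mostly sorted" input described above is simple to generate. A dependency-free Rust sketch follows; the xorshift generator is only there to keep the example self-contained.]

    // Build a sorted array of n elements, then apply sqrt(n) random swaps.
    fn mostly_sorted(n: usize, seed: u64) -> Vec<u64> {
        let mut v: Vec<u64> = (0..n as u64).collect();
        let mut s = seed | 1; // xorshift state must be nonzero
        let mut next = move || {
            s ^= s << 13;
            s ^= s >> 7;
            s ^= s << 17;
            s
        };
        for _ in 0..(n as f64).sqrt() as usize {
            let (i, j) = ((next() as usize) % n, (next() as usize) % n);
            v.swap(i, j);
        }
        v
    }

    fn main() {
        let v = mostly_sorted(1_000_000, 0xDEADBEEF);
        // Only about sqrt(n) positions end up disturbed.
        let moved = v.iter().enumerate().filter(|&(i, &x)| x != i as u64).count();
        println!("{} of {} positions out of place", moved, v.len());
    }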
etep - 36 minutes ago
Hi stjepang,

If the following criteria are met, then perhaps the branch-mispredict penalty is less of a problem:

1. you are sorting a large amount of data, much bigger than the CPU LLC
2. you can effectively utilize all cores, i.e. your sort algorithm can parallelize

Perhaps in this case you are memory-bandwidth limited. If so, you are probably spending more time waiting on data than waiting on pipe flushes (i.e. a consequence of mispredicts).
stjepang - 19 minutes ago
Absolutely - there are cases when branch misprediction is not the bottleneck. It depends on a lot of factors. Another such case is sorting strings, because every comparison causes a potential cache miss and introduces even more branching, and all of that would dwarf that one misprediction.