Saturday, January 31st, 2009...09:31
Quicksort in the Erlang Shell
It seems that one of the canonical examples of the power/simplicity/elegance etc. of functional languages/recursion tends to be an implementation of Quicksort. In Pragmatic Programming Erlang it appears on page 62.
qsort([]) -> []; qsort([Pivot|T]) -> qsort([X || X <- T, X < Pivot]) ++ [Pivot] ++ qsort([X || X <- T, X >= Pivot]).
The author notes this use of ++ is not efficient, nor is it considered good programming practice; and there are other reasons why this is not an optimal implementation of Quicksort.
I decided to try and fix some of the implementation faults to see how a “real” implementation of Quicksort looked in Erlang. At this point I was working purely in the Erlang shell, and was surprised when my initial conversion failed to compile:
QSort = fun([]) -> []; ([Pivot|T]) -> QSort([X || X <- T, X < Pivot]) ++ [Pivot] ++ QSort([X || X <- T, X >= Pivot]) end.
This results in ** 2: variable ‘QSort’ is unbound **
It turns out recursion in the shell is funky because the function being defined cannot invoke itself before it’s been bound to a variable. Compilation of modules don’t seem limited by this. In order to perform recursion, a Y Combinator can be used, e.g.
QSort1 = fun(L) -> Q = fun([],F) -> []; ([Pivot|T],F) -> F([X || X <- T, X < Pivot],F) ++ [Pivot] ++ F([X || X <- T, X >= Pivot],F) end, Q(L,Q) end.
Here QSort1 is bound to a function which takes a list and returns the output of Q. Q is bound to a function which takes a list and a function F. Once Q is defined (note the comma), it can be invoked with itself as the value of F.
At this point, nothing has been done to remove the inefficiencies of the original implementation, one of which is the use of list comprehensions. This causes 2 iterations though the tail of the list, T, when dividing the list into 2 sub-lists. Only one is necessary if an accumulator pattern is used (unfortunately necessitating another Y Combinator):
QSort2 = fun(L) -> Q = fun([],F) -> []; ([Pivot|T],F) -> S = fun([], Pivot, Less, More, FF) -> {Less, More}; ([H|T], Pivot, Less, More, FF) -> case(H < Pivot) of true -> FF(T, Pivot, [H|Less], More, FF); false -> FF(T, Pivot, Less, [H|More], FF) end end, {Lf,Rt} = S(T, Pivot, [], [], S), F(Lf,F) ++ [Pivot] ++ F(Rt,F) end, Q(L,Q) end.
S, is used to divide the tail of the list, T, into a tuple containing a list of elements less than the pivot, Lf, and a list with elements greater than the pivot, Rt.
In order to get rid of the ++ operator we’ll…you guessed it, use a Y Combinator along with the accumulator pattern:
QSort3 = fun(L) -> Q = fun([], QF) -> []; ([Pivot|T], QF) -> QA = fun([], Acc, QAF) -> Acc; ([Pivot|T], Acc, QAF) -> S = fun([], Pivot, Less, More, Acc, SF, QAF) -> QAF(Less, [Pivot|QAF(More, Acc, QAF)], QAF); ([H|T], Pivot, Less, More, Acc, SF, QAF) -> case (H < Pivot) of true -> SF(T, Pivot, [H|Less], More, Acc, SF, QAF); false -> SF(T, Pivot, Less, [H|More], Acc, SF, QAF) end end, S(T, Pivot, [], [], Acc, S, QAF) end, QA([Pivot|T], [], QA) end, Q(L,Q) end.
If you’re having trouble following this code (Apologies for the cryptic variable names, but I wanted a certain style of indentation and no lines wrapping. Note, I always use all caps for the last parameters which are functions and only there due the use of Y Combinators), try looking at Quicksort Erlang which has a version suitable for a module i.e. no Y Combinators. I’m certainly not an Erlang expert (currently on page 148 of Pragmatic Programming Erlang), and this code is a bit simpler/readable when placed in a module vs. entered directly into the shell; however, I do wonder why an implementation of Quicksort is used so often to show the elegance of functional languages (I haven’t attempted to continue improving this algorithm via better pivot selection, guarding against the worst case reversed list, etc.). I’m sure one reason is the presumed universal familiarity with Quicksort, but surely if the purpose it so show off things like list comprehensions and recursion, it can be done via some other algorithm which, in the real world, isn’t primarily used for its performance.
Comments are closed.