# natural mergesort in Data.List

7 messages
Open this post in threaded view
|

## natural mergesort in Data.List

 Dear Cafe, some years ago I formalized the mergesort implementation [1] from http://hackage.haskell.org/package/base-4.11.1.0/docs/src/Data.OldList.html#sort(as far as I can tell it didn't change since 2012) in the proof assistant Isabelle/HOL [2]. More specifically, I proved its functional correctness (the result is sorted and contains all elements of the input with exactly the same multiplicities) and that it is a stable sorting algorithm. Very recently I also formalized a complexity result in Isabelle/HOL, namely that the number of comparisons is bounded from above by   n + n * ⌈log 2 n⌉ for lists of length n. For this proof I had to change the definition of "sequences", "ascending", and "descending" slightly. Now here is my question: Does anyone now of reasons why the current implementation of "sequences" is allowed to produce spurious empty lists in its result? The version I used in my formalization only differs in the following three spots:   sequences [x] = [[x]] -- this is the only important change, since   sequences [] = []     -- then the result does not contain empty lists instead of   sequences xs = [xs] and   ascending a as [] = let !x = as [a] in [x] instead of   ascending a as bs = let !x = as [a] in x : sequences bs and   descending a as []  = [a:as] instead of   descending a as bs = (a:as) : sequences bs [1] https://www.isa-afp.org/entries/Efficient-Mergesort.html[2] http://isabelle.in.tum.de/_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafeOnly members subscribed via the mailman list are allowed to post.
Open this post in threaded view
|

## Re: natural mergesort in Data.List

 Hi, Am Dienstag, den 18.09.2018, 10:21 +0200 schrieb Christian Sternagel: > some years ago I formalized the mergesort implementation [1] from > > > http://hackage.haskell.org/package/base-4.11.1.0/docs/src/Data.OldList.html#sort> > (as far as I can tell it didn't change since 2012) in the proof > assistant Isabelle/HOL [2]. > > More specifically, I proved its functional correctness (the result is > sorted and contains all elements of the input with exactly the same > multiplicities) and that it is a stable sorting algorithm. just a shameless plug: As part of our hs-to-coq project, we mechanically translated Data.List.sort to Coq, proved its termination (for finite inputs, of course) and functional correctness; https://github.com/antalsz/hs-to-coq/blob/master/examples/containers/theories/SortSorted.v> Very recently I also formalized a complexity result in Isabelle/HOL, > namely that the number of comparisons is bounded from above by > >   n + n * ⌈log 2 n⌉ > > for lists of length n. Cool! That’s of course not easily possible with out shallow embedding into Coq. > For this proof I had to change the definition of "sequences", > "ascending", and "descending" slightly. > > Now here is my question: Does anyone now of reasons why the current > implementation of "sequences" is allowed to produce spurious empty lists > in its result? The version I used in my formalization only differs in > the following three spots: > … It _could_ have been benchmarked and measured and determined that it is actually faster to do less case analysis here. But I find that unlikely (both the investigation and this output), and it is probably simply an oversight, and I would expect that a patch to that effect would be appreciated ! Cheers, Joachim -- Joachim Breitner   [hidden email]   http://www.joachim-breitner.de/_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafeOnly members subscribed via the mailman list are allowed to post. signature.asc (849 bytes) Download Attachment
Open this post in threaded view
|

## Re: natural mergesort in Data.List

 In reply to this post by Christian Sternagel If you're guaranteeing that the result won't contain empty lists, it would be worth benchmarking the effect of using NonEmpty x instead of [x] in that spot.On Tue, Sep 18, 2018, 4:21 AM Christian Sternagel <[hidden email]> wrote:Dear Cafe, some years ago I formalized the mergesort implementation [1] from http://hackage.haskell.org/package/base-4.11.1.0/docs/src/Data.OldList.html#sort (as far as I can tell it didn't change since 2012) in the proof assistant Isabelle/HOL [2]. More specifically, I proved its functional correctness (the result is sorted and contains all elements of the input with exactly the same multiplicities) and that it is a stable sorting algorithm. Very recently I also formalized a complexity result in Isabelle/HOL, namely that the number of comparisons is bounded from above by   n + n * ⌈log 2 n⌉ for lists of length n. For this proof I had to change the definition of "sequences", "ascending", and "descending" slightly. Now here is my question: Does anyone now of reasons why the current implementation of "sequences" is allowed to produce spurious empty lists in its result? The version I used in my formalization only differs in the following three spots:   sequences [x] = [[x]] -- this is the only important change, since   sequences [] = []     -- then the result does not contain empty lists instead of   sequences xs = [xs] and   ascending a as [] = let !x = as [a] in [x] instead of   ascending a as bs = let !x = as [a] in x : sequences bs and   descending a as []  = [a:as] instead of   descending a as bs = (a:as) : sequences bs [1] https://www.isa-afp.org/entries/Efficient-Mergesort.html [2] http://isabelle.in.tum.de/ _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafeOnly members subscribed via the mailman list are allowed to post.
Open this post in threaded view
|

## Re: natural mergesort in Data.List

 Dear David, I am guaranteeing (since I proved it in Isabelle/HOL) that the following version of "sequences" does not contain empty lists in its result (I am copying from my Isabelle formalization, in order to avoid typos) ;) fun sequences :: "'a list ⇒ 'a list list"   and asc :: "'a ⇒ ('a list ⇒ 'a list) ⇒ 'a list ⇒ 'a list list"   and desc :: "'a ⇒ 'a list ⇒ 'a list ⇒ 'a list list"   where     "sequences (a # b # xs) =       (if key a > key b then desc b [a] xs else asc b ((#) a) xs)"   | "sequences [x] = [[x]]"   | "sequences [] = []"   | "asc a as (b # bs) =       (if ¬ key a > key b then asc b (λys. as (a # ys)) bs       else as [a] # sequences (b # bs))"   | "asc a as [] = [as [a]]"   | "desc a as (b # bs) =       (if key a > key b then desc b (a # as) bs       else (a # as) # sequences (b # bs))"   | "desc a as [] = [a # as]" The "key" function is an implicit first parameter for each of "sequences", "asc", and "desc" above. The fact that I am using a "key" function instead of a comparator is due to Isabelle/HOL's standard library. Also, there are no pattern guards in Isabelle/HOL. Anyway, it should be relatively straight-forward to translate these functions into Haskell. Another thing: I just realized that "merge_pairs" in my formalization also differs from "mergePairs", since with the changed "sequences" it might actually get an empty list as input, in which case the current "mergePairs" wouldn't terminate at all. So for those who are interested, the full definition of mergesort can be found here https://devel.isa-afp.org/browser_info/current/AFP/Efficient-Mergesort/Efficient_Mergesort.htmlwhere mergesort is called "msort_key". cheers chris Btw: What is "NonEmpty x"? On 09/18/2018 11:55 AM, David Feuer wrote: > If you're guaranteeing that the result won't contain empty lists, it > would be worth benchmarking the effect of using NonEmpty x instead of > [x] in that spot. > > On Tue, Sep 18, 2018, 4:21 AM Christian Sternagel <[hidden email] > > wrote: > >     Dear Cafe, > >     some years ago I formalized the mergesort implementation [1] from > > >     http://hackage.haskell.org/package/base-4.11.1.0/docs/src/Data.OldList.html#sort> >     (as far as I can tell it didn't change since 2012) in the proof >     assistant Isabelle/HOL [2]. > >     More specifically, I proved its functional correctness (the result is >     sorted and contains all elements of the input with exactly the same >     multiplicities) and that it is a stable sorting algorithm. > >     Very recently I also formalized a complexity result in Isabelle/HOL, >     namely that the number of comparisons is bounded from above by > >       n + n * ⌈log 2 n⌉ > >     for lists of length n. > >     For this proof I had to change the definition of "sequences", >     "ascending", and "descending" slightly. > >     Now here is my question: Does anyone now of reasons why the current >     implementation of "sequences" is allowed to produce spurious empty lists >     in its result? The version I used in my formalization only differs in >     the following three spots: > >       sequences [x] = [[x]] -- this is the only important change, since >       sequences [] = []     -- then the result does not contain empty lists > >     instead of > >       sequences xs = [xs] > >     and > >       ascending a as [] = let !x = as [a] in [x] > >     instead of > >       ascending a as bs = let !x = as [a] in x : sequences bs > >     and > >       descending a as []  = [a:as] > >     instead of > >       descending a as bs = (a:as) : sequences bs > >     [1] https://www.isa-afp.org/entries/Efficient-Mergesort.html>     [2] http://isabelle.in.tum.de/>     _______________________________________________ >     Haskell-Cafe mailing list >     To (un)subscribe, modify options or view archives go to: >     http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe>     Only members subscribed via the mailman list are allowed to post. > _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafeOnly members subscribed via the mailman list are allowed to post.
Open this post in threaded view
|

## Re: natural mergesort in Data.List

 data NonEmpty a = a :| [a]It's a nonempty list, defined in Data.List.NonEmpty.On Tue, Sep 18, 2018, 9:14 AM Christian Sternagel <[hidden email]> wrote:Dear David, I am guaranteeing (since I proved it in Isabelle/HOL) that the following version of "sequences" does not contain empty lists in its result (I am copying from my Isabelle formalization, in order to avoid typos) ;) fun sequences :: "'a list ⇒ 'a list list"   and asc :: "'a ⇒ ('a list ⇒ 'a list) ⇒ 'a list ⇒ 'a list list"   and desc :: "'a ⇒ 'a list ⇒ 'a list ⇒ 'a list list"   where     "sequences (a # b # xs) =       (if key a > key b then desc b [a] xs else asc b ((#) a) xs)"   | "sequences [x] = [[x]]"   | "sequences [] = []"   | "asc a as (b # bs) =       (if ¬ key a > key b then asc b (λys. as (a # ys)) bs       else as [a] # sequences (b # bs))"   | "asc a as [] = [as [a]]"   | "desc a as (b # bs) =       (if key a > key b then desc b (a # as) bs       else (a # as) # sequences (b # bs))"   | "desc a as [] = [a # as]" The "key" function is an implicit first parameter for each of "sequences", "asc", and "desc" above. The fact that I am using a "key" function instead of a comparator is due to Isabelle/HOL's standard library. Also, there are no pattern guards in Isabelle/HOL. Anyway, it should be relatively straight-forward to translate these functions into Haskell. Another thing: I just realized that "merge_pairs" in my formalization also differs from "mergePairs", since with the changed "sequences" it might actually get an empty list as input, in which case the current "mergePairs" wouldn't terminate at all. So for those who are interested, the full definition of mergesort can be found here https://devel.isa-afp.org/browser_info/current/AFP/Efficient-Mergesort/Efficient_Mergesort.html where mergesort is called "msort_key". cheers chris Btw: What is "NonEmpty x"? On 09/18/2018 11:55 AM, David Feuer wrote: > If you're guaranteeing that the result won't contain empty lists, it > would be worth benchmarking the effect of using NonEmpty x instead of > [x] in that spot. > > On Tue, Sep 18, 2018, 4:21 AM Christian Sternagel <[hidden email] > > wrote: > >     Dear Cafe, > >     some years ago I formalized the mergesort implementation [1] from > > >     http://hackage.haskell.org/package/base-4.11.1.0/docs/src/Data.OldList.html#sort > >     (as far as I can tell it didn't change since 2012) in the proof >     assistant Isabelle/HOL [2]. > >     More specifically, I proved its functional correctness (the result is >     sorted and contains all elements of the input with exactly the same >     multiplicities) and that it is a stable sorting algorithm. > >     Very recently I also formalized a complexity result in Isabelle/HOL, >     namely that the number of comparisons is bounded from above by > >       n + n * ⌈log 2 n⌉ > >     for lists of length n. > >     For this proof I had to change the definition of "sequences", >     "ascending", and "descending" slightly. > >     Now here is my question: Does anyone now of reasons why the current >     implementation of "sequences" is allowed to produce spurious empty lists >     in its result? The version I used in my formalization only differs in >     the following three spots: > >       sequences [x] = [[x]] -- this is the only important change, since >       sequences [] = []     -- then the result does not contain empty lists > >     instead of > >       sequences xs = [xs] > >     and > >       ascending a as [] = let !x = as [a] in [x] > >     instead of > >       ascending a as bs = let !x = as [a] in x : sequences bs > >     and > >       descending a as []  = [a:as] > >     instead of > >       descending a as bs = (a:as) : sequences bs > >     [1] https://www.isa-afp.org/entries/Efficient-Mergesort.html >     [2] http://isabelle.in.tum.de/ >     _______________________________________________ >     Haskell-Cafe mailing list >     To (un)subscribe, modify options or view archives go to: >     http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe >     Only members subscribed via the mailman list are allowed to post. > _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafeOnly members subscribed via the mailman list are allowed to post.