Time and space complexity of "take k . sort"

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Time and space complexity of "take k . sort"

Paul Johnson-2
This question on StackOverflow asked about how to find the largest 100
items in a very long list:

http://stackoverflow.com/questions/1602998/fastest-way-to-obtain-the-largest-x-numbers-from-a-very-large-unsorted-list/1603198#1603198

I replied that you could do it with something like this (but here taking
the k smallest to strip out some irrelevant complications):

 >  takeLargest k = take k . sort

Because "sort" is lazily evaluated this only does enough sorting to find
the first k elements.  I guess the complexity is something like
O(n*k*log(k)).

But of equal practical interest is the space complexity.  The optimum
algorithm is to take the first k items, sort them, and then iterate
through the remaining items by adding each item to the sorted list and
then throwing out the highest one.  That has space complexity O(k).  
What does the function above do?

Paul.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Time and space complexity of "take k . sort"

Paul Johnson-2
On 22/10/09 15:31, Paul Johnson wrote:
>
> >  takeLargest k = take k . sort
>
> Because "sort" is lazily evaluated this only does enough sorting to
> find the first k elements.  I guess the complexity is something like
> O(n*k*log(k)).
>
Correction: O(n*log(k))
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Time and space complexity of "take k . sort"

Ketil Malde-5
In reply to this post by Paul Johnson-2
Paul Johnson <[hidden email]> writes:

>>  takeLargest k = take k . sort

> But of equal practical interest is the space complexity.  The optimum
> algorithm is to take the first k items, sort them, and then iterate
> through the remaining items by adding each item to the sorted list and
> then throwing out the highest one.  That has space complexity O(k).
> What does the function above do?

Well - 'sort' doesn't know the value of 'k', so it needs to retain all
elements, just in case 'k' might be 'n'.  So I don't see how you can use
space less than 'n' for a construct like the above.

-k
--
If I haven't seen further, it is by standing in the footprints of giants
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Time and space complexity of "take k . sort"

Dan Weston
Unless of course you use a GHC RULE to rewrite the RHS into the LHS,
which should always be a valid transformation.

Ketil Malde wrote:

> Paul Johnson <[hidden email]> writes:
>
>>>  takeLargest k = take k . sort
>
>> But of equal practical interest is the space complexity.  The optimum
>> algorithm is to take the first k items, sort them, and then iterate
>> through the remaining items by adding each item to the sorted list and
>> then throwing out the highest one.  That has space complexity O(k).
>> What does the function above do?
>
> Well - 'sort' doesn't know the value of 'k', so it needs to retain all
> elements, just in case 'k' might be 'n'.  So I don't see how you can use
> space less than 'n' for a construct like the above.
>
> -k

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Time and space complexity of "take k . sort"

Heinrich Apfelmus
In reply to this post by Paul Johnson-2
Paul Johnson wrote:
> Paul Johnson wrote:
>>
>> >  takeLargest k = take k . sort
>>
>> Because "sort" is lazily evaluated this only does enough sorting to
>> find the first k elements.  I guess the complexity is something like
>> O(n*k*log(k)).
>>
> Correction: O(n*log(k))

It's  O(n + k log k)  (which is the same as  O(n + k log n) ):

   http://apfelmus.nfshost.com/quicksearch.html


The remark about O(k) space complexity of the other algorithm is
interesting, since this means that it's not even allowed to copy its
argument of size O(n) .


Regards,
apfelmus

--
http://apfelmus.nfshost.com

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe