What's the best way to implement the following function in haskell:
Given a list and an integer k as input return the indices of the least k
elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations. R M Check out what you're missing if you're not on Yahoo! Messenger _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
mrvr84:
> > What's the best way to implement the following function in > haskell: Given a list and an integer k as input return the > indices of the least k elements in the list. The code should > be elegant and also, more importantly, must not make more > than the minimum O(k*length(list)) number of operations. > R M Is this a homework question?  Don _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
In reply to this post by R M
On Thu, Apr 12, 2007 at 08:58:33AM +0530, raghu vardhan wrote:
> What's the best way to implement the following function in haskell: > Given a list and an integer k as input return the indices of the least > k elements in the list. The code should be elegant and also, more > importantly, must not make more than the minimum O(k*length(list)) > number of operations. Go read and thoroughly understand "Why Functional Programming Matters." Also, your asyptotic complexity bound is just plain wrong. I'd give faster code, but Don is suspicious (and I can never tell these things myself). Stefan _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
On Wed, Apr 11, 2007 at 08:38:48PM 0700, Stefan O'Rear wrote:
> On Thu, Apr 12, 2007 at 08:58:33AM +0530, raghu vardhan wrote: > > What's the best way to implement the following function in haskell: > > Given a list and an integer k as input return the indices of the least > > k elements in the list. The code should be elegant and also, more > > importantly, must not make more than the minimum O(k*length(list)) > > number of operations. > > Go read and thoroughly understand "Why Functional Programming > Matters." > > Also, your asyptotic complexity bound is just plain wrong. I'd give > faster code, but Don is suspicious (and I can never tell these things > myself). > > Stefan Don tells me (in #haskell) that you are legitimate, so here is the example: kminima k lst = take k $ sort lst If you want to be really explicit about it, here is a sort that will work: sort [] = [] sort l@(x:_) = filter (<x) l ++ filter (==x) l ++ filter (>x) l (A stable quicksort, btw) Note that this is FASTER than your bound  somewhere between O(n) and O(n*log n). Ain't lazy evaluation great? :) Stefan _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
On 4/11/07, Stefan O'Rear <[hidden email]> wrote:
> > If you want to be really explicit about it, here is a sort that will > work: > > sort [] = [] > sort l@(x:_) = filter (<x) l ++ filter (==x) l ++ filter (>x) l > > (A stable quicksort, btw) You may be missing a few recursive calls there :) Cheers, Tim  Tim Chevalier * [hidden email] * Often in error, never in doubt Confused? See http://catamorphism.org/transition.html _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
On Wed, Apr 11, 2007 at 09:20:12PM 0700, Tim Chevalier wrote:
> On 4/11/07, Stefan O'Rear <[hidden email]> wrote: > > > >If you want to be really explicit about it, here is a sort that will > >work: > > > >sort [] = [] > >sort l@(x:_) = filter (<x) l ++ filter (==x) l ++ filter (>x) l > > > >(A stable quicksort, btw) > > You may be missing a few recursive calls there :) Indeed. Stefan _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
In reply to this post by R M
And just to remind people, the question is to find the indices and not
the numbers themselves. For example on input '3, [10,9,8,..., 3,2,1]'
the output must be '[9,8,7]'.  Original Message  From: Stefan O'Rear <[hidden email]> To: raghu vardhan <[hidden email]> Sent: Wednesday, 11 April, 2007 11:17:15 PM Subject: Re: [Haskellcafe] kminima in Haskell On Thu, Apr 12, 2007 at 09:30:22AM +0530, raghu vardhan wrote: > Hmmm. That's not something I was looking for. I'm not sure the > running time is good enough (think of k as being 2  then you should > not make more than 2n comparisons)  even with lazy evaluation, > quick sort won't have a bound of O(k*n). Muahahahaha! Muahahahahahahahahahaha! Muahaha! Actually it DOES. (this list courtesy of a ghci oneliner) find the 3 minima of [71,71,17,14,16,91,18,71,58,75,65,79,76,18,4,45,87,51,93,36] ============ take 3 (sort [71,71,17,14,16,91,18,71,58,75,65,79,76,18,4,45,87,51,93,36]) vvvvvvvvvvvv take 3 (sort (filter (<71) [71,71,17,14,16,91,18,71,58,75,65,79,76,18,4,45,87,51,93,36]) ++ a bunch of stuff I won't track because it won't be evaluated) vvvvvvvvvvvv (comparisons so far: 20) take 3 (sort [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36]) take 3 (sort (filter (<17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36]) ++ filter (==17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36] ++ sort (filter (>17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36])) vvvvvvvvvvvv (31) take 3 (sort [14, 16, 18, 4] ++ filter (==17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36] ++ sort (filter (>17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36])) take 3 (sort (filter (<14) [14, 16, 18, 4]) ++ sort (filter (==14) [14, 16, 18, 4]) ++ sort (filter (>14) [14, 16, 18, 4]) ++ filter (==17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36] ++ sort (filter (>17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36])) vvvvvvvvvvvv (39) take 3 (sort [4] ++ sort [14] ++ sort (filter (>14) [14, 16, 18, 4]) ++ filter (==17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36] ++ sort (filter (>17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36])) take 3 (4 : 14 : sort (filter (>14) [14, 16, 18, 4]) ++ filter (==17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36] ++ sort (filter (>17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36])) 4 : 14 : take 1 (sort (filter (>14) [14, 16, 18, 4]) ++ filter (==17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36] ++ sort (filter (>17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36])) vvvvvvvvvvvv (43) 4 : 14 : take 1 (sort [16, 18] ++ filter (==17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36] ++ sort (filter (>17) [17, 14, 16, 18, 58, 65, 18, 4, 45, 51, 36])) vvvvvvvvvvvv (47) [4, 14, 16] 47 close enough to O(n*k) for you? (remember this is quicksort we are dealing with, O(n^2) worst case) Stefan Check out what you're missing if you're not on Yahoo! Messenger _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
In reply to this post by R M
G'day all.
Quoting raghu vardhan <[hidden email]>: > What's the best way to implement the following function in haskell: > Given a list and an integer k as input return the indices of the least k > elements in the list. The code should be elegant and also, more importantly, > must not make more than the minimum O(k*length(list)) number of operations. Pretty much like everyone has says, although it's best to use a real lazy O(n log n) sort, not quicksortwithdumbestpivot. To get the indices, use the Schwartzian transform: sortWith :: (Ord b) => (a > b) > [a] > [a] sortWith f = mergeRuns . runs where runs = map (:[]) mergeRuns [] = [] mergeRuns [xs] = xs mergeRuns xss = mergeRuns (mergeRun xss) mergeRun (xs1:xs2:xss) = mergeOne xs1 xs2 : mergeRun xss mergeRun xss = xss mergeOne [] ys = ys mergeOne xs [] = xs mergeOne xs'@(x:xs) ys':(y:ys) = case compare (f x) (f y) of LT > x : mergeOne xs ys' GT > y : mergeOne xs' ys EQ > x : y : mergeOne xs ys getKMinima :: (Ord a) => [a] > [Int] getKMinima k = map fst . take k . sortWith snd . zip [0..] Cheers, Andrew Bromage _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
In reply to this post by R M
There seems to be some confusion about the question. There are two key things to keep in mind here: 1) You must make at most O(k*n) comparisons (in the worst case) if the list has length n. 2) The output must be the indices and not the numbers themselves. So, any algorithm that sorts is no good (think of n as huge, and k small). Another interesting caveat to this is that if k=2, you can actually solve the problem with just (n+log n) comparisons in worst case(instead of 2*n, that you get by a naive approach), and it's a nice exercise to do this. As a further clarification, this is not a homework question. I genereally do whatever programming I do in Matlab, as I work with matrices (huge ones) and use this function a lot. I just wanted to see how different an implementation you can get in Haskell (I am new to Haskell so I might not be able to come up with the best way to do this).  Original Message  From: Stefan O'Rear <[hidden email]> To: raghu vardhan <[hidden email]> Cc: [hidden email] Sent: Wednesday, 11 April, 2007 10:47:08 PM Subject: Re: [Haskellcafe] kminima in Haskell On Wed, Apr 11, 2007 at 08:38:48PM 0700, Stefan O'Rear wrote: > On Thu, Apr 12, 2007 at 08:58:33AM +0530, raghu vardhan wrote: > > What's the best way to implement the following function in haskell: > > Given a list and an integer k as input return the indices of the least > > k elements in the list. The code should be elegant and also, more > > importantly, must not make more than the minimum O(k*length(list)) > > number of operations. > > Go read and thoroughly understand "Why Functional Programming > Matters." > > Also, your asyptotic complexity bound is just plain wrong. I'd give > faster code, but Don is suspicious (and I can never tell these things > myself). > > Stefan Don tells me (in #haskell) that you are legitimate, so here is the example: kminima k lst = take k $ sort lst If you want to be really explicit about it, here is a sort that will work: sort [] = [] sort l@(x:_) = filter (<x) l ++ filter (==x) l ++ filter (>x) l (A stable quicksort, btw) Note that this is FASTER than your bound  somewhere between O(n) and O(n*log n). Ain't lazy evaluation great? :) Stefan Check out what you're missing if you're not on Yahoo! Messenger _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
G'day all.
Quoting raghu vardhan <[hidden email]>: > So, any algorithm that sorts is no good (think of n as huge, and k small). The essence of all the answers that you've received is that it doesn't matter if k is small, sorting is still the most elegant answer in Haskell. The reason is that in Haskell, a function which sort function will produce the sorted list lazily. If you don't demand the while list, you don't perform the whole sort. Some algorithms are better than others for minimising the amount of work if not all of the list is demanded, but ideally, producing the first k elements will take O(n log k + k) time. Cheers, Andrew Bromage _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
[sorry for the double, ajb]
Since there seemed to be a disconnect between the expectation and the previous answers, I thought an alternative suggestion might help out. This sort of thing (haha) usually isn't my cup o' tea, so please point out any blunders. RM, is this more like what you had in mind? It leans more towards an iterative approach. If so, I encourage you to compare this to the other solutions and try understand how those solutions rely upon laziness. Stefan and Andrew, feel free to point out the drawbacks to this approach that I'm probably overlooking. kminima n l = map fst (foldr insert' (List.sort pre) suf) where (pre, suf) = (splitAt n . zip [0..]) l  I think this insertion sort could be  O(log k) with a better data structure. insert' x [] = [] insert' x (y : ys)  snd x < snd y = x : y : dropLast ys'  otherwise = y : insert' x ys where dropLast ys = take (length ys  1) ys We grab the first k elements and sort them, this list is our first approximation to the kminima. We then process the rest of the list, checking if the current element is smaller than any of the minima of the current approximation. If the current element is smaller, we improve the current approximation by inserting the new element and dropping the biggest (i.e. last) minimum from the minima accumulator. The worst behavior is: sort(k) + (nk) * k comparisons. This could be improved (to: sort(k) + (nk) * log k comparisons, I think) with a better data structure for the minima approximation. On 4/12/07, [hidden email] <[hidden email]> wrote: > G'day all. > > Quoting raghu vardhan <[hidden email]>: > > > So, any algorithm that sorts is no good (think of n as huge, and k small). > > The essence of all the answers that you've received is that it doesn't > matter if k is small, sorting is still the most elegant answer in Haskell. > > The reason is that in Haskell, a function which sort function will produce the > sorted > list lazily. If you don't demand the while list, you don't perform > the whole sort. > > Some algorithms are better than others for minimising the amount of > work if not all of the list is demanded, but ideally, producing the > first k elements will take O(n log k + k) time. > > Cheers, > Andrew Bromage > _______________________________________________ > HaskellCafe mailing list > [hidden email] > http://www.haskell.org/mailman/listinfo/haskellcafe > HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
In reply to this post by ajb@spamcop.net
raghu vardhan <[hidden email]>:
> So, any algorithm that sorts is no good (think of n as huge, and k small). With lazy evaluation, it is! http://article.gmane.org/gmane.comp.lang.haskell.general/15010 [hidden email] wrote: > The essence of all the answers that you've received is that it doesn't > matter if k is small, sorting is still the most elegant answer in Haskell. (It's not only most elegant, it can even be put to work.) > The reason is that in Haskell, a function which sort function will produce the > sorted list lazily. If you don't demand the while list, you don't perform > the whole sort. > > Some algorithms are better than others for minimising the amount of > work if not all of the list is demanded, but ideally, producing the > first k elements will take O(n log k + k) time. You mean O(k * log n + n) of course. Regards, apfelmus _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
In reply to this post by R M
On Thu, 12 Apr 2007, raghu vardhan wrote:
> > What's the best way to implement the following function in haskell: > Given a list and an integer k as input return the indices of the least k > elements in the list. The code should be elegant and also, more importantly, must not make more than the minimum O(k*length(list)) number of operations. > > R M > I don't know about performance, but trying this problem I was struck again by the gorgeous, terse code that can be created: import Data.List import Data.Ord kminima :: (Enum a, Num a, Ord b) => Int > [b] > [a] kminima k lst = take k $ map fst $ sortBy (comparing snd) $ zip [0 ..] lst commented: kminima k lst = take k  We want k items off the front $ map fst  Just the list of indices $ sortBy (comparing snd)  Sort the pairs by their snd $ zip [0 ..] lst  Preserve indices in tuples Prelude> :l kminima.hs [1 of 1] Compiling Main ( kminima.lhs, interpreted ) Ok, modules loaded: Main. *Main> kminima 3 [71,71,17,14,16,91,18,71,58,75,65,79,76,18,4,45,87,51,93,36] [14,3,4] *Main> kminima 4 [10,9,8,7,6,5,4,3,2,1] [9,8,7,6]  .~. Dino Morelli /V\ email: [hidden email] /( )\ irc: dino ^^^^ preferred distro: Debian GNU/Linux http://www.debian.org _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
In reply to this post by ajb@spamcop.net
On 4/12/07, [hidden email] <[hidden email]> wrote:
> To get the indices, use the Schwartzian transform: > > sortWith :: (Ord b) => (a > b) > [a] > [a] > sortWith f = mergeRuns . runs > where > runs = map (:[]) > > mergeRuns [] = [] > mergeRuns [xs] = xs > mergeRuns xss = mergeRuns (mergeRun xss) > > mergeRun (xs1:xs2:xss) = mergeOne xs1 xs2 : mergeRun xss > mergeRun xss = xss > > mergeOne [] ys = ys > mergeOne xs [] = xs > mergeOne xs'@(x:xs) ys':(y:ys) > = case compare (f x) (f y) of > LT > x : mergeOne xs ys' > GT > y : mergeOne xs' ys > EQ > x : y : mergeOne xs ys > > getKMinima :: (Ord a) => [a] > [Int] > getKMinima k = map fst . take k . sortWith snd . zip [0..] For my own edification, what is the benefit of this sortWith over sortBy? f `on` g = \ x y > f ( g x ) ( g y ) kminima k = map fst . take k . sortBy ( compare `on` snd ) . zip [0..] _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
Kurt Hutchinson wrote:
> On 4/12/07, [hidden email] <[hidden email]> wrote: >> To get the indices, use the Schwartzian transform: >> >> sortWith :: (Ord b) => (a > b) > [a] > [a] >> sortWith f = mergeRuns . runs >> where >> runs = map (:[]) >> >> mergeRuns [] = [] >> mergeRuns [xs] = xs >> mergeRuns xss = mergeRuns (mergeRun xss) >> >> mergeRun (xs1:xs2:xss) = mergeOne xs1 xs2 : mergeRun xss >> mergeRun xss = xss >> >> mergeOne [] ys = ys >> mergeOne xs [] = xs >> mergeOne xs'@(x:xs) ys':(y:ys) >> = case compare (f x) (f y) of >> LT > x : mergeOne xs ys' >> GT > y : mergeOne xs' ys >> EQ > x : y : mergeOne xs ys >> >> getKMinima :: (Ord a) => [a] > [Int] >> getKMinima k = map fst . take k . sortWith snd . zip [0..] > > For my own edification, what is the benefit of this sortWith over sortBy? > > f `on` g = \ x y > f ( g x ) ( g y ) > kminima k = map fst . take k . sortBy ( compare `on` snd ) . zip [0..] possibly related (newbie question): pairs are instances of Ord, why not directly sort those (implying the item to be sorted is fst): > kminima k = \list > map snd $ take k $ sort $ zip list [0..] _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe signature.asc (257 bytes) Download Attachment 
In reply to this post by Heinrich Apfelmus
The link pretty much answers my question, though you probably require a little bit more book keeping to get the indices out. Compared to the iterative version or the matlab version, this definitely is more elegant, but also is trickier.

\begin{code}
kmin :: Ord a => Int > [a] > [Int] kmin k x = map snd $ Set.toList $ foldl' insertIfSmall (Set.fromList h) t where (h, t) = splitAt k $ zip x [0..] insertIfSmall :: Ord a => Set.Set a > a > Set.Set a insertIfSmall s e  e < mx = Set.insert e s'  otherwise = s where (mx, s') = Set.deleteFindMax s \end{code} This gives O(log k * (n + k)) execution in constant memory. If you need the result indices to be in order, you can put in a sort at the end without changing the complexity. This could be improved by a significant constant factor by using a priority queue instead of Set. Any Edison people out there? Regards, Yitz _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
In reply to this post by R M
\begin{code}
kmin :: Ord a => Int > [a] > [Int] kmin k x = map snd $ Set.toList $ foldl' insertIfSmall (Set.fromList h) t where (h, t) = splitAt k $ zip x [0..] insertIfSmall :: Ord a => Set.Set a > a > Set.Set a insertIfSmall s e  e < mx = Set.insert e s'  otherwise = s where (mx, s') = Set.deleteFindMax s \end{code} This gives O(log k * (n + k)) execution in constant memory. If you need the result indices to be in order, you can put in a sort at the end without changing the complexity. This could be improved by a significant constant factor by using a priority queue instead of Set. Any Edison people out there? Regards, Yitz _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
In reply to this post by Heinrich Apfelmus
Does the answer change if the data source isn't a list, already in
memory, but a stream? That is, will the sort end up pulling the entire
stream into memory, when we only need k elements from the entire stream.
Interestingly, this is almost exactly the same as one of my standard interview questions, with the main difference being looking for the k elements closest to a target value, rather than the smallest. Not that it really changes the problem, but recognizing that is one of the things I'm looking for. On 4/12/07, apfelmus <[hidden email]> wrote: raghu vardhan <[hidden email]>: _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
In reply to this post by Kurt Hutchinson
G'day all.
Quoting Kurt Hutchinson <[hidden email]>: > For my own edification, what is the benefit of this sortWith over sortBy? None. I wanted to write my own sort, illustrating how the lazy evaluation thing works, and I didn't want a name clash with an existing library function. Cheers, Andrew Bromage _______________________________________________ HaskellCafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskellcafe 
Free forum by Nabble  Edit this page 