Optimization problem Classic List Threaded 44 messages 123
Open this post in threaded view
|

Optimization problem

 Dear Haskell Cafe, When programming the other day I ran into this problem. What I want to do is a function that would work like this: splitStreams::Ord a=>[(a,b)]->[(a,[b])] > splitStreams [(3,x),(1,y),(3,z),(2,w)] [(3,[x,z]),(1,[y]),(2,[w])] I don't care about the order that the pairs are output, so the answer could just as well be [(2,[w],(3,[x,z]),(1,[y])]. However I do care about the order of the xyzw:s, so (3,[z,x]) could not be part of the solution in this example. Furthermore it should work on infinite lists. It can't eat the whole list before producing any output. Now, it's not too hard to come up with an inefficient solution that traverses the input list multiple times. For example a sieving solution: import Data.List splitStreams [] = [] splitStreams ((channel,msg):rest) =      let (myMsgs,otherMsgs) =            partition (\(c,_)->c==channel) rest      in (channel, msg : map snd myMsgs) : splitStreams otherMsgs I'm afraid this algorithm is O(n*m) time in the worst case, where n is the length of the input list, and m is the number of unique channels. But is there any way to write it such that each element is touched only once? Or at least an O(n*log(m)) algorithm? Any takers? / Magnus Jonsson _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Optimization problem

 Magnus Jonsson wrote: > Dear Haskell Cafe, > > When programming the other day I ran into this problem. What I want to > do is a function that would work like this: > > splitStreams::Ord a=>[(a,b)]->[(a,[b])] > >> splitStreams [(3,x),(1,y),(3,z),(2,w)] > > [(3,[x,z]),(1,[y]),(2,[w])] A O(n log(n)) algorithm is easy if you use Data.Map:  > import qualified Data.Map as Map  >  > splitStreamsMap :: Ord a => [(a,b)] -> Map.Map a [b]  > splitStreamsMap = foldl add Map.empty  >  where add (a,b) m = Map.insertWith (++) a [b]  >  > splitStreams = Map.fromList . splitStreamsMap Twan _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Optimization problem

Open this post in threaded view
|

Re: Optimization problem

 In reply to this post by Magnus Jonsson Magnus Jonsson <[hidden email]> writes: > splitStreams::Ord a=>[(a,b)]->[(a,[b])] >> splitStreams [(3,x),(1,y),(3,z),(2,w)] > [(3,[x,z]),(1,[y]),(2,[w])]   [...] > But is there any way to write it such that each element is touched > only once? Or at least an O(n*log(m)) algorithm? I guess it should be possible to use a quicksort-like algorithm, splitting the stream into three streams with keys less than, equal, and greater than the first element, and recurse on the less/equal streams? -k -- If I haven't seen further, it is by standing in the footprints of giants _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Optimization problem

 On Thu, 14 Sep 2006, Ketil Malde wrote: > Magnus Jonsson <[hidden email]> writes: > >> splitStreams::Ord a=>[(a,b)]->[(a,[b])] > >>> splitStreams [(3,x),(1,y),(3,z),(2,w)] >> [(3,[x,z]),(1,[y]),(2,[w])] > >  [...] > >> But is there any way to write it such that each element is touched >> only once? Or at least an O(n*log(m)) algorithm? > > I guess it should be possible to use a quicksort-like algorithm, > splitting the stream into three streams with keys less than, equal, > and greater than the first element, and recurse on the less/equal > streams? > > -k > That is probably the best we can do. I think in a hypothetical concurrent Haskell with futures/promises, the split stream problem could be solved. But if it can be solved in regular Haskell, I would be pleasantly surprised. / Magnus _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Optimization problem

 In reply to this post by Magnus Jonsson On Wed, 13 Sep 2006, Magnus Jonsson wrote: > When programming the other day I ran into this problem. What I want to do is a > function that would work like this: > > splitStreams::Ord a=>[(a,b)]->[(a,[b])] > > > splitStreams [(3,x),(1,y),(3,z),(2,w)] > [(3,[x,z]),(1,[y]),(2,[w])] Interestingly we use such a routine in Haskore for splitting a sequence of notes into sequences of notes of equal instruments. It's implemented rather the same way like your version. It's 'slice' in    http://darcs.haskell.org/haskore/src/Haskore/Basic/TimeOrderedList.lhs  but it is a bit more complicated because it has to manage time information. _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Optimization problem

 In reply to this post by Magnus Jonsson > >Magnus Jonsson <[hidden email]> writes: > > > >>splitStreams::Ord a=>[(a,b)]->[(a,[b])] > > > >>>splitStreams [(3,x),(1,y),(3,z),(2,w)] > >>[(3,[x,z]),(1,[y]),(2,[w])] > > > > [...] > > > >>But is there any way to write it such that each element is touched > >>only once? Or at least an O(n*log(m)) algorithm? > > > >I guess it should be possible to use a quicksort-like algorithm, > >splitting the stream into three streams with keys less than, equal, > >and greater than the first element, and recurse on the less/equal > >streams? something like this, then? splitStreams :: Ord a => [(a, b)] -> [(a, [b])] splitStreams [] = [] splitStreams ((a, b) : t) = (a, b : bs) : splitStreams t'     where     (bs, t') = foldr f ([], []) t     f z@(a', b') (bs, t') = if a' == a                             then (b' : bs, t')                             else (bs, z : t') (i guess i should use a strictness '!' for (xs, ys), but i am still running ghc-6.5, and i don't like the case constructions that much. does this bite me here?  i didn't check, really.) who can tune this any further? cheers, m. _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe signature.asc (196 bytes) Download Attachment
Open this post in threaded view
|

Re: Optimization problem

 In reply to this post by Magnus Jonsson Magnus Jonsson wrote: > splitStreams::Ord a=>[(a,b)]->[(a,[b])] > > >splitStreams [(3,x),(1,y),(3,z),(2,w)] > [(3,[x,z]),(1,[y]),(2,[w])] > I'm afraid this algorithm is O(n*m) time in the worst case, where n is the > length of the input list, and m is the number of unique channels. > > But is there any way to write it such that each element is touched only > once? Or at least an O(n*log(m)) algorithm? This can be done. It's an interesting puzzle to make it work for infinite lists. For finite lists, sortBy + groupBy + map easily do the job. The problem is dealing with holes in data structures in Haskell, which are to be filled in later. This can be done by providing blueprints for them. Let's define a basic map: > data Map k a  = Node k a (Map k a) (Map k a) | Leaf We will use Map k () as blueprints - the () indicate where the holes are. Next we define a lookup function, which takes a blueprint and evaluates the correct hole in a second map: > lookup :: Ord k => k -> Map k () -> Map k a -> Maybe a > lookup _ Leaf            _                  = Nothing > lookup k (Node k' _ l r) ~(Node _ a' l' r') = case compare k k' of >     LT -> lookup k l l' >     EQ -> Just a' >     GT -> lookup k r r' As you can see, the structure of the second argument is forced by the first argument. The lazy pattern assures that we don't look at the second argument too early. In a similar fashion, we can define an update function: > update :: Ord k => k -> (a -> a) -> Map k x -> Map k a -> Map k a > update k f (Node k' _ l r) ~(Node _ a' l' r') = case compare k k' of >     LT -> Node k' a' (update k f l l') r' >     EQ -> Node k' (f a') l' r' >     GT -> Node k' a' l' (update k f r r') Next comes insert. Insert takes a blueprint and inserts a key into it. It also takes a map and removes the corresponding hole from it. To simplify the code it does no balancing.  > insert :: Ord k => k -> Map k () -> Map k a -> (Map k (), Map k a) > insert k Leaf            _ >     = (Node k () Leaf Leaf, Leaf) > insert k (Node k' _ l r) ~(Node _ a' l' r') >     = case compare k k' of >         LT -> let (m, m') = insert k l l' in >               (Node k' () m r, Node k' a' m' r') >         EQ -> error "inserting existing element" >         GT -> let (m, m') = insert k r r' in >               (Node k' () l m, Node k' a' l' m') We also need a map, defined in the usual fashion, without the blueprint. A version with blueprint can also be defined, but it isn't necessary for our problem: > map :: (a -> b) -> Map k a -> Map k b > map _ Leaf           = Leaf > map f (Node k a l r) = Node k (f a) (mapMap f l) (mapMap f r) We can now build the splitStream function, using the following helper function: > splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b]) > splitSeq' bp []         = ([], map (const []) bp) > splitSeq' bp ((a,b):xs) = case lookup a bp bp of >     Just _ -> let (l, m) = splitSeq' bp xs in (l, update a (b:) bp m) >     _      -> let (bp', m) = insert a bp m' >                   (l, m')  = splitSeq' bp' xs >                 in >                   ((a, b : (fromJust \$ lookup a bp' m')) : l, m) splitSeq' takes a blueprint for a map with all keys seen so far, and a list tail. It returns the result list for all new keys, and a map (corresponding to the given blueprint) with the tails of the seen elements. The in the base case it just fills the blueprint with empty lists and returns to the caller. If a new element is seen, insert is used to create a new blueprint, including the new key a, which is passed to the recursive call of splitSeq'. The resulting map from that call is threaded back into insert, which gives us a new map without the a key which matches the structure of the original blueprint. Finally we can define splitSeq as follows: > splitSeq :: Ord a => [(a,b)] -> [(a,[b])] > splitSeq = fst . splitSeq' Leaf A quick test: *Main> let s = splitSeq ([(3,'x'),(1,'y'),(3,'z'),(2,'w')] ++ repeat (4,' ')) *Main> s !! 0 (3,"xzInterrupted.  (I pressed ^C) *Main> s !! 2 (2,"wInterrupted.   (ditto) *Main> s !! 3 (4,"       ... Is there a simpler way to do this? I don't know. I'm also unsure whether it is a good idea - unless you use several threads to process the list the code will produce a lot of thunks, and eat a lot of memory. The code above provides a maximum amount of lazyness while using O(n log m) time. Depending on the circumstances processing the list in chunks and using techniques like the above to combine the result will be better. enjoy, Bertram  Balancing can be done with the information in the blueprint, and mapping back is easily done by doing the transformation on the tree in reverse. _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Optimization problem

 In reply to this post by Henning Thielemann On Thu, 14 Sep 2006, Henning Thielemann wrote: > Interestingly we use such a routine in Haskore for splitting a sequence of > notes into sequences of notes of equal instruments. It's implemented > rather the same way like your version. > > It's 'slice' in >   http://darcs.haskell.org/haskore/src/Haskore/Basic/TimeOrderedList.lhs>  but it is a bit more complicated because it has to manage time > information. > Now even more interestingly, my program also deals with musiec! :) I'm generating microtonal midi files. I use it for very much the same purpose as you do (although my program is not yet finished). - Magnus _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Optimization problem

Open this post in threaded view
|

Re: Optimization problem

 In reply to this post by Bertram Felgenhauer-2 Here's my attempt, also using the quicksort idea, but using two passes instead of tying the knot: > import Data.Set hiding (map) First a binary search tree, with a lookup function: > data Tree k v = Node (Tree k v) k v (Tree k v) > get :: Ord a => a -> Tree a b -> b > get a (Node l k v r) = case compare a k of >         LT -> get a l >         EQ -> v >         GT -> get a r There's no empty case: we'll ensure that we only search for keys that are in the tree. Now to make a tree of lists from the list of pairs: > mkTree :: Ord a => [(a,b)] -> Tree a [b] > mkTree [] = error "unused" > mkTree ((a,b):abs) = Node l a (b:bs) r >   where l = mkTree [(a',b') | (a',b') <- abs, a' < a] >         r = mkTree [(a',b') | (a',b') <- abs, a' > a] >         bs = [b' | (a',b') <- abs, a' == a] It remains to extract from this tree the list of second elements corresponding to the each distinct first element in the input list: > splitStreams :: Ord a => [(a,b)] -> [(a,[b])] > splitStreams abs = [(a, get a t) | a <- uniq (map fst abs)] >   where t = mkTree abs where uniq computes the list of unique elements of a list: > uniq :: Ord a => [a] -> [a] > uniq = u empty >   where u s [] = [] >         u s (x:xs) >           | member x s = u s xs >           | otherwise  = x : u (insert x s) xs There's no rebalancing, so it suffers from the same problems as quicksort. _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Optimization problem

 In reply to this post by Bertram Felgenhauer-2 I wrote: >  Balancing can be done with the information in the blueprint, and > mapping back is easily done by doing the transformation on the tree > in reverse. I should add that this possibility was the main reason for dealing with blueprints at all. As Ross Paterson's solution shows, it's possible to get simpler code without balancing the tree. regards, Bertram _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Optimization problem

 In reply to this post by Matthias Fischmann Hello, I think it is possible to do it in O(n) (if you change the representation of the output stream). Note that the solution is quite similar toTwan van Laarhoven, and it should be trivial use "Map" instead of "Rel". Here is my take on it: The type of the output stream: > type Rel a b = a -> [b] Here are cons and nil: > cons :: Eq a => (a,b) -> Rel a b -> Rel a b > cons (x,y) l z >    | x == z        = y : l x >    | otherwise  = l z > nil :: Rel a b > nil _ = [] and a lookUp function: > lookUp :: Rel a b -> a -> [b] > lookUp = id Finally the splitStreams algorithm: > splitStreams :: Eq a => [(a,b)] -> Rel a b > splitStreams = foldr cons nil Here is a test with finite lists: > fin = splitStreams [(3,'x'),(1,'y'),(3,'z'),(2,'w')] and here are the console tests: *General7> lookUp fin 1 "y" *General7> lookUp fin 2 "w" *General7> lookUp fin 3 "xz" Now with infinite lists: > inf = splitStreams (map (\x -> (0, x)) [1..]) and here a case where it terminates with infinite lists: *General7> take 10 (lookUp inf 0) [1,2,3,4,5,6,7,8,9,10] Cheers, Bruno Oliveira _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Re: Optimization problem

 In reply to this post by Bertram Felgenhauer-2 Thanks Bertran for your solution. I have difficulty understanding it so I can't comment on it right now but I'll try to have a look at it. Do you know of any article that explains this technique, starting from very simple examples? /Magnus On Thu, 14 Sep 2006, Bertram Felgenhauer wrote: > I wrote: >>  Balancing can be done with the information in the blueprint, and >> mapping back is easily done by doing the transformation on the tree >> in reverse. > > I should add that this possibility was the main reason for dealing with > blueprints at all. As Ross Paterson's solution shows, it's possible to > get simpler code without balancing the tree. > > regards, > > Bertram > _______________________________________________ > Haskell-Cafe mailing list > [hidden email] > http://www.haskell.org/mailman/listinfo/haskell-cafe> _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Optimization problem

Open this post in threaded view
|

 In reply to this post by Henning Thielemann On Thu, 14 Sep 2006, Henning Thielemann wrote: > > On Thu, 14 Sep 2006, Magnus Jonsson wrote: > >> >> Now even more interestingly, my program also deals with music! :) I'm >> generating microtonal midi files. I use it for very much the same purpose as >> you do (although my program is not yet finished). > > Is it something we could and should add to Haskore? If Haskore could support microtones that would make this world a slightly better world for me. Here are the basic things you need to support microtonal music: - Pitch representations would have to be able to express any pitch.    - One appealing approach is to represent a pitch directly as it's frequency.    - Probably the most useful representation though is a base pitch,      say one of C,D,E,F,G,A,B, followed by a list of accidentals that      modify the pitch. The user should be able to define his own base      pitches and accidentals, in terms of cents or frequency ratios or      something similar. - Generating microtonal midi files requires that you add pitch-bend messages before all notes. That restricts each midi channel to only being able to play one note at a time. This is a big deficiency in the midi protocol imo. / Magnus _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Re: Optimization problem

 In reply to this post by Magnus Jonsson Magnus Jonsson wrote: > Thanks Bertran for your solution. I have difficulty understanding it so I > can't comment on it right now but I'll try to have a look at it. Do you > know of any article that explains this technique, starting from very > simple examples? Not really. It's a result of thinking about lazy evaluation, and especially lazy patterns (and let bindings) for some time. A wiki article that helped me a lot to understand these is   http://www.haskell.org/hawiki/TyingTheKnotI'd like to point out the trustList function there which uses the idea of encoding the structure of a term and its actual values in different arguments, i.e. a blueprint. HTH, Bertram _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

Re: Optimization problem

Open this post in threaded view
|

Re: Optimization problem

 In reply to this post by Bertram Felgenhauer-2 Bertram Felgenhauer wrote: >> splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b]) >> splitSeq' bp []         = ([], map (const []) bp) >> splitSeq' bp ((a,b):xs) = case lookup a bp bp of >>     Just _ -> let (l, m) = splitSeq' bp xs in (l, update a (b:) bp m) >>     _      -> let (bp', m) = insert a bp m' >>                   (l, m')  = splitSeq' bp' xs >>                 in >>                   ((a, b : (fromJust \$ lookup a bp' m')) : l, m) > > splitSeq' takes a blueprint for a map with all keys seen so far, and > a list tail. It returns the result list for all new keys, and a map > (corresponding to the given blueprint) with the tails of the seen > elements. > > The in the base case it just fills the blueprint with empty lists and > returns to the caller. > > If a new element is seen, insert is used to create a new blueprint, > including the new key a, which is passed to the recursive call of > splitSeq'. The resulting map from that call is threaded back into > insert, which gives us a new map without the a key which matches > the structure of the original blueprint. Very interesting! So the map with the seen tails matches the blueprint and therefore throws away information (the future keys) from the future. This is what effectively allows the key-tree structure to be rebalanced.    If one would return the tails-map with all future keys, it would be _|_ because the key-order in the tree depends on the future keys and changes everytime when a new key is found. I thought that there can only be a static solution, i.e. like the one Ross Paterson presented where the structure (I mean the outermost constructors) of the returned tree are determined before the future. This obviously excludes rebalancing. I found a static solution by trying to fit the key-tails pairs into an infinite tails-map which is filled "first comes first":   map ! 1 := (first key seen, tails)   map ! 2 := (second key seen, tails) By keeping another key-position-map around which records where each key has been inserted, so that the future knows where to put the tails. The point is that the structure of tails-map is known before the future comes as its keys are just 1,2,3,... and so on. It remains to construct such an infinite random access list, but this is turns out to be even easier than finite random access lists (when using non-uniform recursion from Okasaki's chapter 10) :) > data Imp a = Imp a (Imp (a,a)) deriving (Show) > > instance Functor Imp where >    fmap h ~(Imp x xs) = Imp (h x) (fmap (\(x,y) -> (h x, h y)) xs) > > update :: (a -> a) -> Position -> Imp a -> Imp a > update f 1 ~(Imp x xs) = Imp (f x) xs > update f n ~(Imp x xs) = Imp x \$ update f2 (n `div` 2) xs >    where >    f2 ~(y,z) = if even n then (f y, z) else (y, f z) Note that we can use irrefutable patterns without hesitation as there is only one constructor. Folding over an infinite thing is strange but here we are > fold :: (a -> b -> b) -> Imp a -> b > fold f ~(Imp x xs) = f x (fold f2 xs) >    where >    f2 (x,y) z = f x (f y z) It's not so strange anymore when we realize that this can be used to convert it into an infinite list > toList = fold (:) The implementation of fromList is fun as well, so I won't tell it. As fold and unfold can be defined generically for Mu f where f is a functor, i wonder how to deal with it in the case of non-uniform recursion. For splitStreams, the key-position-map is needed in both directions, so we quickly define a bijective map > type BiMap a b = (Map.Map a b, Map.Map b a) > > switch :: BiMap a b -> BiMap b a > switch (x,y) = (y,x) with the usual operations (empty, member, size etc.) omitted here. Now comes splitStreams itself: > splitStreams :: Ord a => [(a,b)] -> [(a,[b])] > splitStreams xs = >    takeWhile (not . null . snd) \$ toList \$ splitStreams' empty xs > > splitStreams' :: Ord a => BiMap a Position -> [(a,b)] -> Imp (a,[b]) > splitStreams' bimap [] = >    fmap (\i -> (findWithDefault undefined i \$ switch bimap,[])) \$ >        fromList [1..] > splitStreams' bimap ((a,b):xs) = >    update fun pos \$ splitStreams' bimap' xs >    where >    fun ~(_,bs) = (a,b:bs) >    sz          = size bimap >    pos         = findWithDefault (sz+1) a bimap >    bimap'      = >       (if member a bimap then id else insert a (sz+1)) bimap Note that update actually generates fresh constructors, so the structure of our tails-Imp is not really static. But this is no problem as the form of the constructors is completely known: there is only one. Regards, apfelmus _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe