Optimization problem

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
44 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Optimization problem

Magnus Jonsson
Dear Haskell Cafe,

When programming the other day I ran into this problem. What I want to do
is a function that would work like this:

splitStreams::Ord a=>[(a,b)]->[(a,[b])]

> splitStreams [(3,x),(1,y),(3,z),(2,w)]
[(3,[x,z]),(1,[y]),(2,[w])]

I don't care about the order that the pairs are output, so the answer
could just as well be [(2,[w],(3,[x,z]),(1,[y])]. However I do care about
the order of the xyzw:s, so (3,[z,x]) could not be part of the solution in
this example.

Furthermore it should work on infinite lists. It can't eat the whole
list before producing any output.

Now, it's not too hard to come up with an inefficient solution that
traverses the input list multiple times. For example a sieving solution:

import Data.List

splitStreams [] = []
splitStreams ((channel,msg):rest) =
     let (myMsgs,otherMsgs) =
           partition (\(c,_)->c==channel) rest
     in (channel, msg : map snd myMsgs) : splitStreams otherMsgs

I'm afraid this algorithm is O(n*m) time in the worst case, where n is the
length of the input list, and m is the number of unique channels.

But is there any way to write it such that each element is touched only
once? Or at least an O(n*log(m)) algorithm?

Any takers?

/ Magnus Jonsson
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Twan van Laarhoven
Magnus Jonsson wrote:

> Dear Haskell Cafe,
>
> When programming the other day I ran into this problem. What I want to
> do is a function that would work like this:
>
> splitStreams::Ord a=>[(a,b)]->[(a,[b])]
>
>> splitStreams [(3,x),(1,y),(3,z),(2,w)]
>
> [(3,[x,z]),(1,[y]),(2,[w])]

A O(n log(n)) algorithm is easy if you use Data.Map:

 > import qualified Data.Map as Map
 >
 > splitStreamsMap :: Ord a => [(a,b)] -> Map.Map a [b]
 > splitStreamsMap = foldl add Map.empty
 >  where add (a,b) m = Map.insertWith (++) a [b]
 >
 > splitStreams = Map.fromList . splitStreamsMap

Twan
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Magnus Jonsson
Nice try Twan but your example fails on infinite lists. I cleaned up
your example so that it compiles:

import qualified Data.Map as Map

splitStreamsMap :: Ord a => [(a,b)] -> Map.Map a [b]
splitStreamsMap = foldl add Map.empty
   where add m (a,b) = Map.insertWith (++) a [b] m

splitStreams :: Ord a => [(a,b)] -> [(a,[b])]
splitStreams = Map.toList . splitStreamsMap

It fails to return a value on this test:

take 2 $ snd $ head $ splitStreams (map (\x -> (0 ,x)) [1..])

/ Magnus

On Thu, 14 Sep 2006, Twan van Laarhoven wrote:

> Magnus Jonsson wrote:
>> Dear Haskell Cafe,
>>
>> When programming the other day I ran into this problem. What I want to do
>> is a function that would work like this:
>>
>> splitStreams::Ord a=>[(a,b)]->[(a,[b])]
>>
>>> splitStreams [(3,x),(1,y),(3,z),(2,w)]
>>
>> [(3,[x,z]),(1,[y]),(2,[w])]
>
> A O(n log(n)) algorithm is easy if you use Data.Map:
>
>> import qualified Data.Map as Map
>>
>> splitStreamsMap :: Ord a => [(a,b)] -> Map.Map a [b]
>> splitStreamsMap = foldl add Map.empty
>>  where add (a,b) m = Map.insertWith (++) a [b]
>>
>> splitStreams = Map.fromList . splitStreamsMap
>
> Twan
>
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Ketil Malde-3
In reply to this post by Magnus Jonsson
Magnus Jonsson <[hidden email]> writes:

> splitStreams::Ord a=>[(a,b)]->[(a,[b])]

>> splitStreams [(3,x),(1,y),(3,z),(2,w)]
> [(3,[x,z]),(1,[y]),(2,[w])]

  [...]

> But is there any way to write it such that each element is touched
> only once? Or at least an O(n*log(m)) algorithm?

I guess it should be possible to use a quicksort-like algorithm,
splitting the stream into three streams with keys less than, equal,
and greater than the first element, and recurse on the less/equal
streams?

-k
--
If I haven't seen further, it is by standing in the footprints of giants

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Magnus Jonsson

On Thu, 14 Sep 2006, Ketil Malde wrote:

> Magnus Jonsson <[hidden email]> writes:
>
>> splitStreams::Ord a=>[(a,b)]->[(a,[b])]
>
>>> splitStreams [(3,x),(1,y),(3,z),(2,w)]
>> [(3,[x,z]),(1,[y]),(2,[w])]
>
>  [...]
>
>> But is there any way to write it such that each element is touched
>> only once? Or at least an O(n*log(m)) algorithm?
>
> I guess it should be possible to use a quicksort-like algorithm,
> splitting the stream into three streams with keys less than, equal,
> and greater than the first element, and recurse on the less/equal
> streams?
>
> -k
>

That is probably the best we can do. I think in a hypothetical concurrent
Haskell with futures/promises, the split stream problem could be solved.
But if it can be solved in regular Haskell, I would be pleasantly
surprised.

/ Magnus

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Henning Thielemann
In reply to this post by Magnus Jonsson

On Wed, 13 Sep 2006, Magnus Jonsson wrote:

> When programming the other day I ran into this problem. What I want to do is a
> function that would work like this:
>
> splitStreams::Ord a=>[(a,b)]->[(a,[b])]
>
> > splitStreams [(3,x),(1,y),(3,z),(2,w)]
> [(3,[x,z]),(1,[y]),(2,[w])]

Interestingly we use such a routine in Haskore for splitting a sequence of
notes into sequences of notes of equal instruments. It's implemented
rather the same way like your version.

It's 'slice' in
   http://darcs.haskell.org/haskore/src/Haskore/Basic/TimeOrderedList.lhs
  but it is a bit more complicated because it has to manage time
information.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Matthias Fischmann
In reply to this post by Magnus Jonsson


> >Magnus Jonsson <[hidden email]> writes:
> >
> >>splitStreams::Ord a=>[(a,b)]->[(a,[b])]
> >
> >>>splitStreams [(3,x),(1,y),(3,z),(2,w)]
> >>[(3,[x,z]),(1,[y]),(2,[w])]
> >
> > [...]
> >
> >>But is there any way to write it such that each element is touched
> >>only once? Or at least an O(n*log(m)) algorithm?
> >
> >I guess it should be possible to use a quicksort-like algorithm,
> >splitting the stream into three streams with keys less than, equal,
> >and greater than the first element, and recurse on the less/equal
> >streams?
something like this, then?

splitStreams :: Ord a => [(a, b)] -> [(a, [b])]
splitStreams [] = []
splitStreams ((a, b) : t) = (a, b : bs) : splitStreams t'
    where
    (bs, t') = foldr f ([], []) t
    f z@(a', b') (bs, t') = if a' == a
                            then (b' : bs, t')
                            else (bs, z : t')

(i guess i should use a strictness '!' for (xs, ys), but i am still
running ghc-6.5, and i don't like the case constructions that much.
does this bite me here?  i didn't check, really.)

who can tune this any further?



cheers,
m.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

signature.asc (196 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Bertram Felgenhauer-2
In reply to this post by Magnus Jonsson
Magnus Jonsson wrote:
> splitStreams::Ord a=>[(a,b)]->[(a,[b])]
>
> >splitStreams [(3,x),(1,y),(3,z),(2,w)]
> [(3,[x,z]),(1,[y]),(2,[w])]

> I'm afraid this algorithm is O(n*m) time in the worst case, where n is the
> length of the input list, and m is the number of unique channels.
>
> But is there any way to write it such that each element is touched only
> once? Or at least an O(n*log(m)) algorithm?

This can be done. It's an interesting puzzle to make it work for infinite
lists. For finite lists, sortBy + groupBy + map easily do the job.

The problem is dealing with holes in data structures in Haskell, which
are to be filled in later. This can be done by providing blueprints for
them. Let's define a basic map:

> data Map k a  = Node k a (Map k a) (Map k a) | Leaf

We will use Map k () as blueprints - the () indicate where the holes
are. Next we define a lookup function, which takes a blueprint and
evaluates the correct hole in a second map:

> lookup :: Ord k => k -> Map k () -> Map k a -> Maybe a
> lookup _ Leaf            _                  = Nothing
> lookup k (Node k' _ l r) ~(Node _ a' l' r') = case compare k k' of
>     LT -> lookup k l l'
>     EQ -> Just a'
>     GT -> lookup k r r'

As you can see, the structure of the second argument is forced
by the first argument. The lazy pattern assures that we don't look
at the second argument too early.

In a similar fashion, we can define an update function:

> update :: Ord k => k -> (a -> a) -> Map k x -> Map k a -> Map k a
> update k f (Node k' _ l r) ~(Node _ a' l' r') = case compare k k' of
>     LT -> Node k' a' (update k f l l') r'
>     EQ -> Node k' (f a') l' r'
>     GT -> Node k' a' l' (update k f r r')

Next comes insert. Insert takes a blueprint and inserts a key into it.
It also takes a map and removes the corresponding hole from it. To
simplify the code it does no balancing. [1]

> insert :: Ord k => k -> Map k () -> Map k a -> (Map k (), Map k a)
> insert k Leaf            _
>     = (Node k () Leaf Leaf, Leaf)
> insert k (Node k' _ l r) ~(Node _ a' l' r')
>     = case compare k k' of
>         LT -> let (m, m') = insert k l l' in
>               (Node k' () m r, Node k' a' m' r')
>         EQ -> error "inserting existing element"
>         GT -> let (m, m') = insert k r r' in
>               (Node k' () l m, Node k' a' l' m')

We also need a map, defined in the usual fashion, without the blueprint.
A version with blueprint can also be defined, but it isn't necessary for
our problem:

> map :: (a -> b) -> Map k a -> Map k b
> map _ Leaf           = Leaf
> map f (Node k a l r) = Node k (f a) (mapMap f l) (mapMap f r)


We can now build the splitStream function, using the following helper
function:

> splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])
> splitSeq' bp []         = ([], map (const []) bp)
> splitSeq' bp ((a,b):xs) = case lookup a bp bp of
>     Just _ -> let (l, m) = splitSeq' bp xs in (l, update a (b:) bp m)
>     _      -> let (bp', m) = insert a bp m'
>                   (l, m')  = splitSeq' bp' xs
>                 in
>                   ((a, b : (fromJust $ lookup a bp' m')) : l, m)

splitSeq' takes a blueprint for a map with all keys seen so far, and
a list tail. It returns the result list for all new keys, and a map
(corresponding to the given blueprint) with the tails of the seen
elements.

The in the base case it just fills the blueprint with empty lists and
returns to the caller.

If a new element is seen, insert is used to create a new blueprint,
including the new key a, which is passed to the recursive call of
splitSeq'. The resulting map from that call is threaded back into
insert, which gives us a new map without the a key which matches
the structure of the original blueprint.

Finally we can define splitSeq as follows:

> splitSeq :: Ord a => [(a,b)] -> [(a,[b])]
> splitSeq = fst . splitSeq' Leaf

A quick test:
*Main> let s = splitSeq ([(3,'x'),(1,'y'),(3,'z'),(2,'w')] ++ repeat (4,' '))
*Main> s !! 0
(3,"xzInterrupted.  (I pressed ^C)
*Main> s !! 2
(2,"wInterrupted.   (ditto)
*Main> s !! 3
(4,"       ...

Is there a simpler way to do this? I don't know.

I'm also unsure whether it is a good idea - unless you use several
threads to process the list the code will produce a lot of thunks,
and eat a lot of memory.

The code above provides a maximum amount of lazyness while using
O(n log m) time. Depending on the circumstances processing the list in
chunks and using techniques like the above to combine the result will
be better.

enjoy,

Bertram

[1] Balancing can be done with the information in the blueprint, and
mapping back is easily done by doing the transformation on the tree
in reverse.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Magnus Jonsson
In reply to this post by Henning Thielemann
On Thu, 14 Sep 2006, Henning Thielemann wrote:

> Interestingly we use such a routine in Haskore for splitting a sequence of
> notes into sequences of notes of equal instruments. It's implemented
> rather the same way like your version.
>
> It's 'slice' in
>   http://darcs.haskell.org/haskore/src/Haskore/Basic/TimeOrderedList.lhs
>  but it is a bit more complicated because it has to manage time
> information.
>

Now even more interestingly, my program also deals with musiec! :) I'm
generating microtonal midi files. I use it for very much the same
purpose as you do (although my program is not yet finished).

- Magnus
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Henning Thielemann

On Thu, 14 Sep 2006, Magnus Jonsson wrote:

> On Thu, 14 Sep 2006, Henning Thielemann wrote:
>
> > Interestingly we use such a routine in Haskore for splitting a sequence of
> > notes into sequences of notes of equal instruments. It's implemented
> > rather the same way like your version.
> >
> > It's 'slice' in
> >   http://darcs.haskell.org/haskore/src/Haskore/Basic/TimeOrderedList.lhs
> >  but it is a bit more complicated because it has to manage time
> > information.
>
> Now even more interestingly, my program also deals with musiec! :) I'm
> generating microtonal midi files. I use it for very much the same purpose as
> you do (although my program is not yet finished).

Is it something we could and should add to Haskore?


"microtonal" -- I recently read this term here:
  http://www.math.tu-dresden.de/~mutabor/
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Ross Paterson
In reply to this post by Bertram Felgenhauer-2
Here's my attempt, also using the quicksort idea, but using two passes
instead of tying the knot:

> import Data.Set hiding (map)

First a binary search tree, with a lookup function:

> data Tree k v = Node (Tree k v) k v (Tree k v)

> get :: Ord a => a -> Tree a b -> b
> get a (Node l k v r) = case compare a k of
>         LT -> get a l
>         EQ -> v
>         GT -> get a r

There's no empty case: we'll ensure that we only search for keys that
are in the tree.

Now to make a tree of lists from the list of pairs:

> mkTree :: Ord a => [(a,b)] -> Tree a [b]
> mkTree [] = error "unused"
> mkTree ((a,b):abs) = Node l a (b:bs) r
>   where l = mkTree [(a',b') | (a',b') <- abs, a' < a]
>         r = mkTree [(a',b') | (a',b') <- abs, a' > a]
>         bs = [b' | (a',b') <- abs, a' == a]

It remains to extract from this tree the list of second elements
corresponding to the each distinct first element in the input list:

> splitStreams :: Ord a => [(a,b)] -> [(a,[b])]
> splitStreams abs = [(a, get a t) | a <- uniq (map fst abs)]
>   where t = mkTree abs

where uniq computes the list of unique elements of a list:

> uniq :: Ord a => [a] -> [a]
> uniq = u empty
>   where u s [] = []
>         u s (x:xs)
>           | member x s = u s xs
>           | otherwise  = x : u (insert x s) xs

There's no rebalancing, so it suffers from the same problems as
quicksort.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Bertram Felgenhauer-2
In reply to this post by Bertram Felgenhauer-2
I wrote:
> [1] Balancing can be done with the information in the blueprint, and
> mapping back is easily done by doing the transformation on the tree
> in reverse.

I should add that this possibility was the main reason for dealing with
blueprints at all. As Ross Paterson's solution shows, it's possible to
get simpler code without balancing the tree.

regards,

Bertram
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Bruno Oliveira-5
In reply to this post by Matthias Fischmann
Hello,

I think it is possible to do it in O(n) (if you change the representation
of the output stream).

Note that the solution is quite similar toTwan van Laarhoven,
and it should be trivial use "Map" instead of "Rel".

Here is my take on it:

The type of the output stream:

> type Rel a b = a -> [b]

Here are cons and nil:

> cons :: Eq a => (a,b) -> Rel a b -> Rel a b
> cons (x,y) l z
>    | x == z        = y : l x
>    | otherwise  = l z

> nil :: Rel a b
> nil _ = []

and a lookUp function:

> lookUp :: Rel a b -> a -> [b]
> lookUp = id

Finally the splitStreams algorithm:

> splitStreams :: Eq a => [(a,b)] -> Rel a b
> splitStreams = foldr cons nil

Here is a test with finite lists:

> fin = splitStreams [(3,'x'),(1,'y'),(3,'z'),(2,'w')]

and here are the console tests:

*General7> lookUp fin 1
"y"
*General7> lookUp fin 2
"w"
*General7> lookUp fin 3
"xz"

Now with infinite lists:

> inf = splitStreams (map (\x -> (0, x)) [1..])

and here a case where it terminates with infinite lists:

*General7> take 10 (lookUp inf 0)
[1,2,3,4,5,6,7,8,9,10]

Cheers,

Bruno Oliveira


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: Optimization problem

Magnus Jonsson
In reply to this post by Bertram Felgenhauer-2
Thanks Bertran for your solution. I have difficulty understanding it so I
can't comment on it right now but I'll try to have a look at it. Do you
know of any article that explains this technique, starting from very
simple examples?

/Magnus

On Thu, 14 Sep 2006, Bertram Felgenhauer wrote:

> I wrote:
>> [1] Balancing can be done with the information in the blueprint, and
>> mapping back is easily done by doing the transformation on the tree
>> in reverse.
>
> I should add that this possibility was the main reason for dealing with
> blueprints at all. As Ross Paterson's solution shows, it's possible to
> get simpler code without balancing the tree.
>
> regards,
>
> Bertram
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Magnus Jonsson
In reply to this post by Bruno Oliveira-5
Thanks Bruno.

However I think this is still O(n*m). As far as I understand your code it
is a longwinded way to say "[b | (a,b) <- input, b ==
myChannelOfInterest]". This is fine if you are only interested in one
channel, and for that case I agree it's O(n), but if you are interested in
many different channels, it will take O(n*m). Here's why I think your
code is equivalent to using a filter/list-comprehension:

>> cons :: Eq a => (a,b) -> Rel a b -> Rel a b
>> cons (x,y) l z
>>    | x == z        = y : l x
>>    | otherwise  = l z

z is the channel that the user is interested in.
x is the channel of the current message in the list.
y is the message content.
l is a function for searching the rest of the list.

For each element of the list you create a function that compares the
requested channel to a (in (a,b)). If it's the same, you return that
element followed by a continued search down the list. If you don't find
it, you forward the request to the next function down the list. That's
exactly the same as what the filter function does.

It is possible I made a mistake somewhere and if I did, let me know.

/ Magnus

On Thu, 14 Sep 2006, Bruno Oliveira wrote:

> Hello,
>
> I think it is possible to do it in O(n) (if you change the representation
> of the output stream).
>
> Note that the solution is quite similar toTwan van Laarhoven,
> and it should be trivial use "Map" instead of "Rel".
>
> Here is my take on it:
>
> The type of the output stream:
>
>> type Rel a b = a -> [b]
>
> Here are cons and nil:
>
>> nil :: Rel a b
>> nil _ = []
>
> and a lookUp function:
>
>> lookUp :: Rel a b -> a -> [b]
>> lookUp = id
>
> Finally the splitStreams algorithm:
>
>> splitStreams :: Eq a => [(a,b)] -> Rel a b
>> splitStreams = foldr cons nil
>
> Here is a test with finite lists:
>
>> fin = splitStreams [(3,'x'),(1,'y'),(3,'z'),(2,'w')]
>
> and here are the console tests:
>
> *General7> lookUp fin 1
> "y"
> *General7> lookUp fin 2
> "w"
> *General7> lookUp fin 3
> "xz"
>
> Now with infinite lists:
>
>> inf = splitStreams (map (\x -> (0, x)) [1..])
>
> and here a case where it terminates with infinite lists:
>
> *General7> take 10 (lookUp inf 0)
> [1,2,3,4,5,6,7,8,9,10]
>
> Cheers,
>
> Bruno Oliveira
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Haskore microtonal support

Magnus Jonsson
In reply to this post by Henning Thielemann
On Thu, 14 Sep 2006, Henning Thielemann wrote:

>
> On Thu, 14 Sep 2006, Magnus Jonsson wrote:
>
>>
>> Now even more interestingly, my program also deals with music! :) I'm
>> generating microtonal midi files. I use it for very much the same purpose as
>> you do (although my program is not yet finished).
>
> Is it something we could and should add to Haskore?

If Haskore could support microtones that would make this world a slightly
better world for me. Here are the basic things you need to support
microtonal music:

- Pitch representations would have to be able to express any pitch.
   - One appealing approach is to represent a pitch directly as it's
frequency.
   - Probably the most useful representation though is a base pitch,
     say one of C,D,E,F,G,A,B, followed by a list of accidentals that
     modify the pitch. The user should be able to define his own base
     pitches and accidentals, in terms of cents or frequency ratios or
     something similar.
- Generating microtonal midi files requires that you add pitch-bend
messages before all notes. That restricts each midi channel to only being
able to play one note at a time. This is a big deficiency in the midi
protocol imo.

/ Magnus
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: Optimization problem

Bertram Felgenhauer-2
In reply to this post by Magnus Jonsson
Magnus Jonsson wrote:
> Thanks Bertran for your solution. I have difficulty understanding it so I
> can't comment on it right now but I'll try to have a look at it. Do you
> know of any article that explains this technique, starting from very
> simple examples?

Not really. It's a result of thinking about lazy evaluation, and
especially lazy patterns (and let bindings) for some time. A wiki article
that helped me a lot to understand these is

  http://www.haskell.org/hawiki/TyingTheKnot

I'd like to point out the trustList function there which uses the idea
of encoding the structure of a term and its actual values in different
arguments, i.e. a blueprint.

HTH,

Bertram
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Bruno Oliveira-5
In reply to this post by Magnus Jonsson
Hello Magnus,

You are right. Silly me. I was thinking of Rel as a kind of Hashtable. In this case,
I think it should be possible to have O(n), since "cons" would only need constant
time.

Cheers,

Bruno

On Thu, 14 Sep 2006 13:11:05 -0400 (EDT), Magnus Jonsson wrote:

>Thanks Bruno.
>
>However I think this is still O(n*m). As far as I understand your code it
>is a longwinded way to say "[b | (a,b) <- input, b ==
>myChannelOfInterest]". This is fine if you are only interested in one
>channel, and for that case I agree it's O(n), but if you are interested in
>many different channels, it will take O(n*m). Here's why I think your
>code is equivalent to using a filter/list-comprehension:
>
>>> cons :: Eq a => (a,b) -> Rel a b -> Rel a b
>>> cons (x,y) l z
>>> | x == z = y : l x
>>> | otherwise = l z
>
>z is the channel that the user is interested in.
>x is the channel of the current message in the list.
>y is the message content.
>l is a function for searching the rest of the list.
>
>For each element of the list you create a function that compares the
>requested channel to a (in (a,b)). If it's the same, you return that
>element followed by a continued search down the list. If you don't find
>it, you forward the request to the next function down the list. That's
>exactly the same as what the filter function does.
>
>It is possible I made a mistake somewhere and if I did, let me know.
>
>/ Magnus
>
>On Thu, 14 Sep 2006, Bruno Oliveira wrote:
>
>> Hello,
>>
>> I think it is possible to do it in O(n) (if you change the representation
>> of the output stream).
>>
>> Note that the solution is quite similar toTwan van Laarhoven,
>> and it should be trivial use "Map" instead of "Rel".
>>
>> Here is my take on it:
>>
>> The type of the output stream:
>>
>>> type Rel a b = a -> [b]
>>
>> Here are cons and nil:
>>
>>> nil :: Rel a b
>>> nil _ = []
>>
>> and a lookUp function:
>>
>>> lookUp :: Rel a b -> a -> [b]
>>> lookUp = id
>>
>> Finally the splitStreams algorithm:
>>
>>> splitStreams :: Eq a => [(a,b)] -> Rel a b
>>> splitStreams = foldr cons nil
>>
>> Here is a test with finite lists:
>>
>>> fin = splitStreams [(3,'x'),(1,'y'),(3,'z'),(2,'w')]
>>
>> and here are the console tests:
>>
>> *General7> lookUp fin 1
>> "y"
>> *General7> lookUp fin 2
>> "w"
>> *General7> lookUp fin 3
>> "xz"
>>
>> Now with infinite lists:
>>
>>> inf = splitStreams (map (\x -> (0, x)) [1..])
>>
>> and here a case where it terminates with infinite lists:
>>
>> *General7> take 10 (lookUp inf 0)
>> [1,2,3,4,5,6,7,8,9,10]
>>
>> Cheers,
>>
>> Bruno Oliveira
>>
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> [hidden email]
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Heinrich Apfelmus
In reply to this post by Bertram Felgenhauer-2
Bertram Felgenhauer wrote:

>> splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])
>> splitSeq' bp []         = ([], map (const []) bp)
>> splitSeq' bp ((a,b):xs) = case lookup a bp bp of
>>     Just _ -> let (l, m) = splitSeq' bp xs in (l, update a (b:) bp m)
>>     _      -> let (bp', m) = insert a bp m'
>>                   (l, m')  = splitSeq' bp' xs
>>                 in
>>                   ((a, b : (fromJust $ lookup a bp' m')) : l, m)
>
> splitSeq' takes a blueprint for a map with all keys seen so far, and
> a list tail. It returns the result list for all new keys, and a map
> (corresponding to the given blueprint) with the tails of the seen
> elements.
>
> The in the base case it just fills the blueprint with empty lists and
> returns to the caller.
>
> If a new element is seen, insert is used to create a new blueprint,
> including the new key a, which is passed to the recursive call of
> splitSeq'. The resulting map from that call is threaded back into
> insert, which gives us a new map without the a key which matches
> the structure of the original blueprint.

Very interesting! So the map with the seen tails matches the blueprint
and therefore throws away information (the future keys) from the future.
This is what effectively allows the key-tree structure to be rebalanced.
   If one would return the tails-map with all future keys, it would be
_|_ because the key-order in the tree depends on the future keys and
changes everytime when a new key is found.

I thought that there can only be a static solution, i.e. like the one
Ross Paterson presented where the structure (I mean the outermost
constructors) of the returned tree are determined before the future.
This obviously excludes rebalancing.

I found a static solution by trying to fit the key-tails pairs into an
infinite tails-map which is filled "first comes first":
  map ! 1 := (first key seen, tails)
  map ! 2 := (second key seen, tails)
By keeping another key-position-map around which records where each key
has been inserted, so that the future knows where to put the tails. The
point is that the structure of tails-map is known before the future
comes as its keys are just 1,2,3,... and so on.

It remains to construct such an infinite random access list, but this is
turns out to be even easier than finite random access lists (when using
non-uniform recursion from Okasaki's chapter 10) :)

> data Imp a = Imp a (Imp (a,a)) deriving (Show)
>
> instance Functor Imp where
>    fmap h ~(Imp x xs) = Imp (h x) (fmap (\(x,y) -> (h x, h y)) xs)
>
> update :: (a -> a) -> Position -> Imp a -> Imp a
> update f 1 ~(Imp x xs) = Imp (f x) xs
> update f n ~(Imp x xs) = Imp x $ update f2 (n `div` 2) xs
>    where
>    f2 ~(y,z) = if even n then (f y, z) else (y, f z)

Note that we can use irrefutable patterns without hesitation as there is
only one constructor.

Folding over an infinite thing is strange but here we are

> fold :: (a -> b -> b) -> Imp a -> b
> fold f ~(Imp x xs) = f x (fold f2 xs)
>    where
>    f2 (x,y) z = f x (f y z)

It's not so strange anymore when we realize that this can be used to
convert it into an infinite list

> toList = fold (:)

The implementation of fromList is fun as well, so I won't tell it. As
fold and unfold can be defined generically for Mu f where f is a
functor, i wonder how to deal with it in the case of non-uniform recursion.


For splitStreams, the key-position-map is needed in both directions, so
we quickly define a bijective map

> type BiMap a b = (Map.Map a b, Map.Map b a)
>
> switch :: BiMap a b -> BiMap b a
> switch (x,y) = (y,x)

with the usual operations (empty, member, size etc.) omitted here.

Now comes splitStreams itself:

> splitStreams :: Ord a => [(a,b)] -> [(a,[b])]
> splitStreams xs =
>    takeWhile (not . null . snd) $ toList $ splitStreams' empty xs
>
> splitStreams' :: Ord a => BiMap a Position -> [(a,b)] -> Imp (a,[b])
> splitStreams' bimap [] =
>    fmap (\i -> (findWithDefault undefined i $ switch bimap,[])) $
>        fromList [1..]
> splitStreams' bimap ((a,b):xs) =
>    update fun pos $ splitStreams' bimap' xs
>    where
>    fun ~(_,bs) = (a,b:bs)
>    sz          = size bimap
>    pos         = findWithDefault (sz+1) a bimap
>    bimap'      =
>       (if member a bimap then id else insert a (sz+1)) bimap

Note that update actually generates fresh constructors, so the structure
of our tails-Imp is not really static. But this is no problem as the
form of the constructors is completely known: there is only one.

Regards,
apfelmus

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Ross Paterson
In reply to this post by Bertram Felgenhauer-2
On Thu, Sep 14, 2006 at 05:22:05PM +0200, Bertram Felgenhauer wrote:
> [much subtle code]
> We can now build the splitStream function, using the following helper
> function:
>
> > splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])

This works for infinite lists if there is no balancing, but if insert does
balancing, the top of the map will not be available until the last key
is seen, so splitSeq' could only be used for finite chunks.  Then you'll
need a way to put the partial answers together.  It might be possible
to amortize the cost of that for an appropriate choice of chunk length.
It would also cost some laziness: the chunked version would work for
infinite lists, but wouldn't produce all the output for a partial list.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
123