I have created an ugly Haskell program..

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

I have created an ugly Haskell program..

Philip Scott-2
.. and I am positive there must be a way of beautifying it, but I am
struggling. I bet there is just some lovely way of making this all shrink to
three lines..

So here's the problem. I have two lists of tuples: (timestamp, value)

What I would like to do in do a kind of 'zip' on two of these lists to make a
list of (timestamp, (value1, value2)) with the following rules:

- If the timestamps are equal it's easy - make your new element an move on
- If one of the lists has a timestamp that the other doesn't, repeat an old
value from the other list
- If we don't have an old value yet, then don't create an element in the new
list.

e.g. if I ran my algorithm on these two lists

d1 = [ (1,"a"), (2,"b"),  (3,"c")           ]
d2 = [          (2,"b'"),          (4,"d'") ]  

I would like to get

result = [ (2, (b,b')), (3, (c,b')), (4, (c,d')) ]  

e.g. there was no data in d2 for our first element so we skipped it.

Okay, so here is my code.. It works, but makes me feel a bit dirty. To explain
my nomenclature 't' is 'timestamp of', 'v' is 'value of'. vx' and vy' are the
'old' values from the previous iteration in case a repeat is needed. They are
Maybes because at the beginning there may be no old value.

d1 = [ (1,"a"), (2,"b"),  (3,"c")           ]
d2 = [          (2,"b'"),          (4,"d'") ]  

t (x,y) = x
v (x,y) = y

js vx' vy' (x:xs) (y:ys)
 | t x == t y  = ( (t x), (v x, v y) )  : js (Just (v x)) (Just (v y)) xs ys
 | t x < t y   =
     maybe (js (Just (v x)) Nothing xs (y:ys))
           (\z -> ( t x, (v x, z ) ) : ( js (Just (v x)) (Just z) xs (y:ys)))
           vy'
 | t x > t y   =
     maybe (js Nothing  (Just (v y)) (x:xs) ys)
           (\z -> ( t y, (z, v y ) ) : ( js (Just z) (Just (v y)) (x:xs) ys))
           vx'
js vx' vy' (x:xs) []   =
    maybe []
          (\z -> ( t x, (v x, z ) ) : ( js (Just (v x)) (Just z) xs []))
          vy'
js vx' vy' [] (y:ys)   =
    maybe []
          (\z -> ( t y, (z, v y ) ) : ( js (Just z) (Just (v y))  [] ys ))
          vx'
js _ _ [] []   = []

You call it with the first two arguments as Nothing to kick it off (I have a
trivial wrapper function to do this)

It works fine:

> :t js
js
  :: (Ord t) =>
     Maybe a1 -> Maybe a -> [(t, a1)] -> [(t, a)] -> [(t, (a1, a))]

> js Nothing Nothing d1 d2
[(2,("b","b'")),(3,("c","b'")),(4,("c","d'"))]

But it just feels gross. Any advice on how to tame this beast would be greatly
appreciated :)

All the best,

Philip
Reply | Threaded
Open this post in threaded view
|

I have created an ugly Haskell program..

Michael Mossey

Function you seek is 'specialZip' below. 'fluff' and 'decapitate' are
helpers. Not extensively tested.

-- Given a list of ints that "should" all have values, fill in missing
-- values using the "last" value as default.
fluff :: String -> [Int] -> [(Int,String)] -> [(Int,String)]
fluff last (i:is) pss@((t,s):ps)
       | i == t = (i,s) : fluff s is ps
       | i < t  = (i,last) : fluff last is pss
fluff last is [] = zip is (repeat last)


-- Given two lists, remove enough from the front to get to two equal keys.
decapitate [] _ = ([],[])
decapitate _ [] = ([],[])
decapitate xss@((tx,_):xs) yss@((ty,_):ys)
     | tx < ty  = decapitate xs yss
     | ty < tx  = decapitate xss ys
     | ty == tx = (xss,yss)


specialZip d1 d2 =
     let (dd1,dd2) = decapitate d1 d2
         -- build set of every key that should be in final list
         s = S.toAscList . S.fromList $ (map fst dd1) ++ (map fst dd2)
     in case (dd1,dd2) of
          ([],[]) -> []
          (xs1,xs2) ->
              let f1 = fluff "" s xs1 -- use this set to fluff
                  f2 = fluff "" s xs2 -- each list
              -- so final answer can be a simple zipWith
              in zipWith (\(t1,s1) (t2,s2) -> (t1,(s1,s2))) f1 f2

Philip Scott wrote:

> .. and I am positive there must be a way of beautifying it, but I am
> struggling. I bet there is just some lovely way of making this all shrink to
> three lines..
>
> So here's the problem. I have two lists of tuples: (timestamp, value)
>
> What I would like to do in do a kind of 'zip' on two of these lists to make a
> list of (timestamp, (value1, value2)) with the following rules:
>
> - If the timestamps are equal it's easy - make your new element an move on
> - If one of the lists has a timestamp that the other doesn't, repeat an old
> value from the other list
> - If we don't have an old value yet, then don't create an element in the new
> list.
>
> e.g. if I ran my algorithm on these two lists
>
> d1 = [ (1,"a"), (2,"b"),  (3,"c")           ]
> d2 = [          (2,"b'"),          (4,"d'") ]  
>
> I would like to get
>
> result = [ (2, (b,b')), (3, (c,b')), (4, (c,d')) ]  
>
> e.g. there was no data in d2 for our first element so we skipped it.
>
> Okay, so here is my code.. It works, but makes me feel a bit dirty. To explain
> my nomenclature 't' is 'timestamp of', 'v' is 'value of'. vx' and vy' are the
> 'old' values from the previous iteration in case a repeat is needed. They are
> Maybes because at the beginning there may be no old value.
>
> d1 = [ (1,"a"), (2,"b"),  (3,"c")           ]
> d2 = [          (2,"b'"),          (4,"d'") ]  
>
> t (x,y) = x
> v (x,y) = y
>
> js vx' vy' (x:xs) (y:ys)
>  | t x == t y  = ( (t x), (v x, v y) )  : js (Just (v x)) (Just (v y)) xs ys
>  | t x < t y   =
>      maybe (js (Just (v x)) Nothing xs (y:ys))
>            (\z -> ( t x, (v x, z ) ) : ( js (Just (v x)) (Just z) xs (y:ys)))
>            vy'
>  | t x > t y   =
>      maybe (js Nothing  (Just (v y)) (x:xs) ys)
>            (\z -> ( t y, (z, v y ) ) : ( js (Just z) (Just (v y)) (x:xs) ys))
>            vx'
> js vx' vy' (x:xs) []   =
>     maybe []
>           (\z -> ( t x, (v x, z ) ) : ( js (Just (v x)) (Just z) xs []))
>           vy'
> js vx' vy' [] (y:ys)   =
>     maybe []
>           (\z -> ( t y, (z, v y ) ) : ( js (Just z) (Just (v y))  [] ys ))
>           vx'
> js _ _ [] []   = []
>
> You call it with the first two arguments as Nothing to kick it off (I have a
> trivial wrapper function to do this)
>
> It works fine:
>
>> :t js
> js
>   :: (Ord t) =>
>      Maybe a1 -> Maybe a -> [(t, a1)] -> [(t, a)] -> [(t, (a1, a))]
>
>> js Nothing Nothing d1 d2
> [(2,("b","b'")),(3,("c","b'")),(4,("c","d'"))]
>
> But it just feels gross. Any advice on how to tame this beast would be greatly
> appreciated :)
>
> All the best,
>
> Philip
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/beginners
Reply | Threaded
Open this post in threaded view
|

I have created an ugly Haskell program..

Philip Scott-2
Michael Mossey wrote:
>
> Function you seek is 'specialZip' below. 'fluff' and 'decapitate' are
> helpers. Not extensively tested.

Thanks Michael, that looks much better than mine :)


Reply | Threaded
Open this post in threaded view
|

I have created an ugly Haskell program..

Michael Mossey
Another solution here. The inspiration is to try to use Data.Map's
fromListWith to do the main work. Notice that you can "decapitate" the
useless head of each list with the single line

        dropWhile (not . both) . M.toAscList $ neated


import Control.Arrow
import qualified Data.Map as M

data Combine v = LeftOnly v
                | RightOnly v
                | BothOfThem v v
                  deriving (Show)

cmb :: Combine a -> Combine a -> Combine a
cmb (LeftOnly x) (RightOnly y) = BothOfThem x y
cmb (RightOnly y) (LeftOnly x) = BothOfThem x y

both (_,(BothOfThem _ _ )) = True
both _ = False

chain _ last2 ((t,LeftOnly v):xs) = (t,(v,last2)) : chain v last2 xs
chain last1 _ ((t,RightOnly v):xs) = (t,(last1,v)) : chain last1 v xs
chain _ _     ((t,BothOfThem v w):xs) = (t,(v,w)) : chain v w xs
chain _ _     [] = []

specialZip d1 d2 =
     let neated = M.fromListWith cmb $ map (second LeftOnly) d1
                  ++ map (second RightOnly) d2
         dr = dropWhile (not . both) . M.toAscList $ neated
     in chain "" "" dr


Philip Scott wrote:
> Michael Mossey wrote:
>>
>> Function you seek is 'specialZip' below. 'fluff' and 'decapitate' are
>> helpers. Not extensively tested.
>
> Thanks Michael, that looks much better than mine :)
>
>
Reply | Threaded
Open this post in threaded view
|

I have created an ugly Haskell program..

Brent Yorgey-2
In reply to this post by Philip Scott-2
On Sun, Nov 01, 2009 at 11:27:42PM +0000, Philip Scott wrote:

> .. and I am positive there must be a way of beautifying it, but I am
> struggling. I bet there is just some lovely way of making this all shrink to
> three lines..
>
> So here's the problem. I have two lists of tuples: (timestamp, value)
>
> What I would like to do in do a kind of 'zip' on two of these lists to make a
> list of (timestamp, (value1, value2)) with the following rules:
>
> - If the timestamps are equal it's easy - make your new element an move on
> - If one of the lists has a timestamp that the other doesn't, repeat an old
> value from the other list
> - If we don't have an old value yet, then don't create an element in the new
> list.

Ask yourself: What Would Conal Do (WWCD)?  Conal Elliott is always
trying to get people to think about the semantic essence of their
problems, so let's try it.

What are we REALLY trying to do here?  What are those lists of tuples,
REALLY?  Well, it seems to me that the lists of tuples are really just
representing *functions* on some totally ordered domain.  The
list-of-pairs representation takes advantage of the fact that these
functions tend to be constant on whole intervals of the domain; we
only need a tuple to mark the *beginning* of a constant interval.  The
fact that we want to take a value from the last old timestamp when we
don't have a certain timestamp in the list reflects the fact that the
function takes on that value over the whole *interval* from the
timestamp when it occurred to whenever the next timestamp is.

So, let's try converting these lists of pairs to actual functions:


  asFunc :: (Ord a) => [(a,b)] -> (a -> Maybe b)
  asFunc is a = fmap snd . listToMaybe . reverse . takeWhile ((<=a) . fst) $ is


Simple -- we just scan through the list looking for the right
interval.

Now the combining function is just a matter of converting the lists to
functions, and applying those functions to each index we want in the
output list (discarding any Nothings).


  combine :: (Ord a) => [(a,b)] -> [(a,c)] -> [(a,(b,c))]
  combine is js = catMaybes . flip map ixs $ \a ->
                    fmap ((,) a) (liftA2 (,) (asFunc is a) (asFunc js a))
    where ixs = sort . nub $ map fst is ++ map fst js


Done!  

Now, you might object that this is much more inefficient than the
other solutions put forth.  That is very true.  Converting to a
function with 'asFunc' means that we do a linear-time lookup in the
list every time we call the function, so this is O(n^2) overall
instead of O(n).  Building the list of indices ixs in the code above
is also O(n^2) instead of O(n).  However, I still find it very helpful
to think about the essence of the problem like this: elegant yet
inefficient code is a much better starting place than the other way
around!  From here there are several possibilities: maybe this version
is efficient enough, if you'll only be working with small lists.  Or
you can also try to optimize, taking advantage of the fact that we
always call the functions built by asFunc with a sequence of strictly
increasing inputs.  I might make a sort of "iterator" object which
acts like a function (a -> Maybe b), but keeps some extra state so
that as long as you call it with strictly increasing values of a, you
get back a Maybe b (and a new iterator) in constant time.  Of course,
this is really what the other solutions are doing: but thinking about
it this way has helped to structure the solution in a (hopefully) more
clear and elegant way.

-Brent

Reply | Threaded
Open this post in threaded view
|

I have created an ugly Haskell program..

Michael Mossey
Thanks, Brent, for this way of looking at it. If you want n log n behavior
you could write asFunc to use a Map for lookup.

-Mike

Brent Yorgey wrote:

>
> Ask yourself: What Would Conal Do (WWCD)?  Conal Elliott is always
> trying to get people to think about the semantic essence of their
> problems, so let's try it.
>
> What are we REALLY trying to do here?  What are those lists of tuples,
> REALLY?  Well, it seems to me that the lists of tuples are really just
> representing *functions* on some totally ordered domain.  The
> list-of-pairs representation takes advantage of the fact that these
> functions tend to be constant on whole intervals of the domain; we
> only need a tuple to mark the *beginning* of a constant interval.  The
> fact that we want to take a value from the last old timestamp when we
> don't have a certain timestamp in the list reflects the fact that the
> function takes on that value over the whole *interval* from the
> timestamp when it occurred to whenever the next timestamp is.
>
> So, let's try converting these lists of pairs to actual functions:
>
>
>   asFunc :: (Ord a) => [(a,b)] -> (a -> Maybe b)
>   asFunc is a = fmap snd . listToMaybe . reverse . takeWhile ((<=a) . fst) $ is
>
>
> Simple -- we just scan through the list looking for the right
> interval.
>

Reply | Threaded
Open this post in threaded view
|

Re: I have created an ugly Haskell program..

Heinrich Apfelmus
In reply to this post by Brent Yorgey-2
Brent Yorgey wrote:

> Ask yourself: What Would Conal Do (WWCD)?  Conal Elliott is always
> trying to get people to think about the semantic essence of their
> problems, so let's try it.
>
> What are we REALLY trying to do here?  What are those lists of tuples,
> REALLY?  Well, it seems to me that the lists of tuples are really just
> representing *functions* on some totally ordered domain.
> [...]
>
> So, let's try converting these lists of pairs to actual functions:
>
>
>   asFunc :: (Ord a) => [(a,b)] -> (a -> Maybe b)
>   asFunc is a = fmap snd . listToMaybe . reverse . takeWhile ((<=a) . fst) $ is
>
> [...]
>
> Now, you might object that this is much more inefficient than the
> other solutions put forth.  That is very true. [...]
>
> However, I still find it very helpful to think about the essence
> of the problem like this: elegant yet inefficient code is a much
> better starting place than the other way around! [...]
>
> You can also try to optimize, taking advantage of the fact that we
> always call the functions built by asFunc with a sequence of strictly
> increasing inputs.

I am with Brent and Conal here. Now, to continue, ask yourself: What
Would Conal Do Next (WWCDN)?

What are we really trying to do here? What is this function, really,
considering that we are only evaluating it at a strictly increasing
sequence of inputs? Well, it seems to me that it is some special kind of
function, best captured as an *abstract data type*.


In particular, the function is something which I will call a "time
series". In other words, the input is to be thought of as time.

    data Time t = Moment t | Infinity
                deriving (Eq,Ord,Show)

The inclusion of infinity will turn out to be very convenient.

Now, the time series is a function that has a value  x1  in the distant
past, until a time  t1  where it begins to have the value  x2 , again
until a time  t2  where it switches to  x3  and so on, until a value  xn
 that is kept until infinity. In Haskell, this looks like this

  function t
     | -Infinity <= t && t < t1       = x1
     |        t1 <= t && t < t2       = x2
     |        t2 <= t && t < t3       = x3
     | ...
     |        t1 <= t && t < Infinity = xn

and pictorially, something like this:

                                                 ____ xn _____
                ____ x2 ____                    |
               |            |____ x3 ____ ...   |
  _____ x1 ____|

 -Inf          t1            t2           ...   tn          Inf


Of course, we can implement this abstract data type with a list of pairs
 (tk,xk)

    newtype TimeSeries t a = TS { unTS :: [(a,Time t)] }
                           deriving (Show)

and our goal is to equip this data type with a few natural operations
that can be used to implement Philip's zip-like function.


The first two operations are

    progenitor :: TimeSeries t a -> a
    progenitor = fst . head . unTS

which returns the value from the distant past and

    beginning :: TimeSeries t a -> Time t
    beginning = snd . head . unTS

which returns the first point in time when the function changes its
value. These correspond to the operation  head  on lists.


The next operation is called  `forgetTo` t  and will throw away all
values and changes before and including a given time  t .

    forgetTo :: Ord t => TimeSeries t a -> Time t -> TimeSeries t a
    forgetTo (TS xs) Infinity = TS [last xs]
    forgetTo (TS xs) t        = TS $ dropWhile ((<= t) . snd) xs

This roughly corresponds to  tail , but takes advantage of the time
being continuous.


Last but not least, we need a way to create a time series

    forever :: a -> TimeSeries t a
    forever x = TS [(x,Infinity)]

and we need to add values to a time series, which can be done with an
operation called  prepend  that adds a new  beginning  and  replaces the
 progenitor .

        -- We assume that  t < beginning xs
    prepend :: a -> Time t -> TimeSeries t a -> TimeSeries t a
    prepend x Infinity _       = TS [(x,Infinity)]
    prepend x t        (TS xs) = TS $ (x,t) : xs

These operations correspond to [] and (:) for lists.


The key about these operations is that they have a description /
intuition that is *independent* of the implementation of times series.
At no point do we need to know how exactly  TimeSeries  is implemented
to understand what these five operations do.

Now, Philip's desired zip-like function is straightforward to implement:

    zipSeries :: Ord t => TimeSeries t a -> TimeSeries t b
                          -> TimeSeries t (a,b)
    zipSeries xs ys = prepend (progenitor xs, progenitor ys) t $
        zipSeries (xs `forgetTo` t) (ys `forgetTo` t)
        where t = min (beginning xs) (beginning ys)

and you may want to convince yourself of its correctness by appealing to
the intuition behind time series.


Regards,
apfelmus

--
http://apfelmus.nfshost.com