Appending to a list

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Appending to a list

Francesco Bochicchio
Hi all,

in my haskell exercises I often build list by appending values at the end.
But I read somewhere that in haskell this is inefficient since there is no
'pointer to the tail of the list'. I've also seen some example of recursive
functions which build the list tail-first and then in the base case of the
recursion returns the reverse of the accumulated results, counting on the
fact
that in haskell 'reverse list'  just means 'consume the list from the tail'.

Just out of curiosity (I have no need to speed up my little programs with
which I try to teach myself
some haskell): is this a consolidated pattern?   And are append really
expensive?

Ciao
-------
FB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20090207/e53cdee4/attachment.htm
Reply | Threaded
Open this post in threaded view
|

Appending to a list

Cory Knapp
In general, you want to be building a list "from the front". Consider
the list constructor-- x:xs is a list, when x is an element, and xs is a
list; on the other hand, xs:x will give you a type error. I think the
lisp syntax is a bit more elucidating; in lisp, you use "cons" to create
a pair e.g. (cons 1 2), and to create a list, you make a pair whose last
element is empty, and cons elements to the front. E.g. (cons 2 '())
gives you the list '(2), which is lisp for [2], and (cons 1 (cons 2
(cons 3 (cons 4 '() ) ) ) ) gives you '(1 2 3 4)-- lisp for [1,2,3,4]).
In order to add the end of the list, you need to unravel the whole list,
(replacing the '() at the end with "(cons 5 '())" ). On the other hand,
adding to the front requires you to just cons another value to it.

Haskell works the same way, but it uses an infix operator [1,2,3,4] =
1:2:3:4:[] = 1 : (2 : (3 : (4 : [] ) ) ). So appending to the list is
expensive, while adding to the front just requires x : list.

Obviously, if you're actually concatenating two lists, you need to
filter through one of them anyway, but if you're just putting an element
on, it's better to put it on the front.

Normally, this is practically accomplished by making recursive
procedures which work towards the end of the list. For example, map will
apply a function to the first element and then map to the rest:
map f (x:xs) = (f x) : (map f xs)
map _ [] = []

So, you use your list as a stack-- pulling everything off, and then
placing it all back on with the function applied.

Does that help, or did I miss the point?

Cheers,
Cory Knapp

Francesco Bochicchio wrote:

> Hi all,
>
> in my haskell exercises I often build list by appending values at the end.
> But I read somewhere that in haskell this is inefficient since there
> is no
> 'pointer to the tail of the list'. I've also seen some example of
> recursive
> functions which build the list tail-first and then in the base case of the
> recursion returns the reverse of the accumulated results, counting on
> the fact
> that in haskell 'reverse list'  just means 'consume the list from the
> tail'.
>
> Just out of curiosity (I have no need to speed up my little programs
> with which I try to teach myself
> some haskell): is this a consolidated pattern?   And are append really
> expensive?
>
> Ciao
> -------
> FB
> ------------------------------------------------------------------------
>
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/beginners
>  

Reply | Threaded
Open this post in threaded view
|

Appending to a list

Francesco Bochicchio
2009/2/7 Cory Knapp <[hidden email]>

<nice explanation on : operator - always useful>

>
> Does that help, or did I miss the point?
>
> Ok, I'll try to be more clear with an example: the following function -
which surely can be written in better way - takes a list and a predicate and
builds two lists. a list of lists of consecutive elements that satisfy the
predicate and a list of separators, i.e. of elements that does not satisfy
the predicate.
That is: groups  odd  [1,3,2,5,9,6] ->   [[1,3],[5,9]],  [2,6]
The function uses another function, groupWhile, which works like takeWhile
but returns also the rest of the list.

Now, from the way the function works, it shoud build the two result lists by
appending new elements to them.
But, as you said, appending is expensive in haskell, so instead I build the
lists using (:), which gives me the list
in reverse order, and then I use the 'final step' of the function - the base
case of the recursion - to reverse the
result. If I understand lazyness, this should be a low-cost operation
because it does not actual build a reversed
list, but  only make the list to be 'consumed' from the tail.

Now, this trick is not mine: I read it somewhere on internet (I
can'tremember where), so I was asking if it is classified among the 'neat
tricks' or among the 'ugly hacks' :-)

Here is the code I was referring to:


groups :: (a->Bool)-> [a] -> ([[a]],[a])
groups f lst = groups_ f ([],[]) lst where
    groups_ f (gs,seps) [] = (reverse gs, reverse seps)
    groups_ f (gs,seps) [a] = if (f a)
                              then ( reverse ( [a]:gs), reverse seps )
                              else ( reverse gs , reverse (a:seps) )
    groups_ f (gs, seps) lst = groups_ f (g:gs, r:seps) est
        where
          (g, (r:est)) = groupWhile f lst

Ciao
-------
FB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20090208/6396d453/attachment.htm
Reply | Threaded
Open this post in threaded view
|

Appending to a list

Adrian Neumann
In reply to this post by Francesco Bochicchio
Yes, appending to the tail of a list is an expensive operation.
You may ask yourself why there is no pointer to the end of a list.  
That's because the end is in general not known (e.g. if the list is  
produced by a computation, or if it's infinite).
However if you know up front that your list will be finite and you  
don't rely on lazyness you may want to use a Data.Sequence instead

 > http://www.haskell.org/ghc/docs/latest/html/libraries/containers/ 
Data-Sequence.html

Cheers,
Adrian

Am 07.02.2009 um 21:29 schrieb Francesco Bochicchio:

> Hi all,
>
> in my haskell exercises I often build list by appending values at  
> the end.
> But I read somewhere that in haskell this is inefficient since  
> there is no
> 'pointer to the tail of the list'. I've also seen some example of  
> recursive
> functions which build the list tail-first and then in the base case  
> of the
> recursion returns the reverse of the accumulated results, counting  
> on the fact
> that in haskell 'reverse list'  just means 'consume the list from  
> the tail'.
>
> Just out of curiosity (I have no need to speed up my little  
> programs with which I try to teach myself
> some haskell): is this a consolidated pattern?   And are append  
> really expensive?
>
> Ciao
> -------
> FB
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/beginners

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 194 bytes
Desc: Signierter Teil der Nachricht
Url : http://www.haskell.org/pipermail/beginners/attachments/20090208/0e9cb17f/PGP.bin
Reply | Threaded
Open this post in threaded view
|

Re: Appending to a list

Heinrich Apfelmus
In reply to this post by Francesco Bochicchio
Francesco Bochicchio wrote:
> But, as you said, appending is expensive in haskell

We can say how expensive. Namely, fully evaluating

   xs ++ ys

takes O(length xs) time. Always appending an element at the end will
lead to something like

   (..((([1] ++ [2]) ++ [3]) ++ [4]) ++ ..) ++ [n]

which will take O(n^2) time instead of linear time. This is probably
best demonstrated with the following two implementations of  reverse

   reverse1 []     = []
   reverse1 (x:xs) = reverse1 xs ++ [x]

   reverse2 xs     = go [] xs
       where
       go ys []     = ys
       go ys (x:xs) = go (x:ys) xs

The first runs in quadratic time while second runs in linear time.

> so instead I build the lists using (:), which gives me the list in
> reverse order, and then I use the 'final step' of the function - the
> base case of the recursion - to reverse the result. If I understand
> laziness, this should be a low-cost operation because it does not
> actual build a reversed list, but only make the list to be
> 'consumed' from the tail.

No,  reverse  has nothing to do with laziness and still costs O(n) time.
It's just that building the list in linear time and then reversing it in
linear time is cheaper than building the list in quadratic time.

> Here is the code I was referring to:
>
> groups :: (a->Bool)-> [a] -> ([[a]],[a])
> groups f lst = groups_ f ([],[]) lst where
>     groups_ f (gs,seps) [] = (reverse gs, reverse seps)
>     groups_ f (gs,seps) [a] = if (f a)
>                               then ( reverse ( [a]:gs), reverse seps )
>                               else ( reverse gs , reverse (a:seps) )
>     groups_ f (gs, seps) lst = groups_ f (g:gs, r:seps) est
>         where
>           (g, (r:est)) = groupWhile f lst

While using  reverse  instead of appending at the end of the list is a
good idea, not building the list in reverse at all is much better.


First, let me restate your code as

    groups p xs = go ([],[]) xs
        where
        go (gs,seps) xs = let (g,rest) = span p xs in
            case rest of
                r:est -> go (g:gs, r:seps) est
                []    -> (reverse (g:gs), reverse seps)

The  groupWhile  function can be found in the Prelude, it's called  span
. This function  groups  differs from yours, for instance we have

  groups odd [2,2,2] == ([[],[],[],[]],[2,2,2])
  groups odd [1,2]   == ([[1],[]],[2])
  groups odd [1,1]   == ([[1,1]],[])

By the way, your code crashes in the last case.


Now, let's reformulate it so that the result won't be built in reverse.
The key is to eliminate the unnecessary accumulating parameter and work
with the result of the recursive call to  groups  directly:

    groups p xs = let (g,rest) = span p xs in
        case rest of
            r:est -> let (gs,seps) = groups p est in (g:gs,r:seps)
            []    -> ([g], [])

I'd say that this is the canonical implementation.


In case you wrote this function to split strings, have a look at

  http://hackage.haskell.org/cgi-bin/hackage-scripts/package/split


Regards,
apfelmus

--
http://apfelmus.nfshost.com

Reply | Threaded
Open this post in threaded view
|

Re: Appending to a list

Heinrich Apfelmus
Francesco Bochicchio wrote:

> Heinrich Apfelmus wrote:
>>
>> No,  reverse  has nothing to do with laziness and still costs O(n)
>> time. It's just that building the list in linear time and then
>> reversing it in linear time is cheaper than building the list in
>> quadratic time.
>
> I see. Thanks for the information. I keep thinking to haskell lists
> as they are 'generators', but this is not always the case ... and I
> realize now that there is no way to reverse the list without
> expanding it first.

Well, they are 'generators' from an operational point of view, in that
elements are generated on demand. But other than that, they are just a tree

     :
    / \
   1   :
      / \
     2   :
        / \
       3  ...

>> Now, let's reformulate it so that the result won't be built in
>> reverse. The key is to eliminate the unnecessary accumulating
>> parameter and work with the result of the recursive call to  groups
>> directly:
>>
>>    groups p xs = let (g,rest) = span p xs in
>>        case rest of
>>            r:est -> let (gs,seps) = groups p est in (g:gs,r:seps)
>>            []    -> ([g], [])
>>
>> I'd say that this is the canonical implementation.
>
> The reason I used accumulators is that I try to teach myself to
> always write tail-recursive functions. I have been bitten by stack
> exhaustion in one of my first exercises (working on large amount of
> data), so I thought that non tail-recursive functions should be
> always avoided. But using accumulators often leads to appending to
> lists, so one has to find the balance between the two things...

Due to lazy evaluation, something that looks tail recursive is not
necessarily tail recursive at all; I strongly advise to unlearn your
habit. For example, the version of  groups  that uses an accumulator is
less efficient that the one without.

In general, a lazy approach to performance is best: just use the
simplest possible implementation and postpone performance considerations
to when the program actually takes ages to compute the result.


Concerning stack exhaustion and large amounts of data, there is one very
common pattern worth knowing, namely foldr vs. foldl' , see also

  http://en.wikibooks.org/wiki/Haskell/Performance_Introduction#Space
  http://book.realworldhaskell.org/read/functional-programming.html

Other than that, there is no way around understanding Haskell's
evaluation model properly.


Regards,
apfelmus

--
http://apfelmus.nfshost.com