storing highly shared data structures

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

storing highly shared data structures

Christian Maeder
Dear Haskell Experts,

for storing highly shared data structures we use so called Annotated
Terms (shortly ATerms, details below).

   http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/ATerm

In contrast to the Binary (or GhcBinary) class we compute the sharing,
which saves a lot of space for data types that keep redundant
information.

With this we can store some of our data structures (of course only
non-cyclic and finite ones) in a few KBs that need MBs if stored
without sharing (as when using the Binary or the Show/Read
classes).

So far so good. The problem remaining is that an object is _traversed_
as if being unshared and thus the _time_ for the ATermTable construction
becomes too long for us.

GHC's internal data structures (on the heap) are in many cases shared,
by pointer references. I.e. if I add a (single) symbol table to every
symbol that I use, then the symbol table will not be copied, but only a
reference added to my symbol.

How can I detect this sharing in order to avoid traversing the very
same symbol table for every symbol?

I've tried to use a "Map (Ptr ()) ShATerm". So before traversing an
object I look up its address and check if is was traversed before (if
not I traverse it and store it in my map for future lookups).

1.) I'm not sure if it is safe to use "Ptr ()" (meanwhile garbage
collection, heap compaction and what not could happen).

2.) A single "Map (Ptr ()) ShATerm" does not work! It seems that
sometimes objects are shared that have different types (as tests
revealed). This seems obvious for newtypes but also happens (I think)
without newtype declarations.

However, shared ATerms are always different for different types,
because the corresponding data constructors are different.

So, I'm now thinking if I should try using the type of my data as key
as well, i.e. using "Map (TypeRep, Ptr ()) ShATerm" to avoid duplicate
traversals.

Would this be worth the effort? What other possibilities could I try?

An obvious solution is to avoid the redundancy in the first place
(i.e. simply don't store the symbol table with every symbol), but that
would require a major change of our sources, and would the code become
clearer? (Many functions would need more arguments.)

Finally, _reading in_ shared ATerms is fast, since ghc seems to exploit
the sharing given by the injective ATermTable.

Thanks for any help
Christian

More details:

   Brand, M.G.J. van den, H.A. de Jong, P. Klint, and P.A. Olivier (2000).
   "Efficient Annotated Terms." Software -- Practice & Experience
30:259--291.

   (The link "ATerm Library" http://www.cwi.nl/projects/MetaEnv/haterm/
   under http://www.haskell.org/libraries/#compilation looks outdated
   to me. We have extra source code that - at the moment - cannot be
   distributed separately)

Via an extension to DrIFT,
http://repetae.net/john/computer/haskell/DrIFT, we generate class
instances (similar to the Binary class) for our types that we want to
store.

Our ATerms are (basically) constructor terms of the form:

data ShATerm = ShAAppl String [Int] deriving (Eq, Ord)

(The annotation part of ATerms is omitted here since we don't use it)

The Int-list contains the arguments as indices into an ATermTable
being an injective "IntMap ShATerm" with update and lookup
functions:

addATerm :: ShATerm -> ATermTable -> (ATermTable, Int)
getATerm :: Int -> ATermTable -> ShATerm

The class (that DriFT derives instances for) looks as follows:

class ShATermConvertible t where
     toShATerm       :: ATermTable -> t -> (ATermTable, Int)
     fromShATerm     :: (ATermTable, Int) -> t

The automatically derived instance of i.e. a user-defined List data
type "data List a = Nil | Cons a (List a)" is:

instance ShATermConvertible a => ShATermConvertible (List a) where
     toShATerm att0 Nil =
         addATerm (ShAAppl "Nil" []) att0
     toShATerm att0 (Cons a b) =
         case toShATerm att0 a of { (att1, a') ->
         case toShATerm att1 b of { (att2, b') ->
         addATerm (ShAAppl "Cons" [a', b']) att2 }}
     fromShATerm (att, i) =
         case getATerm i att of
             ShAAppl "Nil" [] -> Nil
             ShAAppl "Cons" [a, b] ->
                     case fromShATerm (att, a) of { a' ->
                     case fromShATerm (att, b) of { b' ->
                     Cons a' b' }}
             _ -> error "ShATermConvertible List"

(Internally the ATermTable also keeps an inverse "Map ShATerm Int" to
support the efficient implementation of addATerm.)

We can use the "baffle" tool to convert our files from shared
(TAF) to unshared ATerm format and vice versa.
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Einar Karttunen
On 22.12 14:43, Christian Maeder wrote:
> How can I detect this sharing in order to avoid traversing the very
> same symbol table for every symbol?

By using System.Mem.StableName
SerTH (http://cs.helsinki.fi/u/ekarttun/SerTH/) implements this,
so you can look at the source for pointers.

> I've tried to use a "Map (Ptr ()) ShATerm". So before traversing an
> object I look up its address and check if is was traversed before (if
> not I traverse it and store it in my map for future lookups).

GHC can move the objects in memory.

> 2.) A single "Map (Ptr ()) ShATerm" does not work! It seems that
> sometimes objects are shared that have different types (as tests
> revealed). This seems obvious for newtypes but also happens (I think)
> without newtype declarations.

Yes, this is quite evil. For example the empty list can be shared
across type boundaries. Also phantom types can share the same
address. In addition you have to worry about libraries using
unsafeCoerce#

> So, I'm now thinking if I should try using the type of my data as key
> as well, i.e. using "Map (TypeRep, Ptr ()) ShATerm" to avoid duplicate
> traversals.

TypeRep is not Ord iirc, which makes this harder. Of course you can
show it and then use that.

- Einar

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

RE: storing highly shared data structures

Simon Marlow
In reply to this post by Christian Maeder
On 22 December 2005 13:43, Christian Maeder wrote:

> for storing highly shared data structures we use so called Annotated
> Terms (shortly ATerms, details below).
>
>    http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/ATerm
>
> In contrast to the Binary (or GhcBinary) class we compute the sharing,
> which saves a lot of space for data types that keep redundant
> information.
>
> With this we can store some of our data structures (of course only
> non-cyclic and finite ones) in a few KBs that need MBs if stored
> without sharing (as when using the Binary or the Show/Read
> classes).
>
> So far so good. The problem remaining is that an object is _traversed_
> as if being unshared and thus the _time_ for the ATermTable
> construction becomes too long for us.
>
> GHC's internal data structures (on the heap) are in many cases shared,
> by pointer references. I.e. if I add a (single) symbol table to every
> symbol that I use, then the symbol table will not be copied, but only
> a reference added to my symbol.
>
> How can I detect this sharing in order to avoid traversing the very
> same symbol table for every symbol?
>
> I've tried to use a "Map (Ptr ()) ShATerm". So before traversing an
> object I look up its address and check if is was traversed before (if
> not I traverse it and store it in my map for future lookups).
>
> 1.) I'm not sure if it is safe to use "Ptr ()" (meanwhile garbage
> collection, heap compaction and what not could happen).

Right - Ptr isn't the right thing here, because GC will move objects
around.  That's why we have StablePtr and StableName.  In fact, what you
really want here is the pointer-equality memo table implementation in
the Memo module (package util).  This is scheduled to be removed in 6.6,
but it will be available as a Cabal package.

Cheers,
        Simon
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Christian Maeder
In reply to this post by Einar Karttunen
Einar Karttunen wrote:
> On 22.12 14:43, Christian Maeder wrote:
>> How can I detect this sharing in order to avoid traversing the very
>> same symbol table for every symbol?
>
> By using System.Mem.StableName
> SerTH (http://cs.helsinki.fi/u/ekarttun/SerTH/) implements this,
> so you can look at the source for pointers.

Thanks for your hints. What a pity that "makeStableName" goes to IO (why
would this break referential transparency?) and that "StableName a" and
"TypeRep" have no Ord instances.

Facing RealWorld (and HashTable),
Christian
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Ben Franksen-2
On Thursday 29 December 2005 18:22, Christian Maeder wrote:

> Einar Karttunen wrote:
> > On 22.12 14:43, Christian Maeder wrote:
> >> How can I detect this sharing in order to avoid traversing the
> >> very same symbol table for every symbol?
> >
> > By using System.Mem.StableName
> > SerTH (http://cs.helsinki.fi/u/ekarttun/SerTH/) implements this,
> > so you can look at the source for pointers.
>
> Thanks for your hints. What a pity that "makeStableName" goes to IO
> (why would this break referential transparency?) and that "StableName
> a" and "TypeRep" have no Ord instances.

This has been asked before (by me too). The reason is that (due to
efficient implementation issues) the order relation between TypeReps is
not guaranteed to be the same for every program run. I guess the
situation is similar with makeStableName, which is in the IO monad
because the result may be different between program runs even given the
same value as input.

Ben
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Christian Maeder
In reply to this post by Simon Marlow
Simon Marlow wrote:
> Right - Ptr isn't the right thing here, because GC will move objects
> around.  That's why we have StablePtr and StableName.

may it be that makeStableName is expensive? (or it is my additional Map?)

My old version is faster, because the version with makeStableName does
very much GC.

Christian

1. with makeStableName (and a Map):

2,447,401,824 bytes allocated in the heap
703,294,688 bytes copied during GC
  50,780,688 bytes maximum residency (24 sample(s))

        9328 collections in generation 0 (129.88s)
          24 collections in generation 1 (  4.10s)

          99 Mb total memory in use

   INIT  time    0.00s  (  0.00s elapsed)
   MUT   time   27.28s  ( 28.91s elapsed)
   GC    time  133.98s  (140.08s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time  161.26s  (168.99s elapsed)

   %GC time      83.1%  (82.9% elapsed)

   Alloc rate    89,714,143 bytes per MUT second

   Productivity  16.9% of total user, 16.1% of total elapsed


2. without makeStableName:

7,560,158,340 bytes allocated in the heap
578,736,496 bytes copied during GC
  44,307,024 bytes maximum residency (25 sample(s))

       28832 collections in generation 0 (  6.83s)
          25 collections in generation 1 (  3.20s)

         102 Mb total memory in use

   INIT  time    0.00s  (  0.00s elapsed)
   MUT   time  139.71s  (146.74s elapsed)
   GC    time   10.03s  ( 10.94s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time  149.74s  (157.68s elapsed)

   %GC time       6.7%  (6.9% elapsed)

   Alloc rate    54,113,222 bytes per MUT second

   Productivity  93.3% of total user, 88.6% of total elapsed
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Simon Marlow
Christian Maeder wrote:

> Simon Marlow wrote:
>
>> Right - Ptr isn't the right thing here, because GC will move objects
>> around.  That's why we have StablePtr and StableName.
>
>
> may it be that makeStableName is expensive? (or it is my additional Map?)
>
> My old version is faster, because the version with makeStableName does
> very much GC.
>
> Christian
>
> 1. with makeStableName (and a Map):
>
> 2,447,401,824 bytes allocated in the heap
> 703,294,688 bytes copied during GC
>  50,780,688 bytes maximum residency (24 sample(s))
>
>        9328 collections in generation 0 (129.88s)
>          24 collections in generation 1 (  4.10s)
>
>          99 Mb total memory in use
>
>   INIT  time    0.00s  (  0.00s elapsed)
>   MUT   time   27.28s  ( 28.91s elapsed)
>   GC    time  133.98s  (140.08s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time  161.26s  (168.99s elapsed)
>
>   %GC time      83.1%  (82.9% elapsed)
>
>   Alloc rate    89,714,143 bytes per MUT second
>
>   Productivity  16.9% of total user, 16.1% of total elapsed

Interesting... this could mean that updating the stable name table on
each GC is very costly.  This deserves more investigation, but I'm
afraid it's not high on the priority list for me.  I can help if anyone
else wants to look into it.

Cheers,
        Simon

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re[2]: storing highly shared data structures

Bulat Ziganshin
In reply to this post by Christian Maeder
Hello Christian,

Friday, January 06, 2006, 9:43:39 PM, you wrote:

CM> My old version is faster, because the version with makeStableName does
CM> very much GC.

CM>    MUT   time   27.28s  ( 28.91s elapsed)
CM>    GC    time  133.98s  (140.08s elapsed)

try to add infamous "+RTS -A10m" switch ;)

it's the third time i give this advice during last weeks :)  i added
it to the ghc wiki faq, opening up new section "Optimization issues"


--
Best regards,
 Bulat                            mailto:[hidden email]



_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Christian Maeder
Bulat Ziganshin wrote:
> CM> My old version is faster, because the version with makeStableName does
> CM> very much GC.
>
> CM>    MUT   time   27.28s  ( 28.91s elapsed)
> CM>    GC    time  133.98s  (140.08s elapsed)
>
> try to add infamous "+RTS -A10m" switch ;)

You saved my day, thank you Bulat!

Without that flag:

   MUT   time   24.30s  ( 24.76s elapsed)
   GC    time  131.25s  (140.01s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time  155.55s  (164.77s elapsed)

And with it:

   MUT   time   23.86s  ( 24.86s elapsed)
   GC    time   11.03s  ( 11.68s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time   34.89s  ( 36.54s elapsed)

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Christian Maeder
In reply to this post by Bulat Ziganshin
Bulat Ziganshin wrote:
> try to add infamous "+RTS -A10m" switch ;)

Maybe -H300m is more famous?

   MUT   time   24.92s  ( 29.79s elapsed)
   GC    time    6.32s  (  7.67s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time   31.24s  ( 37.46s elapsed)

Christian
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Simon Marlow
In reply to this post by Christian Maeder
Christian Maeder wrote:

> Bulat Ziganshin wrote:
>
>> CM> My old version is faster, because the version with makeStableName
>> does
>> CM> very much GC.
>>
>> CM>    MUT   time   27.28s  ( 28.91s elapsed)
>> CM>    GC    time  133.98s  (140.08s elapsed)
>>
>> try to add infamous "+RTS -A10m" switch ;)
>
>
> You saved my day, thank you Bulat!
>
> Without that flag:
>
>   MUT   time   24.30s  ( 24.76s elapsed)
>   GC    time  131.25s  (140.01s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time  155.55s  (164.77s elapsed)
>
> And with it:
>
>   MUT   time   23.86s  ( 24.86s elapsed)
>   GC    time   11.03s  ( 11.68s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time   34.89s  ( 36.54s elapsed)

The real problem seems to be that minor GCs are taking too long.  Having
said that, you can usually reduce GC overhead with large -A or -H options.

Cheers,
        Simon

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Jan-Willem Maessen

On Jan 10, 2006, at 4:26 AM, Simon Marlow wrote:

> Christian Maeder wrote:
>> Bulat Ziganshin wrote:
>>> CM> My old version is faster, because the version with  
>>> makeStableName does
>>> CM> very much GC.
>>>
>>> CM>    MUT   time   27.28s  ( 28.91s elapsed)
>>> CM>    GC    time  133.98s  (140.08s elapsed)
>>>
>>> try to add infamous "+RTS -A10m" switch ;)
>> You saved my day, thank you Bulat!
>> Without that flag:
>>   MUT   time   24.30s  ( 24.76s elapsed)
>>   GC    time  131.25s  (140.01s elapsed)
>>   EXIT  time    0.00s  (  0.00s elapsed)
>>   Total time  155.55s  (164.77s elapsed)
>> And with it:
>>   MUT   time   23.86s  ( 24.86s elapsed)
>>   GC    time   11.03s  ( 11.68s elapsed)
>>   EXIT  time    0.00s  (  0.00s elapsed)
>>   Total time   34.89s  ( 36.54s elapsed)
>
> The real problem seems to be that minor GCs are taking too long.  
> Having said that, you can usually reduce GC overhead with large -A  
> or -H options.

Is the full table of stable names being traversed at every minor GC?  
If so, it might be worth somehow segregating the nursery stable  
names.  I bet this complicates management a bunch, though, since  
right now there's only one table to look things up in.

-Jan

>
> Cheers,
> Simon
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Simon Marlow
Jan-Willem Maessen wrote:

>
> On Jan 10, 2006, at 4:26 AM, Simon Marlow wrote:
>
>> Christian Maeder wrote:
>>
>>> Bulat Ziganshin wrote:
>>>
>>>> CM> My old version is faster, because the version with  
>>>> makeStableName does
>>>> CM> very much GC.
>>>>
>>>> CM>    MUT   time   27.28s  ( 28.91s elapsed)
>>>> CM>    GC    time  133.98s  (140.08s elapsed)
>>>>
>>>> try to add infamous "+RTS -A10m" switch ;)
>>>
>>> You saved my day, thank you Bulat!
>>> Without that flag:
>>>   MUT   time   24.30s  ( 24.76s elapsed)
>>>   GC    time  131.25s  (140.01s elapsed)
>>>   EXIT  time    0.00s  (  0.00s elapsed)
>>>   Total time  155.55s  (164.77s elapsed)
>>> And with it:
>>>   MUT   time   23.86s  ( 24.86s elapsed)
>>>   GC    time   11.03s  ( 11.68s elapsed)
>>>   EXIT  time    0.00s  (  0.00s elapsed)
>>>   Total time   34.89s  ( 36.54s elapsed)
>>
>>
>> The real problem seems to be that minor GCs are taking too long.  
>> Having said that, you can usually reduce GC overhead with large -A  or
>> -H options.
>
>
> Is the full table of stable names being traversed at every minor GC?  
> If so, it might be worth somehow segregating the nursery stable  names.  
> I bet this complicates management a bunch, though, since  right now
> there's only one table to look things up in.

It's traversed, but not re-hashed.  So it shouldn't be a bottleneck,
unless the table is really huge.  I agree that separating the gen 0
stable names into a separate table would be the right fix, though.

Cheers,
        Simon

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re[2]: storing highly shared data structures

Bulat Ziganshin
In reply to this post by Simon Marlow
Hello Simon,

Tuesday, January 10, 2006, 12:26:30 PM, you wrote:

>>> CM> My old version is faster, because the version with makeStableName
>>> does
>>> CM> very much GC.
>>>
>>> CM>    MUT   time   27.28s  ( 28.91s elapsed)
>>> CM>    GC    time  133.98s  (140.08s elapsed)
>>>
>>> try to add infamous "+RTS -A10m" switch ;)

SM> The real problem seems to be that minor GCs are taking too long.  Having
SM> said that, you can usually reduce GC overhead with large -A or -H options.

it is the same problem as with scanning large IOArrays on each GC. in
this case, i think, table of stable names scanned each time and
therefore program works so slow. if ghc will dynamically increase
"+RTS -A" area when there are a large IOArrays/stable names table,
this would help improve speed significantly. also, i want to remind my
old suggestion to temporarily disable all GCs, or better to
temporarily change "-A" area at the programs request. That will help
programs like Joels one and my own, where large data structures (say,
5 or 20 mb) are created, used and then completely discarded

--
Best regards,
 Bulat                            mailto:[hidden email]



_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Simon Marlow
Bulat Ziganshin wrote:

> Hello Simon,
>
> Tuesday, January 10, 2006, 12:26:30 PM, you wrote:
>
>
>>>>CM> My old version is faster, because the version with makeStableName
>>>>does
>>>>CM> very much GC.
>>>>
>>>>CM>    MUT   time   27.28s  ( 28.91s elapsed)
>>>>CM>    GC    time  133.98s  (140.08s elapsed)
>>>>
>>>>try to add infamous "+RTS -A10m" switch ;)
>
>
> SM> The real problem seems to be that minor GCs are taking too long.  Having
> SM> said that, you can usually reduce GC overhead with large -A or -H options.
>
> it is the same problem as with scanning large IOArrays on each GC.

Actually I think it's a different problem, with the same workaround.

> in this case, i think, table of stable names scanned each time and
> therefore program works so slow. if ghc will dynamically increase
> "+RTS -A" area when there are a large IOArrays/stable names table,
> this would help improve speed significantly.

I'm not keen on this, because its a hack and doesn't address the real
cause of the problem.  We're working on addressing the array issue, see
this ticket I created today:

   http://cvs.haskell.org/trac/ghc/ticket/650

> also, i want to remind my
> old suggestion to temporarily disable all GCs, or better to
> temporarily change "-A" area at the programs request. That will help
> programs like Joels one and my own, where large data structures (say,
> 5 or 20 mb) are created, used and then completely discarded

You can change the allocation area size from within a program quite
easily.  Write a little C function to assign to
RtsFlags.GcFlags.minAllocAreaSize (#include "RtsFlags.h" first), and
call it from Haskell; the next time GC runs it will allocate the larger
nursery.  Please try this and let me know if it works.

Cheers,
        Simon

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Christian Maeder
Simon Marlow wrote:
> You can change the allocation area size from within a program quite
> easily.  Write a little C function to assign to
> RtsFlags.GcFlags.minAllocAreaSize (#include "RtsFlags.h" first), and
> call it from Haskell; the next time GC runs it will allocate the larger
> nursery.  Please try this and let me know if it works.

Can someone supply a code snippet for me (as I don't speak C and always
hoped to avoid C by using haskell)?

Christian
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Esa Ilari Vuokko
On 1/11/06, Christian Maeder <[hidden email]> wrote:
> Simon Marlow wrote:
> > You can change the allocation area size from within a program quite
> > easily.  Write a little C function to assign to
> > RtsFlags.GcFlags.minAllocAreaSize (#include "RtsFlags.h" first), and
> > call it from Haskell; the next time GC runs it will allocate the larger
> > nursery.  Please try this and let me know if it works.
>
> Can someone supply a code snippet for me (as I don't speak C and always
> hoped to avoid C by using haskell)?
Hi

Following compiles and runs for me (ghc 6.4.1), but I didn't test any effects.

C-bits:
#include "Rts.h"
#include "RtsFlags.h"
void SetMinAllocArea(int a) {
    RtsFlags.GcFlags.minAllocAreaSize=a;
}

Haskell bits:
import Foreign.C.Types
foreign import ccall "SetMinAllocArea" setMinAllocArea :: CInt -> IO ()

It requires compiling c-bits without --make, something like
ghc -c bits.c -o bits.o
and including it on final linking or --make step (or ar/ld, similary)
ghc bits.o --make main

HTH,
Esa
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: storing highly shared data structures

Simon Marlow
Esa Ilari Vuokko wrote:

> On 1/11/06, Christian Maeder <[hidden email]> wrote:
>
>>Simon Marlow wrote:
>>
>>>You can change the allocation area size from within a program quite
>>>easily.  Write a little C function to assign to
>>>RtsFlags.GcFlags.minAllocAreaSize (#include "RtsFlags.h" first), and
>>>call it from Haskell; the next time GC runs it will allocate the larger
>>>nursery.  Please try this and let me know if it works.
>>
>>Can someone supply a code snippet for me (as I don't speak C and always
>>hoped to avoid C by using haskell)?
>
> Hi
>
> Following compiles and runs for me (ghc 6.4.1), but I didn't test any effects.
>
> C-bits:
> #include "Rts.h"
> #include "RtsFlags.h"
> void SetMinAllocArea(int a) {
>     RtsFlags.GcFlags.minAllocAreaSize=a;
> }
>
> Haskell bits:
> import Foreign.C.Types
> foreign import ccall "SetMinAllocArea" setMinAllocArea :: CInt -> IO ()
>
> It requires compiling c-bits without --make, something like
> ghc -c bits.c -o bits.o
> and including it on final linking or --make step (or ar/ld, similary)
> ghc bits.o --make main

I forgot to mention, minAllocAreaSize is in units of BLOCK_SIZE, so
divide bytes by BLOCK_SIZE before setting it.  You can get BLOCK_SIZE
from Rts.h.

Cheers,
        Simon

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

summary of storing highly shared data structures

Christian Maeder
In reply to this post by Christian Maeder
I wrote:
> However, shared ATerms are always different for different types,
> because the corresponding data constructors are different.

This isn't quite true. The shared ATerm for the empty list is the same
for all instances.

> Finally, _reading in_ shared ATerms is fast, since ghc seems to exploit
> the sharing given by the injective ATermTable.

This was also wrong. The ATermTable as "IntMap ShATerm" was not enough,
because all objects were constructed anew from the same or different
ShATerms. I had to setup an additional "IntMap Dynamic" for creating
shared data structures. It was no fun to make everything "Typeable" (and
having no Ord instance for TypeRep).

Simon suggested to use StableName or StablePtr. I did not use StablePtr,
because it lacks a hashing function!

I wasn't happy with StableName either, because
  1) it required IO
  2) System.Mem.StableName is marked "non-portable"
     (without a reason in the haddock documentation)
  3) There was no coercion function "StableName a -> StableName b"

I coerced all my objects with makeStableName and GHC.Prim.unsafeCoerce#
to "StableName ()" (I became non-portable, anywy.)

Now I could use "Data.HashTable (StableName ()) Value" since I was in
the IO monad anyway. (The Value is basically an Int, but here I want to
avoid the potential confusion with the key.)

For the sake of interest I also tried "IntMap [(StableName (), Value)]"
using hashStableName as IntMap key. (An IntMap is also used in
http://cs.helsinki.fi/u/ekarttun/SerTH that Einar Karttunen pointed to)

It turned out that the IntMap was not slower than the HashTable (What is
HashTable good for, then? Why is it so slow?)

Since I used hashStableName to get a key for my IntMap, I see no reason
why StableName should not have an Ord instance, in order to avoid the
whole hashing with lists (although a "Data.Map (StableName ()) Value"
may not be faster).

Summerizing, the user-friendliness (at least) of System.Mem.StableName
should be improved (also with respect to the "+RTS -A10m" business for
performing makeStableName).

Cheers Christian
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: summary of storing highly shared data structures

Bulat Ziganshin
Hello Christian,

Wednesday, January 11, 2006, 5:13:25 PM, you wrote:

CM> It turned out that the IntMap was not slower than the HashTable (What is
CM> HashTable good for, then? Why is it so slow?)

see the http://cvs.haskell.org/trac/ghc/ticket/650

and the attached letter. "-A10m" should help in this case


--
Best regards,
 Bulat                            mailto:[hidden email]
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
12