Slow IO

Slow IO

Daniel Fischer-4
Hello all,
Now I have an IO-problem, too.
SPOJ problem 41 basically asks you to determine whether a directed graph
allows a path that uses every edge exactly once. The data from which the
graphs are to be constructed are contained in a (huge) file, every item
(number of test cases, size of a test case, edges) on a line of its own.
I've created my own test case file (much smaller, but already 217 MB, 2.2
million lines) to check my programme's performance. Now profiling reveals
that over 90% of time and allocation are consumed by reading the file line
by line (reading the whole file as a lazy String and then chopping it to
pieces is even worse).

So how do I quickly
1. read an Integer n from the file,
2. read the next n lines and
3. pass the first and last letter of those to the function that builds and
checks the graph?

When that is achieved, I could see whether my algorithm is good.
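
For concreteness, here is a minimal sketch of those three steps as I
currently structure them (`checkGraph' is just a placeholder name, not my
actual function):

    import Control.Monad (replicateM, replicateM_)

    -- placeholder for the function that builds and checks the graph
    checkGraph :: [(Char, Char)] -> Bool
    checkGraph = undefined

    main :: IO ()
    main = do
        t <- readLn                      -- number of test cases
        replicateM_ t $ do
            n  <- readLn                 -- 1. read an Integer n
            ws <- replicateM n getLine   -- 2. read the next n lines
            -- 3. pass the first and last letter of each line on
            print (checkGraph [ (head w, last w) | w <- ws ])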

Thanks for any help,
Daniel
--

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
        -- Blair P. Houghton

Re: Slow IO

Jeff P-2
Hello,

  Try Don Stewart's ByteString library
(http://www.cse.unsw.edu.au/~dons/fps.html). It is much faster than
the standard Haskell IO and now has a lazy variant as well.

-Jeff
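
A rough sketch of that reading loop against the lazy interface (untested,
off the top of my head; B.readInt etc. as in the bytestring API, and
`checkGraph' again standing in for whatever builds and checks the graph):

    import qualified Data.ByteString.Lazy.Char8 as B

    checkGraph :: [(Char, Char)] -> Bool   -- placeholder
    checkGraph = undefined

    main :: IO ()
    main = do
        (tLine : rest) <- fmap B.lines B.getContents
        loop (maybe 0 fst (B.readInt tLine)) rest
      where
        loop :: Int -> [B.ByteString] -> IO ()
        loop 0 _  = return ()
        loop _ [] = return ()
        loop t (nLine : ls) = do
            let n          = maybe 0 fst (B.readInt nLine)
                (ws, rest) = splitAt n ls
            print (checkGraph [ (B.head w, B.last w) | w <- ws ])
            loop (t - 1) rest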

On 9/9/06, Daniel Fischer <[hidden email]> wrote:

> [...]

Re: Slow IO

Daniel Fischer-4
On Sunday, 10 September 2006 at 02:29, you wrote:
> Hello,
>
>   Try Don Stewart's ByteString library
> (http://www.cse.unsw.edu.au/~dons/fps.html). It is much faster than
> the standard Haskell IO and now has a lazy variant as well.
>
> -Jeff

Yay, that's really an improvement!
Depending on the size of the file/graph data, using ByteString.Lazy.Char8
reduces the run time by anywhere from 'just a little' to a factor of over
30 -- and some inputs that previously made the programme segfault now run
in a couple of seconds.
Thanks Bryan, Simon, David, Don & Duncan.

However, the programme still spends the overwhelming part of the time doing
IO, and I've started thinking about it.
The problem spec states that the input file contains about 500 test cases,
each given by between 1 and 100,000 lines, each line containing a single word
of between 2 and 1000 letters.
So the file should be about 12.5G on average.
A time limit of 7s is given.
So even if we just counted newlines, we would have to scan 1,700 million
(1.7*10^9) chars per second.
Could any ordinary computer of today do that, using whatever language?
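
For reference, the arithmetic behind those two numbers (assuming line
lengths and test-case sizes uniformly distributed over the stated ranges):

    500 cases * 50,000.5 lines/case * 502 bytes/line (501 letters + '\n')
        ~ 1.26 * 10^10 bytes, i.e. about 12.5G;
    1.26 * 10^10 bytes / 7 s ~ 1.7-1.8 * 10^9 chars/s.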

Cheers,
Daniel

> [...]
--

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
        -- Blair P. Houghton

Re[2]: Slow IO

Bulat Ziganshin-2
Hello Daniel,

Monday, September 11, 2006, 6:05:38 PM, you wrote:

> The problem spec states that the input file contains about 500 test cases,
> each given by between 1 and 100,000 lines, each line containing a single word
> of between 2 and 1000 letters.
> So the file should be about 12.5G on average.

do you mean the arithmetic or the geometric average? ;)



--
Best regards,
 Bulat                            mailto:[hidden email]

Re: Re: Slow IO

Tom Phoenix
In reply to this post by Daniel Fischer-4
On 9/11/06, Daniel Fischer <[hidden email]> wrote:

> The problem spec states that the input file contains about 500 test cases,
> each given by between 1 and 100,000 lines, each line containing a single word
> of between 2 and 1000 letters.
> So the file should be about 12.5G on average.

I don't think that that necessarily follows. Although I've never seen
the input file, of course, I imagine that many cases are fairly small,
but designed to test the accuracy of your algorithm. A few are large
(in one way or another) to test the extremes of your algorithm. But
the overall size of the input file is probably much, much smaller than
that estimate. (Maybe 1MB? Maybe 10MB?)

> A time limit of 7s is given.

That's CPU time, and thus not including I/O, right? I couldn't find
the answer on the SPOJ site. Their FAQ is a forum, but I don't see it
there:

    http://www.spoj.pl/forum/viewforum.php?f=6&sid=6c8fb9c3216c3abd1e720f8b4b5682b3

In any case, the problem can't be too large; the top twenty programs
all finished in under 0.35 (CPU seconds?). Even if yours is a tenth as
fast as the C and C++ programs, that would be 3.5s -- twice as fast as
it needs to be.

    http://www.spoj.pl/ranks/WORDS1/

Of course, you don't have access to these other programs for
comparison; but I hope that this gives you a better idea of the size
(and manageability) of the task.

Good luck with it!

--Tom Phoenix

Re: Slow IO

Daniel Fischer-4
In reply to this post by Daniel Fischer-4
On Monday, 11 September 2006 at 17:44, you wrote:
> Daniel Fischer wrote:
> > >   Try Don Stewart's ByteString library
> >
> > -- and some data that made the programme segfault before now run in a
> > couple of seconds.
>
> But that shouldn't happen!  Segfaults aren't performance problems, but
> errors.  So your program contained a bug (laziness where it isn't
> useful, I suspect), and ByteString is hiding it.

Maybe I've misused the word segfault.
What happened is:
The programme consumed more and more memory (according to top),
kswapd started to have a higher CPU-percentage than my programme,
programme died, system yelling 'Speicherzugriffsfehler' (memory access
error), top displays 'kswapd<defunct>'.
I believe that means my programme demanded more memory than I have available
(only 256MB RAM + 800MB swap). Is that a segfault or what is the correct
term?

That is probably due (apart from the stupidity of my IO-code) to the large
overhead of Haskell lists. So the chunk of the file which easily fits into
my RAM in ByteString form is too large as a list of ordinary Strings.
However, the problem might be too little laziness, because if I explicitly
read the file line by line, memory usage remains low enough -- but
ByteString is *much* faster anyway.

>
> > So even if we just counted newlines, we would have to scan 1,700 million
> > (1.7*10^9) chars per second.
> > Could any ordinary computer of today do that, using whatever language?
>
> That rate should be no problem for a 2GHz machine.  However, a 200MHz 64
> bit wide bus won't deliver the data fast enough and it is 50x as much as
> the best hard disks could deliver in optimal circumstances.  I guess,
> most of the test cases are a lot smaller than your guesstimate.
>

I suppose so, too, but not knowing the test data, I assumed a bad case
(uniform distribution of line-lengths and test-case-size in the specified
range).

>
> Udo.


Bulat:
> do you mean the arithmetic or the geometric average? ;)
I meant 'expected value'.
If the X_i are independent random variables uniformly distributed on
[0 .. k], Y is a random variable (independent of the X_i) uniformly
distributed on [1 .. n], and Z is the sum of the first Y of the X_i, then
the expected value of Z is E[Y]*E[X_1] = ((n+1)/2)*(k/2) = (n+1)*k/4
(by Wald's identity, since Y is independent of the X_i).
So we might call that a weighted arithmetic average, I suppose.

Cheers,
Daniel

--

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
        -- Blair P. Houghton

Re: Slow IO

Daniel Fischer-4
In reply to this post by Tom Phoenix
On Monday, 11 September 2006 at 18:22, you wrote:
> On 9/11/06, Daniel Fischer <[hidden email]> wrote:
> > The problem spec states that the input file contains about 500 test
> > cases, each given by between 1 and 100,000 lines, each line containing a
> > single word of between 2 and 1000 letters.
> > So the file should be about 12.5G on average.
>
> I don't think that that necessarily follows. Although I've never seen
Not necessarily of course. But as the problem contains a warning about large
input/output data, I assumed the worst reasonable case (uniform
distribution).

> the input file, of course, I imagine that many cases are fairly small,
> but designed to test the accuracy of your algorithm. A few are large
> (in one way or another) to test the extremes of your algorithm. But
> the overall size of the input file is probably much, much smaller than
> that estimate. (Maybe 1MB? Maybe 10MB?)

That'd be peanuts with ByteString, 1MB even without.
>
> > A time limit of 7s is given.
>
> That's CPU time, and thus not including I/O, right? I couldn't find

Doesn't I/O use the CPU? But seriously, I would have thought that's what
'time' lists under 'user', and that includes I/O time as far as I can tell.

> the answer on the SPOJ site. [...]

--

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
        -- Blair P. Houghton

Re: Slow IO

Jared Updike
In reply to this post by Daniel Fischer-4
> Maybe I've misused the word segfault.
> What happened is:
> The programme consumed more and more memory (according to top),
> kswapd started to have a higher CPU-percentage than my programme,
> programme died, system yelling 'Speicherzugriffsfehler', top displays
> 'kswapd<defunct>'.
> I believe that means my programme demanded more memory than I have available
> (only 256MB RAM + 800MB swap). Is that a segfault or what is the correct
> term?

I thought a segfault was when a program reads or writes to protected
memory (for example, going out of bounds on an array, or dereferencing a
null pointer -- things mostly avoided in Haskell). Luckily there
are lots of fun ways to crash computers! I've heard your "heap
overflow" called "exploding" because of how the process rapidly
expands its memory usage, filling all available space, causing system
collapse.

  Jared.
--
http://www.updike.org/~jared/
reverse ")-:"

Re: Slow IO

Udo Stenzel
In reply to this post by Daniel Fischer-4
Daniel Fischer wrote:

> The programme consumed more and more memory (according to top),
> kswapd started to have a higher CPU-percentage than my programme,
> programme died, system yelling 'Speicherzugriffsfehler', top displays
> 'kswapd<defunct>'.
> I believe that means my programme demanded more memory than I have available
> (only 256MB RAM + 800MB swap). Is that a segfault or what is the correct
> term?
>
> That is probably due to (apart from the stupidity of my IO-code) the large
> overhead of Haskell lists.
Most certainly not.  I'm pretty sure this is due to a bug in your code.
Something retains a data structure which is actually unneeded.  Probably
a case of "foldl" where "foldl'" should be used or a "try" in Parsec
code where it should be left out or a lot of "updateWiths" to a Map,
etc.  Or it could be a bad choice of data structure.  I bet it's the
map you're using to represent the graph (which you don't even need to
represent at all, btw).


> So the chunk of the file which easily fits into my
> RAM in ByteString form is too large as a list of ordinary Strings.

The chunk of file should never need to fit into RAM.  If that's a
problem, you also forgot to prime a crucial "foldl".
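
The classic example (a sketch, nothing to do with your code specifically):

    import Data.List (foldl')

    lazySum, strictSum :: Integer
    -- foldl builds a chain of a million (+) thunks before anything is
    -- evaluated; foldl' forces the accumulator at each step and runs in
    -- constant space.
    lazySum   = foldl  (+) 0 [1 .. 1000000]
    strictSum = foldl' (+) 0 [1 .. 1000000]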


Udo.
--
"Proof by analogy is fraud." -- Bjarne Stroustrup

Re: Slow IO

Daniel Fischer-4
On Tuesday, 12 September 2006 at 22:26, you wrote:

> Daniel Fischer wrote:
> > The programme consumed more and more memory (according to top),
> > kswapd started to have a higher CPU-percentage than my programme,
> > programme died, system yelling 'Speicherzugriffsfehler', top displays
> > 'kswapd<defunct>'.
> > I believe that means my programme demanded more memory than I have
> > available (only 256MB RAM + 800MB swap). Is that a segfault or what is
> > the correct term?
> >
> > That is probably due to (apart from the stupidity of my IO-code) the
> > large overhead of Haskell lists.
>
> Most certainly not.  I'm pretty sure this is due to a bug in your code.
> Something retains a data structure which is actually unneeded.  Probably

Apparently. And my money is on a load of lines from the file (of which I need
only the first and last Char).

> a case of "foldl" where "foldl'" should be used or a "try" in Parsec
> code where it should be left out or a lot of "updateWiths" to a Map,
> etc.  Or it could be a bad choice of data structure.  I bet, it's the
> map you're using to represent the graph (which you don't even need to
> represent at all, btw).

No foldl or Parsec around. I represent the graph as a

UArray (Char,Char) Int

(I've switched to Int for the index type, too, when tuning the code), so that
shouldn't use much memory (array size is 676).
The array is built via accumArray, I hope that's sufficiently efficient
(though now I use unsafeAccumArrayUArray, that's faster).

How could I solve the problem without representing the graph in some way?
Possibly that could be done more efficiently than I do it, but I can't imagine
how to do it without representing the graph in some data structure.
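
For what it's worth, the construction is roughly of this shape (a sketch,
not my literal code; `edges' stands for the list of (first, last) letter
pairs):

    import Data.Array.Unboxed (UArray, accumArray)

    -- count parallel edges between letter pairs in a 26x26 unboxed array
    buildGraph :: [(Char, Char)] -> UArray (Int, Int) Int
    buildGraph edges =
        accumArray (+) 0 ((0, 0), (25, 25))
            [ ((ix a, ix b), 1) | (a, b) <- edges ]
      where
        ix c = fromEnum c - fromEnum 'a'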
>
> > So the chunk of the file which easily fits into my
> > RAM in ByteString form is too large as a list of ordinary Strings.
>
> The chunk of file should never need to fit into RAM.  If that's a
> problem, you also forgot to prime a crucial "foldl".
>

Forgive the stupid question, but where if not RAM would the chunk currently
processed reside?

>
> Udo.

Cheers,
Daniel

--

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
        -- Blair P. Houghton

Re: Slow IO

Donald Bruce Stewart
daniel.is.fischer:

> [...]
>
> Forgive the stupid question, but where if not RAM would the chunk currently
> processed reside?

I agree. Some problems simply require you to hold large strings in
memory. And for those, [Char] conks out around 5-10M (try reversing a
10M [Char]).

-- Don

Re: Slow IO

Ketil Malde-3
In reply to this post by Daniel Fischer-4
Daniel Fischer <[hidden email]> writes:

> Maybe I've misused the word segfault.

I think so.  A segfault is the operating-system complaining about an
illegal memory access.  If you get them from Haskell, it is likely a
bug in the compiler or run-time system (or you were using unsafeAt, or
FFI).

> The programme consumed more and more memory (according to top),
> kswapd started to have a higher CPU-percentage than my programme,
> programme died, system yelling 'Speicherzugriffsfehler', top displays
> 'kswapd<defunct>'.

I find that swapping the GHC heap is not productive, so I tend to
limit memory to available RAM.  You can do that either by limiting the
memory available to the process in the shell (ulimit or limit), or by
specifying RTS options to the Haskell program (typically +RTS -Mxxx,
where xxx is 80-90% of physical RAM).

From the GHC docs (6.4.2):

   -Msize

    [Default: unlimited] Set the maximum heap size to size bytes. The

I thought the default was set according to limits?

    heap normally grows and shrinks according to the memory
    requirements of the program. The only reason for having this
    option is to stop the heap growing without bound and filling up
    all the available swap space, which at the least will result in
    the program being summarily killed by the operating system.

In my experience, a program which thrashes severely without heap
limits can run fine if you just limit the heap.  So it's not as much
an issue of 'filling up swap' vs. 'killed by the OS', but 'takes
forever, making the system unresponsive in the process' vs. 'tries
hard to complete in a timely fashion, or halts with an error'.  I much
prefer the latter.
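
For example (program name made up), to cap the heap at 200M:

    $ ./myprog +RTS -M200m -RTS < input.txt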

> However, the problem might be too little laziness, because if I explicitly read
> the file line by line, memory usage remains low enough -- but ByteString is
> *much* faster anyway.

Right.  You should still try to consume the file lazily, and make sure
the data can get discarded and GC'ed when you have no further use for
it.  But a String is something like 8 or 12 bytes per character, a
ByteString gets you down to 1.

-k
--
If I haven't seen further, it is by standing in the footprints of giants

Re: Slow IO

Udo Stenzel
Ketil Malde wrote:
> Daniel Fischer <[hidden email]> writes:
>
> > Maybe I've misused the word segfault.
>
> I think so.  A segfault is the operating-system complaining about an
> illegal memory access.  If you get them from Haskell, it is likely a
> bug in the compiler or run-time system (or you were using unsafeAt, or
> FFI).

Far simpler:  This is really a segfault, and it's because of a
misfeature of Linux called "memory overcommitment".  When physical
memory runs out, Linux happily hands out more to applications requesting
it, in the vain hope that at least some of it is never accessed.
Therefore, malloc() is always successful, but when the memory is finally
accessed, it suddenly turns out that there isn't anything to access,
which results in a segfault.  No amount of error checking can prevent
that and it could have hit any process allocating memory when it ran
out.

Sane people turn overcommitment off.  Sane people wouldn't have
implemented it in the first place, either.
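
On Linux that is, for example (as root):

    # switch to strict accounting: never promise memory that isn't there
    sysctl -w vm.overcommit_memory=2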


Udo.
--
The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all progress
depends on the unreasonable man.
    -- George Bernard Shaw

Re: Slow IO

Udo Stenzel
In reply to this post by Daniel Fischer-4
Daniel Fischer wrote:
> > Most certainly not.  I'm pretty sure this is due to a bug in your code.
> > Something retains a data structure which is actually unneeded.  Probably
>
> Apparently. And my money is on a load of lines from the file (of which I need
> only the first and last Char).

Then you're doing it wrong[TM].  You shouldn't need to keep any part of
the input in memory.  Whatever it is, nobody can tell you without seeing
the code.  Try heap profiling, should you have no idea where to look for
leaks.


> How could I solve the problem without representing the graph in some way?

By using an advanced tool called "brains".  Sorry for not being more
specific, but that's actually the fun part of the challenge and I'm not
going to spoil it for you.  ;-)


> Forgive the stupid question, but where if not RAM would the chunk currently
> processed reside?

Oh, I overlooked "chunk".  Well, yes, the "chunk" currently processed
needs to fit into RAM.  But how much of a problem could a single Char
pose?


Donald Bruce Stewart wrote:
> I agree. Some problems simply require you to hold large strings in
> memory. And for those, [Char] conks out around 5-10M (try reversing a
> 10M [Char]).

Sure, this one just isn't of that kind.


Udo.
--
"Irrationality is the square root of all evil"
                -- Douglas Hofstadter

Re[2]: Slow IO

Bulat Ziganshin-2
In reply to this post by Ketil Malde-3
Hello Ketil,

Wednesday, September 13, 2006, 10:41:13 AM, you wrote:

> But a String is something like 8 or 12 bytes per character, a
> ByteString gets you down to 1.

12-16 bytes: the Char itself, the pointer to the next list element, and
two boxes around them - that comes to 16 bytes on a 32-bit CPU. But cells
for small Chars are preallocated at program startup, so it will be 12
bytes for ASCII-only strings.

But that is not the whole story :)  A copying GC makes a program's memory
usage 3 times larger, on average, than what it really allocates, while a
compacting GC has only a 2x overhead.

ByteStrings are allocated in the _unmovable_ part of the GHC heap, so they
don't suffer from this problem. Of course, it is not free - a program that
creates and destroys a large number of ByteStrings will suffer from memory
holes, which is exactly the fragmentation problem that GHC's moving GC
normally solves.

So, for a program that only allocates, the difference may be 24x/36x on
average (12 bytes times the 2x or 3x GC overhead, against 1 byte per
character), while for create-use-destroy-loop scenarios I can't make any
detailed prognosis.

Also, traversing those lengthy lists on every GC is very time-consuming.



--
Best regards,
 Bulat                            mailto:[hidden email]

Re: Slow IO

Daniel Fischer-4
In reply to this post by Udo Stenzel
On Wednesday, 13 September 2006 at 11:07, you wrote:
> Daniel Fischer wrote:
> > > Most certainly not.  I'm pretty sure this is due to a bug in your code.
> > > Something retains a data structure which is actually unneeded.
> > > Probably
> >
> > Apparently. And my money is on a load of lines from the file (of which I
> > need only the first and last Char).
>
> Then you're doing it wrong[TM].  You shouldn't need to keep any part of

Yes, I did it wrong, but I didn't keep anything (but the first and last Char
of each line) in memory on purpose. I hoped for the lines to be read one
after the other, head and last extracted - possibly immediately passed to
accumArray, but I wouldn't necessarily expect that - and the already used
lines thrown in the dustbin on next GC. Maybe the compiler couldn't figure
out that it wouldn't access these lines anymore.

> the input in memory.  Whatever it is, nobody can tell you without seeing
> the code.  Try heap profiling, should you have no idea where to look for
> leaks.

Profiling (-hy, -hc) shows that the IO part of the programme holds on to
tons of lists - that couldn't be anything but parts of the file contents,
I believe.

>
> > How could I solve the problem without representing the graph in some way?
>
> By using an advanced tool called "brains".  Sorry for not being more
> specific, but that's actually the fun part of the challenge and I'm not
> going to spoil it for you.  ;-)
>
> > Forgive the stupid question, but where if not RAM would the chunk
> > currently processed reside?
>
> Oh, I overlooked "chunk".  Well, yes, the "chunk" currently processed
> needs to fit into RAM.  But how much of a problem could a single Char
> pose?

Well, if it's the last straw...
But not much, I presume; and even if we must have a few thousand Chars in
memory at once, that shouldn't do much harm either.

>
> Donald Bruce Stewart wrote:
> > I agree. Some problems simply require you to hold large strings in
> > memory. And for those, [Char] conks out around 5-10M (try reversing a
> > 10M [Char]).
>
> Sure, this one just isn't of that kind.

Yes, but I didn't tell the compiler :-(

>
>
> Udo.

Cheers,
Daniel

--

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
        -- Blair P. Houghton

Re: Slow IO

Ketil Malde-3
Daniel Fischer <[hidden email]> writes:

> Yes, I did it wrong, but I didn't keep anything (but the first and last Char
> of each line) in memory on purpose. I hoped for the lines to be read one
> after the other, head and last extracted
   [...]
> Profiling (hy,hc) shows that the IO part of the programme holds on to tons of
> lists - that couldn't be anything but parts of the file-contents, I believe.

Chances are that you don't evaluate these characters right away, so
what you are retaining is just unevaluated thunks that refer to the
lines themselves.  So the fix is to make the extraction of the
characters stricter.
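
For instance, something along these lines (a sketch; assuming the lazy
ByteString module is imported as B, with `w' being one line of the input):

    -- force both characters before the pair is retained, so the line
    -- itself can be dropped at the next GC
    edge :: B.ByteString -> (Char, Char)
    edge w = let a = B.head w
                 b = B.last w
             in  a `seq` b `seq` (a, b)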

-k
--
If I haven't seen further, it is by standing in the footprints of giants
