How to cut a file efficiently?

Magicloud Magiclouds
Hi,
  Let us say I have a text file of a million lines, and I want to cut
it into smaller files of 10K lines each.
  How can I do this? I have tried a few approaches, but none of them
seems to be lazy (by which I mean they all read the whole file up front).
--
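However dense the bamboo, it cannot stop the flowing stream;
however high the mountain, it cannot block the wandering clouds.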
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: How to cut a file efficiently?

Luke Palmer-2
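-- Break a list into chunks of n elements; the last chunk may be shorter.
split :: Int -> [a] -> [[a]]
split _ [] = []
split n xs = take n xs : split n (drop n xs)

main :: IO ()
main = do
    text <- readFile "source"
    -- Write the chunks to dest0, dest1, ...; unlines turns each [String] back into a String.
    mapM_ (\(n, dat) -> writeFile ("dest" ++ show n) (unlines dat))
        . zip [0 :: Int ..] . split 10000 . lines $ text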

Modulo brainos, but you get the idea. This is lazy because readFile is lazy: the file is read on demand as each chunk is written, not all at the start.
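For a large file, the same approach can be sketched with lazy ByteStrings from the bytestring package (bundled with GHC). This usually avoids much of the per-character overhead of String. This is only a sketch, and it rests on some assumptions. The helper names chunksOf and cutFile are hypothetical, and chunksOf is defined locally rather than imported from the split package. The input is treated as raw bytes split on '\n'. A final line that has no trailing newline gets one added on output.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical helper: break a list into chunks of n elements
-- (the last chunk may be shorter).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs
  | n <= 0    = error "chunksOf: chunk size must be positive"
  | otherwise = let (front, rest) = splitAt n xs in front : chunksOf n rest

-- Hypothetical helper: write every group of n lines of src to
-- destPrefix0, destPrefix1, ...
cutFile :: Int -> FilePath -> String -> IO ()
cutFile n src destPrefix = do
  contents <- BL.readFile src  -- lazy: the file is read in pieces on demand
  let groups = chunksOf n (BL.lines contents)
  mapM_ (\(i, ls) -> BL.writeFile (destPrefix ++ show i) (BL.unlines ls))
        (zip [0 :: Int ..] groups)

main :: IO ()
main = cutFile 10000 "source" "dest"
```

Each group becomes garbage once it has been written. Memory use should therefore stay roughly proportional to one chunk rather than to the whole file.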

Luke

On Tue, Apr 7, 2009 at 11:20 PM, Magicloud Magiclouds <[hidden email]> wrote:
> Hi,
>  Let us say I have a text file of a million lines, and I want to cut
> it into smaller (10K lines) ones.
>  How to do this? I have tried a few ways, none I think is lazy (I
> mean not reading the file all at the start).

Re: How to cut a file efficiently?

Johan Tibell-2
In reply to this post by Magicloud Magiclouds
On Wed, Apr 8, 2009 at 7:20 AM, Magicloud Magiclouds
<[hidden email]> wrote:
> Hi,
>  Let us say I have a text file of a million lines, and I want to cut
> it into smaller (10K lines) ones.
>  How to do this? I have tried a few ways, none I think is lazy (I
> mean not reading the file all at the start).

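I would just seek to the approximate chunk boundaries (byte offsets 10k,
20k, etc.) and then read forward until hitting a newline. That gives
chunks of roughly equal size in bytes, though not exactly 10K lines each.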
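A seek-based version might look roughly like the following sketch. It assumes the boundaries are byte offsets, and it reuses the "source"/"destN" file names from the previous reply. The function names lineBoundaries, lineStartAtOrAfter, dedup and cutAtBoundaries are hypothetical, and so is the 500000-byte chunk size. Each tentative offset is snapped forward to the next line start, so every chunk holds whole lines. The number of lines per chunk is not fixed.

```haskell
import qualified Data.ByteString as B
import Control.Monad (forM_)
import System.IO

-- First line start at or after the tentative byte offset `off`.
-- Seeking to off - 1 and discarding the rest of that line also covers
-- the case where `off` already sits right after a newline.
lineStartAtOrAfter :: Handle -> Integer -> Integer -> IO Integer
lineStartAtOrAfter h size off
  | off <= 0    = return 0
  | off >= size = return size
  | otherwise   = do
      hSeek h AbsoluteSeek (off - 1)
      _ <- B.hGetLine h  -- consumes bytes up to and including the next '\n'
      hTell h            -- the next line starts here (or at EOF)

-- Drop adjacent duplicates: a line longer than the chunk size can make
-- two tentative offsets snap to the same line start.
dedup :: Eq a => [a] -> [a]
dedup (x : y : rest)
  | x == y    = dedup (y : rest)
  | otherwise = x : dedup (y : rest)
dedup xs = xs

-- Boundaries: 0, each multiple of approxBytes (assumed positive) snapped
-- forward to a line start, then the file size.
lineBoundaries :: Integer -> FilePath -> IO [Integer]
lineBoundaries approxBytes path = withBinaryFile path ReadMode $ \h -> do
  size <- hFileSize h
  let tentative = takeWhile (< size) [approxBytes, 2 * approxBytes ..]
  inner <- mapM (lineStartAtOrAfter h size) tentative
  return (dedup (0 : inner ++ [size]))

-- Copy each [from, to) byte range into its own file: prefix0, prefix1, ...
cutAtBoundaries :: FilePath -> String -> [Integer] -> IO ()
cutAtBoundaries path destPrefix bs = withBinaryFile path ReadMode $ \h ->
  forM_ (zip3 [0 :: Int ..] bs (drop 1 bs)) $ \(n, from, to) -> do
    hSeek h AbsoluteSeek from
    bytes <- B.hGet h (fromIntegral (to - from))
    B.writeFile (destPrefix ++ show n) bytes

main :: IO ()
main = do
  -- Hypothetical chunk size in bytes; tune it so chunks hold about 10K lines.
  bs <- lineBoundaries 500000 "source"
  cutAtBoundaries "source" "dest" bs
```

Finding the boundaries only reads a small region around each offset. The copy step still reads every byte once, one chunk at a time. The file is opened in binary mode and only '\n' bytes are searched for. Because of that, landing inside a multi-byte UTF-8 character is harmless.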

Cheers,

Johan