getRecursiveContents - example from `Real World Haskell'

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

getRecursiveContents - example from `Real World Haskell'

Johann Giwer
`Real World Haskell' is a great book. I really love it. When I tried an example
from the 9th Chapter, I was a bit disappointed:

*Main> f <- getRecursiveContents "/home/johann/"
Heap exhausted;
Current maximum heap size is 128000000 bytes (122 Mb);
use `+RTS -M<size>' to increase it.

The function lookes like this:

getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = do
  names <- getDirectoryContents topdir
  let properNames = filter (`notElem` [".", ".."]) names
  paths <- forM properNames $ \name -> do                                -- 1
    let path = topdir </> name
    isDirectory <- doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path
      else return [path]
  return (concat paths)                                                  -- 2

OK, I'm using a small machine and my home directory contains ~30,000 files. But
that couldn't be the real problem. And even if this function is a small example
it should work reliable.
The programming language I know best (and this is meant relative -- I'm only a
`would be programmer') is python. Python has good support for functional
programming, but no builtin tail recursion. So my first idea about the bug in
`getRecursiveContents' went in this direction. Two hours later I had worked out
this solution:

getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents = getRecursiveContents' []
  where
    getRecursiveContents' l p =
      E.handle (\_ -> return (p:l)) $ do                                 -- 3
        c <- getDirectoryContents p  
        let c' = filter (`notElem` [".", ".."]) c
        x <- foldM (\l' p' -> getRecursiveContents' l' (p </> p')) l c'  -- 4
        return (x)

Folding (4) and appending (3) would give less memory usage than mapping (1) and
concatenation (2), I thought. This function worked well for small directory
(for which the original one did, too). But tested with my home directory it
went into an infinite loop.  That led me to the actually problem:
`doesDirectoryExist' also accepts symlinks to directories. Another hour later
this was fixed:

getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents = getRecursiveContents' []
  where
    getRecursiveContents' l p = do
      s <- getSymbolicLinkStatus p                                    
      if isDirectory s
        then
          E.handle (\_ -> return (p:l)) $ do
            c <- getDirectoryContents p  
            let c' = filter (`notElem` [".", ".."]) c
            x <- foldM (\l' p' -> getRecursiveContents' l' (p </> p')) l c'
            return (x)
        else
          return (p:l)

Finally I fixed the original function (this only took about 30 min :-). The
handle (5) catches errors caused by unreadable directories

getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = E.handle (\_ ->return [topdir]) $ do       -- 5
  names <- getDirectoryContents topdir
  let properNames = filter (`notElem` [".", ".."]) names
  paths <- forM properNames $ \name -> do
    let path = topdir </> name
    s <- getSymbolicLinkStatus path
    if isDirectory s
      then getRecursiveContents path
      else return [path]
  return (concat paths)

The imports for all functions mentioned above are:

import Control.Monad ( forM, filterM, foldM )
import qualified Control.Exception as E
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))
import System.Posix (getSymbolicLinkStatus, isDirectory)


Any suggestions about this solution are welcome.  

Johann