How to optimize a directory scanning?

Magicloud Magiclouds
Hi,
I asked this on Stack Overflow but didn't get an answer, so I'm
wondering whether people here have any thoughts.

I have a function that reads the contents of /proc every second.
Surprisingly, its CPU usage in top is around 5%, peaking at 8%, while
the same logic in C or Rust takes only about 1% or 2%. I'm wondering
whether this can be improved. /proc is a virtual filesystem, so this is
not related to HDD performance. I noticed the difference because my
CPU is old (2nd-generation Core); on modern CPUs, as tested by others,
the difference is barely noticeable.

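-- ScopedTypeVariables is needed for the (_ :: IOException) pattern below.
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception
import Control.Concurrent
import Control.Monad
import Data.Char
import Data.Maybe
import System.Directory
import System.FilePath
import System.Posix.Files
import System.Posix.Signals
import System.Posix.Types
import System.Posix.User
import System.IO.Strict as Strict

-- Poll /proc once per second, forever.
watch :: UserID -> s -> h -> IO ()
watch u limit0s limit0h = do
  listDirectory "/proc/" >>= mapM_ (\fp -> do
    -- isMyPid' is computed but not used further in this excerpt;
    -- the stat file is read (and discarded) for every entry.
    isMyPid' <- fromMaybe False <$> wrap2Maybe (isMyPid fp u)
    wrap2Maybe (Strict.readFile ("/proc/" </> fp </> "stat")))
  threadDelay 1000000
  watch u limit0s limit0h
  where
    -- Turn an IOException (e.g. a process that exited mid-scan) into Nothing.
    wrap2Maybe :: IO a -> IO (Maybe a)
    wrap2Maybe f = catch (Just <$> f) (\(_ :: IOException) -> return Nothing)
    -- True for /proc/<pid> directories owned by the given user.
    isMyPid :: FilePath -> UserID -> IO Bool
    isMyPid fp me = do
      let areDigit = not (null fp) && all isDigit fp
      isDir <- doesDirectoryExist $ "/proc/" </> fp
      owner <- fileOwner <$> getFileStatus ("/proc" </> fp)
      return $ areDigit && isDir && (owner == me)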


--
Dense bamboo cannot stop the flowing water;
high mountains cannot block the drifting clouds.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Re: How to optimize a directory scanning?

David Feuer
Pure speculation: are you paying for a lot of conversions between FilePath (string) and C strings?
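One rough way to test this hypothesis is to keep paths as raw bytes end to end. The sketch below is an illustration, not code from the thread: it lists the numeric entries of /proc with the `unix` package's ByteString API (`System.Posix.Directory.ByteString`, `RawFilePath`), so no String encoding happens per entry. `isPidName` and `listPidNames` are hypothetical names, and it assumes `readDirStream` signals end of stream with an empty name.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString.Char8 as B
import Data.Char (isDigit)
import System.Posix.ByteString.FilePath (RawFilePath)
import System.Posix.Directory.ByteString (closeDirStream, openDirStream, readDirStream)

-- | A /proc entry names a process if it is a non-empty run of ASCII digits.
isPidName :: RawFilePath -> Bool
isPidName name = not (B.null name) && B.all isDigit name

-- | Read a directory stream entry by entry, keeping only PID-like names.
-- readDirStream returns an empty name once the stream is exhausted;
-- bracket guarantees the stream is closed even if reading throws.
listPidNames :: RawFilePath -> IO [RawFilePath]
listPidNames dir = bracket (openDirStream dir) closeDirStream go
  where
    go ds = do
      name <- readDirStream ds
      if B.null name
        then pure []
        else do
          rest <- go ds
          pure (if isPidName name then name : rest else rest)
```

For a fair comparison, the per-entry checks would also need the matching ByteString variants (e.g. `System.Posix.Files.ByteString.getFileStatus`); whether this closes the gap would have to be measured.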

On Thu, May 9, 2019, 10:02 PM Magicloud Magiclouds <[hidden email]> wrote:

Re: How to optimize a directory scanning?

Vanessa McHale
In reply to this post by Magicloud Magiclouds

Would you happen to have the Rust/C code available?

One option is simply to use the C code and bind to it.

The one thing that stands out to me in your code is that you call
`doesDirectoryExist` as well as `getFileStatus`, when you could
determine whether it exists with `doesPathExist` and then determine
whether it's a directory by checking the result of `getFileStatus`.
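Note that `getFileStatus` on its own already covers both checks: the `FileStatus` it returns carries the file type and the owner, and the call throws if the path is gone. A minimal sketch of the filesystem part of `isMyPid` with a single stat per path, assuming a vanished process should simply count as "not mine" (`isDirOwnedBy` is a hypothetical helper, not from the thread):

```haskell
import Control.Exception (IOException, try)
import System.Posix.Files (FileStatus, fileOwner, getFileStatus, isDirectory)
import System.Posix.Types (UserID)

-- | One stat() per path: the same FileStatus gives both the file type and
-- the owner. A path that no longer exists (e.g. a process that exited
-- between listDirectory and this call) yields False instead of an exception.
isDirOwnedBy :: UserID -> FilePath -> IO Bool
isDirOwnedBy me path = do
  r <- try (getFileStatus path) :: IO (Either IOException FileStatus)
  pure $ case r of
    Left _   -> False
    Right st -> isDirectory st && fileOwner st == me
```

This drops one of the two stat calls per entry; how much CPU that saves on its own is untested here.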

Cheers,
Vanessa McHale

On 5/9/19 9:00 PM, Magicloud Magiclouds wrote:

Re: How to optimize a directory scanning?

Magicloud Magiclouds
In reply to this post by David Feuer
I can't tell, since those are more or less the "standard" functions of
Haskell, right?

On Fri, May 10, 2019 at 10:11 AM David Feuer <[hidden email]> wrote:

>
> Pure speculation: are you paying for a lot of conversions between FilePath (string) and C strings?

--
And for G+, please use magiclouds#gmail.com.

Re: How to optimize a directory scanning?

Magicloud Magiclouds
In reply to this post by Vanessa McHale
Yes, I do. But binding to it isn't my top priority, since I'd like to
keep the logic in Haskell.

I tried the directory-check change and could not see any major difference.

On Fri, May 10, 2019 at 10:13 AM Vanessa McHale <[hidden email]> wrote:

Re: How to optimize a directory scanning?

Brandon Allbery
In reply to this post by Magicloud Magiclouds
...what?

Also, in C you'd call stat() and check for -1 (not found) or inspect the result to see if it's what you want. But in Haskell this throws an exception instead of producing a sane Either, so you either make multiple syscalls or you have to catch an exception. So no matter what, this ends up with higher overhead than C or Rust.
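For what it's worth, `Control.Exception.try` gives an Either-shaped interface at the call site, though the exception is still raised and caught underneath, so this tidies the code rather than removing the overhead described above. A small sketch (`statEither` and `classify` are illustrative names) mirroring C's "check for -1, then look at errno" pattern:

```haskell
import Control.Exception (IOException, try)
import System.IO.Error (isDoesNotExistError)
import System.Posix.Files (FileStatus, getFileStatus, isDirectory)

-- | stat() with failure returned as a value instead of propagating.
statEither :: FilePath -> IO (Either IOException FileStatus)
statEither = try . getFileStatus

-- | Roughly what C code does after stat(): ENOENT means "not there",
-- any other failure is a real error, success lets us inspect the result.
classify :: FilePath -> IO String
classify path = do
  r <- statEither path
  pure $ case r of
    Left e
      | isDoesNotExistError e -> "missing"
      | otherwise             -> "error: " ++ show e
    Right st
      | isDirectory st -> "directory"
      | otherwise      -> "not a directory"
```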

On Thu, May 9, 2019 at 10:15 PM Magicloud Magiclouds <[hidden email]> wrote:
> I could not tell, since those are some kind of "standard" functions of
> Haskell, right?

--
brandon s allbery kf8nh

Re: How to optimize a directory scanning?

Magicloud Magiclouds
Makes sense. I can see those "tricks" in C. But since this code isn't
doing any complex computation, I really wish it could be sped up.
For example, Rust returns a `Result` (essentially an Either) for IO errors.

On Fri, May 10, 2019 at 10:17 AM Brandon Allbery <[hidden email]> wrote:

Re: How to optimize a directory scanning?

Iustin Pop
In reply to this post by Magicloud Magiclouds
On 2019-05-10 10:00:45, Magicloud Magiclouds wrote:

Interesting, I can see a few potential issues. But first, have you
measured how many syscalls this makes in Haskell vs. C vs. Rust? That
would let you separate internal Haskell overhead (e.g. String) from a
different algorithm in the Haskell version.

For example, one issue that could lead to unneeded syscalls is your
"isMyPid" function. AFAIK getFileStatus does no caching, so you're
stat'ing (and making a syscall for) each path twice: once to get the
file type (is it a directory?) and a second time to get the owner.

You also build the `"/proc/" </> fp` path twice (and thus evaluate it twice).

But without understanding "how" the Haskell version is slower, it's not
clear where the problem lies (in syscalls, in GC, or …).
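strace answers the syscall half of that question. For the Haskell-side half, a rough sketch like the following (using only `System.CPUTime` and `System.Mem` from base; `profileOnce` is a hypothetical name) reports the CPU time and the bytes allocated by one scan, which can help tell allocation/GC pressure apart from syscall cost:

```haskell
import Data.Int (Int64)
import System.CPUTime (getCPUTime)
import System.Mem (getAllocationCounter)

-- | Run an action once and return its result together with the process CPU
-- time it took (picoseconds) and the bytes allocated by the calling thread.
-- The per-thread allocation counter counts *down* as the thread allocates.
profileOnce :: IO a -> IO (a, Integer, Int64)
profileOnce act = do
  t0 <- getCPUTime
  a0 <- getAllocationCounter
  x  <- act
  a1 <- getAllocationCounter
  t1 <- getCPUTime
  pure (x, t1 - t0, a0 - a1)
```

Wrapping a single pass over /proc (the listing plus the per-entry checks) in `profileOnce` and printing the numbers would show whether most of the time goes with heavy allocation.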

regards,
iustin

Re: How to optimize a directory scanning?

Magicloud Magiclouds
Good point. Let me see what strace can tell me.

On Fri, May 10, 2019 at 3:46 PM Iustin Pop <[hidden email]> wrote:

Re: How to optimize a directory scanning?

Magicloud Magiclouds
So this is what I got. It seems both versions make two stat calls
(stat/newfstatat) per entry, for the directory check and the UID check.
But when opening the file for reading, the Haskell version makes an
ioctl call (maybe from System.IO.Strict) that seems to fail. I want to
test the case without System.IO.Strict, but have no idea how to make
exception catching work with the lazy readFile.

For the Haskell implementation:
```
stat("/proc/230", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat("/proc/230", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc/230/stat", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 23
fstat(23, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
ioctl(23, TCGETS, 0x7ffe88c18090)       = -1 ENOTTY (Inappropriate ioctl for device)
read(23, "230 (scsi_eh_5) S 2 0 0 0 -1 212"..., 8192) = 155
read(23, "", 8192)                      = 0
close(23)
```
For the Rust implementation:
```
newfstatat(3, "1121", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
stat("/proc/1121", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/1121/stat", O_RDONLY|O_CLOEXEC) = 4
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "1121 (ibus-engine-sim) S 1077 10", 32) = 32
read(4, "77 1077 0 -1 4194304 425 0 1 0 5", 32) = 32
read(4, "866 1689 0 0 20 0 3 0 3490 16454"..., 64) = 64
read(4, "4885264596992 94885264603013 140"..., 128) = 128
read(4, "0724521155542 140724521155575 14"..., 256) = 64
read(4, "", 192)                        = 0
close(4)
```
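On the lazy readFile question: with the Prelude's lazy `readFile`, a read error can surface later, when the string is consumed, outside any `catch`. One common workaround, sketched here under the assumption that the whole file should be read inside the handler (`readWhole` is a hypothetical name), is to force the contents with `evaluate` inside `try`:

```haskell
import Control.Exception (IOException, evaluate, try)

-- | Read a file with the lazy Prelude readFile, but force the entire
-- contents before leaving `try`, so an error while opening *or* while
-- reading is caught here rather than when the string is used later.
-- Reading to the end also lets the lazily read handle close promptly.
readWhole :: FilePath -> IO (Either IOException String)
readWhole path = try $ do
  s <- readFile path
  _ <- evaluate (length s)
  pure s
```

This is close to what System.IO.Strict's readFile does (open, then force the length), so the syscall pattern, including the TCGETS ioctl, should look much the same.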

On Fri, May 10, 2019 at 3:49 PM Magicloud Magiclouds
<[hidden email]> wrote:

Re: How to optimize a directory scanning?

Brandon Allbery
The ioctl is standard, including in C unless you are using open() directly: it checks to see if the opened file is a terminal, to determine whether to set block or line buffering.
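The decision described above is visible from Haskell. A minimal sketch (assuming GHC's behaviour of giving non-terminal file handles block buffering; `describeHandle` is a hypothetical name) that opens a file through the normal Handle machinery and reports whether it was judged a terminal and which buffering mode it received:

```haskell
import System.IO

-- | Open a file through the normal Handle machinery (the step where GHC
-- performs its terminal check) and report the outcome: whether the handle
-- is a terminal, and the buffering mode it was given.
describeHandle :: FilePath -> IO (Bool, BufferMode)
describeHandle path = withFile path ReadMode $ \h -> do
  tty  <- hIsTerminalDevice h
  mode <- hGetBuffering h
  pure (tty, mode)
```

For a /proc file this is expected to report `(False, BlockBuffering Nothing)`.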

On Fri, May 10, 2019 at 11:09 AM Magicloud Magiclouds <[hidden email]> wrote:
So this is what I got. Seems like both calls two stat(stat/newfstatat)
for dir checking and uid checking. But when open file for reading,
there is an ioctl call (maybe from System.IO.Strict) which seems
failed, for Haskell. I want to test the case without System.IO.Strict.
But have no idea how to get exception catching works with lazy
readFIle.

For Haskell implenmentation,
```
stat("/proc/230", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat("/proc/230", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc/230/stat", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 23
fstat(23, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
ioctl(23, TCGETS, 0x7ffe88c18090)       = -1 ENOTTY (Inappropriate
ioctl for device)
read(23, "230 (scsi_eh_5) S 2 0 0 0 -1 212"..., 8192) = 155
read(23, "", 8192)                      = 0
close(23)
```
For Rust implenmentation,
```
newfstatat(3, "1121", {st_mode=S_IFDIR|0555, st_size=0, ...},
AT_SYMLINK_NOFOLLOW) = 0
stat("/proc/1121", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/1121/stat", O_RDONLY|O_CLOEXEC) = 4
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "1121 (ibus-engine-sim) S 1077 10", 32) = 32
read(4, "77 1077 0 -1 4194304 425 0 1 0 5", 32) = 32
read(4, "866 1689 0 0 20 0 3 0 3490 16454"..., 64) = 64
read(4, "4885264596992 94885264603013 140"..., 128) = 128
read(4, "0724521155542 140724521155575 14"..., 256) = 64
read(4, "", 192)                        = 0
close(4)
```

On Fri, May 10, 2019 at 3:49 PM Magicloud Magiclouds
<[hidden email]> wrote:
>
> Good point. Let me see what strace can tell me.
>
> On Fri, May 10, 2019 at 3:46 PM Iustin Pop <[hidden email]> wrote:
> >
> > Interesting, I can see a few potential issues. But first, have you
> > measured how many syscalls this does in Haskell vs. C vs. Rust? That
> > would allow you to distinguish internal Haskell problems (e.g. String)
> > from a different algorithm in Haskell.
> >
> > For example, one issue that could lead to unneeded syscalls is your
> > "isMyPid" function. AFAIK there's no caching done by getFileStatus, so
> > you're stat'ing (and making a syscall) each path twice, once to get file
> > type (is it directory) information, and then a second time to get owner
> > information.
> >
> > You also build `"/proc/" </> fp` twice (and thus evaluate it twice).
> >
> > But without understanding "how" Haskell is slower, it's not clear where
> > the problem lies (in syscalls or in GC or …).
> >
> > regards,
> > iustin



--
Dense bamboo cannot stop the flowing stream;
high mountains cannot block the drifting clouds.

And for G+, please use magiclouds#gmail.com.


--
brandon s allbery kf8nh


Re: How to optimize a directory scanning?

Neil Mayhew
It would be possible to avoid that TCGETS ioctl since the immediately preceding fstat shows that the file is a regular file and not a device. However, I'm not sure how easy it would be for the library to make that optimization.

The Haskell implementation actually makes fewer syscalls than the Rust one, because Rust reads the file in very small chunks (32, 32, 64, 128, 64 bytes) whereas Haskell reads one big chunk (8192 bytes), which is large enough to hold the entire file. I think it's unlikely that the extra ioctl outweighs the multiple extra reads. However, if you use the -r option with strace to include relative timestamps in the output, you'll be able to see just how long each syscall is taking. On my system, they all take about the same amount of time.

It would also be worth running the program under `time`, to see how much of the CPU time is spent in user space vs. the kernel.

On 2019-05-10 9:35 AM, Brandon Allbery wrote:
The ioctl is standard, including in C unless you are using open() directly: it checks to see if the opened file is a terminal, to determine whether to set block or line buffering.
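To see this check from the Haskell side, a small sketch (assuming only `base`; `fileHandleInfo` is a hypothetical name) opens a regular file and asks the handle whether it is a terminal and which buffering mode it received. In GHC's `base`, a newly opened handle gets line buffering if the device is a terminal and block buffering otherwise, which is most likely where the `TCGETS` ioctl in the trace comes from.

```haskell
import System.IO

-- Hypothetical helper: report the terminal check and the default buffering
-- mode that GHC picked for a freshly opened handle on `path`.
fileHandleInfo :: FilePath -> IO (Bool, BufferMode)
fileHandleInfo path =
  withFile path ReadMode $ \h -> do
    isTerm <- hIsTerminalDevice h  -- performs the isatty/TCGETS check again
    mode <- hGetBuffering h        -- default chosen when the handle was made
    return (isTerm, mode)
```

For `/proc/230/stat` this should report `(False, BlockBuffering Nothing)`: not a terminal, so block buffering.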


Re: How to optimize a directory scanning?

Viktor Dukhovni
In reply to this post by Magicloud Magiclouds
Why is the process id re-computed every second?  Do you
expect it to change during the process lifetime?

>    isMyPid fp me = do
>      let areDigit = fp >= "0" && fp <= "9"
>      isDir <- doesDirectoryExist $ "/proc/" </> fp
>      owner <- fileOwner <$> getFileStatus ("/proc" </> fp)
>      return $ areDigit && isDir && (owner == me)

And the code should skip looking for sub-directories of
non-numeric directory entries, avoiding unnecessary stat(2)
calls.
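Following this point and Iustin's earlier one about stat'ing each path twice, a sketch of `isMyPid` that filters on the name first (using `all isDigit`, in place of the original lexicographic comparison against "0" and "9") and then calls stat(2) only once, reusing the single `FileStatus` for both the directory and the owner checks. It assumes the `unix` and `filepath` packages; `getFileStatus` still throws if the process exits in between, so callers should keep catching `IOException`.

```haskell
import Data.Char (isDigit)
import System.FilePath ((</>))
import System.Posix.Files (fileOwner, getFileStatus, isDirectory)
import System.Posix.Types (UserID)

-- Decide whether a /proc entry is a process directory owned by `me`.
isMyPid :: FilePath -> UserID -> IO Bool
isMyPid fp me
  -- Name check first: entries like "self" or "sys" cost no syscall at all.
  | null fp || not (all isDigit fp) = return False
  | otherwise = do
      st <- getFileStatus ("/proc" </> fp)  -- the only stat(2) call
      return (isDirectory st && fileOwner st == me)
```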

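   import Control.Exception (bracket)
   import Control.Monad (unless)
   import System.Posix.Directory as D

   -- Apply entryAction to each entry name (including "." and "..").
   perEntry_ :: FilePath -> (FilePath -> IO ()) -> IO ()
   perEntry_ dirPath entryAction =
        bracket (D.openDirStream dirPath)
                D.closeDirStream
                loop
     where
       -- readDirStream returns "" once the directory is exhausted
       loop ds = do
         entry <- D.readDirStream ds
         unless (null entry) $ entryAction entry >> loop ds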

Or with Conduits:

   import Control.Monad.IO.Class (liftIO)
   import Data.Conduit as C
   import qualified Data.Conduit.Combinators as C

   C.runConduitRes $ C.sourceDirectory dirPath .|
        C.awaitForever (liftIO . entryAction)

But now you have more choices about when and what to return
from the loop, whether to scan the whole directory, ...
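As an illustration of returning early, a sketch using the `unix` package's `openDirStream`, `readDirStream` and `closeDirStream`: it stops at the first entry name satisfying a predicate, so the rest of the directory is never read. `findEntry` is a hypothetical name.

```haskell
import Control.Exception (bracket)
import System.Posix.Directory (closeDirStream, openDirStream, readDirStream)

-- Hypothetical helper: return the first entry of dirPath whose name satisfies
-- `wanted`, reading no further entries once one is found.
findEntry :: (FilePath -> Bool) -> FilePath -> IO (Maybe FilePath)
findEntry wanted dirPath =
  bracket (openDirStream dirPath) closeDirStream go
  where
    go ds = do
      name <- readDirStream ds
      if null name
        then return Nothing           -- "" marks the end of the stream
        else if wanted name
          then return (Just name)     -- stop early; bracket closes the stream
          else go ds
```

The same loop shape works for folds that accumulate a result instead of stopping, which is the extra control a plain `listDirectory` does not give you.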

Note that the conduit version prepends the directory name to
the entry names.  I would not have done that, but you can just
copy the handful of lines of source and stream the bare entry names:

  http://hackage.haskell.org/package/conduit-1.3.1.1/docs/src/Data.Conduit.Combinators.html#sourceDirectory
                                 
--
        Viktor.


Re: How to optimize a directory scanning?

Niklas Hambüchen
In reply to this post by Magicloud Magiclouds
Hi,

we made the `posix-paths` package for fast directory traversals:

    https://hackage.haskell.org/package/posix-paths

You can find benchmarks in

    https://github.com/JohnLato/posix-paths#benchmarks

Some more tips (some of them you're already following as per other threads):

* Use `time` to see whether time is spent on kernel CPU, userspace CPU, or waiting
* Use `strace -fy` with `-ttt` and `-T` to see timings, and `-c` or `-wc` for summary statistics

Re: How to optimize a directory scanning?

KC
Thank you for making the `posix-paths` package for fast directory traversals.

Are directories stored in consecutive disk blocks?



--

--

Sent from an expensive device which will be obsolete in a few months! :D

Casey



Re: How to optimize a directory scanning?

Brandon Allbery
Depends on the host filesystem. Traditionally, the first 10 blocks are direct and often (but not always, if the fs is fragmented) consecutive; the remainder are indirect by 1-3 levels (not that you ever want a directory to be double indirect, much less triple!), and often are not consecutive simply because by the time you get to that point you're working with a filesystem with a lot of files on it and a fair amount of fragmentation.
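To put rough numbers on these levels, a back-of-the-envelope sketch assuming 10 direct pointers (the figure above), 4 KiB blocks and 4-byte block pointers; these are illustrative assumptions, and real filesystems differ (ext2/ext3, for instance, use 12 direct pointers).

```haskell
-- Bytes addressable at each level of a classic Unix inode, under the
-- assumptions above: 10 direct pointers, 4 KiB blocks, 4-byte pointers.
blockSize, ptrsPerBlock :: Integer
blockSize    = 4096
ptrsPerBlock = blockSize `div` 4  -- 1024 block numbers fit in one block

directBytes, singleBytes, doubleBytes, tripleBytes :: Integer
directBytes = 10 * blockSize                -- 40 KiB
singleBytes = ptrsPerBlock * blockSize      -- 4 MiB
doubleBytes = ptrsPerBlock ^ 2 * blockSize  -- 4 GiB
tripleBytes = ptrsPerBlock ^ 3 * blockSize  -- 4 TiB
```

Under these assumptions a directory only reaches the double-indirect level once its entries take up more than about 4 MiB plus 40 KiB.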

--
brandon s allbery kf8nh


Re: How to optimize a directory scanning?

Viktor Dukhovni
In reply to this post by Niklas Hambüchen
On Sat, May 11, 2019 at 03:52:38AM +0200, Niklas Hambüchen wrote:

> we made the `posix-paths` package for fast directory traversals:
>
>     https://hackage.haskell.org/package/posix-paths
>
> You can find benchmarks in
>
>     https://github.com/JohnLato/posix-paths#benchmarks

It should perhaps be noted that a large fraction of the additional
overhead encountered by the String FilePath traversals in that
benchmark occurs in the output code that prints all the paths to
stdout.  The corresponding ByteString listing is noticeably faster.

If one instead just stats and counts all the files, the performance
difference is more modest (IIRC around a factor of ~2 rather
than ~5 or 6).

Also, the directory traversal of course needs to use 'getSymbolicLinkStatus'
rather than 'getFileStatus', since recursive directory traversals
should almost never follow symlinks.
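A sketch of such a "stat and count" traversal, assuming the `unix` and `bytestring` packages: names stay as `ByteString` (the `RawFilePath` API in `System.Posix.Directory.ByteString` and `System.Posix.Files.ByteString`), and `getSymbolicLinkStatus` (lstat) is used so symlinks are counted but never followed. `countFiles` is a hypothetical name, and permission errors are not handled.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString.Char8 as B
import System.Posix.Directory.ByteString (closeDirStream, openDirStream, readDirStream)
import System.Posix.Files.ByteString (getSymbolicLinkStatus, isDirectory)

-- Count every non-directory entry (regular files, symlinks, ...) below dir.
countFiles :: B.ByteString -> IO Int
countFiles dir = do
  -- Read all names first, so this directory stream is closed before recursing.
  names <- bracket (openDirStream dir) closeDirStream (readAll [])
  counts <- mapM countOne (filter (`notElem` [B.pack ".", B.pack ".."]) names)
  return (sum counts)
  where
    -- readDirStream returns an empty name once the stream is exhausted
    readAll acc ds = do
      name <- readDirStream ds
      if B.null name then return acc else readAll (name : acc) ds
    countOne name = do
      let path = dir <> B.pack "/" <> name
      -- lstat(2): a symlink to a directory is reported as a symlink
      st <- getSymbolicLinkStatus path
      if isDirectory st then countFiles path else return 1
```

Collecting a directory's names before descending trades a little memory for keeping at most one directory stream open per level.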

--
        Viktor.

Re: How to optimize a directory scanning?

Joachim Durchholz
In reply to this post by KC
On 12.05.19 at 00:23, KC wrote:
> Are directories stored in consecutive disk blocks?

That's something that you have to rely on the file system to organize
for you.
Brandon's answer is the traditional one for Unix filesystems, up to and
including ext3fs. Modern filesystems try to do better (and often do),
since scanning large directories has turned out to be so important.
If you do performance testing, both bad and good filesystem performance
may be accidental; if you want to know not just the typical behaviour
but also the pathological cases, you'll either have to wait for user
reports to come in or talk to real filesystem experts (and even their
answers will mostly be on an "it depends" basis).
Note that fragmentation is irrelevant for SSDs.

The OP is at the "what system calls are being done" stage; optimization
questions about fragmentation aren't going to be relevant to him I think.

TL;DR: Don't worry about fragmentation, unless you are willing to spend
a great deal of time on detailed optimization.

Regards,
Jo

Re: How to optimize a directory scanning?

Magicloud Magiclouds
Thanks for all the replies. I did not follow forks with strace, since "I do
not have such code", although stack has -threaded and -rtsopts set. But
now `strace -f` clearly shows that there is quite a lot of forking
for my code. Removing those options reduced CPU usage by 3%.
And as Neil said, counting the ioctl and the other syscalls over the whole
reading process, Haskell is more optimized than Rust.

I am trying posix-paths now.

@Viktor,
Sorry, that part was missing from the sample code: isMyPid should be
called before reading the stat file.
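With the ownership check done first, one scan of /proc might look like the sketch below. It assumes the `directory`, `filepath` and `unix` packages; `readMyStats`, `readOwnStats` and `ioMaybe` are hypothetical names. Numeric names are filtered without any syscall, ownership costs one stat(2) per PID, and `/proc/PID/stat` is opened only for PIDs that pass; an `IOException` (e.g. from a process that has already exited) just skips that PID.

```haskell
import Control.Exception (IOException, evaluate, try)
import Data.Char (isDigit)
import System.Directory (listDirectory)
import System.FilePath ((</>))
import System.Posix.Files (fileOwner, getFileStatus)
import System.Posix.Types (UserID)
import System.Posix.User (getEffectiveUserID)

-- Turn IOExceptions into Nothing: a process may exit between the
-- directory listing and the reads below.
ioMaybe :: IO a -> IO (Maybe a)
ioMaybe act = either ignore Just <$> try act
  where
    ignore :: IOException -> Maybe b
    ignore _ = Nothing

-- Read /proc/PID/stat for every process owned by `me`, checking
-- ownership before the stat file is opened.
readMyStats :: UserID -> IO [(FilePath, String)]
readMyStats me = do
  entries <- listDirectory "/proc"
  results <- mapM readIfMine (filter (all isDigit) entries)
  return [r | Just r <- results]
  where
    readIfMine pid = do
      let dir = "/proc" </> pid
      owned <- ioMaybe ((== me) . fileOwner <$> getFileStatus dir)
      if owned == Just True
        then fmap (\s -> (pid, s)) <$> ioMaybe (readForced (dir </> "stat"))
        else return Nothing
    -- force the lazy contents so errors surface inside ioMaybe
    -- and the handle is closed before the next PID
    readForced path = do
      s <- readFile path
      _ <- evaluate (length s)
      return s

-- Convenience wrapper: scan for the current effective user.
readOwnStats :: IO [(FilePath, String)]
readOwnStats = getEffectiveUserID >>= readMyStats
```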

@Brandon, @Joachim, @KC,
At least in my case, how data is stored on disk is irrelevant. /proc is a
virtual filesystem that just exposes kernel data structures via IO
operations.

--
Dense bamboo cannot stop the flowing stream;
high mountains cannot block the drifting clouds.

And for G+, please use magiclouds#gmail.com.