I'm not exactly new to Haskell, but I haven't used it much in the real world.
So what I want to do is find all git repositories starting from some folders. Basically, I'm trying to do this: find . -type d -exec test -e '{}/.git' ';' -print -prune
only faster, by using Haskell's concurrency features.
This is what I've got so far:
import Control.Concurrent.Async
import System.Directory (doesDirectoryExist)
import System.FilePath ((</>))
import System.IO (FilePath)
isGitRepo :: FilePath -> IO Bool
isGitRepo p = doesDirectoryExist $ p </> ".git"
main :: IO ()
main = putStrLn "hello"
I've found the async library, which has this function: mapConcurrently :: Traversable t => (a -> IO b) -> t a -> IO (t b)
Which got me thinking that what I need is to produce a lazy Tree data structure that reflects the folder structure, then filter it concurrently with isGitRepo,
and then fold it into a list and print it. Of course, I know how to define data FTree = Node String [FTree]
or something like that, but I have questions. How do I produce it concurrently? How do I produce absolute paths while traversing the tree? And so on.
Which got me thinking that what I need is to produce a lazy Tree data structure that reflects the folder structure.
I'm not sure you need a tree structure for this. You could build one as an intermediate structure, but you can just as well manage without it. The key thing is that you need O(1) appending to combine your results. A difference list (like dlist) does this.
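To make the O(1) claim concrete, here is a minimal hand-rolled sketch of a difference list. All the names in it (DiffList, appendDL and so on) are made up for illustration; in real code you would most likely use the dlist package, which offers equivalent operations.

```haskell
-- | A hand-rolled difference list (hypothetical names): a list represented
-- as a function that prepends its elements to whatever list it is given.
newtype DiffList a = DiffList ([a] -> [a])

emptyDL :: DiffList a
emptyDL = DiffList id

singletonDL :: a -> DiffList a
singletonDL x = DiffList (x :)

-- | Appending is just function composition, so it is O(1)
-- no matter how many elements either side holds.
appendDL :: DiffList a -> DiffList a -> DiffList a
appendDL (DiffList f) (DiffList g) = DiffList (f . g)

-- | Combine many partial results, e.g. one per subfolder.
concatDL :: [DiffList a] -> DiffList a
concatDL = foldr appendDL emptyDL

-- | Build the ordinary list once, at the very end.
toListDL :: DiffList a -> [a]
toListDL (DiffList f) = f []
```

The point is that the cost of building the final list is paid only once, in toListDL, instead of on every append as with (++) on left-nested lists.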
How to produce it concurrently?
You already got that: using mapConcurrently!
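As a small illustration of mapConcurrently on its own, the sketch below checks a list of candidate folders at once. It assumes the async package is installed; filterGitRepos is a hypothetical helper name, and isGitRepo is the function from the question.

```haskell
import Control.Concurrent.Async (mapConcurrently)
import qualified System.Directory as Dir
import System.FilePath ((</>))

-- | From the question: a folder counts as a repo if it contains ".git".
isGitRepo :: FilePath -> IO Bool
isGitRepo p = Dir.doesDirectoryExist (p </> ".git")

-- | Hypothetical helper: run the check for every candidate concurrently
-- and keep only the candidates for which it returned True.
filterGitRepos :: [FilePath] -> IO [FilePath]
filterGitRepos candidates = do
  -- Results come back in the same order as the input list.
  flags <- mapConcurrently isGitRepo candidates
  pure [p | (p, True) <- zip candidates flags]
```

Note that if any of the actions throws, mapConcurrently cancels the others and rethrows the exception.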
How to produce absolute path while traversing the tree?
listDirectory lets you get the next possible segments in the path. You can get the next paths by appending each of these segments to the existing path (they won't be absolute paths unless the existing path was absolute, though).
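A minimal sketch of that step, with hypothetical helper names (childPaths, childPathsAbsolute). The second helper assumes you want absolute paths regardless of how the starting path was written, and uses makeAbsolute from System.Directory for that.

```haskell
import qualified System.Directory as Dir
import System.FilePath ((</>))

-- | Hypothetical helper: list the entries of a directory as paths,
-- by prepending the parent path to each bare entry name.
childPaths :: FilePath -> IO [FilePath]
childPaths parent = do
  names <- Dir.listDirectory parent   -- entry names only, without "." and ".."
  pure (map (parent </>) names)

-- | Hypothetical helper: the same, but with the parent made absolute first,
-- so every returned path is absolute as well.
childPathsAbsolute :: FilePath -> IO [FilePath]
childPathsAbsolute parent = Dir.makeAbsolute parent >>= childPaths
```

Making only the starting path absolute is enough: every path derived from it by appending segments stays absolute.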
Here is a working function:
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>), combine)
import System.IO (FilePath)
import Control.Concurrent.Async (mapConcurrently)
import qualified Data.DList as DL
-- | tries to find all git repos in the subtree rooted at the path
findGitRepos :: FilePath -> IO (DL.DList FilePath)
findGitRepos p = do
  isNotDir <- not <$> doesDirectoryExist p
  if isNotDir
    then pure DL.empty -- the path 'p' isn't a directory
    else do
      isGitDir <- doesDirectoryExist (p </> ".git")
      if isGitDir
        then pure (DL.singleton p) -- the folder is a git repo
        else do -- recurse to subfolders
          subdirs <- listDirectory p
          repos <- mapConcurrently findGitRepos (combine p `map` subdirs)
          pure (DL.concat repos)
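To get from this function to the printed list the question asks for, one possible sketch follows. It assumes the async and dlist packages; findGitReposIn and printGitRepos are hypothetical wrapper names, and findGitRepos is repeated in condensed form only so that the block is self-contained.

```haskell
import Control.Concurrent.Async (mapConcurrently)
import qualified Data.DList as DL
import qualified System.Directory as Dir
import System.FilePath ((</>))

-- | Same logic as the function above, condensed: stop at the first folder
-- that contains ".git", otherwise search the children concurrently.
findGitRepos :: FilePath -> IO (DL.DList FilePath)
findGitRepos p = do
  isDir <- Dir.doesDirectoryExist p
  if not isDir
    then pure DL.empty
    else do
      isGitDir <- Dir.doesDirectoryExist (p </> ".git")
      if isGitDir
        then pure (DL.singleton p)
        else do
          subdirs <- Dir.listDirectory p
          DL.concat <$> mapConcurrently findGitRepos (map (p </>) subdirs)

-- | Hypothetical wrapper: search several starting folders concurrently
-- and flatten the combined result into an ordinary list.
findGitReposIn :: [FilePath] -> IO [FilePath]
findGitReposIn roots =
  DL.toList . DL.concat <$> mapConcurrently findGitRepos roots

-- | Hypothetical wrapper: print one repository path per line.
printGitRepos :: [FilePath] -> IO ()
printGitRepos roots = findGitReposIn roots >>= mapM_ putStrLn
```

You could call printGitRepos from main with the folders taken from getArgs. To let the runtime use several cores, you would typically build with -threaded and run with +RTS -N; whether this actually beats find depends on the file system and is worth measuring.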