lesscode.org


Wanted: eglob.py  

By Ryan Tomayko under Wanted, Python on 01. July 2005

I’d really like to see an enhanced glob module. Nothing too crazy, just support for recursive wildcards and maybe a nice filtering API. Here’s your test case:

>>> import eglob

… or whatever.

A find function should return an iterator over all matching files and directories. Note that recursive searches should happen lazily, descending into directories only as the iterator advances. yield kicks so much ass right here.

>>> eglob.find('/etc/**')
<generator object at ...>
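To make the lazy-recursion idea concrete, here is a rough sketch of what such a find could look like. Nothing here is an existing module: the names `find` and `translate` are made up for illustration, the code assumes POSIX-style `/` separators, and it simply walks everything under the longest wildcard-free directory prefix (or the current directory when there is none) rather than pruning cleverly.

```python
import os
import re


def translate(pattern):
    """Compile an extended glob: '**' may cross '/', while '*' and '?' may not."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith('**/', i):
            out.append('(?:.*/)?')      # zero or more whole directory levels
            i += 3
        elif pattern.startswith('**', i):
            out.append('.*')            # anything, including '/'
            i += 2
        elif pattern[i] == '*':
            out.append('[^/]*')         # anything within one path segment
            i += 1
        elif pattern[i] == '?':
            out.append('[^/]')
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile(''.join(out) + r'\Z')


def find(pattern):
    """Yield matching files and directories, descending only as the caller iterates."""
    # Start from the directory part of the pattern that contains no wildcards.
    head = re.split(r'[*?]', pattern, maxsplit=1)[0]
    root = os.path.dirname(head) or '.'
    matcher = translate(pattern)
    # os.walk is itself a generator, so directories are read on demand.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames + filenames):
            path = os.path.normpath(os.path.join(dirpath, name))
            if matcher.match(path):
                yield path
```

Because both `os.walk` and `find` are generators, stopping the iteration early means the rest of the tree is never read.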

Being able to filter the initial glob with such operations as exclude and include (needed?) would be nice. Designing this will be fun - try to abuse chaining generators as much as possible. :)

>>> list(eglob.find('/etc/**').exclude('passwd', 'group', 'init.d/*'))
['/etc/hosts', '/etc/httpd', '/etc/httpd/conf/httpd.conf']
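One way the chaining might be designed, purely as a sketch: wrap any iterable of paths in a small class whose filter methods each return a new wrapper around a generator expression. The class name `Matches` and the rule that a pattern matches the tail of a path are assumptions made for this example, not part of any existing API. It uses `fnmatch.fnmatchcase`, whose `*` also matches `/`.

```python
from fnmatch import fnmatchcase


def _matches_any(path, patterns):
    # A pattern matches if it fits the whole path or any trailing part of it.
    return any(
        fnmatchcase(path, pat) or fnmatchcase(path, '*/' + pat)
        for pat in patterns
    )


class Matches:
    """Iterable of paths; each filter wraps the previous one in a new generator."""

    def __init__(self, paths):
        self._paths = paths

    def __iter__(self):
        return iter(self._paths)

    def exclude(self, *patterns):
        # Lazily drop paths that match any of the patterns.
        return Matches(p for p in self._paths if not _matches_any(p, patterns))

    def include(self, *patterns):
        # Lazily keep only paths that match at least one pattern.
        return Matches(p for p in self._paths if _matches_any(p, patterns))


paths = [
    '/etc/passwd', '/etc/group', '/etc/hosts',
    '/etc/init.d/cron', '/etc/httpd', '/etc/httpd/conf/httpd.conf',
]
result = list(Matches(paths).exclude('passwd', 'group', 'init.d/*'))
# result == ['/etc/hosts', '/etc/httpd', '/etc/httpd/conf/httpd.conf']
```

No filtering happens until something iterates over the final object, so filters can be stacked without building intermediate lists.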

I should be able to pass an extended glob (str, unicode) or a compiled regular expression (sre.SRE_Pattern) to any finding or filtering functions:

>>> import re
>>> list(eglob.find(re.compile(r'^/tmp/.*')))
['/tmp/mysql.sock', '/tmp/foo/bling']
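Accepting either kind of argument could come down to normalizing it into a single predicate up front. The helper below is a hypothetical sketch (the name `as_predicate` is invented here). It is written for a current Python 3, where compiled patterns are instances of `re.Pattern`, and it leans on the standard `fnmatch.translate`, which does not distinguish `*` from `**`, so a real eglob would need its own translation step.

```python
import fnmatch
import re


def as_predicate(pattern):
    """Return a function path -> bool for a glob string or a compiled regex."""
    if isinstance(pattern, re.Pattern):
        # Compiled regular expressions are used as given.
        return lambda path: pattern.match(path) is not None
    if isinstance(pattern, str):
        # Plain strings are treated as globs and converted to a regex once.
        regex = re.compile(fnmatch.translate(pattern))
        return lambda path: regex.match(path) is not None
    raise TypeError('expected a glob string or a compiled regular expression')


is_tmp = as_predicate(re.compile(r'^/tmp/.*'))
is_tmp_glob = as_predicate('/tmp/*')
```

Every finding or filtering function can then call the helper once and work with the returned predicate, whichever form the caller passed in.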

I’d like to filter for directories only or files only:

>>> list(eglob.find('/home/*', directories=1))
['/home/hurly', '/home/curly', '/home/moe']
>>> eglob.find('**/.cvsignore', files=1)
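The directories and files flags could be just one more generator layered over whatever the search yields. A minimal sketch, assuming a hypothetical helper named `only` that takes any iterable of paths:

```python
import os


def only(paths, directories=False, files=False):
    """Lazily pass through paths, optionally keeping only directories or only files."""
    for path in paths:
        if directories and not os.path.isdir(path):
            continue
        if files and not os.path.isfile(path):
            continue
        yield path
```

With neither flag set everything passes through; with both set nothing does, since a path cannot be a regular file and a directory at once.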

This would be hugely useful in about four projects I’m currently working on.

5 Responses to “Wanted: eglob.py”

  1. Simon Willison:

    Markdown broke my comment. Here it is again.

    I don’t quite understand how the “**/.cvsignore” and “re.compile(r'^/tmp/.*')” examples would work. Wouldn’t you have to scan every single path on the whole system (the equivalent of running “find /”) and then filter each one? At least with “/etc/**” you only have to scan a single directory, albeit recursively. Am I missing something?

    comment at 07. July 2005

  2. Ryan Tomayko:

    Nice to see you here, Simon. I hope you’re not in harm’s way over there with all the chaos. :)

    That’s a good point. I had thought of “**/.cvsignore” as being rooted from the current directory. “/**/.cvsignore” would be bad though.

    btw, the first place I saw the syntax was in the Python-based rdiff-backup and I instantly liked it:

    http://rdiff-backup.stanford.edu/rdiff-backup.1.html#sect7

    comment at 07. July 2005

  3. Ryan Tomayko:

    Dang, the CSS for these comments is stupid. I’m going to have to take a look at that.

    comment at 07. July 2005

  4. Ian Bicking:

    py.path (in the py lib) has several options for recursing files with matchers.

    comment at 07. July 2005

  5. Kent Johnson:

    Jason Orendorff’s path module can do mucho of this. It will recursively walk dirs filtering on an fnmatch and optionally isfile or isdir.

    eglob.find(re.compile(r'^/tmp/.*')) might be
    path.path('/tmp').walk() depending on what you really mean by the re.

    eglob.find('/home/*', directories=1) is
    path.path('/home').dirs()

    Kent

    comment at 11. July 2005