Wanted: eglob.py
By Ryan Tomayko under Wanted, Python on 01. July 2005
I’d really like to see an enhanced glob module. Nothing too crazy, just support for recursive wildcards and maybe a nice filtering API. Here’s your test case:
>>> import eglob
… or whatever.
A find function should return an iterator over all matching files and directories. Note that it should be possible to do recursive searches as the iterator is moving. yield kicks so much ass right here.
>>> eglob.find('/etc/**')
<generator object at ...>
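A minimal sketch of how such a lazy find might look, offered as one possible reading rather than a finished design. The translate helper, the root parameter, and the exact meaning given to ** (any run of characters, slashes included) are assumptions made for illustration; the only real library calls are os.walk, os.path, and re.

```python
import os
import re


def translate(pattern):
    """Turn an extended glob into a compiled regex (hypothetical helper)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith('**', i):
            parts.append('.*')       # assumed: '**' crosses directory boundaries
            i += 2
        elif pattern[i] == '*':
            parts.append('[^/]*')    # '*' stays inside one path segment
            i += 1
        elif pattern[i] == '?':
            parts.append('[^/]')     # '?' is one non-slash character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile(''.join(parts) + r'\Z')


def find(pattern, root='.'):
    """Lazily yield paths under root whose relative path matches pattern."""
    regex = translate(pattern)
    # os.walk is itself a generator, so directories are only read
    # as the caller pulls more results out of find().
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, '/')
            if regex.match(rel):
                yield full
```

Because nothing is read from disk until the first result is requested, a caller that stops iterating early never pays for the rest of the tree.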
Being able to filter the initial glob with such operations as exclude and include (needed?) would be nice. Designing this will be fun - try to abuse chaining generators as much as possible. :)
>>> list(eglob.find('/etc/**').exclude('passwd', 'group', 'init.d/*'))
['/etc/hosts', '/etc/httpd', '/etc/httpd/conf/httpd.conf']
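One way the chaining could work, sketched under assumptions: the Matches class and the _tail_match helper are hypothetical names, and the rule that a filter pattern is compared against the trailing segments of each path is a guess at the intended semantics, not something the examples above pin down.

```python
import fnmatch


def _tail_match(path, pattern):
    """Compare the pattern against the last segments of the path (assumed rule)."""
    want = pattern.split('/')
    have = path.split('/')[-len(want):]
    return len(have) == len(want) and all(
        fnmatch.fnmatchcase(h, w) for h, w in zip(have, want))


class Matches:
    """Iterable wrapper whose filter methods return new lazy wrappers."""

    def __init__(self, paths):
        self._paths = paths

    def __iter__(self):
        return iter(self._paths)

    def exclude(self, *patterns):
        # Each filter wraps the previous iterable in another generator,
        # so nothing is evaluated until the chain is consumed.
        return Matches(p for p in self._paths
                       if not any(_tail_match(p, pat) for pat in patterns))

    def include(self, *patterns):
        return Matches(p for p in self._paths
                       if any(_tail_match(p, pat) for pat in patterns))
```

Each call returns a new wrapper, so Matches(paths).exclude(...).include(...) reads left to right and stays lazy all the way down.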
I should be able to pass an extended glob (str, unicode) or a compiled regular expression (sre.SRE_Pattern) to any finding or filtering function:
>>> list(eglob.find(re.compile(r'^/tmp/.*')))
['/tmp/mysql.sock', '/tmp/foo/bling']
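Accepting either kind of argument could come down to normalizing it into a predicate up front. The sketch below assumes a hypothetical as_predicate helper and a current Python, where compiled patterns are instances of re.Pattern and str covers unicode. For brevity it leans on the standard fnmatch.translate, which handles plain globs only and lets * match slashes, so it does not implement the ** extension.

```python
import fnmatch
import re


def as_predicate(pattern):
    """Return a callable path -> bool for a glob string or a compiled regex."""
    if isinstance(pattern, re.Pattern):
        # Compiled regex: use it as given.
        return lambda path: pattern.match(path) is not None
    if isinstance(pattern, str):
        # Glob string: convert to a regex once, then reuse it.
        regex = re.compile(fnmatch.translate(pattern))
        return lambda path: regex.match(path) is not None
    raise TypeError('expected a glob string or compiled regex, got %r' % (pattern,))
```

Finding and filtering functions could then call as_predicate once and never care which form the caller passed in.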
I’d like to filter for directories only or files only:
>>> list(eglob.find('/home/*', directories=1))
['/home/hurly', '/home/curly', '/home/moe']
>>> eglob.find('**/.cvsignore', files=1)
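The directories and files flags could be one more generator in the chain. This is a sketch only: the only function is a hypothetical name, and treating "both flags set" as "either kind is fine" is an assumption.

```python
import os


def only(paths, directories=False, files=False):
    """Lazily keep directories and/or regular files from an iterable of paths."""
    for path in paths:
        if not (directories or files):
            yield path               # no flag set: pass everything through
        elif directories and os.path.isdir(path):
            yield path
        elif files and os.path.isfile(path):
            yield path
```

Since it takes any iterable and yields lazily, it composes with whatever generator produced the paths in the first place.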
This would be hugely useful in about four projects I’m currently working on.
Simon Willison:
Markdown broke my comment. Here it is again.
I don’t quite understand how the “**/.cvsignore” and “re.compile(r'^/tmp/.*')” examples would work. Wouldn’t you have to scan every single path on the whole system (the equivalent of running “find /”) and then filter each one? At least with “/etc/**” you only have to scan a single directory, albeit recursively. Am I missing something?
comment at 07. July 2005
Ryan Tomayko:
Nice to see you here, Simon. I hope you’re not in harm’s way over there with all the chaos. :)
That’s a good point. I had thought of “**/.cvsignore” as being rooted from the current directory. “/**/.cvsignore” would be bad though.
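To make the rooting rule concrete, here is a rough sketch of how a glob might be split into a directory to walk and the part that still needs matching. The split_root name and its rules are assumptions for illustration; a compiled regex has no such obvious prefix, so it would probably need an explicit starting directory.

```python
def split_root(pattern):
    """Split a glob into (directory to walk, remaining pattern)."""
    segments = pattern.split('/')
    fixed = []
    # Collect leading segments that contain no wildcard characters;
    # the last segment is always left as something to match.
    for segment in segments[:-1]:
        if any(ch in segment for ch in '*?['):
            break
        fixed.append(segment)
    rest = '/'.join(segments[len(fixed):])
    if not fixed:
        return '.', rest             # relative pattern: start at the current directory
    return '/'.join(fixed) or '/', rest
```

Under these rules "/etc/**" walks only /etc, "**/.cvsignore" walks the current directory, and "/**/.cvsignore" really does start at the filesystem root.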
btw, the first place I saw the syntax was in the Python-based rdiff-backup and I instantly liked it:
http://rdiff-backup.stanford.edu/rdiff-backup.1.html#sect7
comment at 07. July 2005
Ryan Tomayko:
Dang, the CSS for these comments is stupid. I’m going to have to take a look at that.
comment at 07. July 2005
Ian Bicking:
py.path (in the py lib) has several options for recursing files with matchers.
comment at 07. July 2005
Kent Johnson:
Jason Orendorff’s path module can do much of this. It will recursively walk directories, filtering on an fnmatch pattern and optionally on isfile or isdir.
eglob.find(re.compile(r'^/tmp/.*')) might be
path.path('/tmp').walk() depending on what you really mean by the re.
eglob.find('/home/*', directories=1) is
path.path('/home').dirs()
Kent
comment at 11. July 2005