Entry
Special behavior when reading files
Jul 5th, 2000 09:59
Nathan Wallace, unknown unknown, Hans Nowak, Snippet 69, Gordon McMillan
"""
Packages: files;text
"""
"""
> According to the manuals, read(size) reads (at most) size bytes from a
> file and returns it as a string. That is not very practical for me since
> I usually parse files that have a C structure (statements end in
> semi-colons) and I don't have any idea what size might be.
>
> Omitting size reads the entire file. Again, that is not very
> practical since the files I work with are usually ~200 MB large, and I
> only retrieve less than half of it.
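
For reference, the two read() behaviours described in the question can be
seen on a small in-memory stream. This is only a minimal sketch: io.StringIO
stands in for a real file and the sample text is made up. On a text-mode
stream the size is counted in characters.

```python
import io

# io.StringIO stands in for an open text file (an assumption for illustration).
stream = io.StringIO("int a; int b;")

first = stream.read(5)    # returns at most 5 characters
rest = stream.read()      # no size given: everything that remains
at_eof = stream.read(5)   # at end of file: an empty string

print(repr(first), repr(rest), repr(at_eof))
```

Neither call knows anything about semicolons, which is the gap the reply
addresses.
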
While it somewhat strains my credulity that anyone regularly parses
200Meg files delimited by semicolons, stranger things have happened. Below
you will find something that does what you want with a reasonable degree
of efficiency. There are some standard tricks you can pull to increase
its speed (probably by 10% to 20%), but I leave it as is for readability.
> I wish I could just do:
> RS = ";"
On a global basis, this will never happen in Python. On a file-by-file
basis, perhaps, but it's really not too hard to roll your own.
Notice that most of the work below is dedicated to making sure you
get your RS, that no RS is added to the last line, and that it still works
even with a buffersize of 1.
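
The three properties listed above can be checked on a small in-memory
example. What follows is a sketch, not the class from this entry: it uses a
generator in place of a class, detects end of file by an empty read rather
than a short one, and runs against io.StringIO so that no file is needed.
The name records and the sample text are hypothetical.

```python
import io

def records(stream, rs=";", sz=64 * 1024):
    # Hypothetical helper: yield each record with its separator attached.
    # The final record is yielded as-is, with no separator added.
    leftover = ""
    while True:
        chunk = stream.read(sz)
        if not chunk:          # an empty read means end of file
            break
        parts = (leftover + chunk).split(rs)
        leftover = parts.pop() # unterminated tail, possibly empty
        for part in parts:
            yield part + rs
    if leftover:
        yield leftover

sample = "a = 1; b = 2; c"
one_at_a_time = list(records(io.StringIO(sample), ";", 1))
big_buffer = list(records(io.StringIO(sample), ";", 1024))
two_char_rs = list(records(io.StringIO("a::b"), "::", 1))
```

Because the unterminated tail is carried over between reads, a separator
longer than the buffer is still found once enough characters have piled up.
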
> I guess the best solution would be to use readline(), some flags,
> and determine where each statement ends by scanning for a ";"
> character. Any plans to change that behaviour in Python?
On my priority list, it might get done next century, but YMMV.
"""
class Reader:
    # Reads records terminated by an arbitrary separator rs, pulling the
    # file in through a buffer of sz characters.
    def __init__(self, fnm, rs='\n', sz=64*1024):
        self.f = open(fnm, 'r')
        self.rs = rs
        self.sz = sz
        self.lines = []       # complete records waiting to be returned
        self.leftover = ''    # unterminated tail of the last chunk
        self.eof = 0

    def readline(self):
        rslt = ''
        if not self.lines:
            # Read chunks until a complete record turns up or the file ends.
            while not self.eof:
                tmp = self.f.read(self.sz)
                if len(tmp) < self.sz:
                    self.eof = 1
                # str.split replaces string.split, which is gone in Python 3.
                self.lines = (self.leftover + tmp).split(self.rs)
                if self.lines:
                    self.leftover = self.lines[-1]
                    del self.lines[-1]
                else:
                    self.leftover = ''
                if self.lines:
                    break
        if self.lines:
            rslt = self.lines[0] + self.rs
            del self.lines[0]
        else:
            # End of file: hand back the tail without adding a separator.
            rslt = self.leftover
            self.leftover = ''
        return rslt

def test():
    r = Reader('c:/python/modules/python.c', ';', 1)
    while 1:
        ln = r.readline()
        if not ln:
            break
        print(ln)

if __name__ == '__main__':
    test()