Special behavior when reading files

Jul 5th, 2000 09:59
Nathan Wallace, unknown unknown, Hans Nowak, Snippet 69, Gordon McMillan


"""
Packages: files;text
"""
"""
> According to the manuals, read(size) reads (at most) size bytes from a
> file and returns it as a string. That is not very practical for me since
> I usually parse files that have a C structure (statements end in
> semi-colons) and I don't have any idea what size might be.
> 
> Omitting size reads the entire file. Again, that is not very
> practical since the files I work with are usually ~200 MB large, and I
> only retrieve less than half of it.
While it somewhat strains my credulity that anyone regularly parses
200 MB files delimited by semicolons, stranger things have happened. Below
you will find something that does what you want with a reasonable degree
of efficiency. There are some standard tricks you can pull to increase
its speed (probably by 10% to 20%), but I leave it as is for readability.
> I wish I could just do:
>  RS = ";"
On a global basis, this will never happen in Python. On a file-by-file
basis, perhaps, but it's really not too hard to roll your own.
Notice that most of the work below is dedicated to making sure you
get your RS (record separator) at the end of each record, that no RS is
added to the last line, and that it still works even with a buffer size
of 1.
> I guess the best solution would be to use readline(), some flags,
> and determine where each statement ends by scanning for a ";"
> character. Any plans to change that behaviour in Python?
On my priority list, it might get done next century, but YMMV.
"""
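class Reader:
  """Read records terminated by an arbitrary separator instead of '\\n'."""
  def __init__(self, fnm, rs='\n', sz=64*1024):
    self.f = open(fnm, 'r')
    self.rs = rs          # record separator
    self.sz = sz          # number of characters to read per chunk
    self.lines = []       # complete records not yet handed out
    self.leftover = ''    # trailing partial record from the last chunk
    self.eof = 0
  def readline(self):
    rslt = ''
    if not self.lines:
      # Read chunks until at least one complete record is available.
      while not self.eof:
        tmp = self.f.read(self.sz)
        if len(tmp) < self.sz:
          # A short read means the end of the file has been reached.
          self.eof = 1
          self.f.close()
        # Prepend the leftover so a record (or a separator) that was
        # split across two chunks is put back together.
        self.lines = (self.leftover + tmp).split(self.rs)
        # split() always returns at least one piece; the last one is
        # incomplete until another separator turns up.
        self.leftover = self.lines.pop()
        if self.lines:
          break
    if self.lines:
      rslt = self.lines.pop(0) + self.rs
    else:
      # At end of file, return what remains without adding a separator.
      rslt = self.leftover
      self.leftover = ''
    return rslt
def test():
  # Example path from the original post; point it at any file you have.
  r = Reader('c:/python/modules/python.c', ';', 1)
  while True:
    ln = r.readline()
    if not ln: break
    print(ln)
if __name__ == '__main__':
  test()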
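
The three properties mentioned in the reply (each record keeps its separator, no separator is added to the last record, and a buffer size of 1 still works) can be checked without a 200 MB file. The following is a minimal sketch of the same leftover-buffer idea written as a generator for current Python. It is not part of the original snippet: the name iter_records, its parameters, and the in-memory sample text are all made up for illustration.

```python
import io

def iter_records(stream, sep=";", chunk_size=64 * 1024):
    """Yield records from a text stream (hypothetical helper).

    Every record ends with sep, except a trailing one that the
    stream did not terminate.
    """
    leftover = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        # Join the unfinished tail of the previous chunk to this one,
        # so records split across chunk boundaries are rebuilt.
        parts = (leftover + chunk).split(sep)
        # The last piece has not seen its separator yet.
        leftover = parts.pop()
        for part in parts:
            yield part + sep
    if leftover:
        # Unterminated final record: returned as is, no separator added.
        yield leftover

# Illustrative input only; any object with a read(n) method would do.
demo = io.StringIO("int a;int b;return 0")
records = list(iter_records(demo, sep=";", chunk_size=1))
print(records)
```

Because it is a generator, a caller can write "for rec in iter_records(f): ..." and stop early, which suits the case of needing less than half of a large file. As in the Reader class, memory use is bounded by the chunk size plus the longest single record.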