I have seen these two ways to process a file:
    file = open("file.txt")
    for line in file:
        pass  # do something

    file = open("file.txt")
    contents = file.read()
    for line in contents:
        pass  # do something
I know that in the first case the file acts like a list, so the for loop iterates over the file as if it were a list. What exactly happens in the second case, where we read the file and then iterate over the contents? What are the consequences of each approach, and how should I choose between them?
Answer
In the first case you iterate over the file object line by line. The whole file is never loaded into memory at once; Python reads it in buffered chunks and hands you one line per iteration. This makes it a good choice for very large files, and a safe default whenever you don't know in advance how large the file will be.
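A minimal sketch of the line-by-line pattern, using a hypothetical helper `count_long_lines` (not from the original) and a temporary file created only for the demo. Wrapping `open()` in a `with` statement also guarantees the file gets closed, which the snippets in the question do not do.

```python
import os
import tempfile


def count_long_lines(path, min_length):
    """Count lines longer than min_length characters, one line at a time."""
    count = 0
    # The with block closes the file even if an exception is raised.
    with open(path, encoding="utf-8") as f:
        # Iterating over the file object pulls lines lazily from a buffer;
        # the whole file is never held in memory at once.
        for line in f:
            # Each line keeps its trailing newline, so strip it before measuring.
            if len(line.rstrip("\n")) > min_length:
                count += 1
    return count


# Demo: write a small throwaway file, process it, then delete it.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("short\na considerably longer line\nmid-size\n")

try:
    print(count_long_lines(demo_path, 8))  # prints 1
finally:
    os.remove(demo_path)
```

Memory use here stays roughly constant no matter how many lines the file has, because only the current line (plus the read buffer) is alive at any moment.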
In the second case, file.read() returns the complete file contents as a single string, so the entire file is loaded into memory. Iterating over a string yields its individual characters, so the loop walks through the file's data character by character, not line by line.
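A short sketch of what the second loop actually sees, recreating the two-line a.txt contents in a temporary file (an assumption made for the demo). If you have already read the whole file, `str.splitlines()` is one standard way to get the lines back out of the string.

```python
import os
import tempfile

# Create a throwaway file with the same two lines as the a.txt example.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("Hello\nBye")

with open(path, encoding="utf-8") as f:
    contents = f.read()  # the whole file as one str

os.remove(path)

# Iterating over a str yields one-character strings, newline included.
chars = list(contents)
# splitlines() recovers the lines, without their trailing newline characters.
lines = contents.splitlines()

print(chars)  # ['H', 'e', 'l', 'l', 'o', '\n', 'B', 'y', 'e']
print(lines)  # ['Hello', 'Bye']
```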
Here’s an example to show this behavior.
Suppose the file a.txt contains:

    Hello
    Bye
Code:

    >>> f = open('a.txt', 'r')
    >>> for l in f:
    ...     print(l)
    ...
    Hello

    Bye
    >>> f = open('a.txt', 'r')
    >>> r = f.read()
    >>> print(repr(r))
    'Hello\nBye'
    >>> for c in r:
    ...     print(c)
    ...
    H
    e
    l
    l
    o


    B
    y
    e
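When iterating over a file, each line keeps its trailing newline character, so print(l) emits that newline plus its own and an extra blank line appears. A small sketch, again using a temporary file with the same two lines (an assumption for the demo), that strips the newline before using each line:

```python
import os
import tempfile

# Throwaway file with the same contents as a.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("Hello\nBye")

raw_lines = []
clean_lines = []
with open(path, encoding="utf-8") as f:
    for line in f:
        raw_lines.append(line)                 # 'Hello\n', then 'Bye'
        clean_lines.append(line.rstrip("\n"))  # trailing newline removed

os.remove(path)

print(raw_lines)    # ['Hello\n', 'Bye']
print(clean_lines)  # ['Hello', 'Bye']
```

`rstrip("\n")` removes only newline characters; plain `strip()` would also drop leading and trailing spaces, which may or may not be what you want.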