Skip to content
Advertisement

Finding the number of rows for all files within a folder

Hello I am trying to find the number of rows for all files within a folder. I am trying to do this for a folder that contains only “.txt” files and for a folder that contains .”csv” files.

I know that the way to get the number of rows for a SINGLE “.txt” file is something like this:

file = open("sample.txt","r") 
Counter = 0
  
Content = file.read() 
CoList = Content.split("n") 
  
for i in CoList: 
    if i: 
        Counter += 1
          
print("This is the number of lines in the file") 
print(Counter) 

Whereas for a SINGLE “.csv” file is something like this:

file = open("sample.csv")
reader = csv.reader(file)
lines= len(list(reader))
print(lines)

But how can I do this for ALL files within a folder? That is, how can I loop each of these procedures across all files within a folder and, ideally, export the output into an excel sheet with columns akin to these:

Filename  Number of Rows
1.txt     900
2.txt     653

and so on and so on.

Thank you so much for your help.

Advertisement

Answer

You can use glob to detect the files and then just iterate over them.

Other methods : How do I list all files of a directory?

import glob

# 1. list all text files in the directory
rel_filepaths = glob.glob("*.txt")

# 2. (optional) create a function to read the number of rows in a file
def count_rows(filepath):
  res = 0
  f = open(filepath, 'r')
  res = len(f.readlines())
  f.close()

  return res

# 3. iterate over your files and use the count_row function
counts = [count_rows(filepath) for filepath in rel_filepaths]

print(counts)

Then, if you want to export this result in a .csv or .xslx file, I recommend using pandas.

import pandas as pd

# 1. create a new table and add your two columns filled with the previous values
df = pd.DataFrame()
df["Filename"] = rel_filepaths
df["Number of rows"] = counts

# 2. export this dataframe to `.csv`
df.to_csv("results.csv")

You can also use pandas.ExcelWriter() if you want to use the .xlsx format. Link to documentation & examples : Pandas – ExcelWriter doc

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement