Why is the file row count more than len(dataframe)?

Question

Good morning, I'm new to Python and the data analysis world, so bear with me. I've been trying to understand why counting the rows of a file gives the right answer, but after converting it to a DataFrame, len(dataframe) gives the row count - 1. I'm sure it's simple but I've googl…

Accepted Answer

The reason is that, by default, pandas uses the first row of the file as the column names, so that row is not counted as data. To avoid it and get default integer column names (0, 1, ...), use the header=None parameter:

    import pandas as pd

    filename = 'amazon_cells_labelled.txt'
    with open(filename, encoding="utf8") as f:
        row_count = sum(1 for line in f)
    print(row_count)  # 1000

    # first row of the file is the first row of data
    df1 = pd.read_csv(filename, sep='\t', header=None)
    print(df1.shape[0])    # 1000
    print(len(df1))        # 1000
    print(len(df1.index))  # 1000

Your code:

    # first row of the file is converted to column names
    df1 = pd.read_csv(filename, sep='\t')

EDIT: The other files contain " characters, so pandas parses them incorrectly: a field starting with " is treated as a quoted field, and the following lines up to the next " are merged into a single row. To avoid this, use the quoting=3 parameter, which is csv.QUOTE_NONE (quote characters are treated as ordinary text):

    filename = 'imdb_labelled.txt'
    with open(filename, encoding="utf8") as f:
        row_count = sum(1 for line in f)
    print(row_count)  # 1000

    df = pd.read_csv(filename, sep='\t', header=None, quoting=3)
    print(len(df.index))  # 1000
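A minimal, self-contained sketch of both effects, assuming small in-memory tab-separated strings instead of the real .txt files (the data and variable names below are hypothetical, chosen only to mimic a sentence<TAB>label layout):

```python
import csv
import io

import pandas as pd

# Hypothetical data with no header row: every line is a data row.
plain = "Great phone\t1\nBattery died fast\t0\nWorks fine\t1\n"
line_count = sum(1 for _ in io.StringIO(plain))

# Default header='infer': the first line becomes the column names.
df_default = pd.read_csv(io.StringIO(plain), sep="\t")
# header=None: every line is data, columns are labelled 0 and 1.
df_no_header = pd.read_csv(io.StringIO(plain), sep="\t", header=None)
print(line_count, len(df_default), len(df_no_header))  # 3 2 3

# Hypothetical data where a sentence starts with a stray quote.
quoted = '"Loved it\t1\nTerrible\t0\nOkay"\t1\n'

# Default quoting: everything between the two '"' is read as one
# quoted field, so three physical lines collapse into one row.
df_quoted_default = pd.read_csv(io.StringIO(quoted), sep="\t", header=None)
# quoting=csv.QUOTE_NONE (== 3): quotes are plain characters.
df_quote_none = pd.read_csv(
    io.StringIO(quoted), sep="\t", header=None, quoting=csv.QUOTE_NONE
)
print(len(df_quoted_default), len(df_quote_none))
```

Passing csv.QUOTE_NONE instead of the bare 3 is equivalent but makes the intent easier to read.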


Answer