Loading CSV into dataframe results in all records …

I’m new to python (and posting on SO), and I’m trying to use some code I wrote that worked in another similar context to import data from a file into a MySQL table. To do that, I need to convert it to a dataframe. In this particular instance I’m using Federal Election Comission data that is pipe-delimited (It’s the “Committee Master” data here). It looks like this.

C00000059|HALLMARK CARDS PAC|SARAH MOE|2501 MCGEE|MD #500|KANSAS CITY|MO|64108|U|Q|UNK|M|C|| C00000422|AMERICAN MEDICAL ASSOCIATION POLITICAL ACTION COMMITTEE|WALKER, KEVIN MR.|25 MASSACHUSETTS AVE, NW|SUITE 600|WASHINGTON|DC|200017400|B|Q||M|M|ALABAMA MEDICAL PAC| C00000489|D R I V E POLITICAL FUND CHAPTER 886|JERRY SIMS JR|3528 W RENO||OKLAHOMA CITY|OK|73107|U|N||Q|L|| C00000547|KANSAS MEDICAL SOCIETY POLITICAL ACTION COMMITTEE|JERRY SLAUGHTER|623 SW 10TH AVE||TOPEKA|KS|666121627|U|Q|UNK|Q|M|KANSAS MEDICAL SOCIETY| C00000729|AMERICAN DENTAL ASSOCIATION POLITICAL ACTION COMMITTEE|DI VINCENZO, GIORGIO T. DR.|1111 14TH STREET, NW|SUITE 1100|WASHINGTON|DC|200055627|B|Q|UNK|M|M|INDIANA DENTAL PAC|

When I run this code, all of the records come back “NaN.”

import pandas as pd
import pymysql

print('convert CSV to dataframe')
data = pd.read_csv ('Desktop/Python/FECupdates/cm.txt', delimiter='|')
df = pd.DataFrame(data, columns=['CMTE_ID','CMTE_NM','TRES_NM','CMTE_ST1','CMTE_ST2','CMTE_CITY','CMTE_ST','CMTE_ZIP','CMTE_DSGN','CMTE_TP','CMTE_PTY_AFFILIATION','CMTE_FILING_FREQ','ORG_TP','CONNECTED_ORG_NM','CAND_ID'])

print(df.head(10))

If I remove the dataframe part and just do this, it displays the data, so it doesn’t seem like it’s a problem with file itself (but what do I know?):

import pandas as pd
import pymysql
print('convert CSV to dataframe')
data = pd.read_csv ('Desktop/Python/FECupdates/cm.txt', delimiter='|')   
print(data.head(10))

I’ve spent hours looking at different questions here that seem to be trying to address similar issues — in which cases the problems apparently stemmed from things like the encoding or different kinds of delimiters — but each time I try to make the same changes to my code I get the same result. I’ve also converted the whole thing to a csv, by changing all the commas in fields to “$” and then changing the pipes to commas. It still shows up as all “Nan,” even though the number of records is correct if I upload it to MySQL (they’re just all empty).

Answer

You made typos in columns list. Pandas can automatically recognize columns.

import pandas as pd
import pymysql

print('convert CSV to dataframe')
data = pd.read_csv ('cn.txt', delimiter='|')
df = pd.DataFrame(data)

print(df.head(10))

Also, you can create an empty dataframe and concatenate the readed file.

import pandas as pd
import pymysql

print('convert CSV to dataframe')
data = pd.read_csv ('cn.txt', delimiter='|')
data2 = pd.DataFrame()
df = pd.concat([data,data2],ignore_index=True)
print(df.head(10))

Loading CSV into dataframe results in all records becoming “NaN”

Advertisement

Answer