
Error tokenizing data. C error: Expected x fields in line 5, saw x

I keep getting this error. I can’t even work out which row is causing it, because the data I am requesting looks jumbled. I can’t share the API URL, but here is a sample of the first few lines of data.

My code:

import io

import pandas as pd
import requests

url = "url"
payload = {}
headers = {}
data = requests.request("GET", url, headers=headers, data=payload)
df = pd.read_csv(io.BytesIO(data.content))
print(df)

Error:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 6 fields in line 5, saw 7

Data from API:

fieldId^raceId^tabNo^position^margin^horse^trainer^jockey^weight^barrier^inRun^flucs^priceSP^priceTAB^stewards^rating^priceBF^horseId^priceTABVic^gears^sex^age^rno^neural^sire^dam^foalDate^jockeyId^trainerId^claim
6043351^894992^3^1^0.1^Harley Street^Natalie Jarvis^Jack Martin^59.5^7^settling_down,1;m800,1;m400,1;^opening,2.10;starting,2.60;^2.6^2.5^^34^2.97625207901001^930781^2.7^^Gelding^3^5^6.38^Exceed And Excel^Avenue^01/08/2018^15238^25478^0^
6043349^894992^1^2^0.1^Eurosay^Todd Smart^Damon Budler^60^2^settling_down,5;m800,5;m400,5;^opening,7.50;starting,10.00;^10^8.2^Held up in straight.^43^13.4302415847778^880761^8.3^^Gelding^6^41^5.95^Eurozone^Magsaya^18/10/2015^16352^26343^1.5^
6043355^894992^7^3^0.3^Titan Star^M F Van Gestel^G Buckley^55.5^1^settling_down,4;m800,4;m400,3;^opening,8.00;starting,5.50;^5.5^6.2^Laid out at start.^60^6^924419^5.6^^Gelding^4^37^14.12^Rubick^Sporty Spur^14/10/2017^9670^3483^0^
6043350^894992^2^4^1.8^Vee Eight^Sue Laughton^Ms R Freeman-Key^61^5^settling_down,3;m800,3;m400,4;^opening,19.00;mid,21.00;starting,20.00;^20^23^^66^25^839743^18.8^^Gelding^8^43^5.29^Commands^Supamach^13/10/2013^12100^27227^0^
6043352^894992^4^5^3.2^Halliday Road^Ms T Bateup^Ms W Costin^58.5^4^settling_down,2;m800,2;m400,2;^opening,9.50;mid,13.00;starting,11.00;^11^11.4^Checked near 200m.^83^15^825899^11.7^^Gelding^8^77^4.49^Congrats^Nickynoo's Girl^12/08/2013^14984^23242^0^
6043353^894992^5^6^3.5^Monte Drifter^R & L Price^Brock Ryan^57.5^6^settling_down,7;m800,7;m400,7;^opening,5.00;mid,3.80;starting,4.00;^4^4^^71^4.5^944388^3.8^^Gelding^3^7^7.98^Capitalist^Belhamage^24/08/2018^15590^26970^0^
6043354^894992^6^7^3.8^Blackhill Kitty^Natalie Jarvis^Ms J Taylor^55.5^3^settling_down,6;m800,6;m400,6;^opening,7.00;mid,6.50;starting,9.00;^9^8.4^^43^9^921457^8.8^Bubble cheeker near side first time. ^Mare^4^11^14.85^Ready For Victory^Bad Kitty^16/10/2017^7901^25478^0^


Answer

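You don’t pass a column separator, so pandas uses its default, a comma, but your data is delimited by ^. The only commas are inside the inRun and flucs fields, so each row gets split on those instead: the first three data rows contain five commas (6 fields), while line 5 has an extra "mid" entry in flucs and so contains six (7 fields), which is exactly what the error reports. Specify the separator explicitly.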

df = pd.read_csv(io.BytesIO(data.content), sep="^")
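
As a rough sketch of how to spot the offending line and one thing to watch for once the separator is set: the sample below is a cut-down copy of two rows from the question (keeping only four columns is my shortening, not the real API layout). It first counts how many fields a comma-based parse would see on each line, then reads the data with sep="^". The data rows in the question appear to end with a trailing ^ while the header does not, so pandas would otherwise take the first column as the index and shift every column name by one; index_col=False is the documented way to prevent that.

```python
import io

import pandas as pd

# Shortened stand-in for the API response (assumption: only 4 columns kept).
# Same traits as the real data: "^" delimiter, commas inside the flucs field,
# and a trailing "^" on each data row but not on the header.
sample = (
    b"fieldId^horse^flucs^claim\n"
    b"6043351^Harley Street^opening,2.10;starting,2.60;^0^\n"
    b"6043350^Vee Eight^opening,19.00;mid,21.00;starting,20.00;^0^\n"
)

# Fields a comma-based parser would see on each line; the line where the
# count jumps is the one named in the ParserError.
comma_fields = [line.count(b",") + 1 for line in sample.splitlines()]
print(comma_fields)

# index_col=False stops pandas from using the first column as the index
# when data rows carry one more field than the header (the trailing "^").
df = pd.read_csv(io.BytesIO(sample), sep="^", index_col=False)
print(df[["fieldId", "horse", "claim"]])
```

If the column values still look shifted against their names on the real response, comparing the number of ^ characters in the header with the number in a data row is a quick check.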
User contributions licensed under: CC BY-SA