pandas.read_csv() returns strings from columns instead of numbers

I am trying to draw a linear regression plot for the data provided:

import pandas
from pandas import DataFrame
import matplotlib.pyplot

data = pandas.read_csv('cost_revenue_clean.csv')
data.describe()

X = DataFrame(data,columns=['production_budget_usd'])
y = DataFrame(data,columns=['worldwide_gross_usd'])

When I try to plot it

matplotlib.pyplot.scatter(X,y)
matplotlib.pyplot.show()

the plot is completely empty, and when I print the type of each element in X

for element in X:
    print(type(element))

it shows that the type is string. Where am I going wrong?

Answer

No need to make new DataFrames for X and y. Try astype(float) if you want them as numeric:

X = data['production_budget_usd'].astype(float)
y = data['worldwide_gross_usd'].astype(float)
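
One more thing worth checking: iterating over a DataFrame with a for loop yields its column labels, not its values, so the loop in the question prints str whatever the column holds. Looking at data.dtypes is a more reliable check. Below is a minimal sketch of that check. I can't see your CSV, so the inline data is made up and only stands in for cost_revenue_clean.csv. It also uses pandas.to_numeric with errors='coerce' as an alternative to astype(float), on the assumption that some cells might not parse as numbers:

```python
import io

import pandas as pd

# Hypothetical stand-in for cost_revenue_clean.csv (the real contents are not shown).
csv_text = io.StringIO(
    "production_budget_usd,worldwide_gross_usd\n"
    "1000000,2500000\n"
    "5000000,12000000\n"
    "2000000,1500000\n"
)
data = pd.read_csv(csv_text)

# Iterating over a DataFrame yields its column labels, which are strings.
X_frame = data[['production_budget_usd']]
labels = [element for element in X_frame]
label_types = [type(element) for element in labels]

# dtypes reports what read_csv actually parsed for each column.
dtypes = data.dtypes
print(dtypes)

# to_numeric with errors='coerce' turns unparseable cells into NaN instead of raising.
X = pd.to_numeric(data['production_budget_usd'], errors='coerce')
y = pd.to_numeric(data['worldwide_gross_usd'], errors='coerce')
```

If dtypes shows object for a column, the file probably contains non-numeric characters in it, and astype(float) will raise a ValueError until those are cleaned up. Once X and y are numeric, they can be passed straight to matplotlib.pyplot.scatter(X, y).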
User contributions licensed under: CC BY-SA