I’m trying to convert a Pandas DataFrame into a Spark DataFrame. The head of the DataFrame:
10000001,1,0,1,12:35,OK,10002,1,0,9,f,NA,24,24,0,3,9,0,0,1,1,0,0,4,543
10000001,2,0,1,12:36,OK,10002,1,0,9,f,NA,24,24,0,3,9,2,1,1,3,1,3,2,611
10000002,1,0,4,12:19,PA,10003,1,1,7,f,NA,74,74,0,2,15,2,0,2,3,1,2,2,691
Code:
dataset = pd.read_csv("data/AS/test_v2.csv")
sc = SparkContext(conf=conf)
sqlCtx = SQLContext(sc)
sdf = sqlCtx.createDataFrame(dataset)
And I got an error:
TypeError: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.DoubleType'>
Answer
You need to make sure your pandas DataFrame columns are appropriate for the type Spark is inferring. If your pandas DataFrame lists something like:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5062 entries, 0 to 5061
Data columns (total 51 columns):
SomeCol    5062 non-null object
Col2       5062 non-null object
and you’re getting that error, try:
df[['SomeCol', 'Col2']] = df[['SomeCol', 'Col2']].astype(str)
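If many columns are affected, one possible shortcut (a sketch, not part of the original answer; it assumes every object-dtype column really should become a string) is to select the object columns programmatically instead of listing them by hand:

# Cast every object-dtype column to str in one pass
obj_cols = df.select_dtypes(include=['object']).columns
df[obj_cols] = df[obj_cols].astype(str)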
Now, make sure .astype(str) is actually the type you want those columns to be. Basically, when the underlying Java code tries to infer the type from an object in Python, it looks at a sample of values and makes a guess; if that guess doesn’t hold for all the data in the column(s) it’s converting from pandas to Spark, the conversion will fail.
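Putting it together, here is a rough sketch of the full flow from the question with the cast applied before createDataFrame. The file path and SQLContext setup are copied from the question; the blanket object-to-str cast is an assumption, so adjust it to the types you actually want:

import pandas as pd
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

dataset = pd.read_csv("data/AS/test_v2.csv")

# Cast object-dtype columns to str so Spark's type inference sees one
# consistent type per column (assumes str is the type you want)
obj_cols = dataset.select_dtypes(include=['object']).columns
dataset[obj_cols] = dataset[obj_cols].astype(str)

sc = SparkContext(conf=SparkConf())
sqlCtx = SQLContext(sc)
sdf = sqlCtx.createDataFrame(dataset)  # should no longer raise the merge-type error

In newer PySpark versions you would typically use SparkSession.builder.getOrCreate() and spark.createDataFrame instead of SQLContext, but the type-inference behaviour is the same.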