I have a single column of strings that contain alphanumeric characters, as follows:
AA128A
AA128B
AA128C
AA128D
AA128E
AA129A
AA129B
AA129C
CP100-10
CP100-11
CP100-12
CP100-13
CORSTG11A
CORSTG11B
CORSTG11C
I want to explode each string into one character per column, convert every alpha character into its ASCII decimal value, and keep the numeric characters as they are. If a cell is null after exploding (because the string was shorter than the longest one), I want to replace it with -1.
I have been able to explode the values and replace the nulls. However, when I iterate over the values and call ord() to convert the alpha characters, I get this error:
ord() expected string of length 1, but int found
This happens even when I check the datatype of each column inside the for loop:
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import OrdinalEncoder
    from pandas.api.types import is_string_dtype
    from pandas.api.types import is_numeric_dtype

    loc_df = pd.read_csv('C:\path\to\file.csv',index_col=False)

    # new data frame with split value columns
    explode_df = loc_df["stoloc"].apply(lambda x: pd.Series(list(x)))
    explode_df = explode_df.fillna(-1)

    #Convert alpha characters to numeric
    for char in explode_df:
        if is_string_dtype(explode_df[char]):
            explode_df_numeric[char] = ord(char)
        else:
            explode_df_numeric[char] = char
Answer
The reason you got that error is that the variable char is the column name (an integer label here), not a value from the column, so it is not a valid argument for ord. You should pass the values in that column instead; you can use apply or map for that.
    if is_string_dtype(explode_df[char]):
        explode_df[char] = explode_df[char].apply(ord)
    else:
        explode_df[char] = explode_df[char]
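To make the cause of the error concrete, here is a minimal sketch. The two sample strings are taken from the question, but the small Series is a stand-in for the real "stoloc" column, and the names labels and error_message are only illustrative. It shows that looping over a DataFrame yields its column labels, and that calling ord on such a label raises a TypeError.

```python
import pandas as pd

# Stand-in for the "stoloc" column (assumption: two sample values only)
sample = pd.Series(["AA128A", "CP100-10"], name="stoloc")
explode_df = sample.apply(lambda x: pd.Series(list(x))).fillna(-1)

# Iterating over a DataFrame yields column labels, not cell values
labels = [col for col in explode_df]

# Calling ord() on an integer label fails, which is the error in the question
try:
    ord(labels[0])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(labels)
print(error_message)
```

Because the exploded columns are labelled 0, 1, 2, and so on, ord receives an integer rather than a one-character string.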
There are other issues in your code, though. Looping over the columns and checking each column's dtype does not solve the problem, because some columns contain both strings and integers (the -1 fill values). A simple solution is an applymap with an is_int check on each cell:
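    def is_int(s):
        # True for digit characters and for the -1 fill value
        try:
            int(s)
            return True
        except (TypeError, ValueError):
            return False

    # new data frame with split value columns
    explode_df = loc_df["stoloc"].apply(list).apply(pd.Series)
    explode_df = explode_df.fillna(-1)

    # keep integer-like cells as they are, convert everything else with ord
    explode_df_numeric = explode_df.applymap(lambda x: x if is_int(x) else ord(x))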
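One thing to be aware of: with the check above, digit characters stay as strings such as "1", so the result still has object dtype. If a fully numeric frame is wanted, a variation along the following lines may help. This is only a sketch under some assumptions: encode_char and elementwise are hypothetical names, the three sample strings stand in for the real column, and the getattr fallback is there because applymap was deprecated in pandas 2.1 in favour of DataFrame.map.

```python
import pandas as pd

def encode_char(value):
    """Map one cell to an int: padding stays -1, digits keep their
    numeric value, any other character becomes its code point."""
    if not isinstance(value, str):
        return int(value)      # the -1 padding introduced by fillna
    if value.isdecimal():
        return int(value)      # "7" -> 7
    return ord(value)          # "A" -> 65, "-" -> 45

# Stand-in for loc_df["stoloc"] (assumption: three sample values only)
sample = pd.Series(["AA128A", "CP100-10", "CORSTG11A"], name="stoloc")
explode_df = sample.apply(list).apply(pd.Series).fillna(-1)

# Use DataFrame.map where it exists (pandas 2.1+), otherwise applymap
elementwise = getattr(explode_df, "map", None) or explode_df.applymap
explode_df_numeric = elementwise(encode_char).astype(int)

print(explode_df_numeric)
```

Note that non-alphanumeric characters such as the hyphen in CP100-10 are also converted with ord, which matches what the applymap version above does.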