How to load SVMlight format files in compressed form into pandas?

Question

I have data in SVMlight format (`label feature1:value1 feature2:value2 ...`), so I tried `sklearn.datasets.load_svmlight_file`, but it doesn't seem to work with categorical string features and labels. I am trying to load the data into a pandas DataFrame. Any pointers would be appreciated.
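To make the format concrete, here is a minimal sketch of splitting one line into its label and a dict of feature:value pairs. The helper name `parse_svmlight_line` and the sample line are hypothetical, made up for illustration. Every value is kept as a string, so categorical entries such as `red` are preserved instead of being forced to floats.

```python
def parse_svmlight_line(line):
    """Split one SVMlight-style line into (label, {feature: value}).

    Hypothetical helper for illustration. Values stay strings, so
    categorical entries are not coerced to numbers.
    """
    tokens = line.split()  # any whitespace; also drops the trailing newline
    label = tokens[0]
    features = {}
    for token in tokens[1:]:
        # split only on the first ':' so a value containing ':' stays whole
        key, value = token.split(":", 1)
        features[key] = value
    return label, features


label, features = parse_svmlight_line("spam color:red size:3 shape:round\n")
print(label)     # spam
print(features)  # {'color': 'red', 'size': '3', 'shape': 'round'}
```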

Accepted Answer

You can do it by hand. One way to convert the file into a DataFrame:

```python
import os

import pandas as pd

# open() does not expand "~", so resolve it explicitly
svmformat_file = os.path.expanduser("~/svmformat_file_sample")

# For each line we save the label and the key:value pairs to a dict
pandas_list = []
with open(svmformat_file, mode="r") as fp:
    for line in fp:
        # split() handles repeated spaces and removes the trailing '\n'
        line_split = line.split()
        if not line_split:
            continue  # skip blank lines
        line_dict = {"label": line_split[0]}
        for col in line_split[1:]:
            # split on the first ':' only, in case a value contains ':'
            key, value = col.split(":", 1)
            line_dict[key] = value
        pandas_list.append(line_dict)
```

The resulting DataFrame for your example file:

```python
pd.DataFrame(pandas_list)
```
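The same loop can be wrapped in a reusable function that accepts any iterable of lines (an open file, or an `io.StringIO` for quick testing). This is a sketch, not part of pandas or sklearn: the name `read_svmlight_records` is hypothetical, and it assumes that anything after `#` is a trailing comment, as in the SVMlight format description. If your categorical values can themselves contain `#`, drop that line.

```python
import io


def read_svmlight_records(fp):
    """Turn SVMlight-style lines into a list of dicts, one per row.

    Hypothetical helper. Assumes '#' starts a trailing comment and
    skips blank or comment-only lines. All values are kept as strings.
    """
    records = []
    for line in fp:
        line = line.split("#", 1)[0]  # assumption: drop "# info" comments
        tokens = line.split()
        if not tokens:
            continue  # blank or comment-only line
        row = {"label": tokens[0]}
        for token in tokens[1:]:
            key, value = token.split(":", 1)
            row[key] = value
        records.append(row)
    return records


sample = io.StringIO(
    "spam color:red size:3  # first row\n"
    "\n"
    "ham color:blue shape:square\n"
)
records = read_svmlight_records(sample)
print(records[0])  # {'label': 'spam', 'color': 'red', 'size': '3'}
print(records[1])  # {'label': 'ham', 'color': 'blue', 'shape': 'square'}
```

Passing `records` to `pd.DataFrame(records)` gives one row per line, with NaN wherever a row lacks a feature that other rows have.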

Answer