
I need help formatting this data

I have data like this

id,phonenumbers,firstname,lastname,email,birthday,gender,locale,hometown,location,link

The problem is that some rows do not follow this format, like this one:

000000,000000,name1,name2,email@email,1 1 1990,female,en_En,new york,USA ,new yourk,https://www.example.com

As you can see, around the “locale,hometown” fields there are 3 commas. I want to delete one of them so the data becomes like this:

000000,000000,name1,name2,email@email,1 1 1990,female,en_En ,new york USA, new yourk,https://www.example.com

This is just an example of the problem. In my data there could be more than 3 commas, and the addresses differ.

Essentially, I want to load the data into Excel and have each column show up clean, with the right data.

Answer

The problem is that a value is split into multiple columns when it should be in one column. If this can only happen in one column, and there is a fixed number of columns before and after it, then it’s possible to fix it:

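testdata = "000000,000000,name1,name2,email@email,1 1 1990,female,en_En,new york,USA ,new yourk,https://www.example.com"

def split(data, cols_before_addr=8, cols_after_addr=1):
    # Split on every comma, then glue the surplus middle fields back together.
    raw_cols = data.split(',')
    # Parentheses let the expression span several lines.
    # "\n" is the separator inside the merged cell; swap it for " " or ", " if preferred.
    return (raw_cols[:cols_before_addr]
            + ["\n".join(raw_cols[cols_before_addr:-cols_after_addr])]
            + raw_cols[-cols_after_addr:])

print(split(testdata))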
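
To get from a single repaired row to something Excel can open, the same idea can be applied line by line and written out with Python's csv module, which quotes any field that contains a comma. This is only a rough sketch under the same assumption as above (a fixed number of columns before and after the address part). The names merge_address and repair are made up for illustration, and the merged cell is joined with ", " here rather than a newline.

```python
import csv
import io

def merge_address(line, cols_before=8, cols_after=1):
    """Rejoin the surplus fields in the middle of one raw line (hypothetical helper)."""
    fields = line.rstrip("\n").split(",")
    end = len(fields) - cols_after  # index where the trailing fixed columns start
    middle = ", ".join(part.strip() for part in fields[cols_before:end])
    return fields[:cols_before] + [middle] + fields[end:]

def repair(src, dst, cols_before=8, cols_after=1):
    """Read raw lines from src and write repaired rows to dst as quoted CSV."""
    writer = csv.writer(dst)  # default quoting wraps fields containing commas
    for line in src:
        if line.strip():  # skip blank lines
            writer.writerow(merge_address(line, cols_before, cols_after))

# In-memory demo using the sample row from the question.
raw = io.StringIO(
    "000000,000000,name1,name2,email@email,1 1 1990,female,en_En,"
    "new york,USA ,new yourk,https://www.example.com\n"
)
out = io.StringIO()
repair(raw, out)
print(out.getvalue())
```

For real files, src and dst would be file objects, with the output opened using newline="" as the csv module documentation recommends. Note that with these defaults hometown and location end up in a single column, so a header line passed through the same function gets a merged "hometown, location" header as well.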
User contributions licensed under: CC BY-SA