Regexp_replace “,” with “.” for every other comma in Spark

I have a dataframe that uses , instead of . as the decimal separator, and the separator between the numbers is also a comma, so I need to replace only every odd comma with a dot. The dataframe is very big, but as an example I have this:

+---+-----------------+
|id |values           |
+---+-----------------+
| 1 | 12,3,10,4,11,5  |
+---+-----------------+

I want this df:

+---+-----------------+
|id |values           |
+---+-----------------+
| 1 | 12.3,10.4,11.5  |
+---+-----------------+


Answer

You can split on all commas , and then build the pairs in one of two ways:

  1. with a for-loop over range(0, len(splitted_data), 2) to create pairs [0:2], [2:4], …, [n:n+2] and join them into strings with dots:
data = '12,3,10,4,11,5'

splitted_data = data.split(',')

new_values = []

for n in range(0, len(splitted_data), 2):
    pair = splitted_data[n:n+2]
    text = '.'.join(pair)

    new_values.append(text)

    print(text)

# -- after loop ---

data = ','.join(new_values)

print(data)    
  2. with iter() and zip() to create pairs and join them into strings with dots:
data = '12,3,10,4,11,5'

splitted_data = data.split(',')

iterator = iter(splitted_data)

new_values = []

for pair in zip(iterator, iterator):
    text = '.'.join(pair)

    new_values.append(text)

    print(text)

# -- after loop ---

data = ','.join(new_values)

print(data)    

Result:

12.3
10.4
11.5
12.3,10.4,11.5
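
For reuse on many values, the pairing logic above can be wrapped in a small helper function (the name fix_decimal_commas is my own, not from the original answer):

```python
def fix_decimal_commas(data):
    """Join every consecutive pair of comma-separated chunks with a dot."""
    splitted_data = data.split(',')
    new_values = []
    for n in range(0, len(splitted_data), 2):
        # take a pair like ['12', '3'] and turn it into '12.3'
        new_values.append('.'.join(splitted_data[n:n+2]))
    return ','.join(new_values)

print(fix_decimal_commas('12,3,10,4,11,5'))  # 12.3,10.4,11.5
```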

EDIT:

You may also use a regex for this:

import re

data = '12,3,10,4,11,5'

print(re.sub(r'(\d+),(\d+)', r'\1.\2', data))

Result:

12.3,10.4,11.5
User contributions licensed under: CC BY-SA