I have a Pyspark dataframe_Old (dfo) as below:
Id | neighbor_sid | neighbor | division |
---|---|---|---|
a1 | 1100 | Naalehu | Hawaii |
a2 | 1101 | key-west-fl | Miami |
a3 | 1102 | lubbock | Texas |
a10 | 1202 | bay-terraces | California |
I have a Pyspark dataframe_new (dfn) as below:
Id | neighbor_sid | neighbor | division |
---|---|---|---|
a1 | 1100 | Naalehu | Hawaii |
a2 | 1111 | key-largo-fl | Miami |
a3 | 1103 | grapevine | Texas |
a4 | 1115 | meriden-ct | Connecticut |
a12 | 2002 | east-louisville | Kentucky |
My objective is to find the Ids from dataframe_new in dataframe_old and replace them with the new values from dataframe_new
Final expected Pyspark dataframe updated – dataframe_old
Id | neighbor_sid | neighbor | division |
---|---|---|---|
a1 | 1100 | Naalehu | Hawaii |
a2 | 1111 | key-largo-fl | Miami |
a3 | 1103 | grapevine | Texas |
a4 | 1115 | meriden-ct | Connecticut |
a10 | 1202 | bay-terraces | California |
a12 | 2002 | east-louisville | Kentucky |
My wrong attempt at solving it as it is comparing column wise instead of row
JavaScript
x
14
14
1
dfO.alias('a').join(dfN.alias('b'), on=['id'], how='left')
2
.select(
3
'id',
4
f.when(
5
~f.isnull(f.col('b.id')),
6
f.col('b.id')
7
).otherwise(f.col('a.id')).alias('id'),
8
'b.col_3'
9
)
10
.union(dfO)
11
.dropDuplicates()
12
.sort('id')
13
.show()
14
Please help – would really appreciate any guidance!
Advertisement
Answer
We can do an outer join on the id
fields and then use coalesce()
to prioritize the fields from dfn
.
JavaScript
1
18
18
1
columns = ['id', 'neighbor_sid', 'neighbor', 'division']
2
3
dfo.
4
join(dfn, 'id', 'outer').
5
select(*['id'] + [func.coalesce(dfn[k], dfo[k]).alias(k) for k in columns if k != 'id']).
6
orderBy('id').
7
show()
8
9
# +---+------------+------------+-----------+
10
# | id|neighbor_sid| neighbor| division|
11
# +---+------------+------------+-----------+
12
# | a1| 1100| Naalehu| Hawaii|
13
# |a10| 1202|bay-terraces| California|
14
# | a2| 1111|key-largo-fl| Miami|
15
# | a3| 1103| grapevine| Texas|
16
# | a4| 1115| meriden-ct|Connecticut|
17
# +---+------------+------------+-----------+
18