I have a DataFrame df
with some country statistics for years from 2014 to 2018. Some of the countries have values for each of the years, while some countries are missing some. The DataFrame looks like this:
countryName 2014 2015 2016 2017 2018 Afghanistan .. .. 281 .. .. Albania .. .. .. 891 901 Algeria 791 801 804 817 820 ...
I want to keep only the most recent data value, so for the DataFrame above, the result should be:
countryName value Afghanistan 281 # the most recent value from 2016 Albania 901 # the most recent value from 2018 Algeria 820 # the most recent value from 2018 ...
Advertisement
Answer
You could use the forward fill on the columns axis to get the last numeric values
data = [{'countryName': 'Afghanistan', '2014': nan, '2015': nan, '2016': 281.0, '2017': nan, '2018': nan}, {'countryName': 'Albania', '2014': nan, '2015': nan, '2016': nan, '2017': 891.0, '2018': 901.0}, {'countryName': 'Algeria', '2014': 791.0, '2015': 801.0, '2016': 804.0, '2017': 817.0, '2018': 820.0}] df = pd.DataFrame(data) df.ffill(1).iloc[:, [0, -1]].set_axis(['countryName', 'value'], axis=1) countryName value 0 Afghanistan 281.0 1 Albania 901.0 2 Algeria 820.0