How can I achieve both of the following in Python 3:
- Serialize column names and numerical data as a binary file
- Reopen the file and append additional numerical data
For example, with the following data:

    import numpy as np

    columns = ['a', 'b', 'c']
    data = np.linspace(0, 1, num=10*3).reshape((10, 3))
    data_for_appending = np.linspace(2, 3, num=10*3).reshape((10, 3))

My approach with numpy
This approach lets me save data and append additional data. However, the column names are missing, and loading requires a separate call to np.load for each saved array.

    # storing the data
    with open('out.npy', 'wb') as f:
        np.save(f, data)
        np.save(f, data_for_appending)

    # loading the data
    with open('out.npy', 'rb') as f:
        data1 = np.load(f)
        data2 = np.load(f)

My approach with pandas
This approach saves the data and the header. However, it does not seem possible to append data to the file in a separate call.

    import pandas as pd

    df = pd.DataFrame(data, columns=columns)

    # storing the data
    df.to_pickle('out.pickle')

    # loading the data
    df2 = pd.read_pickle('out.pickle')

Answer

    import pickle

    import pandas as pd

    # Assumed file name; the original snippet does not define path
    path = "out.pickle"

    # Write first df to pickle
    data = {
        "name": ["Joe", "Mike", "Tony", "Susan"],
        "course": ["Masters", "Doctorate", "Graduate", "Bachelors"],
        "age": [27, 23, 21, 19],
    }
    df = pd.DataFrame(data)
    df.to_pickle(path)

    # Create new row df
    new_row = {"name": "Phil", "course": "Associates", "age": 30}
    new_row_df = pd.DataFrame(new_row, index=[0])
    print(f"{new_row_df}\n")

    # read original df from pickle
    pickled_df = pd.read_pickle(path)

    # concat dfs (the new row comes first here; swap the list order to put it last)
    df_appended = pd.concat([new_row_df, pickled_df]).reset_index(drop=True)

    # Dump concat df to pickle
    with open(path, "wb") as f:
        pickle.dump(df_appended, f)

    # read concat df from pickle
    df = pd.read_pickle(path)
    print(df)

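The read, concat, rewrite steps above can be wrapped in a small helper and applied to the numerical data from the question. This is only a minimal sketch: the helper name append_rows is made up for illustration, and a temporary directory is used so that no real file is touched. Note that the whole file is read and rewritten on every call, so the cost grows with the file size.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def append_rows(path, new_df):
    """Hypothetical helper: read the pickled frame, append new_df, rewrite the file."""
    if os.path.exists(path):
        old_df = pd.read_pickle(path)
        new_df = pd.concat([old_df, new_df]).reset_index(drop=True)
    new_df.to_pickle(path)


# Data from the question
columns = ['a', 'b', 'c']
data = np.linspace(0, 1, num=10*3).reshape((10, 3))
data_for_appending = np.linspace(2, 3, num=10*3).reshape((10, 3))

# A temporary directory keeps the sketch from touching real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.pickle')
    append_rows(path, pd.DataFrame(data, columns=columns))
    append_rows(path, pd.DataFrame(data_for_appending, columns=columns))
    result = pd.read_pickle(path)

print(result.shape)          # (20, 3)
print(list(result.columns))  # ['a', 'b', 'c']
```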
You can append to the file without reading it first, but the DataFrames won't be concatenated; they are stored as separate entries. You can of course read all the entries in a loop and concatenate them later, when it's time to read the file.

    # Add new entries (new_df is the DataFrame to append, e.g. new_row_df above)
    with open(path, "ab") as f:
        pickle.dump(new_df, f)

    # When ready to read and concat.
    with open(path, "rb") as f:
        entries = []
        while True:
            try:
                entry = pickle.load(f)
            except EOFError:
                break
            entries.append(entry)

    df = pd.concat(entries).reset_index(drop=True)
    print(df)

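The same append-then-loop idea should also work with the np.save approach from the question, if the column names are stored as the first entry in the file. The sketch below is one possible way to do it, not a definitive implementation: the helper name load_all is hypothetical, and the loop compares the file position with the file size instead of relying on a particular exception at end of file.

```python
import os
import tempfile

import numpy as np

# Data from the question
columns = ['a', 'b', 'c']
data = np.linspace(0, 1, num=10*3).reshape((10, 3))
data_for_appending = np.linspace(2, 3, num=10*3).reshape((10, 3))


def load_all(path):
    """Hypothetical helper: return (column names, stacked data) from the file."""
    size = os.path.getsize(path)
    with open(path, 'rb') as f:
        names = np.load(f).tolist()  # first entry: the column names
        blocks = []
        while f.tell() < size:       # stop once every byte has been consumed
            blocks.append(np.load(f))
    return names, np.vstack(blocks)


# A temporary directory keeps the sketch from touching real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.npy')

    # First write: column names as a string array, then the data.
    with open(path, 'wb') as f:
        np.save(f, np.array(columns))
        np.save(f, data)

    # Later, in a separate call: append without reading the file.
    with open(path, 'ab') as f:
        np.save(f, data_for_appending)

    names, stacked = load_all(path)

print(names)          # ['a', 'b', 'c']
print(stacked.shape)  # (20, 3)
```

Unlike the pickle rewrite, appending this way never rereads existing data; the cost of stacking is paid once, at load time.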