I’m trying to create a dataframe containing the values from field_1
and field_2
in a single column. I haven’t used pandas a whole lot before, so I’m sure this is naive.
JavaScript
x
12
12
1
# Create a dataset in CSV format
2
field_names = ["description", "comments"]
3
writer = csv.writer(open("dataset.csv", "w"), quoting=csv.QUOTE_ALL, delimiter=",")
4
writer.writerow(field_names)
5
for instance in Order.objects.all():
6
writer.writerow([str(getattr(instance, f)) for f in field_names])
7
8
# Read CSV
9
data_frame = pd.read_csv("dataset.csv", index_col=0)
10
# Combine columns
11
df2 = data_frame.apply(lambda x: ", ".join(x[x.notnull()]), axis=1)
12
If I’m working with a fairly large dataset, is there a way I can make this more efficient? I would like to eliminate the step that creates the CSV entirely, if possible. If anybody can point me in the right direction, that’d be fantastic.
Advertisement
Answer
You really don’t need:
- the order objects and
getattr
; use.values_list()
to get an iterable of 2-tuples (assumingfield_names
are actual fields on the model). - CSV – now that you have an iterable of 2-tuples, pass them to the
DataFrame
constructor along with the respective column names.
JavaScript
1
6
1
field_names = ["description", "comments"]
2
df = pd.DataFrame.from_records(
3
Order.objects.all().values_list(field_names),
4
columns=field_names,
5
)
6
- Necessarily even Pandas
JavaScript
1
14
14
1
from django.db.models import F, Value
2
from django.db.models.functions import Concat
3
4
# ...
5
my_data = list(
6
Order.objects.annotate(
7
x=Concat(
8
F("description"),
9
Value(", "),
10
F("comments"),
11
)
12
).values_list("x", flat=True)
13
)
14
and you have a list of description, comments
strings, just like the series you’d get with Pandas.