I am analyzing a consumer survey and there are both Dutch (NL) and French (FR) respondents. Depending on the answer they gave when we asked about their mother language they got the same questionnaire but translated in Dutch or French. The problem is that the output of Qualtrics (the survey software) gave us the following output:
User_Language | Q1_NL | Q2_NL | … | Q1_FR | Q_FR | … |
---|---|---|---|---|---|---|
NL | 1 | 3 | … | … | ||
NL | 4 | 4 | .. | … | ||
NL | 1 | 3 | … | … | ||
NL | 2 | 5 | … | … | ||
… | … | … | … | … | … | … |
FR | … | 3 | 2 | … | ||
FR | .. | 4 | 3 | … | ||
FR | … | 2 | 5 | … | ||
FR | … | 1 | 2 | … |
As you can see the answers for the Dutch-speaking participants were recorded in the first n columns, while the French answers were recorded in the following n columns.
How can I cut the French answers from the last n columns and append them on the bottom of the DataFrame as those are answers to the exact same questions?
Thanks!
EDIT: Solutions that make use of grouping by the strings “Q1” or “Q2” are unfortunately not viable as the column names are actually the questions, this was just an example value. I do know the exact range of the French answers and the Dutch answers.
Advertisement
Answer
If there aren’t any overlapping values you could simply split the data frame into two data frames based on where the dutch values “stop” and the french “start” rename the value columns of those two data frames Q2_NL and so on to simply Q2 … and then concatenate those frames again into one.