
Using pyparsing, how can I group expressions that are matched by OneOrMore(expr1|expr2)?

My website allows users to post a string that contains several questions, each followed by multiple-choice answers. An enforced style guide allows the input to be parsed with regex, and the questions and MCQ choices are then stored in a database, to be returned later in randomized practice exams.

I want to move to pyparsing because the regex is not immediately readable and I feel somewhat locked in by it. I would like to be able to extend my question parser easily, which feels very cumbersome with regex.

User input is in the form of:

JavaScript

The long user-input string is split into question-answer groups, each delimited by the next group's q-start. A question is all text between q-start and the first a-start. The answers are a list of all text between one a-start and either the next a-start or the following q-start.

Sample text:

JavaScript

Regex I have been using:

JavaScript

Then a little Python and re code trims away question numbers and MCQ letters, joins any broken lines in a question, and appends the MCQs to a list.

In pyparsing I have tried this:

JavaScript

Results:

JavaScript

This does match questions and answers, and it properly handles line breaks that interrupt a question or an answer. The issue is that the results are not grouped the way I expected. I was expecting something along the lines of: group[0] = question, answer[1:4]; group[2] = question, answer[1:4].

Does anyone have any advice?

Thanks!


Answer

I think you were on the right track. I took a separate pass at your parser and came up with very similar constructs, with just a few differences.

JavaScript

The biggest difference is that the expression I used to parse your text is not just OneOrMore(question | answer); instead it defines a Group(question + OneOrMore(answer)). This creates one group for each question and its related answers. In your parser, listAllMatches creates one results name holding all the questions and another holding all the answers, but loses the associations between them. The "question + one or more answers" group keeps those associations.
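To illustrate the grouping idea, here is a minimal sketch. The markers are assumptions, since the style guide is not shown here: a question is taken to start with a number and a period ("1."), an answer with a lowercase letter and a closing parenthesis ("a)"), and each is assumed to fit on one line. The names q_start, a_start, line_text, and qa_block are hypothetical.

```python
from pyparsing import Group, OneOrMore, Regex, Suppress

# Assumed markers; the real style guide may differ.
q_start = Suppress(Regex(r"\d+\."))    # e.g. "1."
a_start = Suppress(Regex(r"[a-z]\)"))  # e.g. "a)"

# Assumption for this sketch: each question/answer is a single line.
line_text = Regex(r".+")

question = q_start + line_text("question")
answer = a_start + line_text

# One Group per question, containing that question's own answers.
qa_block = Group(question + Group(OneOrMore(answer))("answers"))
parser = OneOrMore(qa_block)

sample = """\
1. What is the capital of France?
a) Paris
b) Rome
2. What is 2 + 2?
a) 3
b) 4
"""

results = parser.parseString(sample)
for block in results:
    # Each block keeps its question together with its answers.
    print(block["question"], list(block["answers"]))
```

Because every question sits in its own Group, results[0] and results[1] each carry a "question" name and an "answers" list, rather than two flat lists with no link between them.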

If you want to remove the '\n's, you can do that more easily with a parse action than with the EOL business.
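As a rough sketch of the parse-action approach: the body text is captured with SkipTo, newlines included, and a parse action then collapses the line breaks. The marker patterns are the same assumptions as before ("1." for questions, "a)" for answers), and join_lines and body are hypothetical names, not taken from the original code.

```python
from pyparsing import Regex, SkipTo, StringEnd, Suppress

# Assumed markers; the real style guide may differ.
q_start = Regex(r"\d+\.")
a_start = Regex(r"[a-z]\)")

def join_lines(tokens):
    # Collapse newlines and runs of whitespace into single spaces.
    return " ".join(tokens[0].split())

# Capture everything up to the next marker (or end of input),
# then let the parse action clean up the captured text.
body = SkipTo(a_start | q_start | StringEnd()).setParseAction(join_lines)
question = Suppress(q_start) + body

wrapped = "1. What is the capital\nof France?\na) Paris\n"
cleaned = question.parseString(wrapped)[0]
print(cleaned)
```

The cleanup lives in one small function attached to the expression, so the grammar itself does not need to mention line endings at all.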

User contributions licensed under: CC BY-SA