
How to use a specific stop-character for shlex.split?

How can I tell shlex that, once it finds the character ;, it should not split anything after it?

Example:


should give


instead of ["hello", "column number 2", "foo", ";", "bar", "baz"].
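The original input string was not preserved. Assuming it was something like `hello "column number 2" foo ; bar baz` (an assumption inferred from the token list above), this minimal sketch reproduces the default behaviour being described:

```python
import shlex

# Assumed input, reconstructed from the token list in the question (not the original text).
line = 'hello "column number 2" foo ; bar baz'

# shlex.split uses POSIX rules and treats ';' as an ordinary character,
# so it becomes a token of its own because it is surrounded by spaces.
tokens = shlex.split(line)
print(tokens)  # ['hello', 'column number 2', 'foo', ';', 'bar', 'baz']
```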


More generally, is there a way to define “comment” separators with shlex? For example:


should give



Answer

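The shlex parser lets you specify the comment character(s) through its commenters attribute, but the simplified shlex.split interface does not expose this: its comments parameter is only a boolean that switches the default # comments on or off. Example: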

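The original example was not preserved. A minimal sketch of driving the parser directly, assuming the goal is to drop everything from the first ; onward. The helper name split_with_commenters is hypothetical; shlex.shlex, whitespace_split and commenters are the standard-library names.

```python
import shlex

def split_with_commenters(s, commenters=";"):
    """Split s like shlex.split, but treat any char in `commenters` as a comment start.

    Hypothetical helper for illustration; it configures shlex.shlex directly.
    """
    lex = shlex.shlex(s, posix=True)  # POSIX rules, as shlex.split uses by default
    lex.whitespace_split = True       # split only on whitespace, like shlex.split
    lex.commenters = commenters       # replaces the parser's default '#'
    return list(lex)

print(split_with_commenters('hello "column number 2" foo ; bar baz'))
# ['hello', 'column number 2', 'foo']
```

With whitespace_split enabled, a comment character also ends a word it appears in, so `foo;bar baz` yields just `['foo']`.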

Here is a slightly expanded split function, mostly copied from the Python standard library, whose comments parameter can also take a string of comment characters:

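The original code was not preserved. The following sketch reconstructs the idea from the standard library's shlex.split; the handling of comments (False, True, or a string of characters) follows the description in this answer, but the exact code is an assumption, not the answer's original:

```python
import shlex

def split(s, comments=False, posix=True):
    """Split s using shell-like syntax, like shlex.split.

    comments=False: no comment characters (the shlex.split default).
    comments=True:  keep the parser's default comment character, '#'.
    comments="...": every character in the string starts a comment.
    """
    lex = shlex.shlex(s, posix=posix)
    lex.whitespace_split = True
    if not comments:
        lex.commenters = ""        # disable comments entirely
    elif isinstance(comments, str):
        lex.commenters = comments  # caller-chosen comment characters
    # comments=True: leave the default commenters ('#') in place
    return list(lex)

print(split('hello "column number 2" foo ; bar baz', comments=";"))
# ['hello', 'column number 2', 'foo']
print(split("a b # c"))                 # ['a', 'b', '#', 'c']
print(split("a b # c", comments=True))  # ['a', 'b']
```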

You might want to change the default value of comments in the above code; as written, it has the same default as shlex.split, which is not to recognise comments at all. (The parser objects created by shlex.shlex default to # as the comment character, which is what you get if you specify comments=True. I preserved this behaviour for compatibility.)
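Both defaults mentioned here can be checked against the standard library directly; a short sketch:

```python
import shlex

line = "run --fast # enable turbo"

# shlex.split does not recognise comments unless comments=True is passed.
plain = shlex.split(line)
with_hash = shlex.split(line, comments=True)

# A bare shlex.shlex parser starts with '#' as its comment character.
default_commenters = shlex.shlex(line).commenters

print(plain)               # ['run', '--fast', '#', 'enable', 'turbo']
print(with_hash)           # ['run', '--fast']
print(default_commenters)  # #
```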

Note that comments are ignored; they do not appear in the result vector at all. When the parser hits a comment character, it discards the rest of that line, so for single-line input it effectively stops parsing there (and a line can never contain two comments). The comments string is a set of possible comment characters, not a comment sequence. So if you want to recognise both # and ; as comment characters, specify comments='#;'.
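Because each character in the string is checked individually, passing "#;" makes either one start a comment. The sketch below uses shlex.shlex directly; the multi-line input is an assumption added here to show that a comment only swallows the rest of its own line:

```python
import shlex

text = "a b ; ignored\nc # also ignored\nd"

lex = shlex.shlex(text, posix=True)
lex.whitespace_split = True
lex.commenters = "#;"  # either character starts a comment

tokens = list(lex)
print(tokens)  # ['a', 'b', 'c', 'd']
```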

Here’s a sample run:

User contributions licensed under: CC BY-SA