How to use a specific stop-character for shlex.split?

Question

How can I tell shlex that, once the character ; is found, it should not split anything anymore? Example: shlex.split("""hello "column number 2" foo ; bar baz""") should give ["hello", "column number 2", "foo"] instead of ["hello", "column number 2", "foo", ";", "bar", "baz"]. More generally, is there a way to define "comment" separators with shlex?

Accepted Answer

The shlex parser provides an option for specifying the comment character(s), but it's not available from the simplified shlex.split interface. Example:

    import shlex

    a = 'hello "bla bla" ; this is a comment'
    lex = shlex.shlex(a, posix=True)
    lex.commenters = ';'
    print(list(lex))  # ['hello', 'bla bla']

Here is a slightly expanded split function, mostly copied from the Python standard library, with a slight modification to the comments parameter, allowing the specification of comment characters:

    import shlex

    def shlex_split(s, comments='', posix=True):
        """Split the string *s* using shell-like syntax."""
        if s is None:
            import warnings
            warnings.warn("Passing None for 's' to shlex.split() is deprecated.",
                          DeprecationWarning, stacklevel=2)
        lex = shlex.shlex(s, posix=posix)
        lex.whitespace_split = True
        if isinstance(comments, str):
            # A string lists the comment characters to recognise.
            lex.commenters = comments
        elif not comments:
            # Any other falsy value disables comments entirely.
            lex.commenters = ''
        # comments=True falls through and keeps the parser default, '#'.
        return list(lex)

You might want to change the default value of comments in the above code; as written, it has the same default as shlex.split, which is not to recognise comments at all. (The parser objects created by shlex.shlex default to # as the comment character, which is what you get if you specify comments=True. I preserved this behaviour for compatibility.)

Note that comments are ignored; they do not appear in the result vector at all. When the parser hits a comment character, it discards the rest of the line, so for a single-line string it just stops parsing. (So there can never be two comments on one line.) The comments string is a list of possible comment characters, not a comment sequence. So if you want to recognise both # and ; as comment characters, specify comments='#;'.

Here's a sample run:

    >>> # Default behaviour is the same as shlex.split
    >>> shlex_split("""hello "column number 2" foo ; bar baz""")
    ['hello', 'column number 2', 'foo', ';', 'bar', 'baz']
    >>> # Supply a comments parameter to specify a comment character
    >>> shlex_split("""hello "column number 2" foo ; bar baz""", comments=';')
    ['hello', 'column number 2', 'foo']
    >>> shlex_split("""hello "column number 2" foo ;this is a comment; "last one" bye """, comments=';')
    ['hello', 'column number 2', 'foo']
    >>> # The ; is recognised as a comment even if it is not preceded by whitespace.
    >>> shlex_split("""hello "column number 2" foo;this is a comment; "last one" bye """, comments=';')
    ['hello', 'column number 2', 'foo']
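
To illustrate the point that the comments string is a set of individual characters, here is a minimal sketch that recognises both # and ; and feeds the parser a multi-line string. The helper name split_with_comments and its commenters parameter are made up for this example; only shlex.shlex, whitespace_split and commenters come from the standard library. As far as I can tell, a comment runs to the end of its own line, so tokens on later lines are still returned.

```python
import shlex

def split_with_comments(text, commenters="#;"):
    """Hypothetical helper: split *text* shell-style, dropping comments.

    Each character in *commenters* starts a comment on its own; the
    string is not treated as a multi-character comment marker.
    """
    lex = shlex.shlex(text, posix=True)
    lex.whitespace_split = True   # split on whitespace only, like shlex.split
    lex.commenters = commenters   # set of single comment characters
    return list(lex)

# Single-line input: everything after the ; is dropped.
print(split_with_comments('hello "column number 2" foo ; bar baz'))
# ['hello', 'column number 2', 'foo']

# Multi-line input: each comment ends at its line break.
print(split_with_comments('one two # note\nthree ; other note\nfour'))
# ['one', 'two', 'three', 'four']
```

Passing commenters='' to this sketch turns comment handling off, which matches the default behaviour of shlex.split.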
