How can I tell shlex that once the character ; is found, it should not split anything after it?
Example:
shlex.split("""hello "column number 2" foo ; bar baz""")
should give
["hello", "column number 2", "foo", "; bar baz"]
instead of ["hello", "column number 2", "foo", ";", "bar", "baz"].
More generally, is there a way to define “comment” separators with shlex? That is,
shlex.split("""hello "column number 2" foo ;this is a comment; "last one" bye """)
should give
["hello", "column number 2", "foo", ";this is a comment;", "last one", "bye"]
Answer
The shlex parser provides an option for specifying the comment character(s), but it’s not available from the simplified shlex.split interface. Example:
import shlex

a = 'hello "bla bla" ; this is a comment'
lex = shlex.shlex(a, posix=True)
lex.commenters = ';'  # treat ';' as the comment character
print(list(lex))  # ['hello', 'bla bla']
Here is a slightly expanded split function, mostly copied from the Python standard library, with a slight modification to the comments parameter, allowing the specification of comment characters:
import shlex

def shlex_split(s, comments='', posix=True):
    """Split the string *s* using shell-like syntax."""
    if s is None:
        import warnings
        warnings.warn("Passing None for 's' to shlex.split() is deprecated.",
                      DeprecationWarning, stacklevel=2)
    lex = shlex.shlex(s, posix=posix)
    lex.whitespace_split = True
    if isinstance(comments, str):
        # A string names the comment characters explicitly.
        lex.commenters = comments
    elif not comments:
        # False/None: do not recognise comments at all.
        lex.commenters = ''
    # Any other truthy value (e.g. True) keeps the parser default, '#'.
    return list(lex)
You might want to change the default value of comments in the above code; as written, it has the same default as shlex.split, which is not to recognise comments at all. (The parser objects created by shlex.shlex default to # as the comment character, which is what you get if you specify comments=True. I preserved this behaviour for compatibility.)
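For comparison, here is a quick check of how the standard shlex.split treats its boolean comments flag. It uses only the standard library; the sample string is made up for illustration.

```python
import shlex

line = 'hello world # trailing note'

# Default: comments are not recognised, so '#' is an ordinary token.
without_comments = shlex.split(line)

# comments=True keeps the parser's default comment character, '#'.
with_comments = shlex.split(line, comments=True)

print(without_comments)  # ['hello', 'world', '#', 'trailing', 'note']
print(with_comments)     # ['hello', 'world']
```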
Note that comments are ignored; they do not appear in the result vector at all. When the parser hits a comment character, it discards the rest of the line, so for single-line input it effectively stops parsing. (So a line can never contain two comments.) The comments string is a list of possible comment characters, not a comment sequence. So if you want to recognise both # and ; as comment characters, specify comments='#;'.
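As a small illustration of the point about comment characters, the sketch below sets two of them directly on a shlex.shlex object. The helper name tokens_before_comment is hypothetical and exists only for this example.

```python
import shlex

def tokens_before_comment(line, commenters='#;'):
    """Return the tokens of *line* up to the first unquoted comment character.

    Hypothetical helper for illustration: each character in *commenters*
    independently starts a comment.
    """
    lex = shlex.shlex(line, posix=True)
    lex.whitespace_split = True
    lex.commenters = commenters
    return list(lex)

print(tokens_before_comment('a "b c" ; rest'))  # ['a', 'b c']
print(tokens_before_comment('a "b c" # rest'))  # ['a', 'b c']
# Inside quotes, ';' is ordinary text rather than a comment character.
print(tokens_before_comment('a "b ; c" d'))     # ['a', 'b ; c', 'd']
```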
Here’s a sample run:
>>> # Default behaviour is the same as shlex.split
>>> shlex_split("""hello "column number 2" foo ; bar baz""")
['hello', 'column number 2', 'foo', ';', 'bar', 'baz']
>>> # Supply a comments parameter to specify a comment character
>>> shlex_split("""hello "column number 2" foo ; bar baz""", comments=';')
['hello', 'column number 2', 'foo']
>>> shlex_split("""hello "column number 2" foo ;this is a comment; "last one" bye """, comments=';')
['hello', 'column number 2', 'foo']
>>> # The ; is recognised as a comment even if it is not preceded by whitespace.
>>> shlex_split("""hello "column number 2" foo;this is a comment; "last one" bye """, comments=';')
['hello', 'column number 2', 'foo']
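The question asked for the text after the ; to be kept as one final token, which the comment mechanism does not do by itself. One possible workaround is sketched below. It is not part of shlex: RecordingStream and split_keep_tail are hypothetical names. It relies on an implementation detail of CPython's shlex, namely that the parser skips a comment by calling readline() on its input stream. It also assumes single-line input and a single-character marker.

```python
import io
import shlex


class RecordingStream(io.StringIO):
    """StringIO that remembers every line consumed via readline().

    Hypothetical helper: it relies on CPython's shlex calling
    instream.readline() to skip the text after a comment character.
    """

    def __init__(self, text):
        super().__init__(text)
        self.skipped = []

    def readline(self, *args):
        line = super().readline(*args)
        self.skipped.append(line)
        return line


def split_keep_tail(s, marker=';'):
    """Split like shlex.split, but keep everything from the first
    unquoted marker onwards as one final token (single-line input)."""
    stream = RecordingStream(s)
    lex = shlex.shlex(stream, posix=True)
    lex.whitespace_split = True
    lex.commenters = marker  # assumed to be a single character
    tokens = list(lex)
    if stream.skipped:
        # Re-attach the marker, which shlex consumed before calling readline().
        tokens.append(marker + stream.skipped[0].rstrip('\n'))
    return tokens


print(split_keep_tail('hello "column number 2" foo ; bar baz'))
# ['hello', 'column number 2', 'foo', '; bar baz']
```

Because quoting is still handled by shlex, a ; inside a quoted field is left alone. If the dependence on parser internals is a concern, scanning for the first unquoted ; by hand would be a more explicit alternative.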