Skipping variable number of C-style comment lines when using pandas read_table

The pandas read_table() function lets us read *.tab files, and its skiprows parameter provides flexible ways to skip unwanted rows. However, I run into trouble when I need to read *.tab files in a loop and the number of rows to skip varies from file to file. For example, the content to skip starts with /* and ends with */, such as:

/*
... 
The number of rows to skip is random
...
*/

So how do I find the line containing */ and then use the skiprows parameter?

Answer

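Consume rows until the current row starts with '*/', then pass the still-open file object to read_table, which continues reading from the current position: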

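import pandas as pd

with open('data.txt') as fp:
    for row in fp:
        # Stop at the line that closes the comment block
        if row.startswith('*/'):
            df = pd.read_table(fp)  # parses the remaining lines
            break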
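
If you would rather use skiprows, as the question asks, one possible sketch is to count the lines up to and including the closing '*/' and pass that count to read_table. The helper names count_comment_lines and read_table_after_comment are hypothetical, and the sketch assumes the comment block sits at the top of the file and that '*/' starts its own line:

```python
import pandas as pd


def count_comment_lines(path):
    """Return the number of lines up to and including the first '*/' line."""
    with open(path) as fp:
        for lineno, row in enumerate(fp, start=1):
            if row.startswith('*/'):
                return lineno
    return 0  # no closing '*/' found, so skip nothing


def read_table_after_comment(path, **kwargs):
    """Read a tab-separated file, skipping the leading comment block."""
    return pd.read_table(path, skiprows=count_comment_lines(path), **kwargs)
```

This opens each file twice, once to count and once to parse, which is usually acceptable for small files; passing the open file object as above avoids the second pass.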

Answer