I want to read and process a csv file with pandas. The file (as seen below) contains multiple header lines which are indicated by a #
tag. I can import that file easily by using
import pandas as pd file = "data.csv" data = pd.read_csv(file, delimiter="s+", names=["Time", "Cd", "Cs", "Cl", "CmRoll", "CmPitch", "CmYaw", "Cd(f)", "Cd(r)", "Cs(f)", "Cs(r)", "Cl(f)", "Cl(r)"], skiprows=13)
However, I have a lot of such files with different header names and I don’t want to name them (Time Cd Cs ...
) manually. Also the number of commented lines is different between each file. So I want to automate that task.
Do I have to use something like regular expression here, before passing the data into a pandas dataframe?
Thanks for any advise.
And yes, the header names are also beginning with an #
.
data.csv:
# Force coefficients # dragDir : (9.9735673312816520e-01 7.2660490528994301e-02 0.0000000000000000e+00) # sideDir : (0.0000000000000000e+00 0.0000000000000000e+00 -1.0000000000000002e+00) # liftDir : (-7.2660490528994315e-02 9.9735673312816520e-01 0.0000000000000000e+00) # rollAxis : (9.9735673312816520e-01 7.2660490528994301e-02 0.0000000000000000e+00) # pitchAxis : (0.0000000000000000e+00 0.0000000000000000e+00 -1.0000000000000002e+00) # yawAxis : (-7.2660490528994315e-02 9.9735673312816520e-01 0.0000000000000000e+00) # magUInf : 4.5000000000000000e+01 # lRef : 5.9399999999999997e-01 # Aref : 3.5639999999999999e-03 # CofR : (1.4999999999999999e-01 0.0000000000000000e+00 0.0000000000000000e+00) # # Time Cd Cs Cl CmRoll CmPitch CmYaw Cd(f) Cd(r) Cs(f) Cs(r) Cl(f) Cl(r) 5e-06 1.8990180226147195e+00 1.4919925634649792e-11 2.1950119509976829e+00 -1.1085971520784955e-02 -1.0863798447281650e+00 9.5910040927874810e-03 9.3842303978657482e-01 9.6059498282814471e-01 9.5910041002474442e-03 -9.5910040853275178e-03 1.1126130770676479e-02 2.1838858202270064e+00 1e-05 2.1428508927716594e+00 1.0045114197556737e-08 2.5051633252700962e+00 -1.2652317494411272e-02 -1.2367567798452046e+00 1.0822379290263353e-02 1.0587731288914184e+00 1.0840777638802410e+00 1.0822384312820453e-02 -1.0822374267706254e-02 1.5824882789843508e-02 2.4893384424802525e+00 ...
Advertisement
Answer
What about extracting the header before you read the file?
We only assume that your header lines start with #
. Extraction of the header as well as its position in the file is automated. We also ensure that no more lines than necessary are read (except the first data line).
with open(file) as f: line = f.readline() cnt = 0 while line.startswith('#'): prev_line = line line = f.readline() cnt += 1 # print(prev_line) header = prev_line.strip().lstrip('# ').split() df = pd.read_csv(file, delimiter="s+", names=header, skiprows=cnt )
With this, you can also proccess the other header lines. It also gives you the position of the header in the file.