I have two sort of files, xml files
and txt files
. The files have a date in their name. If the date of the xml file
matches the date of a txt file
I want to open the txt file
do some processing and write the output to a list. After that I want to change the xml file
. Multiple xml files
can have the same date but the txt file
is unique so this means that more then 1 xml file
can be linked with a txt file.
Right now I have a problem. my to_csv
list contains data of both 20200907 and 20201025. I don’t want it to work like that. I want my to_csv
list just do one file (and thus one date) at a time.
output_xml = r"c:desktopenergyXML_Output"
output_txt = r"c:desktopenergyTXT_Output"
xml_name = os.listdir(output_xml )
txt_name = os.listdir(output_txt)
txt_name = [x.replace('-', '') for x in txt_name] #remove the - in the filenames
# Extract the date from the xml and txt files.
xml_dates = []
for file in xml_name:
find = re.search("_(.d+)-", file).group(1)
xml_dates.append(find)
txt_dates = []
for file in txt_name:
find = re.search("MM(.+?)AB", file).group(1)
txt_dates.append(find)
#THIS IS SOME REPRODUCABLE OUTPUT FROM WHAT IS RECEIVED FROM ABOVE SNIPPET.
xml_dates = ['20200907', '20200908', '20201025', '20201025', '20201025', '20201025']
txt_dates = ['20200907', '20201025']
to_csv = []
for date_xml in xml_dates:
for date_txt in txt_dates:
if date_xml == date_txt:
match_txt = [s for s in txt_name if date_txt in s] # matching txt file
match_xml = [s for s in xml_name if date_xml in s] # matching xml file
match_txt_temp = match_txt[0]
match_txt_score = [match_txt_temp[:6]+'-'+match_txt_temp[6:8]+'-'+match_txt_temp[8:10]+'-'+match_txt_temp[10:12]+match_txt_temp[12:]]
with open(output_txt + "/" + match_txt_score[0], "r") as outer:
reader = csv.reader(outer, delimiter="t")
for row in reader:
read = [row for row in reader if row]
for row in read:
energy_level = row[20]
if energy_level > 250:
to_csv.append(row)
print(to_csv)
Current output:
[['1', '2', '3', '20200907', '4', '5'],
['1', '2', '3', '20200907', '4', '5'],
['1', '2', '3', '20200907', '4', '5'],
['1', '2', '3', '20201025, '4', '5'],
['1', '2', '3', '20201025, '4', '5']]
Desired output:
[[['1', '2', '3', '20200907', '4', '5'],
['1', '2', '3', '20200907', '4', '5'],
['1', '2', '3', '20200907', '4', '5']],
['1', '2', '3', '20201025, '4', '5'],
['1', '2', '3', '20201025, '4', '5']]
Advertisement
Answer
You said that you have only one txt file by date and only want to process xml files if they are linked to a txt file. That means that one single loop over txt_dates is enough:
for date_txt in txt_dates:
date_xml = date_txt
match_txt = [s for s in txt_name if date_txt in s] # the matching txt file
match_xml = [s for s in xml_name if date_xml in s] # possible matching xml files
if len(match_xml) == 0: # no matching xml files
continue
match_txt_temp = match_txt[0]
match_txt_score = [match_txt_temp[:6]+'-'+match_txt_temp[6:8]+'-'
+match_txt_temp[8:10]+'-'+match_txt_temp[10:12]
+match_txt_temp[12:]]
# prepare a new list for that date
curr = list()
with open(output_txt + "/" + match_txt_score[0], "r") as outer:
reader = csv.reader(outer, delimiter="t")
for row in reader:
read = [row for row in reader if row]
for row in read:
energy_level = row[20]
if energy_level > 250:
curr.append(row)
if len(curr) > 0: # if the current date list is not empty append it
to_csv.append(curr)
print(to_csv)
BEWARE: as what you have provided is not a reproducible example I could not test the above code and typos are possible…