Hello everyone,
I am using SWAN to analyze event data stored in .h5 files. After the analysis the code saves some variables to my CERNbox. Everything works well; however, since I am accessing thousands of files, access speed is crucial. That is when I noticed that the first ~50 files open very quickly and then the whole process slows down. Is there a way to keep my code fast the whole way through?
import h5py
from os import listdir
import numpy as np
import math
import datetime as dt
import csv

# Define which days I want to look at
startDate = dt.date(2022, 6, 10)
endDate = dt.date(2022, 6, 20)
nextDay = dt.timedelta(days=1)

# Prepare the csv that will hold the analysis data
csvfile = open('Test_Report.csv', 'w', newline='')
writer = csv.writer(csvfile)
writer.writerow(['Date', 'LaserEnergy'])  # header row

for nDays in range((endDate - startDate).days + 1):  # cycle through days
    folderDate = startDate + nDays * nextDay
    # Set the folder I want to go through
    folder = ('/eos/experiment/awake/event_data/' + str(folderDate.year) + '/'
              + folderDate.strftime('%m') + '/' + folderDate.strftime('%d'))
    try:  # just make sure that we can open the folder and list its files
        files = listdir(folder)
        print(folderDate.strftime('%d/%m/%Y'), len(files))
    except OSError:
        print('Folder ' + folder + ' could not be opened')
        files = []
    count = 0  # simple way to count how many events there are each day
    for nFile in files:  # cycle through all files in the folder
        f = h5py.File(folder + '/' + nFile, 'r')  # open the h5 file
        print('\r %d' % count, end='\r')
        # Some more fields are read here, not important for this minimal example
        try:
            LaserEnergy = f['AwakeEventData/EMETER04/Acq/value'][0]
        except KeyError:
            LaserEnergy = -1
        writer.writerow([folderDate.strftime('%d/%m/%Y'), LaserEnergy])
        f.close()
        count = count + 1

csvfile.close()
print('DONE!')
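To pin down exactly when the slowdown sets in, it may help to time each file open individually and look for the point where the durations jump. Below is a minimal, self-contained timing sketch; the `timed` helper is hypothetical (not part of the original script), and `sum(range(...))` is used only as a cheap stand-in for the `h5py.File(...)` call so the sketch runs anywhere:

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

# In the real loop this would wrap the file open:
#   f, dt_open = timed(h5py.File, folder + '/' + nFile, 'r')
# Here a cheap stand-in workload keeps the sketch self-contained.
durations = []
for i in range(200):
    _, dt_open = timed(sum, range(10_000))
    durations.append(dt_open)

mean = sum(durations) / len(durations)
outliers = [i for i, d in enumerate(durations) if d > 10 * mean]
print(f'{len(durations)} opens, mean {mean:.6f}s, {len(outliers)} outliers')
```

Plotting or printing `durations` against the file index should show whether the slowdown is a sharp step after ~50 files (which would point at caching or connection limits on the storage side) or a gradual drift.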
Thank you very much,
Jan