Euskalmet open data

javi | May 8, 2025, 9:07 a.m.

Euskalmet is the Basque Agency of Meteorology. It collects meteorological data every day at several stations. A database containing these data for the years 2003 to 2012 is provided through the Open Data Euskadi service. This is a rich source of information that is worth exploring.

The database is too large to be digested in one go. It contains data about air direction, temperature and visibility, among other variables. All the variables are recorded every ten minutes throughout the day. Here, I show how to handle two main operations that are required to make the most of the database:

  1. Merge the source XML files containing the data by year. This step can be performed with the Python script merge.py.
    merge.py
    
    from lxml import etree
    import zipfile
    from datetime import date, time, datetime, timedelta
    import calendar
    
    # user defined variables
    station = "C040"
    y0 = 2008
    yN = 2012
    
    # define the sequence of times (every 10 minutes, 144 per day)
    # to be used with potential missing files and data
    allHours = [''] * 144
    tlast = datetime.combine(date.today(), time(0, 0))
    allHours[0] = tlast.strftime("%H:%M")
    for i in range(1, 144):
      tlast += timedelta(minutes = 10)
      allHours[i] = tlast.strftime("%H:%M")
    
    # main loop
    fout = open('%s.csv' % station, 'w')
    
    for yy in range(y0, yN + 1):
      print(yy)
      for imonth in range(1, 13):
        zipf = zipfile.ZipFile("../%s/%s_%s.zip" % (yy, station, yy))
        isNA = False
        try:
          xmlf = zipf.open("%s/%s_%s_%s.xml" % (station, station, yy, imonth))
        except KeyError:
          try: # try again with a zero-padded month ('01' instead of '1')
            xmlf = zipf.open("%s/%s_%s_%02d.xml" % (station, station, yy, imonth))
          except KeyError: # the file does not exist (missing values)
            isNA = True
        if not isNA: # if file exists
          xmlData = etree.parse(xmlf)
          monthDays = xmlData.findall(".//dia")
          print(len(monthDays))
          for day in monthDays:
            dayLabel = day.attrib['Dia']
            hours = day.findall("hora")
            for hour in hours:
              # an absent temperature element is written as an empty field (NA)
              temp = hour.findtext("Meteoros/Tem.Aire._a_620cm", default = '')
              fout.write("%s;%s;%s\n" % (dayLabel, hour.attrib['Hora'], temp))
          xmlf.close()
        else: # if file does not exist (missing data for that year and month)
          ndays = calendar.monthrange(yy, imonth)[1]
          monthDays = range(1, ndays + 1)
          print("%s (missing)" % len(monthDays))
          for day in monthDays:
            # zero-padded date label, e.g. 2008-01-05
            dayLabel = "%s-%02d-%02d" % (yy, imonth, day)
            for hour in allHours:
              temp = '' # NA
              fout.write("%s;%s;%s\n" % (dayLabel, hour, temp))
        zipf.close()
    fout.close()
    
  2. Filter the data by variable, date and time. This step can be carried out using the R script select.R.
    select.R
    
    library("zoo")
    
    a <- read.csv(file = "C040.csv", header = FALSE, sep = ";", 
      colClasses = c("character", "character", "numeric"))
    
    d0 <- strsplit(a[1,1], "-")[[1]]
    h0 <- strsplit(a[1,2], ":")[[1]]
    d0 <- ISOdate(year = d0[1], month = d0[2], day = d0[3], 
      hour = h0[1], min = h0[2], sec = 0)
    dN <- strsplit(a[nrow(a),1], "-")[[1]]
    hN <- strsplit(a[nrow(a),2], ":")[[1]]
    dN <- ISOdate(year = dN[1], month = dN[2], day = dN[3], 
      hour = hN[1], min = hN[2], sec = 0)
    
    dates <- seq(d0, dN, by = "10 min")
    x <- zoo(a[,3], dates)
    
    # user defined parameters
    
    yy1 <- 2008
    mm1 <- 1
    dd1 <- 1
    h1 <- 0
    m1 <- 00
    yy2 <- 2011
    mm2 <- 12
    dd2 <- 31
    h2 <- 23
    m2 <- 50
    stat <- "mean"
    tscl <- "daily"
    
    p <- c(yy1, mm1, dd1, h1, m1, yy2, mm2, dd2, h2, m2, stat, tscl)
    
    # initial and end dates
    
    d0 <- ISOdate(year=p[1], month=p[2], day=p[3], hour=0, min=0, sec=0)
    dN <- ISOdate(year=p[6], month=p[7], day=p[8], hour=23, min=50, sec=0)
    
    a <- window(x, start = d0, end = dN)
    a0 <- a
    
    # time interval
    
    times <- as.numeric(format(time(a), "%H.%M"))
    
    t0 <- paste(p[4], p[5], sep =".")
    tN <- paste(p[9], p[10], sep = ".")
    ft <- cut(times, 
      breaks = unique(as.numeric(c(0, as.numeric(t0), as.numeric(tN) + 0.05, 23.55))), 
      include.lowest = TRUE, right = FALSE)
    #table(ft)
    
    length(ft) == length(a)
    
    if (as.numeric(t0) == 0) {
      a <- split(a, f = ft)[[1]]
    } else
      a <- split(a, f = ft)[[2]]
    
    # statistic and time scale
    
    switch(p[11],
      "mean"    = FUN <- mean,
      "median"  = FUN <- median,
      "variance" = FUN <- var,
      "maximum"   = FUN <- max,
      "minimum"   = FUN <- min)
    
    switch(p[12],
      "hourly" = freq <- "60 mins",
      "daily"  = freq <- "1 days",
      "weekly" = freq <- "1 weeks",
      "monthly" = freq <- "1 months")
    
    fd <- cut(time(a), breaks = freq, 
      include.lowest = TRUE, right = FALSE)
    #table(fd)
    
    # split the series and obtain the statistic
    
    la <- split(a, f = fd)
    
    if (p[11] == "min-max")
    {
      la.min <- lapply(X = la, FUN = min, na.rm = TRUE)
      la.max <- lapply(X = la, FUN = max, na.rm = TRUE)
    
      xout1 <- zoo(unlist(la.min), as.Date(names(la.min)))
      xout1[is.nan(xout1)] <- NA
      xout1[!is.finite(xout1)] <- NA
      xout2 <- zoo(unlist(la.max), as.Date(names(la.max)))
      xout2[is.nan(xout2)] <- NA
      xout2[!is.finite(xout2)] <- NA
    
      fout <- matrix(nrow = length(xout1), ncol = 3)
      colnames(fout) <- c("date", "minimum", "maximum")
      fout[,1] <- gsub("-", "", time(xout1))
      fout[,2] <- round(xout1, 2)
      fout[,3] <- round(xout2, 2)
    } else {
      la.stat <- lapply(X = la, FUN = FUN, na.rm = TRUE)
    
      xout <- zoo(unlist(la.stat), as.Date(names(la.stat)))
      xout[is.nan(xout)] <- NA
    
      fout <- matrix(nrow = length(xout), ncol = 2)
      colnames(fout) <- c("date", p[11])
      fout[,1] <- gsub("-", "", time(xout))
      fout[,2] <- round(xout, 2)
    }
    
    write.csv(fout, file = "tmp.csv", row.names = FALSE, quote = FALSE)
    

The Python script assumes that the source files are located in the parent directory of the one containing the script, with one folder per year. The zip file for each year must already be unzipped, while the files containing the data for each station remain zipped. The script deals with potential missing files and with the different naming conventions used in the source files; for example, the month of January is sometimes denoted '1' and other times '01'. Thus, all that is needed is to define the variables 'station' (name of the meteorological station), 'y0' and 'yN' (first and last year in the sample, respectively) at the top of the script.
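The fallback between the two month naming conventions can be isolated in a small helper. The following is only a minimal sketch and is not part of merge.py: the function name open_month is hypothetical, the archive layout (station/station_year_month.xml) is assumed from the paths used in the script, and the archive contents in the usage part are made up.

```python
import io
import zipfile

def open_month(zipf, station, year, month):
    """Return an open file for the month's XML file, or None if it is missing.

    The unpadded name ('..._1.xml') is tried first, then the zero-padded
    one ('..._01.xml'). ZipFile.open raises KeyError for an unknown name.
    """
    for fmt in ("%s/%s_%s_%d.xml", "%s/%s_%s_%02d.xml"):
        try:
            return zipf.open(fmt % (station, station, year, month))
        except KeyError:
            continue
    return None

# Usage with an in-memory archive holding two months (made-up contents)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("C040/C040_2008_01.xml", "<mes/>")
    z.writestr("C040/C040_2008_11.xml", "<mes/>")

found = []
with zipfile.ZipFile(buf) as z:
    for m in range(1, 13):
        f = open_month(z, "C040", 2008, m)
        if f is not None:
            found.append(m)
            f.close()
print(found)
```

Months for which the helper returns None are the ones that would be filled with empty values, one row per ten-minute slot.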

A useful feature of the R script is that it returns the average values observed in a time interval. For example, we may be interested in temperature averages observed during daytime, nighttime or a particular interval, e.g., from 8:00 AM to 10:00 AM. In addition to the mean, other relevant statistics such as the median, the variance, and the minimum and maximum values can be obtained from the database. The user-defined parameters are, in the order shown in the script: the starting year, month and day of the sample, followed by the start of the time interval (hour and minute); the ending year, month and day, followed by the end of the time interval (hour and minute); the statistic; and the time scale.
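The selection logic can also be sketched outside R. The sketch below is written in Python, for consistency with merge.py, and is not a port of select.R: the function name daily_stat and the sample rows are hypothetical. It assumes rows in the date;time;value layout written by merge.py, with zero-padded HH:MM times, keeps only the observations inside a time-of-day interval (both ends included) and applies a statistic per day, skipping empty (missing) values.

```python
import statistics

def daily_stat(rows, t_from, t_to, stat=statistics.mean):
    """Apply `stat` per day to the values observed within [t_from, t_to].

    rows: iterable of (date, 'HH:MM', value-as-text) tuples.
    Zero-padded 'HH:MM' strings compare correctly as plain text.
    Days with no valid value in the interval map to None.
    """
    groups = {}
    for day, hhmm, value in rows:
        if t_from <= hhmm <= t_to:
            values = groups.setdefault(day, [])
            if value != "":  # an empty field marks a missing observation
                values.append(float(value))
    return {d: (round(stat(v), 2) if v else None) for d, v in groups.items()}

rows = [
    ("2009-01-01", "07:50", "3.0"),  # outside the interval
    ("2009-01-01", "08:00", "4.0"),
    ("2009-01-01", "08:10", ""),     # missing value
    ("2009-01-01", "10:00", "6.0"),
    ("2009-01-02", "09:00", ""),     # day with only missing data
]
result = daily_stat(rows, "08:00", "10:00")
print(result)
```

Passing another function, for instance stat=max or stat=statistics.median, changes the statistic in the same way the 'stat' parameter does in the R script.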

The form below illustrates the output provided by these scripts for one of the series: the temperature observed at the station labelled C040 from 2009 to 2011. Daily averages for this station and subsample are shown in the plot below. The source XML files are already merged; submitting the form runs the R script and displays the output series.

The form below is currently not available.

Station C040: daily average temperature, 2009-2011 (°C)
