Euskalmet open data

javi | May 8, 2025, 9:07 a.m.

Euskalmet is the Basque Agency of Meteorology. It collects meteorological data daily at several stations, and a database covering the years 2003 to 2012 is available through the Open Data Euskadi service. It is a rich source of information that is worth exploring.

The database is too large to be digested in one go. It contains data on air direction, temperature and visibility, among other variables, all of them recorded every ten minutes of every day. Here, I show how to handle the two main operations required to make the most of the database:

Merge the source XML files containing the data by year. This step can be performed with the Python script merge.py.

merge.py


from lxml import etree
import zipfile
from datetime import date, time, datetime, timedelta
import calendar

# user-defined variables
station = "C040"
y0 = 2008
yN = 2012

# define the sequence of times (every 10 minutes, 144 per day)
# to be used with potential missing files and data
allHours = [''] * 144
tlast = datetime.combine(date.today(), time(0, 0))
allHours[0] = tlast.strftime("%H:%M")
for i in range(1, 144):
  tlast += timedelta(minutes = 10)
  allHours[i] = tlast.strftime("%H:%M")

# main loop
fout = open('%s.csv' % station, 'w')

for yy in range(y0, yN + 1):
  print(yy)
  for imonth in range(1, 13):
    zipf = zipfile.ZipFile("../%s/%s_%s.zip" % (yy, station, yy))
    isNA = False
    try:
      xmlf = zipf.open("%s/%s_%s_%s.xml" % (station, station, yy, imonth))
    except KeyError:
      try: # check whether month '1' is denoted '01'
        xmlf = zipf.open("%s/%s_%s_0%s.xml" % (station, station, yy, imonth))
      except KeyError: # the file does not exist (missing values)
        isNA = True
    if not isNA: # if file exists
      xmlData = etree.parse(xmlf)
      monthDays = xmlData.findall(".//dia")
      print(len(monthDays))
      for day in monthDays:
        dayLabel = day.attrib['Dia']
        hours = day.findall("hora")
        for hour in hours:
          # an absent temperature element is written as an empty field (NA)
          temp = hour.findtext("Meteoros/Tem.Aire._a_620cm", default='')
          fout.write("%s;%s;%s\n" % (dayLabel, hour.attrib['Hora'], temp))
      xmlf.close()
    else: # if file does not exist (missing data for that year and month)
      ndays = calendar.monthrange(yy, imonth)[1]
      monthDays = range(1, ndays + 1)
      print("%s (missing)" % len(monthDays))
      for day in monthDays:
        # zero-padded date label, e.g. 2008-01-05
        dayLabel = "%s-%02d-%02d" % (yy, imonth, day)
        for hour in allHours:
          temp = '' # NA
          fout.write("%s;%s;%s\n" % (dayLabel, hour, temp))
    zipf.close()
fout.close()


Filter the data by variable, date and time. This step can be carried out with the R script select.R.

select.R


library("zoo")

a <- read.csv(file = "C040.csv", header = FALSE, sep = ";", 
  colClasses = c("character", "character", "numeric"))

d0 <- strsplit(a[1,1], "-")[[1]]
h0 <- strsplit(a[1,2], ":")[[1]]
d0 <- ISOdate(year = d0[1], month = d0[2], day = d0[3], 
  hour = h0[1], min = h0[2], sec = 0)
dN <- strsplit(a[nrow(a),1], "-")[[1]]
hN <- strsplit(a[nrow(a),2], ":")[[1]]
dN <- ISOdate(year = dN[1], month = dN[2], day = dN[3], 
  hour = hN[1], min = hN[2], sec = 0)

dates <- seq(d0, dN, by = "10 min")
x <- zoo(a[,3], dates)

# user defined parameters

yy1 <- 2008
mm1 <- 1
dd1 <- 1
h1 <- 0
m1 <- 00
yy2 <- 2011
mm2 <- 12
dd2 <- 31
h2 <- 23
m2 <- 50
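# stat: "mean", "median", "variance", "maximum", "minimum" or "min-max"
stat <- "mean"
# tscl: "hourly", "daily", "weekly" or "monthly"
tscl <- "daily"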

p <- c(yy1, mm1, dd1, h1, m1, yy2, mm2, dd2, h2, m2, stat, tscl)

# initial and end dates

d0 <- ISOdate(year=p[1], month=p[2], day=p[3], hour=0, min=0, sec=0)
dN <- ISOdate(year=p[6], month=p[7], day=p[8], hour=23, min=50, sec=0)

a <- window(x, start = d0, end = dN)
a0 <- a

# time interval

times <- as.numeric(format(time(a), "%H.%M"))

t0 <- paste(p[4], p[5], sep =".")
tN <- paste(p[9], p[10], sep = ".")
ft <- cut(times, 
  breaks = unique(as.numeric(c(0, as.numeric(t0), as.numeric(tN) + 0.05, 23.55))), 
  include.lowest = TRUE, right = FALSE)
#table(ft)

length(ft) == length(a)

if (as.numeric(t0) == 0) {
  a <- split(a, f = ft)[[1]]
} else
  a <- split(a, f = ft)[[2]]

# statistic and time scale

switch(p[11],
  "mean"    = FUN <- mean,
  "median"  = FUN <- median,
  "variance" = FUN <- var,
  "maximum"   = FUN <- max,
  "minimum"   = FUN <- min)

switch(p[12],
  "hourly" = freq <- "60 mins",
  "daily"  = freq <- "1 days",
  "weekly" = freq <- "1 weeks",
  "monthly" = freq <- "1 months")

fd <- cut(time(a), breaks = freq, 
  include.lowest = TRUE, right = FALSE)
#table(fd)

# split the series and obtain the statistic

la <- split(a, f = fd)

if (p[11] == "min-max")
{
  la.min <- lapply(X = la, FUN = min, na.rm = TRUE)
  la.max <- lapply(X = la, FUN = max, na.rm = TRUE)

  xout1 <- zoo(unlist(la.min), as.Date(names(la.min)))
  xout1[is.nan(xout1)] <- NA
  xout1[!is.finite(xout1)] <- NA
  xout2 <- zoo(unlist(la.max), as.Date(names(la.max)))
  xout2[is.nan(xout2)] <- NA
  xout2[!is.finite(xout2)] <- NA

  fout <- matrix(nrow = length(xout1), ncol = 3)
  colnames(fout) <- c("date", "minimum", "maximum")
  fout[,1] <- gsub("-", "", time(xout1))
  fout[,2] <- round(xout1, 2)
  fout[,3] <- round(xout2, 2)
} else {
  la.stat <- lapply(X = la, FUN = FUN, na.rm = TRUE)

  xout <- zoo(unlist(la.stat), as.Date(names(la.stat)))
  xout[is.nan(xout)] <- NA

  fout <- matrix(nrow = length(xout), ncol = 2)
  colnames(fout) <- c("date", p[11])
  fout[,1] <- gsub("-", "", time(xout))
  fout[,2] <- round(xout, 2)
}

write.csv(fout, file = "tmp.csv", row.names = FALSE, quote = FALSE)
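
Returning to merge.py, its inner loop walks the `dia` elements of a monthly file, then the `hora` elements of each day, and writes one `date;time;value` row per observation. The sketch below reproduces that walk on a tiny made-up document using only the standard library (`xml.etree.ElementTree` instead of lxml). The root tag `mes`, the sample values and the helper name `extract_rows` are assumptions for illustration; the real files may be laid out differently around the `dia` elements.

```python
import xml.etree.ElementTree as ET

# Made-up sample: the root tag <mes> and all values are assumptions.
SAMPLE = """\
<mes>
  <dia Dia="2008-01-01">
    <hora Hora="00:00">
      <Meteoros><Tem.Aire._a_620cm>7.1</Tem.Aire._a_620cm></Meteoros>
    </hora>
    <hora Hora="00:10">
      <Meteoros></Meteoros>
    </hora>
  </dia>
</mes>
"""


def extract_rows(xml_text, variable="Tem.Aire._a_620cm"):
    """Return 'date;time;value' rows, with an empty value when absent."""
    root = ET.fromstring(xml_text)
    rows = []
    for day in root.findall(".//dia"):
        for hour in day.findall("hora"):
            # default="" keeps a missing reading as an empty field (NA)
            value = hour.findtext("Meteoros/" + variable, default="")
            rows.append("%s;%s;%s" % (day.attrib["Dia"], hour.attrib["Hora"], value))
    return rows


rows = extract_rows(SAMPLE)
for row in rows:
    print(row)
```

Passing a different tag name as `variable` would extract another series, provided that tag exists under `Meteoros` in the source files.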


The Python script assumes that the source files are in the parent directory of the one where the Python file is located and run. The zip file for each year must already be unzipped, while the files containing the data for each station remain zipped. The script deals with potential missing files and with the different naming conventions used in the source files; for example, the month of January is sometimes denoted '1' and other times '01'. Thus, all that is needed is to define the variables 'station' (name of the meteorological station), 'y0' and 'yN' (first and last year in the sample, respectively) at the top of the script.
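
The two safeguards just described, trying both month spellings and filling a missing month with empty rows, can be isolated in two small helpers. This is a minimal sketch rather than the script itself: `open_month` and `missing_rows` are hypothetical names, and the in-memory archive with its placeholder content only stands in for a real station file.

```python
import calendar
import io
import zipfile


def open_month(zipf, station, year, month):
    """Return an open file for the month's XML, or None if it is missing.

    Tries the unpadded month ('1') first and the zero-padded one ('01') second.
    """
    for label in (str(month), "%02d" % month):
        name = "%s/%s_%s_%s.xml" % (station, station, year, label)
        try:
            return zipf.open(name)
        except KeyError:  # raised by ZipFile.open when the member is absent
            continue
    return None


def missing_rows(year, month):
    """Yield empty 'date;time;' rows, 144 per day, for a missing month."""
    ndays = calendar.monthrange(year, month)[1]
    for day in range(1, ndays + 1):
        for slot in range(144):
            hh, mm = divmod(slot * 10, 60)
            yield "%04d-%02d-%02d;%02d:%02d;" % (year, month, day, hh, mm)


# Tiny in-memory archive standing in for a station zip (made-up content).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("C040/C040_2008_01.xml", "<mes/>")

with zipfile.ZipFile(buf) as z:
    found = open_month(z, "C040", 2008, 1)   # found under the '01' name
    absent = open_month(z, "C040", 2008, 2)  # no file for February
    if found is not None:
        found.close()

feb_rows = list(missing_rows(2008, 2))       # 2008 is a leap year: 29 days
print(found is not None, absent is None, len(feb_rows))
```

Writing empty rows for a missing month keeps the merged file on a regular ten-minute grid, which is what the R script relies on when it rebuilds the time index from the first and last rows.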

A useful feature of the R script is that it returns statistics for the values observed within a given time interval. For example, we may be interested in temperature averages observed during daytime, nighttime or over a particular interval, e.g., from 8:00 AM to 10:00 AM. In addition to the mean, other relevant statistics such as the median, variance, minimum and maximum can be computed. The user-defined parameters are, in the order shown in the script: the starting year, month and day of the sample, followed by the starting hour and minute of the time interval; the ending year, month and day, followed by the ending hour and minute of the time interval; the statistic; and the time scale.
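
To make the idea concrete, here is a rough Python counterpart of the core of that filtering, assuming rows in the `date;HH:MM;value` layout written by merge.py. It is a sketch, not a port of select.R: the helper name `daily_stat` and the sample values are made up, only the daily time scale is covered, and both ends of the time interval are taken as inclusive.

```python
import statistics
from collections import defaultdict
from datetime import time


def daily_stat(rows, start, end, stat=statistics.mean):
    """Aggregate 'date;HH:MM;value' rows by day within [start, end].

    Empty values are treated as missing and skipped; a day with no valid
    observation inside the interval maps to None.
    """
    by_day = defaultdict(list)
    for row in rows:
        day, hhmm, value = row.strip().split(";")
        hh, mm = hhmm.split(":")
        if start <= time(int(hh), int(mm)) <= end:
            values = by_day[day]  # registers the day even if nothing is valid
            if value != "":
                values.append(float(value))
    return {day: (round(stat(v), 2) if v else None)
            for day, v in sorted(by_day.items())}


# Made-up observations for illustration.
rows = [
    "2009-01-01;07:50;3.0",  # before the interval
    "2009-01-01;08:00;4.0",
    "2009-01-01;09:00;5.0",
    "2009-01-01;10:00;6.0",
    "2009-01-01;10:10;9.0",  # after the interval
    "2009-01-02;08:00;",     # missing value
    "2009-01-02;08:10;2.5",
    "2009-01-03;09:00;",     # only a missing value on this day
]

morning_mean = daily_stat(rows, time(8, 0), time(10, 0))
morning_max = daily_stat(rows, time(8, 0), time(10, 0), stat=max)
print(morning_mean)
print(morning_max)
```

Any function that maps a list of numbers to a single number can be passed as `stat`, for instance `statistics.median` or `min`.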

The form below illustrates the output provided by these scripts for one of the series: temperature observed in the station labelled C040 from 2009 to 2011. Daily averages for this station and subsample are shown in the plot below. The source xml files are already merged; submitting the form runs the R script and displays the output series.

The form below is currently not available.

Station C040: daily average temperature 2009-2011 (°C)

