Euskalmet is the Basque Agency of Meteorology. It collects meteorological data at several stations. A database containing these data for the years 2003 to 2012 is provided through the service Open Data Euskadi. This is a rich and interesting source of information that is worth exploring.
The database is too large to be processed in one go. It contains data about air direction, temperature and visibility, among other variables. All the variables are recorded every ten minutes, every day. Here, I show how to handle the two main operations that are required to make the most of the database:
- Merge the source XML files containing the data by year. This step can be performed with the Python script merge.py.
- Filter the data by variable, date and time. This step can be carried out with the R script select.R.
The Python script assumes that the source files are in the parent directory of the directory where the Python file is located. The zip files for each year must already be unzipped, while the files containing the data for each station remain zipped. The script deals with potential missing files and with the different naming conventions used in the source files; for example, the month January is sometimes denoted '1' and other times '01'. Thus, all that is needed is to define the variables 'station' (name of the meteorological station), 'y0' and 'yN' (first and last year in the sample, respectively) at the top of the script.
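To illustrate how missing files and the two month conventions can be handled, here is a minimal sketch in Python. It is not the code of merge.py: the file name pattern, the function names and the `data_dir` argument are hypothetical and chosen only for illustration. Only the '1' versus '01' variation and the parameters 'station', 'y0' and 'yN' come from the description above.

```python
from pathlib import Path


def candidate_names(station, year, month):
    """Return the file names to try for one station, year and month.

    The pattern '<station>_<year>_<month>.xml' is an assumption made for
    this sketch; the month is tried both as '1' and as '01'.
    """
    return [
        f"{station}_{year}_{month}.xml",
        f"{station}_{year}_{month:02d}.xml",
    ]


def find_monthly_files(data_dir, station, y0, yN):
    """Collect the files that exist from year y0 to yN (inclusive).

    Returns a list of existing paths and a list of (year, month)
    pairs for which no file was found under either convention.
    """
    found, missing = [], []
    for year in range(y0, yN + 1):
        for month in range(1, 13):
            paths = [Path(data_dir) / name
                     for name in candidate_names(station, year, month)]
            # Keep the first candidate that exists, if any.
            existing = next((p for p in paths if p.exists()), None)
            if existing is None:
                missing.append((year, month))
            else:
                found.append(existing)
    return found, missing
```

Recording the missing months instead of raising an error lets the merge continue over the whole sample, and the gaps can be reported at the end.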
One useful feature of the R script is that it returns the average values observed in a time interval. For example, we may be interested in temperature averages observed during daytime, nighttime or a particular interval, e.g., from 8:00 AM to 10:00 AM. In addition to the mean, other relevant statistics such as the median, the variance and the minimum and maximum values can be fetched from the database. The user-defined parameters are, in the order shown in the script: the starting year, month and day of the sample; the ending year, month and day; the time interval (minutes and hours); the statistic; and the time scale.
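The idea of summarising the readings that fall within a time interval can be sketched as follows. This is an illustration written in Python, not the code of select.R; the function name, the `(timestamp, value)` record layout and the sample readings are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, time
import statistics

# Statistics that can be requested by name.
STATISTICS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "variance": statistics.variance,  # sample variance; needs >= 2 readings
    "min": min,
    "max": max,
}


def daily_statistic(records, start, end, stat="mean"):
    """Summarise readings per day within a time-of-day interval.

    records: iterable of (datetime, value) pairs (assumed layout).
    start, end: datetime.time bounds, both inclusive, with start <= end.
    Returns a dict mapping each date to the requested statistic.
    """
    func = STATISTICS[stat]
    by_day = defaultdict(list)
    for timestamp, value in records:
        if start <= timestamp.time() <= end:
            by_day[timestamp.date()].append(value)
    return {day: func(values) for day, values in sorted(by_day.items())}


# Made-up readings, only to show the call.
records = [
    (datetime(2009, 1, 1, 7, 50), 4.0),
    (datetime(2009, 1, 1, 8, 0), 5.0),
    (datetime(2009, 1, 1, 9, 0), 7.0),
    (datetime(2009, 1, 1, 10, 10), 9.0),
]
morning_means = daily_statistic(records, time(8, 0), time(10, 0), "mean")
print(morning_means)
```

Only the 8:00 and 9:00 readings fall inside the interval, so the value for that day is the mean of 5.0 and 7.0. An interval that crosses midnight (for instance, nighttime) is not handled by this sketch and would need the comparison to be split in two.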
The form below illustrates the output provided by these scripts for one of the series: temperature observed at the station labelled C040 from 2009 to 2011. Daily averages for this station and subsample are shown in the plot below. The source XML files are already merged; submitting the form runs the R script and displays the output series.
Station C040: daily average temperature 2009-2011 (°C)