1. Data access and structure
The goal of digitizing historical weather observations is to extend knowledge and understanding of the decadal and centennial variations of Earth's climate and to support comparisons with paleo-proxy data (Williamson et al., 2018). The data used here are freely available online. They can be downloaded directly from MEGA (footnote 1), a file-sharing service based in New Zealand. All data files are Excel files in the .xlsx format. Altogether, the data comprise 180 to 220 stations (Williamson et al., 2018), shown in Figure 1, and 13 variables; a list of all stations with their coordinates, heights and actual locations is also available online (footnote 2). The earliest years have the fewest stations, and the number increases over the years. Some stations are marked with “(a)”, which means that the respective station measured only precipitation; consequently, the precipitation matrix contains the most stations. Each file holds one year of data, with a small matrix for each station that contains all variables. The values were recorded once a day at 8 a.m., except for variables that are daily values.
[Illustration not included in this excerpt]
Figure 1: Available Indian Climate Stations 1910-1930
Not every yearly file is available: the grey cells in Table 1 show that the files for 1920/21, 1923/24/25 and 1927/28 are still undergoing quality checks and cannot be downloaded yet. Data in the green cells are completely available, while those in the red cells are not. Pressure, wind, wind speed, maximum temperature, minimum temperature, humidity, clouds, precipitation and weather remarks were measured over the whole period. Temperature in shade is available for 1910-1914 and 1916, and the daily mean temperature for 1910-1916. Dry and wet bulb thermometer data are available for 1917-1930; for the dry bulb there are also measurements from 1915. The Indian data thus provide a basis for further analyses, such as data/model comparisons, validation of reanalyses or the study of the early 20th century (Williamson et al., 2018). The examples and code that follow were written for the prepared time series matrix of the variable pressure.
[Illustration not included in this excerpt]
Table 1: Availability of the Indian Climate Station Data from 1910-1930
2. Data processing
Each finished matrix should contain one variable over the whole period for every station that measured that variable, so that the complete time series is held in a single file. There are thus 13 matrices to prepare. Within the scope of this work, three matrices were created (footnote 3): daily mean temperature, precipitation at 8 a.m. and pressure at 8 a.m. (Table 1). The task was to cut the data of the respective variable out of the yearly files and paste them into the previously created mask (footnote 4), in which all settings are already made and only the values have to be filled in. The result of this step is a time series of the respective variable for each station.
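The fill-in step could also be scripted instead of done by hand. The following is only a minimal sketch under stated assumptions, not the procedure used for the published matrices: it assumes that each yearly extract has already been reduced to a data frame with a date column and one column per station. The helper names make_mask and fill_mask, the station names and all values are hypothetical.

```r
# Build an empty "mask": one row per day of the period, one NA column per station.
make_mask <- function(start, end, stations) {
  dates <- seq(as.Date(start), as.Date(end), by = "day")
  mask <- data.frame(date = dates)
  for (s in stations) mask[[s]] <- NA_real_
  mask
}

# Copy the values of one yearly extract into the matching rows and columns.
fill_mask <- function(mask, yearly) {
  rows <- match(yearly$date, mask$date)
  stopifnot(!anyNA(rows))                 # every date must exist in the mask
  for (s in setdiff(names(yearly), "date")) {
    mask[rows, s] <- yearly[[s]]
  }
  mask
}

# Invented example values for two hypothetical stations
mask <- make_mask("1910-01-01", "1930-12-31", c("StationA", "StationB"))
y1910 <- data.frame(date = as.Date(c("1910-01-01", "1910-01-02")),
                    StationA = c(29.91, 29.88),
                    StationB = c(30.02, NA))
series <- fill_mask(mask, y1910)
head(series, 3)
```

Days without an observation stay NA, so gaps such as the years still in quality check remain visible in the finished series instead of being silently closed.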
3. Implementation and plotting in R
Once the matrix is finished, it has to be loaded into the R environment and then plotted. General knowledge of R or RStudio is recommended to follow the steps below. Reading the prepared matrix into R requires the package ‘readxl’. The package is loaded with the command ‘library(readxl)’; if no error occurs, the next steps can be performed. If an error occurs, the package ‘readxl’ has not yet been downloaded and installed. This is done with ‘install.packages()’; after the installation has finished, the package has to be loaded into the R session again as shown before. The command ‘read_excel()’ reads the data into the R environment and assigns them to a chosen name. It is important to skip the first line (‘skip = 1’) so that the header consists of the station names and not the coordinates. To check whether the data were read in correctly, use the command ‘head()’, which shows the first six rows of the respective data frame.
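The install-if-missing logic described above can be written so that the installation only runs when it is needed. This is a minimal sketch; the helper name ensure_package is hypothetical and not part of the original script, while requireNamespace(), install.packages() and library() are functions that ship with R.

```r
# Load a package, installing it first only if it is not yet available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
  invisible(pkg %in% loadedNamespaces())
}

# For this tutorial the call would be:
# ensure_package("readxl")
```

Checking with requireNamespace() first avoids reinstalling the package every time the script is run.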
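# Install the package once (here: 'readxl'), then load it in every new session
install.packages('<package>')
library(<package>)
# Read the prepared matrix; skip the first line so the station names become the header
<name_output> <- read_excel("<data_matrix>", skip = 1)
head(<name_output>)    # check: shows the first six rows
attach(<name_output>)  # make the station columns accessible by name
# Line plot of one station against the time axis
plot(<station> ~ I(as.POSIXct(<time_axis>)), type = 'l', xlab = "Year", ylab = "<variable>",
     main = "<variable> in <station> 1910 - 1930")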
Before the plot (Figure 2) is created, the data frame has to be attached to the R search path. 'attach()' makes the objects in the data frame accessible by their names without writing the name of the data frame each time. When the data are loaded into the R environment, the station names become the header, so each station turns into an object that has to be accessed for plotting. The command 'plot()' sets which object (station) is plotted; 'as.POSIXct()' converts the respective object (time) so that the x-axis shows the years of the time series. 'type = 'l'' draws the data as a line instead of points.
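As an alternative to 'attach()', the data frame can be passed to 'plot()' through its data argument, which avoids adding objects to the search path. The following sketch uses a small invented data frame in place of the prepared matrix; the column names time and Nizamabad and the random values are assumptions for illustration only and are not real observations.

```r
# Invented stand-in for the prepared matrix: a time column and one station column
set.seed(1)
pressure <- data.frame(
  time      = seq(as.Date("1910-01-01"), as.Date("1910-12-31"), by = "day"),
  Nizamabad = 29.9 + rnorm(365, sd = 0.05)   # random values, not real data
)

# Convert the time column once; as.POSIXct() accepts Date objects
pressure$time <- as.POSIXct(pressure$time)

# The formula is evaluated inside the data frame, so attach() is not required
plot(Nizamabad ~ time, data = pressure, type = "l",
     xlab = "Time", ylab = "Pressure",
     main = "Pressure in Nizamabad 1910 (synthetic data)")
```

With the real matrix, the data frame returned by 'read_excel()' would take the place of the invented one, and the station and time column names would have to match its header.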
The following plot shows a time series of the pressure at 8 a.m. in Nizamabad for all available years (Table 1). It was produced with the code explained above, which is available as a commented and filled-in script (footnote 5). This short tutorial explained the structure and availability of the initial data. With chapter 2 and the available mask, processing the data quickly and easily should not pose any problems. Further plots of every variable can be produced with the short code described above: copy the code into your R script and fill in the placeholders, which are marked with angle brackets ("<xyz>"). The script is then ready and the plots can be produced.
[Illustration not included in this excerpt]
Figure 2: Pressure Nizamabad 1910-1930
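Producing the further plots mentioned above can be automated with a loop over the station columns. This is a sketch under assumptions: the data frame, its column names and its values are invented, and plot_station is a hypothetical helper, not part of the author's script.

```r
# Invented stand-in for a prepared matrix with a time column and two stations
obs <- data.frame(
  time     = as.POSIXct(seq(as.Date("1910-01-01"), by = "day", length.out = 60)),
  StationA = seq(29.80, 29.95, length.out = 60),
  StationB = seq(30.05, 29.90, length.out = 60)
)

# Write one line plot per station to a PDF file and return the file path
plot_station <- function(df, station, variable, out_dir = tempdir()) {
  path <- file.path(out_dir, paste0(variable, "_", station, ".pdf"))
  pdf(path, width = 8, height = 5)
  on.exit(dev.off())                      # close the device even if plot() fails
  plot(df$time, df[[station]], type = "l", xlab = "Time", ylab = variable,
       main = paste(variable, "in", station))
  path
}

stations <- setdiff(names(obs), "time")
files <- vapply(stations, function(s) plot_station(obs, s, "Pressure"),
                character(1))
```

Writing each plot to its own file keeps the results for all stations without having to save them manually from the plot window.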
4. References
Williamson, F., and Coauthors, 2018: Collating historic weather observations for the East Asian region: Challenges, solutions and reanalyses. Adv. Atmos. Sci., 35(8), 899–904, https://doi.org/10.1007/s00376-017-7259-z.
Indian Climate Stations (1910-1930)
[...]
1 https://mega.nz/#F!m0Yh2K6T!8isvazimmxhRKfB9aq9zkw
2 https://www.dropbox.com/s/h0ieo3jdh6ipupf/india_1910-1930_stations_coordinates.xlsx?dl=0
3 https://www.dropbox.com/sh/es7ow5ap7eo4e8n/AABk2meovU78DschqsQ6_Gnia?dl=0
4 https://www.dropbox.com/s/q2qljwv4r5urz47/india_1910-1930_mask.xlsx?dl=0
Frequently asked questions
What is the goal of digitizing historical weather observations?
The goal is to extend knowledge and understanding for comparisons with paleo-proxy data and the decadal and centennial variations of Earth climate.
Where can I find the historical Indian climate data (1910-1930)?
The data is freely available on MEGA, a file sharing company based in New Zealand.
What format are the data files in?
All data files are in Excel format (.xlsx).
How many climate stations are included in the data?
The data consists of 180 to 220 stations.
What variables are measured in the Indian climate data?
Variables measured include pressure, wind, wind speed, max. temperature, min. temperature, humidity, clouds, precipitation, and weather remarks. Some years also include temperature in shade, daily mean temperature, and dry/wet bulb thermometer readings.
Which years of data are not yet available?
Data for the years 1920/21, 1923/24/25, and 1927/28 are still in quality check and cannot be downloaded yet.
What data processing steps are required?
The data requires processing to create matrices of one variable over the whole period for each station. This involves cutting off the yearly data and putting it into a previously created mask.
What R package is needed to read the data into R?
The 'readxl' package is necessary to read the Excel data into R.
How do I load the data into R?
Use the command library(readxl) to load the package, then use the command <name_output> <- read_excel("<data_matrix>", skip = 1) to read the data, where <name_output> is the chosen name for the data and <data_matrix> is the path to the Excel file.
How do I create a plot of the time series data in R?
After loading and attaching the data frame, use the command plot(<station> ~ I(as.POSIXct(<time_axis>)), type = 'l', xlab = "Year", ylab = "<variable>", main = "<variable> in <station> 1910 - 1930") to create a line plot of the specified station data over time.
What is the purpose of the 'attach()' command in R?
The attach() command makes the objects (e.g., station names) within the data frame directly accessible by their names without needing to specify the data frame name.
Where can I find the mask used for data preparation?
The mask is available at: https://www.dropbox.com/s/q2qljwv4r5urz47/india_1910-1930_mask.xlsx?dl=0
Where can I find a sample R script for plotting pressure data?
A sample R script is available at: https://www.dropbox.com/s/kon2dbavw80iuvi/india_1910-1930_pressure_Nizamabad.R?dl=0
What reference is provided for the data collection process?
Williamson, F., and Coauthors, 2018: Collating historic weather observations for the East Asian region: Challenges, solutions and reanalyses. Adv. Atmos. Sci., 35(8), 899–904, https://doi.org/10.1007/s00376-017-7259-z.
Quote paper: Tim Sperzel (Author), 2018, Indian Climate Stations (1910-1930). From Digitized Meteorological Sub-Daily and Daily Data to Simple Time Series Plots in R, Munich, GRIN Verlag, https://www.grin.com/document/456211