Extract hourly max/min/median values with timestamp in R

  • A+
Category:Languages

I have a dataframe with temperature measurements every 10 minutes. The measurements were taken at different locations (named as 'LCZ'), with for each location the values in a different column.

This is part of my dataframe: (it also contains missing values NA)

 Time `LCZ 3-2` `LCZ 3-10` `LCZ 6-1` `LCZ 6-9` `LCZ 9-4`                  <dttm>     <dbl>      <dbl>     <dbl>     <dbl>     <dbl>  1 2017-08-26 17:00:00      27.5       27.5      27.5      27.0      27.0  2 2017-08-26 17:10:00      27.5       27.0      27.5      27.0      27.0  3 2017-08-26 17:20:00      27.5       27.0      27.0      27.0      27.0  4 2017-08-26 17:30:00      27.0       26.5      27.0      26.5      26.5  5 2017-08-26 17:40:00      26.5       26.5      26.5      26.5      26.5  6 2017-08-26 17:50:00      26.5       26.0      26.5      26.0      26.5  7 2017-08-26 18:00:00      26.5       26.0      26.5      26.5      26.5  8 2017-08-26 18:10:00      27.0       26.0      26.5      26.5      26.0  9 2017-08-26 18:20:00      26.5       26.5      26.5      26.5      26.0 10 2017-08-26 18:30:00      26.5       26.5      26.5      26.5      26.0 

I want for each location or column to calculate the hourly min/max/median temperature and in addition for the hourly min/max also the timestamp from the original data at which the min/max respectively occured.

Is this possible with R?

I tried already various functions.

group_by allowed me to calculate min/max for each column but without timestamps. period.apply also allowed me to calculate min/max/median but only for one column. Also aggregate() didn't lead to any succes.

I'm learning in R, but didn't came close to a solution on this problem.

This website has helped me with various problems, but I'm really stuck on this one. Can someone help? Thanks in advance


We can use floor_date from the package to create a new column Time2 to show the hourly information. If this is not the way you want to define the hourly grouping, you can also try round_date or ceiling_date. After that, we can use gather from the package to convert the data frame from wide format to long format.

library(dplyr) library(tidyr) library(lubridate)  dat2 <- dat %>%   mutate(Time = ymd_hms(Time),          Time2 = floor_date(Time, unit = "hour")) %>%   gather(LCZ, Value, starts_with("LCZ")) %>%   group_by(Time2, LCZ) 

After that, we can summarize the data by LCZ and Time2.

dat3 <- dat2 %>%   summarise(Min = min(Value, na.rm = TRUE),             Max = max(Value, na.rm = TRUE),             Median = median(Value, na.rm = TRUE)) %>%   ungroup() dat3 # # A tibble: 10 x 5 #    Time2               LCZ        Min   Max Median #    <dttm>              <chr>    <dbl> <dbl>  <dbl> #  1 2017-08-26 17:00:00 LCZ.3.10  26.0  27.5   26.8 #  2 2017-08-26 17:00:00 LCZ.3.2   26.5  27.5   27.2 #  3 2017-08-26 17:00:00 LCZ.6.1   26.5  27.5   27.0 #  4 2017-08-26 17:00:00 LCZ.6.9   26.0  27.0   26.8 #  5 2017-08-26 17:00:00 LCZ.9.4   26.5  27.0   26.8 #  6 2017-08-26 18:00:00 LCZ.3.10  26.0  26.5   26.2 #  7 2017-08-26 18:00:00 LCZ.3.2   26.5  27.0   26.5 #  8 2017-08-26 18:00:00 LCZ.6.1   26.5  26.5   26.5 #  9 2017-08-26 18:00:00 LCZ.6.9   26.5  26.5   26.5 # 10 2017-08-26 18:00:00 LCZ.9.4   26.0  26.5   26.0 

If you want, we can create binary value to indicate if the value is minimum, maximum, or median as follows. This format is useful when you further want to filter the data frame.

dat4 <- dat2 %>%   mutate(Min = (Value == min(Value, na.rm = TRUE)) + 0L,          Max = (Value == max(Value, na.rm = TRUE)) + 0L,          Median = (Value == median(Value, na.rm = TRUE)) + 0L) %>%   ungroup() dat4 # # A tibble: 50 x 7 #    Time                Time2               LCZ     Value   Min   Max Median #    <dttm>              <dttm>              <chr>   <dbl> <int> <int>  <int> #  1 2017-08-26 17:00:00 2017-08-26 17:00:00 LCZ.3.2  27.5     0     1      0 #  2 2017-08-26 17:10:00 2017-08-26 17:00:00 LCZ.3.2  27.5     0     1      0 #  3 2017-08-26 17:20:00 2017-08-26 17:00:00 LCZ.3.2  27.5     0     1      0 #  4 2017-08-26 17:30:00 2017-08-26 17:00:00 LCZ.3.2  27.0     0     0      0 #  5 2017-08-26 17:40:00 2017-08-26 17:00:00 LCZ.3.2  26.5     1     0      0 #  6 2017-08-26 17:50:00 2017-08-26 17:00:00 LCZ.3.2  26.5     1     0      0 #  7 2017-08-26 18:00:00 2017-08-26 18:00:00 LCZ.3.2  26.5     1     0      1 #  8 2017-08-26 18:10:00 2017-08-26 18:00:00 LCZ.3.2  27.0     0     1      0 #  9 2017-08-26 18:20:00 2017-08-26 18:00:00 LCZ.3.2  26.5     1     0      1 # 10 2017-08-26 18:30:00 2017-08-26 18:00:00 LCZ.3.2  26.5     1     0      1 # # ... with 40 more rows 

DATA

dat <- read.table(text = "Time 'LCZ 3-2' 'LCZ 3-10' 'LCZ 6-1' 'LCZ 6-9' 'LCZ 9-4'                   '2017-08-26 17:00:00'      27.5       27.5      27.5      27.0      27.0                   '2017-08-26 17:10:00'      27.5       27.0      27.5      27.0      27.0                   '2017-08-26 17:20:00'      27.5       27.0      27.0      27.0      27.0                   '2017-08-26 17:30:00'      27.0       26.5      27.0      26.5      26.5                   '2017-08-26 17:40:00'      26.5       26.5      26.5      26.5      26.5                   '2017-08-26 17:50:00'      26.5       26.0      26.5      26.0      26.5                   '2017-08-26 18:00:00'      26.5       26.0      26.5      26.5      26.5                   '2017-08-26 18:10:00'      27.0       26.0      26.5      26.5      26.0                   '2017-08-26 18:20:00'      26.5       26.5      26.5      26.5      26.0                   '2017-08-26 18:30:00'      26.5       26.5      26.5      26.5      26.0",                   header = TRUE, stringsAsFactors = FALSE) 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: