Count data divided by year and by region in R

  • A+
Category:Languages

I have a very large (too big to open in Excel) biological dataset that looks something like this

    year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,1990,                1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,               1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)     species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A','A', 'A',                   'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',                   'C', 'C', 'C', 'A')     region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1,1, 3, 3,                  3, 2, 2, 1, 1, 1)     df <- data.frame(year, species, region)      df     year species region  1  1990       A      1  2  1980       A      1  3  1985       B      1  4  1980       B      3  5  1990       B      2  6  1990       C      3  7  1980       C      3  8  1985       C      2  9  1985       A      1  10 1990       A      1  11 1980       A      3  12 1985       B      3  13 1980       B      3  14 1990       B      2  15 1990       C      2  16 1980       C      1  17 1985       C      1  18 1985       A      1  19 1990       A      1  20 1980       A      3  21 1985       B      3  22 1980       B      3  23 1990       B      2  24 1990       C      2  25 1980       C      1  26 1985       C      1  27 1985       A      1 

What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).

I'm looking to end up with a dataset that looks something along the lines of this,

      region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990  1      1      0      0      0      0      0      0      0      0      0  2      2      1      1      1      1      1      1      1      1      1  3      3      2      2      2      2      2      2      2      2      2 

such that each row represents a region, and each column represents the count of each species, in a particular year. I've tried to do this using the spread function in conjunction with the group_by dplyr function, but I couldn't get it to do anything close to what I want.

Does anyone have any suggestions?

 


Something like this?

library(dplyr)  df2 <- df %>%    mutate(sp_year = paste(species, year, sep = "_")) %>%   group_by(region) %>%    count(sp_year) %>%    spread(sp_year,n)  df2 

Which gives this:

# A tibble: 3 x 10 # Groups:   region [3]   region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990    <dbl>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int> 1      1      1      3      3     NA      1     NA      2      2     NA 2      2     NA     NA     NA     NA     NA      3     NA      1      2 3      3      2     NA     NA      3      2     NA      1     NA      1 

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: