A simplified version of my dataset can be reproduced by the following:
df <- data.frame(buyer = c("A","C","B"), seller = c("B","D","E"), amount = c(1,2,3))
I am looking for a preferably dplyr solution to achieve the following.
buyer seller amount A B 1 C D 2 B E 3
should result in a grouped summary for each agent (A,B,C,D,E)
output agent total_amount A 1 B 4 #(=1+3) C 2 D 2
I could group_by for buyer and seller and then add the results but this is not elegant and somewhat cumbersome.
library(dplyr) res_b <- df %>% group_by(buyer) %>% summarise(total_amount=sum(amount)) res_s <- df %>% group_by(seller) %>% summarise(total_amount=sum(amount))
Any help is appreciated. Other solutions (not in tidyverse) are obviously welcome too.
Edit: should have said that my original dataset as around 60 million observations.
We can convert to long format first and do a simple aggregation, i.e.
library(tidyverse) df %>% gather(var, agent, -amount) %>% group_by(agent) %>% summarise(total_amount = sum(amount))
# A tibble: 5 x 2 agent total_amount <chr> <dbl> 1 A 1 2 B 4 3 C 2 4 D 2 5 E 3
You can try
data.table for more efficiency. Here is a direct translation of the
tidyverse code above,
library(data.table) dt1 <- setDT(df) melt(dt1, measure.vars = c('buyer', 'seller'), id.vars = 'amount', value.name = "agent" )[, .(total_amount = sum(amount)), by = agent] # agent total_amount #1: A 1 #2: C 2 #3: B 4 #4: D 2 #5: E 3