R – Nested list to tibble


I have a nested list like so:

    > ex <- list(list(c("This", "is", "an", "example", "."), c("I", "really", "hate", "examples", ".")), list(c("How", "do", "you", "feel", "about", "examples", "?")))
    > ex
    [[1]]
    [[1]][[1]]
    [1] "This"    "is"      "an"      "example" "."

    [[1]][[2]]
    [1] "I"        "really"   "hate"     "examples" "."


    [[2]]
    [[2]][[1]]
    [1] "How"      "do"       "you"      "feel"     "about"    "examples" "?"
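The list has three levels of nesting (documents, then sentences, then tokens), and each level maps onto one id column of the target table. As a small illustration using only base R's `lengths()` and `unlist()`, the counts at each level can be inspected directly. The variable names `n_sent` and `n_tok` are made up for this sketch.

```r
# Same example data as above: documents -> sentences -> tokens
ex <- list(
  list(c("This", "is", "an", "example", "."),
       c("I", "really", "hate", "examples", ".")),
  list(c("How", "do", "you", "feel", "about", "examples", "?"))
)

# Number of sentences in each document
n_sent <- lengths(ex)
print(n_sent)      # [1] 2 1

# Flatten one level so each element is a sentence, then count its tokens
n_tok <- lengths(unlist(ex, recursive = FALSE))
print(n_tok)       # [1] 5 5 7

# Total number of rows the target tibble will have
print(sum(n_tok))  # [1] 17
```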

I want to convert it to a tibble like so:

    > tibble(d_id = as.integer(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)),
    +        s_id = as.integer(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1)),
    +        t_id = as.integer(c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7)),
    +        token = c("This", "is", "an", "example", ".", "I", "really",
    +                  "hate", "examples", ".", "How", "do", "you", "feel", "about", "examples", "?"))
    # A tibble: 17 x 4
        d_id  s_id  t_id token
       <int> <int> <int> <chr>
     1     1     1     1 This
     2     1     1     2 is
     3     1     1     3 an
     4     1     1     4 example
     5     1     1     5 .
     6     1     2     1 I
     7     1     2     2 really
     8     1     2     3 hate
     9     1     2     4 examples
    10     1     2     5 .
    11     2     1     1 How
    12     2     1     2 do
    13     2     1     3 you
    14     2     1     4 feel
    15     2     1     5 about
    16     2     1     6 examples
    17     2     1     7 ?

What is the most efficient way to do this, preferably using tidyverse functionality?


Time to get some sequences working. Base R's rep() and sequence() can build all three id columns directly, which should be very efficient:

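    # d_id: document index, repeated once per sentence in that document
    d_id <- rep(seq_along(ex), lengths(ex))
    # s_id: sentence index within each document
    s_id <- sequence(lengths(ex))
    # t_id: number of tokens in each sentence (one entry per sentence)
    t_id <- lengths(unlist(ex, recursive = FALSE))

    # Expand the per-sentence ids to one row per token
    data.frame(
      d_id  = rep(d_id, t_id),
      s_id  = rep(s_id, t_id),
      t_id  = sequence(t_id),
      token = unlist(ex)
    )

    #   d_id s_id t_id    token
    #1     1    1    1     This
    #2     1    1    2       is
    #3     1    1    3       an
    #4     1    1    4  example
    #5     1    1    5        .
    #6     1    2    1        I
    #7     1    2    2   really
    #8     1    2    3     hate
    #9     1    2    4 examples
    #10    1    2    5        .
    #11    2    1    1      How
    #12    2    1    2       do
    #13    2    1    3      you
    #14    2    1    4     feel
    #15    2    1    5    about
    #16    2    1    6 examples
    #17    2    1    7        ?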

This will run in about 2 seconds for a 500K sample of your ex list. I suspect that will be hard to beat in terms of efficiency.
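Since the question prefers tidyverse functionality, here is a sketch of one tidyverse route, assuming the tibble, tidyr (1.0 or later) and dplyr packages are installed. It relies on tidyr::unnest_longer(), whose `indices_to` argument records each element's position within its parent, so the sentence and token positions come for free. The intermediate column names `sentences` and `tokens` are arbitrary choices for this sketch, and it has not been benchmarked against the base R version above.

```r
# Tidyverse sketch; assumes tibble, tidyr (>= 1.0) and dplyr are installed
library(tibble)
library(tidyr)
library(dplyr)

ex <- list(
  list(c("This", "is", "an", "example", "."),
       c("I", "really", "hate", "examples", ".")),
  list(c("How", "do", "you", "feel", "about", "examples", "?"))
)

res <- tibble(d_id = seq_along(ex), sentences = ex) %>%
  # One row per sentence; s_id is the sentence's position in its document
  unnest_longer(sentences, values_to = "tokens",
                indices_to = "s_id", indices_include = TRUE) %>%
  # One row per token; t_id is the token's position in its sentence
  unnest_longer(tokens, values_to = "token",
                indices_to = "t_id", indices_include = TRUE) %>%
  # Put the columns in the order used in the question
  select(d_id, s_id, t_id, token)

print(res)
```

Each unnest_longer() call builds intermediate list-columns, so I would expect this to be slower than the rep()/sequence() approach on large inputs; wrapping both versions in system.time() on your own data is the way to check.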
