identify consecutively overlapping segments in R

  • A+
Category:Languages

I need to aggregate overlapping segments into a single segment ranging all connected segments.

Note that a simple foverlaps cannot detect connections between non overlapping but connected segments, see the example for clarification. If it would rain on my segments in my plot I am looking for the stretches of dry ground.

So far I solve this problem by an iterative algorithm but I'm wondering if there is a more elegant and stright forward way for this problem. I'm sure not the first one to face it.

I was thinking about a non-equi rolling join, but faild to implement that

library(data.table) (x <- data.table(start = c(41,43,43,47,47,48,51,52,54,55,57,59),                    end = c(42,44,45,53,48,50,52,55,57,56,58,60)))  #     start end #  1:    41  42 #  2:    43  44 #  3:    43  45 #  4:    47  53 #  5:    47  48 #  6:    48  50 #  7:    51  52 #  8:    52  55 #  9:    54  57 # 10:    55  56 # 11:    57  58 # 12:    59  60  setorder(x, start)[, i := .I] # i is just a helper for plotting segments plot(NA, xlim = range(x[,.(start,end)]), ylim = rev(range(x$i))) do.call(segments, list(x$start, x$i, x$end, x$i))  x$grp <- c(1,3,3,2,2,2,2,2,2,2,2,4) # the grouping I am looking for do.call(segments, list(x$start, x$i, x$end, x$i, col = x$grp)) (y <- x[, .(start = min(start), end = max(end)), k=grp])  #    grp start end # 1:   1    41  42 # 2:   2    47  58 # 3:   3    43  45 # 4:   4    59  60  do.call(segments, list(y$start, 12.2, y$end, 12.2, col = 1:4, lwd = 3)) 

 


The OP has requested to aggregate overlapping segments into a single segment ranging all connected segments.

Here is another solution which uses cummax() and cumsum() to identify groups of overlapping or adjacent segments:

x[order(start, end), grp := cumsum(cummax(shift(end, fill = 0)) < start)][   , .(start = min(start), end = max(end)), by = grp] 
   grp start end 1:   1    41  42 2:   2    43  45 3:   3    47  58 4:   4    59  60 

Disclaimer: I have seen that clever approach somewhere else on SO but I cannot remember exactly where.

Edit:

As David Arenburg has pointed out, it is not necessary to create the grp variable separately. This can be done on-the-fly in the by = parameter:

x[order(start, end), .(start = min(start), end = max(end)),    by = .(grp = cumsum(cummax(shift(end, fill = 0)) < start))] 

Visualisation

OP's plot can be amended to show also the aggregated segments (quick and dirty):

plot(NA, xlim = range(x[,.(start,end)]), ylim = rev(range(x$i))) do.call(segments, list(x$start, x$i, x$end, x$i)) x[order(start, end), .(start = min(start), end = max(end)),    by = .(grp = cumsum(cummax(shift(end, fill = 0)) < start))][     , segments(start, grp + 0.5, end, grp + 0.5, "red", , 4)] 

identify consecutively overlapping segments in R

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: