- A+

I have a sorted array of N intervals of different length. I am plotting these intervals with alternating colors blue/green.

I am trying to find a method or algorithm to "downsample" the array of intervals to produce a visually similar plot, but with less elements.

Ideally I could write some function where I can pass the target number of output intervals as an argument. The output length only has to come close to the target.

`input = [ [0, 5, "blue"], [5, 6, "green"], [6, 10, "blue"], // ...etc ] output = downsample(input, 25) // [[0, 10, "blue"], ... ] `

Below is a picture of what I am trying to accomplish. In this example the input has about 250 intervals, and the output about ~25 intervals. The input length can vary a lot.

**Update:**

Below is my original post which I deleted yesterday, because there were issues with displaying the equations and also I wasn't very confident if it really makes sense. But today, I figured that the optimisation problem that I described can be actually solved efficiently with DP (Dynamic programming).

So I did a sample C++ implementation. Here are some results:

Here is a live demo that you can play with in your browser (make sure browser support WebGL2, like Chrome or Firefox). It takes a bit to load the page.

Here is the C++ implementation: link

Hope this is useful.

**Original post below:**

Even-though some good algorithms have already been proposed, I would like to propose a slightly unusual approach - interpreting the task as an optimisation problem. Although, I don't know how to efficiently solve the optimisation problem (or even if it can be solved in reasonable time at all), it might be useful to someone purely as a concept.

First, without loss of generality, lets declare the *blue* color to be background. We will be painting **N** *green* intervals on top of it (**N** is the number provided to the `downsample()`

function in OP's description). The **i ^{th}** interval is defined by its starting coordinate 0 <=

**x**< x

_{i}_{max}and width

**w**>= 0 (x

_{i}_{max}is the maximum coordinate from the input).

Lets also define the array **G(x)** to be the number of *green* cells in the interval [0, x) in the **input data**. This array can easily be pre-calculated. We will use it to quickly calculate the number of green cells in arbitrary interval [x, y) - namely: **G(y) - G(x)**.

We can now introduce the first part of the cost function for our optimisation problem:

The smaller **F _{1}** is, the better our

**generated**intervals cover the

**input**intervals, so we will be searching for

**x**,

_{i}**w**that minimise it. Ideally we want

_{i}**F**=0 which would mean that the intervals do not cover

_{1}*any*of the background (which of course is not possible because

**N**is less than the input intervals).

However, this function is not enough to describe the problem, because obviously we can minimise it by taking empty intervals: **F _{1}(x, 0)**=0. Instead, we want to cover as much as possible from the input intervals. Lets introduce the second part of the cost function which corresponds to this requirement:

The smaller **F _{2}** is, the more input intervals are covered. Ideally we want

**F**=0 which would mean that we covered all of the input rectangles. However, minimising

_{2}**F**competes with minimising

_{2}**F**.

_{1}Finally, we can state our optimisation problem: find **x _{i}**,

**w**that minimize

_{i}**F**=

**F**+

_{1}**F**

_{2}How to solve this problem? Not sure. Maybe use some metaheuristic approach for global optimisation such as Simulated annealing or Differential evolution. These are typically easy to implement, especially for this simple cost function.

Best case would be to exist some kind of DP algorithm for solving it efficiently, but unlikely.