I have a game with the following rules:
User is given fruit prices and has a chance to buy or sell items in their fruit basket every turn.
The user cannot make more than a 10% total change in their basket on a single turn.
- Fruit prices change everyday and when multiplied by the quantities of items in the fruit basket, the total value of the basket changes relative to the fruit price changes everyday as well.
- The program is only given the current price of all the fruits and the current value of the basket (current price of fruit * quantities for all items in the basket).
- Based on these 2 inputs(all fruit prices and basket total value), the program tries to guess what items are in the basket.
- Basket cannot hold more than 100 items but slots can be empty
- The play can play several turns.
My goal is to accurately guess as computationally inexpensively as possible (read: no brute force) and scale if there are thousands of new fruits.
I am struggling to find an answer but in my mind it’s not hard. If I have the below table. I could study day 1 and get the following data:
Apple 1 Pears 2 Oranges 3 Basket Value = 217
I can do a back of napkin calculation and assume, the weights in the basket are: 0 apple, 83 pears, and 17 Oranges equaling a basket value of 217.
The next day, the values of the fruits and basket changes. To (apple = 2, Pear 3, Oranges 5) with a basket value of 348. When I take my assumed weights above (0,83,17) I get a total value of 334 – not correct! Running this by my script, I see the closest match is 0 apples, 76 pears, 24 oranges which although does equal 348 when % change of factored in it’s a 38% change so it’s not possible!
I know I can completely brute force this but if I have 1000 fruits, it won’t scale. Not to jump on any band wagon but can something like a neural net quickly rule out the unlikely so I calculate large volumes of data? I think they has to be a more scalable/quicker way than pure brute force? Or is there any other type solution that could can get result?
Here is the raw data (remember program can only see prices and total basket value only):
Here's some brute force code (Thank you @paul Hankin for a cleaner example than mine):
def possibilities(value, prices): for i in range(0, value+1, prices): for j in range(0, value+1-i, prices): k = value - i - j if k % prices == 0: yield i//prices, j//prices, k//prices def merge_totals(last, this, r): ok =  for t in this: for l in last: f = int(sum(l) * r) if all(l[i] -f <= t[i] <= l[i] + f for i in range(len(l))): ok.append(t) break return ok days = [ (217, (1, 2, 3)), (348, (2, 3, 5)), (251, (1, 2, 4)), ] ps = None for i, d in enumerate(days): new_ps = list(possibilities(*d)) if ps is None: ps = new_ps ps = merge_totals(ps, new_ps, 0.10) print('Day %d' % (i+1)) for p in ps: print('Day %d,' % (i+1), 'apples: %s, pears: %s, oranges: %s' % p) print
Update - The info so far is awesome. Does it make sense to break the problem into two problems? One is generating the possibilities while the the other is finding the relationship between the possibilities(no more than a 10% daily change). By ruling out possibilities, couldn't that also be used to help only generate possibilities that are possible to begin with? I'm not sure the approach still but I do feel both problems are different but tightly related. Your thoughts?
Update 2 - there are a lot of questions around the % change. This is the total volume of items in the basket that can change. To use the game example, Imagine the store says - you can sell/return/buy fruits but they cannot be more than 10% of your last bill. So although the change in fruit prices can cause changes in your basket value, the user cannot take any action that would impact it by more than 10%. So if the value was 100, they can make changes that create get it to 110 but not more.
I hate to let you down but I really don't think a neural net will help at all for this problem, and IMO the best answer to your question is the advice "don't waste your time trying neural nets".
An easy rule of thumb for deciding whether or not neural networks are applicable is to think, "can an average adult human solve this problem reasonably well in a few seconds?" For problems like "what's in this image", "respond to this question", or "transcribe this audio clip", the answer is yes. But for your problem, the answer is a most definite no.
Neural networks have limitations, and one is that they don't deal well with highly logical problems. This is because the answers are generally not "smooth". If you take an image and slightly change a handful of pixels, the content of the image is still the same. If you take an audio clip and insert a few milliseconds of noise, a neural net will probably still be able to figure out what's said. But in your problem, change a single day's "total basket value" by only 1 unit, and your answer(s) will drastically change.
It seems that the only way to solve your problem is with a "classical" algorithmic approach. As currently stated, there might not be any algorithm better than brute force, and it might not be possible to rule out much. For example, what if every day has the property that all fruits are priced the same? The count of each fruit can vary, as long as the total number of fruits is fixed, so the number of possibilities is still exponential in the number of fruits. If your goal is to "produce a list of possibilities", then no algorithm can be better than exponential time since this list can be exponentially large in some cases.
It's interesting that part of your problem can be reduced to an integer linear program (ILP). Consider a single day, where you are given the basket total
B and each fruit's cost
n is the total number of distinct fruits). Let's say the prices are large, so it's not obvious that you can "fill up" the basket with unit cost fruits. It can be hard in this situation to even find a single solution. Formulated as an ILP, this is equivalent to finding integer values of
x_i such that:
sum_i (x_i*c_i) = x_1*c_1 + x_2*c_2 + ... + x_n*c_n = B
x_i >= 0 for all
1 <= i <= n (can't have negative fruits), and
sum_i x_i <= 100 (can have at most 100 fruits).
The good news is that decent ILP solvers exist -- you can just hand over the above formulas and the solver will do its best to find a single solution. You can even add an "objective function" that the solver will maximize or minimize -- minimizing
sum_i x_i has the effect of minimizing the total number of fruits in the basket. The bad news is that ILP is NP-complete, so there is almost no hope of finding an efficient solution for a large number of fruits (which equals the number of variables
I think the best approach forward is to try the ILP approach, but also introduce some more constraints on the scenario. For example, what if all fruits had a different prime number cost? This has the nice property that if you find one solution, you can enumerate a bunch of other related solutions. If an apple costs
m and an orange costs
n are relatively prime, then you can "trade"
n*x apples for
m*x oranges without changing the basket total, for any integer
x>0 (so long as you have enough apples and oranges to begin with). If you choose all fruits to have different prime number costs, then all of the costs will be pairwise relatively prime. I think this approach will result in relatively few solutions for a given day.
You might also consider other constraints, such as "there can't be more than 5 fruits of a single kind in the basket" (add the constraint
x_i <= 5), or "there can be at most 5 distinct kinds of fruits in the basket" (but this is harder to encode as an ILP constraint). Adding these kinds of constraints will make it easier for the ILP solver to find a solution.
Of course the above discussion is focused on a single day, and you have multiple days' worth of data. If the hardest part of the problem is finding any solution for any day at all (which happens if your prices are large), then using an ILP solver will give you a large boost. If solutions are easy to find (which happens if you have a very-low-cost fruit that can "fill up" your basket), and the hardest part of the problem is finding solutions that are "consistent" across multiple days, then the ILP approach might not be the best fit, and in general this problem seems much more difficult to reason about.
Edit: and as mentioned in the comments, for some interpretations of the "10% change" constraint, you can even encode the entire multi-day problem as an ILP.