# What is the most efficient way to subtract two lists?


I have two lists in Python, `list_a` and `list_b`. Both contain image links. About 99% of the items are the same, but I need to find the 1% that differ. All the surplus items are in `list_a`, which means every item in `list_b` is also in `list_a`. My initial idea is to subtract the lists: `list_a - list_b = list_c`, where `list_c` holds the surplus items. My code is:

```python
list_a = []
list_b = []
list_c = []

arq_b = open('list_b.txt','r')
for b in arq_b:
    list_b.append(b)

arq_a = open('list_a.txt','r')
for a in arq_a:
    if a not in arq_b:
        list_c.append(a)

arq_c = open('list_c.txt','w')
for c in list_c:
    arq_c.write(c)
```

I think the logic is right, and with only a few items the code runs fast. But I don't have 10 items, or 1,000, or even 100,000. I have `78.514.022` items in `list_b.txt` and `78.616.777` in `list_a.txt`. I don't know the cost of this expression: `if a not in arq_b`. But if I run this code, I think it won't finish this year.

My PC has 8 GB of RAM, and I allocated 15 GB of swap so that I don't run out of memory.

My question is: is there a way to do this operation more efficiently (faster)?

• `list_a` is sorted, but `list_b` is not.
• Each item looks like this: `images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png`
• The order doesn't matter; I only want to know the surplus.

You can create a set from the contents of the first file, then use `difference` or `symmetric_difference`, depending on what you call a difference:

```python
with open("list_a.txt") as f:
    set_a = set(f)

with open("list_b.txt") as f:
    diffs = set_a.difference(f)
```

If `list_b.txt` contains more items than `list_a.txt`, you want to swap them or use `set_a.symmetric_difference(f)` instead, depending on what you need.
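To make the distinction between the two methods concrete, here is a small sketch using `io.StringIO` objects as stand-ins for the two files. The sample lines are made up for illustration. Note that each element is a whole line including its trailing newline, so a last line without a newline would not compare equal to the same path followed by one.

```python
import io

# Hypothetical in-memory stand-ins for list_a.txt and list_b.txt.
file_a = io.StringIO("images/1.png\nimages/2.png\nimages/3.png\n")
file_b = io.StringIO("images/3.png\nimages/1.png\nimages/4.png\n")

set_a = set(file_a)        # each element is one line, newline included
lines_b = list(file_b)     # kept as a list so it can be iterated twice

# Lines that are in A but not in B.
only_in_a = set_a.difference(lines_b)

# Lines that are in exactly one of the two inputs.
in_exactly_one = set_a.symmetric_difference(lines_b)

print(sorted(only_in_a))
print(sorted(in_exactly_one))
```

With these inputs, `difference` reports only the extra line from the first input, while `symmetric_difference` also reports the line that appears only in the second one.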

`difference(f)` works, but it still has to construct a new `set` internally. It's not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.
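If the extra result set is a concern, one possible variation is `set.difference_update`, which also accepts any iterable and removes the matching items from the existing set in place. The sketch below is an assumption about how the whole job could be wired together, not a measured solution: the helper name `write_surplus` and the temporary file names are hypothetical, and the set built from the first file still has to fit in memory.

```python
import os
import tempfile


def write_surplus(path_a, path_b, path_out):
    """Write the lines found in path_a but not in path_b to path_out.

    Returns the number of surplus lines written.
    """
    with open(path_a) as f:
        surplus = set(f)
    with open(path_b) as f:
        # Discard every line of the second file from the set, in place.
        surplus.difference_update(f)
    with open(path_out, "w") as f:
        # Sorting is optional; it only makes the output deterministic.
        f.writelines(sorted(surplus))
    return len(surplus)


# Tiny demonstration with temporary files (hypothetical sample data).
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "list_a.txt")
    path_b = os.path.join(tmp, "list_b.txt")
    path_c = os.path.join(tmp, "list_c.txt")

    with open(path_a, "w") as f:
        f.write("images/1.png\nimages/2.png\nimages/3.png\n")
    with open(path_b, "w") as f:
        f.write("images/3.png\nimages/1.png\n")

    count = write_surplus(path_a, path_b, path_c)
    with open(path_c) as f:
        result = f.read()

print(count, repr(result))
```

Each membership check against a set is a hash lookup, so the work grows roughly with the total number of lines rather than with the product of the two list sizes, which is what makes the list-based `not in` loop impractical at this scale.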