How is the most efficient way to subtract two lists?

  • A+
Category:Languages

I have two lists in python list_a and list_b. The list_a have some images links, and the list_b too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a, that means all items in list_b are in list_a. My initial idea is subtract all items: list_a - list_b = list_c, where the list_c are my surplus items. My code is:

list_a = [] list_b = [] list_c = []  arq_b = open('list_b.txt','r') for b in arq_b:     list_b.append(b)  arq_a = open('list_a.txt','r') for a in arq_a:     if a not in arq_b:         list_c.append(a)  arq_c = open('list_c.txt','w') for c in list_c:     arq_c.write(c) 

I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022 items in my list_b.txt and 78.616.777 in my list list_a.txt. I dont't know the cost of this expression: if a not in arq_b. But if i execute this code, i think wont finish in this year.

My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.

My question is, there's another way to make this operation more efficiently(Faster)?

  • The list_a is ordinate but the list_b not.
  • Each item have this size: images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png
  • The order doesnt matter, i want know the surplus.

 


you can create one set of the first file contents, then just use difference or symmetric_difference depending on what you call a difference

with open("list_a.txt") as f:     set_a = set(f)  with open("list_b.txt") as f:     diffs = set_a.difference(f) 

if list_b.txt contains more items than list_a.txt you want to swap them or use set_a.symmetric_difference(f) instead, depending on what you need.

difference(f) works but still has to construct a new set internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: