Python – Delete duplicates in a dataframe based on the combination of two columns?


I have a dataframe with 3 columns in Python:

Name1 Name2 Value
Juan  Ale   1
Ale   Juan  1

and I would like to eliminate duplicates based on the combination of the Name1 and Name2 columns, regardless of their order.

In my example both rows contain the same pair of names (just in a different order), so I would like to delete the second row and keep only the first one. The end result should be:

Name1 Name2 Value
Juan  Ale   1

Any idea will be really appreciated!

Thanks

Juan S.

 


You can convert each row's pair of names to a frozenset and then use pd.Series.duplicated on the result.

res = df[~df[['Name1', 'Name2']].apply(frozenset, axis=1).duplicated()]
print(res)

  Name1 Name2  Value
0  Juan   Ale      1

frozenset is necessary instead of set because duplicated uses hashing to check for duplicates, and a plain set is mutable and therefore not hashable.
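
For anyone who wants to try this end to end, here is a self-contained sketch. It rebuilds the dataframe from the question and adds one extra row (Ana/Juan, my own assumption, not part of the original data) to show that genuinely different pairs are kept. It also shows an alternative that was not in the answer above: sorting the two names within each row with NumPy and calling DataFrame.duplicated on the sorted pairs. The variable names key, sorted_pairs, res_frozenset and res_sorted are only illustrative.

```python
import numpy as np
import pandas as pd

# Data from the question, plus one unrelated row (an added assumption)
# to show that non-duplicate pairs survive.
df = pd.DataFrame({
    "Name1": ["Juan", "Ale", "Ana"],
    "Name2": ["Ale", "Juan", "Juan"],
    "Value": [1, 1, 2],
})

# Approach from the answer: build one hashable, order-independent key per row.
key = df[["Name1", "Name2"]].apply(frozenset, axis=1)
res_frozenset = df[~key.duplicated()]

# Alternative (not from the answer): sort the two names inside each row,
# then let DataFrame.duplicated compare the sorted pairs.
sorted_pairs = pd.DataFrame(
    np.sort(df[["Name1", "Name2"]].to_numpy(), axis=1),
    index=df.index,
)
res_sorted = df[~sorted_pairs.duplicated()]

print(res_frozenset)
```

Both versions keep the first occurrence of each pair, so here they keep the rows with index 0 and 2. The sorted-pairs variant may be worth considering on large frames, since it avoids calling a Python function once per row, but that is a general expectation rather than something measured here.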
