Label encoding across multiple columns with the same attributes in scikit-learn

If I have two columns as below:

Origin  Destination
China   USA
China   Turkey
USA     China
USA     Turkey
USA     Russia
Russia  China

How would I perform label encoding while ensuring that the labels for the Origin column match those in the Destination column, i.e.:

Origin  Destination
0       1
0       3
1       0
1       0
1       0
2       1

If I encode each column separately, the algorithm will treat China in the first column as different from China in the second column, which is not the case.


You can use replace:

    df.replace(dict(zip(np.unique(df.values),list(range(len(np.unique(df.values)))))))

       Origin  Destination
    0       0            3
    1       0            2
    2       3            0
    3       3            2
    4       3            1
    5       1            0
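The one-liner above assumes that df, np and the sample data already exist. As a minimal, self-contained sketch of the same idea (one mapping built from the unique values of the whole frame, then applied to every column), something like the following should work. The frame is rebuilt from the question's table, the names mapping and encoded are only illustrative, and it uses Series.map per column rather than replace, which is a variation and not the answer's exact call.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    "Origin": ["China", "China", "USA", "USA", "USA", "Russia"],
    "Destination": ["USA", "Turkey", "China", "Turkey", "Russia", "China"],
})

# np.unique returns the sorted unique values across *both* columns,
# so every country gets exactly one code regardless of its column.
labels = np.unique(df.to_numpy())
mapping = {label: code for code, label in enumerate(labels)}

# Apply the shared mapping to each column (variation on df.replace).
encoded = df.apply(lambda col: col.map(mapping))
print(encoded)
```

Because the codes come from the sorted unique values, they follow alphabetical order (China, Russia, Turkey, USA) rather than order of first appearance.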

A nice, more succinct version from Pir:

df.replace((lambda u: dict(zip(u, range(u.size))))(np.unique(df))) 

Or, with itertools.count (this needs import itertools):

df.replace(dict(zip(np.unique(df), itertools.count()))) 
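Since the question mentions scikit-learn, here is a hedged sketch of how the same result might be obtained with LabelEncoder: fit one encoder on the values of both columns together, then transform each column with it. This is not taken from the answers above. The sample frame is rebuilt from the question and the variable names are only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    "Origin": ["China", "China", "USA", "USA", "USA", "Russia"],
    "Destination": ["USA", "Turkey", "China", "Turkey", "Russia", "China"],
})

# Fit a single encoder on the flattened values of both columns,
# so both columns share one set of classes.
encoder = LabelEncoder()
encoder.fit(df.to_numpy().ravel())

# Transform each column with the shared encoder.
encoded = df.apply(lambda col: encoder.transform(col.to_numpy()))
print(encoded)
print(encoder.classes_)
```

Keeping the fitted encoder around also makes it possible to decode later with encoder.inverse_transform, which a plain dictionary replacement does not give you directly.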
