Replace strings using List Comprehensions

  • A+
Category:Languages

Is it possible to do this example using List Comprehensions:

a = ['test', 'smth'] b = ['test Lorem ipsum dolor sit amet',      'consectetur adipiscing elit',      'test Nulla lectus ligula',      'imperdiet at porttitor quis',      'smth commodo eget tortor',       'Orci varius natoque penatibus et magnis dis parturient montes']   for s in a:     b = [el.replace(s,'') for el in b] 

What I want is to delete specific words from list of sentences. I can do it using loop, but I suppose it is possible using some one-line solution.

I tried something like:

b = [[el.replace(s,'') for el in b] for s in a ] 

but it goes wrong


I got a lot of quality answers, but now I have on more complication: what if I want to use combination of words?

a = ['test', 'smth commodo'] 


There's nothing wrong with what you have, but if you want to clean things up a bit and performance isn't important, then compile a regex pattern and call sub inside a loop.

>>> import re >>> p = re.compile(r'/b({})/b'.format('|'.join(a))) >>> [p.sub('', text).strip() for text in b] 

['Lorem ipsum dolor sit amet',  'consectetur adipiscing elit',  'Nulla lectus ligula',  'imperdiet at porttitor quis',  'commodo eget tortor',  'Orci varius natoque penatibus et magnis dis parturient montes' ] 

Details
Your pattern will look something like this:

/b    # word-boundary - remove if you also want to replace substrings ( test  # word 1 |     # regex OR pipe smth  # word 2 ... you get the picture ) /b    # end with another word boundary - again, remove for substr replacement 

And this is the compiled regex pattern matcher:

>>> p re.compile(r'/b(test|smth)/b', re.UNICODE) 

Another consideration is whether your replacement strings themselves contain characters that could be interpreted by the regex engine differently - rather than being treated as literals - these are regex metacharacters, and you can escape them while building your pattern. That is done using re.escape.

p = re.compile(r'/b({})/b'.format(     '|'.join([re.escape(word) for word in a])) ) 

Of course, keep in mind that with larger data and more replacements, regex and string replacements both become tedious. Consider the use of something more suited to large operations, like flashtext.

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: