Split a string by different start and end delimiters


I have a log of data read from a TCP port with a fixed buffer length. Each event has a variable length and is delimited by <+++> at the start and <---> at the end.

Example:

s = '<+++>A1 , Some Text, Other Text=12327463, Some Other Text<--->  <+++>A2, Some Text, IP=0.0.0.0, DateTime=12/07/2018 <---> <+++> A3, Some Text, Other Text=12327463, Some Other Text, Text<---><+++>A3, New Text, IP=0.0.0.0, DateTime=12/07/2018, Text3Text3Text3, Text3Text3Text3, Text3Text3Text3, Text3Text3Text3<--->Text4Text4Text4Text4Text4Text4Text4Text4Text4Text4Text4Text4Text4Text4Text4  Text4<---><+++>Text5Text5Text5Text5Text5Text5Text5Text5<---><+++>Text6Text6Text6Text6Text6Text6Text6Text6Text6<--->' 

I need to split it so that each event becomes an element of a list, like this:

['A1 , Some Text, Other Text=12327463, Some Other Text', 'A2, Some Text, IP=0.0.0.0, DateTime=12/07/2018', 'A3, Some Text, Other Text=12327463, Some Other Text, Text']

How would you do this with Python?

 


You can also use regular expressions for this task, re.findall in particular:

import re
s = '<+++>A1 , Some Text, Other Text=12327463<---> <+++>A2, IP=0.0.0.0 <--->'
# The + signs are regex metacharacters, so they must be escaped with backslashes.
re.findall(r'<\+\+\+>(.+?)<--->', s)
# ['A1 , Some Text, Other Text=12327463', 'A2, IP=0.0.0.0 ']

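The captured group (.+?) matches one or more (+) of any character (.) non-greedily (?), so each match stops at the nearest closing tag instead of running from the first opening tag to the last closing tag. Note that whitespace inside the delimiters is kept, as in the trailing space of the second result.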
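If you want the results stripped of surrounding whitespace, as in the list shown in the question, the same idea can be wrapped in a small helper. This is only a sketch: the function name split_events and its parameters are made up for illustration, and it assumes every event you care about has both an opening and a closing delimiter. Text that has a closing <---> but no opening <+++> (like the Text4 chunk in the question's sample) is simply skipped.

```python
import re

def split_events(text, start="<+++>", end="<--->"):
    """Return the stripped text between each start/end delimiter pair.

    Hypothetical helper for illustration, not part of any library.
    """
    # re.escape makes the literal delimiters safe to embed in a pattern,
    # so the "+" characters do not need to be escaped by hand.
    pattern = re.escape(start) + r"(.+?)" + re.escape(end)
    # re.DOTALL lets "." match newlines, in case an event spans several lines.
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]

log = "<+++>A1 , Some Text<--->  <+++> A2, IP=0.0.0.0 <--->orphan<---><+++>A3<--->"
events = split_events(log)
print(events)  # ['A1 , Some Text', 'A2, IP=0.0.0.0', 'A3']
```

Because the TCP buffer has a fixed length, an event may be cut off at the end of a read. With this approach an incomplete trailing event (an opening tag with no closing tag yet) is not returned, so you would need to keep the unmatched tail and prepend it to the next chunk.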
