Regular Expression for separating strings enclosed in parentheses


I have a String that contains 2 or 3 company names, each enclosed in parentheses. Each company name can itself contain words in parentheses. I need to separate the names using a regular expression, but I haven't found a way to do it.

My inputStr:

(Motor (Sport) (racing) Ltd.) (Motorsport racing (Ltd.)) (Motorsport racing Ltd.)

or

(Motor (Sport) (racing) Ltd.) (Motorsport racing (Ltd.))

The expected result is:

str1 = Motor (Sport) (racing) Ltd.
str2 = Motorsport racing (Ltd.)
str3 = Motorsport racing Ltd.

My Code:

    String str1, str2, str3;
    Pattern p = Pattern.compile("\\((.*?)\\)");
    Matcher m = p.matcher(inputStr);
    int index = 0;
    while (m.find()) {
        String text = m.group(1);
        text = text != null && StringUtils.countMatches(text, "(") != StringUtils.countMatches(text, ")") ? text + ")" : text;

        if (index == 0) {
            str1 = text;
        } else if (index == 1) {
            str2 = text;
        } else if (index == 2) {
            str3 = text;
        }

        index++;
    }

This works well for str2 and str3, but not for str1.

Current result:

str1 = Motor (Sport)
str2 = Motorsport racing (Ltd.)
str3 = Motorsport racing Ltd.


Judging by your examples, we can assume that the parentheses nest at most two levels deep, so this can be done without too much magic. I would go with this code:

    List<String> matches = new ArrayList<>();
    Pattern p = Pattern.compile("\\([^()]*(?:\\([^()]*\\)[^()]*)*\\)");
    Matcher m = p.matcher(inputStr);
    while (m.find()) {
        String fullMatch = m.group();
        matches.add(fullMatch.substring(1, fullMatch.length() - 1));
    }

Explanation:

  • First we match an opening parenthesis: \\(
  • Then we match any run of non-parenthesis characters: [^()]*
  • Then, zero or more times, (?:...)*, we match a parenthesized chunk followed by more non-parenthesis characters: \\([^()]*\\)[^()]* (it's important that we don't allow any further parentheses inside the inner pair)
  • And then comes the closing parenthesis: \\)
  • m.group() returns the full match.
  • fullMatch.substring(1, fullMatch.length() - 1) removes the outer parentheses from the start and the end. You could do this with a capturing group instead; I just didn't want to make the regex any uglier.
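For completeness, here is a rough sketch of the capturing-group variant mentioned in the last point. It uses the same regex as the answer with one extra pair of parentheses around the inner content, so m.group(1) yields the name without the outer parentheses. The class name ParenSplitter and the method split are made up for this illustration, and the sketch still assumes at most two levels of nesting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
public class ParenSplitter {

    // Same regex as in the answer, plus a capturing group (group 1)
    // around everything between the outer parentheses.
    private static final Pattern COMPANY =
            Pattern.compile("\\(([^()]*(?:\\([^()]*\\)[^()]*)*)\\)");

    // Returns the content of each top-level parenthesized chunk.
    // Assumption: parentheses nest at most two levels deep.
    public static List<String> split(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = COMPANY.matcher(input);
        while (m.find()) {
            names.add(m.group(1)); // content without the outer parentheses
        }
        return names;
    }

    public static void main(String[] args) {
        String inputStr = "(Motor (Sport) (racing) Ltd.) "
                + "(Motorsport racing (Ltd.)) (Motorsport racing Ltd.)";
        for (String name : split(inputStr)) {
            System.out.println(name);
        }
    }
}
```

Collecting the names in a list instead of str1, str2 and str3 also covers the case where the input holds only two companies.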
