Searching for number after a specific word that does not immediately precede the number


I am trying to use a pattern to search for a Zip Code within a string. I cannot get it to work correctly.

A sample of the inputLine is

What is the weather in 75042? 

What I am trying to use for a pattern is

    public String getZipcode(String inputLine) {
        Pattern pattern = Pattern.compile(".*weather.*([0-9]+).*");
        Matcher matcher = pattern.matcher(inputLine);

        if (matcher.find()) {
            return matcher.group(1).toString();
        }

        return "Zipcode Not Found.";
    }

If I am looking to only get 75042, what do I need to change? This only outputs the last digit in the number, 2. I am terribly confused and I do not completely understand the Javadocs for the Pattern class.

 


Your .*weather.*([0-9]+).* pattern first grabs the whole line with the leading .* and then backtracks until it can match weather. The second .* is greedy as well: it consumes the rest of the line after weather and then backtracks one character at a time, stopping as soon as [0-9]+ can match. A single digit is enough to satisfy [0-9]+, so only the last digit is stored in capturing group 1. The final .* just consumes the line to its end.
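The greedy behavior described above can be reproduced with a small standalone program. This is only a sketch: the class name GreedyCaptureDemo and the helper firstGroup are made up for illustration and are not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyCaptureDemo {
    // Hypothetical helper: returns what capturing group 1 holds for the
    // first match of regex in input, or null when nothing matches.
    public static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String inputLine = "What is the weather in 75042?";
        // The greedy .* before the group gives back only one digit,
        // because one digit already satisfies [0-9]+.
        System.out.println(firstGroup(".*weather.*([0-9]+).*", inputLine)); // prints 2
    }
}
```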

You may solve the issue by just using ".*weather.*?([0-9]+).*" (making the second .* lazy), but since you are using Matcher#find(), you can use a simpler regex:

    Pattern pattern = Pattern.compile("weather\\D*(\\d+)");

And after getting a match, retrieve the value with matcher.group(1).

See the regex demo.

Pattern details

  • weather - the literal word weather
  • \D* - zero or more characters other than digits
  • (\d+) - Capturing group 1: one or more digits

See the Java demo:

    String inputLine = "What is the weather in 75042?";
    // In a Java string literal each regex backslash must be doubled.
    Pattern pattern = Pattern.compile("weather\\D*(\\d+)");
    Matcher matcher = pattern.matcher(inputLine);

    if (matcher.find()) {
        System.out.println(matcher.group(1)); // => 75042
    }
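For completeness, here is one way the original getZipcode method might look with the simpler pattern applied. It is a sketch under a few assumptions: the wrapper class ZipcodeFinder and the constant ZIP_AFTER_WEATHER are illustrative names, the method is made static so it can be called from main, and the pattern is compiled once rather than on every call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZipcodeFinder {
    // Illustrative constant: "weather", then any non-digits, then the digits to capture.
    private static final Pattern ZIP_AFTER_WEATHER = Pattern.compile("weather\\D*(\\d+)");

    public static String getZipcode(String inputLine) {
        Matcher matcher = ZIP_AFTER_WEATHER.matcher(inputLine);
        // find() searches anywhere in the input, so no leading or trailing .* is needed.
        if (matcher.find()) {
            return matcher.group(1); // group(1) already returns a String
        }
        return "Zipcode Not Found.";
    }

    public static void main(String[] args) {
        System.out.println(getZipcode("What is the weather in 75042?")); // prints 75042
        System.out.println(getZipcode("What time is it?"));              // prints Zipcode Not Found.
    }
}
```

Note that \d+ accepts a digit run of any length, so this returns whatever number first follows the word weather.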
