Check if string can be splitted into sentence using words in provided list

  • A+

I've recently stumbled upon coding task, and I've struggled to get it right. It goes like this:

Given a non-empty string s and a list word_list containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words. You may assume the word_list does not contain duplicates, but each word can be used more than once.

For example, given:

s = 'whataniceday' word_list = ['a', 'what', 'an', 'nice', 'day'] 

Return True, because 'whataniceday' can be segmented as 'what a nice day'.

I came up with a pretty naive solution, that works for this particular example, but it is not hard to make it fail, for example by adding a word to word_list that other word in the list starts with (i.e. ['a', 'wha', 'what', 'an', 'nice', 'day']). There are plenty of other things that can mess up my solution, but anyway here goes:

s = "whataniceday" word_list = ["h", "a", "what", "an", "nice", "day"]  def can_be_segmented(s, word_list):     tested_str = s     buildup_str = ''      for letter in tested_str:                 buildup_str += letter          if buildup_str not in word_list:             continue          tested_str = tested_str[len(buildup_str):]         buildup_str = ''      return bool(tested_str == '' and buildup_str == '')  print(can_be_segmented(s, word_list)) 

Do you guys have an idea of how to fix it? Or maybe there is a better approach to this problem?


This is my solution, using a generator expression for brevity, and recursion

s = "whataniceday" word_list = ["h", "ani", "a", "what", "an", "nice", "day"]  def can_be_segmented(s, word_list):     return s == "" or any(         s.startswith(word) and can_be_segmented(s[len(word):], word_list)         for word in word_list)  assert can_be_segmented(s, word_list) assert not can_be_segmented("whataniannicday", word_list) 

This code states that the string can be segmented if we can find a word so that the string begins by this word, and the rest of the string can itself be segmented.


:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: