Word Break Problem: Dynamic Programming Solution with Code
The word break problem asks if a string can be split into dictionary words. DP table approach, Python and C++ code, and full complexity analysis.
The word break problem asks one question: given a string and a dictionary of valid words, can the string be split into a sequence of those words?
It looks deceptively simple. The naive approach (try every possible split, recurse on the pieces) runs into exponential time on strings where many prefixes overlap. The standard fix is a bottom-up dynamic programming table that runs in O(n²) time and O(n) space.
What the Word Break Problem Tests
The problem statement is:
- Given a string
sand a list of dictionary words, returnTrueifscan be segmented into one or more words from the dictionary, orFalseif it cannot.
A concrete example:
s = "ilike", dict ={"i", "like", "sam", "sung"}- Expected output:
True(splits as"i"+"like")
Another:
s = "ilikesam", dict ={"i", "like", "sam", "sung"}- Expected output:
True(splits as"i"+"like"+"sam")
And a negative case:
s = "catsandog", dict ={"cats", "dog", "sand", "and", "cat"}- Expected output:
False(no combination covers the full string)
It’s LeetCode 139: Word Break, one of the most commonly encountered string-DP problems in technical placement screening. Placement coding tests at product companies and some IT services firms include it in either the moderate or advanced tier.
Why the Recursive Approach Is Too Slow
The brute-force idea: at each position in the string, try every prefix. If the prefix is in the dictionary, recurse on the remaining suffix.
wordBreak("ilike", {"i", "like", "sam", "sung"})
try "i" → in dict → wordBreak("like", ...)
try "l" → not in dict
try "li" → not in dict
try "lik" → not in dict
try "like" → in dict → wordBreak("", ...) → True ✓
try "il" → not in dict
try "ili" → not in dict
...
On a short string this terminates quickly. On a long string with many overlapping prefixes, the recursion tree balloons: the same suffix gets re-evaluated from multiple entry points. This is the overlapping-subproblems property that makes DP the right tool.
Without memoisation, the worst-case time is exponential. With memoisation or a bottom-up table, it drops to O(n²).
DP Table Approach: Step by Step
The key insight: define dp[i] as True if the substring s[0:i] can be segmented using dictionary words, and False otherwise.
Initialisation:
dp[0] = True— the empty prefix is always “breakable” (no characters, nothing needed).dp[1]throughdp[n]=Falseinitially.
Transition:
For each position i from 1 to n:
- For each split point
jfrom 0 toi:- If
dp[j]isTrueANDs[j:i]is in the dictionary, setdp[i] = Trueand break the inner loop.
- If
Return: dp[n]
Worked trace: s = "ilike", dict = {"i", "like", "sam", "sung"}
dp[0] = True(base case)dp[1]: j=0:dp[0]is True,s[0:1]="i"is in dict —dp[1] = Truedp[2]: j=0:"il"not in dict; j=1:dp[1]is True,s[1:2]="l"not in dict —dp[2] = Falsedp[3]: j=0:"ili"not in dict; j=1:dp[1]is True,s[1:3]="li"not in dict; j=2:dp[2]is False —dp[3] = Falsedp[4]: j=0:"ilik"not in dict; j=1:dp[1]is True,s[1:4]="lik"not in dict; j=2,3:dp[j]is False —dp[4] = Falsedp[5]: j=0:"ilike"not in dict; j=1:dp[1]is True,s[1:5]="like"is in dict —dp[5] = True- Return
dp[5] = True✓
Code: Python and C++
Python
def word_break(s: str, word_dict: list[str]) -> bool:
n = len(s)
word_set = set(word_dict) # O(1) average lookup
dp = [False] * (n + 1)
dp[0] = True # base case: empty prefix
for i in range(1, n + 1):
for j in range(i):
if dp[j] and s[j:i] in word_set:
dp[i] = True
break # no need to check further split points
return dp[n]
# Driver
if __name__ == "__main__":
dictionary = ["i", "like", "sam", "sung"]
print(word_break("ilike", dictionary)) # True
print(word_break("catsandog", ["cats", "dog", "sand", "and", "cat"])) # False
C++
#include <iostream>
#include <unordered_set>
#include <vector>
#include <string>
using namespace std;
bool wordBreak(const string& s, const vector<string>& wordDict) {
int n = static_cast<int>(s.size());
unordered_set<string> wordSet(wordDict.begin(), wordDict.end());
vector<bool> dp(n + 1, false);
dp[0] = true; // base case: empty prefix is breakable
for (int i = 1; i <= n; ++i) {
for (int j = 0; j < i; ++j) {
if (dp[j] && wordSet.count(s.substr(j, i - j))) {
dp[i] = true;
break;
}
}
}
return dp[n];
}
int main() {
vector<string> dict = {"i", "like", "sam", "sung"};
cout << (wordBreak("ilike", dict) ? "True" : "False") << "\n"; // True
vector<string> dict2 = {"cats", "dog", "sand", "and", "cat"};
cout << (wordBreak("catsandog", dict2) ? "True" : "False") << "\n"; // False
return 0;
}
Complexity Analysis
Time complexity:
- The outer loop runs
niterations (positions 1 throughn). - The inner loop runs at most
iiterations per position, up tontotal per outer step. - Each dictionary lookup involves comparing a substring
s[j:i]of length up ton. With a hash set, average lookup cost is O(k) wherekis the substring length. - Total worst-case: O(n × n × k) = O(n²k). If
kis treated as a constant (short words), this simplifies to O(n²).
Space complexity:
- The
dparray holdsn + 1booleans: O(n). - The hash set holds all dictionary words: O(m × k) where
mis the dictionary size andkis average word length. This is typically dominated by the input itself, so the effective auxiliary space is O(n).
For a more detailed derivation of auxiliary vs. total space, see the space complexity of algorithms guide.
Where Word Break Shows Up in Real Hiring
String and DP problems overlap most heavily in intermediate coding rounds. The word break problem specifically tests whether you:
- Recognise the overlapping-subproblems pattern (calling for DP over brute-force recursion).
- Set the correct base case (
dp[0] = True, notdp[0] = False). - Index substrings correctly without off-by-one errors.
Off-by-one errors in the inner loop bounds are the single most common wrong answer on this problem in automated coding assessments. Check your inner loop ends at j < i, not j <= i.
Product company interviews sometimes extend the problem: return all valid segmentations (requires backtracking over the dp array), or count them (requires a count array alongside the boolean array). Both are O(n²) DP variants, so the same tabulation logic applies. For other array-based DP patterns in the same difficulty class, the equilibrium index problem and the symmetric pairs problem are worth working through back-to-back.
The GeeksforGeeks Word Break (DP-32) page lists several test cases with edge conditions. Worth running your implementation against those boundary inputs before any coding round.
Tokenisation in modern NLP uses a structurally similar idea: given a vocabulary of sub-word units, split raw text into the shortest or most probable sequence of valid tokens. The dp array you filled above, position by position, does the same thing at a smaller scale. If you want to see that process running on live strings, feeding text to a real tokeniser and watching it segment, TinkerLLM at ₹299 is a hands-on environment where you can run that experiment without any local setup.
Primary sources
Frequently asked questions
Is the word break problem solved with greedy or dynamic programming?
Dynamic programming. A greedy approach (always matching the longest possible prefix) fails on inputs where the greedy choice blocks a valid split that a shorter prefix would have allowed. DP checks all possible split points and avoids that trap.
What is the time complexity of the word break DP solution?
O(n² × k), where n is the string length and k is the average word length in the dictionary. The outer loop runs n times, the inner loop runs up to n times per iteration, and each dictionary lookup requires comparing a substring of length up to k. If dictionary lookups are treated as O(1) using a hash set, the complexity simplifies to O(n²).
Can the word break problem return multiple valid segmentations?
The basic DP solution returns only True or False, not the actual segmentation. Recovering all valid segmentations requires a backtracking step over the filled dp array, or a variant that stores the split points alongside the boolean.
Does the word break problem appear in TCS NQT or AMCAT coding rounds?
Yes. String and DP combinations appear in TCS NQT Digital and NQT Ninja coding sections and in AMCAT Automata. The word break variant is less common than Longest Common Subsequence or Coin Change, but it tests dictionary lookup combined with DP, a pattern that interviewers at product companies use as a follow-up to simpler string questions.
What is the difference between memoisation and tabulation for word break?
Memoisation is top-down: start from the full string, recursively check prefixes, and cache results to avoid recomputation. Tabulation is bottom-up: fill a dp array from dp[0] upward. Both give O(n²) time and O(n) space. The tabulation version is usually preferred in placement coding rounds because it avoids recursion depth limits on long strings.
A self-paced playground for building with LLMs.
TinkerLLM is FACE Prep's sister property. A guided environment for shipping real LLM applications, the kind of project that earns a paragraph on your resume, not a line.
Try TinkerLLM (₹299 launch)