Placement Prep

Word Break Problem: Dynamic Programming Solution with Code

The word break problem asks if a string can be split into dictionary words. DP table approach, Python and C++ code, and full complexity analysis.

By FACE Prep Team 5 min read
dynamic-programming word-break dsa placement-prep algorithms python cpp

The word break problem asks one question: given a string and a dictionary of valid words, can the string be split into a sequence of those words?

It looks deceptively simple. The naive approach (try every possible split, recurse on the pieces) runs into exponential time on strings where many prefixes overlap. The standard fix is a bottom-up dynamic programming table that runs in O(n²) time and O(n) space.

What the Word Break Problem Tests

The problem statement is:

  • Given a string s and a list of dictionary words, return True if s can be segmented into one or more words from the dictionary, or False if it cannot.

A concrete example:

  • s = "ilike", dict = {"i", "like", "sam", "sung"}
  • Expected output: True (splits as "i" + "like")

Another:

  • s = "ilikesam", dict = {"i", "like", "sam", "sung"}
  • Expected output: True (splits as "i" + "like" + "sam")

And a negative case:

  • s = "catsandog", dict = {"cats", "dog", "sand", "and", "cat"}
  • Expected output: False (no combination covers the full string)

It’s LeetCode 139: Word Break, one of the most commonly encountered string-DP problems in technical placement screening. Placement coding tests at product companies and some IT services firms include it in either the moderate or advanced tier.

Why the Recursive Approach Is Too Slow

The brute-force idea: at each position in the string, try every prefix. If the prefix is in the dictionary, recurse on the remaining suffix.

wordBreak("ilike", {"i", "like", "sam", "sung"})
    try "i"   → in dict → wordBreak("like", ...)
        try "l"    → not in dict
        try "li"   → not in dict
        try "lik"  → not in dict
        try "like" → in dict → wordBreak("", ...) → True  ✓
    try "il"  → not in dict
    try "ili" → not in dict
    ...

On a short string this terminates quickly. On a long string with many overlapping prefixes, the recursion tree balloons: the same suffix gets re-evaluated from multiple entry points. This is the overlapping-subproblems property that makes DP the right tool.

Without memoisation, the worst-case time is exponential. With memoisation or a bottom-up table, it drops to O(n²).

DP Table Approach: Step by Step

The key insight: define dp[i] as True if the substring s[0:i] can be segmented using dictionary words, and False otherwise.

Initialisation:

  • dp[0] = True — the empty prefix is always “breakable” (no characters, nothing needed).
  • dp[1] through dp[n] = False initially.

Transition:

For each position i from 1 to n:

  • For each split point j from 0 to i:
    • If dp[j] is True AND s[j:i] is in the dictionary, set dp[i] = True and break the inner loop.

Return: dp[n]

Worked trace: s = "ilike", dict = {"i", "like", "sam", "sung"}

  • dp[0] = True (base case)
  • dp[1]: j=0: dp[0] is True, s[0:1] = "i" is in dict — dp[1] = True
  • dp[2]: j=0: "il" not in dict; j=1: dp[1] is True, s[1:2] = "l" not in dict — dp[2] = False
  • dp[3]: j=0: "ili" not in dict; j=1: dp[1] is True, s[1:3] = "li" not in dict; j=2: dp[2] is False — dp[3] = False
  • dp[4]: j=0: "ilik" not in dict; j=1: dp[1] is True, s[1:4] = "lik" not in dict; j=2,3: dp[j] is False — dp[4] = False
  • dp[5]: j=0: "ilike" not in dict; j=1: dp[1] is True, s[1:5] = "like" is in dict — dp[5] = True
  • Return dp[5] = True

Code: Python and C++

Python

def word_break(s: str, word_dict: list[str]) -> bool:
    n = len(s)
    word_set = set(word_dict)          # O(1) average lookup
    dp = [False] * (n + 1)
    dp[0] = True                       # base case: empty prefix

    for i in range(1, n + 1):
        for j in range(i):
            if dp[j] and s[j:i] in word_set:
                dp[i] = True
                break                  # no need to check further split points

    return dp[n]


# Driver
if __name__ == "__main__":
    dictionary = ["i", "like", "sam", "sung"]
    print(word_break("ilike", dictionary))    # True
    print(word_break("catsandog", ["cats", "dog", "sand", "and", "cat"]))  # False

C++

#include <iostream>
#include <unordered_set>
#include <vector>
#include <string>
using namespace std;

bool wordBreak(const string& s, const vector<string>& wordDict) {
    int n = static_cast<int>(s.size());
    unordered_set<string> wordSet(wordDict.begin(), wordDict.end());
    vector<bool> dp(n + 1, false);
    dp[0] = true;   // base case: empty prefix is breakable

    for (int i = 1; i <= n; ++i) {
        for (int j = 0; j < i; ++j) {
            if (dp[j] && wordSet.count(s.substr(j, i - j))) {
                dp[i] = true;
                break;
            }
        }
    }

    return dp[n];
}

int main() {
    vector<string> dict = {"i", "like", "sam", "sung"};
    cout << (wordBreak("ilike", dict) ? "True" : "False") << "\n";  // True

    vector<string> dict2 = {"cats", "dog", "sand", "and", "cat"};
    cout << (wordBreak("catsandog", dict2) ? "True" : "False") << "\n";  // False
    return 0;
}

Complexity Analysis

Time complexity:

  • The outer loop runs n iterations (positions 1 through n).
  • The inner loop runs at most i iterations per position, up to n total per outer step.
  • Each dictionary lookup involves comparing a substring s[j:i] of length up to n. With a hash set, average lookup cost is O(k) where k is the substring length.
  • Total worst-case: O(n × n × k) = O(n²k). If k is treated as a constant (short words), this simplifies to O(n²).

Space complexity:

  • The dp array holds n + 1 booleans: O(n).
  • The hash set holds all dictionary words: O(m × k) where m is the dictionary size and k is average word length. This is typically dominated by the input itself, so the effective auxiliary space is O(n).

For a more detailed derivation of auxiliary vs. total space, see the space complexity of algorithms guide.

Where Word Break Shows Up in Real Hiring

String and DP problems overlap most heavily in intermediate coding rounds. The word break problem specifically tests whether you:

  • Recognise the overlapping-subproblems pattern (calling for DP over brute-force recursion).
  • Set the correct base case (dp[0] = True, not dp[0] = False).
  • Index substrings correctly without off-by-one errors.

Off-by-one errors in the inner loop bounds are the single most common wrong answer on this problem in automated coding assessments. Check your inner loop ends at j < i, not j <= i.

Product company interviews sometimes extend the problem: return all valid segmentations (requires backtracking over the dp array), or count them (requires a count array alongside the boolean array). Both are O(n²) DP variants, so the same tabulation logic applies. For other array-based DP patterns in the same difficulty class, the equilibrium index problem and the symmetric pairs problem are worth working through back-to-back.

The GeeksforGeeks Word Break (DP-32) page lists several test cases with edge conditions. Worth running your implementation against those boundary inputs before any coding round.

Tokenisation in modern NLP uses a structurally similar idea: given a vocabulary of sub-word units, split raw text into the shortest or most probable sequence of valid tokens. The dp array you filled above, position by position, does the same thing at a smaller scale. If you want to see that process running on live strings, feeding text to a real tokeniser and watching it segment, TinkerLLM at ₹299 is a hands-on environment where you can run that experiment without any local setup.

Primary sources

Frequently asked questions

Is the word break problem solved with greedy or dynamic programming?

Dynamic programming. A greedy approach (always matching the longest possible prefix) fails on inputs where the greedy choice blocks a valid split that a shorter prefix would have allowed. DP checks all possible split points and avoids that trap.

What is the time complexity of the word break DP solution?

O(n² × k), where n is the string length and k is the average word length in the dictionary. The outer loop runs n times, the inner loop runs up to n times per iteration, and each dictionary lookup requires comparing a substring of length up to k. If dictionary lookups are treated as O(1) using a hash set, the complexity simplifies to O(n²).

Can the word break problem return multiple valid segmentations?

The basic DP solution returns only True or False, not the actual segmentation. Recovering all valid segmentations requires a backtracking step over the filled dp array, or a variant that stores the split points alongside the boolean.

Does the word break problem appear in TCS NQT or AMCAT coding rounds?

Yes. String and DP combinations appear in TCS NQT Digital and NQT Ninja coding sections and in AMCAT Automata. The word break variant is less common than Longest Common Subsequence or Coin Change, but it tests dictionary lookup combined with DP, a pattern that interviewers at product companies use as a follow-up to simpler string questions.

What is the difference between memoisation and tabulation for word break?

Memoisation is top-down: start from the full string, recursively check prefixes, and cache results to avoid recomputation. Tabulation is bottom-up: fill a dp array from dp[0] upward. Both give O(n²) time and O(n) space. The tabulation version is usually preferred in placement coding rounds because it avoids recursion depth limits on long strings.

Build AI projects

A self-paced playground for building with LLMs.

TinkerLLM is FACE Prep's sister property. A guided environment for shipping real LLM applications, the kind of project that earns a paragraph on your resume, not a line.

Try TinkerLLM (₹299 launch)
Free AI Roadmap PDF