Count Common Subsequences of Two Strings
Count common subsequences of two strings using DP: derive the recurrence, trace a 4x4 table, implement in Python, and cut space to O(min(m,n)) with a rolling array.
Counting common subsequences of two strings is a DP problem distinct from LCS: instead of finding the length of one longest shared subsequence, the goal is to count every shared subsequence.
What This Problem Actually Asks
A subsequence is a sequence derived from another by deleting some elements without changing the order of the remaining ones. Characters do not need to be adjacent. From “abcd”, both “ac” and “abd” are valid subsequences; “ca” is not, because the original order is reversed.
A common subsequence of two strings appears as a subsequence of both. For “ab” and “ab”, the common subsequences are “a”, “b”, and “ab”, giving a count of 3. For “abc” and “abc”, there are 7: “a”, “b”, “c”, “ab”, “ac”, “bc”, and “abc”. You can verify this by listing all subsequences of “abc” (7 non-empty ones) and confirming each also appears in the second string.
Two clarifications before the algorithm:
- This is NOT the Longest Common Subsequence (LCS) problem. LCS returns the length of the single longest shared subsequence. Here, every shared subsequence adds 1 to the count.
- When strings have repeated characters, the DP counts by position pairs, not by distinct string values. For “aa” and “aa”, the count is 5 (four single-character position pairs plus one two-character pair), not 2 distinct strings. For strings with all-distinct characters, both interpretations agree.
The DP Recurrence, Term by Term
Let dp[i][j] = number of common subsequences of s1[0..i-1] and s2[0..j-1]. Base cases: dp[i][0] = 0 and dp[0][j] = 0 for all i, j (no common subsequences when either prefix is empty).
For the general case, consider adding s1[i-1] and s2[j-1]:
When s1[i-1] != s2[j-1]:
The new characters cannot form any new matching pair. The count equals all common subsequences using s1[0..i-2] and s2[0..j-1], plus all using s1[0..i-1] and s2[0..j-2], minus the overlap (s1[0..i-2] and s2[0..j-2], counted in both terms):
dp[i][j] = dp[i-1][j] + dp[i][j-1] - dp[i-1][j-1]
When s1[i-1] == s2[j-1]:
Same inclusion-exclusion as the no-match case, plus the new matches: every common subsequence of the prefixes s1[0..i-2] and s2[0..j-2] can be extended by appending the matching character, and the matching character alone forms one new common subsequence. That adds dp[i-1][j-1] + 1. The double-subtraction and re-addition cancel:
dp[i][j] = dp[i-1][j] + dp[i][j-1] + 1
The unified form covers both cases in one expression:
dp[i][j] = dp[i-1][j] + dp[i][j-1] - dp[i-1][j-1]
+ (dp[i-1][j-1] + 1 if s1[i-1] == s2[j-1] else 0)
Traced on a 4×4 Table
Using s1 = “abc” and s2 = “abc” (expected count: 7):
| "" | a | b | c | |
|---|---|---|---|---|
| "" | 0 | 0 | 0 | 0 |
| a | 0 | 1 | 1 | 1 |
| b | 0 | 1 | 3 | 3 |
| c | 0 | 1 | 3 | 7 |
Three representative cells:
dp[1][1]:s1[0]='a'matchess2[0]='a'. Match rule:dp[0][1] + dp[1][0] + 1 = 0 + 0 + 1 = 1. The single common subsequence at this point is “a”.dp[2][2]:s1[1]='b'matchess2[1]='b'. Match rule:dp[1][2] + dp[2][1] + 1 = 1 + 1 + 1 = 3. Those three are “a”, “b”, and “ab”.dp[3][3]:s1[2]='c'matchess2[2]='c'. Match rule:dp[2][3] + dp[3][2] + 1 = 3 + 3 + 1 = 7. All seven common subsequences are counted.
dp[2][3] uses the no-match case: s1[1]='b' versus s2[2]='c'. Result: dp[1][3] + dp[2][2] - dp[1][2] = 1 + 3 - 1 = 3. The subtraction removes the double-counted overlap.
Python Implementation
The tabulated solution follows the recurrence directly:
def count_common_subsequences(s1: str, s2: str) -> int:
m, n = len(s1), len(s2)
# dp[i][j] = common subsequence count for s1[0..i-1] and s2[0..j-1]
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if s1[i - 1] == s2[j - 1]:
dp[i][j] = dp[i - 1][j] + dp[i][j - 1] + 1
else:
dp[i][j] = dp[i - 1][j] + dp[i][j - 1] - dp[i - 1][j - 1]
return dp[m][n]
# Verification
print(count_common_subsequences("ab", "ab")) # 3
print(count_common_subsequences("abc", "abc")) # 7
print(count_common_subsequences("abcd", "abc")) # 7
Complexity:
- Time: O(m×n), two nested loops, each cell computed in constant time.
- Space: O(m×n) for the full table.
For the space complexity analysis of this recurrence: each row i reads only from row i-1, so two rows are sufficient.
Rolling-Array Optimisation: O(min(m,n)) Space
def count_common_subsequences_optimised(s1: str, s2: str) -> int:
# Swap so s2 is shorter, minimising the row length
if len(s1) < len(s2):
s1, s2 = s2, s1
m, n = len(s1), len(s2)
prev = [0] * (n + 1) # represents dp[i-1][*]
for i in range(1, m + 1):
curr = [0] * (n + 1)
for j in range(1, n + 1):
if s1[i - 1] == s2[j - 1]:
curr[j] = prev[j] + curr[j - 1] + 1
else:
curr[j] = prev[j] + curr[j - 1] - prev[j - 1]
prev = curr
return prev[n]
Space is now O(min(m,n)) because only two arrays of length min(m,n)+1 are alive at once. Time stays O(m×n).
Where This Shows Up in Technical Rounds
DP problems appear across most Indian placement assessments: TCS NQT Digital, AMCAT Automata, and Infosys InfyTQ all include coding questions that require deriving and implementing a recurrence. The common-subsequence count tests two specific skills: setting up a 2-D DP table with correct base cases, and applying the inclusion-exclusion correction in the no-match case.
The same inclusion-exclusion pattern appears in array-pairing problems and other DP variants. Getting one right transfers directly to the others. For follow-up questions in a technical round, the expected extension is the rolling-array space optimisation, specifically the O(n) discipline also seen in linear-scan and DP problems across the corpus.
The O(min(m,n)) space reduction covered above is exactly the kind of follow-up that turns a correct solution into a strong one in an interview. The same reasoning applies to LCS, edit distance, and knapsack variants: two rows of state are enough, so carry only two. If you want to see this spatial discipline applied to language model inference (where “two rows of state” becomes the key-value cache), TinkerLLM starts at ₹299 and walks you through building a working LLM-backed application where those tradeoffs are visible in real code.
Primary sources
Frequently asked questions
Is counting common subsequences the same as LCS?
No. LCS (Longest Common Subsequence) returns the length of one longest shared subsequence. Counting common subsequences returns the total number of shared subsequences. For 'abc' and 'abc', LCS length is 3 but the count is 7.
Why does the DP for 'aa' and 'aa' return 5, not 2?
The DP counts common subsequences by position pairs, not by distinct string values. 'aa' versus 'aa' has four single-character position pairs plus one two-character pair, totalling 5. If you count only distinct string values, the answer is 2.
What is the difference between a subsequence and a substring?
A subsequence maintains the original order of characters but the characters do not need to be adjacent. A substring is contiguous. 'ac' is a subsequence of 'abc' but not a substring of it. 'ab' is both a subsequence and a substring of 'abc'.
What is the time and space complexity of this solution?
Time complexity is O(m×n), where m and n are the lengths of the two strings. Standard space complexity is O(m×n) for the full DP table. A rolling-array reduces space to O(min(m,n)) by keeping only two rows at a time.
Why do we initialize dp[i][0] and dp[0][j] to 0?
dp[i][0] counts common subsequences when the second string is empty. An empty string has no non-empty subsequences, so there are zero common subsequences with anything. The same reasoning applies to dp[0][j].
What does the -dp[i-1][j-1] term do in the no-match case?
When characters do not match, dp[i-1][j] and dp[i][j-1] both count the common subsequences of the overlapping prefix s1[0..i-2] and s2[0..j-2], so that region gets counted twice. Subtracting dp[i-1][j-1] removes the double count. This is the inclusion-exclusion principle applied to two overlapping sets.
Does this problem appear in placement tests?
Dynamic programming problems appear in the coding sections of TCS NQT Digital, Infosys InfyTQ, and AMCAT Automata. The common-subsequence count problem tests whether you can derive and implement a DP recurrence from first principles, which is a standard skill in technical rounds.
A self-paced playground for building with LLMs.
TinkerLLM is FACE Prep's sister property. A guided environment for shipping real LLM applications, the kind of project that earns a paragraph on your resume, not a line.
Try TinkerLLM (₹299 launch)