KMP Algorithm for Longest Prefix Suffix problem
The Longest Prefix Suffix (LPS) problem is central to the Knuth-Morris-Pratt (KMP) algorithm for pattern matching. Here's a detailed explanation:
Problem: Longest Prefix Suffix (LPS)
For a given string s, the LPS array (or prefix table) at position i stores the length of the longest prefix of s[0:i] that is also a suffix of s[0:i], excluding the entire string itself.
Example
For the string s = "abab", the LPS array is [0, 0, 1, 2]:
LPS[0] = 0because "a" has no prefix that is also a suffix.LPS[1] = 0because "ab" has no prefix that matches a suffix.LPS[2] = 1because "a" is the longest prefix of "aba" that is also a suffix.LPS[3] = 2because "ab" is the longest prefix of "abab" that is also a suffix.
Importance in the KMP Algorithm
The KMP algorithm uses the LPS array to avoid redundant comparisons when a mismatch occurs during pattern matching. Specifically:
- If a mismatch occurs after
jmatches, instead of restarting the comparison from the beginning of the pattern, the algorithm uses the LPS array to shift the pattern intelligently. - The LPS value at position
j-1determines the next position to check.
The Longest Prefix Suffix (LPS) array is a fundamental component of the Knuth-Morris-Pratt (KMP) algorithm, which is used for efficient pattern matching in strings. The LPS array captures the lengths of the longest proper prefixes of a pattern that are also suffixes, enabling the KMP algorithm to skip unnecessary comparisons during the search process.
Definition
For a given pattern P of length n, the LPS array is an array lps where lps[i] represents the length of the longest proper prefix of the substring P[0:i] that is also a suffix of this substring. A proper prefix is a prefix that is not equal to the entire string.
Construction of the LPS Array
The LPS array can be constructed in linear time using the following algorithm:
Initialization:
- Create an array
lpsof lengthnand initialize all elements to0. - Set
lps[0]to0because a single character has no proper prefix. - Initialize two pointers:
len(length of the current longest prefix suffix) to0andi(current position in the pattern) to1.
- Create an array
Iteration:
- While
i < n, perform the following checks:- Match: If
P[i] == P[len]:- Increment
lenby1. - Set
lps[i]tolen. - Increment
iby1.
- Increment
- Mismatch: If
P[i] != P[len]:- If
lenis not0, updatelentolps[len - 1]. This step utilizes the previously computed LPS values to avoid unnecessary comparisons. - If
lenis0, setlps[i]to0and incrementiby1.
- If
- Match: If
- While
Example
Consider the pattern P = "ABABAC":
Initialization:
lps = [0, 0, 0, 0, 0, 0]len = 0,i = 1
Iteration:
- At
i = 1:P[1] = 'B'andP[0] = 'A'(mismatch)lenis0, so setlps[1] = 0and incrementito2.
- At
i = 2:P[2] = 'A'andP[0] = 'A'(match)- Increment
lento1, setlps[2] = 1, and incrementito3.
- Increment
- At
i = 3:P[3] = 'B'andP[1] = 'B'(match)- Increment
lento2, setlps[3] = 2, and incrementito4.
- Increment
- At
i = 4:P[4] = 'A'andP[2] = 'A'(match)- Increment
lento3, setlps[4] = 3, and incrementito5.
- Increment
- At
i = 5:P[5] = 'C'andP[3] = 'B'(mismatch)lenis not0, so updatelentolps[2] = 1.- Now,
P[5] = 'C'andP[1] = 'B'(mismatch)- Update
lentolps[0] = 0.
- Update
- Set
lps[5] = 0and incrementito6.
- At
The final LPS array is [0, 0, 1, 2, 3, 0].
MOST IMPORTANT POINT:-
In the construction of the Longest Prefix Suffix (LPS) array for the Knuth-Morris-Pratt (KMP) algorithm, the step where len is updated to lps[len - 1] during a mismatch is crucial for efficiently computing the LPS values.
Context
While building the LPS array, two pointers are maintained:
i: Iterates through the pattern.len: Tracks the length of the current longest proper prefix which is also a suffix.
When a mismatch occurs between pattern[i] and pattern[len], the algorithm needs to determine the next potential value for len without redundant comparisons.
The Step: len = lps[len - 1]
If len is not zero and a mismatch occurs, the algorithm updates len to lps[len - 1]. This adjustment leverages previously computed LPS values to find the next longest proper prefix that could potentially match the suffix ending at pattern[i - 1].
Why This Works:
Avoiding Redundant Comparisons: By setting
len = lps[len - 1], the algorithm skips over prefixes that have already been matched and found to be proper prefixes and suffixes. This prevents re-evaluating these segments, enhancing efficiency.Utilizing Precomputed Information: The LPS array stores lengths of the longest proper prefixes that are also suffixes for all sub-patterns. By referencing
lps[len - 1], the algorithm quickly identifies the next potential prefix length that could align with the current suffix, based on prior computations.Ensuring Correctness: This step ensures that all possible proper prefixes are considered in decreasing order of length, maintaining the correctness of the LPS array construction.
Illustrative Example:
Consider the pattern P = "ABABC":
- At
i = 4(P[4] = 'C'), supposelen = 2. - The current prefix of length 2 is "AB", and the suffix of length 2 ending at
P[3]is also "AB". - Since
P[4]('C') does not matchP[2]('A'), a mismatch occurs. - Instead of resetting
lento 0, the algorithm updateslentolps[len - 1], which islps[1] = 0. - This adjustment allows the algorithm to consider the next possible proper prefix length without redundant comparisons.
By updating len to lps[len - 1], the algorithm efficiently narrows down the search for the longest proper prefix that is also a suffix, ensuring that all potential matches are considered systematically.
For a more in-depth understanding, you might find the following video helpful:
Comments
Post a Comment