KMP Algorithm for Longest Prefix Suffix problem
The Longest Prefix Suffix (LPS) problem is central to the Knuth-Morris-Pratt (KMP) algorithm for pattern matching. Here's a detailed explanation:
Problem: Longest Prefix Suffix (LPS)
For a given string s
, the LPS array (or prefix table) at position i
stores the length of the longest prefix of s[0:i]
that is also a suffix of s[0:i]
, excluding the entire string itself.
Example
For the string s = "abab"
, the LPS array is [0, 0, 1, 2]
:
LPS[0] = 0
because "a" has no prefix that is also a suffix.LPS[1] = 0
because "ab" has no prefix that matches a suffix.LPS[2] = 1
because "a" is the longest prefix of "aba" that is also a suffix.LPS[3] = 2
because "ab" is the longest prefix of "abab" that is also a suffix.
Importance in the KMP Algorithm
The KMP algorithm uses the LPS array to avoid redundant comparisons when a mismatch occurs during pattern matching. Specifically:
- If a mismatch occurs after
j
matches, instead of restarting the comparison from the beginning of the pattern, the algorithm uses the LPS array to shift the pattern intelligently. - The LPS value at position
j-1
determines the next position to check.
The Longest Prefix Suffix (LPS) array is a fundamental component of the Knuth-Morris-Pratt (KMP) algorithm, which is used for efficient pattern matching in strings. The LPS array captures the lengths of the longest proper prefixes of a pattern that are also suffixes, enabling the KMP algorithm to skip unnecessary comparisons during the search process.
Definition
For a given pattern P
of length n
, the LPS array is an array lps
where lps[i]
represents the length of the longest proper prefix of the substring P[0:i]
that is also a suffix of this substring. A proper prefix is a prefix that is not equal to the entire string.
Construction of the LPS Array
The LPS array can be constructed in linear time using the following algorithm:
Initialization:
- Create an array
lps
of lengthn
and initialize all elements to0
. - Set
lps[0]
to0
because a single character has no proper prefix. - Initialize two pointers:
len
(length of the current longest prefix suffix) to0
andi
(current position in the pattern) to1
.
- Create an array
Iteration:
- While
i < n
, perform the following checks:- Match: If
P[i] == P[len]
:- Increment
len
by1
. - Set
lps[i]
tolen
. - Increment
i
by1
.
- Increment
- Mismatch: If
P[i] != P[len]
:- If
len
is not0
, updatelen
tolps[len - 1]
. This step utilizes the previously computed LPS values to avoid unnecessary comparisons. - If
len
is0
, setlps[i]
to0
and incrementi
by1
.
- If
- Match: If
- While
Example
Consider the pattern P = "ABABAC"
:
Initialization:
lps = [0, 0, 0, 0, 0, 0]
len = 0
,i = 1
Iteration:
- At
i = 1
:P[1] = 'B'
andP[0] = 'A'
(mismatch)len
is0
, so setlps[1] = 0
and incrementi
to2
.
- At
i = 2
:P[2] = 'A'
andP[0] = 'A'
(match)- Increment
len
to1
, setlps[2] = 1
, and incrementi
to3
.
- Increment
- At
i = 3
:P[3] = 'B'
andP[1] = 'B'
(match)- Increment
len
to2
, setlps[3] = 2
, and incrementi
to4
.
- Increment
- At
i = 4
:P[4] = 'A'
andP[2] = 'A'
(match)- Increment
len
to3
, setlps[4] = 3
, and incrementi
to5
.
- Increment
- At
i = 5
:P[5] = 'C'
andP[3] = 'B'
(mismatch)len
is not0
, so updatelen
tolps[2] = 1
.- Now,
P[5] = 'C'
andP[1] = 'B'
(mismatch)- Update
len
tolps[0] = 0
.
- Update
- Set
lps[5] = 0
and incrementi
to6
.
- At
The final LPS array is [0, 0, 1, 2, 3, 0]
.
MOST IMPORTANT POINT:-
In the construction of the Longest Prefix Suffix (LPS) array for the Knuth-Morris-Pratt (KMP) algorithm, the step where len
is updated to lps[len - 1]
during a mismatch is crucial for efficiently computing the LPS values.
Context
While building the LPS array, two pointers are maintained:
i
: Iterates through the pattern.len
: Tracks the length of the current longest proper prefix which is also a suffix.
When a mismatch occurs between pattern[i]
and pattern[len]
, the algorithm needs to determine the next potential value for len
without redundant comparisons.
The Step: len = lps[len - 1]
If len
is not zero and a mismatch occurs, the algorithm updates len
to lps[len - 1]
. This adjustment leverages previously computed LPS values to find the next longest proper prefix that could potentially match the suffix ending at pattern[i - 1]
.
Why This Works:
Avoiding Redundant Comparisons: By setting
len = lps[len - 1]
, the algorithm skips over prefixes that have already been matched and found to be proper prefixes and suffixes. This prevents re-evaluating these segments, enhancing efficiency.Utilizing Precomputed Information: The LPS array stores lengths of the longest proper prefixes that are also suffixes for all sub-patterns. By referencing
lps[len - 1]
, the algorithm quickly identifies the next potential prefix length that could align with the current suffix, based on prior computations.Ensuring Correctness: This step ensures that all possible proper prefixes are considered in decreasing order of length, maintaining the correctness of the LPS array construction.
Illustrative Example:
Consider the pattern P = "ABABC"
:
- At
i = 4
(P[4] = 'C'
), supposelen = 2
. - The current prefix of length 2 is "AB", and the suffix of length 2 ending at
P[3]
is also "AB". - Since
P[4]
('C') does not matchP[2]
('A'), a mismatch occurs. - Instead of resetting
len
to 0, the algorithm updateslen
tolps[len - 1]
, which islps[1] = 0
. - This adjustment allows the algorithm to consider the next possible proper prefix length without redundant comparisons.
By updating len
to lps[len - 1]
, the algorithm efficiently narrows down the search for the longest proper prefix that is also a suffix, ensuring that all potential matches are considered systematically.
For a more in-depth understanding, you might find the following video helpful:
Comments
Post a Comment