Knuth–Morris–Pratt (KMP) Pattern Matching Substring Search - First Occurrence Of Substring

  • 🎬 Video
  • ℹ️ Description
preview_player
UCmJz2DV1a3yfgrR7GqRtUUA

📹 Intuitive Video Explanations
🏃 Run Code As You Learn
💾 Save Progress
❓New Unseen Questions
🔎 Get All Solutions

Question: Given a string s and a pattern p, determine if the pattern exists in the string. Return the index of where the first occurrence starts.



The Brute Force

The naive approach to solving this is looking in s for the first character in p.

If a match is found we begin a search from that index, call it i (for intersect).

We compare the 2nd character of p with index i + 1 of s
We compare the 3rd character of p with index i + 2 of s

...and so on until we have matched to all of p without either having overrun s or having found a mismatch between characters being compared.

We can then return i as our answer.

It doesn’t work well in cases where we see many matching characters followed by a mismatching character.

Complexities:

Time: O( len(s) * len(p) )

In a simple worst case we can have
s = "aaaaaab"
p = "aaab"

The problem is that for each first character match we have the potential to naively go into a search on a string that would never yield a correct answer repeatedly.


Other Algorithms

There are three linear time string matching algorithms: KMP (nuth–Morris–Pratt), Boyer-Moore, and Rabin-Karp.

Of these, Rabin-Karp is by far the simplest to understand and implement


Analysis

The time complexity of the KMP algorithm is O(len(s) + len(p)) "linear" in the worst case.

The key behind KMP is that it takes advantage of the succesful character checks during an unsuccessful pattern comparison subroutine.

We may have a series of many comparisons that succeed and then even if one fails at the end, we should not repeat the comparison work done since we already saw that a series matched.

What we will do is very similar to the naive algorithm, it is just that we save comparisons by tracking the longest propert prefixes of pattern that are also suffixes.

The key is that everytime we have a mismatch we try our best to prevent going backwards in s and repeating comparisons.


Algorithm Steps

We will preprocess the pattern string and create an array that indicates the longest proper prefix which is also suffix at each point in the pattern string.

A proper prefix does not include the original string.

For example, prefixes of “ABC” are “”, “A”, “AB” and “ABC”. Proper prefixes are “”, “A” and “AB”.

For example, suffixes of "ABC" are, "", "C", "BC", and "ABC". Proper prefixes are "", "C", and "BC".

Why do we care about these??

We know all characters behind our mismatch character match.

If we can find the length of the longest prefix that matches a suffix to that point, we can skip len(prefix) comparisons at the beginning.

The key reason we care about the prefix to suffix is because we want to "teleport" back to as early in the string to the point that we still know that there is a match.

Our goal is to minimize going backwards in our search string.


Complexities:

Time: O( len(p) + len(s) )

We spend len(p) time to build the prefix-suffix table and we spend len(s) time for the traversal on average.

Space: O( len(p) )

Our prefix-suffix table is going to be the length of the pattern string.


++++++++++++++++++++++++++++++++++++++++++++++++++






++++++++++++++++++++++++++++++++++++++++++++++++++

This question is number 7.13 in "Elements of Programming Interviews" by Adnan Aziz (they do Rabin-Karp but same problem, different algorithm)

💬 Comments
Author

Table of Contents: (I'm literally screaming. I'm sorry. I want to redo this video to make it better.)

Introducing The Creators The KMP Algorithm* 0:00 - 0:15
Problem Introduction With The Naive Approach 0:15 - 2:30
Why The Naive Approach Is Not Good 2:30 - 2:45
Walkthrough How The Algorithm Works 2:45 - 8:14
Building The Suffix-Proper-Prefix Table 8:14 - 12:34
Using The Suffix-Proper-Prefix Table In Traversal Walkthrough 12:34 - 15:08
Time & Space For The Table Build Step 15:08 - 15:51
Time & Space For The Traversal Step 15:51 - 16:08
Overall Time & Space 16:08 - 16:25
Summarizing Our Learnings 16:25 - 16:40
Wrap Up (Begging For More Subs) 16:40 - 17:05

*All 3 of the creators published the final paper together.

Author — Back To Back SWE

Author

An algorithm invented by 3 people and they expect us to come up with this in 45 min. Makes sense

Author — Ahmed Syed

Author

I never knew I needed this kind of teaching which makes me feel like im scolded in order to understand why a box is incremented.

Author — shinra desune

Author

Thanks to you I'm one less algoritm away towards achieving my dream. So thanks for the good work and keep it up ;))

Author — fire science

Author

No it was the perfect pace! Good way to keep people engaged in the explanation :)

Author — teemu828

Author

I really like your passion and felt very involved during the video
So far it’s the most approachable KMP video I’ve seen.
I vote for keeping the speed and voice as is, they allow the video to stand out in a good way

Author — no no

Author

thank you for this content, it helped me a ton!! I loved how hyped you were explaining this algorithm, great job!

Author — Ghabriel Mielli

Author

Now that's a cool explanation of a seemingly difficult concept. The thing is you cannot watch this video without some sort of preparation. You have to understand the limitations of the naive approach and then think of a logical method/alternative to overcome that. Only after that, when you watch the video and implement the algo, would it become clear to you...


Good job with the video, guys!!

Author — Rahul Saxena

Author

You're the best explainer of algorithms I've ever seen. The extra very visible colored text on the screen while you're explaining also makes a huge difference.

Author — GT 77

Author

Thank you, bro, for the brief explanation. I have one question for string pattern matching in a different approach. What does it make the two-pointer technique (sliding widow technique) slower than this Algorithm?

Author — siva reddy

Author

I'm doing leetcode looking for jobs right now, and this was really helpful! I actually understood how the algorithm works for once, instead of the video just showing the algorithm working. Thanks!

Author — Raine Hu

Author

Your explanations are amazing! This is the first time I fully understand the kmp algorithm. Thanks a lot! Keep up the good work ;)

Author — Sanskriti Tomar

Author

Good lord glad I found you! This is the best explanation I've found! Thank you! Will be following. Don't slow down your enthusiasm makes it fun.

Author — Nick Kiermaier

Author

I loved how you explained KMP algorithm really neat! But one question, when doing preprocessing, when the j index reaches the end of the pattern, why does 'i' index have to go back to the value of the previous index 'i - 1'?

Author — Lee Matt

Author

I had a hard time understanding KMP but you broke it down so well and I get it now. This goes for all the videos I've watch from you, thank you!

Author — Barrel :

Author

Very nice video! it's also interesting to know that we can consider the time complexity as O(n), because if we add an early exit condition for when p is longer than s (where we directly return -1), then we can affirm that m cannot be greater than n, so in the worst case, m would be equal to n, which gives 2n, and by removing the constant, we get a time complexity of O(n)

Author — Inside code

Author

I love the way you reply literally every single comment in this channel, this shows how much you like what you're doing. And indeed we can see it in your eyes while explaining! Thanks so much for bringing content with such quality and love! I don't usually write comments on Youtube but this channel made me stop for some few minutes to write this.

Author — LennethValky

Author

This is amazing. SO clear. Explain every single part of it.

Author — varun rao

Author

This was awesome man! Finally I understood the algo!

Author — Abhishek Sharma

Author

Dude, thanks. That's the most elaborative explanation I came across. Good work!

Author — Supratik Sadhu