How to implement a regular expression matching

Problem description

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*' where:

  • '.' Matches any single character.​​​​
  • '*' Matches zero or more of the preceding element.
Input: s = "aa", p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".
Input: s = "aab", p = "c*a*b"
Output: true
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore, it matches "aab".
Input: s = "mississippi", p = "mis*is*p*."
Output: false

Solution

For the given strings s and p, in order to know whether p is a regular expression of s, we need to check every character in both strings. And for the symbols in the string p, we need to check whether those symbols are valid. Here we’ll use dynamic programming to keep track of each validation. And we will use a matrix to store the boolean result.

Step 1: matrix initiation

If the length of the string s is n, and the length of the string p is m, we’ll create a matrix with a size (m+1)*(n+1). For the element in the matrix[i][j], it’s representing at the current index i and j, the boolean result of wether p[:j] can represent s[:i] . First, let’s create such a matrix with the input s = “aab”, p = “c*a*b”.

DP initialize
  • case 1: p = “”, s = “” , p can represent s , matrix[0][0] = True
  • case 2: p = “c”, s = “” , p can not represent s , matrix[1][0] = False
  • case 3: p = “c*”, s = “” , since* can repeat preceding character 0 times, p can represent s , matrix[2][0] = True
  • case 4: p = “c*a”, s = “” , p can not represent s , matrix[3][0] = False
  • case 5: p = “c*a*”, s = “” , similar to case 3, p can represent s , matrix[4][0] = True
  • case 6: p = “c*a*b”, s = “” , p can not represent s , matrix[5][0] = False

Step 2: fill out the remaining index

For the remaining matrix[i][j], the boolean value depends on each character in string p. There are two cases we need to consider:

  • if the current character is . , .can represent any single character, matrix[i][j] has the same value with matrix[i-1][j-1]
  • if the current character is *, * can repeat the previous character 0 to n times, matrix[i][j] depends on matrix[i-1][j] or matrix[i][j-2]

Implementation

Following is the implementation in Ruby:

def is_match(s, p)
return false if !s || !p
column = [false] * (p.size + 1)
dp = []
(s.size + 1).times do
dp << Array.new(column)
end

dp[0][0] = true
for i in 1..p.size do
if p[i - 1] == '*'
dp[0][i] = dp[0][i-2]
end
end

for i in 1..s.size do
for j in 1..p.size do
if p[j-1] == s[i-1] || p[j-1] == '.'
dp[i][j] = dp[i-1][j-1]
elsif p[j - 1] == '*'
if p[j - 2] == s[i - 1] || p[j - 2] == '.'
dp[i][j] = dp[i-1][j] || dp[i][j - 2]
else
dp[i][j] = dp[i][j - 2]
end
end
end
end

dp[s.size][p.size]
end

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Wangyy

Wangyy

on the way to become a programmer