Homework 3: NFAs And DFAs (26 Points)
Chris Tralie
Overview / Logistics
The purpose of this problem set is to give you practice with NFAs and their interplay with DFAs. At the heart of what we're doing is the idea that if we've done the hard work to design DFAs to recognize some language(s), we can do a little more work to treat them as modules and to combine them in clever ways to make DFAs that recognize other languages. Sometimes an NFA will help as an intermediate step.
For all problems that require you to construct an NFA or a DFA, you should submit a JFLAP file. For all other problems, submit a picture of pen + paper work or a PDF document of a writeup. Enjoy!
Problem 1: The Union of Regular Languages (3 Points)
Design a DFA that recognizes strings that are divisible by 3 OR divisible by 4. Then, design an NFA that also recognizes this language
Below are some tests you can try in JFLAP (Click here to download them)
Decimal Number  Input  Result 
44  101100  Accept 
47  101111  Reject 
64  1000000  Accept 
67  1000011  Reject 
67  1000011  Reject 
9  1001  Accept 
83  1010011  Reject 
21  10101  Accept 
36  100100  Accept 
87  1010111  Accept 
Problem 2: The Intersection of Regular Languages (3 Points)
Prove by construction that regular languages are closed under intersection
Hint: You can do a similar construction to the proof that regular languages are closed under union, but all you need to change is the accept state
Problem 3: NFA Design (3 Points)
Design an NFA that recognizes the language of odd binary numbers that have a 1 in the 32s place, when read from left to right.
Below are some tests you can try in JFLAP (Click here to download them)
Decimal Number  Input  Result 
99  1100011  Accept 
206  11001110  Reject 
239  11101111  Accept 
189  10111101  Accept 
230  11100110  Reject 
118  1110110  Reject 
144  10010000  Reject 
73  1001001  Reject 
8  1000  Reject 
228  11100100  Reject 
Problem 4: The Complement of Regular Languages (3 Points)
We've seen some examples where swapping the accept and nonaccept states of a DFA that recognizes a language L over the alphabet Σ yields a DFA which recognizes the complement of L; that is, all strings made up of characters from Σ that are not in L. For instance, the machine below accepts binary strings that contain 1010 somewhere (Click here for the JFLAP file)
and the machine below accepts binary strings that do not contain 1010 (Click here for the JFLAP file)
Does this same property hold for NFAs? If so, provide a proof in general. If not, show a counterexample (an example where this is not true), along with the inputs that break it.
Problem 5: The Union And Concatenation of Regular Languages (3 Points)
Create an NFA that recognizes the the language of strings with a nonzero and even number of a's or with a nonzero and even number of bs, followed by one or more bs, followed by an odd number of a's or an odd number of b's.
Below are some tests you can try in JFLAP (Click here to download them). If you don't agree on a particular test case, take advantage of JFLAP's NFA branch explorer by trying that test out with Input>Step By State
Input  Result 
aaaab  Reject 
bbaabbb  Accept 
bbaabaa  Reject 
aabbbb  Accept 
ababab  Accept 
abab  Reject 
ababb  Accept 
aaaba  Reject 
bbb  Reject 
bbbb  Accept 
bbbbbbbbbbbbbb  Accept 
aaaaaaaaaaa  Reject 
bbbaabb  Accept 
Let's highlight a couple of the accept strings to show how they get split up
Example 1: ababab
We can split this up as follows
A nonzero and even number of a's or with a nonzero and even number of bs  One or more bs  An odd number of a's or an odd number of b's 
aba  b  ab 
Example 2: ababb
A nonzero and even number of a's or with a nonzero and even number of bs  One or more bs  An odd number of a's or an odd number of b's 
aba  b  b 
Example 3: bbbbbbbb
There are a bunch of ways to split this one up.
A nonzero and even number of a's or with a nonzero and even number of bs  One or more bs  An odd number of a's or an odd number of b's 
bb  bbb  bbb 
bbbb  b  bbb 
bb  bbbbb  b 
Problem 6: The Reverse of Regular Languages Part 1 (3 Points)
Prove using a DFA that if a binary string is divisible by 3, then its reverse is also divisible by 3.
Problem 7: The Reverse of Regular Languages Part 2 (3 Points)
If a binary string is divisible by 6, then its reverse is not necessarily divisible by 6. For instance, the string 1100 (12) is divisible by 6, but its reverse 0011 (3) is not divisible by 6. Construct a DFA that recognize the language of binary strings whose reverse is divisible by 6. These strings should be inputted from right to left.
Hint: First construct and test an NFA that recognizes this language, then convert this to a DFA using the technique we described in class. If you've only expanded the states you needed to expand, the DFA should have even fewer states than the one that recognizes strings divisible by 6!
Below are some tests you can try in JFLAP (Click here to download them)
Input  Decimal Number  Reverse Binary  Reverse Decimal  Result 
1010111  87  1110101  117  Reject 
00100001  33  10000100  132  Accept 
001101  13  101100  44  Reject 
00000011  3  11000000  192  Accept 
0010011  19  1100100  100  Reject 
0110011  51  1100110  102  Accept 
01101111  111  11110110  246  Accept 
11010111  215  11101011  235  Reject 
0010101  21  1010100  84  Accept 
00010001  17  10001000  136  Reject 
Problem 8: DFAs for Substring Search (5 Points)
We're going to examine an application of DFAs that's used all the time behind the scenes in text editors and web browsers. Consider the following problem: given a string pattern
and a string s
, determine if pattern
is a substring (contiguous subset) of s
. One way to do this is to check every possible subset of length len(pattern)
in s
. The code below shows how one might do this in python:
So, for instance, if we run
We should get [14, 20]
(zeroindexed), since the pattern shows up twice at the end. However, our algorithm goes quite slow on this because there are many times where it almost matches the pattern, but then it has to start over. In the worst case, this algorithm can take len(pattern)*len(s)
steps, so we say it's a O(len(pattern)*len(s)) algorithm (we will talk about bigO notation later in the course).
As it turns out, we can construct a DFA for substring that only has to go through each character in the string exactly once, and we report a match every time it reaches an accept state. Hence, the algorithm only takes O(len(s)) time, which can be substantially better in practice if pattern
is long. For instance, consider the following DFA (Click here to download the JFLAP file)
Now let's step through and search for our pattern:
You'll notice that we reached the accept state exactly twice at the end of each instance of pattern
in s
. But how on earth did we come up with this? Here is a fairly straightforward algorithm to do this:
Algorithm 1: Straightforward Substring DFA Creation

Create a state
start
and a statec_i
for each characterc
in the pattern at indexi
. For instance, if the pattern isacdc
, you should have the states
start

a_0

c_1

d_2

c_3


Add state transitions for the matched characters in sequence. For instance, in the
acdc
pattern, we'd have a transition fromstart
toa_0
when we see ana
, a transition froma_0
toc_1
when we see ac
, etc.Also, in a separate loop, add the remaining arrows from start back to itself for all other characters in the alphabet. In
acdc
, assuming an alphabet of{a, b, c, d}
, we'd add the arrows from b, c, and d from start to itself. 
Loop through each character in the pattern in sequence. For each character
c
at indexi
, loop through all charactersa
in the alphabetSigma
.
If an arrow doesn't exist yet at
c_{i}
usinga
,
Then the first
i+1
characters ofpattern
do not match the string where we are ending ata

However, it is possible that the last
i
characters of the string match the firsti
characters of the pattern, or that the lasti1
characters of the pattern match the firsti1
characters of the pattern, etc. .
To figure out how much of the pattern we've actually matched, send
a
concatenated to the lasti
characters of pattern (pattern[1:i+1]+a
in python) through the DFA that you've built so far, and use the state that you end on as the transition out ofc_{i}
through a 
Then the first

If an arrow doesn't exist yet at
Your task
Click here to download the starter code for this task.
Create a python method make_substring_dfa(Sigma, pattern)
that takes in an alphabet as a list in Sigma
and a pattern
string, and which returns a delta
dictionary describing the DFA transitions that match pattern
. For example, if you run the following code:
Hint
The following code will create the strings representing the states for each character c
at index i
Hint 2
If you're having trouble figuring out where to get started, at least see if you can do step 1 in the algorithm, then save the DFA and look at it in JFLAP to see if you have the right states. Then try step 2 and see if you have the arrows for matching characters. Finally, you can tackle the third step, which is the hardest
Extra Credit (+2)
The DFA that you've built is very efficient and can find all instances of the pattern in O(len(s)) time, which is substantially better than the O(len(s)*len(pattern)) algorithm we started with. However, algorithm 1 for constructing the DFA is not very efficient, and is in fact O(len(alphabet)*len(pattern)^{2})
, assuming a constant alphabet size, which is quite bad for long patterns. Luckily, there is a variation that we can do known as the KnuthMorrisPratt algorithm to accomplish DFA construction with all arrows in O(len(alphabet)*len(pattern))
time only. Implement this algorithm to construct your DFA instead of the one described in algorithm 1. Click here to view some notes about this on GeeksForGeeks.
NOTE: In practice, the KMP algorithm doesn't actually store arrows for every character in the alphabet coming out of each state; it only tells you where to jump if there's about to be a mismatch, before you process the next character. This is more efficient to do in practice, because it scales well for large alphabets. What I'm looking for in the extra credit, though, is to create all arrows using a similar trick to what KMP does. In particular, you'll want to track the longest suffix of the string you're searching that's still a prefix of the pattern you're matching, rather than checking the entire suffix minus one character like we've been doing.