Homework 4: Regular Expressions And Nonregular Languages
Chris Tralie
Overview / Logistics
The purpose of this problem set is to give you practice with regular expressions, which are an alternative to describing regular languages. Remarkably, with the just the three operations union, concatenation, and star, we can describe any regular language by building up elements from an alphabet.
Beyond this, you will also explore languages that cannot be described by a regular expression, or nonregular languages. You will prove that some languages described are nonregular.
Finally, you will code up part of the most important application of the theory in the first unit: a regular expression evaluator known as Grep
What to hand in: You should hand in pictures or typed solutions to all problems, showing work where necessary for full credit. For problem 1, you should upload mygrep.py
, and for problem 5, you should upload a JFLAP file.
NOTE: For the problems that ask you to build regular expressions, I have provided builtin testers for you with test cases to help you check that your regular expressions are correct. To input your regular expression into the browser, use  for the union, and use no text for the empty string λ. For instance, the regular expression
\[ ab \cup \lambda \cup b \]
would be typed in as abb
Problem 1: MyGrep (10 Points)
In this problem, you will complete a program in python to implement the algorithm to test whether a particular string s is part of the language generated by a regular expression in postfix notation, as described in these notes.
Click here to download the starter code for this problem. You will be editing mygrep.py
Your Task
Fill in the method eval_regexp
in mygrep.py
. You should conform to the naming conventions described in the notes when you name states.
I've already provided code to evaluate NFAs and reduce away λ arrows, as well as code to convert an infix regular expression into a postfix one with shorthand expanded. Your job is to

Construct the
delta
transition function for an NFA that corresponds to the regular expression  Reduce the λ arrows in the NFA, and evaluate it on the string in question to see whether that string accepts
Debugging Utilities
I've provided a few utilities to help you debug in the file debug.py
. First, you can call the method write_dfa_jflap(delta, filename)
to output a JFLAP file that you can examine. This will be easier than printing out delta
by itself, but you can also do this. Below are a few examples:
Example 1: (AB)*C
Postfix expression:
A_0 B_1  * C_2 .
Below is the transition function you should get:
Below is the JFLAP file that write_dfa_jflap
will generate in this case (click here to view the JFLAP file). Note that I've moved the positions of the states around a bit to group things in a more logical way than that method outputs, and I've also annotated smaller NFA fragments to show what the machine is doing
Example 2: ((AB)*C)*B(C(A*B))
This is the worked example from the notes
Postfix expression:
Postfix A_0 B_1 . * C_2 . * B_3 . C_4 A_5 * B_6 .  .
Below is the transition function you should get:
Below is the JFLAP file that write_dfa_jflap
will generate in this case (click here to view the JFLAP file). Note that I've moved the positions of the states around a bit to group things in a more logical way than that method outputs, and I've also annotated smaller NFA fragments to show what the machine is doing
Beyond these individual examples, you can also stress test with a battery of examples by calling the cmp_python
method in debug.py
to compare all strings in an alphabet up to a certain length with python's default regular expression evaluator to see if they agree. The loop will let you know about any strings that disagree. For example, if you call
And things are working properly, you should see
Problem 2 (3 Points)
Devise a regular expression that describes the language of strings over the alphabet Σ={a, b, c} which have a nonzero even number of as, but which do not contain any cs.
Below is a series of test cases you can apply to debug your expression
Problem 3 (4 Points)
Devise a regular expression to describe the language of binary strings divisible by 5. Note that there are many possible answers, so you should show your work.
Below is a live program you can use to test your regular expression. It will show all of the multiples of 5 up to 4000, along with the result that your regular expression gives. It will also show any erroneous values not divisible by 5 that your regular expression accepts.
Problem 4 (3 Points)
Let Σ = {0, 1, +, =}, and the language ADD = {a=b+c a, b, c are binary strings and a is the sum of b and c}. Prove that ADD is nonregular.
Note the similarity to homework 2 problem 1.4, which we proved was actually regular, but where we had to define quite a different alphabet and input format so that the machine could process both strings at once. In the version here, a machine would have to "remember" enough about the first string to know what it had to add to the first string by the time it got there. Interesting, eh?
Problem 5 (3 Points)
We showed in class that the language of binary strings with an equal number of 0s and 1s is nonregular. Let's consider the seemingly very similar language of binary strings with an equal number of 01 and 10 substrings. For instance, 010 is in this language because it contains a single 01 and a single 10, but 0101 is not in this language because it contains two 01 substrings and only a single 01 substring. Surprisingly, this language is regular! Prove that this is the case by constructing a DFA in JFLAP to recognize this language.
Below are some tests you can run (Click here to download these tests in a file you can run in JFLAP's multiple run tab)
Input  01 Count  10 Count  Result 
100111110010110  100111110010110 (3)  100111110010110 (4)  Reject 
1000100100010001  1000100100010001 (4)  100010010001000 (4)  Accept 
11100011111101100  11100011111101100 (2)  1110001111110110 (3)  Reject 
00111010011  00111010011 (3)  0011101001 (2)  Reject 
1111001100001110  1111001100001110 (2)  1111001100001110 (3)  Reject 
0110110100  0110110100 (3)  011011010 (3)  Accept 
1101101111000001111  1101101111000001111 (3)  110110111100000111 (3)  Accept 
1100000011111100110  1100000011111100110 (2)  1100000011111100110 (3)  Reject 
10001111101010  10001111101010 (3)  10001111101010 (4)  Reject 
011101000010101110000001  011101000010101110000001 (6)  01110100001010111000000 (5)  Reject 
010011000100001000011001  010011000100001000011001 (6)  01001100010000100001100 (5)  Reject 
100000111011000  100000111011000 (2)  10000011101100 (3)  Reject 
10111011110100101  10111011110100101 (5)  1011101111010010 (5)  Accept 
110110111010101111  110110111010101111 (5)  11011011101010111 (5)  Accept 
00110001100000111111100  00110001100000111111100 (3)  0011000110000011111110 (3)  Accept 
0001010100001  0001010100001 (4)  000101010000 (3)  Reject 
0001011001010  0001011001010 (4)  0001011001010 (4)  Accept 
01010111010111  01010111010111 (5)  0101011101011 (4)  Reject 
010000110110011111110000  010000110110011111110000 (4)  01000011011001111111000 (4)  Accept 
001000001111000101  001000001111000101 (4)  00100000111100010 (3)  Reject 
Problem 6 (3 Points)
Now we will use an alphabet more similar in spirit to that of homework 2 problem 1.4. Let Σ_{2} be the alphabet
\[ \Sigma_2 = \left\{ \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \left[ \begin{array}{c} 0 \\ 1 \end{array} \right], \left[ \begin{array}{c} 1 \\ 0 \end{array} \right], \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \right\} \]
And let L be the language
\[ \{ s \in \Sigma_2^*  \text{ the top row of s is the reverse of the bottom row of s }\} \]
For instance, the string
\[ \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \in L \]
but the string
\[ \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \notin L \]
is in L. Prove that L is nonregular.