Homework 4: Regular Expressions And Nonregular Languages
Chris Tralie
Overview / Logistics
The purpose of this problem set is to give you practice with regular expressions, which are an alternative to describing regular languages. Remarkably, with the just the three operations union, concatenation, and star, we can describe any regular language by building up elements from an alphabet.
Beyond this, you will also explore languages that cannot be described by a regular expression, or nonregular languages. You will prove that some languages described are nonregular.
Finally, you will code up part of the most important application of the theory in the first unit: a regular expression evaluator known as Grep
What to hand in: You should hand in pictures or typed solutions to all problems, showing work where necessary for full credit. For problem 1, you should upload mygrep.py
, and for problem 5, you should upload a JFLAP file.
NOTE: For the problems that ask you to build regular expressions, I have provided built-in testers for you with test cases to help you check that your regular expressions are correct. To input your regular expression into the browser, use | for the union, and use no text for the empty string λ. For instance, the regular expression
\[ ab \cup \lambda \cup b \]
would be typed in as ab||b
Problem 1: MyGrep (10 Points)
In this problem, you will complete a program in python to implement the algorithm to test whether a particular string s is part of the language generated by a regular expression in postfix notation, as described in these notes.
Click here to download the starter code for this problem. You will be editing mygrep.py
Your Task
Fill in the method eval_regexp
in mygrep.py
. You should conform to the naming conventions described in the notes when you name states.
I've already provided code to evaluate NFAs and reduce away λ arrows, as well as code to convert an infix regular expression into a postfix one with shorthand expanded. Your job is to
-
Construct the
delta
transition function for an NFA that corresponds to the regular expression - Reduce the λ arrows in the NFA, and evaluate it on the string in question to see whether that string accepts
Debugging Utilities
I've provided a few utilities to help you debug in the file debug.py
. First, you can call the method write_dfa_jflap(delta, filename)
to output a JFLAP file that you can examine. This will be easier than printing out delta
by itself, but you can also do this. Below are a few examples:
Example 1: (A|B)*C
Postfix expression:
A_0 B_1 | * C_2 .
Below is the transition function you should get:
Below is the JFLAP file that write_dfa_jflap
will generate in this case (click here to view the JFLAP file). Note that I've moved the positions of the states around a bit to group things in a more logical way than that method outputs, and I've also annotated smaller NFA fragments to show what the machine is doing
Example 2: ((AB)*C)*B(C|(A*B))
This is the worked example from the notes
Postfix expression:
Postfix A_0 B_1 . * C_2 . * B_3 . C_4 A_5 * B_6 . | .
Below is the transition function you should get:
Below is the JFLAP file that write_dfa_jflap
will generate in this case (click here to view the JFLAP file). Note that I've moved the positions of the states around a bit to group things in a more logical way than that method outputs, and I've also annotated smaller NFA fragments to show what the machine is doing
Beyond these individual examples, you can also stress test with a battery of examples by calling the cmp_python
method in debug.py
to compare all strings in an alphabet up to a certain length with python's default regular expression evaluator to see if they agree. The loop will let you know about any strings that disagree. For example, if you call
And things are working properly, you should see
Problem 2 (3 Points)
Devise a regular expression that describes the language of strings over the alphabet Σ={a, b, c} which have a nonzero even number of as, but which do not contain any cs.
Below is a series of test cases you can apply to debug your expression
Problem 3 (4 Points)
Devise a regular expression to describe the language of binary strings divisible by 5. Note that there are many possible answers, so you should show your work.
Below is a live program you can use to test your regular expression. It will show all of the multiples of 5 up to 4000, along with the result that your regular expression gives. It will also show any erroneous values not divisible by 5 that your regular expression accepts.
Problem 4 (3 Points)
Let Σ = {0, 1, +, =}, and the language ADD = {a=b+c| a, b, c are binary strings and a is the sum of b and c}. Prove that ADD is nonregular.
Note the similarity to homework 2 problem 1.4, which we proved was actually regular, but where we had to define quite a different alphabet and input format so that the machine could process both strings at once. In the version here, a machine would have to "remember" enough about the first string to know what it had to add to the first string by the time it got there. Interesting, eh?
Problem 5 (3 Points)
We showed in class that the language of binary strings with an equal number of 0s and 1s is nonregular. Let's consider the seemingly very similar language of binary strings with an equal number of 01 and 10 substrings. For instance, 010 is in this language because it contains a single 01 and a single 10, but 0101 is not in this language because it contains two 01 substrings and only a single 01 substring. Surprisingly, this language is regular! Prove that this is the case by constructing a DFA in JFLAP to recognize this language.
Below are some tests you can run (Click here to download these tests in a file you can run in JFLAP's multiple run tab)
Input | 01 Count | 10 Count | Result |
100111110010110 | 100111110010110 (3) | 100111110010110 (4) | Reject |
1000100100010001 | 1000100100010001 (4) | 100010010001000 (4) | Accept |
11100011111101100 | 11100011111101100 (2) | 1110001111110110 (3) | Reject |
00111010011 | 00111010011 (3) | 0011101001 (2) | Reject |
1111001100001110 | 1111001100001110 (2) | 1111001100001110 (3) | Reject |
0110110100 | 0110110100 (3) | 011011010 (3) | Accept |
1101101111000001111 | 1101101111000001111 (3) | 110110111100000111 (3) | Accept |
1100000011111100110 | 1100000011111100110 (2) | 1100000011111100110 (3) | Reject |
10001111101010 | 10001111101010 (3) | 10001111101010 (4) | Reject |
011101000010101110000001 | 011101000010101110000001 (6) | 01110100001010111000000 (5) | Reject |
010011000100001000011001 | 010011000100001000011001 (6) | 01001100010000100001100 (5) | Reject |
100000111011000 | 100000111011000 (2) | 10000011101100 (3) | Reject |
10111011110100101 | 10111011110100101 (5) | 1011101111010010 (5) | Accept |
110110111010101111 | 110110111010101111 (5) | 11011011101010111 (5) | Accept |
00110001100000111111100 | 00110001100000111111100 (3) | 0011000110000011111110 (3) | Accept |
0001010100001 | 0001010100001 (4) | 000101010000 (3) | Reject |
0001011001010 | 0001011001010 (4) | 0001011001010 (4) | Accept |
01010111010111 | 01010111010111 (5) | 0101011101011 (4) | Reject |
010000110110011111110000 | 010000110110011111110000 (4) | 01000011011001111111000 (4) | Accept |
001000001111000101 | 001000001111000101 (4) | 00100000111100010 (3) | Reject |
Problem 6 (3 Points)
Now we will use an alphabet more similar in spirit to that of homework 2 problem 1.4. Let Σ2 be the alphabet
\[ \Sigma_2 = \left\{ \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \left[ \begin{array}{c} 0 \\ 1 \end{array} \right], \left[ \begin{array}{c} 1 \\ 0 \end{array} \right], \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \right\} \]
And let L be the language
\[ \{ s \in \Sigma_2^* | \text{ the top row of s is the reverse of the bottom row of s }\} \]
For instance, the string
\[ \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \in L \]
but the string
\[ \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \notin L \]
is in L. Prove that L is nonregular.