Homework 4: Regular Expressions And Nonregular Languages

Chris Tralie

Overview / Logistics

The purpose of this problem set is to give you practice with regular expressions, which are an alternative to describing regular languages. Remarkably, with the just the three operations union, concatenation, and star, we can describe any regular language by building up elements from an alphabet.

Beyond this, you will also explore languages that cannot be described by a regular expression, or nonregular languages. You will prove that some languages described are nonregular.

Finally, you will code up part of the most important application of the theory in the first unit: a regular expression evaluator known as Grep

What to hand in: You should hand in pictures or typed solutions to all problems, showing work where necessary for full credit. For problem 1, you should upload mygrep.py, and for problem 5, you should upload a JFLAP file.

NOTE: For the problems that ask you to build regular expressions, I have provided built-in testers for you with test cases to help you check that your regular expressions are correct. To input your regular expression into the browser, use | for the union, and use no text for the empty string λ. For instance, the regular expression

\[ ab \cup \lambda \cup b \]

would be typed in as ab||b


Problem 1: MyGrep (10 Points)

In this problem, you will complete a program in python to implement the algorithm to test whether a particular string s is part of the language generated by a regular expression in postfix notation, as described in these notes.

Click here to download the starter code for this problem. You will be editing mygrep.py

Your Task

Fill in the method eval_regexp in mygrep.py. You should conform to the naming conventions described in the notes when you name states.

I've already provided code to evaluate NFAs and reduce away λ arrows, as well as code to convert an infix regular expression into a postfix one with shorthand expanded. Your job is to

  1. Construct the delta transition function for an NFA that corresponds to the regular expression
  2. Reduce the λ arrows in the NFA, and evaluate it on the string in question to see whether that string accepts

Debugging Utilities

I've provided a few utilities to help you debug in the file debug.py. First, you can call the method write_dfa_jflap(delta, filename) to output a JFLAP file that you can examine. This will be easier than printing out delta by itself, but you can also do this. Below are a few examples:

Example 1: (A|B)*C

Postfix expression:

A_0 B_1 | * C_2 .

Below is the transition function you should get:

Below is the JFLAP file that write_dfa_jflap will generate in this case (click here to view the JFLAP file). Note that I've moved the positions of the states around a bit to group things in a more logical way than that method outputs, and I've also annotated smaller NFA fragments to show what the machine is doing

Example 2: ((AB)*C)*B(C|(A*B))

This is the worked example from the notes

Postfix expression:

Postfix A_0 B_1 . * C_2 . * B_3 . C_4 A_5 * B_6 . | .

Below is the transition function you should get:

Below is the JFLAP file that write_dfa_jflap will generate in this case (click here to view the JFLAP file). Note that I've moved the positions of the states around a bit to group things in a more logical way than that method outputs, and I've also annotated smaller NFA fragments to show what the machine is doing

Beyond these individual examples, you can also stress test with a battery of examples by calling the cmp_python method in debug.py to compare all strings in an alphabet up to a certain length with python's default regular expression evaluator to see if they agree. The loop will let you know about any strings that disagree. For example, if you call

And things are working properly, you should see


Problem 2 (3 Points)

Devise a regular expression that describes the language of strings over the alphabet Σ={a, b, c} which have a nonzero even number of as, but which do not contain any cs.

Below is a series of test cases you can apply to debug your expression

Problem 3 (4 Points)

Devise a regular expression to describe the language of binary strings divisible by 5. Note that there are many possible answers, so you should show your work.

Below is a live program you can use to test your regular expression. It will show all of the multiples of 5 up to 4000, along with the result that your regular expression gives. It will also show any erroneous values not divisible by 5 that your regular expression accepts.

Problem 4 (3 Points)

Let Σ = {0, 1, +, =}, and the language ADD = {a=b+c| a, b, c are binary strings and a is the sum of b and c}. Prove that ADD is nonregular.

Note the similarity to homework 2 problem 1.4, which we proved was actually regular, but where we had to define quite a different alphabet and input format so that the machine could process both strings at once. In the version here, a machine would have to "remember" enough about the first string to know what it had to add to the first string by the time it got there. Interesting, eh?

Problem 5 (3 Points)

We showed in class that the language of binary strings with an equal number of 0s and 1s is nonregular. Let's consider the seemingly very similar language of binary strings with an equal number of 01 and 10 substrings. For instance, 010 is in this language because it contains a single 01 and a single 10, but 0101 is not in this language because it contains two 01 substrings and only a single 01 substring. Surprisingly, this language is regular! Prove that this is the case by constructing a DFA in JFLAP to recognize this language.

Below are some tests you can run (Click here to download these tests in a file you can run in JFLAP's multiple run tab)

Input01 Count10 CountResult
100111110010110 100111110010110 (3) 100111110010110 (4) Reject
1000100100010001 1000100100010001 (4) 100010010001000 (4) Accept
11100011111101100 11100011111101100 (2) 1110001111110110 (3) Reject
00111010011 00111010011 (3) 0011101001 (2) Reject
1111001100001110 1111001100001110 (2) 1111001100001110 (3) Reject
0110110100 0110110100 (3) 011011010 (3) Accept
1101101111000001111 1101101111000001111 (3) 110110111100000111 (3) Accept
1100000011111100110 1100000011111100110 (2) 1100000011111100110 (3) Reject
10001111101010 10001111101010 (3) 10001111101010 (4) Reject
011101000010101110000001 011101000010101110000001 (6) 01110100001010111000000 (5) Reject
010011000100001000011001 010011000100001000011001 (6) 01001100010000100001100 (5) Reject
100000111011000 100000111011000 (2) 10000011101100 (3) Reject
10111011110100101 10111011110100101 (5) 1011101111010010 (5) Accept
110110111010101111 110110111010101111 (5) 11011011101010111 (5) Accept
00110001100000111111100 00110001100000111111100 (3) 0011000110000011111110 (3) Accept
0001010100001 0001010100001 (4) 000101010000 (3) Reject
0001011001010 0001011001010 (4) 0001011001010 (4) Accept
01010111010111 01010111010111 (5) 0101011101011 (4) Reject
010000110110011111110000 010000110110011111110000 (4) 01000011011001111111000 (4) Accept
001000001111000101 001000001111000101 (4) 00100000111100010 (3) Reject

Problem 6 (3 Points)

Now we will use an alphabet more similar in spirit to that of homework 2 problem 1.4. Let Σ2 be the alphabet

\[ \Sigma_2 = \left\{ \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \left[ \begin{array}{c} 0 \\ 1 \end{array} \right], \left[ \begin{array}{c} 1 \\ 0 \end{array} \right], \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \right\} \]

And let L be the language

\[ \{ s \in \Sigma_2^* | \text{ the top row of s is the reverse of the bottom row of s }\} \]

For instance, the string

\[ \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \in L \]

but the string

\[ \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \notin L \]

is in L. Prove that L is nonregular.