Introduction
============
You have certainly come a long way! That AVL tree was pretty tough.
This time around, we are going to build a parse tree. Really, parse
trees aren't too difficult to build, so long as you have a good set
of pseudocode (from your class notes) and a good understanding of
what trees look like in memory.
In this lab you will be building a parse tree using our btree.h
file. You will not use the AVL tree for this, as this is a
different type of tree. The final product should be able to take in
a mathematical expression in standard notation and compute the
correct answer according to the standard order of operations.
In order to do this, you will need a lexer. The lexer is actually
the hard part of this, so I have provided that for you. Also,
you'll need a healthy dose of tree iterators, which we wrote in
class. I've included those in the guided part as well.
Once you are done with this, you will have the beginnings of a small
compiler! Add in a little code generation, and you're set to write
your own programming language.
So now, with that bit of a pep talk, let's dig in.
The Lexer and Tree Iterators
============================
The first order of business is to create a lexer. A lexer is just a
function or program which decomposes a string into lexical elements.
That way you can reason about things like the order of operations in
the tree, and not worry about how you handle multiple characters in
a number, or decimal points, etc. I have written a very simple
lexer which I want you to copy and try out. This one can decompose
a string into double values and into operators. It also sets the
precedence level according to the order of operations, where the
higher the level, the deeper it goes in the tree.
As you can see, the interface is pretty straightforward. The
LexicalElement structure contains all the information we need to
process the statement one element at a time. The nextElement
function returns the next lexical element from an istream. In our
case, we will be using cin as the istream. The implementation file
for this function follows:
--begin lexer.cpp--
1 #include "lexer.h"
2 #include <cctype>
3 #include <cstdlib>
4 #include <iostream>
5
6 using namespace std;
7
8 //static helper functions
9 static bool isoperator(char c);
10 static void handleOperator(LexicalElement &l);
11 static void handleNumber(LexicalElement &l);
12
13 /* returns the next lexical element in is */
14 LexicalElement nextElement(istream &is) {
15 LexicalElement result; //the resulting element
16 bool done = false; //indicates if we are done
17 bool decimal; //indicates if we have hit a decimal
18 char c; //the current character
19
20 result.type = LexicalElement::UNKNOWN;
21
22 //scan until we've encountered all the characters
23 while(!done) {
24 //peek at the character
25 c = cin.peek();
26
27 //start a type if needed
28 if(result.type == LexicalElement::UNKNOWN) {
29 if(isdigit(c) || c == '.') {
30 result.type = LexicalElement::NUM;
31 decimal = false; //no decimal seen yet!
32 } else if (isoperator(c)) {
33 result.type = LexicalElement::OP;
34 } else if(isspace(c)) {
35 cin.get(); //consume spaces
36 } else {
37 done = true;
38 }
39 }
40
41 //consume characters (if needed)
42 if(result.type == LexicalElement::OP) {
43 if(isoperator(c)) {
44 c = cin.get();
45 result.text += c;
46 } else {
47 done = true;
48 }
49 } else if(result.type == LexicalElement::NUM) {
50 if(c == '.' && !decimal) {
51 //we get just one decimal point!
52 decimal = true;
53 c = cin.get();
54 result.text += c;
55 } else if(isdigit(c)) {
56 c = cin.get();
57 result.text += c;
58 } else {
59 done = true;
60 }
61 }
62 }
63
64 if(result.type==LexicalElement::OP)
65 handleOperator(result);
66 else if(result.type==LexicalElement::NUM)
67 handleNumber(result);
68
69 return result;
70 }
71
72
73 //static helper functions
74 static bool
75 isoperator(char c) {
76 const char *operators = "*/%+-";
77
78 while(*operators) {
79 if(*operators == c)
80 return true;
81 operators++;
82 }
83
84 return false;
85 }
86
87
88 static void
89 handleOperator(LexicalElement &l) {
90 //assign appropriate symbol and precedence
91 switch(l.text[0]) {
92 case '*':
93 l.op = LexicalElement::MUL;
94 l.p = 2;
95 break;
96 case '/':
97 l.op = LexicalElement::DIV;
98 l.p = 2;
99 break;
100 case '%':
101 l.op = LexicalElement::MOD;
102 l.p = 2;
103 break;
104 case '+':
105 l.op = LexicalElement::ADD;
106 l.p = 1;
107 break;
108 case '-':
109 l.p = 1;
110 break;
111 }
112 }
113
114 static void
115 handleNumber(LexicalElement &l) {
116 l.p = 3;
117 l.val = atof(l.text.c_str());
118 }
--end lexer.cpp--
Study this set of functions to get a feel for how the lexer works.
Really it's all about state and consuming the right kinds of
characters. We won't go into a detailed explanation of how it works
here, largely because there should be enough comments in the code to
show you how it does what it does. The final step is to build a
little test application:
--begin lexerTest.cpp--
1 /*
2 * A small test program for the lexer.
3 */
4
5 #include <iostream>
6 #include "lexer.h"
7
8
9 using namespace std;
10
11 int main(void) {
12 LexicalElement l;
13
14 //run until the end of the line
15 while(cin.peek() != '\n') {
16 l = nextElement(cin);
17
18 //print according to the type
19 switch(l.type) {
20 case LexicalElement::NUM:
21 cout << "NUM " << l.val << " P: " << l.p << endl;
22 break;
23 case LexicalElement::OP:
24 cout << "OP " << l.text << ' ';
25
26 //render the op enum
27 switch(l.op) {
28 case LexicalElement::MUL:
29 cout << "MUL";
30 break;
31 case LexicalElement::DIV:
32 cout << "DIV";
33 break;
34 case LexicalElement::MOD:
35 cout << "MOD";
36 break;
37 case LexicalElement::ADD:
38 cout << "ADD";
39 break;
40 case LexicalElement::SUB:
41 cout << "SUB";
42 break;
43 }
44
45 //display p value
46 cout << " P: " << l.p << endl;
47 break;
48 case LexicalElement::UNKNOWN:
49 cout << "UNKOWN" << endl;
50 break;
51 }
52 }
53
54 return 0;
55 }
--end lexerTest.cpp--
This test driver reads in a line of text, and prints out a rendering
of the lexical elements. A couple of sample runs follows:
$ ./lexerTest
2+2
NUM 2 P: 3
OP + ADD P: 1
NUM 2 P: 3
$ ./lexerTest
3+.5 - 2000 + 6 /2.1
NUM 3 P: 3
OP + ADD P: 1
NUM 0.5 P: 3
OP - ADD P: 1
NUM 2000 P: 3
OP + ADD P: 1
NUM 6 P: 3
OP / DIV P: 2
NUM 2.1 P: 3
Study the test driver to make sure you understand how to use the
lexer.
Now all that remains is to do the tree iterators which go with
btree.h. (You should have the btree.h file from lab6. Go ahead and
copy that into your current directory). Here, we just use the
treeItr.h file and the btreeTest.cpp file that we wrote in class.
Here they are, just in case you didn't manage to type them all in.
--begin treeItr.h--
1 #include "btree.h"
2
3
4 //helper functions
5 template<typename T> BinaryTree<T>*
6 leftMost(BinaryTree<T> *t) {
7 //protect against null trees
8 if(!t)
9 return NULL;
10
11 BinaryTree<T> *left = t->getLeft();
12
13 //go left iteratively
14 while(left) {
15 t = left;
16 left = t->getLeft();
17 }
18
19 return t;
20 }
21
22 //and iterator which visits a tree in in-order traversal order.
23 template <typename T>
24 class InOrderIterator {
25 public:
26 //constructor
27 InOrderIterator(BinaryTree<T> *t) {
28 this->t = leftMost(t);
29 }
30
31
32 //prefix iterator
33 InOrderIterator &operator++() {
34 advance(); //go to the next node
35 return *this;
36 }
37
38
39 //postfix iterator
40 InOrderIterator operator++(int) {
41 InOrderIterator itr(*this);
42 advance();
43 return itr;
44 }
45
46
47 //dereference operator
48 T operator*() {
49 return t->getData();
50 }
51
52
53 //return true if we are past the end
54 bool eot() {
55 return !t;
56 }
57
58 private:
59 BinaryTree<T> *t; //current node
60
61 void advance() {
62 //protect against the end
63 if(!t) return;
64
65 if(!t->getRight()) {
66 //go up
67 BinaryTree<T> *p=t;
68 do {
69 t = p;
70 p = t->getParent();
71
72 //if this was the root, we are done
73 if(!p) {
74 t = p;
75 return;
76 }
77 } while(p->getLeft() != t);
78
79 //move up
80 t = p;
81 } else {
82 //go the left most child of the right tree
83 t = leftMost(t->getRight());
84 }
85 }
86 };
87
88
89
90 //and iterator which visits a tree in pre-order traversal order.
91 template <typename T>
92 class PreOrderIterator {
93 public:
94 //constructor
95 PreOrderIterator(BinaryTree<T> *t) {
96 this->t = t;
97 }
98
99
100 //prefix iterator
101 PreOrderIterator &operator++() {
102 advance(); //go to the next node
103 return *this;
104 }
105
106
107 //postfix iterator
108 PreOrderIterator operator++(int) {
109 PreOrderIterator itr(*this);
110 advance();
111 return itr;
112 }
113
114
115 //dereference operator
116 T operator*() {
117 return t->getData();
118 }
119
120
121 //return true if we are past the end
122 bool eot() {
123 return !t;
124 }
125
126 private:
127 BinaryTree<T> *t; //current node
128
129 void advance() {
130 //protect against the end
131 if(!t) return;
132
133 if(t->getLeft()) {
134 t = t->getLeft();
135 } else {
136 //go up until we approach from left and can go right
137 BinaryTree<T> *p = t;
138 do {
139 t = p;
140 p = t->getParent();
141 } while(p && (p->getLeft() != t || !p->getRight()));
142
143 //done! (no more tree)
144 if(!p) {
145 t = p;
146 return;
147 }
148 t = p->getRight();
149 }
150 }
151 };
-- end treeItr.h --
And now, write out this test driver, and give it all a spin to make
sure your lexer and your tree iterators are working.
-- begin btreeTest.cpp --
1 /*
2 * This is a small test driver for our btree. It builds a naive
3 * BST. Behold the glories of degenerative cases!
4 */
5
6 #include <iostream>
7 #include "btree.h"
8 #include "treeItr.h"
9
10 using namespace std;
11
12 //an implementation of the bstInsert algorithm
13 void bstInsert(BinaryTree<int> *t, int data) {
14 if(data <= t->getData()) {
15 //insert left
16 if(!t->getLeft())
17 t->linkLeft(new BinaryTree<int>(data));
18 else
19 bstInsert(t->getLeft(), data);
20 } else {
21 //insert right
22 if(!t->getRight())
23 t->linkRight(new BinaryTree<int>(data));
24 else
25 bstInsert(t->getRight(), data);
26 }
27 }
28
29
30 int main(void) {
31 BinaryTree<int> *t=NULL;
32 int data;
33
34 do {
35 //prompt the user
36 cout << "Enter Number (-1 to exit): ";
37
38 //get the data
39 cin >> data;
40
41 if(data == -1)
42 continue;
43
44 if(!t)
45 t=new BinaryTree<int>(data);
46 else
47 bstInsert(t, data);
48 } while(data != -1);
49
50 //print the tree
51 t->printTree();
52
53 //do an inorder traversal
54 cout << "In order traversal: ";
55 for(InOrderIterator<int> itr(t); !itr.eot(); itr++) {
56 cout << *itr << " ";
57 }
58 cout << endl;
59
60
61 //do an preorder traversal
62 cout << "preorder traversal: ";
63 for(PreOrderIterator<int> itr(t); !itr.eot(); itr++) {
64 cout << *itr << " ";
65 }
66 cout << endl;
67
68 return 0;
69 }
-- end btreeTest.cpp --
Parsing Expressions (Challenge Lab)
===================================
Ok! Enough hand holding. Now is the time for you to write a parse
tree application. You will put together the pieces of the app using
my lexer and the pseudocode you have from class. Your application
should allow the user to enter an expression, and then it will
display the answer with correct order of operations. You don't have
to worry about checking the correctness of what they type. For the
purpose of the lab, you can assume the user will always give you a
correct instruction.
Sample output follows:
./infixCalc
Expression: 2.5 + 2.5
Answer: 5
./infixCalc
Expression: 2.5*1.2 - 5*2
Answer: -7
Hints:
1.) Tree traversal is the key to evaluating the expressions.
That and good function decomposition. For instance, I have a
function called "apply" which takes an operator and 2 numbers
and applies the correct operation. You may want to look at
previous examples and what you get when you do a post order
traversal.
2.) Try printing your parse tree out using the print function
already present in the BinaryTree template. Make sure you
are creating proper trees before you evaluate them.
3.) It is possible to evaluate the trees using only 3 variables.
You will need to figure out which direction you rise from a
child if you want to try to do this.
4.) Just relax. You have 95% of this spelled out in your notes.
You can do this!
Extra Credit
============
Using the lexer and the parsing notes we have from class, it
wouldn't be too hard to take the next step and do error checking.
If your program can spot syntax errors (invalid sequences of
lexemes) then I will give you extra credit. The better it is at
expressing where the error is, the higher your extra credit points
will be!