HTTP/1.1 200 OK Date: Sat, 18 Aug 2001 00:08:23 GMT Server: Apache/1.3.19 (Unix) (Red-Hat/Linux) mod_ssl/2.8.1 OpenSSL/0.9.6 PHP/4.0.4pl1 mod_perl/1.24_01 Last-Modified: Fri, 15 Jun 2001 13:46:23 GMT ETag: "f6004-1bd8-3b2a11af" Accept-Ranges: bytes Content-Length: 7128 Connection: close Content-Type: text/html
Sometime back I wanted a simple algebraic to RPN (Reverse Polish Notaion) conversion program. My goal was to parse an equation and execute the parsed results. I stripped most of the details related to the original program so that the parser would not be made any more complicated than it already is. I chose C++ because my older C and pascal versions were real hack jobs- you know, like the ones you typically see.
Before diving too deeply, let's go through a couple of typical scenarios. Most of us have been taught standard algebraic notation for equations. We've been taught that multiplication has precedence over addition, etc. So when given an expression like
a + b * c
we automatically know that b and c are multiplied and then a is added. However, if you entered this equation on a 4 - function calculator (like the ones sales people give you) then you would get the result
(a+b)*c
(the same answer the sales person would have given you if you're brave enough to ask them). On an HP calculator (or any RPN calculator) you would enter
a b + c *
Here are a few examples of some (slightly) more complicated expressions:
Algebraic | RPN |
-1 + 5 | 1 -(unary) 5 + |
cos(2*pi) | 2 pi * cos |
(x +y)^(3*z) | x y + 3 z * ^ |
In these expressions, we can classify the individual elements as either operands or operators. The operands are the numbers and the variables, while the operators perform some function on one or more of operands. In RPN, the operands are said to be pushed onto a stack. While the operators perform some function to the operands that are currently on the stack and possible placing another element on to the stack.
Now that we have a basic understanding of the RPN and Albraic notations, we can proceed to the program. In C++, the class definitions are the most important part of the design. For this project, the structure is as follows:
token - This is a virtual base class that all other objects are derived from.
The next level of classes:
null_token - Used as an "end of list" marker
function - A base class for functions like sin() and cos().
BOPH - High priority binary operator like * and /.
BOPL - Low priority binary operator like + and -.
UOP - Unary operator like unary minus.
FOP - Floating operand i.e. a constant.
variable - A variable operand like x.
open_parenthesis - A special token for an open parenthesis like '(' or '['.
closed_parenthesis - A special token a closed parenthesis like ')' or ']'.
The final level of classes is application dependent. These are used to define special functions like the trigonometric functions, the binary operators like + - * /, variables, and etc. It's entirely possible (by design) to change these application dependent objects for other applications. For example, so far we have assumed that the expressions to be parsed are only algebraic, yet they could just as well be boolean expressions.
The parsing program is easily broken down into the following steps. (Remember, I said I made it simple). The programming effort has been mainly in the parsing algorithm. The other sections of code are intended to primarily illustrate how to use the parsing algorithm. A real program would have these replaced with code more appropriate to its application.
A couple of calls are made to initialize the user's variable list and function list.
The Input Stream is a character string containing the expression that is to be parsed. One would certainly replace this with a C++ iostream in a real application.
The function tokenize converts the character string into an array of tokens (the_array). It is here where you'll find some "ad-hocness". The current implementation assumes the character string to be an algebraic expression. Consequently, it searches for algebraic operators such as + and -. Fortunately, it does not explicitely search for functions and variables. It implicitly searches for them by comparing substrings in the input string to strings the user defines in their variable and function lists.
The function parse takes the tokenized array (the_array) and creates a parsed array (the_stack). The function is deceptively simple:
PS_state parse(PS_state state)
{
do{
state = the_array[t1++]->a_search(state);
} while((state!=PS_search_end)&&(state!=PS_search_closed));
return(state);}
The real work is done by the objects via the virtual member function a_search. Depending on the parsing state, a_search may or may not be a recursive call. For example, if the current object being parsed is an open_parenthesis then a_search will be a recursive call.
Once the expression has been parsed, it is ready to be evaluated. Again, this step is deceptively simple.
for(i=0; i<stack_index; i++)
the_stack[i]->execute();
The virtual member function execute does all of the work. In essence, execute acts like an RPN calculator. It takes the parsed tokens from the_stack and evaluates them according to their type: operands are pushed onto the RPN type stack (called numeric_stack in the program) and operators pop one or more objects from the RPN stack operate on them and push the result back onto the stack. Upon completion of the for loop, the 0'th entry in numeric_stack contains the final answer.
When I get time, I'll fill in some of the details on how the state variables are used. If you have looked at the program and are totally confused, drop me an e-mail.
If you have any comments, suggestions, or questions please e-mail me. If you find any bugs, please let me know.