Math 304 ML: Binary trees

Due Friday, March 18, deposited in your Blackboard drop box

All the ML code discussed in this page is available here in a text file which can be downloaded and used in the assignment.

A labeled binary tree is one where each node is labeled, and has up to two daughter nodes. Such objects are ubiquitous in computer science. They are used for sorting, searching, parsing code, and many other tasks.

To manipulate such trees, we use a feature of ML, namely the possibility of defining new data types (beyond integer, boolean, etc.). A tree is defined recursively, and the data type of a tree reflects this fact. The line of code is

datatype 'a b_tree = Empty   |   Node of   'a b_tree   *   'a   *   'a b_tree;

Here 'a b_tree indicates a binary tree with nodes labeled by elements of type 'a (called val in ML), i.e. unrestricted here but set in an application, such as integer, or string, or list. Each such tree is empty (the first choice on the right, or (indicated by |) a (non-empty) node that is a triple of another tree (the left subtree), an 'a object (the label), and another tree (the right subtree). That is

  1. The empty tree Empty is a binary tree.
  2. If tree1 and tree2 are binary trees, and lab is a value of type typ, then Node (tree1, lab, tree2) is a binary tree.
  3. Nothing else is a binary tree.

Although we usually do not include empty nodes when we visualize a tree, it is more convenient in ML to have every non-empty node have exactly two daughters, of which one or both may be empty. This is clearly a recursive definition (for which ML is perfect), and all our manipulation of trees will be recursive.

For example, two basic functions of trees are the height (largest level of non-empty nodes in the tree), and size (number of non-empty nodes) (for Moscow ML, you have to first use the command load "Int";):

fun height Empty = 0   (* height of b_tree *)
  | height (Node (lft, _, rht)) = 1 + Int.max (height (lft), height (rht));

fun size Empty = 0   (* size of a tree *)
  | size (Node (lft, _, rht)) = 1 + size lft + size rht;

Understand how these simple functions work over the recursion of the data type. A binary tree is quite analogous to a list (with the empty tree the analogue of the nil list), except two successors are attached at each stage. Recursion over a tree is often called structural recursion.

A standard use for a labeled binary tree is to create an object for searching--for example, a dictionary. Of course, to find a particular word in a dictionary, one could start at the beginning and check every word in order--i.e., treat the dictionary as a list. However, this is inefficient, and no one looks up a word that way. One makes a guess where to open the dictionary, and then moves forward and backward through by comparing the word of interest to the labels in the dictionary. A search tree is set up for precisely this kind of search.

Let us suppose we have a collection of objects (of type val), and a comparison function comp(e.g., less than for integers, or lexicographically preceding for strings) which for any two such objects lab1, lab2, the evaluation comp (lab1,lab2) is either true or false.

Then we can create a search tree where

  1. the nodes are labeled by the lab,
  2. lab' that are less than (by comp) lab lie to the left of lab,
  3. lab' that are greater than (by comp) lab lie to the right of lab.

We search the tree by dropping down to the left or right at each step until we either encounter the label we are looking for, or run out of tree.

There are two basic operations necessary,

  1. creating a tree from a list of labels, and
  2. searching a tree for a particular label.

The basic step for the first is inserting a new node in the proper place. The following code does the job:

fun insert_node (lab:'a, comp:'a*'a->bool, Empty) = Node (Empty, lab, Empty) (*new node*)
| insert_node (lab, comp, Node (lft, lab1, rht)) =
    if comp (lab, lab1) then Node (insert_node (lab, comp, lft), lab1, rht) (* go left *)
    else Node (lft, lab1, insert_node (lab, comp, rht)); (* go right *)

With this as the recursive step, the following code creates a tree from a list of labels:

fun create_tree (nil, comp) = Empty
| create_tree (h::q, comp) = insert_node (h, comp, create_tree (q, comp));

Here is a small list of numbers and a comparison function for testing:

val nums = [~5,1,0,3,~2];
fun lt(n1:int, n2:int) = n1 < n2;

Thus

val t = create_tree (nums, lt);

creates a tree with the numbers in the list ordered ready for searching. Try it and see what a tree looks like. Test out the height and size functions on t.

Assignment, Part 1. Draw a picture of the resulting binary tree, and hand it in (either by hardcopy or via the dropbox). Note the height. (Note: the trees are created from the right; the root is the last item in the list.)

Assignment, Part 2. Construct three ordering functions of a tree t: inorder t, preorder t, postorder t, which take trees to lists, and realize the three orderings of the text (these are short one-liners). In particular, given a list lst, and a comparison function comp of its labels, what does

inorder (create_tree (lst, comp))

do (try it out and/or look at your picture from part 1). Starting hint:

fun inorder Empty = nil
| inorder (Node (lft, lab, rht)) = <you take it from here>

Here is another list of labels. In this case, the labels are 2-tuples of strings, consisting of a word and a definition. We want to put these together in a (short) dictionary. The comparison function looks at the lexicographic ordering of the words. When a comparison function looks at one part of a tuple, often that part is called the key. Thus the word is the key in this case.

val words = [
("new","not old"),
("nu","Greek letter"),
("gnu","a silly animal"),
("New","West Virginia river"),
("knew","used to know"),
("nu","Greek letter"),
("nuu","a nonsense word"),
("gnew","past tense of gnaw"),
("noo","sound of cow with nose cold")];


fun key (word:string, def:string ) = word;

fun compwords (word1:string*string, word2:string*string) = key(word1) < key(word2);

Thus for example

val newdict = create_tree (words, compwords);

creates a search tree of these words.

 

Assignment, part 3. Repeat part 1 for this tree.

Assignment, part 4. Define a function

lookup (t, comp, key, lab)

so that for example

lookup (newdict, comp, key, "gnu")

looks up the word "gnu" and its definition for me (you have to figure out what comp should be). As a test for this part, run lookup a couple of times.

The output from search should be either the proper word and definition, or "Not found" (if, e.g. I searched for "news"). Your search is going to end, respectively, at a non-empty node (with the proper label) or an empty node. We need to define a data type to handle this:

datatype 'a lookup_result = Stringans of string | Ans of 'a;

Thus for example, your definition could begin

fun lookup (Empty, comp, key, lab) = Stringans ("Not found")
| lookup (Node (lft, lab1, rht), comp, key, lab) = <you take it from here>

Assignment, part 5. In the text file of code, there is a list named "europe." Each entry is a 4-tuple, consisting of the name of the county, its area, its population, and its gross national product. Create a comparison function that will create a tree using the area as key.

For more detail on data types, see PSML.