Archive for March, 2010

Parse and Clean Query Strings with PHP

Tuesday, March 23rd, 2010

I wrote a quick, useful function in PHP to parse a query string, strip out dirty key(s), and return the cleaned query string.

The function expects two parameters

  1. $dirty – a string, or an array of strings. These are the names of the variables that will be stripped out.
  2. $qs – OPTIONAL – The source to use as the query string. If not provided, the $_SERVER['QUERY_STRING'] variable is used

The function returns a string in the format

var1=value1&var2=value2…

function clean_query_string($dirty, $qs = false) {
    //default to the server query string if none is provided
    $qs = ($qs) ? $qs : $_SERVER['QUERY_STRING'];

    // parse the query string as an array, writing it
    // down as a string as we go
    $clean = "";
    foreach(explode("&", $qs) as $var) {
        $t = explode("=", $var);
        if( !in_array($t[0], (array)$dirty) ) {
            $clean .= "{$t[0]}={$t[1]}&";
        }
    }

    //trim the trailing ampersand
    return trim($clean, "&");
}

Printing binary trees to stdout in C++

Wednesday, March 17th, 2010

Have you ever found yourself playing with binary trees, be it for fun, a class, or a real project/work? A class I am enrolled in recently introduced the concept of binary trees, and to help myself confirm my code behaved well, I thought it would be good if I could visually represent my binary trees. There is a simple problem with printing binary trees to stdout. The problem is that you can only print row-by-row. So you have to print all nodes at depth four in sequence, however the nodes won’t all share a common immediate parent.

I solved the problem, by implementing a method to traverse the tree to each specific node. I liked the solution, so I thought I would share it.

I won’t go into the code for creating the tree, but let’s assume our tree is made of nodes that look like this:

 struct node {
    int data;
    node* parent;
    node* left;
    node* right;
 };

The very first problem that you will face is: Where do I place the first node? You’ll already know that there is only going to be one node on the first row, so the first row is easy, right? Well, it’s not quite that simple. You have to know how far out to place the node. In order to have the node centered, you have to know how many nodes will be on the bottom row!

So, I implemented a beautifully simple function to count how many rows the tree has. It looks like this:


int num_rows(node* parent) {
   int num_left = 0;
   int num_right = 0;

   if(parent->left != NULL) {
      num_left = num_rows(parent->left);
   }
   if(parent->right != NULL) {
      num_right = num_rows(parent->right);
   }

   if(num_left > num_right) {
      return 1 + num_left;
   } else {
      return 1 + num_right;
   }
}

Basically the method recursively counts how many children its children have, and adds one to the total (to include itself in the sum). Now we know our tree has X number of rows. Our next task is to compute how many nodes are on the bottomrow. This is very easy to do. Since the tree is binary, it grows by powers of 2. The first row has 2^(1-0) = 2^0 = 1 nodes. The second row has 2^(2-1) = 2^1 = 2 nodes. The third row has 2^(3-1) = 2^2 = 4 nodes, etc… So, if our tree has 6 rows, we know the 6th and final row has 32 nodes (2^5).

Now we’re ready to tackle the next problem for placing the first node, which is getting the spacing exact. If we just simply place a node, and have its children always the same distance, down and left/right, we very quickly run into an overlapping problem (marked in purple):

Poorly aligned trees overlap nodes

To overcome this, we now have to figure out our spacing from the bottom up. I place the nodes on the bottom row evenly. The space between the nodes is equal to the width of the space allotted for each node, which must be defined. If you allot no space for the printing of the node, the spacing will ruined. I predefine the width of the individual nodes with a constant, and including the iomanip library to control whitespace.

#define NODE_WIDTH 2
...
cout << setw(NODE_WIDTH) << current_node->data;

Draw out the tree from the bottom up, and the pattern should become clear pretty quickly. Here is an example tree with three rows, properly spaced:

Well aligned trees allow plenty of space for each node

The pattern becomes more and more clear when more rows are added.

Well aligned binary tree with four rows

Adding a grid makes it easier to see where the node-widths are located.

Overlaying a grid onto the binary tree makes the spacing obvious

From this, I see that the spacing on the bottom row is: NODE, 1 NODE-WIDTH, NODE, 1 NODE-WIDTH, etc.. The next row up is: 1 NODE-WIDTH, NODE, 1 NODE-WIDTH, 1 NODE-WIDTH, 1 NODE-WIDTH, NODE, etc… The next row up is: 3 NODE-WIDHTS, NODE, 3 NODE-WIDTHS, 1 NODE-WIDTH, 3 NODE-WIDTHS, NODE.

The pattern that should be striking your eye is 0, 1, 3, 7. Now, where do these numbers come from? You should see that it is:

0 = (2^0) – 1
1 = (2^1) – 1
3 = (2^2) – 1
7 = (2^3) – 1

Where {0,1,2,3} are equal to num_rows(parent) – current_row. So, if we have a tree of size rows, the magic number for the 4th row is: (2^(6-4)) – 1 = (2^2) – 1 = 3.

We finally know where to place the first node. We will make (2^(NUM_ROWS-1)) – 1 NODE-WIDTHs, and print our node’s value.

The next great challenge is finding which node to print next. As a human, I know I will print:
parent,
parent->left, parent->right,
parent->left->left, parent->left->right, parent->right->left, parent->right->right.

Yikes, that is clearly going to be too messy for a computer. So how do we know which node to print next? I have a quite simple method for this. I give each node an imaginary index, based on its location in the tree:

Give the nodes incrementing indices so their locations can be predicted

I know where each index is located, because I know which range of numbers will fall on each row. It is just a binary search to find the right index. Row 1 starts with index 1, and uses up 1 number. Row 2 starts with index 2, and uses up 2 numbers (2 & 3). Row 3 starts with 4, and uses up 4 numbers (4, 5, 6, & 7). Row 4 starts with 8 and uses up 8 numbers… etc. Pretty simple, right?

I wrote a recursive function to seek out a requested node, given a few calculations. The function looks like this:


node* seek(node* parent, int index, int a, int b, int currRow, int seekRow) {

   if(parent == NULL) {
	   return NULL;
   }

   int c = floor( (a+b) / 2 );

   node* currNode = NULL;
   if(index <= c) {
      currNode = parent->left;
      b = c;
   } else {
      currNode = parent->right;
      a = c+1;
   }

   if(seekRow == 1) {
      return parent;
   } else if(currRow == (seekRow-1)) {
      if(currNode == NULL) {
         return NULL;
      } else {
         return currNode;
      }
   } else {
      return seek(currNode, index, a, b, currRow+1, seekRow);
   }
}

For example, say we are seeking out the node at index #10. I know 10 is on the 4th row (because 10 falls between 2^(4-1) and 2^(5-1)). I know the 4th row starts at 8, and ends at 15 ( 2^(4-1), 2^(4-1) + 2^(4-1) – 1). So I pass the seek function these parameters:

node* found = seek(parent, 10, 8, 15, 1, 4);

First iteration, the function looks at the index and the range 8-15. If 10 falls in the lower half of the range, it traverses left 1 node, if 10 falls in the upper half of the range, it traverses right 1 node. Since currRow = 1 and seekRow = 4, the function will recursively call itself with updated parameters.

return seek(currNode, 10, 8, 11, 2, 4);

The 15 turns into an 11, because the upper-bound of the range was eliminated (via binary search). This process continues until the search pinpoints the sought after node index, or reaches a NULL node on the way.

We now have everything we need to know to print out our binary tree. The final coding phase is simple. This requires the cmath and iomanip libraries.


//assume parent is the root node a pre-built tree

   int total_rows = num_rows(parent);
   int current_index = 0;

   //loop through, print each row one-by-one
   for(int i = 1; i <= total_rows; i++) {
      int num_nodes_on_row = pow(2, (i-1));
      int num_spaces = pow(2, (total_rows-i)) - 1;

      //loop through and print each node of the row
      for(int q = 1; q <= num_nodes_on_row; q++) {
         if(q == 1) {
            //special rules for the first node of the row
            //spacer size is half what it SHOULD BE .. the other half
            //falls after the LAST node on the row

            //spacer function simply prints X number of spaces...
            spacer(num_spaces*NODE_WIDTH);
         } else {
            spacer( (num_spaces*NODE_WIDTH) +
                             NODE_WIDTH + (num_spaces*NODE_WIDTH));
         }
         current_index++;
         node* val = seek( parent, current_index,
                                num_nodes_on_row, (num_nodes_on_row*2)-1,
                                1, i);

         if(val == NULL) {
            cout << setw(NODE_WIDTH) << "N";
         } else {
            cout << setw(NODE_WIDTH) << val->data;
         }
      }
      //end of the row, now print newline char
      cout << endl;
   }

And there you have it. You should get something beautiful, like this:

              58
      32              73
   7      49      59      79
 4  23   N   N   N   N  73   N

Feel free to spend some time getting your code to print lines connecting the nodes. I added the feature to mine, it looks like this:

              70
        _____/  \_____
      55              86
     /  \            /  \
  46      60      74      99
 /  \    /  \    /  \    /  \
26  49   N   N   N   N  98   N

Let me know what you think, or if you find bugs in my code!

Edit (08/14/2010):


I went back and found the source code. I'm not sure what happened to my FINAL version. The one I'm uploading is a little lacking in documentation, but it is at least 5 months old! Yikes. I tried to clean it up some though.

tree.cpp (12k)


Variable variables in PHP

Tuesday, March 2nd, 2010

Well, after a couple month hiatus, I guess it is time to throw up another post. I was up working on a project, and came across a completely valid usage of variable variables, and felt it might be worth making a post, because it is kind of fun.

At first glance, “variable variables” seemed pretty pointless to me. Surely they had a place somewhere, but I guess that’s just not how I write code. The best use I could think of was a pointless, but stupid-fun script, like this:

$a = "hello world";
$b = "a";
$c = "b";
$d = "c";
$e = "d";
$f = $$$$$e;

Perhaps, for this example I’m about to use, there is a better way of accomplishing the task (I can think of other methods, but can argue which is “better”). My task was to feed a function the numerical representation of a month, and spit back the textual representation, adhering to preferred formatting. So I wrote a function that has a couple hard-coded lookup arrays, named after the $format parameter. The function is pretty well self-explanatory:

function getMonth($i, $format = 'F') {
    $F = array(
        1 => "January",
        2 => "February",
        3 => "March",
        4 => "April",
        5 => "May",
        6 => "June",
        7 => "July",
        8 => "August",
        9 => "September",
       10 => "October",
       11 => "November",
       12 => "December",
    );

    $M = array(
        1 => "Jan",
        2 => "Feb",
        3 => "Mar",
        4 => "Apr",
        5 => "May",
        6 => "Jun",
        7 => "Jul",
        8 => "Aug",
        9 => "Sep",
       10 => "Oct",
       11 => "Nov",
       12 => "Dec",
    );

    $format = strtoupper($format);

    if($format != 'F' && $format != 'M') {
        $format = 'F';
    }

    if($i > 12 || $i < 1) {
        return false;
    }

    $source = $$format;
    return $source[$i];
}