New focus: nothing

So, I’ve decided to ditch the whole ‘Look at me, I’m a web developer: here is some useless PHP code!’ theme that I had going on with my blog. Quite a few reasons. First, it was lame. Second, nobody really wants to read that. Third, I struggled to think of things to write about. More complex examples would just take too much time; I’d probably lose focus before completing anything (If only you could see my ‘Drafts’ folder).

Instead, from now on, I will probably make some more pointless posts that nobody wants to read, but I will hopefully enjoy more. After all, this blog really is just about me, right?

Parse and Clean Query Strings with PHP

I wrote a quick, useful function in PHP to parse a query string, strip out dirty key(s), and return the cleaned query string.

The function expects two parameters

  1. $dirty – a string, or an array of strings. These are the names of the variables that will be stripped out.
  2. $qs – OPTIONAL – The source to use as the query string. If not provided, the $_SERVER['QUERY_STRING'] variable is used

The function returns a string in the format

var1=value1&var2=value2…

function clean_query_string($dirty, $qs = false) {
    //default to the server query string if none is provided
    $qs = ($qs) ? $qs : $_SERVER['QUERY_STRING'];

    // parse the query string as an array, writing it
    // down as a string as we go
    $clean = "";
    foreach(explode("&", $qs) as $var) {
        $t = explode("=", $var);
        if( !in_array($t[0], (array)$dirty) ) {
            $clean .= "{$t[0]}={$t[1]}&";
        }
    }

    //trim the trailing ampersand
    return trim($clean, "&");
}

Printing binary trees to stdout in C++

Have you ever found yourself playing with binary trees, be it for fun, a class, or a real project/work? A class I am enrolled in recently introduced the concept of binary trees, and to help myself confirm my code behaved well, I thought it would be good if I could visually represent my binary trees. There is a simple problem with printing binary trees to stdout. The problem is that you can only print row-by-row. So you have to print all nodes at depth four in sequence, however the nodes won’t all share a common immediate parent.

I solved the problem, by implementing a method to traverse the tree to each specific node. I liked the solution, so I thought I would share it.

I won’t go into the code for creating the tree, but let’s assume our tree is made of nodes that look like this:

 struct node {
    int data;
    node* parent;
    node* left;
    node* right;
 };

The very first problem that you will face is: Where do I place the first node? You’ll already know that there is only going to be one node on the first row, so the first row is easy, right? Well, it’s not quite that simple. You have to know how far out to place the node. In order to have the node centered, you have to know how many nodes will be on the bottom row!

So, I implemented a beautifully simple function to count how many rows the tree has. It looks like this:


int num_rows(node* parent) {
   int num_left = 0;
   int num_right = 0;

   if(parent->left != NULL) {
      num_left = num_rows(parent->left);
   }
   if(parent->right != NULL) {
      num_right = num_rows(parent->right);
   }

   if(num_left > num_right) {
      return 1 + num_left;
   } else {
      return 1 + num_right;
   }
}

Basically the method recursively counts how many children its children have, and adds one to the total (to include itself in the sum). Now we know our tree has X number of rows. Our next task is to compute how many nodes are on the bottomrow. This is very easy to do. Since the tree is binary, it grows by powers of 2. The first row has 2^(1-0) = 2^0 = 1 nodes. The second row has 2^(2-1) = 2^1 = 2 nodes. The third row has 2^(3-1) = 2^2 = 4 nodes, etc… So, if our tree has 6 rows, we know the 6th and final row has 32 nodes (2^5).

Now we’re ready to tackle the next problem for placing the first node, which is getting the spacing exact. If we just simply place a node, and have its children always the same distance, down and left/right, we very quickly run into an overlapping problem (marked in purple):

Poorly aligned trees overlap nodes

To overcome this, we now have to figure out our spacing from the bottom up. I place the nodes on the bottom row evenly. The space between the nodes is equal to the width of the space allotted for each node, which must be defined. If you allot no space for the printing of the node, the spacing will ruined. I predefine the width of the individual nodes with a constant, and including the iomanip library to control whitespace.

#define NODE_WIDTH 2
...
cout << setw(NODE_WIDTH) << current_node->data;

Draw out the tree from the bottom up, and the pattern should become clear pretty quickly. Here is an example tree with three rows, properly spaced:

Well aligned trees allow plenty of space for each node

The pattern becomes more and more clear when more rows are added.

Well aligned binary tree with four rows

Adding a grid makes it easier to see where the node-widths are located.

Overlaying a grid onto the binary tree makes the spacing obvious

From this, I see that the spacing on the bottom row is: NODE, 1 NODE-WIDTH, NODE, 1 NODE-WIDTH, etc.. The next row up is: 1 NODE-WIDTH, NODE, 1 NODE-WIDTH, 1 NODE-WIDTH, 1 NODE-WIDTH, NODE, etc… The next row up is: 3 NODE-WIDHTS, NODE, 3 NODE-WIDTHS, 1 NODE-WIDTH, 3 NODE-WIDTHS, NODE.

The pattern that should be striking your eye is 0, 1, 3, 7. Now, where do these numbers come from? You should see that it is:

0 = (2^0) – 1
1 = (2^1) – 1
3 = (2^2) – 1
7 = (2^3) – 1

Where {0,1,2,3} are equal to num_rows(parent) – current_row. So, if we have a tree of size rows, the magic number for the 4th row is: (2^(6-4)) – 1 = (2^2) – 1 = 3.

We finally know where to place the first node. We will make (2^(NUM_ROWS-1)) – 1 NODE-WIDTHs, and print our node’s value.

The next great challenge is finding which node to print next. As a human, I know I will print:
parent,
parent->left, parent->right,
parent->left->left, parent->left->right, parent->right->left, parent->right->right.

Yikes, that is clearly going to be too messy for a computer. So how do we know which node to print next? I have a quite simple method for this. I give each node an imaginary index, based on its location in the tree:

Give the nodes incrementing indices so their locations can be predicted

I know where each index is located, because I know which range of numbers will fall on each row. It is just a binary search to find the right index. Row 1 starts with index 1, and uses up 1 number. Row 2 starts with index 2, and uses up 2 numbers (2 & 3). Row 3 starts with 4, and uses up 4 numbers (4, 5, 6, & 7). Row 4 starts with 8 and uses up 8 numbers… etc. Pretty simple, right?

I wrote a recursive function to seek out a requested node, given a few calculations. The function looks like this:


node* seek(node* parent, int index, int a, int b, int currRow, int seekRow) {

   if(parent == NULL) {
	   return NULL;
   }

   int c = floor( (a+b) / 2 );

   node* currNode = NULL;
   if(index <= c) {
      currNode = parent->left;
      b = c;
   } else {
      currNode = parent->right;
      a = c+1;
   }

   if(seekRow == 1) {
      return parent;
   } else if(currRow == (seekRow-1)) {
      if(currNode == NULL) {
         return NULL;
      } else {
         return currNode;
      }
   } else {
      return seek(currNode, index, a, b, currRow+1, seekRow);
   }
}

For example, say we are seeking out the node at index #10. I know 10 is on the 4th row (because 10 falls between 2^(4-1) and 2^(5-1)). I know the 4th row starts at 8, and ends at 15 ( 2^(4-1), 2^(4-1) + 2^(4-1) – 1). So I pass the seek function these parameters:

node* found = seek(parent, 10, 8, 15, 1, 4);

First iteration, the function looks at the index and the range 8-15. If 10 falls in the lower half of the range, it traverses left 1 node, if 10 falls in the upper half of the range, it traverses right 1 node. Since currRow = 1 and seekRow = 4, the function will recursively call itself with updated parameters.

return seek(currNode, 10, 8, 11, 2, 4);

The 15 turns into an 11, because the upper-bound of the range was eliminated (via binary search). This process continues until the search pinpoints the sought after node index, or reaches a NULL node on the way.

We now have everything we need to know to print out our binary tree. The final coding phase is simple. This requires the cmath and iomanip libraries.


//assume parent is the root node a pre-built tree

   int total_rows = num_rows(parent);
   int current_index = 0;

   //loop through, print each row one-by-one
   for(int i = 1; i <= total_rows; i++) {
      int num_nodes_on_row = pow(2, (i-1));
      int num_spaces = pow(2, (total_rows-i)) - 1;

      //loop through and print each node of the row
      for(int q = 1; q <= num_nodes_on_row; q++) {
         if(q == 1) {
            //special rules for the first node of the row
            //spacer size is half what it SHOULD BE .. the other half
            //falls after the LAST node on the row

            //spacer function simply prints X number of spaces...
            spacer(num_spaces*NODE_WIDTH);
         } else {
            spacer( (num_spaces*NODE_WIDTH) +
                             NODE_WIDTH + (num_spaces*NODE_WIDTH));
         }
         current_index++;
         node* val = seek( parent, current_index,
                                num_nodes_on_row, (num_nodes_on_row*2)-1,
                                1, i);

         if(val == NULL) {
            cout << setw(NODE_WIDTH) << "N";
         } else {
            cout << setw(NODE_WIDTH) << val->data;
         }
      }
      //end of the row, now print newline char
      cout << endl;
   }

And there you have it. You should get something beautiful, like this:

              58
      32              73
   7      49      59      79
 4  23   N   N   N   N  73   N

Feel free to spend some time getting your code to print lines connecting the nodes. I added the feature to mine, it looks like this:

              70
        _____/  \_____
      55              86
     /  \            /  \
  46      60      74      99
 /  \    /  \    /  \    /  \
26  49   N   N   N   N  98   N

Let me know what you think, or if you find bugs in my code!

Edit (08/14/2010):


I went back and found the source code. I'm not sure what happened to my FINAL version. The one I'm uploading is a little lacking in documentation, but it is at least 5 months old! Yikes. I tried to clean it up some though.

tree.cpp (12k)


Variable variables in PHP

Well, after a couple month hiatus, I guess it is time to throw up another post. I was up working on a project, and came across a completely valid usage of variable variables, and felt it might be worth making a post, because it is kind of fun.

At first glance, “variable variables” seemed pretty pointless to me. Surely they had a place somewhere, but I guess that’s just not how I write code. The best use I could think of was a pointless, but stupid-fun script, like this:

$a = "hello world";
$b = "a";
$c = "b";
$d = "c";
$e = "d";
$f = $$$$$e;

Perhaps, for this example I’m about to use, there is a better way of accomplishing the task (I can think of other methods, but can argue which is “better”). My task was to feed a function the numerical representation of a month, and spit back the textual representation, adhering to preferred formatting. So I wrote a function that has a couple hard-coded lookup arrays, named after the $format parameter. The function is pretty well self-explanatory:

function getMonth($i, $format = 'F') {
    $F = array(
        1 => "January",
        2 => "February",
        3 => "March",
        4 => "April",
        5 => "May",
        6 => "June",
        7 => "July",
        8 => "August",
        9 => "September",
       10 => "October",
       11 => "November",
       12 => "December",
    );

    $M = array(
        1 => "Jan",
        2 => "Feb",
        3 => "Mar",
        4 => "Apr",
        5 => "May",
        6 => "Jun",
        7 => "Jul",
        8 => "Aug",
        9 => "Sep",
       10 => "Oct",
       11 => "Nov",
       12 => "Dec",
    );

    $format = strtoupper($format);

    if($format != 'F' && $format != 'M') {
        $format = 'F';
    }

    if($i > 12 || $i < 1) {
        return false;
    }

    $source = $$format;
    return $source[$i];
}

An Amazing Resource

I found an amazing resource, completely worth sharing.

It’s Free Online Course Material from MIT (yes, THE Massachusetts Institute of Technology MIT…).

It’s got free lecture notes, exams, and videos from actual MIT courses.

just another php hacker

After reading about the Just Another Perl Hacker code snippets on Wikipedia, I was pretty intrigued. The only problem is, I hate Perl. So, here is my version: just another php hacker.

I cleaned the code up a little bit here for readability (which is against the point, I guess).

foreach(explode("2","1152117298211521162114") as $z) {
	$h .= chr($z);
}

$f = (string)(2*19*227*11489);

$q = chr($h($f,(int)$f[3],(int)$f[2]+1)).chr($h($f,(int)$f[2]+1,(int)$f[0]/3)).chr($h($f,(int)$f[strlen($f)-1]+1,(int)$f[0]/3));

$z = eval("\$u=".$q."(".str_replace("3",").$q(","117311031153101311431053973108310531223101").");");

$w = "a:23:{i:0F106F1F11F2F-2F3F1F4F-84F5F65F6F13F7F1F8F5F9F-12F10F-3F11F13F12F-82F13F80F14F-8F15F8F16F-80F17F72F18F-7F19F2F20F8F21F-6F22F13;}";

$b = str_replace("F",";i:", $w);

foreach($u($b) as $c) {
	echo($q($s+=$c));
}

Here is the more obfuscated text:

foreach(explode("2","1152117298211521162114")as$z){$h.=chr($z);}$f=(string)(2*19*227*11489);$q=chr($h($f,(int)$f[3],(int)$f[2]+1)).chr($h($f,(int)$f[2]+1,(int)$f[0]/3)).chr($h($f,(int)$f[strlen($f)-1]+1,(int)$f[0]/3));$z=eval("\$u=".$q."(".str_replace("3",").$q(","117311031153101311431053973108310531223101").");");$b=str_replace("F",";i:","a:23:{i:0F106F1F11F2F-2F3F1F4F-84F5F65F6F13F7F1F8F5F9F-12F10F-3F11F13F12F-82F13F80F14F-8F15F8F16F-80F17F72F18F-7F19F2F20F8F21F-6F22F13;}");foreach($u($b)as$c){echo($q($s+=$c));}

Export to spreadsheet with PHP

With PHP, it is extremely simple to create a file that can be interpreted by speadsheet programs, such as Excel or OpenOffice.org.

When you think about what this really means, it should be clear how simple it is. We’ll do this with by created a CSV file. CSV stands for Comma-Separated Values. So basically, we’ll print out a text file with a csv MIME type, and some separated values. The MIME type can either be “text/csv” or “text/x-csv”. The standard, per the name, is to separated the values with a comma, and all values that contain a comma should be wrapped in double quotes. For example:

we,are,separated

"And here, ",we have a, comma in a value

Another way, which may be the more common practice, per my version of OpenOffice.org, is to separate the values with a semicolon:

One;Two;Three

I’ll use the semicolon method. Here is a quick script that will create a small CSV file, which can be read by a spreadsheet program:

header("Content-type: text/csv");

$data = array(
	array(
		"a-1",
		"a-2",
		"a-3",
	),
	array(
		"b-1",
		"b-2",
		"b-3",
	),
	array(
		"c-1",
		"c-2",
		"c-3",
	),
);

foreach($data as $row) {
	echo implode(";", $row) . "\n\r";
}

Now, this will prompt the user for a download, which will end in a .php filename. If you have access, and are using Apache, you can solve this with a little .htaccess. Simply putting a .htaccess file in the directory, with this in it, should do the trick.

AddType application/x-httpd-php .csv

With that, working properly, you should be able to create PHP scripts that use a .csv extension. Then, when you run the above script, save it as “mydata.csv” and the downloader should see no connection to PHP.

Now, how can we use this in a little more realistic scenario? Well, we can use this to create CSV dumps of SQL result sets.

$query = @mysql_query($sql);
$data = array();

while($row = @mysql_fetch_assoc($query)) {
	//first time through, grab the keys
	if(!sizeof($data)) {
		$data[] = array_keys($row);
	}	

	$tmp = array();
	foreach($row as $key=>$value) {
		$tmp[] = $value;
	}
	$data[] = $tmp;
}

foreach($data as $row) {
	echo implode(";", $row) . "\n\r";
}

And there you have it! That script should be capable of turning your queries into a spreadsheet readable file. Fun, right?

Implementing the Singleton pattern in PHP

In short, the Singleton Pattern is a design pattern used when we want only ONE instance of a class to be initialized. This is useful in CPU/Memory intensive classes, such as database handle abstractions.

The Singleton Pattern is a pretty fundamental pattern, worth learning about, if you aren’t already aware. It’s characterized by a private constructor, which can only be called by class member methods, and a public static accessor method, to return a reference to the object.

Here is a quick example of how it can be implemented in PHP (There are other ways, as well).

class Singleton {

    private static $__instance;
    protected $_value;

    private function __construct() {
        $this->_value = rand();
    }

    public static function getInstance() {
        if(!isset(self::$__instance)) {
            self::$__instance = new Singleton();
        }
        return self::$__instance;
    }

    public function display() {
        echo $this->_value . "<br />";
    }

}

Use it like this:

$foo = Singleton::getInstance();
$foo->display();

$bar = Singleton::getInstance();
$bar->display();

Running C/C++ Code as a CGI Script

When I first had the realization that this was not only possible, but really simple, I was very excited. Using C/C++ to create dynamic web applications isn’t new. In fact, C and Perl were originally the primary method for dynamic web based applications. Well, the Internet has been around long enough, that even Perl is beginning to become obsolete (Thank God).

C++ Web applications are not going to be a game-changer. If this were true, it would have happened a long time ago. Will I forget about PHP? Of course not! Many (probably most) shared hosting solutions won’t even allow you (with very good reason) to execute arbitrary binary files on their servers. Mine won’t. You also can’t (to my knowledge) do this technique on Windows servers.

I WILL, probably, write a few C++ web applications to run on my laptop, just because I’m a nerd, and the satisfaction I’d get if from just knowing every time I hit the Home button, I’m seeing web pages dished up by a C++ script… I’ll stop there.

What kind of solutions do I see this as a replacement for PHP (or Ruby, or Python, or ASP, or JSP, or whatever you use)?

  • You are looking to build a robust web application to be run on your own private web servers.
  • I’m thinking, if Twitter was written in C/C++, we wouldn’t have as many Twitter-outages :) .

  • You are looking to build a high-end distributable web-based software package.
  • For example, a software package, where the codebase sits on a server, and clients (or internal employees), interface with it through a web browser. Now we’re talking APPLICATION, more than just a website.

  • You are looking to build a web application, and have the ability to run arbitrary binary scripts on your server (such as a private server), and runtime speed is crucial.
  • With FastCGI and precompiled binary scripts, well-written C/C++ code will trump compile-on-the-fly approaches of PHP, Perl, Python, etc. (Of course PHP has memcache..)

Enough jibber-jabber. Let’s create a C++ CGI script! First, you’ll need to configure Apache to execute CGI scripts. It’s general practice (but you may not care.. I don’t) to create a single directory, and only allow CGI scripts in that directory to be executed. You’ve probably seen a lot of cgi-bin/ directories on various websites. Let’s say we create a directory at /var/www/cgi. In this directory, we’ll put our CGI scripts. Let’s tell Apache.

You’ll want to edit your httpd.conf file (on Ubuntu, it’s in /etc/apache2). Add this (as root/sudo):

<Directory /var/www/cgi>
    Options ExecCGI+
    AddHandler cgi-script .cgi
</Directory>

The Options ExecCGI+ line is the one that allows CGI scripts to be executed. If instead of using a single directory, you opted to make the whole ServerRoot allow CGI scripts (like I did), you’ll want to make sure the Options aren’t overridden elsewhere in the server conf. Namely, check for something in /etc/apache2/sites-available/default (or where ever else your system may store Apache config). In this file, You may see another <Directory> block for your server root. Add ExecCGI to the Options list.

You can also create arbitrary file extensions for your CGI scripts with the AddHandler directive. Imagine the possibilities.

Now, restart Apache. On Ubuntu:

sudo /etc/init.d/apache2 restart

When Apache comes back up, you should be ready to roll. You may feel like throwing in a test Perl script before we get to the C++, just to make sure things are working as expected. If you aren’t a Perl Monk (most of us aren’t), do this:

which perl

Will tell you where perl is installed (if at all). It’s probably /usr/bin/perl. So then, create this Perl script:

#!/usr/bin/perl

print qq(Content-type: text/html\n\n);
print qq(Hello, world!);

Make sure to chmod that bad boy to at least 755, and hit it in the browser, you should see “Hello, world!”. If not, you probably got one of these:

  • You saw the perl code
  • That means the CGI script didn’t attempt to execute, check back over the steps, make sure you restarted Apache.

  • You got Forbidden
  • You either aren’t allowed to execute CGI scripts, or didn’t get the right permissions.

  • Internal Server Error
  • Perl code is probably messed up. Check out tail /var/logs/apache2/error.log for what SHOULD be a more detailed error message.

  • File not found
  • You probably have a typo in the filename or the address bar :) .

Well, hopefully you have that working now, lets throw down some C++ code.

I’m not going to teach you C++, so if this code doesn’t make sense, you should look into learning C++ before approaching this technique (obviously).


#include <iostream>
#include <cstdlib>

using namespace std;

int main() {

	cout << "Content-type: text/html\n\n";
	cout << "Hello World (Wide Web)<br />" << endl;

	cout << getenv("REMOTE_ADDR") << endl;

}

Save this as hello.C, or whatever you want.. and compile the code:

g++ -o hello.cgi hello.C

Now, make sure that hello.cgi is in /var/www/cgi (or wherever you specified), and hit it in the web browser. You should see an output something like:

Hello World (Wide Web)
127.0.1.1

One of the biggest pitfalls I can foresee, is that server-side scripting is not an interactive technique. Thats why scripting languages are perfect for dynamic web pages. C++, not being a scripting language by nature may cause you some headaches. Just be sure to write smart, efficient code.

You can also download a C++ CGI library, to help out with accessing header data, such as Cookies, GET and POST variables, etc. Here is a link to an ANSI C library for CGI Programming.

Protecting Your Web Application’s Code

If you have ever considered commercial web application development, you’ve probably faced the challenge of protecting your intellectual property. I’ve spent the past few months researching and pondering this very problem.

There are many possibilities and issues here. Dynamic web applications are generally written in a client-side scripting language. The nature of these scripting languages is to compile on-the-fly. This means you store the code, in plain-sight. Not very good when you’re trying to sell software, and anyone who purchases it has the ability to reverse engineer your product.

Some technologies, such as JSP, allow you to compile the code down to bytecode, however, by nature of its design, Java bytecode is compact and simple to reverse engineer. Encryption techniques such as ZendGuard are crackable (unencryption has to happen somewhere). ActionScript (Flash) is promising, it compiles down to binary SWF files, however, tools exist to convert these SWF files into their FLA counterparts.

There may be no guaranteed, fool-proof way to protect your code, but one thing that has obviously worked well for stand-alone software vendors is binary compilation. Great, so how do I compile my web code to binary? Simple! Just write your web applications in C/C++ (or other language that compiles to binaries), and run them as CGI scripts.

I had a major AH-HA moment, when I realized that all a CGI script needs to do, is print out the content MIME type, and the actual content, Apache will take care of the rest. This approach will only work on Unix based hosts, as Windows does binaries a little different (suckers). But as the vast majority of web hosts run on Unix, this isn’t a huge deal.

I’m going to create a 2nd post, demonstrating this technique. Look for it in the not-so-distant future.

Update: The 2nd post is up! Running C/C++ Code as a CGI Script