Week 4 Assignment

Due Sunday, October 1, 2006

Preventing SQL Injection

What I Learned

I learned a few new things in researching this weeks assignment. In the past I have used the PHP function htmlspecialchars() and strip_tags() on user input. But the recommendation is to use htmlspecialchars() on output that is going to be displayed and use mysql_real_escape_string() on the input.

Along with using mysql_real_escape_string() some have said that all input should be quoted, whereas others have said not to quote numeric data. Is it that some databases do not except quoted numeric data? My solution quotes all the input data.

Another new function I learned about is sprintf() which is a string concatenation function that allows you to control the type of variables as they are used in the SQL query. %s for string variables and %d for numeric variables.

Example of sprintf()

<?php
$variableA = "dogs";
$variableB = 5;
echo(sprintf("How do %s bury %d bones?", $variableA, $variableB));
?>
OUTPUT: How do dogs bury 5 bones?

A couple of reminders when using the mysql_real_escape_string() function:

Finally, the last new technique I learned concerns URL output - Use urlencode() to replace spaces with "+" signs, and unsafe ASCII characters with "%" followed by their hex equivalent. This function could be used on the search results link output in our assignment. Not only does this make URL output safer but it also becomes XHTML standards compliant :)

Our Assignment

The database table name is spider_results, keyword is the database table column and the input field is search_string. The search_string input needs to be sanitized before the SQL query compares it to the keyword information in the database.

The mysql_user (sent in database connection string) should be setup for minimum access rights, so in our assignment the user would only need Select rights.

Another good suggestion to prevent SQL injection attacks is to use a "white list" with RegEx. I didn't do that for this assignment because I think it would be difficult to determine a whitelist for search engine queries. The key here is the recommendation to create a whitelist not a blacklist RegEx (whitelist is what characters are accepted versus a blacklist that says what characters are NOT accepted.)

<?php
// Quote variables to make them safe
function quote_text($value) {
   // stripslashes or mysql_real_escape
   if (get_magic_quotes_gpc()) {
       $value = stripslashes($value);
   }
   // escape and quote 
      $value = "'" . mysql_real_escape_string($value) . "'";
   return $value;
}

// Connect to database first
$link = mysql_connect('mysql_host', 'mysql_user', 'mysql_password')
   OR die($errmsg);

// Make a safe query using sprintf and cleaned data
$query = sprintf("SELECT * FROM spider_results WHERE keyword=%s",
           quote_text($_POST['search_string']);

mysql_query($query);
?> 

Summary

Data sanitation is extremely important:

  1. Don't trust user input
  2. All user input needs to be escaped

To escape data: