Tuesday, March 13, 2012

Auto-increment ID Obfuscation

One problem I encountered is that an auto-incremented ID may expose information that you do not want your clients to know.

For example, http://www.company.com/order?id=1234 tells you that this company probably does not have a lot of orders, since yours is probably the 1234th order overall. http://www.NewDatingSite.com/member?id=256 is simply telling people that they probably will not find lots of new matches, since the newest member is just the 256th member.

Of course you can encrypt the ID number to make it more complicated like http://www.ConfusingID.com/stuff?id=!W%23R%5E%25YFT%5E. While this ensures that your customers probably will not be able to probe around and know that they are actually your first ever guinea pig customer, they will definitely have a fun time reading that ID to your service representative when calling over the phone. That ID is simply too ugly for the eyes.

You can also create a 1-to-1 hash table (beware of the collisions) or use some mathematical skills to create a mapping algorithm, but that is just too complicated for me.

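I will just present the approach I am currently using. One of the main ingredients is Base32 encoding. Base32 has one major advantage over Base64: it is URL-ready. Its alphabet contains only letters and digits, so as long as the trailing '=' padding is left out (as in the implementation further down), the result needs no additional URL encoding. Base64, on the other hand, uses the '+', '/' and '=' characters, which have to be URL-encoded ('=' becomes '%3D', for example).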
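
To make the URL point concrete, here is a small check. The two sample bytes are picked only because they make Base64 emit all three of its problematic characters; they are not part of the approach itself.

```php
<?php
// Two bytes chosen so that Base64 produces '+', '/' and '=' at once.
$bytes = "\xfb\xff";

$b64 = base64_encode($bytes);
echo $b64, " -> ", rawurlencode($b64), "\n";   // +/8= -> %2B%2F8%3D

// A number written in base 32 only uses 0-9 and a-v,
// so URL encoding leaves it untouched.
$b32 = base_convert('1234', 10, 32);
echo $b32, " -> ", rawurlencode($b32), "\n";   // 16i -> 16i
```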

Let's just see the two main functions:
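// Subtract the ID from a large constant so that every result has the same
// number of base-32 digits, then write that number in base 32.
function id_encode($id) {
    return base_convert((9999999999 - (float) $id), 10, 32);
}

// Reverse the steps: read the base-32 string as a number, then undo the subtraction.
function id_decode($fakeid) {
    return 9999999999 - (float) base_convert($fakeid, 32, 10);
}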


Let's run a few numbers through both functions:
<?php
foreach (array(1, 2, 3, 100, 1000, 10000, 1000000, 94256245) as $test) {
    printf("%s → %s → %s <br />", $test, id_encode($test), id_decode(id_encode($test)) );
}
?>


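The output would be the following. (These sample values were generated with 555555555 as the constant; with 9999999999, as in the functions above, each encoded ID is seven characters long, e.g. 1 → 9a0novu → 1.)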
1 → ghq6n2 → 1 
2 → ghq6n1 → 2
3 → ghq6n0 → 3
100 → ghq6jv → 100
1000 → ghq5nr → 1000
10000 → ghpsuj → 10000
1000000 → ggrm53 → 1000000
94256245 → dntnje → 94256245


So instead of having http://www.mycompany.com/order?id=3, the new URL will be http://www.mycompany.com/order?id=ghq6n0. Yes, the ID is longer, but its length is fixed as long as the IDs stay well below the constant they are subtracted from. We want a fixed-length ID so that users cannot tell the relative position of an ID in the whole database from its length. Keep in mind that neighbouring IDs still produce neighbouring codes (ghq6n2, ghq6n1, ghq6n0), so this hides the magnitude, not the sequence. If you do not need a fixed length, just remove the "9999999999 -" part, whose only purpose is padding to a fixed length.
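
Here is a minimal sketch of the unpadded variant next to the padded one, to show how the length behaves. The names id_encode_plain, id_decode_plain, id_encode_padded and ID_PAD are made up for this illustration, and the sketch assumes a 64-bit PHP build so the constant fits in an integer.

```php
<?php
// Hypothetical constant name; the value is the one subtracted in the functions above.
const ID_PAD = 9999999999;

// Unpadded: simply write the ID in base 32.
function id_encode_plain($id) {
    return base_convert((string) (int) $id, 10, 32);
}

function id_decode_plain($fakeid) {
    return (int) base_convert($fakeid, 32, 10);
}

// Padded: subtract from the constant first, then write the result in base 32.
function id_encode_padded($id) {
    return base_convert((string) (ID_PAD - (int) $id), 10, 32);
}

foreach ([1, 100, 1000, 10000] as $id) {
    printf("%5d  plain: %-3s (%d chars)  padded: %d chars\n",
        $id, id_encode_plain($id), strlen(id_encode_plain($id)),
        strlen(id_encode_padded($id)));
}
```

The plain codes come out as 1, 34, v8 and 9og, so their length grows with the ID, while the padded codes are all seven characters long.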

It is pretty simple, right? If you don't need encryption, you can stop reading right here.



Now, some people might need some simple encryption. Let's throw in xor and base32_encode (base32_encode is needed because the bytes that come out of the xor may not be printable characters). Of course, if you choose your xor key deliberately, you can even make the ID resemble a meaningful string, although the resemblance gradually fades as the number grows larger, which reveals the fact that the number is growing. The following is an example with xor encryption.

1 → mycompanynbq → 1
2 → mycompanynba → 2
3 → mycompanyncq → 3
100 → mycompanyzca → 100
1000 → mycompanqvma → 1000
10000 → mycompaeqjia → 10000
1000000 → mzokais22faa → 1000000
94256245 → mzjoaic3yzlq → 94256245




Another encrypted example without padding:
1 → ny → 1
2 → nu → 2
3 → nq → 3
100 → nriq → 100
1000 → ffoq → 1000
10000 → myflc → 10000
1000000 → fibl6yq → 1000000
94256245 → nuk2kncr3e → 94256245



Now let's examine the xor_this function. It's also pretty short (and it's based on an example from php.net):

<?php
function xor_this($string) {
    $key = 'JohnnyBeGood';

    // Make sure we are working on a string.
    $text = sprintf('%s', $string);

    $outText = '';

    // XOR every byte of the text with the key, repeating the key as needed.
    for ($i = 0; $i < strlen($text); $i++) {
        $outText .= $text[$i] ^ $key[$i % strlen($key)];
    }
    return $outText;
}
?>
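
As a quick illustration of why the same function can be used for both encoding and decoding, and why the result has to go through base32_encode afterwards, here is a small sketch. The helper xor_with_key is a hypothetical stand-in that takes the key as a parameter; the key value is the one from the complete listing at the end of this post.

```php
<?php
// Hypothetical helper: XOR each byte of $text with the key, repeating the key as needed.
function xor_with_key($text, $key) {
    $out = '';
    for ($i = 0; $i < strlen($text); $i++) {
        $out .= $text[$i] ^ $key[$i % strlen($key)];
    }
    return $out;
}

$key    = 'JohnnyBeGood';
$plain  = base_convert('10000', 10, 32);   // "9og"
$masked = xor_with_key($plain, $key);

// The masked bytes include a NUL byte and a control character,
// so they are shown as hex here.
echo bin2hex($masked), "\n";               // 73000f

// Applying the same key a second time restores the original string.
echo xor_with_key($masked, $key), "\n";    // 9og
```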


Finally you need the base32_encode and base32_decode functions. Interestingly, PHP offers base64 encode/decode, but not base32. You can Google a bit and find it right around here: http://xref.moodle.org/nav.html?lib/base32.php.source.html

For those who are too lazy to click (or in case the above link is down), I will just paste everything here:
<?php
//
// +----------------------------------------------------------------------+
// | Base32 Library |
// +----------------------------------------------------------------------+
// | Copyright (c) 2001 The PHP Group |
// +----------------------------------------------------------------------+
// | This source file is dual-licensed. It is available under the terms |
// | of the GNU GPL v2.0 and under the terms of the PHP license version |
// | 2.02, available at through the world-wide-web at |
// | available at through the world-wide-web at |
// | http://www.php.net/license/2_02.txt. |
// +----------------------------------------------------------------------+
// | Minor fixes and additional functions by Allan Hansen. |
// | Moodle porting work by Martin Langhoff |
// +----------------------------------------------------------------------+
// | base32.php - based on race.php - RACE encode and decode strings. |
// +----------------------------------------------------------------------+
// | Authors: Allan Hansen <All@nHansen.dk> |
// | Arjan Wekking <a.wekking@synantics.nl> |
// | Martin Langhoff <martin@catalyst.net.nz> |
// +----------------------------------------------------------------------+
//

/**
* Base32 encode a binary string
*
* @param $inString Binary string to base32 encode
*
* @return $outString Base32 encoded $inString
*
* @access private
*
*/

function base32_encode($inString) {
$outString = "";
$compBits = "";
$BASE32_TABLE = array('00000' => 0x61, '00001' => 0x62, '00010' => 0x63, '00011' => 0x64, '00100' => 0x65, '00101' => 0x66, '00110' => 0x67, '00111' => 0x68, '01000' => 0x69, '01001' => 0x6a, '01010' => 0x6b, '01011' => 0x6c, '01100' => 0x6d, '01101' => 0x6e, '01110' => 0x6f, '01111' => 0x70, '10000' => 0x71, '10001' => 0x72, '10010' => 0x73, '10011' => 0x74, '10100' => 0x75, '10101' => 0x76, '10110' => 0x77, '10111' => 0x78, '11000' => 0x79, '11001' => 0x7a, '11010' => 0x32, '11011' => 0x33, '11100' => 0x34, '11101' => 0x35, '11110' => 0x36, '11111' => 0x37);

/* Turn the compressed string into a string that represents the bits as 0 and 1. */
for ($i = 0; $i < strlen($inString); $i++) {
$compBits .= str_pad(decbin(ord(substr($inString,$i,1))), 8, '0', STR_PAD_LEFT);
}

/* Pad the value with enough 0's to make it a multiple of 5 */
if((strlen($compBits) % 5) != 0) {
$compBits = str_pad($compBits, strlen($compBits)+(5-(strlen($compBits)%5)), '0', STR_PAD_RIGHT);
}

/* Create an array by chunking it every 5 chars */
$fiveBitsArray = str_split($compBits, 5);

/* Look-up each chunk and add it to $outstring */
foreach($fiveBitsArray as $fiveBitsString) {
$outString .= chr($BASE32_TABLE[$fiveBitsString]);
}

return $outString;
}

function base32_decode($inString) {
/* declaration */
$inputCheck = null;
$deCompBits = null;

$BASE32_TABLE = array(0x61 => '00000', 0x62 => '00001', 0x63 => '00010', 0x64 => '00011', 0x65 => '00100', 0x66 => '00101', 0x67 => '00110', 0x68 => '00111', 0x69 => '01000', 0x6a => '01001', 0x6b => '01010', 0x6c => '01011', 0x6d => '01100', 0x6e => '01101', 0x6f => '01110', 0x70 => '01111', 0x71 => '10000', 0x72 => '10001', 0x73 => '10010', 0x74 => '10011', 0x75 => '10100', 0x76 => '10101', 0x77 => '10110', 0x78 => '10111', 0x79 => '11000', 0x7a => '11001', 0x32 => '11010', 0x33 => '11011', 0x34 => '11100', 0x35 => '11101', 0x36 => '11110', 0x37 => '11111');

/* Step 1 */
$inputCheck = strlen($inString) % 8;
if(($inputCheck == 1)||($inputCheck == 3)||($inputCheck == 6)) {
trigger_error('input to Base32Decode was a bad mod length: '.$inputCheck);
return false;
}

/* $deCompBits is a string that represents the bits as 0 and 1.*/
for ($i = 0; $i < strlen($inString); $i++) {
$inChar = ord(substr($inString,$i,1));
if(isset($BASE32_TABLE[$inChar])) {
$deCompBits .= $BASE32_TABLE[$inChar];
} else {
trigger_error('input to Base32Decode had a bad character: '.$inChar);
return false;
}
}

/* Step 5 */
$padding = strlen($deCompBits) % 8;
$paddingContent = substr($deCompBits, (strlen($deCompBits) - $padding));
if(substr_count($paddingContent, '1')>0) {
trigger_error('found non-zero padding in Base32Decode');
return false;
}

/* Break the decompressed string into octets for returning */
$deArr = array();
for($i = 0; $i < (int)(strlen($deCompBits) / 8); $i++) {
$deArr[$i] = chr(bindec(substr($deCompBits, $i*8, 8)));
}

$outString = join('',$deArr);

return $outString;
}

?>
<?php
// +----------------------------------------------------------------------+
// | Auto-increment ID Obfuscation |
// +----------------------------------------------------------------------+
// | Copyright (c) 2012 MyNikko.com |
// |http://mynikko.blogspot.com/2012/03/auto-increment-id-obfuscation.html|
// +----------------------------------------------------------------------+
function xor_this($string) {
    $key = 'JohnnyBeGood';

    // Make sure we are working on a string.
    $text = sprintf('%s', $string);

    $outText = '';

    // XOR every byte of the text with the key, repeating the key as needed.
    for ($i = 0; $i < strlen($text); $i++) {
        $outText .= $text[$i] ^ $key[$i % strlen($key)];
    }
    return $outText;
}

function id_encode($id) {
return base32_encode(xor_this(base_convert(((float) $id) , 10 , 32 )));
}

function id_decode($fakeid) {
return (float) base_convert(xor_this(base32_decode($fakeid)), 32, 10);
}

// Round-trip a few sample IDs: original → encoded → decoded.
foreach (array("1", "2", "3", "100", "1000", "10000", "1000000", "94256245") as $test) {
    printf("%s → %s → %s <br />", $test, id_encode($test), id_decode(id_encode($test)) );
}
?>
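
For readers on a recent PHP version, here is a compact, self-contained sketch of the same pipeline (base 32 number, then xor, then Base32). It is only an illustration under some assumptions: PHP 7 or later on a 64-bit build, and the names B32_ALPHABET, b32_encode, b32_decode, xor_with_key, id_to_code and code_to_id are all made up here. Unlike the library above, the decoder does not check the padding bits.

```php
<?php
// Same alphabet as the lookup table in the library above: a-z, then 2-7.
const B32_ALPHABET = 'abcdefghijklmnopqrstuvwxyz234567';

// Base32-encode a binary string, without '=' padding.
function b32_encode($bytes) {
    $bits = '';
    for ($i = 0; $i < strlen($bytes); $i++) {
        $bits .= str_pad(decbin(ord($bytes[$i])), 8, '0', STR_PAD_LEFT);
    }
    $out = '';
    for ($i = 0; $i < strlen($bits); $i += 5) {
        // The last chunk may be short; pad it on the right with zero bits.
        $chunk = str_pad(substr($bits, $i, 5), 5, '0');
        $out  .= B32_ALPHABET[bindec($chunk)];
    }
    return $out;
}

// Decode a string produced by b32_encode(); returns false on a bad character.
function b32_decode($text) {
    $bits = '';
    for ($i = 0; $i < strlen($text); $i++) {
        $pos = strpos(B32_ALPHABET, $text[$i]);
        if ($pos === false) {
            return false;
        }
        $bits .= str_pad(decbin($pos), 5, '0', STR_PAD_LEFT);
    }
    $out = '';
    // Keep whole bytes only; any leftover bits are the encoder's padding.
    for ($i = 0; $i + 8 <= strlen($bits); $i += 8) {
        $out .= chr(bindec(substr($bits, $i, 8)));
    }
    return $out;
}

// XOR each byte of $text with the key, repeating the key as needed.
function xor_with_key($text, $key) {
    $out = '';
    for ($i = 0; $i < strlen($text); $i++) {
        $out .= $text[$i] ^ $key[$i % strlen($key)];
    }
    return $out;
}

function id_to_code($id, $key) {
    return b32_encode(xor_with_key(base_convert((string) (int) $id, 10, 32), $key));
}

function code_to_id($code, $key) {
    $bytes = b32_decode($code);
    if ($bytes === false) {
        return false;
    }
    return (int) base_convert(xor_with_key($bytes, $key), 32, 10);
}

$key = 'JohnnyBeGood';
foreach ([1, 2, 3, 100, 1000, 10000, 1000000, 94256245] as $id) {
    $code = id_to_code($id, $key);
    printf("%s -> %s -> %s\n", $id, $code, code_to_id($code, $key));
}
```

Passing the key as a parameter instead of hard-coding it keeps the sketch easy to test; in a real application the key would more likely live in configuration.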



This actually answers the question here: http://stackoverflow.com/questions/432291/1-1-mappings-for-id-obfuscation
