cookandkaye

scientific and technical website design

CAS number validation with PHP

Chemical names tend to be a mixture of IUPAC and lab-speak, and an empirical formula can represent a dozen very different substances. So, when it is important that you report chemicals accurately – how do you do it? This is a problem we have recently encountered in developing an interactive database for the ‘Control of Substances Hazardous to Health’ (COSHH) for a University department. Here the system must be able to unambiguously associate chemical substances with risk and safety data.

Pretty much the only way to do this is through the CAS registry number, which is issued to all reported chemicals by the American Chemical Society. Users will call up the chemicals themselves by their IUPAC name, by any synonym that is available in the database, or by fragments of these, which makes the system easier and quicker to use, though the option of using the CAS registry number will also be available to them.

The CAS number has the format 1234567-12-1, i.e. a number of two to seven digits – a two digit number – a single digit. The last digit acts as a check for the whole number, to help prevent typos, so this system is quite robust, and allows us to add new chemicals to the existing database without great fear of duplication. Some chemicals have in fact been associated with more than one CAS number, so even this is not infallible.
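As a rough illustration of how the check digit works: the digits in front of it are weighted 1, 2, 3 and so on, counting from right to left, the weighted digits are summed, and the check digit is the remainder after dividing by 10. The helper name cas_check_digit() below is our own invention for this sketch and is not part of any library.

```php
<?php
// Hypothetical helper (an assumption for illustration): computes the check
// digit expected for the leading two parts of a CAS number, e.g. '7732-18'.
function cas_check_digit($body) {
    // Drop the hyphen and reverse, so the rightmost digit comes first.
    $digits = strrev(str_replace('-', '', $body));
    $sum = 0;
    for ($i = 0; $i < strlen($digits); $i++) {
        // The rightmost digit has weight 1, the next weight 2, and so on.
        $sum += (int) $digits[$i] * ($i + 1);
    }
    return $sum % 10;
}

// Water, 7732-18-5: 8*1 + 1*2 + 2*3 + 3*4 + 7*5 + 7*6 = 105, and 105 % 10 = 5.
echo cas_check_digit('7732-18'), "\n"; // prints 5
```

The same arithmetic gives 2 for caffeine (58-08-2) and 0 for formaldehyde (50-00-0).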

All we need now is a check program for the CAS number, to help ensure the user or administrator does not make typos. This we adapted from the code posted by Rich Apodaca on Depth First (in Ruby):


function valid_cas($cas_no) {
    // Pattern check: 2-7 digits, 2 digits, 1 check digit, anchored to the whole string.
    // preg_match() is used because ereg() was removed in PHP 7.
    if (!preg_match('/^[0-9]{2,7}-[0-9]{2}-[0-9]$/D', $cas_no)) {
        return false;
    }
    // Reverse the string and take out the hyphens,
    // so the check digit is now the first character.
    $digits = str_replace('-', '', strrev($cas_no));
    $chk_digit = (int) $digits[0];
    // Perform the check sum: each remaining digit is weighted by its position.
    $checksum = 0;
    for ($i = 1; $i < strlen($digits); $i++) {
        $checksum += (int) $digits[$i] * $i;
    }
    return $chk_digit === $checksum % 10;
}

Calling valid_cas(number-to-test) within a PHP code block will return true or false, depending upon whether or not the CAS number has a valid format and check digit. This provides a valuable check against typos when adding new chemicals to the database, and permits the database to check against duplication.
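One way the duplication check might be wired up is sketched below. This is not the code from our COSHH system: add_chemical() is a hypothetical helper, $db is a plain PHP array keyed by CAS number standing in for the real database table, and the validator is passed in as a callable so the sketch runs on its own.

```php
<?php
// Hypothetical helper (an assumption for illustration): $db is an array keyed
// by CAS number; $is_valid is any callable that returns true or false.
function add_chemical(array &$db, $cas_no, $name, $is_valid) {
    $cas_no = trim($cas_no);      // ignore stray whitespace from a form field
    if (!call_user_func($is_valid, $cas_no)) {
        return 'invalid';         // wrong format or failed check digit
    }
    if (isset($db[$cas_no])) {
        return 'duplicate';       // this CAS number is already recorded
    }
    $db[$cas_no] = $name;
    return 'added';
}
```

With valid_cas() in scope, a call such as add_chemical($db, '7732-18-5', 'water', 'valid_cas') would be expected to add the entry the first time and report a duplicate on a second attempt.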

For more on the CAS registry number – see the Wikipedia entry