Simple, right? I needed to extract numbers from strings such as “12 L per 100 k.m.”, “on-sale price: USD$299.99”, “2000 sq. ft.”, etc. PHP’s built-in FILTER_SANITIZE_NUMBER_FLOAT filter, however, was completely useless, since (even with FILTER_FLAG_ALLOW_FRACTION set, so that decimal points are kept) it just blindly strips every character that isn’t a digit, a decimal point, or a sign:
- “12 L per 100 k.m.” returns “12100..”,
- “on-sale price: USD$299.99” returns “-299.99”,
- “2000 sq. ft.” returns “2000..”.
Useless.
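For anyone who wants to reproduce the results above, here is a minimal sketch. It assumes the filter is called through filter_var() with the FILTER_FLAG_ALLOW_FRACTION flag (otherwise the decimal points would be stripped too). The wrapper name sanitizeNumberFloat is made up for this illustration.

```php
<?php
// Hypothetical wrapper around the built-in sanitizing filter.
// FILTER_FLAG_ALLOW_FRACTION keeps "." in the output; digits, "+" and "-"
// are always kept, and every other character is removed.
function sanitizeNumberFloat($input)
{
    return filter_var($input, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION);
}

// The three example strings from the list above.
$samples = array(
    '12 L per 100 k.m.',
    'on-sale price: USD$299.99',
    '2000 sq. ft.',
);

foreach ($samples as $sample) {
    echo $sample, ' => ', sanitizeNumberFloat($sample), "\n";
}
```

The filter never looks at where a character sits in the string, which is why the hyphen in "on-sale" ends up as a minus sign in front of the price.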
The PHP function below extracts numbers from strings far more reliably, with a few caveats:
- Only extracts the first number found in a string.
- Doesn’t support scientific notation (numbers with "e" or "E" in them).
- Doesn’t support French number formatting (thousands/millions/etc. separated by spaces, decimal point represented by a comma).
If anyone feels like remedying these shortcomings (or has any comments whatsoever about this function), please leave a comment.
P.S. Since diving back into coding about 6 weeks ago, I’ve been astonished at how quickly I’ve managed to get up to speed. The incredible smoothness of my learning curve is due ENTIRELY to hundreds of generous developers who’ve posted videos, tutorials and their own code on the net, solely so that others can learn from and use it. As a small contribution back to the community that has helped me so much, I humbly offer the PHP function below, for any and all to use.
<?php
// This function extracts a number from a string.
// It takes a single string as a parameter, and returns either the first number found, as a string that keeps its sign (+/-) if the input has one, or NULL (no number found in string).
//
// Known limitations:
// * Does not support French number formatting (spaces instead of commas, decimal point represented by a comma).
// * Only extracts the first number found in a string.
// * Does not support scientific notation (numbers with "e" or "E" in them).
function extractNumber($string)
{
    // Return NULL if input string is NULL or empty (a plain "!$string" test would also reject the string "0"):
    if ($string === NULL || $string === "") {
        return NULL;
    }
    // Break input string into an array of single characters:
    $chars = str_split($string);
    // Set up some arrays for later use:
    $just_digits = array("0", "1", "2", "3", "4", "5", "6", "7", "8", "9");
    $all_num_chars = array_merge(array("-", "+", "."), $just_digits);
    // Initialize the result and the decimal flag so that they are never read while undefined:
    $number = "";
    $decimal_found = false;
    // Returns true if a char exists at position $i and is a digit (safe at both ends of the string):
    $is_digit = function ($i) use ($chars, $just_digits) {
        return isset($chars[$i]) && in_array($chars[$i], $just_digits, true);
    };
    foreach ($chars as $key => $char) {
        if ($char === ",") {
            // If a comma is found before a number has been encountered in the input string, skip it and iterate to next char.
            if ($number === "") {
                continue;
            }
            // If a comma is found, make sure that it's preceded by a digit and followed by exactly 3 digits (the 4th char after it must not be a digit), and that the comma is not found after a decimal:
            if ($is_digit($key - 1)
                && $is_digit($key + 1)
                && $is_digit($key + 2)
                && $is_digit($key + 3)
                && !$is_digit($key + 4)
                && !$decimal_found) {
                continue; // $char is a "legit comma" and should be skipped, and the main loop should iterate to the next char.
            } else { // $char is a "rogue comma" and the number found up to the rogue comma is returned:
                return $number;
            }
        }
        if (!in_array($char, $all_num_chars, true)) {
            if ($number !== "") { // If a $number has been started and $char is a non-numerical char, return $number:
                return $number;
            }
            continue; // If a $number has not been started and $char is a non-numerical char, iterate to next char.
        }
        if ($is_digit($key)) { // $char is a digit, and should be appended to $number.
            $number .= $char;
            continue;
        }
        if ($char === ".") {
            if ($decimal_found) { // $char is a "rogue decimal" and the number up to the rogue decimal is returned:
                return $number;
            }
            if (!$is_digit($key + 1)) { // The char following the decimal is not a digit.
                if ($number === "") { // No number started yet (e.g. the period in "approx. 20"): skip it.
                    continue;
                }
                return $number;
            }
            // $char is a "legit decimal" and should be appended to $number.
            $number .= $char;
            $decimal_found = true;
            continue;
        }
        // $char is a sign (+ or -):
        if ($number !== "") { // Sign occurs in the middle of a number. Number before the sign is returned.
            return $number;
        }
        $starts_decimal = isset($chars[$key + 1]) && $chars[$key + 1] === "." && $is_digit($key + 2);
        if ($is_digit($key + 1) || $starts_decimal) { // Sign occurs at beginning of number and should be added to $number.
            $number .= $char;
        }
        // Otherwise the sign occurs before the beginning of a number and is ignored.
    }
    return ($number === "") ? NULL : $number;
}
?>
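One of the limitations listed above, French number formatting, could perhaps be worked around by normalizing the input before it is passed to extractNumber(). The sketch below is only an illustration and is not part of the function above. The helper name normalizeFrenchNumbers is made up, and it assumes that the input is valid UTF-8 and is already known to use French formatting (applied to English-style input, it would turn "1,234" into "1.234").

```php
<?php
// Hypothetical helper: rewrites French-style numbers such as "1 234,56"
// into the English style "1234.56". Assumes valid UTF-8 input that is
// known to be French-formatted.
function normalizeFrenchNumbers($string)
{
    // Remove a separator (space, no-break space or narrow no-break space)
    // that sits between a digit and a group of exactly three digits.
    $string = preg_replace('/(?<=\d)[ \x{00A0}\x{202F}](?=\d{3}(?!\d))/u', '', $string);
    // Turn a comma that sits between two digits into a decimal point.
    return preg_replace('/(?<=\d),(?=\d)/', '.', $string);
}

echo normalizeFrenchNumbers('1 234,56 EUR'), "\n";        // 1234.56 EUR
echo normalizeFrenchNumbers('2 000 000 habitants'), "\n"; // 2000000 habitants
echo normalizeFrenchNumbers('12 L per 100 km'), "\n";     // unchanged
```

The approach is ambiguous by nature: two separate numbers written as "12 100" would be merged into "12100", so it is only a starting point.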
Why not use PHP's support for regular expressions?
http://www.php.net/manual/en/function.preg-match-all.php
Posted by: Sasha | September 17, 2010 at 07:48 PM
Sasha!
I don't know a lot about regexes, but as I understand them, it would be difficult or impossible to write one that handles the examples in my post correctly, or cases like these:
"12.1 L per 100 k.m." should return "12.1" (and not "12.1100..").
"14 MP3 players" should return "14" (and not "143").
"On-site employees: 200" should return "200" (and not "-200").
"33,333" should return "33333", but "33,,333" should return "33".
"3,333.3" should return "3333.3", but "3.333,3" should return "3.333".
Is it possible to write a regex string that would handle cases like these?
Posted by: Nick Desbarats | September 18, 2010 at 08:29 AM
Yes, it's possible.
For example, this regex will match any optionally signed floating-point number in the text:
"[-+]?[0-9]*\.?[0-9]+"
A comma in the number (33,333) is a different case, since you want to strip it out rather than just match it.
Posted by: Sasha | September 21, 2010 at 10:24 AM
"[-+]?[0-9,]*\.?[0-9]+"
- works for all your examples.
You can remove ',' from the result as a second step.
Posted by: Sasha | September 21, 2010 at 10:30 AM
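As a rough sketch of the two-step approach described above (match first, then strip the commas), the pattern can be passed to preg_match() once it is wrapped in delimiters. The function names extractWithRegex and extractWithStrictRegex are made up for this illustration. The first uses the pattern from the comment as is. It handles most of the listed cases, but it returns "33333" for "33,,333" rather than "33". The second is one possible stricter variant that only accepts commas in front of groups of exactly three digits. It is an assumption about how the pattern might be tightened, not a tested replacement for the function in the post.

```php
<?php
// Step 1: find the first match. Step 2: remove commas from it.
// Returns NULL when the string contains no number.
function extractWithRegex($string)
{
    if (!preg_match('/[-+]?[0-9,]*\.?[0-9]+/', $string, $m)) {
        return NULL;
    }
    return str_replace(',', '', $m[0]);
}

// Stricter variant (an assumption, see the note above): a comma is only
// accepted when it is followed by exactly three digits, otherwise the
// match ends before the comma.
function extractWithStrictRegex($string)
{
    $pattern = '/[-+]?(?:\d{1,3}(?:,\d{3})+(?!\d)|\d+)(?:\.\d+)?|[-+]?\.\d+/';
    if (!preg_match($pattern, $string, $m)) {
        return NULL;
    }
    return str_replace(',', '', $m[0]);
}

$cases = array(
    '12.1 L per 100 k.m.',
    '14 MP3 players',
    'On-site employees: 200',
    '33,333',
    '33,,333',
    '3,333.3',
    '3.333,3',
);

foreach ($cases as $case) {
    echo $case, ' => ', extractWithRegex($case), ' / ', extractWithStrictRegex($case), "\n";
}
```

Both versions still stop at the first number, and neither handles scientific notation or French formatting.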
Thanks Sasha your regex worked really well :)
Posted by: Vivek | March 07, 2012 at 06:17 AM