PHP preg_match and regex (Regular Expressions)

I hope this will cover most of everyone's needs...



*WARNING!!!* For some reason the escaped values in this article may no longer appear escaped, because the site's PHP script replaces the backslashes. You can find the original article on my archive server at http://rasterized.net:8080/regex.txt

This is a tutorial I decided to write because a lot of tutorials never explain regex syntax clearly or do not cover everything. I hope to make this a complete tutorial, and plan to expand on it as I find out more.

Let me say up front that I am going to explain the syntax for preg_match() in PHP. The preg functions are modelled on Perl's regular expressions, so I don't think there will be many compatibility issues if you use the same patterns in Perl.

Let's start with escape characters. Most of you reading this tutorial already know what they are, but in regex there are a few things to keep in mind. If you are trying to match a string that contains any of the following characters literally, you MUST escape them with a backslash:
* + ? . ( ) [ ] { } \ / | ^ $
(The '/' is on the list because it is the pattern delimiter used throughout this tutorial.)
Examples:
The String: "How much money do you need? ($100, $200)"
Escaped: "How much money do you need\? \(\$100, \$200\)"
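If you would rather not escape by hand, PHP's preg_quote() does it for you. This is a minimal sketch, not part of the original article; the variable names are made up for illustration.

```php
<?php
// The literal text we want to match, special characters included.
$literal = 'How much money do you need? ($100, $200)';

// preg_quote() puts a backslash in front of every regex special character.
// The second argument names the delimiter, so a '/' would be escaped too.
$escaped = preg_quote($literal, '/');
echo $escaped, "\n"; // How much money do you need\? \(\$100, \$200\)

// The escaped text can now be dropped into a pattern safely.
$found = preg_match('/' . $escaped . '/', 'Q: How much money do you need? ($100, $200)');
echo $found, "\n"; // 1
```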

Now you are going to learn the various wildcards (properly called quantifiers). You need to know what '*', '+', and '?' mean. They are really easy to figure out :D .
'*' - match 0 or more of the preceding character
'+' - match 1 or more of the preceding character
'?' - match 0 or 1 of the preceding character
Example:
'ref*' - match 're' followed by 0 or more f's
'sds_d+' - match 'sds_' followed by 1 or more d's
'terr?' - match 'ter' followed by 0 or 1 r's
Note: You are not limited to letters; a quantifier can follow any character you like.
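Here is a small sketch for trying the three examples above. The helper matches_fully() is hypothetical (it is not a PHP built-in); it wraps the pattern in '^' and '$', which are explained further down, so that nothing extra is allowed around the match.

```php
<?php
// Hypothetical helper: does the WHOLE string match the pattern?
function matches_fully(string $pattern, string $subject): bool
{
    // '^' and '$' anchor the pattern to the start and end of the subject.
    return preg_match('/^' . $pattern . '$/', $subject) === 1;
}

var_dump(matches_fully('ref*', 're'));        // true:  zero f's is fine
var_dump(matches_fully('ref*', 'refff'));     // true:  three f's
var_dump(matches_fully('sds_d+', 'sds_'));    // false: needs at least one d
var_dump(matches_fully('sds_d+', 'sds_ddd')); // true
var_dump(matches_fully('terr?', 'ter'));      // true:  the second r is optional
var_dump(matches_fully('terr?', 'terrr'));    // false: one r too many
```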

If you want to specify how many characters to match, you can use the '{' and '}' curly braces.
The basic syntax is: (character){minimum[,maximum]}
Example:
'something_r{2}' - match exactly 2 r's
'ni{3,}' - match 3 or more i's
'blay{5,10}' - match at least 5 y's but not more than 10, as in 5, 6, 7, 8, 9, or 10 y's
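The counts above can be checked with a short sketch like this one. The test strings are made up, and each pattern is anchored with '^' and '$' (explained further down) so that extra characters cannot sneak in.

```php
<?php
// Exactly two r's.
$exactlyTwo = preg_match('/^something_r{2}$/', 'something_rr');

// Three or more i's: two is too few, four is enough.
$tooFew = preg_match('/^ni{3,}$/', 'nii');
$enough = preg_match('/^ni{3,}$/', 'niiii');

// Between 5 and 10 y's: 7 is in range, 11 is not.
$inRange = preg_match('/^blay{5,10}$/', 'bla' . str_repeat('y', 7));
$tooMany = preg_match('/^blay{5,10}$/', 'bla' . str_repeat('y', 11));

var_dump($exactlyTwo, $tooFew, $enough, $inRange, $tooMany);
// int(1) int(0) int(1) int(1) int(0)
```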

Other special characters:
'.' - can be any single character whatsoever (except a newline, by default)
Example: 'The letter of today is .!' - the '.' can equal anything
'/' - Used as the delimiter that marks the beginning and the end of the pattern (will be covered later)
'^' - denotes the beginning of the string, or the beginning of each line when the 'm' modifier is used
'$' - denotes the end of the string, or the end of each line when the 'm' modifier is used
'|' - The OR function: 'cat|dog' matches either 'cat' or 'dog'
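A quick sketch of these special characters in action. The subject strings are invented for illustration, and the 'm' modifier shown here is a standard PCRE modifier that the article itself does not cover.

```php
<?php
// '.' stands for any single character, but not a newline.
$dot        = preg_match('/^The letter of today is .!$/', 'The letter of today is Q!');
$dotNewline = preg_match('/a.b/', "a\nb");

// Without a modifier, '^' only matches at the very start of the subject.
$startOfString = preg_match('/^second/', "first\nsecond");
// With the 'm' modifier, '^' also matches at the start of every line.
$startOfLine   = preg_match('/^second/m', "first\nsecond");

// '|' matches either alternative.
$either = preg_match('/cat|dog/', 'hot dog');

var_dump($dot, $dotNewline, $startOfString, $startOfLine, $either);
// int(1) int(0) int(0) int(1) int(1)
```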

Now let's get to the '[' and ']' brackets.
These brackets are used to match one character out of a set or range of characters. (The dash '-' is used to specify a range; if you want to match a literal '-' then you have to escape it.)
Example: '[a-zA-Z]' - Will match 1 character that is either an upper case letter or a lower case letter.

You can also expand the use of this with the quantifiers ('*', '+', '?', and '{ }').
Example:
'[0-9ab]*' - will match 0 or more characters that are 0 through 9, a, or b.
'[0-9a-f]{32}' - match exactly 32 characters that are 0 through 9, or a through f
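One place the 32-character pattern comes in handy is checking whether a string looks like an MD5 hash. That use case is my assumption, not something the article states; the sketch below relies on md5() returning 32 lowercase hex characters.

```php
<?php
// md5() returns 32 lowercase hexadecimal characters.
$hash = md5('hello');

// Anchored with '^' and '$' so the string must be exactly 32 hex characters.
$looksLikeMd5 = preg_match('/^[0-9a-f]{32}$/', $hash);
$tooShort     = preg_match('/^[0-9a-f]{32}$/', 'abc123');

// A single upper or lower case letter.
$oneLetter = preg_match('/^[a-zA-Z]$/', 'Q');

// An escaped '-' inside the brackets is a literal dash, not a range.
$literalDash = preg_match('/^[0-9\-]+$/', '555-1234');

var_dump($looksLikeMd5, $tooShort, $oneLetter, $literalDash);
// int(1) int(0) int(1) int(1)
```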

You should now understand the basic syntax of regex, so let's get into using it with preg_match()!
int preg_match ( string $pattern, string $subject [, array &$matches ] )

Here is some example HTML, stored in the variable $html:
<html>
<head>
<title>Rasterized</title>
</head>
<body>
<table>
<tr>
<td>test123</td>
<td>test4567</td>
<td>nota789</td>
<td>mehtest</td>
</tr>
</table>
</body>
</html>

OK, let's say you want to match the <td> tags that contain the word test. You would use:
"/<td>.*test.*<\/td>/" (notice I used the '/' slashes at the beginning and end)
In preg_match() that becomes:
preg_match("/<td>.*test.*<\/td>/", $html);
preg_match() stops at the first match, so it returns 1 if the pattern was found and 0 if it was not. To count every match, use preg_match_all() with the same pattern; here it would return 3.

Now let's say you want to match the <td> tags that contain the word test followed by a set of numbers. You would use:
"/<td>test[0-9]+<\/td>/"
In preg_match() that becomes:
preg_match("/<td>test[0-9]+<\/td>/", $html);
preg_match() returns 1 here because the pattern is found; preg_match_all(), which counts every match, would return 2.
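Both patterns above can be run against the sample table to see the difference between preg_match() and preg_match_all(). This is a sketch that uses only the table rows from the example HTML; the variable names are mine.

```php
<?php
// The table rows from the example HTML.
$html = <<<HTML
<table>
<tr>
<td>test123</td>
<td>test4567</td>
<td>nota789</td>
<td>mehtest</td>
</tr>
</table>
HTML;

// preg_match() stops at the first hit: 1 means "found", 0 means "not found".
$found = preg_match('/<td>.*test.*<\/td>/', $html);

// preg_match_all() keeps going and returns the number of matches.
$countAny      = preg_match_all('/<td>.*test.*<\/td>/', $html, $all);
$countNumbered = preg_match_all('/<td>test[0-9]+<\/td>/', $html, $numbered);

var_dump($found, $countAny, $countNumbered); // int(1) int(3) int(2)
print_r($numbered[0]); // the two matched <td> strings: test123 and test4567
```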

I am sure you have the basic idea now, but let's say you want to pull data out of that HTML. Say you want to grab the text between <td> and </td> tags that ends with the 3 numbers 789. You would use:
"/<td>(.*)789<\/td>/"
Now in preg_match():
preg_match("/<td>(.*)789<\/td>/", $html, $matches); (Notice I used '(' and ')'; these tell preg_match() 'store whatever matched in this location in the array')

Now $matches is an array. The first element ($matches[0]) is the whole string that was matched, '<td>nota789</td>', and $matches[1] contains 'nota' because that is the only text between <td> tags that is followed by 789.
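To see what ends up in $matches, here is a minimal sketch using a cut-down version of the sample rows. The variables $whole and $captured are just illustrative names.

```php
<?php
// A cut-down version of the sample rows, one <td> per line.
$html = "<tr>\n<td>test123</td>\n<td>nota789</td>\n</tr>";

$whole = null;
$captured = null;

// The third argument receives the match and the captured groups.
if (preg_match('/<td>(.*)789<\/td>/', $html, $matches) === 1) {
    $whole    = $matches[0]; // everything the pattern matched
    $captured = $matches[1]; // only what the first ( ) group matched
}

var_dump($whole);    // string(16) "<td>nota789</td>"
var_dump($captured); // string(4) "nota"
```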

The parentheses are not limited to (.*); you can also use things like ([0-9]) to capture a single digit, ([a-z]*) to capture 0 or more lowercase letters a-z, and so on...
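You can also use more than one pair of parentheses in the same pattern; each pair gets its own slot in the array. A small sketch, with a made-up pattern that splits one cell into its letters and its digits:

```php
<?php
$cell = '<td>test123</td>';

$letters = null;
$digits = null;

// Group 1 captures the lowercase letters, group 2 captures the digits.
if (preg_match('/<td>([a-z]*)([0-9]+)<\/td>/', $cell, $m) === 1) {
    $letters = $m[1];
    $digits  = $m[2];
}

var_dump($letters); // string(4) "test"
var_dump($digits);  // string(3) "123"
```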

In my experience, preg_match() has a hard time with matches that need to span multiple lines; this is because '.' does not match a newline unless you add the 's' modifier. Good luck, feel free to post your questions, and please point out any mistakes that I might have made.
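One likely cause of the multi-line trouble mentioned above is that '.' does not match a newline by default. Here is a minimal sketch, assuming the 's' pattern modifier (a standard PCRE modifier, not covered in this article) is the fix you want:

```php
<?php
// The <tr> block spans several lines.
$html = "<tr>\n<td>test123</td>\n<td>nota789</td>\n</tr>";

// Without a modifier, '.*' cannot get past the first newline, so nothing matches.
$withoutS = preg_match('/<tr>(.*)<\/tr>/', $html);

// With the 's' modifier, '.' matches newlines as well.
$withS = preg_match('/<tr>(.*)<\/tr>/s', $html, $m);

var_dump($withoutS, $withS); // int(0) int(1)
echo $m[1]; // the two <td> lines, including the newlines around them
```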

Comments

mozzer on October 10 2006 - 15:47:37
Ooh, first post. Nice one Raster
Larika on October 10 2006 - 16:25:00
This may help many people that are starting in php
system_meltdown on October 10 2006 - 18:00:57
Nice work, I hate regex
Raster on October 10 2006 - 20:07:43
I will post a more advanced article of regex including modifiers and such when I get the chance. Hope this helps people out a bit.
Raster on October 11 2006 - 11:48:40
For some reason... the forward slashes and such are not escaped anymore.... I think it might be because of the php script thinking they are being escaped...
Raster on October 11 2006 - 17:09:50
http://rasterized.net:8080/regex.txt <-- here I uploaded it to my server, this has the actual escaped characters and such...
mozzer on October 11 2006 - 18:01:03
Just like to say that the PHP cheat sheet has some useful notes about RegEx syntax and regexp.net is another useful resource
SwiftNomad on October 12 2006 - 00:19:11
good one Raster and thanks again mozzer. You're as resourceful as google. ROFL!
The_Cell on October 12 2006 - 15:43:16
Just for the record; you can't use POSIX-style regexes with preg_match(), only with ereg() or eregi(). Nice article
mido on November 30 2009 - 16:41:08
your server does not work.
mido on November 30 2009 - 16:53:54
awesome article overall, but in the last paragraph, you meant preg_match_all?