[dev] Bug 616: URL-to-link in message.php3
Rich Lafferty
rich@horde.org
Sat, 17 Feb 2001 14:16:45 -0500
On Sat, Feb 17, 2001 at 11:43:03AM -0600, Brent J. Nordquist (bjn@horde.org) wrote:
>
> But I'm still curious about why the regex has a list in the first place.
> Does anyone know why it was done this way, or if there are nonspace
> characters that we should be careful not to pull in? Is it just so that a
> URL in prose does the expected thing, such as:
>
> "You might look at http://example.com/foo, but it's not a very good site."
That'd be my guess -- that's the usual gotcha for this sort of
thing. For what it's worth, the standard Perl solution, from Tom
Christiansen, is

    $urls = '(' . join ('|', qw{
                http
                telnet
                gopher
                file
                wais
                ftp
            } )
        . ')';

    $ltrs = '\w';
    $gunk = '/#~:.?+=&%@!\-';
    $punc = '.:?\-';
    $any  = "${ltrs}${gunk}${punc}";

    s{
      \b                    # start at word boundary
      (                     # begin $1  {
       $urls     :          # need resource and a colon
       [$any] +?            # followed by one or more
                            #  of any valid character, but
                            #  be conservative and take only
                            #  what you need to....
      )                     # end   $1  }
      (?=                   # look-ahead non-consumptive assertion
       [$punc]*             # either 0 or more punctuation
       [^$any]              # followed by a non-url char
       |                    # or else
       $                    # the end of the string
      )
     }{<A HREF="$1">$1</A>}igox;

which I don't have the time to convert to PHP (insert grumble about
function- instead of operator-based RE implementations here) but might
be useful. (In fact, I don't know if the pcre library that PHP uses
*has* lookaheads yet.)
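
For anyone who wants to try the pattern before porting it, here is a
minimal, self-contained Perl sketch that wraps the substitution above in
a function and runs it on two sample sentences. The function name
linkify and the second sample sentence are assumptions made up for
illustration; neither comes from the Horde code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same building blocks as the pattern quoted above.
my $urls = '(http|telnet|gopher|file|wais|ftp)';
my $ltrs = '\w';
my $gunk = '/#~:.?+=&%@!\-';
my $punc = '.:?\-';
my $any  = "${ltrs}${gunk}${punc}";

# linkify() is a hypothetical helper name used only for this sketch.
sub linkify {
    my ($text) = @_;
    $text =~ s{
        \b                  # start at word boundary
        (                   # begin $1
            $urls :         # scheme and a colon
            [$any]+?        # as few URL characters as possible
        )                   # end $1
        (?=                 # look ahead without consuming
            [$punc]*        # optional trailing punctuation
            [^$any]         # followed by a non-URL character
            |               # or else
            $               # the end of the string
        )
    }{<A HREF="$1">$1</A>}igox;
    return $text;
}

# Brent's example: the comma after the URL is not a URL character.
print linkify("You might look at http://example.com/foo, but it's not a very good site.\n");

# Assumed second example: a sentence-ending period after the URL.
print linkify("The docs are at http://example.com/docs. Read them.\n");
```

In both sample lines the trailing comma or period should end up outside
the anchor. One thing to watch when porting: if a URL with trailing
punctuation sits at the very end of the string with no newline after
it, only the `$` branch of the lookahead can match, so the punctuation
is kept inside the link.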
-Rich
--
------------------------------ Rich Lafferty ---------------------------
Sysadmin/Programmer, Instructional and Information Technology Services
Concordia University, Montreal, QC (514) 848-7625
------------------------- rich@alcor.concordia.ca ----------------------