[dev] Bug 616: URL-to-link in message.php3

Rich Lafferty rich@horde.org
Sat, 17 Feb 2001 14:16:45 -0500


On Sat, Feb 17, 2001 at 11:43:03AM -0600, Brent J. Nordquist (bjn@horde.org) wrote:
> 
> But I'm still curious about why the regex has a list in the first place.
> Does anyone know why it was done this way, or if there are nonspace
> characters that we should be careful not to pull in?  Is it just so that a
> URL in prose does the expected thing, such as:
> 
> "You might look at http://example.com/foo, but it's not a very good site."

That'd be my guess -- that's the usual gotcha for this sort of
thing. For what it's worth, the standard Perl solution, from Tom
Christiansen, is:

  $urls = '(' . join ('|', qw{
                http
                telnet
                gopher
                file
                wais
                ftp
            } ) 
        . ')';
  
  $ltrs = '\w';
  $gunk = '/#~:.?+=&%@!\-';
  $punc = '.:?\-';
  $any  = "${ltrs}${gunk}${punc}";

  s{
    \b                          # start at word boundary
    (                           # begin $1  {
      $urls     :               # need resource and a colon
      [$any] +?                 # followed by one or more
                                #  of any valid character, but
                                #  be conservative and take only
                                #  what you need to....
    )                           # end   $1  }
    (?=                         # look-ahead non-consumptive assertion
            [$punc]*            # either 0 or more punctuation
            [^$any]             #   followed by a non-url char
        |                       # or else
            $                   #   the end of the string
    )
  }{<A HREF="$1">$1</A>}igox;
 
which I don't have time to convert to PHP (insert grumble about
function-based instead of operator-based RE implementations here), but it
might be useful. (In fact, I don't know whether the PCRE library that PHP
uses *has* lookaheads yet.)
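For anyone who wants to try the pattern before porting it, here is a rough transliteration. It is sketched in Python (3.6 or later, for the f-string) rather than PHP, so treat it as an illustration, not a drop-in for message.php3. The scheme list and character sets are copied from the Perl above. `linkify()` and `NAIVE_RE` are hypothetical names. `NAIVE_RE` is a "scheme plus any non-space run" pattern included only to show the comma problem Brent describes. PHP's `preg_replace()` is PCRE-based, so the pattern text would likely carry over with little change, though that is untested here.

```python
import re

# Transliteration of the Perl pattern above. Scheme list and character
# sets are copied from it; linkify() and NAIVE_RE are hypothetical names.
SCHEMES = "(?:" + "|".join(["http", "telnet", "gopher", "file", "wais", "ftp"]) + ")"
LTRS = r"\w"
GUNK = r"/#~:.?+=&%@!\-"
PUNC = r".:?\-"
ANY = LTRS + GUNK + PUNC

URL_RE = re.compile(
    rf"""
    \b                  # start at a word boundary
    (                   # group 1: the URL itself
      {SCHEMES} :       # a scheme followed by a colon
      [{ANY}]+?         # one or more URL characters, taken lazily
    )
    (?=                 # lookahead: checks without consuming
        [{PUNC}]*       # zero or more trailing punctuation marks
        [^{ANY}]        #   followed by a non-URL character
      |                 # or else
        $               #   the end of the string
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)

# For contrast: "scheme plus any non-space run" swallows trailing commas.
NAIVE_RE = re.compile(r"\b(?:http|ftp):\S+", re.IGNORECASE)


def linkify(text: str) -> str:
    """Wrap each URL-like run in an <A HREF> tag, like the Perl s{}{}g."""
    return URL_RE.sub(r'<A HREF="\1">\1</A>', text)


if __name__ == "__main__":
    s = "You might look at http://example.com/foo, but it's not a very good site."
    print(NAIVE_RE.findall(s))  # ['http://example.com/foo,']
    print(linkify(s))
```

One quirk worth knowing: the `$` branch of the lookahead allows no trailing punctuation. So in a string that ends with "ftp://host/x." and no newline, the period ends up inside the link. Input read a line at a time, newline included, avoids this, because the newline supplies the non-URL character the first branch needs.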

  -Rich

-- 
------------------------------ Rich Lafferty ---------------------------
 Sysadmin/Programmer, Instructional and Information Technology Services
   Concordia University, Montreal, QC                 (514) 848-7625
------------------------- rich@alcor.concordia.ca ----------------------