ignoring the url generation bug in imp

Anil Madhavapeddy anil@recoil.org
Sun, 19 Nov 2000 12:12:42 +0000


[Please keep this discussion on the dev@lists.horde.org list]

Quoting Marc Lehmann <pcg@goof.com>:

> I've seen you closing the bug-report, but can't understand why - I've read
> the thread
> (http://lists.horde.org/cgi-bin/ezmlm-cgi?3:mss:597:200011:bmfjldgdbbninlgdgeal)
> and saw that people seem to think of this as a browser bug, which
> it isn't.
> 
> Your reasoning is like this: "The html code is broken so the browser
> shouldn't misinterpret it", which of course makes no sense.
> 

That does indeed make no sense, but I didn't think it was our
reasoning :-)

I fail to see how that HTML code is broken; the SGML spec very
clearly states the conditions for an entity to be an entity,
and you can only leave off the semicolon if a character that cannot
be part of an entity name is encountered (for example, an entity at
the end of a line does not need a terminating semicolon in SGML).
I was under the impression that HTML basically used SGML entities.

Now, if browsers implement this incorrectly, and it's widespread
enough to cause grief, then we certainly need to work around 
this.  However, the only reported instances of this breaking
were for IE4 and NS2.

> If you read the php manual about the url escaping functions you will see
> that this is explicitly mentioned as a very common bug (I'd say it's a
> typical php bug ;), including the exact reference in the html standard,
> which also mentions this bug _explicitly_.
> 

Now we are getting somewhere!  I hadn't seen this particular
reference before.  It seems to be a trivial fix.  Why don't we just
stick an htmlentities() call in the Horde::url() function?  That
should keep everyone happy.
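
As a rough illustration of the idea (written in Python rather than PHP
purely to keep the sketch self-contained; the function name and the
URL below are hypothetical, and this is not the actual Horde::url()
code), escaping the ampersands before the URL is written into an
attribute would look something like this:

```python
from html import escape, unescape


def link_tag(url: str, text: str) -> str:
    """Build an <a> tag, escaping the URL for use inside an HTML attribute.

    Hypothetical helper for illustration only; it is not part of Horde.
    """
    # escape() turns "&" into "&amp;" (and also escapes <, > and quotes),
    # so "&copy=1" can no longer be read as the start of an entity.
    return '<a href="%s">%s</a>' % (escape(url, quote=True), escape(text))


# Hypothetical URL whose second parameter name collides with an entity name.
raw_url = "/imp/mailbox.php?folder=INBOX&copy=1"

tag = link_tag(raw_url, "Inbox")
print(tag)

# A parser that decodes the escaped attribute value gets the original URL back.
escaped_url = escape(raw_url, quote=True)
print(unescape(escaped_url) == raw_url)
```

The point is only that the escaping happens once, at the place where
URLs are turned into HTML, so individual callers do not need to
remember to do it.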

> When in doubt it's always better to follow the standard rather than to
> rely on some company's implementations parsing it correctly (for your
> notion of "correctly", which contradicts the html standard(s)).
> 
> Also (to Anil), if you read the html standard closely then you will see
> that entities are defined quite differently from what you think they
> are (if you can't find it, ask, I'll provide references), although the
> html standard (I always refer to the old 4.01 rather than the current one)
> mentions a lot of compatibility guidelines which basically boil down to:
> "most browsers parse it incorrectly, so better do not use this syntax".
> 

I had a quick look, but couldn't find those specific references; could
you pass them on to us?  Incidentally, why do you refer to the older
HTML 4.01 spec in preference to the new one?

> So, fact is, you generate broken code and the most common browsers happen
> to interpret it correctly most(!) of the time.

Hmm, still not convinced.  We are generating perfectly valid SGML
entity references.  Even if there are compatibility guidelines in the
spec, modern browsers (which presumably don't have this bug) would
work fine.  So I'd say we are generating correct code which older
browsers don't grok correctly.

> 
> However, please don't ignore this bug in your code - there are too many
> broken html pages out there already, don't add to these by ignoring the
> standard and relying on some companies' browsers to fix this for you.
> 

We have no interest in ignoring bugs, but were rather under the 
impression that it was just a case of the occasional broken
browser from past days.  If this isn't the case, then we'd be pleased
to fix this particular issue.

-- 
Anil Madhavapeddy, <anil@recoil.org>