M. Adam Davis wrote:

> On 5/25/08, Gerhard Fiedler wrote:
>> FWIW, IE decodes the image URLs correctly and displays the images.
>
> FWIW IE does NOT follow the RFC for URL encoding,

Possibly correct. It probably would follow it if it tried the (wrong) URL with the encoded backslashes first, and the (correct) guessed URL if the first one doesn't work. It doesn't do this. (See below for more on this.)

(FWIW, this applies not only to IE, but to all other browsers that display Roman's page correctly. I don't know whether any of the others do it the conform way.)

> and that means that when I make a webpage that has windows users copy
> path information from their file system into a text box on a webpage
> and post that to my website for various reasons then I have to work
> around this BUG that IE continues to use to support web developers
> that don't follow internet standards.

I don't think this is correct. The IE version I have here (IE 6 on WinXP Pro) leaves %5C in the query part of the URL alone, so you can very well send back a properly encoded query string. I just sent a request to /?test%5C, and the request line IE sends is "GET /?test%5C HTTP/1.1". Have you looked at the actual HTTP traffic? How do you send the file system path back? I don't see why this shouldn't work.

(As a side point: The Windows API functions accept the forward slash as well as the backslash. In many cases -- especially when programming multi-platform code in C or C++ :) -- using the forward slash has advantages. So even if IE did send you back a Windows file system path with forward slashes instead of backslashes, you could pass this path to any Windows file system API and it would work just the same.)

> I further can't use the standard javascript encode routine to alter
> the URL - IE just decodes it, so I have to write and debug a new
> routine that enables a custom encoding (which further must fit within
> the correct URL encoding, meaning it's much stricter) and then use
> the reverse on the other end.

I don't fully understand this. Can you provide an example for why you can't use the standard JavaScript routine to encode? (A sketch of what I have in mind is below.)

> What IE has done, essentially, is ELIMINATED the use of the \ for
> anything other than a bad path delimiter - they've dropped a
> character (for which there is a completely valid encoding - %5C)
> from the character set.

Correct as for the use in the path part of the URL (and wrong as for its use in the query part). That's a bit sad, but I'm still not sure that there is real harm done. First, the (encoded) character seems to be transmitted correctly when used in the query part. Second, I don't think there's a need for the character in the path part. Third, I think I've never seen a correct http URL path (path, not query!) that contained a backslash. Have you? Has anyone? (I've asked this before, without response so far, so I guess nobody has ever seen a standard-conform URL path with a backslash in it.)

> How is that in any way acceptable?

Not that nice, but I still question what's the actual harm. I myself (and I agree with Sergio on this, even though he seems to think he disagrees with me :) think that using backslashes (encoded or not) in the URL path is not a good idea (and I think it is not being used by anybody -- see above), so I think it is acceptable. Not perfect, but acceptable.
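Coming back to the query part and the standard JavaScript routine for a moment, here is roughly what I have in mind -- just a sketch, the element and parameter names are made up and I haven't tried this exact snippet:

  // Take the Windows path the user typed into a text box and send it
  // back in the query part of the URL, properly encoded.
  var path  = document.getElementById("pathBox").value;  // e.g. C:\Temp\My File.txt
  var query = "path=" + encodeURIComponent(path);
  // encodeURIComponent("C:\\Temp\\My File.txt") gives
  // "C%3A%5CTemp%5CMy%20File.txt" -- and in the /?test%5C experiment
  // above, IE left the %5C in the query part alone.
  window.location.href = "/submit?" + query;

On the server side you decode the parameter again and you have the original path back, backslashes and all; I don't see anything IE-specific that would be needed here.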
> It also means new developers get to work around this BUG and waste
> time while using a perfectly good character:

It's not "perfectly good". It's a character with special meaning, characterized as "excluded", "disallowed" (and "unwise" :) by RFC2396. Therefore using it may, standard-conformly, be characterized as, at least, "unwise"... :) (which is, for me, something different from "perfectly good"). If you want to use an "excluded" character in the URI, you shouldn't be surprised that the results are, at best, unpredictable.

> http://silverlight.net/forums/t/16247.aspx

I'm not familiar with Silverlight (and don't have any increased desire to become so). But what he did (using a backslash in a URL) is characterized as "unwise" by the relevant RFC, so the fact that there are problems shouldn't surprise. Standard-conform solution: just use forward slashes for directory separators, and things work.

> http://www.netmechanic.com/news/vol4/html_no18.htm

This one describes how some links work in IE and not in Netscape Navigator (seems a bit dated... anyone remember this browser? :) Using browsers to check for formal correctness of the HTML is not a good idea to start with. There is no commonly used browser out there that /only/ accepts fully standard-conform HTML. They all are pretty lenient with HTML errors (and have to be, otherwise they wouldn't really serve as a tool for users to browse the web). Standard-conform solution: don't use browsers to check your sites, use proper validators. Links accepted by the W3C validator do work on all common browsers. (There are many other validators.)

> http://www.cs.tut.fi/~jkorpela/www/revsol.html

No contest to this (even though some details may be outdated -- it's from 2000). One comment: He writes "It is a common error to use "\" instead of "/" in URLs". Being a common error, it may not be so absurd for a browser designer to include it in the (verrry long) list of common errors in HTML that the browser will work around. As I wrote above, it's not IE alone; all commonly used browsers accept a long list of HTML errors, the differences are mainly which errors they accept rather than whether or not they accept errors.

> There are times when one wants to pass a path to the webserver as a
> parameter for a script. For readability '\' is a perfect answer - '/'
> isn't allowed except as a path, and no other character conveys to the
> user what the parameter is about as well.

I rather stay with RFC2396 and think this is "unwise" if you cherish standard-conformity. If you want to use the backslash in a standard-conform URI, you have to use its encoded form, which leaves the readability about on par with the encoded forward slash. Both can be used in encoded form in the parameter part of the URI, even when using IE, and neither can be used in unencoded form.

> Regardless of how the developer wants to use it, it shouldn't be
> artificially restricted because it's what windows systems use as a
> path delimiter -

It's not artificially restricted. Some of what you wrote seems to indicate you're ranting against RFC2396 and its restriction on the use of the unencoded backslash, but that has little to do with IE.

> take your MS specific standards out of my web standards, please.

I really don't understand these personal attacks; you're not the first one with this. What are you thinking when you write this? That I somehow "own" Microsoft standards?

> If the web developer insists on using non standard UNIFORM RESOURCE
> LOCATOR techniques, then they should deal with the conversion on
> their end, not force clients to adopt their personal style.
Right... which also applies to web developers insisting on using unencoded backslashes in URLs :)

Fact is that there is a URL that contains a backslash in the HTML. This is not allowed according to the RFC that deals with URLs. I don't know whether there is a clear rule in any RFC on what to do with such illegal URLs; more specifically, I don't know whether encoding the backslash or transforming it into a forward slash is required or condoned by an RFC. If not, both are equally "guessing" -- just that one happens to have a result that works and the other a result that doesn't. Do you know where it is defined how to handle an image src attribute that contains a backslash in the path?

OTOH, it doesn't help me to think "Roman should fix the image URLs". This doesn't get them fixed, nor does it help me look at the page. How did you look at the page?

> Trying the correct URL first, and only on error attempting another
> hit would only increase the load on servers and make it take longer
> to get the content - slowing down the site.

Huh? Again -- how did you look at Roman's page? (I suppose you did, as this is what my point is about.) A (standard-conform) way to look at it is to first try the image URLs with the encoded backslashes, then (after they come back as 404) edit the image URLs, replace the encoded backslashes with forward slashes, and retrieve the images. Which is exactly what you say shouldn't be done. If this creates additional server load, so be it -- it is caused by an error on the server side. If Roman thinks this additional traffic is too much to handle, all he needs to do is fix the image links.

OTOH, you seem to suggest that not trying the (broken) URL with the encoded backslashes first but rather going directly to the (correct) URL is better, because it avoids the request with the 404 response. Or not?

Gerhard
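P.S. To make the "standard-conform way" described above a bit more concrete, here's a rough sketch of trying the encoded-backslash URL first and falling back to forward slashes on failure -- the image id and path are made up, and I haven't tested this exact code:

  // Try the image URL as written (with encoded backslashes); if it
  // fails to load (e.g. the server answers 404), guess that the
  // backslashes were meant as directory separators and retry once
  // with forward slashes.
  var img = document.getElementById("photo");
  img.onerror = function () {
    img.onerror = null;                       // don't loop if the guess also fails
    img.src = img.src.replace(/%5C/gi, "/");  // %5C -> "/"
  };
  img.src = "images%5C2008%5Cpicture.jpg";

Whether a browser (or a page author) should do this kind of guessing at all is of course exactly what we're discussing.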