M. Adam Davis wrote:

> On 5/25/08, Gerhard Fiedler wrote:
>> FWIW, IE decodes the image URLs correctly and displays the images.
>
> FWIW IE does NOT follow the RFC for URL encoding,

Possibly correct. It probably would follow it if it tried the (wrong) URL with the encoded backslashes first, and the (correct) guessed URL if the first one doesn't work. It doesn't do this. (See below for more on this.)

(FWIW, this applies not only to IE, but to all other browsers that display Roman's page correctly. I don't know whether any of the others do it the conform way.)

> and that means that when I make a webpage that has windows users copy
> path information from their file system into a text box on a webpage
> and post that to my website for various reasons then I have to work
> around this BUG that IE continues to use to support web developers
> that don't follow internet standards.

I don't think this is correct. The IE version I have here (IE 6 on WinXP Pro) leaves %5C in the query part of the URL alone, so you can very well send back a properly encoded query string. I just sent a request to /?test%5C, and the request line IE sends is "GET /?test%5C HTTP/1.1". Have you looked at the actual HTTP traffic? How do you send the file system path back? I don't see why this shouldn't work.

(As a side point: The Windows API functions accept the forward slash as well as the backslash. In many cases -- especially when programming multi-platform code in C or C++ :) -- using the forward slash has advantages. So even if IE did send you back a Windows file system path with forward slashes instead of backslashes, you could pass this path to any Windows file system API and it would work just the same.)

> I further can't use the standard javascript encode routine to alter
> the URL - IE just decodes it, so I have to write and debug a new
> routine that enables a custom encoding (which further must fit within
> the correct URL encoding, meaning it's much stricter) and then use
> the reverse on the other end.

I don't fully understand this. Can you provide an example for why you can't use the standard JavaScript routine to encode? (A sketch of what I have in mind is below.)

> What IE has done, essentially, is ELIMINATED the use of the \ for
> anything other than a bad path delimiter - they've dropped a
> character (for which there is a completely valid encoding - %5C)
> from the character set.

Correct as for the use in the path part of the URL (and wrong as for its use in the query part). That's a bit sad, but I'm still not sure that there is real harm done. First, the (encoded) character seems to be transmitted correctly when used in the query part. Second, I don't think there's a need for the character in the path part. Third, I think I've never seen a correct http URL path (path, not query!) that contained a backslash. Have you? Has anyone? (I've asked this before, without response so far, so I guess nobody has ever seen a standard-conform URL path with a backslash in it.)

> How is that in any way acceptable?

Not that nice, but I still question what's the actual harm. I myself (and I agree with Sergio on this, even though he seems to think he disagrees with me :) think that using backslashes (encoded or not) in the URL path is not a good idea (and I think it is not being used by anybody -- see above), so I think it is acceptable. Not perfect, but acceptable.
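Coming back to the query part and the standard JavaScript routine for a moment, here is roughly what I have in mind -- just a sketch, the element and parameter names are made up and I haven't tried this exact snippet:

  // Take the Windows path the user typed into a text box and send it
  // back in the query part of the URL, properly encoded.
  var path  = document.getElementById("pathBox").value;  // e.g. C:\Temp\My File.txt
  var query = "path=" + encodeURIComponent(path);
  // encodeURIComponent("C:\\Temp\\My File.txt") gives
  // "C%3A%5CTemp%5CMy%20File.txt" -- and in the /?test%5C experiment
  // above, IE left the %5C in the query part alone.
  window.location.href = "/submit?" + query;

On the server side you decode the parameter again and you have the original path back, backslashes and all; I don't see anything IE-specific that would be needed here.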
> It also means new developers get to work around this BUG and waste
> time while using a perfectly good character:

It's not "perfectly good". It's a character with special meaning, characterized as "excluded", "disallowed" (and "unwise" :) by RFC2396. Therefore using it may, standard-conformly, be characterized as, at least, "unwise"... :) (which is, for me, something different from "perfectly good"). If you want to use an "excluded" character in the URI, you shouldn't be surprised that the results are, at best, unpredictable.

> http://silverlight.net/forums/t/16247.aspx

I'm not familiar with Silverlight (and don't have any increased desire to become so). But what he did (using a backslash in a URL) is characterized as "unwise" by the relevant RFC, so the fact that there are problems shouldn't surprise. Standard-conform solution: just use forward slashes for directory separators, and things work.

> http://www.netmechanic.com/news/vol4/html_no18.htm

This one describes how some links work in IE and not in Netscape Navigator (seems a bit dated... anyone remember this browser? :) Using browsers to check for formal correctness of the HTML is not a good idea to start with. There is no commonly used browser out there that /only/ accepts fully standard-conform HTML. They all are pretty lenient with HTML errors (and have to be, otherwise they wouldn't really serve as a tool for users to browse the web). Standard-conform solution: don't use browsers to check your sites, use proper validators. Links accepted by the W3C validator do work on all common browsers. (There are many other validators.)

> http://www.cs.tut.fi/~jkorpela/www/revsol.html

No contest to this (even though some details may be outdated -- it's from 2000). One comment: He writes "It is a common error to use "\" instead of "/" in URLs". Being a common error, it may not be so absurd for a browser designer to include it in the (verrry long) list of common errors in HTML that the browser will work around. As I wrote above, it's not IE alone; all commonly used browsers accept a long list of HTML errors, the differences are mainly which errors they accept rather than whether or not they accept errors.

> There are times when one wants to pass a path to the webserver as a
> parameter for a script. For readability '\' is a perfect answer - '/'
> isn't allowed except as a path, and no other character conveys to the
> user what the parameter is about as well.

I rather stay with RFC2396 and think this is "unwise" if you cherish standard-conformity. If you want to use the backslash in a standard-conform URI, you have to use its encoded form, which leaves the readability about on par with the encoded forward slash. Both can be used in encoded form in the parameter part of the URI, even when using IE, and neither can be used in unencoded form.

> Regardless of how the developer wants to use it, it shouldn't be
> artificially restricted because it's what windows systems use as a
> path delimiter -

It's not artificially restricted. Some of what you wrote seems to indicate you're ranting against RFC2396 and its restriction on the use of the unencoded backslash, but that has little to do with IE.

> take your MS specific standards out of my web standards, please.

I really don't understand these personal attacks; you're not the first one with this. What are you thinking when you write this? That I somehow "own" Microsoft standards?

> If the web developer insists on using non standard UNIFORM RESOURCE
> LOCATOR techniques, then they should deal with the conversion on
> their end, not force clients to adopt their personal style.
Right... which also applies to web developers insisting on using unencoded backslashes in URLs :)

Fact is that there is a URL that contains a backslash in the HTML. This is not allowed according to the RFC that deals with URLs. I don't know whether there is a clear rule in any RFC on what to do with such illegal URLs; more specifically, I don't know whether encoding the backslash or transforming it into a forward slash is required or condoned by an RFC. If not, both are equally "guessing" -- just that one happens to have a result that works and the other a result that doesn't. Do you know where it is defined how to handle an image src attribute that contains a backslash in the path?

OTOH, it doesn't help me to think "Roman should fix the image URLs". This doesn't get them fixed, nor does it help me look at the page. How did you look at the page?

> Trying the correct URL first, and only on error attempting another
> hit would only increase the load on servers and make it take longer
> to get the content - slowing down the site.

Huh? Again -- how did you look at Roman's page? (I suppose you did, as this is what my point is about.) A (standard-conform) way to look at it is to first try the image URLs with the encoded backslashes, then (after they come back as 404) edit the image URLs, replace the encoded backslashes with forward slashes, and retrieve the images. Which is exactly what you say shouldn't be done. If this creates additional server load, so be it -- it is caused by an error on the server side. If Roman thinks this additional traffic is too much to handle, all he needs to do is fix the image links.

OTOH, you seem to suggest that not trying the (broken) URL with the encoded backslashes first but rather going directly to the (correct) URL is better, because it avoids the request with the 404 response. Or not?

Gerhard
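P.S. To make the "standard-conform way" described above a bit more concrete, here's a rough sketch of trying the encoded-backslash URL first and falling back to forward slashes on failure -- the image id and path are made up, and I haven't tested this exact code:

  // Try the image URL as written (with encoded backslashes); if it
  // fails to load (e.g. the server answers 404), guess that the
  // backslashes were meant as directory separators and retry once
  // with forward slashes.
  var img = document.getElementById("photo");
  img.onerror = function () {
    img.onerror = null;                       // don't loop if the guess also fails
    img.src = img.src.replace(/%5C/gi, "/");  // %5C -> "/"
  };
  img.src = "images%5C2008%5Cpicture.jpg";

Whether a browser (or a page author) should do this kind of guessing at all is of course exactly what we're discussing.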