How to determine the content encoding?
I am using the WebRequest and WebResponse classes and have run into a problem handling the response data: how can I determine its encoding?
Thanks a lot.
Ming Tsung Lin, Taiwan
Jun 17, 2005
Hi Ming,
If you want to get the encoding before downloading any of the content, you can ask the server to send only the headers for the resource by setting the Method property of the HttpWebRequest to WebRequestMethods.Http.Head.
You can then inspect the CharacterSet property of the HttpWebResponse. This should work most of the time. However, HTML pages sometimes declare their encoding in the <HEAD></HEAD> section of the page, so you may want to inspect that as well (which requires downloading more than just the response headers):
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Take a look at the following code snippet, and please let us know if you have any further questions:
using System;
using System.Net;
using System.IO;
Console.WriteLine(reader.ReadToEnd()); } }
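Since most of the snippet above did not survive, here is a minimal sketch of the approach described in this reply. It is not the original code. The class name EncodingProbe and its three helper methods are hypothetical names made up for illustration; only HttpWebRequest.Method, WebRequestMethods.Http.Head, HttpWebResponse.CharacterSet, Encoding.GetEncoding and Regex are real framework members. The regular expression is a rough heuristic, not an HTML parser.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class EncodingProbe
{
    // Sends a HEAD request so that only the headers are transferred,
    // then returns the charset named in the Content-Type header.
    public static string CharsetFromHeaders(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = WebRequestMethods.Http.Head;
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            return response.CharacterSet;
        }
    }

    // Looks for charset=... inside a <meta> tag of already downloaded HTML.
    // Returns null when no such declaration is found.
    public static string CharsetFromMeta(string html)
    {
        Match m = Regex.Match(html,
            @"<meta[^>]*charset\s*=\s*[""']?([A-Za-z0-9_\-]+)",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Maps a charset label such as "utf-8" to an Encoding. Returns the
    // fallback when the label is empty or unknown to the runtime.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset)) return fallback;
        try
        {
            return Encoding.GetEncoding(charset.Trim());
        }
        catch (ArgumentException)
        {
            return fallback;
        }
    }
}
```

One possible way to combine these: call CharsetFromHeaders first, download the page, prefer the value from CharsetFromMeta when the page declares one, and pass the result through ResolveEncoding before handing it to a StreamReader. Which declaration should win when the two disagree is a judgment call that the thread does not settle.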
Hi,
Please see the XML file pasted below:
<?xml version="1.0" encoding="Windows-1252"?>
<root>
<div>Utilisez cet assistant pour comparer trois stratégies d’investissement communément appliquées : 1/ assurance vie simple, 2/ achat d’immobilier couplé à un contrat d’assurance vie et 3/ achat d’immobilier « optimisé » en empruntant un montant suffisant pour annuler l’effet de la fiscalité sur les revenus fonciers. L’assistant déterminera laquelle de ces trois stratégies est la plus avantageuse à la fin de l’horizon d’investissement retenu.</div>
</root>
and the XML listed below:
<?xml version="1.0" encoding="utf-8"?>
<root>
<div>Utilisez cet assistant pour comparer trois stratégies d’investissement communément appliquées : 1/ assurance vie simple, 2/ achat d’immobilier couplé à un contrat d’assurance vie et 3/ achat d’immobilier « optimisé » en empruntant un montant suffisant pour annuler l’effet de la fiscalité sur les revenus fonciers. L’assistant déterminera laquelle de ces trois stratégies est la plus avantageuse à la fin de l’horizon d’investissement retenu.</div>
</root>
Both XML files have the same content; the only difference is the encoding. Save each one from an editor using its declared encoding and compare the two files, with "windiff" if possible.
Compare the two highlighted words: the character ’ shows up as a difference.
Can anyone tell me the reason for this?
They are different because they are in different encodings.
The character is RIGHT SINGLE QUOTATION MARK (U+2019), see http://www.fileformat.info/info/unicode/char/2019/index.htm. In UTF-8 it is encoded as the three bytes 0xE2 0x80 0x99. In Windows-1252 it is encoded as the single byte 0x92, see http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx. WinDiff treats both files as sequences of bytes in a single encoding (probably Windows-1252), so it sees them as different even though both decode to the same character.
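To see those bytes for yourself, something like the following should do. It is a sketch, not a definitive tool: the class name QuoteBytes and its methods are hypothetical names made up for illustration. The Encoding.RegisterProvider call assumes .NET Core or .NET 5+, where code page 1252 is not available by default. On the .NET Framework that line should be unnecessary and can be dropped.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class QuoteBytes
{
    // Returns the Windows-1252 encoding.
    public static Encoding Windows1252()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page
        // encodings must be registered before first use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Encodes the text and formats the bytes as hex, e.g. "E2-80-99".
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    // Decodes UTF-8 bytes as if they were Windows-1252, which is roughly
    // what a byte-oriented diff tool ends up displaying.
    public static string MisreadUtf8AsWindows1252(string text)
    {
        return Windows1252().GetString(Encoding.UTF8.GetBytes(text));
    }
}
```

With quote = "\u2019", QuoteBytes.Hex(quote, Encoding.UTF8) gives "E2-80-99" and QuoteBytes.Hex(quote, QuoteBytes.Windows1252()) gives "92". MisreadUtf8AsWindows1252(quote) returns three characters instead of one, which is why the UTF-8 file looks different at that position in a byte-oriented comparison. To compare the actual text, decode each file with its own declared encoding first and compare the resulting strings.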