How to determine the content encoding?

Dear All:
I am using the WebRequest and WebResponse classes and have run into a problem handling the response data. How can I determine the encoding of the response data before downloading the entire content?

Thanks a lot.

Ming Tsung Lin, Taiwan
Jun 17, 2005

[320 byte] By [MingTsungLin] at [2008-2-13]
# 1

Hi Ming,

If you want to get the encoding before downloading any of the content, you can ask the server to send only the headers associated with the resource by setting the Method property of the HttpWebRequest to WebRequestMethods.Http.Head.

You can then inspect the CharacterSet property of the HttpWebResponse. This should work most of the time. However, HTML pages sometimes declare their encoding in the <head> section of the page, so you may want to inspect that as well (which requires downloading more than just the response headers):
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
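
A minimal sketch of how such a declaration could be picked out of the first bytes of a response body. The class name MetaCharsetSniffer, the method Sniff, and the regular expression are illustrative assumptions, not part of the framework, and the pattern only covers the common <meta ... charset=...> forms:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not a framework class.
static class MetaCharsetSniffer
{
    // Finds charset=... inside a <meta> tag. This covers the http-equiv form
    // shown above and the shorter <meta charset="..."> form.
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-:.]+)",
        RegexOptions.IgnoreCase);

    // 'prefix' is assumed to hold the first bytes of the response body.
    // Returns the declared charset name, or null if no declaration is found.
    public static string Sniff(byte[] prefix)
    {
        // Tag syntax and charset names are plain ASCII, so an ASCII decode is
        // enough to search for them; other bytes simply become '?'.
        string text = Encoding.ASCII.GetString(prefix);
        Match m = MetaCharset.Match(text);
        return m.Success ? m.Groups[1].Value : null;
    }
}

class SniffDemo
{
    static void Main()
    {
        byte[] prefix = Encoding.ASCII.GetBytes(
            "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">");
        Console.WriteLine(MetaCharsetSniffer.Sniff(prefix)); // prints utf-8
    }
}
```

In practice you would read only a small prefix of the response stream, look for the declaration, and then decode the whole body with the encoding you found. This sketch does not handle pages served in UTF-16, where the tag itself is not ASCII-compatible.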

Take a look at the following code snippet and please let us know if you have any further questions:



using System;
using System.IO;
using System.Net;
using System.Text;

class Test
{
    public static void Main(string[] args)
    {
        WebRequest req = WebRequest.Create("PUT ANY HTTP URL HERE");
        // Uncomment the line below to download only the response headers
        // (the body is then empty); otherwise all content is retrieved.
        //req.Method = WebRequestMethods.Http.Head;

        // The using blocks close the reader and the response, even if an exception is thrown.
        using (HttpWebResponse resp = (HttpWebResponse) req.GetResponse())
        {
            Console.WriteLine("Character Set={0}", resp.CharacterSet);

            // CharacterSet can be null or empty when the server declares no charset,
            // and Encoding.GetEncoding would then throw. UTF-8 is an arbitrary fallback here.
            Encoding enc = string.IsNullOrEmpty(resp.CharacterSet)
                ? Encoding.UTF8
                : Encoding.GetEncoding(resp.CharacterSet);

            using (StreamReader reader = new StreamReader(resp.GetResponseStream(), enc))
            {
                Console.WriteLine(reader.ReadToEnd());
            }
        }
        Console.ReadLine();
    }
}


MikeFlasko at 2007-9-9 > top of Msdn Tech,.NET Development,.NET Framework Networking and Communication...
# 2

Hi,

Please see the XML file pasted below:

<?xml version="1.0" encoding="Windows-1252"?>

<root>

<div>Utilisez cet assistant pour comparer trois strat&eacute;gies d’investissement commun&eacute;ment appliqu&eacute;es : 1/ assurance vie simple, 2/ achat d’immobilier coupl&eacute; &agrave; un contrat d’assurance vie et 3/ achat d’immobilier &laquo; optimis&eacute; &raquo; en empruntant un montant suffisant pour annuler l’effet de la fiscalit&eacute; sur les revenus fonciers. L’assistant d&eacute;terminera laquelle de ces trois strat&eacute;gies est la plus avantageuse &agrave; la fin de l’horizon d’investissement retenu.</div>

</root>

and the XML file listed below:

<?xml version="1.0" encoding="utf-8"?>

<root>

<div>Utilisez cet assistant pour comparer trois strat&eacute;gies d’investissement commun&eacute;ment appliqu&eacute;es : 1/ assurance vie simple, 2/ achat d’immobilier coupl&eacute; &agrave; un contrat d’assurance vie et 3/ achat d’immobilier &laquo; optimis&eacute; &raquo; en empruntant un montant suffisant pour annuler l’effet de la fiscalit&eacute; sur les revenus fonciers. L’assistant d&eacute;terminera laquelle de ces trois strat&eacute;gies est la plus avantageuse &agrave; la fin de l’horizon d’investissement retenu.</div>

</root>

Both XML files have the same content; the only difference is the declared encoding. Save each one with an editor and compare the two files, using "windiff" if possible.

Look at the two highlighted words: the same character shows up as a difference between the files.

Can anyone tell me the reason for this?

Swapnil at 2007-9-9
# 3

They are different because they are in different encodings.

The character is RIGHT SINGLE QUOTATION MARK (U+2019), see http://www.fileformat.info/info/unicode/char/2019/index.htm. In UTF-8 it is encoded as the bytes 0xE2 0x80 0x99. In Windows-1252 it is encoded as the single byte 0x92, see http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx. Windiff just treats both files as a sequence of bytes in a single encoding (probably Windows-1252), so it sees them as different.
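
The byte values above can be checked with a short program. This is only a sketch under one assumption: it targets modern .NET (.NET Core / .NET 5 or later), where Windows-1252 must first be made available through CodePagesEncodingProvider. On the .NET Framework that line can be dropped, since code page 1252 is available there by default. The class name QuoteBytes and the helper Hex are illustrative.

```csharp
using System;
using System.Text;

static class QuoteBytes
{
    // Returns the bytes of 'text' in the given encoding as hex, e.g. "E2-80-99".
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        // Assumption: modern .NET, where code page encodings must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);

        string quote = "\u2019"; // RIGHT SINGLE QUOTATION MARK

        Console.WriteLine(Hex(quote, Encoding.UTF8)); // E2-80-99
        Console.WriteLine(Hex(quote, cp1252));        // 92

        // Reading the UTF-8 bytes as if they were Windows-1252 yields three
        // unrelated characters instead of one quotation mark.
        string misread = cp1252.GetString(Encoding.UTF8.GetBytes(quote));
        Console.WriteLine(misread.Length);            // 3
    }
}
```

A byte-level comparison tool therefore reports a difference at every such character, even though both files decode to the same text when each is read in its own declared encoding.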

AlanJ.McFarlane at 2007-9-9