StreamReader encoding autodetection and fallback default problems, please help

Hi, I'm having some problems getting StreamReader to behave "correctly" (?)

Take a look:

fileStream = new StreamReader(fileInfo.OpenRead(), Encoding.GetEncoding(1252), true);

As I understand it, this checks the first bytes of the file for a UTF-8 byte order mark (BOM), and if there isn't one it falls back to the default encoding, 1252?

But no matter which file I open, I always get fileStream.CurrentEncoding = 1252. I tried many different files and checked that EF BB BF is in place, but CurrentEncoding never returns UTF-8.
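One likely cause, based on the documentation for StreamReader.CurrentEncoding: its value can change after the first call to a Read method, because encoding autodetection is not done until then. Checked straight after the constructor, it therefore still reports the fallback (1252 here). The C# sketch below illustrates this; `CurrentEncodingDemo` and `Probe` are hypothetical names, and it assumes .NET Core 3.0 or later, where code page 1252 is only available after registering `CodePagesEncodingProvider`.

```csharp
using System.IO;
using System.Text;

static class CurrentEncodingDemo
{
    // Hypothetical helper: reports CurrentEncoding.CodePage right after construction
    // and again after the first read, using the constructor from the question.
    public static (int Before, int After) Probe(string path)
    {
        // Needed on .NET Core 3.0+/.NET 5+, where code page 1252 is not built in;
        // drop this line on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        using (var reader = new StreamReader(File.OpenRead(path),
                                             Encoding.GetEncoding(1252), // fallback when no BOM
                                             true))                      // detect BOM
        {
            int before = reader.CurrentEncoding.CodePage; // nothing read yet: still the fallback
            reader.ReadLine();                            // first read runs BOM detection
            int after = reader.CurrentEncoding.CodePage;  // 65001 if the file began with EF BB BF
            return (before, after);
        }
    }
}
```

For a file that begins with EF BB BF, `Probe` should return 1252 for `Before` and 65001 (UTF-8) for `After`; for a file without a BOM both stay 1252.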

Thanks for helping !

By zubziro at 2007-12-23
# 1
You want to use the StreamReader(String, Boolean) constructor that tells it to autodetect the BOM. Here's an example:

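' Requires Imports System.IO at the top of the form's code file
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    ' Encoding.UTF8 writes the EF BB BF byte order mark at the start of the file
    Using sw As New StreamWriter("c:\temp\utf8.txt", False, System.Text.Encoding.UTF8)
        sw.WriteLine("Hello world")
    End Using
End Sub

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    ' True = detect the encoding from the BOM (UTF-8 is assumed when there is none)
    Using sr As New StreamReader("c:\temp\utf8.txt", True)
        ' Read first: CurrentEncoding reflects the detected encoding only after a read
        Dim line As String = sr.ReadLine()
        Debug.Print("Read '{0}' in encoding {1}", line, sr.CurrentEncoding.EncodingName)
    End Using
End Sub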

nobugz at 2007-8-30
# 2

Thank you very much for your answer.

You are correct, I'm trying to use BOM autodetection. The problem is that the system needs to process two types of text files: Windows-1252 encoded (no BOM) and UTF-8. I need to read a file, get some information from it, and write it to a new file, which must use the same encoding.

When I use StreamReader(String, true), I always get UTF-8, because a Windows-1252 file has no BOM and StreamReader then defaults to UTF-8. So the result is UTF-8 whether the file is UTF-8 or Windows-1252.
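The UTF-8 default is easy to reproduce. In this C# sketch (`Utf8DefaultDemo` and `ReadWithDefaultFallback` are hypothetical names), a file holding the Windows-1252 bytes for "café!" has no BOM, so `StreamReader(String, Boolean)` decodes it as UTF-8: `CurrentEncoding.WebName` reports "utf-8", and the lone 0xE9 byte, which is not valid UTF-8, comes back as the replacement character U+FFFD instead of "é".

```csharp
using System.IO;
using System.Text;

static class Utf8DefaultDemo
{
    // Hypothetical helper: reads a file with StreamReader(String, Boolean), i.e. BOM
    // detection on and UTF-8 as the fallback when no BOM is present.
    public static (string Text, string EncodingName) ReadWithDefaultFallback(string path)
    {
        using (var reader = new StreamReader(path, detectEncodingFromByteOrderMarks: true))
        {
            string text = reader.ReadToEnd();              // read first so detection has run
            return (text, reader.CurrentEncoding.WebName); // "utf-8" when no BOM was found
        }
    }
}
```

Because both file types end up reported as UTF-8, the reader's CurrentEncoding alone cannot tell them apart with this constructor.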

The problem begins when I need to create the new file and store the information I read, because then I don't know for sure which encoding the original file used. When I read the fileStream.CurrentEncoding property and pass it as a parameter to the StreamWriter, the new file always ends up as UTF-8.

Because of that I tried to use:

StreamReader(fileInfo.OpenRead(), Encoding.GetEncoding(1252), true);

As I understand it, an instance created with this constructor should default to cp1252 when no BOM is found, but strangely, now I always get cp1252, no matter whether the file is 1252 or UTF-8!
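Assuming the two input types look as described (UTF-8 files start with EF BB BF, Windows-1252 files have no BOM), the 1252-fallback constructor should be able to tell them apart, provided CurrentEncoding is read only after the first read; the StreamReader.CurrentEncoding documentation notes that autodetection does not happen until the first Read call. A minimal C# sketch of the read, process, write round trip follows. `EncodingPreservingCopy`, `CopyInSameEncoding` and its `transform` parameter are hypothetical, and the `CodePagesEncodingProvider` registration is only needed on .NET Core 3.0+/.NET 5+.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingPreservingCopy
{
    // Hypothetical helper: reads sourcePath, applies transform, and writes the result to
    // targetPath in the encoding the source appeared to use (UTF-8 if it had a BOM,
    // otherwise Windows-1252). Returns the encoding that was used.
    public static Encoding CopyInSameEncoding(string sourcePath, string targetPath,
                                              Func<string, string> transform)
    {
        // .NET Core 3.0+/.NET 5+ only; remove on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string text;
        Encoding detected;
        using (var reader = new StreamReader(sourcePath, Encoding.GetEncoding(1252),
                                             detectEncodingFromByteOrderMarks: true))
        {
            text = reader.ReadToEnd();         // BOM detection happens during this first read
            detected = reader.CurrentEncoding; // only meaningful after reading
        }

        // A UTF-8 encoding with a preamble writes EF BB BF again; 1252 writes no BOM.
        using (var writer = new StreamWriter(targetPath, false, detected))
        {
            writer.Write(transform(text));
        }
        return detected;
    }
}
```

A UTF-8 file saved without a BOM would still be classified as 1252 by this approach; no BOM-based check can avoid that case.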

zubziro at 2007-8-30
# 3
That's a problem. In code page 1252, the BOM bytes represent legitimate characters (EF BB BF, for example, reads as "ï»¿"), so there is no way StreamReader could autodetect the encoding with 100% guaranteed accuracy. If you can live with 99.9% accuracy, you could try to read the BOM yourself by "pre-opening" the file and reading the first few bytes. The BOM encodings are as follows (in hex):

EF-BB-BF: UTF-8
FE-FF: UTF-16, big endian
FF-FE: UTF-16, little endian
00-00-FE-FF: UTF-32, big endian
FF-FE-00-00: UTF-32, little endian
2B-2F-76-xx: UTF-7
and several really obscure ones...
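Reading the BOM yourself, as suggested above, might look like the following C# sketch. `BomSniffer` and `Detect` are hypothetical names; the caller passes the fallback (for example `Encoding.GetEncoding(1252)`, which on .NET Core 3.0+/.NET 5+ requires registering `CodePagesEncodingProvider` first). The UTF-32 patterns are checked before the UTF-16 ones because FF-FE-00-00 also begins with FF-FE, and UTF-7 is left out of this sketch.

```csharp
using System.IO;
using System.Text;

static class BomSniffer
{
    // Hypothetical helper: maps a leading byte order mark to an Encoding,
    // or returns `fallback` when the file starts with no recognised BOM.
    public static Encoding Detect(string path, Encoding fallback)
    {
        byte[] b = new byte[4];
        int n = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until 4 bytes or EOF.
            int read;
            while (n < b.Length && (read = fs.Read(b, n, b.Length - n)) > 0)
                n += read;
        }

        // Longest patterns first: FF-FE-00-00 (UTF-32 LE) also starts with FF-FE (UTF-16 LE).
        if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
            return new UTF32Encoding(bigEndian: true, byteOrderMark: true);
        if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
            return Encoding.UTF32;            // UTF-32 little endian
        if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return Encoding.UTF8;
        if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 big endian
        if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 little endian

        // No BOM found (UTF-7's 2B-2F-76-xx is deliberately not handled here).
        return fallback;
    }
}
```

The returned Encoding can be handed to a StreamWriter so the output file is written in the same encoding. A Windows-1252 file whose first bytes happen to spell a BOM would still be misidentified, which is the residual uncertainty described above.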

nobugz at 2007-8-30
