Receiving '??' character symbols when loading text files not containing them
I am writing a cross-platform application and I am using Visual Basic 2003 Standard for the Windows XP version of the application. My Windows Mobile version writes information out to text files, and my Windows XP version can load and modify them. The text files do not contain the ?? symbols, but when I load them into my Visual Basic application, each file has those symbols attached to the beginning of the first line I read.
Does anyone know why this is happening? At first I thought my Windows Mobile program was the culprit, but those symbols are not in the text files when I view them (I used 3 different text editors to make sure), so I am assuming it's something I need to do in my Visual Basic code. The text files are saved with Unicode UTF-8 formatting. I think the language I am using can save to ASCII as well, but I would really not like to rewrite a ton of code, because I would need to use a different command to write each line out to the file as ASCII.
If anyone could offer some advice I would appreciate it!
Thanks,
Scionwest
Those look like the markers that identify the file as being in UTF-8 formatting - most editors that do recognize that format will read the characters but not display them.
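To check whether a file really starts with such a marker, one option is to look at its first three bytes. The sketch below assumes the marker is the UTF-8 byte order mark (bytes EF BB BF); HasUtf8Bom is a hypothetical helper name, not something from the original code.

```vbnet
Imports System
Imports System.IO

Module BomCheck
    ' Hypothetical helper: returns True when the file starts with
    ' the UTF-8 byte order mark (bytes EF BB BF).
    Function HasUtf8Bom(ByVal path As String) As Boolean
        Dim fs As FileStream = File.OpenRead(path)
        Try
            Dim head(2) As Byte   ' three bytes: indexes 0 to 2
            Dim count As Integer = fs.Read(head, 0, 3)
            Return count = 3 AndAlso head(0) = &HEF _
                AndAlso head(1) = &HBB AndAlso head(2) = &HBF
        Finally
            fs.Close()
        End Try
    End Function
End Module
```

If this returns True for the files written by the mobile program, the extra characters are coming from the marker rather than from the text itself.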
On the Visual Basic app, how are you reading those files? If you use IO.File.ReadAllText, the encoding should be processed correctly, and you should only get back the actual text.
Almost certainly these are the UTF encoding characters, and if you are writing the text out then you can determine the encoding being used.
WriteAllText Method
http://msdn2.microsoft.com/en-us/library/27t17sxs.aspx
You'll notice the encoding parameter, which you can specify. By default it's UTF-8.
There are a lot of encodings to choose from and ASCII is one of them.
The problem isn't with VB.net writing the text out, it is with reading in UTF-8 text files. My mobile program is written in a different (non-Microsoft) language and writes the text files out as UTF-8, but my VB.net program reads in those characters. I am using the following code to open the file, read in each line, and store the lines in an array.
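As a rough illustration of choosing the encoding on the writing side, here is a minimal sketch that writes UTF-8 without the leading marker. It uses StreamWriter, which is also available in VS2003; WriteUtf8NoBom is a hypothetical helper name, and this only applies when the file is written from .NET code.

```vbnet
Imports System.IO
Imports System.Text

Module Utf8Writer
    ' Hypothetical helper: writes text as UTF-8 without the byte order mark.
    Sub WriteUtf8NoBom(ByVal path As String, ByVal contents As String)
        ' UTF8Encoding(False) tells the encoder not to emit the BOM preamble.
        Dim writer As New StreamWriter(path, False, New UTF8Encoding(False))
        Try
            writer.Write(contents)
        Finally
            writer.Close()
        End Try
    End Sub
End Module
```

Passing Encoding.ASCII instead of New UTF8Encoding(False) would write plain ASCII, at the cost of losing any non-ASCII characters.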
Public Sub LoadFile(ByVal filename As String, ByVal directory As String)
    Dim free As Integer = FreeFile()
    FoundFiles.Clear()
    FileOpen(free, directory & "\" & filename, OpenMode.Input, OpenAccess.Read)
    Do Until EOF(free)
        FoundFiles.Add(LineInput(free))
    Loop
    FileClose(free)
End Sub
I then go through each array entry looking at what it contains and assigning the information to the variables.
LoadFile(room, world)
For Each entry As String In file.FoundFiles
    ''roomname
    If entry.StartsWith("Roomname=") Then
        Information.Name = entry.Substring(9)
    ''room world
    ElseIf entry.StartsWith("World=") Then
        Information.World = entry.Substring(6)
    End If
Next
If anyone could offer any other advice on how I could read in the UTF-8 files without getting those characters I would appreciate it.
VB legacy file read functions do not support Unicode, if I recall correctly. They'll try to read every character as ASCII, hence the weird characters you're seeing. Try this instead:
Public Sub LoadFile(ByVal filename As String, ByVal directory As String)
    FoundFiles.Clear()
    Dim f As New IO.StreamReader(IO.File.OpenRead(directory & "\" & filename))
    Dim s As String = f.ReadLine()
    While Not s Is Nothing
        ' Add the line just read, then advance to the next one.
        FoundFiles.Add(s)
        s = f.ReadLine()
    End While
    f.Close()
End Sub
This should work in VS2003. If you have 2005, check out IO.File.Read* - it's a lot easier to use. (I think one of the methods returns all the lines in the file as a string array, so you can just iterate over that.)
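The StreamReader approach can also be written with the encoding stated explicitly, plus a defensive step that drops a leftover marker from the first line. This is only a sketch under the assumption that the stray characters are a UTF-8 byte order mark (decoded as U+FEFF); ReadUtf8Lines is a hypothetical helper name.

```vbnet
Imports System
Imports System.Collections
Imports System.IO
Imports System.Text

Module Utf8LineReader
    ' Hypothetical helper: reads all lines of a UTF-8 file and removes
    ' a byte order mark from the first line if one survives decoding.
    Function ReadUtf8Lines(ByVal path As String) As ArrayList
        Dim lines As New ArrayList()
        ' True asks StreamReader to detect and skip a BOM at the start of the file.
        Dim reader As New StreamReader(path, Encoding.UTF8, True)
        Try
            Dim line As String = reader.ReadLine()
            Dim first As Boolean = True
            While Not line Is Nothing
                If first Then
                    ' U+FEFF is the decoded BOM; TrimStart removes it if present.
                    line = line.TrimStart(ChrW(&HFEFF))
                    first = False
                End If
                lines.Add(line)
                line = reader.ReadLine()
            End While
        Finally
            reader.Close()
        End Try
        Return lines
    End Function
End Module
```

TrimStart with an explicit character is used here because it compares characters exactly, whereas culture-sensitive string comparisons may treat U+FEFF as ignorable.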
Awesome, thanks for the help. I will give these options a shot.
I have 2005 Express and love the .NET 2.0 features, but I don't like that it runs slower than my 2003, so I don't use it.
Thanks for the help!
Scionwest
Sorry it took so long to try out the options you guys provided me with. I have tried both the IO.StreamReader and the Text.Encoding options, and neither of them would strip out that text. That text travels into any string that I convert it to, and I can't remove it using String.Substring or String.Remove.
Any other suggestions?
Thanks,
Scionwest
EDIT:
I found out how to save the text files as ASCII instead of UTF-8 within the mobile language I am using, and that fixed the problem.
Thanks again for the help!
I fixed the problem by changing my mobile code to write the files out as ASCII. It only took about an hour to do, and it fixed my problem with loading the files in VB.net.
Thanks for the help
Scionwest