Writing a file with RFile and reading it with TFileText (Unicode)
Fri, 2005-04-29 11:43
Is it possible to write a file in Unicode format?
I have used TFileText and TLex to read a file that was previously created in Unicode format. Now I want to write the file that will be read, but RFile::Write only accepts TDesC8 and TFileText::Read only accepts TDes16. If I have to write it with an 8-bit descriptor, I will not be able to read it back as Unicode.
Forum posts: 723
tOtE
Gabor Torok
Software architect, Agil Eight (http://www.agileight.com/)
Blog: http://mobile-thoughts.blogspot.com/
Forum posts: 1379
However, to be _really_ Unicode, you should write the correct byte order mark (BOM) at the start of the file.
Otherwise, some text editors will assume it's ASCII and you'll end up with
H E L L O
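As a rough illustration of the BOM advice above, here is a portable C++ sketch (not Symbian code). The helper name Utf16LeWithBom is made up for this example. It assumes little-endian UTF-16, whose byte order mark is the two bytes FF FE. On Symbian, the resulting bytes would then be passed to RFile::Write as an 8-bit descriptor.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a Symbian API): serialise UTF-16 code units as
// little-endian bytes, preceded by the UTF-16LE byte order mark FF FE.
std::vector<std::uint8_t> Utf16LeWithBom(const std::u16string& text)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve(2 + text.size() * 2);
    bytes.push_back(0xFF);  // BOM: low byte of U+FEFF
    bytes.push_back(0xFE);  // BOM: high byte of U+FEFF
    for (char16_t unit : text)
    {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF));  // then high byte
    }
    return bytes;
}
```

For "HELLO" this gives 12 bytes: the 2-byte BOM followed by 5 characters of 2 bytes each.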
didster
Forum posts: 21
I have a TDes16 and I write it to a file with RFileWriteStream. Then I can read it OK with TFileText...
But, as you said, I get a strange file, because an ASCII editor displays it oddly...
Forum posts: 1379
That writes a TDesC out in external format.
That is, with codes that say "hey, I'm a stream", the length of the descriptor and all that.
Just use plain old RFile.
didster
Forum posts: 21
Sorry, perhaps I am not understanding the idea correctly.
Forum posts: 1379
RFileWriteStream isn't for writing to a file so that you can read it with another app. It's for writing to a file so that you can read it back as a stream from your Symbian application. It writes special magic codes along with the actual data, which identify the stream type and the data types as you write them.
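To make the difference concrete, here is a portable C++ sketch. It is deliberately NOT Symbian's real stream format. The framing (a single length byte before the text) and the names WriteFramed and WriteRaw are invented purely to show why a framed file looks odd to a plain text editor.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical framing, NOT the actual RFileWriteStream format:
// one length byte, then the payload. Assumes text.size() < 256.
std::vector<std::uint8_t> WriteFramed(const std::string& text)
{
    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(text.size()));  // extra, non-text byte
    out.insert(out.end(), text.begin(), text.end());
    return out;
}

// Raw write: the file holds the text bytes and nothing else.
std::vector<std::uint8_t> WriteRaw(const std::string& text)
{
    return std::vector<std::uint8_t>(text.begin(), text.end());
}
```

A reader that knows the framing can recover the text exactly, but any other program sees an extra byte in front of it. That is the kind of mismatch described above.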
Yes, RFile::Write expects a TDesC8. That's because every file, even a Unicode text file, ultimately boils down to 8-bit data.
You have two choices. Either you convert the Unicode text to ASCII (8-bit) and write that, or you really write the Unicode data to the file. Here is how to do both.
First, the conversion:
TBuf16<20> aMyFileData = ....
TBuf8<20> aMyFileDataAs8Bit;
aMyFileDataAs8Bit.Copy(aMyFileData);
aAlreadyOpenedFile.Write(aMyFileDataAs8Bit);
This copies the 16-bit data into an 8-bit descriptor, then writes the text to the file. The copy cannot preserve characters whose value is 256 or greater; per the SDK documentation, each such character becomes the value 1.
If the 16-bit descriptor contains "HELLO", the contents of the file are:
HELLO.
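The narrowing behaviour can be sketched in portable C++. This is not the Symbian implementation. NarrowCopy is a made-up stand-in that follows the rule from the SDK documentation quoted later in the thread: values below 256 are copied into one byte each, and values of 256 or greater become 1.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a 16-bit to 8-bit narrowing copy.
// One output byte per input element, so the output never has more
// elements than the input.
std::string NarrowCopy(const std::u16string& src)
{
    std::string dst;
    dst.reserve(src.size());
    for (char16_t unit : src)
    {
        // Values that do not fit in one byte are replaced with 1.
        dst.push_back(unit < 256 ? static_cast<char>(unit) : static_cast<char>(1));
    }
    return dst;
}
```

With "HELLO" the result is the five bytes of "HELLO". With a character such as U+092A (a Devanagari letter) the result is a single byte holding 1, so the original text is lost.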
Now, actually writing the Unicode data to the file:
RFile aAlreadyOpenedFile;
TBuf16<20> aMyFileData = ....
aAlreadyOpenedFile.Write(DES_AS_8_BIT(aMyFileData));
What this does is cast (it doesn't convert, it just casts) the 16-bit descriptor to an 8-bit one so that you can pass it to the file API.
Again, let's say the 16-bit descriptor contains "HELLO". The contents of the file are now:
H E L L O
That is, Unicode data.
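The "cast, don't convert" idea can be shown in portable C++ as a plain byte view of the 16-bit storage. RawBytes is a hypothetical helper, not a Symbian call. The byte order of the result is whatever the machine uses, which is one reason a BOM can matter.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: expose the storage of a 16-bit string as bytes
// without changing any value. Twice as many bytes as characters.
std::vector<std::uint8_t> RawBytes(const std::u16string& text)
{
    std::vector<std::uint8_t> bytes(text.size() * sizeof(char16_t));
    if (!bytes.empty())
    {
        std::memcpy(bytes.data(), text.data(), bytes.size());
    }
    return bytes;
}
```

For "HELLO" this yields 10 bytes. On a little-endian machine each letter is followed by a zero byte, which is what a plain ASCII editor shows as "H E L L O".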
Many text editors will read that file as is. Some, however, will require you to also write the Unicode BOM at the start of the file to indicate its format (endianness, encoding, etc.).
If what I have shown you there doesn't work for you, I'll show you how to write the BOM as well.
didster
Forum posts: 723
TBuf16<20> aMyFileData = ....
TBuf8<20> aMyFileDataAs8Bit;
aMyFileDataAs8Bit.Copy(aMyFileData);
aAlreadyOpenedFile.Write(aMyFileDataAs8Bit);
Note that your 8-bit buffer must be twice as big as the Unicode one. You know, 20 Unicode characters take up 40 ASCII character slots. That is, aMyFileDataAs8Bit must be a TBuf8<40>.
tOtE
Forum posts: 1379
Eh? Are you sure?
20 Unicode characters are 20 ASCII characters. The only difference is that the Unicode ones take up twice as much space per character, but that's OK, since each "slot" in a TBuf16 is twice the size of each slot in a TBuf8.
If TBuf16 and TBuf8 were normal C arrays, the only difference would be that sizeof(TBuf16) is twice sizeof(TBuf8).
So, in Windows speak,
WCHAR szUniBuffer[20];
and
CHAR szAsciiBuffer[20];
These can store the same number of characters, but not the same number of bytes, i.e. sizeof(szAsciiBuffer) == 20 and sizeof(szUniBuffer) == 40.
Descriptors hide all that rubbish from you, and the template parameter is the size in characters, i.e. the same in both cases.
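The array comparison can be checked at compile time in portable C++. Here char16_t and char stand in for WCHAR and CHAR, which is an assumption made so that the sketch builds outside Windows. The constant names are made up for the example.

```cpp
#include <cstddef>

char16_t uniBuffer[20];    // stand-in for WCHAR szUniBuffer[20]
char     asciiBuffer[20];  // stand-in for CHAR szAsciiBuffer[20]

// Element counts: what a descriptor's template parameter expresses.
constexpr std::size_t kUniElements   = sizeof(uniBuffer) / sizeof(uniBuffer[0]);
constexpr std::size_t kAsciiElements = sizeof(asciiBuffer) / sizeof(asciiBuffer[0]);

static_assert(kUniElements == kAsciiElements, "both hold the same number of elements");
static_assert(sizeof(uniBuffer) == 2 * sizeof(asciiBuffer), "16-bit buffer uses twice the bytes");
```

Both buffers hold 20 elements. Only the byte size differs, which is why a TBuf8<20> is large enough for an element-by-element copy from a TBuf16<20>.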
didster
Forum posts: 21
However, the example with
aMyFileDataAs8Bit.Copy(aMyFileData);
does not work for me. I think that is because Copy() "transforms" the TDes16 into a TDes8 character by character, rather than copying it 8 bits at a time...
Forum posts: 1379
If everything is representable in ASCII, it should work fine.
If you have characters in there that don't exist in the ASCII code page (i.e. Unicode-only characters), of course it won't work. Not really surprising! That's why someone invented Unicode.
There are posts on here about "better" ways to convert, that is, ones that don't just throw away Unicode characters that have no ASCII equivalent. But if your string does contain such characters, the only way to really get what's in the buffer into the file is to write it as Unicode.
didster
Forum posts: 723
You're right if we're talking about characters, not single and double bytes. The original question did not mention characters, only that TFileText can handle 16-bit data as opposed to RFile::Write, which is capable of handling 8-bit data only.
Right.
Sorry, but I have to disagree. Descriptors are often used not for string manipulation but for handling binary data. In the current case, I'm not really sure what we're using the descriptors for.
I've just had a look at Symbian's online help about TBuf16 and it says:
"This is a descriptor class which provides a buffer of fixed length for containing, accessing and manipulating TUint16 data."
Not characters, sorry. If those bytes read by TFileText are binary data, then TDes8::Copy will surely fail, because
"Each double-byte value can only be copied into the corresponding single byte when the double-byte value is less than decimal 256. A double-byte value of 256 or greater cannot be copied and the corresponding single byte is set to a value of decimal 1."
tOtE
Forum posts: 1379
Sure, descriptors (8-bit) are also used to work with raw binary data - one of the good (some people say) things about them.
Characters may be the wrong word - elements is better maybe.
Anyway, I never said .Copy would accurately copy the string, only that it would never overflow the destination descriptor, i.e. this statement is not true:
"Note that your 8-bit buffer must be twice as big as the Unicode one. You know, 20 Unicode characters take up 40 ASCII character slots. That is, aMyFileDataAs8Bit must be a TBuf8<40>."
I can see what you're saying (I think). I think you're saying that .Copy actually does a memory copy of one descriptor to the other. It doesn't. If it did, then yes, that statement would be correct, i.e. if you did:
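TBuf8<XXX> b;
memcpy(const_cast<TUint8*>(b.Ptr()), a.Ptr(), a.Size()); // raw copy of all of a's bytes (a is the TBuf16<20>)
b.SetLength(a.Size());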
XXX would need to be 40.
Copy actually works by taking a TUint8 pointer to the Unicode string. One byte is copied into the destination descriptor, and the next is then skipped. Since every other byte is skipped, the destination buffer clearly does not need to be twice the size of the source.
As I said in my last post, and as you said, .Copy will only accurately copy the string (or whatever) if every "character" in the string is less than decimal 256. Otherwise, as you say, it will just write a 1.
didster
Forum posts: 11
I've made a Unicode text file containing Hindi text in a Unicode font, प्रियंका (the text file was saved with ENCODING = UNICODE).
I am reading that file with this code:
RFs fs;
User::LeaveIfError(fs.Connect());
_LIT(KStreamStoreName, "C:\\Unicode1.txt");
CleanupClosePushL(fs);
RFile file;
User::LeaveIfError(file.Open(fs, KStreamStoreName, EFileRead | EFileStreamText));
CleanupClosePushL(file);
TBuf<64> buf16;
TFileText aTxtFile;
aTxtFile.Set(file);
// Read the first line of the file as 16-bit text
if (aTxtFile.Read(buf16) != KErrEof)
    {
    const TDesC& aText16 = buf16;
    iAppContainer->SetTextL(aText16);
    }
CleanupStack::PopAndDestroy(2); // file, fs
I have to read that text and operate on it character by character, but it shows up as square boxes on the S60 2nd Edition FP2 emulator.
I know Unicode is shown as square boxes when it is viewed in a non-Unicode environment.
Maybe the emulator doesn't support Unicode. I have also read the same text file containing Unicode characters on a Nokia 6600, and the same square boxes appear there too.
Please let me know in detail what I should do and how.
It's very urgent.
Forum posts: 3
Hi all,
I had similar problems with German letters on Symbian 9.1.
I solved the problem by using the class CnvUtfConverter.
Defined in CnvUtfConverter:
ConvertFromUnicodeToUtf7(), ConvertFromUnicodeToUtf7L(), ConvertFromUnicodeToUtf8(), ConvertFromUnicodeToUtf8L(), ConvertToUnicodeFromUtf7(), ConvertToUnicodeFromUtf7L(), ConvertToUnicodeFromUtf8(), ConvertToUnicodeFromUtf8L()
Location: utf.h (#include <utf.h>)
Link against: charconv.lib (edit the libraries in the .mmp file)
For additional information, please refer to the SDK help: » Symbian OS v9.1 » Symbian OS reference » C++ component reference » Syslibs CHARCONV_ONGOING » CnvUtfConverter
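For readers without the SDK to hand, here is a portable C++ sketch of what a UTF-16 to UTF-8 conversion does in general. It is not CnvUtfConverter and makes no claim about how that class behaves. Utf16ToUtf8 is a made-up helper, and replacing unpaired surrogates with U+FFFD is a choice made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical portable helper (not the Symbian API): encode UTF-16 as UTF-8.
std::string Utf16ToUtf8(const std::u16string& src)
{
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i)
    {
        std::uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src.size()
            && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF)
        {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD;  // unpaired surrogate: use the replacement character
        }

        if (cp < 0x80)
        {
            out.push_back(static_cast<char>(cp));                      // 1 byte (ASCII)
        }
        else if (cp < 0x800)
        {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));        // 2 bytes
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));       // 3 bytes
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else
        {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));       // 4 bytes
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

ASCII text passes through unchanged. A German "ä" (U+00E4) becomes the two bytes C3 A4, and a Devanagari letter such as U+092A becomes three bytes. Because the output is 8-bit data, it can be written with RFile::Write without losing any characters.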
Cheers,
Broqua