[Lazarus] String vs WideString

Free Pascal - Lazarus mailing list
Hi,

I have an "old" system that was coded in FPC 2.6.5.
Today I had to change something in the code and now I need to update
to FPC 3.0 and Lazarus 1.9.

This system uses a COM object. I made a class to wrap the configuration.

So, all string arguments in this class are WideString-based.
The SetLicense method receives a WideString, but the source is a "string".

Look:

    Lib.SetLicense(
      IniFile.ReadString('TheLib', 'license', '')
    );

As you know, IniFile.ReadString returns a "string" and some internal
conversion is happening and the licence is not valid anymore.

If I put the licence as a string directly, it works:

    Lib.SetLicense(
      'my_licence_here'
    );

How can I change my code to work properly using Ini files, strings and
WideString?

Best regards,
Marcos Douglas
--
_______________________________________________
Lazarus mailing list
[hidden email]
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

On Sat, 12 Aug 2017 16:46:09 -0300
"Marcos Douglas B. Santos via Lazarus" <[hidden email]>
wrote:

>[...]
>     Lib.SetLicense(
>       IniFile.ReadString('TheLib', 'license', '')
>     );

What encoding has the ini file?

Mattias

Re: [Lazarus] String vs WideString

On Sat, Aug 12, 2017 at 5:32 PM, Mattias Gaertner via Lazarus
<[hidden email]> wrote:

> On Sat, 12 Aug 2017 16:46:09 -0300
> "Marcos Douglas B. Santos via Lazarus" <[hidden email]>
> wrote:
>
>>[...]
>>     Lib.SetLicense(
>>       IniFile.ReadString('TheLib', 'license', '')
>>     );
>
> What encoding has the ini file?

ANSI. A simple text file on Windows with only ANSI chars.

But I'm so sorry Mattias, it was my fault.
The program was reading the wrong file version (problem in paths...).

It works now, but I have one question:
What is the right way to write the code so that this warning does not appear?

Warning: Implicit string type conversion from "AnsiString" to "WideString"

Regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

On Sat, 12 Aug 2017 17:43:29 -0300
"Marcos Douglas B. Santos via Lazarus" <[hidden email]>
wrote:

>[...]
> > What encoding has the ini file?  
>
> ANSI. A simple text file on Windows with only ANSI chars.

Which one? Do you mean Windows CP-1252?

 
>[...]
> Warning: Implicit string type conversion from "AnsiString" to "WideString"

Explicit type cast:

Lib.SetLicense(
   WideString(IniFile.ReadString('TheLib', 'license', ''))
);
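
If the cast is needed at many call sites, one option is to write it once
in a small helper. This is only a sketch: SetLicense below is a stand-in
procedure for the COM wrapper method, and ToWide is a hypothetical helper
name. Neither is taken from the real library.

```pascal
program CastOnce;
{$mode objfpc}{$H+}

// Stand-in for the COM wrapper method from this thread (assumption).
procedure SetLicense(const ALicense: WideString);
begin
  WriteLn('UTF-16 code units received: ', Length(ALicense));
end;

// Hypothetical helper: the only place where the cast is written out.
function ToWide(const S: string): WideString;
begin
  Result := WideString(S);
end;

var
  FromIni: string;
begin
  // In the real program this value would come from IniFile.ReadString.
  FromIni := 'my_licence_here';
  SetLicense(ToWide(FromIni));
end.
```

The cast does not change which conversion is performed. It only states
that the conversion is intended, so the compiler has nothing to warn about.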

Mattias

Re: [Lazarus] String vs WideString

On Sat, Aug 12, 2017 at 5:49 PM, Mattias Gaertner via Lazarus
<[hidden email]> wrote:

> On Sat, 12 Aug 2017 17:43:29 -0300
> "Marcos Douglas B. Santos via Lazarus" <[hidden email]>
> wrote:
>
>>[...]
>> > What encoding has the ini file?
>>
>> ANSI. A simple text file on Windows with only ANSI chars.
>
> Which one? Do you mean Windows CP-1252?

Yes...
But would it make any difference?

>>[...]
>> Warning: Implicit string type conversion from "AnsiString" to "WideString"
>
> Explicit type cast:
>
> Lib.SetLicense(
>    WideString(IniFile.ReadString('TheLib', 'license', ''))
> );

Wow... everywhere? :(

Regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

On Sat, 12 Aug 2017 17:56:58 -0300, "Marcos Douglas B. Santos via
Lazarus" <[hidden email]> wrote:

>> Which one? Do you mean Windows CP-1252?
>
>Yes...
>But would it make any difference?

I recently had a problem with an application that was converted from
old string type to AnsiString and seemingly worked in the new Unicode
environment.
However, we received reports that it had failed in some Asian
countries (Korea, China, Thailand) and upon checking it turned out
that the data inside a string used as buffer was changed because of
locale differences....

After switching out the affected variable declarations from AnsiString
to RawByteString the application seemingly started to work again also
on these locations.

So AnsiString is not safe either....

And after this I have spent some time to totally rework the use of
strings as buffers to instead use TBytes. Lots of work but guaranteed
to not sneak in unexpected conversions.


--
Bo Berglund
Developer in Sweden


Re: [Lazarus] String vs WideString

On Sat, Aug 12, 2017 at 7:21 PM, Bo Berglund via Lazarus
<[hidden email]> wrote:

> On Sat, 12 Aug 2017 17:56:58 -0300, "Marcos Douglas B. Santos via
> Lazarus" <[hidden email]> wrote:
>
>>> Which one? Do you mean Windows CP-1252?
>>
>>Yes...
>>But would it make any difference?
>
> I recently had a problem with an application that was converted from
> old string type to AnsiString and seemingly worked in the new Unicode
> environment.
> However, we received reports that it had failed in some Asian
> countries (Korea, China, Thailand) and upon checking it turned out
> that the data inside a string used as buffer was changed because of
> locale differences....
>
> After switching out the affected variable declarations from AnsiString
> to RawByteString the application seemingly started to work again also
> on these locations.
>
> So AnsiString is not safe either....
>
> And after this I have spent some time to totally rework the use of
> strings as buffers to instead use TBytes. Lots of work but guaranteed
> to not sneak in unexpected conversions.

Isn't it simpler to use RawByteString instead of TBytes?

Regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

On Sat, 12 Aug 2017 23:42:43 -0300, "Marcos Douglas B. Santos via
Lazarus" <[hidden email]> wrote:

>> After switching out the affected variable declarations from AnsiString
>> to RawByteString the application seemingly started to work again also
>> on these locations.
>>
>> So AnsiString is not safe either....
>>
>> And after this I have spent some time to totally rework the use of
>> strings as buffers to instead use TBytes. Lots of work but guaranteed
>> to not sneak in unexpected conversions.
>
>Isn't it simpler to use RawByteString instead of TBytes?

Well, initially just changing the declarations would seem to be
simpler. But given how the conversion problem sneaked up behind my
back, I thought it wiser to move all serial comm buffers from various
string types (string->AnsiString->RawByteString) to TBytes since that
is really guaranteed to be "the real thing".

Whenever there is a need for displaying the data or putting them into
a string type variable I have added a few utility functions to do the
conversions using the Move() procedure. Likewise I made a PosBin() for
searching for patterns like Pos() for strings etc.
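
For illustration only, helpers of this kind could look roughly like the
sketch below. The name BytesToRaw and the exact PosBin signature are
assumptions made for the example, not the actual code from the
application described above.

```pascal
program BinHelpers;
{$mode objfpc}{$H+}

uses
  SysUtils;  // TBytes, CompareMem

// Copy the bytes into a RawByteString with Move(). Only memory is
// copied, so no code page conversion can take place here.
function BytesToRaw(const B: TBytes): RawByteString;
begin
  Result := '';
  SetLength(Result, Length(B));
  if Length(B) > 0 then
    Move(B[0], Result[1], Length(B));
end;

// Zero-based index of the first occurrence of Pattern in Data, or -1.
function PosBin(const Pattern, Data: TBytes): Integer;
var
  i: Integer;
begin
  Result := -1;
  if Length(Pattern) = 0 then
    Exit;
  for i := 0 to Length(Data) - Length(Pattern) do
    if CompareMem(@Data[i], @Pattern[0], Length(Pattern)) then
      Exit(i);
end;

var
  Data, Pat: TBytes;
begin
  // Example frame: STX 'A' 'B' ETX
  SetLength(Data, 4);
  Data[0] := $02; Data[1] := $41; Data[2] := $42; Data[3] := $03;
  SetLength(Pat, 2);
  Pat[0] := $41; Pat[1] := $42;

  WriteLn('pattern found at index ', PosBin(Pat, Data));
  WriteLn('bytes copied: ', Length(BytesToRaw(Data)));
end.
```

Unlike Pos(), this PosBin returns a zero-based index, which matches how
dynamic arrays are indexed.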


--
Bo Berglund
Developer in Sweden


Re: [Lazarus] String vs WideString

On Sun, Aug 13, 2017 at 1:21 AM, Bo Berglund via Lazarus
<[hidden email]> wrote:
> So AnsiString is not safe either....

That is a little misleading.
Actually using the Windows system codepage is not safe any more.
The current Unicode system in Lazarus maps AnsiString to use UTF-8.
Text with Windows codepage must be converted explicitly.
This is a breaking change compared to the old Unicode support in
Lazarus 1.4.x + FPC 2.6.x.
The right solution is to use Unicode everywhere. Windows codepages can
be seen as a historical remnant, retained for backwards compatibility.
It is now 2017 and Unicode has been in use for decades. Everybody should
use it by now.

Marcos Douglas, please change the encoding in your text file to UTF-8.
Every decent text editor, including the editor in Lazarus, has a
feature to do it.
Once the data is Unicode, it is all smooth sailing.
Data is converted between UTF-8 and UTF-16 losslessly.
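
A small sketch of such a round trip, assuming the RTL functions
UTF8Decode and UTF8Encode from the System unit. The variable names are
made up for the example.

```pascal
program RoundTrip;
{$mode objfpc}{$H+}

var
  U8, Back: RawByteString;
  U16: UnicodeString;
  Same: Boolean;
begin
  // U+00E9 (e with acute accent) is the two bytes C3 A9 in UTF-8.
  SetLength(U8, 2);
  U8[1] := #$C3;
  U8[2] := #$A9;

  U16 := UTF8Decode(U8);    // UTF-8 bytes -> UTF-16 code units
  Back := UTF8Encode(U16);  // and back again

  // Compare raw bytes so that no string conversion is involved.
  Same := (Length(Back) = Length(U8)) and
          (CompareByte(Back[1], U8[1], Length(U8)) = 0);

  WriteLn('UTF-16 code units: ', Length(U16));
  WriteLn('round trip identical: ', Same);
end.
```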

One more thing:
Data for WideString/UnicodeString parameters in WinAPI functions are
converted automatically. You can ignore the warning or suppress it by
a type cast as Mattias showed.
However for PWideChar parameters you should create an explicit
temporary variable, usually UnicodeString but WideString for OLE.
Assigning to it from your "String" data converts encoding.
Then cast the new variable as the required pointer type.
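
The steps for a PWideChar parameter can be sketched like this.
CountUnits is a hypothetical stand-in for a real API function that
takes a PWideChar. It is not part of WinAPI.

```pascal
program PWideCharParam;
{$mode objfpc}{$H+}

// Hypothetical stand-in for an API taking a PWideChar: it counts the
// UTF-16 code units up to the terminating #0.
function CountUnits(P: PWideChar): Integer;
begin
  Result := 0;
  if P = nil then
    Exit;
  while P[Result] <> #0 do
    Inc(Result);
end;

var
  S: string;
  Tmp: UnicodeString;  // use WideString instead when calling OLE
begin
  S := 'license';
  Tmp := UnicodeString(S);              // the encoding is converted here
  WriteLn(CountUnits(PWideChar(Tmp)));  // the pointer cast converts nothing
end.
```

The temporary variable must stay alive for as long as the callee uses
the pointer, which is why a named variable is used.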

Juha

Re: [Lazarus] String vs WideString

On Sun, Aug 13, 2017 at 1:21 AM, Bo Berglund via Lazarus
<[hidden email]> wrote:
> I recently had a problem with an application that was converted from
> old string type to AnsiString and seemingly worked in the new Unicode
> environment.

What was the old string type?

> However, we received reports that it had failed in some Asian
> countries (Korea, China, Thailand) and upon checking it turned out
> that the data inside a string used as buffer was changed because of
> locale differences....

Unicode was designed to solve exactly the problems caused by locale differences.
Why don't you use it?

> After switching out the affected variable declarations from AnsiString
> to RawByteString the application seemingly started to work again also
> on these locations.
> ...
> And after this I have spent some time to totally rework the use of
> strings as buffers to instead use TBytes. Lots of work but
> guaranteed to not sneak in unexpected conversions.

RawByteString is for text whose encoding is not meant to be converted.
It has its special use cases.
TBytes is usually for binary data.
Did I understand right: you use TBytes to hold strings having Windows
codepage encoding?
That sounds like a very dummy thing to do!
Again: Why not Unicode? Then you could throw away your hacks.

Juha

Re: [Lazarus] String vs WideString

On Sun, 13 Aug 2017 14:18:23 +0300, Juha Manninen via Lazarus
<[hidden email]> wrote:

>On Sun, Aug 13, 2017 at 1:21 AM, Bo Berglund via Lazarus
><[hidden email]> wrote:
>> I recently had a problem with an application that was converted from
>> old string type to AnsiString and seemingly worked in the new Unicode
>> environment.
>
>What was the old string type?

Note: The programs were started back in around 2000 using Delphi 7...

We used "string" as the container for processing serial data to/from
CNC machine tool controllers amongst others. This was triggered really
by the serial components, which mostly transferred char(acters) and
had methods for sending and receiving strings, even though we usually
used char.

>> However, we received reports that it had failed in some Asian
>> countries (Korea, China, Thailand) and upon checking it turned out
>> that the data inside a string used as buffer was changed because of
>> locale differences....
>
>Unicode was designed to solve exactly the problems caused by locale differences.
>Why don't you use it?

Again, these are old existing programs and we are not doing this
anymore for new programs. However, there is still one problem because
there is an interface point to the hardware, in the form of serial
components, which still handle chars...
And chars are nowadays Unicode chars, i.e. not mapping to bytes sent
by RS232...
And our data are NOT text, they are binary streams of bytes.

>> After switching out the affected variable declarations from AnsiString
>> to RawByteString the application seemingly started to work again also
>> on these locations.
>> ...
>> And after this I have spent some time to totally rework the use of
>> strings as buffers to instead use TBytes. Lots of work but
>> guaranteed to not sneak in unexpected conversions.
>
>RawByteString is for text whose encoding is not meant to be converted.
>It has its special use cases.

My first attempt at "fixing" the problem in Asian locales was to use
RawByteString so as to inhibit conversions.
Still with these as comm buffers...
It seemed to work out, but to be safer I have reworked one application
to replace with TBytes everywhere comm data are handled.

>TBytes is usually for binary data.

Exactly, and this is why I made the comment that to be on the safe
side dealing with RS232 the buffers should be TBytes (or some other
similar construct).

>Did I understand right: you use TBytes to hold strings having Windows
>codepage encoding?

No, definitely not. At the time we were not aware of any encoding at
all. To us a string was just a handy container for the serial data
like a dynamic array of byte with some useful functions available for
searching and things like that. I think we were not alone...

>Again: Why not Unicode? Then you could throw away your hacks.

The application itself is Unicode now but we had to run circles around
the RS232 comm part. When converting to Unicode we first set the comm
related strings to be AnsiString...

PS: We never programmed the serial interface directly, we always used
commercial RS232 components and they all dealt with char and string...
DS


--
Bo Berglund
Developer in Sweden


Re: [Lazarus] String vs WideString

On Sun, Aug 13, 2017 at 7:41 PM, Bo Berglund via Lazarus
<[hidden email]> wrote:
> And our data are NOT text, they are binary streams of bytes.

I see. Then TBytes indeed is the best choice.
You have misused "String" or "AnsiString" from the beginning for binary data.
There have always been warnings against it.
The new Lazarus Unicode system did not create the problem but made it
more visible.

Marcos Douglas however has a different problem.
Your recommendation to use RawByteString or TBytes does not apply in
his case and thus was a bit misleading.

Juha

Re: [Lazarus] String vs WideString

On 13.08.2017 22:41, Juha Manninen via Lazarus wrote:
> You have misused "String" or "AnsiString" from the beginning for binary data.
> There have always been warnings against it.
While this might be true, it's decently silly, IMHO.

The name "String" can easily be interpreted as "String of things" and
does not necessarily mean "String of printable stuff".

The management Pascal has always provided for strings (once "Short
String" was no longer the only string type), i.e. operators, built-in
functions, lazy copy and reference counting, is perfectly applicable to
"strings of things" and does not force any known encoding at all.

The drama only was introduced by Embarcadero's abysmal / sloppy
implementation of automatic code conversion for strings.

-Michael

Re: [Lazarus] String vs WideString

On 13/08/17 12:18, Juha Manninen via Lazarus wrote:
> Unicode was designed to solve exactly the problems caused by locale differences.
> Why don't you use it?
I believe you effectively answer your own question in your preceding post:

> Actually using the Windows system codepage is not safe any more.
> The current Unicode system in Lazarus maps AnsiString to use UTF-8.
> Text with Windows codepage must be converted explicitly.
> This is a breaking change compared to the old Unicode support in
> Lazarus 1.4.x + FPC 2.6.x.
If you are processing strings as "text" then you probably do not care
how it is encoded and can live with "breaking changes". However, if, for
some reason you are or need to be aware of how the text is encoded - or
are using string types as a useful container for binary data then, types
that sneak up on you with implicit type conversions or which have
semantics that change between compilers or versions, are just another
source of bugs.

PChar used to be  a safe means to access binary data - but not anymore,
especially if you move between FPC and Delphi. (One of my gripes is that
the FCL still makes too much use of PChar instead of PByte with the
resulting Delphi incompatibility). The "string" type also used to be a
safe container for any sort of binary data, but when its definition can
change between compilers and versions, it is now something to be avoided.

As a general rule, I now always use PByte for any sort of string that is
binary, untyped or encoding to be determined. It works across compilers
(FPC and Delphi) with consistent semantics and is safe for such use.

I also really like AnsiString from FPC 3.0 onwards. By making the
encoding a dynamic attribute of the type, it means that I know what is
in the container and can keep control.

I am sorry, but I would only even consider using Unicodestrings as a
type (or the default string type) when I am just processing text for
which the encoding is a don't care, such as a window caption, or for
intensive text analysis. If I am reading/writing text from a file or
database where the encoding is often implicit and may vary from the
Unicode standard then my preference is for AnsiString. I can then read
the text (e.g. from the file) into a (RawByteString) buffer, set the
encoding and then process it safely while often avoiding the overhead
from any transliteration. PByte comes into its own when the file
contains a mixture of binary data and text.
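
A minimal sketch of that "read raw, then label" approach, assuming the
RTL routines SetCodePage and StringCodePage. The single byte stands in
for data read from a file, and on Unix the sketch assumes the cwstring
unit is available to provide the conversion routines.

```pascal
program TagThenConvert;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}  // conversion support on Unix
  SysUtils;

var
  Buf: RawByteString;
begin
  // Pretend this byte was read from a file known to be CP-1252:
  // $E9 is e with acute accent in that code page.
  SetLength(Buf, 1);
  Buf[1] := #$E9;

  // Step 1: only label the bytes, do not touch them.
  SetCodePage(Buf, 1252, False);
  WriteLn('code page ', StringCodePage(Buf), ', bytes ', Length(Buf));

  // Step 2: ask the RTL to convert the content to UTF-8.
  SetCodePage(Buf, CP_UTF8, True);
  WriteLn('code page ', StringCodePage(Buf), ', bytes ', Length(Buf));
end.
```

With Convert set to False nothing is transliterated, so that cost is
only paid when step 2 is really needed.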

Text files and databases tend to use UTF-8 or are encoded using legacy
Windows Code pages. The Chinese also have GB18030. With a database, the
encoding is usually known and AnsiString is a good way to read/write
data and to convey the encoding, especially as databases usually use a
variable length multi-byte encoding natively and not UTF-16/Unicode.
With files, the text encoding is usually implicit and AnsiString is
ideal for this as it lets you read in the text and then assign the
(implicit) encoding to the string, or ensure the correct encoding when
writing.

And anyway, I do most of my work in Linux, so why would I even want to
bother myself with arrays of widechars when the default environment is UTF8?

We do need some stability and consistency in strings which, as someone
else noted have been confused by Embarcadero. I would like to see that
focused on AnsiString with UnicodeString being only for specialist use
on Windows or when intensive text analysis makes a two byte encoding
more efficient than a variable length multi-byte encoding.

Tony Whyman
MWA


Re: [Lazarus] String vs WideString

On Sun, Aug 13, 2017 at 7:51 AM, Juha Manninen via Lazarus
<[hidden email]> wrote:

> On Sun, Aug 13, 2017 at 1:21 AM, Bo Berglund via Lazarus
> <[hidden email]> wrote:
>> So AnsiString is not safe either....
>
> That is a little misleading.
> Actually using the Windows system codepage is not safe any more.
> The current Unicode system in Lazarus maps AnsiString to use UTF-8.
> Text with Windows codepage must be converted explicitly.
> This is a breaking change compared to the old Unicode support in
> Lazarus 1.4.x + FPC 2.6.x.
> The right solution is to use Unicode everywhere. Windows codepages can
> be seen as a historical remnant, retained for backwards compatibility.
> It is now 2017 and Unicode has been in use for decades. Everybody should
> use it by now.

"The right solution is to use Unicode everywhere."
I agree. But it would be best if the compiler used Unicode everywhere
and we, the developers, used just one type called "string", even if
this breaks old code. Or maybe, instead of "string", new code should
just use UnicodeString...

Well, I know that many people here have already had this "fight" about
Unicode, so let's forget about what the compiler "should" or should
not do.

> Marcos Douglas, please change the encoding in your text file to UTF-8.
> Every decent text editor, including the editor in Lazarus, has a
> feature to do it.
> Once the data is Unicode, it is all smooth sailing.
> Data is converted between UTF-8 and UTF-16 losslessly.

You're right.

> One more thing:
> Data for WideString/UnicodeString parameters in WinAPI functions are
> converted automatically. You can ignore the warning or suppress it by
> a type cast as Mattias showed.
> However for PWideChar parameters you should create an explicit
> temporary variable, usually UnicodeString but WideString for OLE.
> Assigning to it from your "String" data converts encoding.
> Then cast the new variable as the required pointer type.

This is an ugly trick... but I understand what you mean.

Best regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

On Mon, Aug 14, 2017 at 6:53 AM, Tony Whyman via Lazarus
<[hidden email]> wrote:

>
> On 13/08/17 12:18, Juha Manninen via Lazarus wrote:
>>
>> Unicode was designed to solve exactly the problems caused by locale
>> differences.
>> Why don't you use it?
>
> I believe you effectively answer your own question in your preceding post:
>
>> Actually using the Windows system codepage is not safe any more.
>> The current Unicode system in Lazarus maps AnsiString to use UTF-8.
>> Text with Windows codepage must be converted explicitly.
>> This is a breaking change compared to the old Unicode support in
>> Lazarus 1.4.x + FPC 2.6.x.
>
> If you are processing strings as "text" then you probably do not care how it
> is encoded and can live with "breaking changes". However, if, for some
> reason you are or need to be aware of how the text is encoded - or are using
> string types as a useful container for binary data then, types that sneak up
> on you with implicit type conversions or which have semantics that change
> between compilers or versions, are just another source of bugs.
>
> PChar used to be  a safe means to access binary data - but not anymore,
> especially if you move between FPC and Delphi. (One of my gripes is that the
> FCL still makes too much use of PChar instead of PByte with the resulting
> Delphi incompatibility). The "string" type also used to be a safe container
> for any sort of binary data, but when its definition can change between
> compilers and versions, it is now something to be avoided.
>
> As a general rule, I now always use PByte for any sort of string that is
> binary, untyped or encoding to be determined. It works across compilers (FPC
> and Delphi) with consistent semantics and is safe for such use.
>
> I also really like AnsiString from FPC 3.0 onwards. By making the encoding a
> dynamic attribute of the type, it means that I know what is in the container
> and can keep control.
>
> I am sorry, but I would only even consider using Unicodestrings as a type
> (or the default string type) when I am just processing text for which the
> encoding is a don't care, such as a window caption, or for intensive text
> analysis. If I am reading/writing text from a file or database where the
> encoding is often implicit and may vary from the Unicode standard then my
> preference is for AnsiString. I can then read the text (e.g. from the file)
> into a (RawByteString) buffer, set the encoding and then process it safely
> while often avoiding the overhead from any transliteration. PByte comes into
> its own when the file contains a mixture of binary data and text.
>
> Text files and databases tend to use UTF-8 or are encoded using legacy
> Windows Code pages. The Chinese also have GB18030. With a database, the
> encoding is usually known and AnsiString is a good way to read/write data
> and to convey the encoding, especially as databases usually use a variable
> length multi-byte encoding natively and not UTF-16/Unicode. With files, the
> text encoding is usually implicit and AnsiString is ideal for this as it
> lets you read in the text and then assign the (implicit) encoding to the
> string, or ensure the correct encoding when writing.

Unicode everywhere, yet you are using AnsiString for everything...
Now I'm confused.

> And anyway, I do most of my work in Linux, so why would I even want to
> bother myself with arrays of widechars when the default environment is UTF8?

Maybe you do not have problems because you don't use Windows.

> We do need some stability and consistency in strings which, as someone else
> noted have been confused by Embarcadero. I would like to see that focused on
> AnsiString with UnicodeString being only for specialist use on Windows or
> when intensive text analysis makes a two byte encoding more efficient than a
> variable length multi-byte encoding.

FPC and Lazarus claim to be cross-platform (this is a fact), and
because of that, IMHO, both should be used in only one way on every
system, don't you think?

Best regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

On 14.08.2017 14:50, Marcos Douglas B. Santos via Lazarus wrote:
>
> "The right solution is to use Unicode everywhere."
Embarcadero thought that this would not be the "right" solution. Otherwise
they would not have invented the encoding-aware strings.

IMHO that was a good idea. They only completely failed to do a decent
specification and implementation.

-Michael

Re: [Lazarus] String vs WideString

On 14/08/17 14:11, Marcos Douglas B. Santos via Lazarus wrote:
> FPC and Lazarus claim to be cross-platform (this is a fact), and
> because of that, IMHO, both should be used in only one way on every
> system, don't you think?
>
> Best regards,
> Marcos Douglas
Precisely. But why this fixation on UTF-16/Unicode and not UTF8?

Lazarus is already a UTF8 environment.

Much of the LCL assumes UTF8.

UTF8 is arguably a much more efficient way to store and transfer data

UTF-16/Unicode can only store 65,536 characters while the Unicode
standard (that covers UTF8 as well) defines 136,755 characters.

UTF-16/Unicode's main advantage seems to be for rapid indexing of large
strings.

You may need UTF-16/Unicode support for accessing Microsoft APIs but
apart from that, why is it being promoted as the universal standard?

Re: [Lazarus] String vs WideString

On Mon, 14 Aug 2017 14:21:57 +0100
Tony Whyman via Lazarus <[hidden email]> wrote:

>[...]
> Lazarus is already a UTF8 environment.
>
> Much of the LCL assumes UTF8.

True.

 
> UTF8 is arguably a much more efficient way to store and transfer data

It depends.

 
> UTF-16/Unicode can only store 65,536 characters while the Unicode
> standard (that covers UTF8 as well) defines 136,755 characters.

No.
UTF-16 can encode the full 1 million Unicode range. It uses one or
two words per codepoint. UTF-8 uses 1 to 4 bytes.
See here for more details:
https://en.wikipedia.org/wiki/UTF-16

Although you are right, that there are still many applications, that
falsely claim to support UTF-16, but only support the first $D800
codepoints.
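
The "one or two words" rule can be sketched in a few lines. The
procedure name ToSurrogates is made up for this example. It shows how a
codepoint above U+FFFF is split into two 16-bit units, which is the part
those applications leave out.

```pascal
program Surrogates;
{$mode objfpc}{$H+}

uses
  SysUtils;  // IntToHex

// Split a codepoint above U+FFFF into its UTF-16 surrogate pair.
procedure ToSurrogates(CodePoint: Cardinal; out SurHi, SurLo: Word);
var
  V: Cardinal;
begin
  V := CodePoint - $10000;               // 20 significant bits remain
  SurHi := Word($D800 + (V shr 10));     // upper 10 bits
  SurLo := Word($DC00 + (V and $3FF));   // lower 10 bits
end;

var
  SurHi, SurLo: Word;
begin
  ToSurrogates($1F600, SurHi, SurLo);
  // prints: D83D DE00
  WriteLn(IntToHex(LongInt(SurHi), 4), ' ', IntToHex(LongInt(SurLo), 4));
end.
```

Code that treats every 16-bit unit as a complete character would count
this single codepoint as two.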

 
> UTF-16/Unicode's main advantage seems to be for rapid indexing of large
> strings.

That's only true for UCS-2, which is obsolete.

 
> You may need UTF-16/Unicode support for accessing Microsoft APIs but
> apart from that, why is it being promoted as the universal standard?

Who does that?

Mattias

Re: [Lazarus] String vs WideString

On 2017-08-13 11:51, Juha Manninen via Lazarus wrote:
> It is now 2017 and Unicode has been in use for decades. Everybody should
> use it by now.

Indeed, I can't agree more. Plus, I normally use UTF-8 for any text
files I create.

Regards,
   Graeme
