[Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
Hello

LazUtils has unit Masks with classes TMask and TMaskList.
FPC's packages/fpindexer has unit FPMasks also with classes TMask and TMaskList.
A comment in FPMasks says "Moved here from LCL".
Revision control shows it was added 9 years ago, in 2012. For the last 2.5 years it has supported Unicode by using UTF8String for all strings.

LCL has a related MaskEdit. It was first added in 2002 by Mattias with comment
 "added TMaskEdit from Tony"
I don't know who Tony is.
Unit Masks was first added to LCL in 2007 by tombo with comment
 "LCL: implemented TMask, MatchesMask, added Masks docs"
I don't know who Tombo is either.
In 2011 Masks was moved to LazUtils by Felipe.
Masks and MaskEdit have some identical code. IMO MaskEdit should reuse some code from Masks.
MaskEdit is maintained by Bart but he didn't know details of the history.

Masks in LazUtils has a slow implementation.
I planned to optimize it but now I realize we may have overlapping code.
Q: Are Masks (LazUtils) and FPMasks (fpindexer) compatible?
If they are, we should dump the LazUtils Masks and use code from FPC's libs.

Regards,
Juha


--
_______________________________________________
lazarus mailing list
[hidden email]
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list


On Tue, 23 Feb 2021, Juha Manninen via lazarus wrote:

> Hello
>
> LazUtils has unit Masks with classes TMask and TMaskList.
> FPC's packages/fpindexer has unit FPMasks also with classes TMask
> and TMaskList.
> A comment in FPMasks says "Moved here from LCL".
> Revision control shows it was added 9 years ago, in 2012. For the last 2.5
> years it has supported Unicode by using UTF8String for all strings.
>
> LCL has a related MaskEdit. It was first added in 2002 by Mattias with
> comment
> "added TMaskEdit from Tony"
> I don't know who Tony is.
> Unit Masks was first added to LCL in 2007 by tombo with comment
> "LCL: implemented TMask, MatchesMask, added Masks docs"
> I don't know who Tombo is either.
> In 2011 Masks was moved to LazUtils by Felipe.
> Masks and MaskEdit have some identical code. IMO MaskEdit should reuse
> some code from Masks.
> MaskEdit is maintained by Bart but he didn't know details of the history.
>
> Masks in LazUtils has a slow implementation.
> I planned to optimize it but now I realize we may have overlapping code.
> Q: Are Masks (LazUtils) and FPMasks (fpindexer) compatible?
> If they are, we should dump the LazUtils Masks and use code from FPC's libs.

Since it comes from Lazarus in the first place, they are supposed to be
compatible, yes. If not, then the unit in FPC can be updated to add missing
things.

Michael.

Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
A related thing. I am confused with different mask classes.
The Embarcadero docs for TMask say:
"Note: Do not confuse TMask with the EditMask of a field or masked edit object. While both are used for comparing strings to a symbolic description of valid values, the special mask symbols and matching rules are completely different."

Is the "masked edit object" the same as MaskEdit?
Is the syntax really different?

Juha



Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Tue, Feb 23, 2021 at 12:00 PM Michael Van Canneyt via lazarus <[hidden email]> wrote:
Since it comes from Lazarus in the first place, they are supposed to be
compatible, yes. If not, then the unit in FPC can be updated to add missing
things.

Oops, now I understand that FPMasks does not really support Unicode.
Type UTF8String only converts encoding automatically when assigning values.
There is no code to identify codepoints.
The LazUtils Masks iterates codepoints, although in a very slow way.
Has anybody tested FPMasks with multibyte-codepoints? I guess it may not work correctly.
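
To illustrate the point about codepoints, here is a minimal sketch (not code from FPMasks or LazUtils). It assumes valid UTF-8 input, and the helper names Utf8SeqLen and CountCodepoints are made up for this example. It shows that a string type alone does not tell you where characters start: the byte length and the codepoint count differ as soon as a non-ASCII character appears, so a matcher that treats '?' as "one byte" will be off.

```pascal
program CodepointDemo;
{$mode objfpc}{$H+}

// Number of bytes in the UTF-8 sequence that starts with lead byte B.
// Assumes valid UTF-8; an unexpected lead byte is treated as 1 byte long.
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// Count codepoints by walking the lead bytes once (hypothetical helper).
function CountCodepoints(const S: RawByteString): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := 1;
  while i <= Length(S) do
  begin
    Inc(i, Utf8SeqLen(Ord(S[i])));
    Inc(Result);
  end;
end;

var
  S: RawByteString;
begin
  // 'ä' given as explicit UTF-8 bytes ($C3 $A4), followed by 'b',
  // so the result does not depend on the source file encoding.
  S := #$C3#$A4 + 'b';
  WriteLn('bytes: ', Length(S));
  WriteLn('codepoints: ', CountCodepoints(S));
end.
```

A byte-wise mask such as '??' would consume only the two bytes of 'ä' here, while a user would expect it to cover 'ä' and 'b'.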

BTW, the UTF8String change is not in FPC 3.0.4 which we still must support.
Even if FPC libs get new code that can be used in Lazarus, it takes many years before we can use it due to the slow release cycle.
I hope FPC 3.2.2 comes out soon.


Juha



Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Tue, Feb 23, 2021 at 12:56 PM Juha Manninen <[hidden email]> wrote:
Oops, now I understand that FPMasks does not really support Unicode.
Type UTF8String only converts encoding automatically when assigning values.
There is no code to identify codepoints.
The LazUtils Masks iterates codepoints, although in a very slow way.
Has anybody tested FPMasks with multibyte-codepoints? I guess it may not work correctly.

I am not sure if iterating multibyte codepoints is even necessary. I must create a unit test.
Does anybody have a list of use cases or some example code?
The Embarcadero documentation is very plain.
It is difficult to find comprehensive examples. I must confess I have not used TMask or TMaskList myself.

Juha



Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Tue, Feb 23, 2021 at 10:41 AM Juha Manninen via lazarus
<[hidden email]> wrote:

> LazUtils has unit Masks with classes TMask and TMaskList.
> FPC's packages/fpindexer has unit FPMasks also with classes TMask and TMaskList.

MaskEdit is an LCL control and hence does not belong in FPC.
I have in the past copied bits of TMaskEdit to a related FPC unit
(cannot remember exactly which), so that they work the same (setting
and removing masks IIRC), with the exception that TMaskEdit does this
on UTF-8, while the FPC unit assumes 1-byte ANSI encoding (so it won't
work on UTF-8).
Given the release cycle of fpc, I would strongly advise to have this
code on "our side" (Lazarus), so bugs can be eliminated much faster.

The Masks unit is not related to TMaskEdit.
For the Masks unit: the same UTF-8 problems exist with its FPC
counterpart IIRC (I did not study the code recently).
So, I would like to keep it "here" as well.

--
Bart

Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Tue, Feb 23, 2021 at 2:54 PM Bart via lazarus <[hidden email]> wrote:
MaskEdit is an LCL control and hence does not belong in FPC.

I am not suggesting to move MaskEdit to FPC libs obviously.


I have in the past copied bits of TMaskEdit to a related fpc unit
(cannot remember exactly which), so that they work the same (setting
and removing masks IIRC), with the exception that TMaskEdit does this
on UTF-8, while the fpc unit assumes 1-byte ANSI encoding (so it won't
work on UTF-8).
Given the release cycle of fpc, I would strongly advise to have this
code on "our side" (Lazarus), so bugs can be eliminated much faster.

The Masks unit is not related to TMaskEdit.

Does it mean the mask syntax is different? I found this:
 http://docwiki.embarcadero.com/Libraries/Sydney/en/System.MaskUtils.TEditMask
Is TEditMask the same as TMaskEdit? Or is TEditMask used for TMask? Can you please explain it so that I understand?
Is it all documented somewhere?


For the Masks unit: the same UTF-8 problems exist with its FPC
counterpart IIRC (I did not study the code recently).
So, I would like to keep it "here" as well.

Yes, I realized it must stay in LazUtils. Iterating codepoints is needed.
FPMasks is good for ASCII only despite having UTF8String type.
Does Michael have a plan for that unit?

Fortunately I found a unit test for Masks unit under LCL tests directory. I moved it under LazUtils in r64653.
I will add tests for multibyte codepoint text. Then optimizing TMask will be safe. Nothing can go wrong... :)

Juha



Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
El 23/02/2021 a las 10:41, Juha Manninen via lazarus escribió:
 >
 > Masks in LazUtils has a slow implementation.
 > I planned to optimize it but now I realize we may have overlapping code.
 > Q: Are Masks (LazUtils) and FPMasks (fpindexer) compatible?
 > If they are, we should dump the LazUtils Masks and use code from
 > FPC's libs.

Hello,

fpMasks and Masks are much the same from my point of view; in fact
fpMasks (if my memory serves me) has various problems with UTF8 handling.

I'll send you (direct mail) my TMaskAnsi, TMaskUTF8 and TMaskUnicode so
you can do whatever you want with them :-)

Based on the RTL's string encoding you can alias TMask to any of them.



Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Tue, Feb 23, 2021 at 3:27 PM Juha Manninen via lazarus
<[hidden email]> wrote:

> Does it mean the mask syntax is different? I found this:
>  http://docwiki.embarcadero.com/Libraries/Sydney/en/System.MaskUtils.TEditMask
> Is TEditMask the same as TMaskEdit? Or is TEditMask used for TMask? Can you please explain it so that I understand?
> Is it all documented somewhere?

I have copied in the past some of the TMaskEdit logic and methods to
the MaskUtils unit.
Things like SplitEditMask etc.
So, basically they behave the same (on 1-byte ANSI strings).

The code in MaskEdit unit is the father of the code in MaskUtils.
From a Lazarus point of view, the code in MaskEdit is leading.
Fpc to some extent follows what we do here.

I do NOT want to use the code from MaskUtils in the MaskEdit unit, for
reasons I explained in my previous mail.

--
Bart

Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Tue, Feb 23, 2021 at 7:38 PM Bart via lazarus <[hidden email]> wrote:
I have copied in the past some of the TMaskEdit logic and methods to
the MaskUtils unit.

Ok, the TEditMask thing was in MaskUtils. I didn't pay attention.
How about TMask? Does it have the same syntax as TMaskEdit or are they different?

Juha



Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Tue, Feb 23, 2021 at 6:55 PM Juha Manninen via lazarus
<[hidden email]> wrote:

> How about TMask? Does it have the same syntax as TMaskEdit or are they different?

TMask (unit masks) deals with masks with wildcards (*,? and sets of
single byte chars).
It is mainly used for matching filenames (similar to the Path supplied
to FindFirst).

TMaskEdit gives you the possibility to constrain user input to almost
anything you like.
It can be used for e.g. ZIP codes, numbers only, etc.
It also handles pasting in the control.
If the text the user enters does not match the specified mask when the
control loses focus (or the user presses Enter), an exception is raised.
In the Delphi 1,2,3 years I used it to force numeric input (integers
and floats), for which we nowadays have better controls.
There is a data aware counterpart as well.

As you have pointed out before, the GetCodePoint function in the Masks
unit needs overhauling.
(It is the same as in TMaskEdit, but that only reacts to user input
with strings <=255 chars, so speed is not required there: I'd love to
see someone typing faster than the code in TMaskEdit calculates what
needs to be done.)

--
Bart

Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Wed, Feb 24, 2021 at 12:08 AM Bart via lazarus <[hidden email]> wrote:
TMask (unit masks) deals with masks with wildcards (*,? and sets of
single byte chars).
It is mainly used for matching filenames (similar to the Path supplied
to FindFirst).

TMaskEdit gives you the possibility to constrain user input to almost
anything you like.
It can be used for e.g. ZIP codes, numbers only, etc.

TMask also supports ranges and sets. See the unit test.
Eg.  '[a-b]', '[!a-b]', '[abc]', '[0-9]'
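
As a quick way to try these mask forms, here is a small sketch that calls MatchesMask from the LazUtils Masks unit. It assumes the project depends on the LazUtils package; the file names and masks are invented for illustration, and the helper Show is hypothetical. The output is left for the reader to check against the unit test.

```pascal
program MaskSyntaxDemo;
{$mode objfpc}{$H+}
uses
  Masks; // from LazUtils; add LazUtils as a project requirement

// Hypothetical helper: print whether AName matches AMask.
procedure Show(const AName, AMask: String);
begin
  WriteLn(AName, ' ~ ', AMask, ' : ', MatchesMask(AName, AMask));
end;

begin
  Show('report.txt', '*.txt');         // '*' wildcard
  Show('a1.log', '[a-b][0-9].log');    // two ranges
  Show('c1.log', '[!a-b]?.log');       // negated range, then '?'
  Show('b.pas', '[abc].pas');          // set of single chars
end.
```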

Now I found documentation for TCustomMaskEdit.EditMask. It explains the syntax and it looks like the MaskUtils syntax.
It was documented, good! I missed it earlier.
I know filename wildcards and I know regular expressions. Now learning this Mask thingy...


As you have pointed out before, the GetCodePoint function in the Masks
unit needs overhauling.

It is much worse than that!
Yes, GetCodePoint does its own nested loops and useless copies.
But then it and other UTF8...() functions are called inside a loop, effectively causing many nested loops.
The scalability is maybe O(n^3) or O(n^4).
José Mejuto's Mask unit looks promising. He mentioned in a private mail (which should be public IMO, no deep secrets there) that a pattern
 "*something*to*write*here*"
"which with current mask it takes a lot of time to be processed. If matchable string is of more than 200 chars long it could take seconds to be resolved. My classes are typically O(n)."
Many seconds on a modern computer is a lot.
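
For comparison, this is a sketch of a well-known iterative wildcard technique that avoids nested recursion on '*'. It is not José's code, which was not posted here. It is byte-wise, handles only '*' and '?', and the name WildMatch is made up. It keeps a single backtrack point (the most recent '*'), so the worst case is bounded by roughly Length(S) * Length(Mask) steps rather than growing with the number of stars.

```pascal
program IterMatch;
{$mode objfpc}{$H+}

// Byte-wise matcher for '*' and '?' only (illustrative, not the LazUtils code).
// Instead of recursing on each '*', it remembers only the latest '*' and the
// position in S where that star was last tried.
function WildMatch(const S, Mask: String): Boolean;
var
  si, mi, starMi, starSi: Integer;
begin
  si := 1; mi := 1;
  starMi := 0; starSi := 0;
  while si <= Length(S) do
  begin
    if (mi <= Length(Mask)) and (Mask[mi] = '*') then
    begin
      starMi := mi;   // remember the star
      starSi := si;   // and where in S we were when we met it
      Inc(mi);
    end
    else if (mi <= Length(Mask)) and
            ((Mask[mi] = '?') or (Mask[mi] = S[si])) then
    begin
      Inc(si);
      Inc(mi);
    end
    else if starMi > 0 then
    begin
      // Mismatch: let the last star absorb one more char and retry from there.
      Inc(starSi);
      si := starSi;
      mi := starMi + 1;
    end
    else
      Exit(False);
  end;
  // S is consumed; only trailing stars may remain in the mask.
  while (mi <= Length(Mask)) and (Mask[mi] = '*') do
    Inc(mi);
  Result := mi > Length(Mask);
end;

begin
  WriteLn(WildMatch('I have something to write here today',
                    '*something*to*write*here*'));
  WriteLn(WildMatch('This is a test string', '*T*h*s*n*x'));
end.
```

Making this codepoint-aware would mean advancing si and mi by whole UTF-8 sequences instead of single bytes.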


(It is the same as in TMaskEdit, but that only reacts to user input
with strings <=255 chars, so speed is not required there: I'd love to
see someone typing faster than the code in TMaskEdit calculates what
needs to be done.)

True, but the code should be cleaned anyway and maybe reuse some other code.
Code has aesthetic values, too.


Juha



Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Wed, Feb 24, 2021 at 9:11 AM Juha Manninen via lazarus
<[hidden email]> wrote:

>> TMask (unit masks) deals with masks with wildcards (*,? and sets of
>> single byte chars).
...
> TMask also supports ranges and sets. See the unit test.
> Eg.  '[a-b]', '[!a-b]', '[abc]', '[0-9]'

By single byte chars I meant ASCII only.
You cannot have '[ä..ë]' in a TMask (a constraint that is a side
effect of the implementation, but this would be sort of an undefined
range as well).

> Now I found documentation for TCustomMaskEdit.EditMask. It explains the syntax
It is in the source code (has been there from the beginning) and in the wiki.

> and it looks like the MaskUtils syntax.
Again: it's the other way around: the code of MaskUtils looks like the
code of MaskEdit.

>> As you have pointed out before, the GetCodePoint function in the Masks
>> unit needs overhauling.
>
>
> It is much worse than that!
> Yes, GetCodePoint does its own nested loops and useless copies.
> But then it and other UTF8...() functions are called inside a loop, effectively causing many nested loops.
> The scalability is maybe O(n^3) or O(n^4).
> José Mejuto's Mask unit looks promising. He mentioned in a private mail (which should be public IMO, no deep secrets there) that a pattern
>  "*something*to*write*here*"
> "which with current mask it takes a lot of time to be processed. If matchable string is of more than 200 chars long it could take seconds to be resolved. My classes are typically O(n)."
> Many seconds on a modern computer is a lot.

I use this Masks unit extensively for my backup program.
Resolving TMaskMatches, even for long strings and masks, takes orders of
magnitude less time than accessing the file (just opening it).

Of course that is NOT a reason not to improve it: O(n^4) is just terrible.
Mind you, the GetCodePoint/SetCodePoint originally was just a quick
(as in: simple, stupid, short code) hack to get the UTF8 functionality
in MaskEdit.
After changing all SomeString[i] to either GetCodePoint(SomeString,i)
or SetCodePoint(SomeString,i, ACodePoint) the MaskEdit unit was UTF8
capable at once,
without a major rewrite (which increases the chance of breaking compatibility).

Mind you that the first implementation of GetCodePoint was even more
"simple", it simply called Utf8Copy(SomeString, i, 1)...

So, yes, re-implement GetCodePoint/SetCodePoint or the internal logic
of the Masks unit by all means, but as far as the MaskEdit unit is
concerned the function signature should not change. There is no need
for a major rewrite of that unit: it deals with user input and even if
you make it 100 times slower than it is now, the user will not notice it.

--
Bart

Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
I will not touch MaskEdit. Don't worry.

On Wed, Feb 24, 2021 at 11:03 AM Bart via lazarus <[hidden email]> wrote:
Without a major rewrite (which increases the chance of breaking compatibility).

José Mejuto's code is a major rewrite for Masks. It supports Unicode in masks, too.
I try to make it compatible by changing some class and method names, and then run the unit tests.
Comprehensive unit tests are a way to prevent breaking things.
Please everybody provide more test cases. The project is in components/lazutils/test.
There are no tests for MatchesWindowsMask() yet.

Juha



Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
El 24/02/2021 a las 10:31, Juha Manninen via lazarus escribió:

> José Mejuto's code is a major rewrite for Masks. It supports Unicode in
> masks, too.
> I try to make it compatible by changing some class and method names, and
> then run the unit tests.

Hello,

My code is not 100% Unicode compatible when using the
"CaseInsensitive" mode, as it lowercases the mask and the string
to perform the test, which is wrong by definition, but I was unable to
find a method to test codepoints case-insensitively without pulling in big
Unicode tables.

I was thinking of importing the NTFS (the filesystem) case comparison
tables, which are "only" 128 KB.

> Comprehensive unit tests are a way to prevent breaking things.

And they also define whether a compatibility break is a bug in the new code or in
the old code. For example, my mask supports (there is a define to disable it)
"[z-a]", converting it to "[a-z]", which is a compatibility break. There is
also support (which can be disabled too) for the mask "[?]", which is
the counterpart of "*" but for one char position.

> There are no tests for MatchesWindowsMask() yet.

Who defines which are right and which are wrong? There are no official
DOS/Windows mask strategies, only inherited behaviour since CP/M. Maybe
the behaviour of CMD.EXE? Which version of CMD.EXE?


Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Wed, Feb 24, 2021 at 10:02 AM Bart <[hidden email]> wrote:

> Of course that is NOT a reason not to improve it: O(n^4) is just terrible.

To put this discussion in a little perspective.
Given a string S (UTF8 encoded) with Utf8Length=1000.
GetCodePoint(S,1000) on my laptop takes 0.00439 msecs to perform.
So 10 thousand of these lookups cost appr. 44 ms.
This is kind of a worst case scenario.
You are not very likely to have strings that long in TMask.MatchesMask
(and certainly not in a MaskEdit).
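
The arithmetic checks out: 10,000 lookups at 0.00439 ms each is 43.9 ms. The cost is still worth a look, because a positional lookup in UTF-8 has to scan from the start of the string. The sketch below is not the LazUtils GetCodePoint; CodepointStart is a hypothetical stand-in that assumes valid UTF-8. It counts how many lead bytes are visited when every position of a 1000-character string is looked up by index.

```pascal
program IndexCost;
{$mode objfpc}{$H+}

var
  Steps: Int64 = 0; // lead bytes visited, used as a rough measure of work

// Hypothetical stand-in for a positional lookup: returns the byte index at
// which codepoint number Index starts, or 0 if S has fewer codepoints.
function CodepointStart(const S: RawByteString; Index: Integer): Integer;
var
  i, n: Integer;
begin
  i := 1; n := 1;
  while i <= Length(S) do
  begin
    Inc(Steps);
    if n = Index then
      Exit(i);
    // Skip the continuation bytes (10xxxxxx) of the current sequence.
    repeat
      Inc(i);
    until (i > Length(S)) or ((Ord(S[i]) and $C0) <> $80);
    Inc(n);
  end;
  Result := 0;
end;

var
  S: RawByteString;
  k: Integer;
begin
  S := StringOfChar('a', 1000);
  for k := 1 to 1000 do
    CodepointStart(S, k);
  WriteLn('lookups: 1000, lead bytes visited: ', Steps);
end.
```

Each lookup of position k visits k lead bytes, so the loop visits 1 + 2 + ... + 1000 = 500,500 in total. One indexed lookup is cheap; an indexed lookup inside a loop over the string is quadratic.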

--
Bart

Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Wed, Feb 24, 2021 at 11:22 AM José Mejuto via lazarus
<[hidden email]> wrote:

> My code is not 100% Unicode compatible when using the
> "CaseInsensitive" mode, as it lowercases the mask and the string
> to perform the test, which is wrong by definition

Currently Masks unit does the same.

> And also define if a compatibility break is a bug in the new code or in
> the old code. For example my mask supports (there is a define to disable)
> "[z-a]" converting it to "[a-z]" which is a compatibility break. Also
> there is the support (also can be disabled) for the mask "[?]" which is
> the counterpart for "*" but with one char position.

Current behaviour of sets and wildcards should not be changed by default.
E.g. TShellTreeView and TShellListView use the Masks unit to populate
the tree/view.
An option to have the behaviour you described would be OK, the
TMaskOption can be extended for that.

Sometimes I wish we would migrate to using UnicodeString by default.
It would make life a bit easier.
(And yes I know you would have to deal with composed characters
(grapheme defined by more than 1 16-bit word)).

> > There are no tests for MatchesWindowsMask() yet.
I tested that extensively on my machine with all scenarios I could think of.
But others most likely can think of scenarios I did not test.
It was based on current behaviour of Windows NT platform (Win7 at the
time to be precise).

> Who defines which are right and which are wrong ?
Well, I did ;-)
(Nobody else bothered at the time, and nobody complained either.)

--
Bart

Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
El 24/02/2021 a las 11:47, Bart via lazarus escribió:

> On Wed, Feb 24, 2021 at 10:02 AM Bart <[hidden email]> wrote:
>
>> Of course that is NOT a reason not to improve it: O(n^4) is just terrible.
>
> To put this discussion in a little perspective.
> Given a string S (UTF8 encoded) with Utf8Length=1000.
> GetCodePoint(S,1000) on my laptop takes 0.00439 msecs to perform.
> So 10 thousand of these lookups cost appr. 44 ms.
> This is kind of a worst case scenario.
> You are not very likely to have strings that long in TMask.MatchesMask
> (and certainly not in a MaskEdit).

Hello,

The worst case scenario is not determined by GetCodePoint time but by the
TMask logic: when a "*" is found, it enters a recursive scan from that
point. If it finds another "*" it recurses again, and if the comparison
finally fails it rolls back to the first "*". So specially crafted masks
make it recurse a lot.

String:='This is a test string';
Mask:='*T*h*s*n*x';

Of course this is not day-to-day use ;-)
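
To make the recursion visible, here is a naive recursive matcher with a call counter. It is a sketch written for this example, not the actual TMask code: it is byte-wise, supports only '*' and '?', and the names Match and Calls are invented. Running it on the string and mask above shows how many calls a failing match with several stars can trigger.

```pascal
program RecurseCount;
{$mode objfpc}{$H+}

var
  Calls: Int64 = 0; // number of times Match was entered

// Naive recursive matcher (illustrative only): on '*' it tries every
// possible split point of the remaining string and recurses for each one.
function Match(const S, Mask: String; si, mi: Integer): Boolean;
var
  k: Integer;
begin
  Inc(Calls);
  if mi > Length(Mask) then
    Exit(si > Length(S));
  if Mask[mi] = '*' then
  begin
    for k := si to Length(S) + 1 do
      if Match(S, Mask, k, mi + 1) then
        Exit(True);
    Exit(False);
  end;
  if (si <= Length(S)) and ((Mask[mi] = '?') or (Mask[mi] = S[si])) then
    Result := Match(S, Mask, si + 1, mi + 1)
  else
    Result := False;
end;

begin
  WriteLn(Match('This is a test string', '*T*h*s*n*x', 1, 1));
  WriteLn('recursive calls: ', Calls);
end.
```

Adding more stars to the mask or more characters to the string makes the call count grow quickly, because every failed attempt after a later star is repeated for each position tried by the earlier stars.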

Note: Just to put this in context, my exploration of the TMask world started
when writing my NTFS filesystem reader. Once all file names are read
(400,000) I can search for them using masks. When compiled with fpc
(Lazarus) the "*.txx" search takes 1-2 seconds (not measured) and when
compiled with Delphi it takes +/- 0.3 seconds, so I started to write my
own TMask.



Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
On Wed, Feb 24, 2021 at 1:00 PM Bart via lazarus <[hidden email]> wrote:
> > There are no tests for MatchesWindowsMask() yet.
I tested that extensively on my machine with all scenarios I could think of.

Please add your tests to the project I mentioned.

Juha



Re: [Lazarus] unit Masks vs. unit FPMasks

Free Pascal - Lazarus mailing list
El 24/02/2021 a las 11:58, Bart via lazarus escribió:

Hello,

>> My code is not 100% Unicode compatible when using the
>> "CaseInsensitive" mode, as it lowercases the mask and the string
>> to perform the test, which is wrong by definition
>
> Currently Masks unit does the same.

Yes, but for example in my case I cannot successfully match mask "ä*" against
string "Ä*" because "Ä" is not lowercased to "ä" (Windows 7).

> Sometimes I wish we would migrate to using UnicodeString by default.
> It would make life a bit easier.
> (And yes I know you would have to deal with composed characters
> (grapheme defined by more than 1 16-bit word)).

That's a can of worms! UTF8 forces you to write "correct code" (or at least
to try) for any character >127. With UnicodeString you get the false
impression that everything magically works, until everything cracks when a
string with surrogate pairs comes into play :-) and ALL your text handling
must be revised, most of it completely rewritten.

>>> There are no tests for MatchesWindowsMask() yet.
> I tested that extensively on my machine with all scenarios I could think of.
> But others most likely can think of scenarios I did not test.
> It was based on current behaviour of Windows NT platform (Win7 at the
> time to be precise).
>> Who defines which are right and which are wrong ?
> Well, I did ;-)
> (Nobody else bothered at the time, and nobody complained either.)

And most will not, as almost everything matches the behaviour a user
expects, like the typical "*.txt", but there are some unsupported cases
like:

Filename:='test.txt'
Mask:='test??.txt?'
Match must be true

This is the doc from my code about Windows matching. Quirks can be
enabled or disabled for compatibility:

----------------8<----------------------------8<---------------------
Windows mask works in a different mode than regular mask, it has too
many quirks and corner cases inherited from CP/M, then adapted to DOS
(8.3) filenames and adapted again for long file names.

         Anyth?ng.abc    = "?" matches exactly 1 char
         Anyth*ng.abc    = "*" matches 0 or more of chars

         ------- Quirks -------

         --eWindowsQuirk_AnyExtension
           Anything*.*     = ".*" is removed.

         --eWindowsQuirk_FilenameEnd
           Anything??.abc  = "?" matches 1 or 0 chars (except '.')
                          (Not the same as "Anything*.abc", but the same
                           as regex "Anything.{0,2}\.abc")
                           Internally converted to "Anything[??].abc"

         --eWindowsQuirk_Extension3More
           Anything.abc    = Matches "Anything.abc" but also
                            "Anything.abc*" (3 char extension)
           Anything.ab     = Matches "Anything.ab" and never
                            "anything.abcd"

         --eWindowsQuirk_EmptyIsAny
           ""              = Empty string matches anything "*"

         --eWindowsQuirk_AllByExtension (Not in use anymore)
           .abc            = Runs as "*.abc"

         --eWindowsQuirk_NoExtension
           Anything*.      = Matches "Anything*" without extension

----------------8<----------------------------8<---------------------
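
The FilenameEnd quirk can be tried out with the regex equivalence given in the doc above. This sketch uses ExecRegExpr from FPC's RegExpr unit. The anchors and the use of [^.] instead of "." are my reading of the "(except '.')" remark, not something taken from José's code, and the file names and the helper Check are invented.

```pascal
program QuirkRegex;
{$mode objfpc}{$H+}
uses
  RegExpr; // TRegExpr, shipped with FPC (packages/regexpr)

const
  // 'Anything??.abc' under the FilenameEnd quirk: up to two chars, none of
  // them '.', before the extension. Anchored so the whole name must match.
  Pattern = '^Anything[^.]{0,2}\.abc$';

// Hypothetical helper: print whether AName matches the pattern.
procedure Check(const AName: String);
begin
  WriteLn(AName, ' : ', ExecRegExpr(Pattern, AName));
end;

begin
  Check('Anything.abc');     // zero extra chars
  Check('Anything12.abc');   // two extra chars
  Check('Anything123.abc');  // three extra chars, more than the two '?' allow
end.
```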
