[Lazarus] Tests results of several pascal based JSON parsers


Free Pascal - Lazarus mailing list
I've posted a new page that tests the speed and correctness of several Pascal-based JSON parsers:

https://www.getlazarus.org/json/tests/

In full disclosure, I am the author of the new open source JsonTools library, and even though my parser seems to be a big improvement over the other alternatives, my tests were not biased.

If anyone would like help in replicating the tests, let me know and I'll see what I can do.

Also, to be thorough, you should read through both the article linked at the top of this message and my original page (https://www.getlazarus.org/json), which has been updated with more information. Both pages took some time to write, and I promise that if you read through them, some of your questions will be answered without having to ask others for help or insight.

Thanks,
Anthony

--
_______________________________________________
lazarus mailing list
[hidden email]
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
This is very good news, that we have the JsonTools parser now. I may consider
using it in CudaText, i.e. replacing fpJSON with JsonTools.

About the lib:

1) Please add an option to handle //... comments in JSON. Yes,
JSON doesn't allow this, but CudaText, SublimeText and many other programs
have JSON configs with comments. They use libs which allow comments.
fpJSON allows comments via an option.

2) Please add an option which allows a "," after the final dict node: { "a":1,
"b":2, }

so my app can read a JSON file with a final (bad) comma after "b":2.

3) The lib must support "true", "false", "none" or "nil"(?) values; list
values [], including the empty list; dict values {}, including the empty
dict; and lists inside dicts, etc.
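
For reference, points 1) and 2) can be sketched with fpJSON's own parser options. This is only a minimal sketch: it assumes an FPC version whose jsonscanner unit provides joComments and joIgnoreTrailingComma, and the Config text is a made-up example, not a real CudaText file.

```pascal
program lenientjson;

{$mode objfpc}{$H+}

uses
  fpjson, jsonparser, jsonscanner;

const
  // Hypothetical config text: one // comment and one trailing comma.
  Config = '{ "a": 1, // first value' + LineEnding +
           '  "b": 2, }';

var
  Parser: TJSONParser;
  Data: TJSONData;
begin
  // joComments and joIgnoreTrailingComma relax the strict JSON grammar.
  Parser := TJSONParser.Create(Config, [joUTF8, joComments, joIgnoreTrailingComma]);
  try
    Data := Parser.Parse;
    try
      // Print the parsed object back as compact JSON.
      WriteLn(Data.AsJSON);
    finally
      Data.Free;
    end;
  finally
    Parser.Free;
  end;
end.
```

Without those two options the same input should be rejected, which is the strict behaviour the JSON specification asks for.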

Alexey Torgashin




Re: [Lazarus] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
Alexey,

Currently JsonTools handles anything that is valid JSON as described on this page:

https://www.json.org/

The only valid constants are: null, true, false
Arrays can contain other arrays and objects to any reasonable level
  [[[]]] // is a valid array
  [{},{},[{},{}]] // is a valid array
Objects can contain other objects and arrays to any reasonable level
  {"a":{"a":{"a":[[]]}}} // is a valid object

I will look into your request for comment support, even though comments are not allowed in the official specification.


Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
In reply to this post by Free Pascal - Lazarus mailing list


On Fri, 30 Aug 2019, Anthony Walter wrote:

> I've posted a new page that tests the speed and correctness of several
> pascal based JSON parsers.
>
> https://www.getlazarus.org/json/tests/
>
> In full disclosure I am the author of the new open source JsonTools
> library, and even though my parser seems to a big improvement over the
> other alternatives, my tests were not biased.

Tests are always biased in the sense that they depend on the proficiency of
the coder. You must ask someone with sufficient knowledge to write the code.

The shootout benchmarks, for example, are dismally coded for FPC, with the
result that they perform badly.

So it could well be that the fpJSON results, for example, can be improved a
lot by changing the method used to a more efficient one.

Also, not every library is designed with the same goals.
fpJSON could probably be made smaller if the goal was more focused.
It has a factory pattern, added on behalf of users who asked for this.
JsonTools does not have this.
You can format the output. JsonTools does not have this.
fpJSON was designed to be pluggable wrt. parsing. JsonTools is not.

In short, fpJSON can do a lot more than JsonTools.
Some things come at a price, others not.

In that sense, tests like this compare apples with pears.

> If anyone would like help in replication the tests, let me know and I'll
> see what I can do.
>
> Also, to be thorough, you should read through both the article I posted at
> the top this message, and my original page <https://www.getlazarus.org/json>
> which has been updated with more information. Both pages took some time to
> write, and I promise if you read through them some of your questions will
> be answered without having to ask others for help or insight.

Can you please send me the testcode you used for your speed & correctness tests ?

I'm a bit surprised to see fpJSON fail in unicode correctness. It's been
tested extensively, maybe your code contains errors (or maybe fpjson does,
I'll need to check).

Also please explain what 'Handling duplicate key names correctly' means to you.
Saying that a library fails a test without specifying what the test is, is
strange.

Michael.

Re: [Lazarus] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
In reply to this post by Free Pascal - Lazarus mailing list


On Fri, 30 Aug 2019, Anthony Walter via lazarus wrote:

> Alexey,
>
> Currently JsonTools anything that is valid JSON as described on this page:
>
> https://www.json.org/
>
> The only valid constants are: null, true, false
> Arrays can contain other arrays and object to any reasonable level
>  [[[]]] //  is a valid array
>  [{}{}[{}{}]] // is a valid array
> Objects can contain other objects and arrays to any reasonable level
>  {"a":{"a":{"a":[[]]}}} // is a valid object
>
> I will look into you request for comment support, even though comments are
> not allowed in the official specification.

And so your tool will also become more bulky and slower :)

Michael.


Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
In reply to this post by Free Pascal - Lazarus mailing list
Yes, JsonTools needs a SaveToFile method if it doesn't have one. It must save
formatted JSON with an indent set by a property or global variable
(the default is usually 2).

SaveToFile must handle Unicode strings, i.e. output them as \uNNNN or
similar. Use a Unicode escape for all codes >= 128, because UTF-8 encodes
everything above 127 as multi-byte sequences.
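
The escaping rule described above could look roughly like the following. It is a sketch only: EscapeNonAscii is a hypothetical helper, not part of JsonTools or fpJSON, and it leaves out the other JSON escapes (quotes, backslashes, control characters).

```pascal
program escapeunicode;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: emit every UTF-16 code unit >= 128 as \uNNNN.
// Quotes, backslashes and control characters are NOT handled here.
function EscapeNonAscii(const S: UnicodeString): AnsiString;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    if Ord(S[I]) < 128 then
      Result := Result + Chr(Ord(S[I]))
    else
      Result := Result + '\u' + LowerCase(IntToHex(Ord(S[I]), 4));
end;

begin
  // The registered sign is built from its code point ($00AE), so the
  // encoding of this source file does not influence the result.
  WriteLn(EscapeNonAscii(UnicodeString('Joe') + WideChar($00AE) +
    UnicodeString('Schmoe')));
end.
```

Because the loop works on UTF-16 code units, a character outside the basic plane comes out as two \uNNNN escapes (a surrogate pair), which is the form JSON expects.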

Alexey


Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
With regards to duplicate key names, some libraries allow for the same key to be parsed resulting in multiple child nodes of the same name. Others throw an exception when parsing an object with a duplicate key name.

The correct way to handle duplicate keys is to overwrite the existing key when a duplicate is encountered.

My library does have Save/Load To/From /File/Stream. Please see the articles I posted. They are listed there.

Unicode is supported both as literal characters in the JSON and as escape sequences; e.g. \u00ae becomes ®, a two-byte UTF-8 encoded character, when parsed by my parser. It also saves and loads that UTF-8 encoding to and from streams or files.
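
To illustrate why \u00ae ends up as two bytes, here is a small sketch of standard UTF-8 encoding. CodePointToUtf8 is a hypothetical helper written for this example; it is not taken from JsonTools.

```pascal
program utf8encode;

{$mode objfpc}{$H+}

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
function CodePointToUtf8(CP: Cardinal): AnsiString;
begin
  if CP < $80 then
    Result := Chr(Byte(CP))
  else if CP < $800 then
    Result := Chr(Byte($C0 or (CP shr 6))) +
              Chr(Byte($80 or (CP and $3F)))
  else if CP < $10000 then
    Result := Chr(Byte($E0 or (CP shr 12))) +
              Chr(Byte($80 or ((CP shr 6) and $3F))) +
              Chr(Byte($80 or (CP and $3F)))
  else
    Result := Chr(Byte($F0 or (CP shr 18))) +
              Chr(Byte($80 or ((CP shr 12) and $3F))) +
              Chr(Byte($80 or ((CP shr 6) and $3F))) +
              Chr(Byte($80 or (CP and $3F)));
end;

var
  S: AnsiString;
  I: Integer;
begin
  S := CodePointToUtf8($00AE);
  // Show the byte count followed by each byte in hex.
  Write(Length(S), ' bytes:');
  for I := 1 to Length(S) do
    Write(' ', HexStr(Ord(S[I]), 2));
  WriteLn;
end.
```

For $00AE this prints "2 bytes: C2 AE", the two-byte sequence mentioned above.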

Regarding indentation and formatting, I support two options as noted in my original article. The AsJson property creates compact JSON without whitespace, suitable for network traffic, while the Value property allows for friendly, human-readable formatting and indentation. Currently the friendly indentation is fixed, and I don't see a use case for allowing custom indentation beyond what I already provide.

And finally regarding the unicode failure of FPJson, I am parsing a small bit of JSON ...

{ "name": "Joe\u00aeSchmoe"}

I then compare the value of 'name' to the string constant 'Joe®Schmoe' in Pascal code. It fails on the first iteration, but if I run it a second time it works, so there is something amiss.

And finally, with regards to plugins and extensibility, that's great to have, but I am just trying to write something that handles the JSON spec and only that. If I wanted something to stream a form layout or settings to JSON, it would be a separate, wholly independent library that depends on a parser but isn't part of the parser library.


Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
Michael,

I have a hurricane headed my way, but when I'm done evacuating I'll send you a copy of my test. If you want to make improvements to the test program to be sure the manner in which I am using the FPJson functions and classes is correct and send me a revised test program, then that would be awesome.

Thanks


Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
In reply to this post by Free Pascal - Lazarus mailing list


On Fri, 30 Aug 2019, Anthony Walter via lazarus wrote:

> With regards to duplicate key names, some libraries allow for the same key
> to be parsed resulting in multiple child nodes of the same name. Others
> throw an exception when parsing an object with a duplicate key name.
>
> The correct way to handle duplicate keys is to overwrite the existing key
> when a duplicate is encountered.

There you go. I think the "correct way" is to raise an error;
not to override and thus inadvertently lose previous data.

I won't argue on who is correct, since it is a matter of opinion.

But this is a prime example of 'biased' tests.
You're testing an opinion, not actual functionality.

so IMHO it would be only fair to remove it from your comparison.
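
The two policies under discussion can be put side by side in a short sketch. It uses a plain TStringList as a stand-in for a JSON object; AddPair and TDupPolicy are hypothetical names for this illustration and belong to neither library.

```pascal
program dupkeys;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

type
  // Hypothetical policy switch: replace the old value or raise an error.
  TDupPolicy = (dpOverwrite, dpRaise);

procedure AddPair(Obj: TStrings; const Key, Value: string; Policy: TDupPolicy);
begin
  if (Obj.IndexOfName(Key) >= 0) and (Policy = dpRaise) then
    raise Exception.CreateFmt('Duplicate key "%s"', [Key]);
  // Values[] adds the pair, or replaces the value of an existing key.
  Obj.Values[Key] := Value;
end;

var
  Obj: TStringList;
begin
  Obj := TStringList.Create;
  try
    AddPair(Obj, 'a', '1', dpOverwrite);
    AddPair(Obj, 'a', '2', dpOverwrite);   // silently replaces '1'
    WriteLn('overwrite: a=', Obj.Values['a'], ' count=', Obj.Count);
    try
      AddPair(Obj, 'a', '3', dpRaise);     // refuses the duplicate
    except
      on E: Exception do
        WriteLn('raise: ', E.Message);
    end;
  finally
    Obj.Free;
  end;
end.
```

Under the first policy the earlier value is lost without notice; under the second the input is rejected. RFC 8259 only says that names within an object should be unique and that behaviour for duplicates varies between implementations, so neither policy is mandated by the specification.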

For speed & correctness, I repeat my request:
please provide your test code.

Michael.

Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
In reply to this post by Free Pascal - Lazarus mailing list


On Fri, 30 Aug 2019, Anthony Walter via lazarus wrote:

> Michael,
>
> I have a hurricane headed my way, but when I'm done evacuating I'll send
> you a copy of my test. If you want to make improvements to the test program
> to be sure the manner in which I am using the FPJson functions and classes
> is correct and send me a revised test program, then that would be awesome.

Just make sure you don't get caught in the hurricane.
I wish you a quick hiding place ! :-)

Michael.

Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
Michael,

Can you tell me why the second half (N.Items[1].AsUnicodeString) of this test fails? This is the part that decodes "bank teller \u00Ae ".

function VerifyUnicodeChars: Boolean;
const
  UnicodeChars = '{ "name": "Joe®Schmoe", "occupation": "bank teller \u00Ae " }';
var
  N: TJSONData;
begin
  N := GetJSON(UnicodeChars);
  Result := (N.Items[0].AsUnicodeString = 'Joe®Schmoe') and (N.Items[1].AsUnicodeString = 'bank teller ® ');
  N.Free;
end;

begin
  WriteLn('Handles unicode chars correctly: ', VerifyUnicodeChars);
end.


Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list


On Fri, 30 Aug 2019, Anthony Walter via lazarus wrote:

> Michael,
>
> Can you tell me why the second half (N.Items[1].AsUnicodeString) this test
> fails? This is the part that decodes "bank teller \u00Ae ".


The test fails on "Joe®Schmoe", not on "bank teller \u00Ae ".

If you WriteLn the UnicodeChars string to console, it shows:

{ "name": "JoeÂ®Schmoe", "occupation": "bank teller \u00Ae " }

(notice the Â in front of ®)

This is because your string is encoded wrong in the binary.

Adding

{$codepage UTF8}

before the uses clause makes your test print 'TRUE'.

Because I work on Linux, I also had to add the "cwstring" unit to the uses clause.

Michael.

Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
On my system with FPJson the test is failing on "bank teller \u00Ae ", but when using approximately the same code with JSONTools it always passes on both "name" and "occupation". What do you think is going on?


Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list


On Fri, 30 Aug 2019, Anthony Walter via lazarus wrote:

> On my system with FPJson the test is failing it failing on "bank teller
> \u00Ae ", but on when using approximately the same code with JSONTools it
> passes on both "name" and  "occupation" always. What do you think is going
> on?

No idea. I tested with both 3.0.4 and trunk. Both give the same result.

Here are the sources I used:

home:~/fpc/packages/fcl-json/tests> cat twalter.pas
program twalter;

{$codepage UTF8}

uses cwstring, fpjson, jsonparser;

function VerifyUnicodeChars: Boolean;

const
   UnicodeChars =  '{ "name": "Joe®Schmoe", "occupation": "bank teller \u00Ae " }';

var
   N: TJSONData;

begin
   N := GetJSON(UnicodeChars);
   Writeln('>',UnicodeChars,'<');
   Result := (N.Items[0].AsUnicodeString = 'Joe®Schmoe') and
             (N.Items[1].AsUnicodeString = 'bank teller ® ');
   N.Free;
end;

begin
   WriteLn('Handles unicode chars correctly: ', VerifyUnicodeChars);
end.

I test on linux, but could try windows.

Michael.

Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
In reply to this post by Free Pascal - Lazarus mailing list
It may be a bug that was fixed in FPC trunk; there were some Unicode issues in 3.0.4.

>
> On my system with FPJson the test is failing it failing on "bank teller \u00Ae ", but on when using approximately the same code with JSONTools it passes on both "name" and  "occupation" always. What do you think is going on?
> --
>


Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
In reply to this post by Free Pascal - Lazarus mailing list
Alan, oh that's a good idea. I will do that, as well as add a few more parser libraries as requested by a few people in other non-mailing-list threads. I will also try to find out what's going on with the unicode strings, as it might be a problem with the compiler.

Michael,

I am on Linux as well, but I will test under Windows and Mac too.


Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list


On Fri, 30 Aug 2019, Anthony Walter via lazarus wrote:

> Alan, oh that's a good idea. I will do that as well as add a few more
> parser libraries as requested by a few people in other non mailing lists
> threads. I will also try to find out what's going on the unicode strings as
> it might be a problem with the compiler.
>
> Michael,
>
> I am on Linux as well, but I will test under Windows and Mac too.

To show that my argument of 'coding proficiency' influence on algorithm
speed is not complete nonsense, I quickly cooked up the following test:

{$mode objfpc}
{$h+}

uses DateUtils, Sysutils,Classes, fpjson, jsonparser;

var
   FN : String;
   i,aCount : Integer;
   S : TJSONStringType;
   T : TMemoryStream;
   N : TDateTime;

procedure ReadJSON;

begin
   T:=TMemoryStream.Create;
   T.LoadFromFile(FN);
   SetLength(S,T.Size);
   T.ReadBuffer(S[1],T.Size);
   T.Position:=0;
end;

begin
   if ParamCount<>2 then Halt(1);
   aCount:=StrToInt(Paramstr(1));
   FN:=ParamStr(2);
   ReadJSON;
   try
     Writeln('Reading string ',aCount,' times');
     N:=Now;
     for I:=1 to aCount do
       GetJSON(S).Free;
     Writeln('Msecs : ',MillisecondsBetween(Now,N));
     Writeln('Reading stream ',aCount,' times');
     N:=Now;
     for I:=1 to aCount do
       begin
       GetJSON(T).Free;
       T.Position:=0;
       end;
     Writeln('Msecs : ',MillisecondsBetween(Now,N));
   finally
     T.Free;
   end;
end.

When you run this:

home:~/fpc/packages/fcl-json/tests> ./testjsonspeedread 100 ./testdata.json
Reading string 100 times
Msecs : 2972
Reading stream 100 times
Msecs : 1203

(file of 260Kb, 500 lines)

Using a stream rather than a string (as you do) already makes it a factor of 2.x faster.
The speed gain is there both for trunk and for 3.0.4.

So, I'm fairly confident that I can probably speed up your test results as
well, when you send me the sources.

That said, this is not to say that there is no room for speed improvements in fpjson.

I've already identified 2 places where speed gains can be made in the fpJSON
codebase, I'll improve the codebase this weekend and post results.


Michael.

Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
In reply to this post by Free Pascal - Lazarus mailing list
Okay, so I turned on my Windows VM with a different version of FPC and ran VerifyUnicodeChars with both FPJson and JsonTools. The results are the same. JsonTools sees the unicode correctly, and something is wrong when using FPJson. I don't know what the problem is, but other people are noticing similar issues, so it would seem there is definitely a problem resulting in a failure for FPJson.

Michael, you have all the information needed to find out what's wrong and I'd be curious to learn why it's not working.


Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
In reply to this post by Free Pascal - Lazarus mailing list
I am not sure how under any situation parsing a JSON from a stream source would be any faster than parsing a string. Also with regards to timing I am not sure how accurate Now is. For this purpose I've written:

{ Return a time based on system performance counters }
function TimeQuery: Double;

Implemented as:

{$ifdef unix}
const
{$ifdef linux}
  libc = 'libc.so';
{$endif}
{$ifdef darwin}
  libc = 'libSystem.dylib';
{$endif}
function gettimeofday(out TimeVal: TTimeVal; TimeZone: PTimeVal): Integer; apicall; external libc;

var
  TimeSec: SysInt;

function TimeQuery: Double;
var
  TimeVal: TTimeVal;
begin
  gettimeofday(TimeVal, nil);
  if TimeSec = 0 then
    TimeSec := TimeVal.Sec;
  TimeVal.Sec := TimeVal.Sec - TimeSec;
  Result := TimeVal.Sec + TimeVal.MSec / 1000000;
end;
{$endif}

{$ifdef windows}
const
  kernel32  = 'kernel32.dll';

function QueryPerformanceCounter(var Counter: Int64): LongBool; apicall; external kernel32;
function QueryPerformanceFrequency(var Frequency: Int64): LongBool; apicall; external kernel32;

function TimeQuery: Double;
var
  C, F: Int64;
begin
  F := 0;
  C := 0;
  if QueryPerformanceFrequency(F) and QueryPerformanceCounter(C) then
    Result := C / F
  else
    Result := 0;
end;
{$endif}
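
For comparison, a portable way to take coarse timings without platform-specific imports might look like the sketch below. It assumes FPC 3.0 or later, where SysUtils provides GetTickCount64; its resolution is in milliseconds, so it only suits runs that last long enough. The summing loop is a placeholder workload, not part of any JSON test.

```pascal
program ticktimer;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Start, Elapsed: QWord;
  I: Integer;
  Sum: Int64;
begin
  Start := GetTickCount64;
  // Placeholder workload; a real test would call the parser here.
  Sum := 0;
  for I := 1 to 10000000 do
    Sum := Sum + I;
  Elapsed := GetTickCount64 - Start;
  WriteLn('Sum: ', Sum, ', elapsed ms: ', Elapsed);
end.
```

The difference of two tick counts does not depend on wall-clock adjustments, which is the main concern when timing with Now.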


Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers

Free Pascal - Lazarus mailing list
In reply to this post by Free Pascal - Lazarus mailing list
On Fri, Aug 30, 2019 at 4:04 PM Michael Van Canneyt via lazarus
<[hidden email]> wrote:

> No idea. I tested with both 3.0.4 and trunk. Both give the same result.
>
> Here are the sources I used:
...
> I test on linux, but could try windows.

On Windows it prints FALSE, both with 3.0.4 and trunk r42348

--
Bart