[Lazarus] How to write an efficient lexical scanner/parser?


silvioprog
Hello,

I'm planning to write three parsers and, while googling, I found some entries about lexical parsers.

After that, I did a 'find in files' in the FPC sources and found many parsers (e.g. jsonparser (jsonscanner), JSParser (JSScanner), fpsqlparser (fpsqlscanner), PParser (PScanner), fpexprpars, etc.) that use a lexical scanner.

Below are three kinds of strings that I need to parse:

1)

${someVariable} -- 4 tokens

or

${a + b} -- 8 tokens - 1 expression

or

${fn:length('abc') * 3} -- 11 tokens - 1 function - 1 expression

2)

<c:forEach var="contact" items="${contactDao.list}">
    ${contact.name}, ${contact.email}
</c:forEach>

3)

contacts[0].name=abc
contacts[0].email=abc@def
contacts[1].name=def
contacts[2].email=def@ghi

My parser should allow registering dynamic variables, functions (to be called via script) and plugins (to extend the parser).

However, I have a question: is there any article about 'how to write lexical parsers' using Object Pascal?

I'd welcome any material on this subject, and I'd be very grateful for any tips.

Thanks in advance!

--
Silvio Clécio
My public projects - github.com/silvioprog

--
_______________________________________________
Lazarus mailing list
[hidden email]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus

Re: [Lazarus] How to write an efficient lexical scanner/parser?

Kostas Michalopoulos
This is a classic series of articles that shows how to write a very simple compiler in Turbo Pascal. The fundamentals when it comes to scanning are the same:

http://compilers.iecc.com/crenshaw/

I've also written a BASIC implementation for Free Pascal and Lazarus. The scanner should be straightforward to understand:

http://runtimelegend.com/rep/rbasic/artifact/2350e85c36a77e4d2d76adde23fd7d45731b5b22

You may also find the formatter code simpler, although it is perhaps a bit too simple:
http://runtimelegend.com/rep/rbasic/artifact/f3e9fb2d1ed8e60d36b50754c2d9a7d7c109fc40

For general theory, you can look into recursive descent parsers (they're the simplest to implement and, AFAIK, most compilers use them, either to build the token list or directly).
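
As an illustration of the recursive descent idea, here is a minimal sketch. It is written in Python for brevity rather than Object Pascal, and it is not taken from Crenshaw's articles or from the rbasic sources linked above; the grammar, the `tokenize` and `evaluate` functions and the `Parser` class are assumptions made up for this example. Each grammar rule becomes one method, and this version computes the result directly while parsing instead of building an intermediate structure.

```python
import re

# Hypothetical grammar (an assumption for this sketch):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens, skipping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each method consumes the tokens it needs."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the current token without consuming it (None at end of input).
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        # Consume and return the current token.
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate an arithmetic expression in a single pass."""
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return result
```

With these definitions, `evaluate("1 + 2 * 3")` gives 7, because `term` binds tighter than `expr`. The same one-procedure-per-rule layout should carry over to Object Pascal methods almost line for line.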



Re: [Lazarus] How to write an efficient lexical scanner/parser?

aradeonas
Hi Silvio,
 
On the subject of parsing, it may help to look at BeniBela's Xidel and InternetTools, or to talk to Benito, the author of both.
He is a very good developer and a kind person, like you.
 
Ara
 
 

Re: [Lazarus] How to write an efficient lexical scanner/parser?

Anthony Walter
In reply to this post by silvioprog
Here is something I originally wrote in 2001. I briefly had a product for converting Pascal code into documentation, and it was quite fast. Do with it as you will.



Re: [Lazarus] How to write an efficient lexical scanner/parser?

silvioprog
In reply to this post by Kostas Michalopoulos

I downloaded the PDF with all the articles.

Thank you very much!


Re: [Lazarus] How to write an efficient lexical scanner/parser?

silvioprog
In reply to this post by aradeonas

I think it uses regex for parsing. I'll check it out when I have some free time. Thank you!
 

Re: [Lazarus] How to write an efficient lexical scanner/parser?

silvioprog
In reply to this post by Anthony Walter


Nice parser. I had some ideas after seeing your code. Thank you!


Re: [Lazarus] How to write an efficient lexical scanner/parser?

leledumbo
In reply to this post by silvioprog
> However, I have a question: is there any article about 'how to write lexical parsers' using Object Pascal?

First, you need to distinguish the terms correctly: there is a lexical scanner (or simply lexer) and there is a parser, not a 'lexical parser' :)
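
To make the lexer half of that distinction concrete, here is a minimal sketch of a scanner for the `${...}` expressions from the first post. It is written in Python for brevity, not Object Pascal, and the token kinds, the `TOKEN_SPEC` table and the `lex` function are all assumptions invented for this example. The lexer only cuts the text into classified pieces; deciding whether those pieces form a valid expression is the parser's job. Whitespace is kept as a token and a namespaced name such as `fn:length` is treated as one token, which is one reading that reproduces the token counts given in the first post.

```python
import re

# Hypothetical token kinds; order matters because the first alternative that
# matches at the current position wins.
TOKEN_SPEC = [
    ("DOLLAR", r"\$"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("NUMBER", r"\d+"),
    ("STRING", r"'[^']*'"),
    ("NAME",   r"[A-Za-z_]\w*(?::[A-Za-z_]\w*)?"),  # optional "prefix:" part
    ("OP",     r"[-+*/.()]"),
    ("SPACE",  r"\s+"),
]

# One combined pattern with a named group per token kind.
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def lex(text):
    """Return a list of (kind, text) pairs; no grammar knowledge involved."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER_RE.match(text, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for source in ("${someVariable}", "${a + b}", "${fn:length('abc') * 3}"):
        print(len(lex(source)), lex(source))
```

A parser built on top of this would usually drop the `SPACE` tokens before looking at the rest.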

I've written some simple languages that generate GraphViz, LLVM or C code, learning on my own from Prof. Niklaus Wirth's articles, Jack Crenshaw's series, the legendary Dragon Book and many others, plus my compiler course in college.

I strongly recommend Prof. Niklaus Wirth's text (http://www.inf.ethz.ch/personal/wirth/, click the Compiler Construction link): it's concise and simple, doesn't rely on a generator tool (i.e. everything is easy to implement by hand), and still covers the most important parts of the subject. Although he doesn't use or stress abstract syntax trees, and even discourages them, he explains them well enough to understand.

Here's what I've written in the past that is publicly available:
- https://bitbucket.org/leledumbo/erd-maker (generates GraphViz code for ERD visualization)
- https://bitbucket.org/leledumbo/linguc (generates LLVM code that can be compiled as a library, for a mathematically provable language focused on database-driven applications)

I don't use the direct code generation approach anymore as I found the abstract syntax tree to be usable for more than just code generation.
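
A small sketch of why a tree is reusable, in Python for brevity rather than Object Pascal. Nothing here comes from the erd-maker or linguc sources; the `Num` and `BinOp` node types and the three functions are assumptions made up for this example. The point is that one tree, built once, can be walked by several independent passes: here an evaluator, a source printer and a GraphViz DOT emitter.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def evaluate(node):
    """Pass 1: compute the value of the expression."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    if node.op == "+":
        return left + right
    if node.op == "*":
        return left * right
    raise ValueError(f"unknown operator {node.op!r}")


def to_source(node):
    """Pass 2: print the tree back as fully parenthesized source text."""
    if isinstance(node, Num):
        return str(node.value)
    return f"({to_source(node.left)} {node.op} {to_source(node.right)})"


def to_dot(node):
    """Pass 3: emit GraphViz DOT text describing the tree."""
    lines = ["digraph ast {"]
    counter = 0

    def visit(current):
        nonlocal counter
        name = f"n{counter}"
        counter += 1
        label = str(current.value) if isinstance(current, Num) else current.op
        lines.append(f'  {name} [label="{label}"];')
        if isinstance(current, BinOp):
            for child in (current.left, current.right):
                child_name = visit(child)
                lines.append(f"  {name} -> {child_name};")
        return name

    visit(node)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The tree for 1 + 2 * 3, as a parser would build it.
    tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
    print(evaluate(tree))
    print(to_source(tree))
    print(to_dot(tree))
```

With direct code generation, each of these outputs would need its own parser; with a tree, adding another output means adding another function over the same nodes.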

Re: [Lazarus] How to write an efficient lexical scanner/parser?

Mehmet Erol Sanliturk




When there is NO license information in your repositories, this means that

"NO one can use them."

with respect to copyright laws or conventions.


Thank you very much.

Mehmet Erol Sanliturk




[Lazarus] [OT] Re: How to write an efficient lexical scanner/parser?

Lukasz Sokol
On 11/03/15 11:57, Mehmet Erol Sanliturk wrote:
[...]
> When there is NO license information in your repositories, this means that
>
> "NO one can use them."
>
> with respect to copyright laws or conventions.

Pointer/keyword/URL, please? I presume that, when it matters, one would ask the author first...?


el es



Re: [Lazarus] How to write an efficient lexical scanner/parser?

leledumbo
In reply to this post by Mehmet Erol Sanliturk
> When there is NO license information in your repositories, this means that
>
> "NO one can use them."
>
> with respect to copyright laws or conventions.

No idea what the default is by law or convention, but when I give no license, consider it public domain.

Re: [Lazarus] How to write an efficient lexical scanner/parser?

Virgo Pärna
On Wed, 11 Mar 2015 08:18:35 -0700 (MST), leledumbo <[hidden email]> wrote:
>
> No idea what the default is by law or convention, but when I give no
> license, consider it public domain.
>

By default, the author holds the copyright and the sole right to redistribute,
and that is what applies when nothing else is declared.

--
Virgo Pärna

Re: [Lazarus] How to write an efficient lexical scanner/parser?

Mark Morgan Lloyd
In reply to this post by leledumbo
leledumbo wrote:
>> When there is NO license information in your repositories, this means that
>>
>> "NO one can use them."
>>
>> with respect to copyright laws or conventions.
>
> No idea what the default is by law or convention, but when I give no
> license, consider it public domain.

I'd imagine that application of the law would vary depending on
jurisdiction, and it might be unintentionally affected by the stated
policy of wherever the files are posted.

However the /correct/ thing to do would be to look at the import list
and to adopt whatever license is implied by the libraries and
documentation sources that are referenced. If a body of code imports
(only) libraries which conform to any of the recognised free (as in
speech) licenses, then an unqualified "no one can use them" is not a
reasonable conclusion.

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


Re: [Lazarus] How to write an efficient lexical scanner/parser?

Mehmet Erol Sanliturk
In reply to this post by Virgo Pärna


On Wed, Mar 11, 2015 at 12:13 PM, Virgo Pärna <[hidden email]> wrote:
> By default, the author holds the copyright and the sole right to redistribute,
> and that is what applies when nothing else is declared.

Yes, the statement above is the correct understanding. Without an explicit declaration, nothing can be assumed to be "implicitly" permitted; i.e., permission must be explicit.


Mehmet Erol Sanliturk





Re: [Lazarus] [OT] Re: How to write an efficient lexical scanner/parser?

Mehmet Erol Sanliturk
In reply to this post by Lukasz Sokol


On Wed, Mar 11, 2015 at 7:58 AM, Lukasz Sokol <[hidden email]> wrote:
> Pointer/keyword/URL, please? I presume that, when it matters, one would ask the author first...?


http://en.wikipedia.org/wiki/Copyright
http://en.wikipedia.org/wiki/Copyright_infringement
http://en.wikipedia.org/wiki/Public_domain


http://en.wikipedia.org/wiki/Software_license
http://en.wikipedia.org/wiki/Public_domain_software
http://en.wikipedia.org/wiki/Software_copyright
http://en.wikipedia.org/wiki/License-free_software


"Personal" use is a different matter, because no one may know that a copy of a copyrighted work is being used, whether or not that use is permitted.

The problem arises with "redistribution", or when "derivative" works are "published".

One example is the following:

Under a US law, some of this collected software, published in the US before a certain year (I do not remember the year exactly, but it is around 1970), is in the "public domain" in the US.
The ACM has rejected this "public domain" status by invoking copyright law:

The more restrictive rules prevail, in favor of the copyright holder, even when the law contains both restrictive and permissive provisions.

In summary, under copyright law (with minor differences between countries, depending on which Berne Convention exceptions each has adopted), anything that is not explicitly permitted is assumed not to be permitted.


Mehmet Erol Sanliturk






Re: [Lazarus] How to write an efficient lexical scanner/parser?

silvioprog
In reply to this post by leledumbo

Buddy, thanks a lot for that information. I'll take a look at this! =)
