Mohamed MERZAK edited this page Jul 11, 2020 · 10 revisions

LexicalAnalyser

LexicalAnalyser is a lexer (tokenizer) that determines whether a source file is accepted by a given deterministic finite-state automaton (DFSA). It outputs each lexeme with its corresponding token type, or reports that the lexeme is not accepted by the given DFSA.
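To illustrate what "accepted by a DFSA" means, here is a minimal Python sketch of DFA acceptance. It is not LexicalAnalyser's C# implementation. The toy automaton (initial state 0, final state 1) is an assumption modelled on the identifier transitions of the example automaton further down, and `accepts` is a hypothetical name.

```python
import string

# Toy DFA for identifiers: a letter or '_' first, then letters, digits or '_'.
LETTERS = set(string.ascii_lowercase) | {"_"}
DIGITS = set(string.digits)

# Transition function as a dict: (state, symbol) -> next state.
TRANSITIONS = {}
for ch in LETTERS:
    TRANSITIONS[(0, ch)] = 1
    TRANSITIONS[(1, ch)] = 1
for ch in DIGITS:
    TRANSITIONS[(1, ch)] = 1

INITIAL_STATE = 0
FINAL_STATES = {1}


def accepts(word: str) -> bool:
    """Return True if the DFA ends in a final state after reading `word`."""
    state = INITIAL_STATE
    for symbol in word:
        nxt = TRANSITIONS.get((state, symbol))
        if nxt is None:  # no transition defined for this symbol: reject
            return False
        state = nxt
    return state in FINAL_STATES


print(accepts("var_1"))  # True
print(accepts("1var"))   # False
```

A lexer repeats this idea on every lexeme of the source file, which is why each token type has to be tied to specific final states.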

Automaton

Since LexicalAnalyser identifies each token by the final state the automaton stops in, the automaton's final states must follow this mapping:

Token      Final state(s)
ID         1
KEYWORD    12
ARTH_OP    2, 11
REL_OP     5, 9, 10
STRING     13
INT        4
REAL       3, 8
COMMENT    14
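The rule amounts to a fixed lookup from final state to token type. A minimal Python sketch follows; the names `FINAL_STATE_TOKENS` and `token_type` are hypothetical and not taken from the project's C# source. A lexeme that ends in a state missing from the table (state 7 below) cannot be classified, which is why an automaton must use these exact state numbers.

```python
from typing import Optional

# Final state -> token type, copied from the table above.
FINAL_STATE_TOKENS = {
    1: "ID",
    12: "KEYWORD",
    2: "ARTH_OP", 11: "ARTH_OP",
    5: "REL_OP", 9: "REL_OP", 10: "REL_OP",
    13: "STRING",
    4: "INT",
    3: "REAL", 8: "REAL",
    14: "COMMENT",
}


def token_type(final_state: int) -> Optional[str]:
    """Return the token type for a final state, or None if the state is unmapped."""
    return FINAL_STATE_TOKENS.get(final_state)


for state in (1, 11, 8, 7):
    print(state, token_type(state))
```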

Automaton's file format

Template:

Number of states
Alphabet
Initial state
Final states, separated by spaces
Language operators, separated by spaces
Language keywords, separated by spaces
Transitions {StartState Symbol EndState} [from line 8 to the end of the file]
Comments {# Comment} [allowed from line 8 onwards]
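A reader for this format could look like the Python sketch below. Several details are assumptions, not documented behaviour: alphabet symbols are taken to be space-separated (falling back to one symbol per character), line 7 is skipped because the template lists only six header lines while transitions start on line 8, and `Automaton` / `parse_automaton` are hypothetical names rather than the project's C# types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# 1-indexed line where transitions begin, per the template above.
FIRST_TRANSITION_LINE = 8


@dataclass
class Automaton:  # hypothetical container, not the project's C# type
    state_count: int
    alphabet: List[str]
    initial_state: int
    final_states: Set[int]
    operators: List[str]
    keywords: List[str]
    transitions: Dict[Tuple[int, str], int] = field(default_factory=dict)


def parse_automaton(text: str) -> Automaton:
    lines = text.splitlines()
    if len(lines) < 6:
        raise ValueError("expected at least six header lines")
    # Assumption: alphabet symbols are space-separated; otherwise one per character.
    alphabet_line = lines[1].strip()
    alphabet = alphabet_line.split() if " " in alphabet_line else list(alphabet_line)
    automaton = Automaton(
        state_count=int(lines[0]),
        alphabet=alphabet,
        initial_state=int(lines[2]),
        final_states={int(s) for s in lines[3].split()},
        operators=lines[4].split(),
        keywords=lines[5].split(),
    )
    # Line 7 is not described by the template, so this sketch ignores it.
    for number, raw in enumerate(lines[FIRST_TRANSITION_LINE - 1:],
                                 start=FIRST_TRANSITION_LINE):
        line = raw.strip()
        if not line or line.startswith("#"):  # blank line or comment
            continue
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {number}: expected 'StartState Symbol EndState'")
        start, symbol, end = parts
        automaton.transitions[(int(start), symbol)] = int(end)
    return automaton


SAMPLE = "\n".join([
    "2",          # number of states
    "a b _ 0 1",  # alphabet
    "0",          # initial state
    "1",          # final states
    "+ -",        # operators
    "BEGIN END",  # keywords
    "",           # line 7 (ignored by this sketch)
    "# identifiers",
    "0 a 1",
    "1 0 1",
])

automaton = parse_automaton(SAMPLE)
print(automaton.transitions)
```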

Examples:

Usage

Requirements

Make sure the .NET Core runtime is installed before running LexicalAnalyser. If it is not, download it from: https://aka.ms/dotnet-core-applaunch

merzak-x@PR3C1S10N:~$ ./LexicalAnalyser SimpleLanguageAutomaton.test SourceCode.test
Output result
merzak-x@PR3C1S10N:~$ ./LexicalAnalyser SimpleLanguageAutomaton.test SourceCode.test

Automaton [SimpleLanguageAutomaton] : 

E = {0, 1, 4, 2, 15, 3, 6, 7, 8, 5, 9, 10, 11} ; 

A = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, _, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, <, >, (, ), ", +, -, *, /, =} ; 

q₀ = 0 ; 

F = {1, 2, 3, 4, 5, 8, 9, 10, 11} ;

                                        
✓ <KEYWORD,{BEGIN}>
✓ <ID,{var}>
✓ <REL_OP,{=}>
✓ <STRING,{some quite long string with ¢ħæræŧ€rß +°c §.-?}>
✓ <KEYWORD,{IF}>
✓ <REAL,{13.4}>
✓ <REL_OP,{>=}>
✓ <INT,{77}>
✓ <KEYWORD,{THEN}>
✓ <ID,{variable}>
✓ <REL_OP,{=}>
✓ <INT,{99}>
✓ <ARTH_OP,{-}>
✓ <INT,{98}>
✓ <KEYWORD,{ELSE}>
✓ <ID,{var}>
✓ <REL_OP,{=}>
✓ <REAL,{17.54E^485512}>
✓ <INT,{1}>
✓ <INT,{-2}>
✓ <INT,{9}>
✓ <REL_OP,{=}>
✓ <INT,{8}>
✓ <INT,{8}>
✓ <ARTH_OP,{/}>
✓ <INT,{9}>
✓ <ID,{var}>
✓ <ARTH_OP,{*}>
✓ <INT,{-1}>
✓ <REL_OP,{<=}>
✓ <INT,{77}>
✓ <ARTH_OP,{/}>
✓ <INT,{18}>
✓ <ID,{test}>
✓ <REL_OP,{=}>
✓ <INT,{99}>
✓ <ARTH_OP,{+}>
✓ <INT,{98}>
✓ <ARTH_OP,{+}>
✓ <INT,{1}>
✓ <KEYWORD,{END}>

✓ The source file `SourceCode.test` is accepted by the automaton's described language !


Process finished with exit code 0.
  
Output result with -q
merzak-x@PR3C1S10N:~$ dotnet LexicalAnalyser.dll "/home/merzak-x/EMSI/C#/Projects/LexicalAnalyser/lib/examples/SimpleLanguageAutomaton.test" "/home/merzak-x/EMSI/C#/Projects/LexicalAnalyser/lib/examples/SourceCode.test" -q


✓ <KEYWORD,{BEGIN}>
✓ <ID,{var}>
✓ <REL_OP,{=}>
✓ <STRING,{some quite long string with ¢ħæræŧ€rß +°c §.-?}>
✓ <KEYWORD,{IF}>
✓ <REAL,{13.4}>
✓ <REL_OP,{>=}>
✓ <INT,{77}>
✓ <KEYWORD,{THEN}>
✓ <ID,{variable}>
✓ <REL_OP,{=}>
✓ <INT,{99}>
✓ <ARTH_OP,{-}>
✓ <INT,{98}>
✓ <KEYWORD,{ELSE}>
✓ <ID,{var}>
✓ <REL_OP,{=}>
✓ <REAL,{17.54E^485512}>
✓ <INT,{1}>
✓ <INT,{-2}>
✓ <INT,{9}>
✓ <REL_OP,{=}>
✓ <INT,{8}>
✓ <INT,{8}>
✓ <ARTH_OP,{/}>
✓ <INT,{9}>
✓ <ID,{var}>
✓ <ARTH_OP,{*}>
✓ <INT,{-1}>
✓ <REL_OP,{<=}>
✓ <INT,{77}>
✓ <ARTH_OP,{/}>
✓ <INT,{18}>
✓ <ID,{test}>
✓ <REL_OP,{=}>
✓ <INT,{99}>
✓ <ARTH_OP,{+}>
✓ <INT,{98}>
✓ <ARTH_OP,{+}>
✓ <INT,{1}>
✓ <KEYWORD,{END}>

✓ The source file `SourceCode.test` is accepted by the automaton's described language !


Process finished with exit code 0.

Output result with -v
merzak-x@PR3C1S10N:~$ dotnet LexicalAnalyser.dll "/home/merzak-x/EMSI/C#/Projects/LexicalAnalyser/lib/examples/SimpleLanguageAutomaton.test" "/home/merzak-x/EMSI/C#/Projects/LexicalAnalyser/lib/examples/SourceCode.test" -v

Automaton [SimpleLanguageAutomaton] : 

E = {0, 1, 4, 2, 15, 3, 6, 7, 8, 5, 9, 10, 11} ; 

A = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, _, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, <, >, (, ), ", +, -, *, /, =} ; 

q₀ = 0 ; 

F = {1, 2, 3, 4, 5, 8, 9, 10, 11} ;

Source code : 

‎`‎`‎`‎
(* test comment *)
BEGIN
var = "some quite long string with ¢ħæræŧ€rß +°c §.-?";
IF 13.4 >= 77
THEN
    variable = 99 - 98;
ELSE
    var=17.54E^485512
    1-2;9=8;8/9;
    var*-1<=77/18;
    test=99+98+1;
END
‎`‎`‎`‎


✓ <KEYWORD,{BEGIN}>
✓ <ID,{var}>
✓ <REL_OP,{=}>
✓ <STRING,{some quite long string with ¢ħæræŧ€rß +°c §.-?}>
✓ <KEYWORD,{IF}>
✓ <REAL,{13.4}>
✓ <REL_OP,{>=}>
✓ <INT,{77}>
✓ <KEYWORD,{THEN}>
✓ <ID,{variable}>
✓ <REL_OP,{=}>
✓ <INT,{99}>
✓ <ARTH_OP,{-}>
✓ <INT,{98}>
✓ <KEYWORD,{ELSE}>
✓ <ID,{var}>
✓ <REL_OP,{=}>
✓ <REAL,{17.54E^485512}>
✓ <INT,{1}>
✓ <INT,{-2}>
✓ <INT,{9}>
✓ <REL_OP,{=}>
✓ <INT,{8}>
✓ <INT,{8}>
✓ <ARTH_OP,{/}>
✓ <INT,{9}>
✓ <ID,{var}>
✓ <ARTH_OP,{*}>
✓ <INT,{-1}>
✓ <REL_OP,{<=}>
✓ <INT,{77}>
✓ <ARTH_OP,{/}>
✓ <INT,{18}>
✓ <ID,{test}>
✓ <REL_OP,{=}>
✓ <INT,{99}>
✓ <ARTH_OP,{+}>
✓ <INT,{98}>
✓ <ARTH_OP,{+}>
✓ <INT,{1}>
✓ <KEYWORD,{END}>

✓ The source file `SourceCode.test` is accepted by the automaton's described language !


Process finished with exit code 0.

Output result with -vv
merzak-x@PR3C1S10N:~$ dotnet LexicalAnalyser.dll "/home/merzak-x/EMSI/C#/Projects/LexicalAnalyser/lib/examples/SimpleLanguageAutomaton.test" "/home/merzak-x/EMSI/C#/Projects/LexicalAnalyser/lib/examples/SourceCode.test" -vv

Automaton [SimpleLanguageAutomaton] : 

E = {0, 1, 4, 2, 15, 3, 6, 7, 8, 5, 9, 10, 11} ; 

A = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, _, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, <, >, (, ), ", +, -, *, /, =} ; 

Transitions: {
        σ(0, a) = 1
        σ(0, b) = 1
        σ(0, c) = 1
        σ(0, d) = 1
        σ(0, e) = 1
        σ(0, f) = 1
        σ(0, g) = 1
        σ(0, h) = 1
        σ(0, i) = 1
        σ(0, j) = 1
        σ(0, k) = 1
        σ(0, l) = 1
        σ(0, m) = 1
        σ(0, n) = 1
        σ(0, o) = 1
        σ(0, p) = 1
        σ(0, q) = 1
        σ(0, r) = 1
        σ(0, s) = 1
        σ(0, t) = 1
        σ(0, u) = 1
        σ(0, v) = 1
        σ(0, w) = 1
        σ(0, x) = 1
        σ(0, y) = 1
        σ(0, z) = 1
        σ(0, _) = 1
        σ(1, 0) = 1
        σ(1, 1) = 1
        σ(1, 2) = 1
        σ(1, 3) = 1
        σ(1, 4) = 1
        σ(1, 5) = 1
        σ(1, 6) = 1
        σ(1, 7) = 1
        σ(1, 8) = 1
        σ(1, 9) = 1
        σ(1, a) = 1
        σ(1, b) = 1
        σ(1, c) = 1
        σ(1, d) = 1
        σ(1, e) = 1
        σ(1, f) = 1
        σ(1, g) = 1
        σ(1, h) = 1
        σ(1, i) = 1
        σ(1, j) = 1
        σ(1, k) = 1
        σ(1, l) = 1
        σ(1, m) = 1
        σ(1, n) = 1
        σ(1, o) = 1
        σ(1, p) = 1
        σ(1, q) = 1
        σ(1, r) = 1
        σ(1, s) = 1
        σ(1, t) = 1
        σ(1, u) = 1
        σ(1, v) = 1
        σ(1, w) = 1
        σ(1, x) = 1
        σ(1, y) = 1
        σ(1, z) = 1
        σ(1, _) = 1
        σ(0, 0) = 4
        σ(0, 1) = 4
        σ(0, 2) = 4
        σ(0, 3) = 4
        σ(0, 4) = 4
        σ(0, 5) = 4
        σ(0, 6) = 4
        σ(0, 7) = 4
        σ(0, 8) = 4
        σ(0, 9) = 4
        σ(0, -) = 2
        σ(2, 0) = 4
        σ(2, 1) = 4
        σ(2, 2) = 4
        σ(2, 3) = 4
        σ(2, 4) = 4
        σ(2, 5) = 4
        σ(2, 6) = 4
        σ(2, 7) = 4
        σ(2, 8) = 4
        σ(2, 9) = 4
        σ(4, 0) = 4
        σ(4, 1) = 4
        σ(4, 2) = 4
        σ(4, 3) = 4
        σ(4, 4) = 4
        σ(4, 5) = 4
        σ(4, 6) = 4
        σ(4, 7) = 4
        σ(4, 8) = 4
        σ(4, 9) = 4
        σ(4, .) = 15
        σ(15, 0) = 3
        σ(15, 1) = 3
        σ(15, 2) = 3
        σ(15, 3) = 3
        σ(15, 4) = 3
        σ(15, 5) = 3
        σ(15, 6) = 3
        σ(15, 7) = 3
        σ(15, 8) = 3
        σ(15, 9) = 3
        σ(3, 0) = 3
        σ(3, 1) = 3
        σ(3, 2) = 3
        σ(3, 3) = 3
        σ(3, 4) = 3
        σ(3, 5) = 3
        σ(3, 6) = 3
        σ(3, 7) = 3
        σ(3, 8) = 3
        σ(3, 9) = 3
        σ(3, e) = 6
        σ(6, ^) = 7
        σ(7, 0) = 8
        σ(7, 1) = 8
        σ(7, 2) = 8
        σ(7, 3) = 8
        σ(7, 4) = 8
        σ(7, 5) = 8
        σ(7, 6) = 8
        σ(7, 7) = 8
        σ(7, 8) = 8
        σ(7, 9) = 8
        σ(8, 0) = 8
        σ(8, 1) = 8
        σ(8, 2) = 8
        σ(8, 3) = 8
        σ(8, 4) = 8
        σ(8, 5) = 8
        σ(8, 6) = 8
        σ(8, 7) = 8
        σ(8, 8) = 8
        σ(8, 9) = 8
        σ(0, =) = 5
        σ(0, <) = 9
        σ(0, >) = 10
        σ(9, =) = 5
        σ(9, >) = 5
        σ(10, =) = 5
        σ(0, +) = 11
        σ(0, *) = 11
        σ(0, /) = 11
} ; 

q₀ = 0 ; 

F = {1, 2, 3, 4, 5, 8, 9, 10, 11} ;

Source code : 

‎`‎`‎`‎
(* test comment *)
BEGIN
var = "some quite long string with ¢ħæræŧ€rß +°c §.-?";
IF 13.4 >= 77
THEN
    variable = 99 - 98;
ELSE
    var=17.54E^485512
    1-2;9=8;8/9;
    var*-1<=77/18;
    test=99+98+1;
END
‎`‎`‎`‎

✓ <KEYWORD,{BEGIN}>
✓ <ID,{var}>
✓ <REL_OP,{=}>
✓ <STRING,{some quite long string with ¢ħæræŧ€rß +°c §.-?}>
✓ <KEYWORD,{IF}>
✓ <REAL,{13.4}>
✓ <REL_OP,{>=}>
✓ <INT,{77}>
✓ <KEYWORD,{THEN}>
✓ <ID,{variable}>
✓ <REL_OP,{=}>
✓ <INT,{99}>
✓ <ARTH_OP,{-}>
✓ <INT,{98}>
✓ <KEYWORD,{ELSE}>
✓ <ID,{var}>
✓ <REL_OP,{=}>
✓ <REAL,{17.54E^485512}>
✓ <INT,{1}>
✓ <INT,{-2}>
✓ <INT,{9}>
✓ <REL_OP,{=}>
✓ <INT,{8}>
✓ <INT,{8}>
✓ <ARTH_OP,{/}>
✓ <INT,{9}>
✓ <ID,{var}>
✓ <ARTH_OP,{*}>
✓ <INT,{-1}>
✓ <REL_OP,{<=}>
✓ <INT,{77}>
✓ <ARTH_OP,{/}>
✓ <INT,{18}>
✓ <ID,{test}>
✓ <REL_OP,{=}>
✓ <INT,{99}>
✓ <ARTH_OP,{+}>
✓ <INT,{98}>
✓ <ARTH_OP,{+}>
✓ <INT,{1}>
✓ <KEYWORD,{END}>

✓ The source file `SourceCode.test` is accepted by the automaton's described language !


Process finished with exit code 0.
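The transitions printed by `-vv` are enough to reproduce the token stream above. The Python sketch below rebuilds that transition table and applies longest-match (maximal munch) tokenization, backtracking to the last final state reached. It is an assumed reconstruction, not the tool's actual algorithm: it treats whitespace and `;` as separators, lowercases each character before the lookup (the table only has lowercase letters and a lowercase `e`, yet `BEGIN` and `17.54E^485512` are recognised), does not handle strings or comments, and uses a hypothetical keyword set taken from the output rather than from the automaton file.

```python
import string
from typing import Dict, List, Tuple

LETTERS = string.ascii_lowercase + "_"
DIGITS = string.digits
TRANSITIONS: Dict[Tuple[int, str], int] = {}


def add(src: int, symbols: str, dst: int) -> None:
    """Add sigma(src, s) = dst for every symbol s in `symbols`."""
    for s in symbols:
        TRANSITIONS[(src, s)] = dst


# Transitions copied from the -vv output above.
add(0, LETTERS, 1)
add(1, DIGITS + LETTERS, 1)
add(0, DIGITS, 4)
add(0, "-", 2)
add(2, DIGITS, 4)
add(4, DIGITS, 4)
add(4, ".", 15)
add(15, DIGITS, 3)
add(3, DIGITS, 3)
add(3, "e", 6)
add(6, "^", 7)
add(7, DIGITS, 8)
add(8, DIGITS, 8)
add(0, "=", 5)
add(0, "<", 9)
add(0, ">", 10)
add(9, "=", 5)
add(9, ">", 5)
add(10, "=", 5)
add(0, "+*/", 11)

FINAL_STATES = {1, 2, 3, 4, 5, 8, 9, 10, 11}
TOKEN_TYPES = {1: "ID", 2: "ARTH_OP", 11: "ARTH_OP", 5: "REL_OP", 9: "REL_OP",
               10: "REL_OP", 4: "INT", 3: "REAL", 8: "REAL"}
KEYWORDS = {"begin", "if", "then", "else", "end"}  # hypothetical, taken from the output


def tokenize(text: str) -> List[Tuple[str, str]]:
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace() or text[i] == ";":  # assumed separators
            i += 1
            continue
        state, j, last_final = 0, i, None
        # Walk the DFA as far as possible, remembering the last final state.
        while j < len(text):
            nxt = TRANSITIONS.get((state, text[j].lower()))
            if nxt is None:
                break
            state, j = nxt, j + 1
            if state in FINAL_STATES:
                last_final = (j, state)
        if last_final is None:
            raise ValueError(f"lexeme starting at index {i} is not accepted")
        end, final_state = last_final
        lexeme = text[i:end]
        kind = TOKEN_TYPES[final_state]
        if kind == "ID" and lexeme.lower() in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
        i = end  # resume right after the longest accepted lexeme
    return tokens


for token in tokenize("var*-1<=77/18;"):
    print(token)
```

Backtracking to the last final state is what splits `1-2` into `1` and `-2`: state 4 has no transition on `-`, so the first lexeme ends there and `-2` starts a new one from state 0.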

LexicalAnalyser v1.4: https://github.com/MERZAK-X/LexicalAnalyser

Usage: dotnet LexicalAnalyser.dll [[Automaton] [Sourcecode]] [-v|-vv|-q] [--help]

    Arguments:
        Automaton       Path to the Automaton's file
        Source          Path to the source code file to be analysed
    Options:
        -v, -vv         Verbose level, 1 or 2 respectively, if not set 0
        -q              Quiet (verbose level -1), only display results
        --help          Display this help and exit
        
Examples:
    ./LexicalAnalyser SimpleLanguageAutomaton.test SourceCode.test -v
    dotnet LexicalAnalyser.dll lib/examples/SimpleLanguageAutomaton.test lib/examples/SourceCode.test

Documentation: https://git.io/JfNf4

Copyright (C) 2020 "NUL-X"
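If you want to call the analyser from a script, the options above map onto a short argument list. The Python sketch below is hypothetical (`build_command` and `run` are not part of the project); it only encodes the verbose levels from the help text and assumes `dotnet` is on the PATH when the command is actually executed.

```python
import subprocess
from typing import List

# Verbose level -> flags, as described in the help text:
# -1 is quiet (-q), 0 is the default (no flag), 1 is -v, 2 is -vv.
VERBOSITY_FLAGS = {-1: ["-q"], 0: [], 1: ["-v"], 2: ["-vv"]}


def build_command(automaton: str, source: str, verbosity: int = 0,
                  dll: str = "LexicalAnalyser.dll") -> List[str]:
    """Build the argument list for `dotnet LexicalAnalyser.dll ...`."""
    if verbosity not in VERBOSITY_FLAGS:
        raise ValueError(f"unsupported verbose level: {verbosity}")
    return ["dotnet", dll, automaton, source, *VERBOSITY_FLAGS[verbosity]]


def run(automaton: str, source: str, verbosity: int = 0) -> int:
    """Run the analyser and return its exit code (requires the .NET Core runtime)."""
    completed = subprocess.run(build_command(automaton, source, verbosity))
    return completed.returncode


cmd = build_command("lib/examples/SimpleLanguageAutomaton.test",
                    "lib/examples/SourceCode.test", verbosity=1)
print(" ".join(cmd))
```

Passing a list rather than a single string to `subprocess.run` keeps paths with spaces intact without shell quoting.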

Developer

MERZAK-X