Generating Intermediate Code Using Syntax-Directed Translation in ANTLR
This question isn't so much about a specific problem as about a gap in my understanding.
I have this ANTLR grammar (a combined parser and lexer):
grammar Compiler;
prog
: Class Program '{' field_decls method_decls '}'
;
field_decls returns [String s1]
: field_decls field_decl ';'
{
$s1 = $field_decl.s2;
}
| field_decls inited_field_decl ';'
|
;
field_decl returns [String s2]
: field_decl ',' Ident
| field_decl ',' Ident '[' num ']'
| Type Ident
{
System.out.println($Ident.text);
$s2 = $Ident.text;
}
| Type Ident '[' num ']'
{
System.out.println($Ident.text+"["+"]");
$s2 = $Ident.text;
}
;
inited_field_decl
: Type Ident '=' literal
;
method_decls
: method_decls method_decl
|
;
method_decl
: Void Ident '(' params ')' block
| Type Ident '(' params ')' block
;
params
: Type Ident nextParams
|
;
nextParams
: ',' Type Ident nextParams
|
;
block
: '{' var_decls statements '}'
;
var_decls
: var_decls var_decl
|
;
var_decl
: Type Ident ';'
;
statements
: statement statements
|
;
statement
: location eqOp expr ';'
| If '(' expr ')' block
| If '(' expr ')' block Else block
| While '(' expr ')' statement
| Switch expr '{' cases '}'
| Ret ';'
| Ret '(' expr ')' ';'
| Brk ';'
| Cnt ';'
| block
| methodCall ';'
;
cases
: Case literal ':' statements cases
| Case literal ':' statements
;
methodCall
: Ident '(' args ')'
| Callout '(' Str calloutArgs ')'
;
args
: someArgs
|
;
someArgs
: someArgs ',' expr
| expr
;
calloutArgs
: calloutArgs ',' expr
| calloutArgs ',' Str
|
;
expr
: literal
| location
| '(' expr ')'
| SubOp expr
| '!' expr
| expr AddOp expr
| expr MulDiv expr
| expr SubOp expr
| expr RelOp expr
| expr AndOp expr
| expr OrOp expr
| methodCall
;
location
:Ident
| Ident '[' expr ']'
;
num
: DecNum
| HexNum
;
literal
: num
| Char
| BoolLit
;
eqOp
: '='
| AssignOp
;
//-----------------------------------------------------------------------------------------------------------
fragment Delim
: ' '
| '\t'
| '\n'
;
fragment Letter
: [a-zA-Z]
;
fragment Digit
: [0-9]
;
fragment HexDigit
: Digit
| [a-f]
| [A-F]
;
fragment Alpha
: Letter
| '_'
;
fragment AlphaNum
: Alpha
| Digit
;
WhiteSpace
: Delim+ -> skip
;
Char
: '\'' ~('\\') '\''
| '\'\\' . '\''
;
Str
:'"' ((~('\\' | '"')) | ('\\'.))* '"'
;
Class
: 'class'
;
Program
: 'Program'
;
Void
: 'void'
;
If
: 'if'
;
Else
: 'else'
;
While
: 'while'
;
Switch
: 'switch'
;
Case
: 'case'
;
Ret
: 'return'
;
Brk
: 'break'
;
Cnt
: 'continue'
;
Callout
: 'callout'
;
DecNum
: Digit+
;
HexNum
: '0x'HexDigit+
;
BoolLit
: 'true'
| 'false'
;
Type
: 'int'
| 'boolean'
;
Ident
: Alpha AlphaNum*
;
RelOp
: '<='
| '>='
| '<'
| '>'
| '=='
| '!='
;
AssignOp
: '+='
| '-='
;
MulDiv
: '*'
| '/'
| '%'
;
AddOp
: '+'
;
SubOp
: '-'
;
AndOp
: '&&'
;
OrOp
: '||'
;
Basically, we need to generate intermediate code using syntax-directed translation. As far as I know, this means we must add semantic rules to the parser grammar. We then need to write the generated output to .csv files.
So, we have three files: symbols.csv, symtable.csv and instructions.csv.
In symbols.csv, the format of each row is:
int id; //serial no. of symbol, unique
int tabid; //id no. of symbol table
string name; //symbol name
enum types {INT, CHAR, BOOL, STR, VOID, LABEL, INVALID} ty; //symbol type
enum scope {GLOBAL, LOCAL, CONST, INVALID} sc; //symbol scope
boolean isArray; //is it an array variable
int arrSize; //array size, if applicable
boolean isInited; //is initialized
union initVal {
int i;
boolean b;
} in; //initial value, if applicable
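As a rough illustration of the row format above, here is a minimal Java sketch of a symbol record that prints itself as one symbols.csv row. The names `Symbol`, `SymType`, `SymScope` and `toCsvRow` are hypothetical, and flattening the `initVal` union into a single `int` (booleans as 0/1) is an assumption, not part of the assignment.

```java
// Hypothetical names throughout; the field order follows the row format above.
enum SymType { INT, CHAR, BOOL, STR, VOID, LABEL, INVALID }

enum SymScope { GLOBAL, LOCAL, CONST, INVALID }

final class Symbol {
    final int id;           // serial no. of symbol, unique
    final int tabid;        // id of the symbol table that owns this symbol
    final String name;      // symbol name (or literal text for constants)
    final SymType type;
    final SymScope scope;
    final boolean isArray;
    final int arrSize;      // 0 when the symbol is not an array
    final boolean isInited;
    final int initVal;      // assumption: the union is stored as an int, booleans as 0/1

    Symbol(int id, int tabid, String name, SymType type, SymScope scope,
           boolean isArray, int arrSize, boolean isInited, int initVal) {
        this.id = id;
        this.tabid = tabid;
        this.name = name;
        this.type = type;
        this.scope = scope;
        this.isArray = isArray;
        this.arrSize = arrSize;
        this.isInited = isInited;
        this.initVal = initVal;
    }

    // One CSV row, comma-and-space separated with a trailing comma.
    String toCsvRow() {
        return id + ", " + tabid + ", " + name + ", " + type + ", " + scope + ", "
                + isArray + ", " + arrSize + ", " + isInited + ", " + initVal + ",";
    }
}

class SymbolDemo {
    public static void main(String[] args) {
        Symbol x = new Symbol(0, 0, "x", SymType.INT, SymScope.GLOBAL, false, 0, false, 0);
        System.out.println(x.toCsvRow());
    }
}
```

Keeping the row formatting inside the class means the grammar actions (or a tree walker) only have to create `Symbol` objects and never deal with CSV details.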
In symtable.csv, the format of each row is:
int id; //symbol table serial no., unique
int parent; //parent symbol table serial no.
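A symbol table row is just an id plus a parent id, so one possible sketch is a small registry that hands out serial numbers in creation order. `SymTable` and `SymTableRegistry` are hypothetical names, and using -1 as the parent of the outermost table is an assumption taken from the sample output later in this question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; one instance per scope (global, method body, nested block).
final class SymTable {
    final int id;      // symbol table serial no., unique
    final int parent;  // parent table id, -1 for the outermost table (assumption)

    SymTable(int id, int parent) {
        this.id = id;
        this.parent = parent;
    }

    String toCsvRow() {
        return id + ", " + parent + ",";
    }
}

final class SymTableRegistry {
    private final List<SymTable> tables = new ArrayList<>();

    // Creates a table whose id is the next serial number; pass null for the root.
    SymTable create(SymTable parent) {
        SymTable table = new SymTable(tables.size(), parent == null ? -1 : parent.id);
        tables.add(table);
        return table;
    }

    // Rows in creation order, ready to be written to the CSV file.
    List<String> toCsvRows() {
        List<String> rows = new ArrayList<>();
        for (SymTable table : tables) {
            rows.add(table.toCsvRow());
        }
        return rows;
    }
}

class SymTableDemo {
    public static void main(String[] args) {
        SymTableRegistry registry = new SymTableRegistry();
        SymTable global = registry.create(null);
        SymTable method = registry.create(global);
        registry.create(method); // nested block inside the method
        for (String row : registry.toCsvRows()) {
            System.out.println(row);
        }
    }
}
```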
In instructions.csv, the format of each row is:
int id; //serial no., unique
int res; //serial no. of result symbol
enum opcode {ADD, SUB, MUL, DIV, NEG, READ, WRITE, ASSIGN, GOTO, LT, GT, LE, GE, EQ, NE, PARAM, CALL, RET, LABEL} opc; //operation type
int op1; //serial no. of first operand symbol
int op2; //serial no. of second operand symbol
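For the instruction rows, a sketch along the same lines would be an append-only list with an `emit` method that returns the new instruction's serial number, which is handy when a later jump needs to refer back to it. `Instruction`, `InstructionList`, `emit` and `NONE` are hypothetical names; using -1 for an unused field is an assumption based on the sample output later in this question, and the trailing `#` comments from that sample are left out here.

```java
import java.util.ArrayList;
import java.util.List;

enum Opcode {
    ADD, SUB, MUL, DIV, NEG, READ, WRITE, ASSIGN, GOTO,
    LT, GT, LE, GE, EQ, NE, PARAM, CALL, RET, LABEL
}

// Hypothetical names; one object per row of instructions.csv.
final class Instruction {
    final int id;      // serial no., unique
    final int res;     // serial no. of result symbol
    final Opcode opc;  // operation type
    final int op1;     // serial no. of first operand symbol
    final int op2;     // serial no. of second operand symbol

    Instruction(int id, int res, Opcode opc, int op1, int op2) {
        this.id = id;
        this.res = res;
        this.opc = opc;
        this.op1 = op1;
        this.op2 = op2;
    }

    String toCsvRow() {
        return id + ", " + res + ", " + opc + ", " + op1 + ", " + op2 + ",";
    }
}

final class InstructionList {
    static final int NONE = -1; // assumption: -1 marks an unused field

    private final List<Instruction> code = new ArrayList<>();

    // Appends an instruction and returns its serial number.
    int emit(int res, Opcode opc, int op1, int op2) {
        int id = code.size();
        code.add(new Instruction(id, res, opc, op1, op2));
        return id;
    }

    List<String> toCsvRows() {
        List<String> rows = new ArrayList<>();
        for (Instruction instruction : code) {
            rows.add(instruction.toCsvRow());
        }
        return rows;
    }
}

class InstructionDemo {
    public static void main(String[] args) {
        InstructionList code = new InstructionList();
        code.emit(4, Opcode.ASSIGN, 3, InstructionList.NONE); // w = 0
        for (String row : code.toCsvRows()) {
            System.out.println(row);
        }
    }
}
```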
As an example, let's say we have this input:
class Program {
int x;
int y, z;
int w = 0;
void main (int n) {
int a;
a = 0;
while (a < n) {
int n;
n = a + 1;
a = n;
}
callout("printf", "n = %d\n", n);
return n;
}
}
symbols.csv should look like this:
0, 0, x, INT, GLOBAL, false, 0, false, 0,
1, 0, y, INT, GLOBAL, false, 0, false, 0,
2, 0, z, INT, GLOBAL, false, 0, false, 0,
3, 0, 0, INT, CONST, false, 0, false, 0,
4, 0, w, INT, GLOBAL, false, 0, true, 0,
5, 0, main, LABEL, GLOBAL, false, 0, false, 0,
6, 1, n, INT, LOCAL, false, 0, false, 0,
7, 1, a, INT, LOCAL, false, 0, false, 0,
8, 1, 0, INT, CONST, false, 0, false, 0,
9, 2, n, INT, LOCAL, false, 0, false, 0,
10, 2, 1, INT, CONST, false, 0, false, 0,
11, 1, "printf", STR, CONST, false, 0, false, 0,
12, 1, "n = %d\n", STR, CONST, false, 0, false, 0,
13, 1, 2, INT, CONST, false, 0, false, 0,
symtables.csv should look like this:
0, -1,
1, 0,
2, 1,
instructions.csv should look like this:
0, 4, ASSIGN, 3, -1, #w = 0
1, 5, LABEL, -1, -1, #main:
2, 7, ASSIGN, 8, -1, #a = 0
3, 5, LT, 7, 6, #if a<n goto 5
4, 8, GE, 7, 6, #iffalse a<n goto 8
5, 9, ADD, 7, 10, #n = a + 1
6, 7, ASSIGN, 9, -1, #a = n
7, 2, GOTO, -1, -1, #goto 3
8, -1, PARAM, 12, -1, #"n = %d\n"
9, -1, PARAM, 6, -1, #n
10, -1, CALL, 11, 13, #callout("printf", "n = %d\n", n);
11, -1, RET, 6, -1, # return n
Simply put, I am not sure where to start. I understand that I must add semantic rules to my parser grammar to produce output like the above. From my own research, I also gather that I need to create Java classes for my symbols, symbol table and symbol stack (symstack). I am very new to ANTLR and would appreciate it if someone experienced in ANTLR could point me in the right direction.
Thank you in advance for any help.
P.S. My lexer and parser are based on a tiny C-like language, which is posted below.
Tiny C-Like Language:
program
:'class Program {'field_decl* method_decl*'}'
field_decl
: type (id | id'['int_literal']') ( ',' id | id'['int_literal']')*';'
| type id '=' literal ';'
method_decl
: (type | 'void') id'('( (type id) ( ','type id)*)? ')'block
block
: '{'var_decl* statement*'}'
var_decl
: type id(','id)* ';'
type
: 'int'
| 'boolean'
statement
: location assign_op expr';'
| method_call';'
| 'if ('expr')' block ('else' block )?
| 'switch' expr '{'('case' literal ':' statement*)+'}'
| 'while (' expr ')' statement
| 'return' ( expr )? ';'
| 'break ;'
| 'continue ;'
| block
assign_op
: '='
| '+='
| '-='
method_call
: method_name '(' (expr ( ',' expr )*)? ')'
| 'callout (' string_literal ( ',' callout_arg )* ')'
method_name
: id
location
: id
| id '[' expr ']'
expr
: location
| method_call
| literal
| expr bin_op expr
| '-' expr
| '!' expr
| '(' expr ')'
callout_arg
: expr
| string_literal
bin_op
: arith_op
| rel_op
| eq_op
| cond_op
arith_op
: '+'
| '-'
| '*'
| '/'
| '%'
rel_op
: '<'
| '>'
| '<='
| '>='
eq_op
: '=='
| '!='
cond_op
: '&&'
| '||'
literal
: int_literal
| char_literal
| bool_literal
id
: alpha alpha_num*
alpha
: ['a'-'z''A'-'Z''_']
alpha_num
: alpha
| digit
digit
: ['0'-'9']
hex_digit
: digit
| ['a'-'f''A'-'F']
int_literal
: decimal_literal
| hex_literal
decimal_literal
: digit+
hex_literal
: '0x' hex_digit+
bool_literal
: 'true'
| 'false'
char_literal
: '‘'char'’'
string_literal
: '“'char*'”'
csv compiler-construction antlr
asked Nov 24 '18 at 6:40
J.Khelly
1 Answer
This depends on what version of ANTLR you're using:
- In ANTLR 3:
  - The most common approach was to use Tree Construction instructions to create a (modified) parse tree / AST, then walk through that tree as needed.
  - A less common approach in ANTLR 3 is to embed actions (in the target language) directly into grammar rules to capture and interpret the parsed input.
- In ANTLR 4, you use a Listener or a Visitor to process the parsed input.
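Whichever of these approaches you pick, the bookkeeping behind it looks much the same: push a scope when a block starts, pop it when the block ends, and resolve names from the innermost scope outwards. Below is a minimal plain-Java sketch of that idea. `Scope`, `ScopeStack`, `enterScope` and `exitScope` are hypothetical names, not ANTLR API; with an ANTLR 4 listener you would call them from the generated `enterBlock`/`exitBlock` callbacks for the `block` rule, and with embedded actions from the rule's action code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper classes; nothing here depends on ANTLR itself.
final class Scope {
    final int id;
    final Scope parent; // null for the outermost scope
    private final Map<String, Integer> symbolIds = new HashMap<>();

    Scope(int id, Scope parent) {
        this.id = id;
        this.parent = parent;
    }

    void declare(String name, int symbolId) {
        symbolIds.put(name, symbolId);
    }

    // Looks in this scope first, then in each enclosing scope; null if undeclared.
    Integer resolve(String name) {
        for (Scope s = this; s != null; s = s.parent) {
            Integer found = s.symbolIds.get(name);
            if (found != null) {
                return found;
            }
        }
        return null;
    }
}

final class ScopeStack {
    private final Deque<Scope> stack = new ArrayDeque<>();
    private int nextScopeId = 0;
    private int nextSymbolId = 0;

    // Call when a block (or the program, or a method) starts.
    void enterScope() {
        stack.push(new Scope(nextScopeId++, stack.peek()));
    }

    // Call when that block ends.
    void exitScope() {
        stack.pop();
    }

    // Declares a name in the innermost scope and returns its new serial number.
    int declare(String name) {
        int id = nextSymbolId++;
        stack.peek().declare(name, id);
        return id;
    }

    Integer resolve(String name) {
        return stack.isEmpty() ? null : stack.peek().resolve(name);
    }
}

class ScopeStackDemo {
    public static void main(String[] args) {
        ScopeStack scopes = new ScopeStack();
        scopes.enterScope();            // global scope
        scopes.declare("x");
        scopes.enterScope();            // method scope
        int outerN = scopes.declare("n");
        scopes.enterScope();            // nested block
        int innerN = scopes.declare("n");
        System.out.println(scopes.resolve("n") == innerN); // inner n shadows outer n
        scopes.exitScope();
        System.out.println(scopes.resolve("n") == outerN); // outer n is visible again
    }
}
```

The serial numbers in this sketch cover declared names only; the asker's expected output also numbers constants as symbols, which would need an extra `declare`-like call for literals.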
answered Nov 25 '18 at 21:09
Jiri Tousek