[C#] Algorithm for parsing if-else-if statements

https://www.coder.work/article/1591477

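I am trying to create a very simple parser for an if-else type structure that will build and execute a SQL statement.
Rather than testing conditions to decide which statements to execute, I want to test conditions to decide which strings to generate.
For example: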

select column1
from
#if(VariableA = Case1)
    table1
#else if(VariableA = Case2)
    table2
#else
    defaultTable
#end

If VariableA equals Case1, the resulting string should be: select column1 from table1
A more complex example is a nested if statement:

select column1
from
#if(VariableA = Case1)
    #if(VariableB = Case3)
        table3
    #else
        table4
#else if(VariableA = Case2)
    table2
#else
    defaultTable
#end

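This is where I am really having trouble: I cannot think of a good way to correctly identify each if-else-end group.
I am also not sure of a good way to keep track of whether the string in an "else" clause should be included.
I have been looking around the web at different kinds of parsing algorithms, and they all seem very abstract and complex.
Are there any suggestions on a good place to start for someone who is not a computer science major?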

Best answer

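I wrote a simple parser and tested it against the example you provided. If you want to learn more about parsing, I recommend reading Niklaus Wirth's Compiler Construction.
The first step is always to write down the syntax of your language in an appropriate way. I chose EBNF, which is easy to understand.
| separates alternatives.
[] encloses options.
{} denotes repetition (zero, one, or more times).
() groups expressions (not used here).
This description is incomplete, but the link I provided describes it in more detail.
EBNF syntax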

LineSequence = { TextLine | IfStatement }.
TextLine = <string>.
IfStatement = IfLine LineSequence { ElseIfLine LineSequence } [ ElseLine LineSequence ] EndLine.
IfLine = "#if" "(" Condition ")".
ElseLine = "#else".
ElseIfLine = "#else" "if" "(" Condition ")".
EndLine = "#end".
Condition = Identifier "=" Identifier.
Identifier = <letter_or_underline> { <letter_or_underline> | <digit> }.

The parser follows the grammar strictly: a repetition is translated into a loop, an alternative into an if-else statement, and so on.
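
As a small illustration of that mapping, the sketch below hand-codes only the Identifier rule from the grammar. The alternative becomes an either/or test and the { ... } repetition becomes a while loop. The names IdentifierRule and TryParse are hypothetical and are not part of the parser that follows, and the sketch assumes that "letter" means an ASCII letter.

```csharp
using System;

static class IdentifierRule
{
    // <letter_or_underline> (assumption: ASCII letters only)
    static bool IsLetterOrUnderline(char c) =>
        c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

    // <digit>
    static bool IsDigit(char c) => c >= '0' && c <= '9';

    // EBNF: Identifier = <letter_or_underline> { <letter_or_underline> | <digit> }.
    // Tries to read an identifier from text starting at pos. On success,
    // pos is advanced past the identifier and name receives its text.
    public static bool TryParse(string text, ref int pos, out string name)
    {
        name = null;
        int start = pos;

        // A mandatory element becomes a plain check.
        if (pos >= text.Length || !IsLetterOrUnderline(text[pos])) {
            return false;
        }
        pos++;

        // "{ ... }" becomes a loop; "|" becomes an either/or test.
        while (pos < text.Length &&
               (IsLetterOrUnderline(text[pos]) || IsDigit(text[pos]))) {
            pos++;
        }

        name = text.Substring(start, pos - start);
        return true;
    }
}
```

For instance, calling IdentifierRule.TryParse("VariableA = Case1", ref pos, out name) with pos = 0 returns true, sets name to VariableA, and leaves pos at 9, the space before the equals sign.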

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Windows.Forms;

namespace Example.SqlPreprocessor
{
    class Parser
    {
        enum Symbol
        {
            None,
            LPar,
            RPar,
            Equals,
            Text,
            NumberIf,
            If,
            NumberElse,
            NumberEnd,
            Identifier
        }

        List<string> _input; // Raw SQL with preprocessor directives.
        int _currentLineIndex = 0;

        // Simulates variables used in conditions.
        Dictionary<string, string> _variableValues = new Dictionary<string, string> {
            { "VariableA", "Case1" },
            { "VariableB", "CaseX" }
        };

        Symbol _sy; // Current symbol.
        string _string; // Identifier or text line.
        Queue<string> _textQueue = new Queue<string>(); // Buffered text parts of a single line.
        int _lineNo; // Current line number for error messages.
        string _line; // Current line for error messages.

        /// <summary>
        /// Get the next line from the input.
        /// </summary>
        /// <returns>Input line or null if no more lines are available.</returns>
        string GetLine()
        {
            if (_currentLineIndex >= _input.Count) {
                return null;
            }
            _line = _input[_currentLineIndex++];
            _lineNo = _currentLineIndex;
            return _line;
        }

        /// <summary>
        /// Gets the next symbol from the input stream and stores it in _sy.
        /// </summary>
        void GetSy()
        {
            string s;
            if (_textQueue.Count > 0) { // Buffered text parts available, use one of these.
                s = _textQueue.Dequeue();
                switch (s.ToLower()) {
                    case "(":
                        _sy = Symbol.LPar;
                        break;
                    case ")":
                        _sy = Symbol.RPar;
                        break;
                    case "=":
                        _sy = Symbol.Equals;
                        break;
                    case "if":
                        _sy = Symbol.If;
                        break;
                    default:
                        _sy = Symbol.Identifier;
                        _string = s;
                        break;
                }
                return;
            }

            // Get next line from input.
            s = GetLine();
            if (s == null) {
                _sy = Symbol.None;
                return;
            }

            s = s.Trim(' ', '\t');
            // The length check keeps an empty line from throwing on s[0].
            if (s.Length > 0 && s[0] == '#') { // We have a preprocessor directive.
                // Split the line in order to be able to get its symbols.
                string[] parts = Regex.Split(s, @"\b|[^#_a-zA-Z0-9()=]");
                // parts[0] = #
                // parts[1] = if, else, end (missing if the line is just "#")
                string directive = parts.Length > 1 ? parts[1] : "";
                switch (directive.ToLower()) {
                    case "if":
                        _sy = Symbol.NumberIf;
                        break;
                    case "else":
                        _sy = Symbol.NumberElse;
                        break;
                    case "end":
                        _sy = Symbol.NumberEnd;
                        break;
                    default:
                        Error("Invalid symbol #{0}", directive);
                        return; // Do not buffer the rest of an invalid line.
                }

                // Store the remaining parts for later.
                for (int i = 2; i < parts.Length; i++) {
                    string part = parts[i].Trim(' ', '\t');
                    if (part != "") {
                        _textQueue.Enqueue(part);
                    }
                }
            } else { // We have an ordinary SQL text line.
                _sy = Symbol.Text;
                _string = s;
            }
        }

        void Error(string message, params object[] args)
        {
            // Make sure parsing stops here.
            _sy = Symbol.None;
            _textQueue.Clear();
            _input.Clear();

            message = String.Format(message, args) +
                String.Format(" in line {0}\r\n\r\n{1}", _lineNo, _line);
            Output("------");
            Output(message);
            MessageBox.Show(message, "Error");
        }

        /// <summary>
        /// Writes the processed line to a (simulated) output stream.
        /// </summary>
        /// <param name="line">Line to be written to output</param>
        void Output(string line)
        {
            Console.WriteLine(line);
        }

        /// <summary>
        /// Starts the parsing process.
        /// </summary>
        public void Parse()
        {
            // Simulate an input stream.
            _input = new List<string> {
                "select column1",
                "from",
                "#if(VariableA = Case1)",
                "    #if(VariableB = Case3)",
                "        table3",
                "    #else",
                "        table4",
                "    #end",
                "#else if(VariableA = Case2)",
                "    table2",
                "#else",
                "    defaultTable",
                "#end"
            };

            // Clear previous parsing.
            _textQueue.Clear();
            _currentLineIndex = 0;

            // Get first symbol and start parsing.
            GetSy();
            if (LineSequence(true)) { // Finished parsing successfully.
                //TODO: Do something with the generated SQL
            } else { // Error encountered.
                Output("*** ABORTED ***");
            }
        }

        // The following methods parse according to the EBNF syntax.

        bool LineSequence(bool writeOutput)
        {
            // EBNF: LineSequence = { TextLine | IfStatement }.
            while (_sy == Symbol.Text || _sy == Symbol.NumberIf) {
                if (_sy == Symbol.Text) {
                    if (!TextLine(writeOutput)) {
                        return false;
                    }
                } else { // _sy == Symbol.NumberIf
                    if (!IfStatement(writeOutput)) {
                        return false;
                    }
                }
            }
            return true;
        }

        bool TextLine(bool writeOutput)
        {
            // EBNF: TextLine = <string>.
            if (writeOutput) {
                Output(_string);
            }
            GetSy();
            return true;
        }

        bool IfStatement(bool writeOutput)
        {
            // EBNF: IfStatement = IfLine LineSequence { ElseIfLine LineSequence } [ ElseLine LineSequence ] EndLine.
            bool result;
            if (IfLine(out result) && LineSequence(writeOutput && result)) {
                writeOutput &= !result; // Only one section can produce an output.
                while (_sy == Symbol.NumberElse) {
                    GetSy();
                    if (_sy == Symbol.If) { // We have an #else if
                        if (!ElseIfLine(out result)) {
                            return false;
                        }
                        if (!LineSequence(writeOutput && result)) {
                            return false;
                        }
                        writeOutput &= !result; // Only one section can produce an output.
                    } else { // We have a simple #else
                        if (!LineSequence(writeOutput)) {
                            return false;
                        }
                        break; // We can have only one #else statement.
                    }
                }
                if (_sy != Symbol.NumberEnd) {
                    Error("'#end' expected");
                    return false;
                }
                GetSy();
                return true;
            }
            return false;
        }

        bool IfLine(out bool result)
        {
            // EBNF: IfLine = "#if" "(" Condition ")".
            result = false;
            GetSy();
            if (_sy != Symbol.LPar) {
                Error("'(' expected");
                return false;
            }
            GetSy();
            if (!Condition(out result)) {
                return false;
            }
            if (_sy != Symbol.RPar) {
                Error("')' expected");
                return false;
            }
            GetSy();
            return true;
        }

        private bool Condition(out bool result)
        {
            // EBNF: Condition = Identifier "=" Identifier.
            string variable;
            string expectedValue;
            string variableValue;

            result = false;
            // Identifier "=" Identifier
            if (_sy != Symbol.Identifier) {
                Error("Identifier expected");
                return false;
            }
            variable = _string; // The first identifier is a variable.
            GetSy();
            if (_sy != Symbol.Equals) {
                Error("'=' expected");
                return false;
            }
            GetSy();
            if (_sy != Symbol.Identifier) {
                Error("Value expected");
                return false;
            }
            expectedValue = _string; // The second identifier is a value.

            // Search the variable.
            if (_variableValues.TryGetValue(variable, out variableValue)) {
                result = variableValue == expectedValue; // Perform the comparison.
            } else {
                Error("Variable '{0}' not found", variable);
                return false;
            }

            GetSy();
            return true;
        }

        bool ElseIfLine(out bool result)
        {
            // EBNF: ElseIfLine = "#else" "if" "(" Condition ")".
            result = false;
            GetSy(); // "#else" already processed here, we are only called if the symbol is "if"
            if (_sy != Symbol.LPar) {
                Error("'(' expected");
                return false;
            }
            GetSy();
            if (!Condition(out result)) {
                return false;
            }
            if (_sy != Symbol.RPar) {
                Error("')' expected");
                return false;
            }
            GetSy();
            return true;
        }
    }
}

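Note that nested if statements are handled automatically in a very natural way. First, the grammar is expressed recursively: a LineSequence can contain IfStatements, and an IfStatement contains LineSequences. Second, this causes the syntax-handling methods to call each other recursively. The nesting of syntax elements is thus translated into recursive method calls.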
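
To make the recursion easier to see in isolation, here is a compact sketch of the same idea that returns the generated SQL as a string instead of printing it. It is not the parser above. The names MiniPreprocessor, State and Expand are hypothetical, directives are recognized with a regular expression rather than a symbol scanner, directives are assumed to be lower-case, and malformed input throws a FormatException instead of showing a message box.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MiniPreprocessor
{
    // "#if(A = B)" or "#else if(A = B)"; group 1 matches only in the "#else if" form.
    static readonly Regex IfRx =
        new Regex(@"^#(else\s+)?if\s*\(\s*(\w+)\s*=\s*(\w+)\s*\)$");

    class State
    {
        public IList<string> Lines;
        public IDictionary<string, string> Vars;
        public List<string> Output = new List<string>();
        public int Pos;
        // Current trimmed line, or null at end of input.
        public string Current => Pos < Lines.Count ? Lines[Pos].Trim() : null;
    }

    public static string Expand(IList<string> lines, IDictionary<string, string> vars)
    {
        var st = new State { Lines = lines, Vars = vars };
        Sequence(st, true);
        if (st.Current != null) {
            throw new FormatException("Unexpected '" + st.Current + "' in line " + (st.Pos + 1));
        }
        return string.Join(" ", st.Output);
    }

    // LineSequence = { TextLine | IfStatement }.
    static void Sequence(State st, bool emit)
    {
        while (st.Current != null) {
            string line = st.Current;
            if (!line.StartsWith("#")) {            // TextLine
                if (emit && line.Length > 0) st.Output.Add(line);
                st.Pos++;
            } else if (line.StartsWith("#if")) {    // IfStatement: the recursion handles nesting
                IfStatement(st, emit);
            } else {
                return;                             // "#else ..." or "#end" belongs to the caller
            }
        }
    }

    // IfStatement = IfLine LineSequence { ElseIfLine LineSequence } [ ElseLine LineSequence ] EndLine.
    static void IfStatement(State st, bool emit)
    {
        bool taken = Condition(st);                 // IfLine
        Sequence(st, emit && taken);
        while (st.Current != null && st.Current.StartsWith("#else") && st.Current != "#else") {
            bool result = Condition(st);            // ElseIfLine
            Sequence(st, emit && !taken && result);
            taken |= result;                        // only one branch may produce output
        }
        if (st.Current == "#else") {                // optional ElseLine
            st.Pos++;
            Sequence(st, emit && !taken);
        }
        if (st.Current != "#end") {
            throw new FormatException("'#end' expected in line " + (st.Pos + 1));
        }
        st.Pos++;
    }

    // Evaluates the condition on the current "#if" / "#else if" line and moves past it.
    static bool Condition(State st)
    {
        Match m = IfRx.Match(st.Current);
        if (!m.Success) {
            throw new FormatException("Malformed directive in line " + (st.Pos + 1));
        }
        string value;
        if (!st.Vars.TryGetValue(m.Groups[2].Value, out value)) {
            throw new FormatException("Variable '" + m.Groups[2].Value + "' not found");
        }
        st.Pos++;
        return value == m.Groups[3].Value;
    }
}
```

Called with the thirteen input lines used in Parse and the variables VariableA = Case1 and VariableB = CaseX, Expand returns "select column1 from table4". Each nested #if simply causes one more pair of IfStatement and Sequence calls on the call stack, so no explicit bookkeeping of open if-else groups is needed.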