[译]Let’s Build A Simple Interpreter. Part 6.

原文地址：Let’s Build A Simple Interpreter. Part 6.

今天是很重要的一天：）“为什么？”你可能会问。原因是今天我们会结束关于算术表达式的讨论（好吧，基本结束），给我们的语法中加入括号，实现可以对嵌套任意层括号的表达式进行求值，比如 7 + 3 * (10 / (12 / (3 + 1) - 1)).

让我们开始，好吗？

首先，要修改我们的语法使之支持括号。就像在第五部分中一样，规则 factor 被用来表示表达式中的基本单元。在那篇文章中，我们仅有的基本单元就是整数。今天我们会增加另一个基本单元──括号表达式。我们开始吧。

以下是更新后的语法：

Figure 1: grammar

expr   : term   ((PLUS | MINUS) term)*
term   : factor ((MUL | DIV) factor)*
factor : INTEGER | LPAREN expr RPAREN

产生式 expr 和 term 和第五部分中一样，只有产生式 factor 有一些变化，其中终结符 LPAREN 表示左括号‘(’, RPAREN 表示右括号‘)’, 括号中间的非终结符 expr 引用了规则 expr.

以下是 factor 更新过的句法图，现在包含了多选一：

Figure 2: factor diagram

由于 expr 和 term 的语法规则没有改变，所以它们的语法图和第五部分中一样：

Figure 3: term diagram

我们的新语法中有一个有趣的特点──它是递归的。如果你尝试派生表达式 2 * (7 + 3), 会由开始符号 expr 开始并在某一时刻递归地使用规则 expr 来派生原始算术表达式中的 (7 + 3) 部分。

让我们根据语法来分解表达式 2 * (7 + 3), 来看看它是如何工作的：

Figure 4: decomposition

一点提示：如果你需要复习一下递归，看一下Daniel P. Friedman 和 Matthias Felleisen 的《The Little Schemer》这本书──很不错。

好了，让我们继续前进把我们的新语法变成代码。

与前面的文章中的代码相比于最大的变化是：

Lexer 修改后可以多返回两种 token: 表示左括号的 LPAREN 和表示右括号的 RPAREN.
Interpreter 的 factor 方法做了一点升级，现在除了整数还可以解析括号表达式。

下面是可以对包含整数、任意数量加减乘除操作符、嵌套任意层数的括号的算术表达式进行求值的计算器的完整代码：

# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, MUL, DIV, LPAREN, RPAREN, EOF = (
    'INTEGER', 'PLUS', 'MINUS', 'MUL', 'DIV', 'LPAREN', 'RPAREN', 'EOF'
)

class Token():
    def __init__(self, type, value):
	self.type = type
	self.value = value

    def __str__(self):
	"""String representation of the class instance.

	Examples:
	    Token(INTEGER, 3)
	    Token(PLUS, '+')
	    Token(MUL, '*')
	"""
	return f'Token({self.type}, {repr(self.value)})'

    def __repr__(self):
	return self.__str__()

class Lexer():
    def __init__(self, text):
	# client string input, e.g. "4 + 2 * 3 - 6 / 2"
	self.text = text
	# self.pos is an index into self.text
	self.pos = 0
	self.current_char = self.text[self.pos]

    def error(self):
	raise Exception('Lexer error')

    def advance(self):
	"""Advance the `pos` pointer and set the `current_char` variable."""
	self.pos += 1
	if self.pos >= len(self.text):
	    self.current_char = None  # Indicates end of input
	else:
	    self.current_char = self.text[self.pos]

    def skip_whitespace(self):
	while self.current_char is not None and self.current_char.isspace():
	    self.advance()

    def integer(self):
	"""Return a (multidigit) integer consumed from the input."""
	value = ''
	while self.current_char is not None and self.current_char.isdigit():
	    value += self.current_char
	    self.advance()
	return int(value)

    def get_next_token(self):
	"""Lexical analyzer (also known as scanner or tokenizer)

	This method is responsible for breaking a sentence
	apart into tokens. One token at a time.
	"""
	while self.current_char is not None:
	    if self.current_char.isspace():
		self.skip_whitespace()
		continue
	    elif self.current_char.isdigit():
		return Token(INTEGER, self.integer())
	    elif self.current_char == '+':
		self.advance()
		return Token(PLUS, '+')
	    elif self.current_char == '-':
		self.advance()
		return Token(MINUS, '-')
	    elif self.current_char == '*':
		self.advance()
		return Token(MUL, '*')
	    elif self.current_char == '/':
		self.advance()
		return Token(DIV, '/')
	    elif self.current_char == '(':
		self.advance()
		return Token(LPAREN, '(')
	    elif self.current_char == ')':
		self.advance()
		return Token(RPAREN, ')')
	    else:
		self.error()
	return Token(EOF, None)

class Interpreter():
    def __init__(self, lexer):
	self.lexer = lexer
	# set current token to the first token taken from the input
	self.current_token = self.lexer.get_next_token()

    def error(self):
	raise Exception('Interpreter error')

    def eat(self, token_type):
	# compare the current token type with the passed token
	# type and if they match then "eat" the current token
	# and assign the next token to the self.current_token,
	# otherwise raise an exception.
	if self.current_token.type == token_type:
	    self.current_token = self.lexer.get_next_token()
	else:
	    self.error()

    def factor(self):
	"""factor : INTEGER | LPAREN expr RPAREN"""
	token = self.current_token
	if token.type == INTEGER:
	    self.eat(INTEGER)
	    return token.value
	elif self.current_token.type == LPAREN:
	    self.eat(LPAREN)
	    value = self.expr()
	    self.eat(RPAREN)
	    return value

    def term(self):
	"""term : factor ((MUL | DIV) factor)*"""
	value = self.factor()

	while self.current_token.type in (MUL, DIV):
	    token_type = self.current_token.type
	    if token_type == MUL:
		self.eat(MUL)
		value *= self.factor()
	    elif token_type == DIV:
		self.eat(DIV)
		value /= self.factor()
	return value

    def expr(self):
	"""Arithmetic expression parser / interpreter.

	calc> 7 + 3 * (10 / (12 / (3 + 1) - 1))
	22

	expr   : term ((PLUS | MINUS) term)*
	term   : factor ((MUL | DIV) factor)*
	factor : INTEGER | LPAREN expr RPAREN
	"""
	value = self.term()

	while self.current_token.type in (PLUS, MINUS):
	    token_type = self.current_token.type
	    if token_type == PLUS:
		self.eat(PLUS)
		value += self.term()
	    elif token_type == MINUS:
		self.eat(MINUS)
		value -= self.term()
	return value

    def parse(self):
	return self.expr()

def main():
    while True:
	try:
	    text = input('calc> ')
	except EOFError:
	    print()
	    break

	if len(text.strip()):
	    interpreter = Interpreter(Lexer(text))
	    print(interpreter.parse())
	else:
	    continue

if __name__ == '__main__':
    main()

将以上代码保存到名为 calc6.py 的文件中，自己尝试一下，确认你的新解释器可以正确地对包含不同操作符和括号的算术表达式进行求值。

下面是某次运行过程：

$ python calc6.py
calc> 3
3
calc> 2 + 7 * 4
30
calc> 7 - 8 / 4
5
calc> 14 + 2 * 3 - 6 / 2
17
calc> 7 + 3 * (10 / (12 / (3 + 1) - 1))
22
calc> 7 + 3 * (10 / (12 / (3 + 1) - 1)) / (2 + 3) - 5 - 3 + (8)
10
calc> 7 + (((3 + 2)))
12

下面是今天的新练习：

Figure 5: exercises

如本文的描述，写一个你自己版本的算术表达式解释器。记住：重复是学习之母。

哈，你从头读到了尾！祝贺，你刚学到了（如果你做了练习──而且真的写了）怎么建立一个基本的可以处理相当复杂算术表达式的 递归下降parser/interpreter.

下一篇文章我会讨论 递归下降parser 的更多细节。我还会介绍一个构建解释器和编译器时非常重要的且广泛应用的数据结构，这个系列中都会用到。

敬请关注，很快再见。直到下次之前，继续学习解释器而且别忘了最重要的事：玩得开心，享受这个过程！

下面是我推荐的帮助你学习解释器和编译器的一些书：

顺便说一下，我还在写一本书“et’s Build A Web Server: First Steps”，解释了怎么从头写一个基本的 Web 服务器。你可以从这里，这里和这里来感受一下这本书。

本系列的所有文章：

Last Updated 2018-05-26 Sat 14:12.
Render by hexo-renderer-org with Emacs 25.2.2 (Org mode 9.1.13)