BEAM 简介
这篇文章是对 BEAM 的一个简要入门,BEAM 是在 Erlang 运行时系统 (ERTS) 中执行用户代码的虚拟机。它的目的是帮助那些刚接触 BEAM 的人,能够理解即将发布的关于 OTP 24 中 JIT 的系列文章,而将实现细节留到以后再讲。
BEAM 经常与 ERTS 混淆,区分两者很重要;BEAM 只是虚拟机,它没有进程、端口、ETS 表等概念。它仅仅执行指令,虽然 ERTS 影响了它们的设计,但这并不影响代码运行时它们的功能,所以你不需要理解 ERTS 来理解 BEAM。
BEAM 是一个寄存器机器,所有指令都对命名寄存器进行操作。每个寄存器可以包含任何 Erlang 项,例如整数或元组,你可以将它们视为简单的变量。两种最重要的寄存器类型是:
X
:这些寄存器用于临时数据和在函数之间传递数据。它们不需要堆栈帧,可以在任何函数中自由使用,但有一些限制,我们将在后面详细介绍。Y
:这些寄存器是每个堆栈帧本地的,除了需要堆栈帧之外,没有特殊的限制。
控制流由测试特定条件的指令处理,并跳转到下一个指令或分支到其失败标签,用 {f,Index}
表示。例如,{test,is_integer,{f,7},[{x,0}]}.
检查 {x,0}
是否包含一个整数,如果不是,则跳转到标签 7。
函数参数从左到右在 X
寄存器中传递,从 {x,0}
开始,结果在 {x,0}
中返回。
通过示例更容易解释它是如何组合在一起的,所以让我们看几个例子
sum_tail(List) ->
sum_tail(List, 0).
sum_tail([Head | Tail], Acc) ->
sum_tail(Tail, Head + Acc);
sum_tail([], Acc) ->
Acc.
让我们使用 erlc -S
逐个查看指令
%% sum_tail/1, entry label is 2 {function, sum_tail, 1, 2}. %% Marks a jump target with the label 1. {label,1}. %% Special instruction that raises a function_clause %% exception. Unused in this function. {func_info,{atom,primer},{atom,sum_tail},1}. {label,2}. %% The meat of the function starts here. %% %% Our only argument - List - is in {x,0} and %% since sum_tail/2 expects it to be the first %% argument we can leave it be. We'll pass the %% integer 0 as the second argument in {x,1}. {move,{integer,0},{x,1}}. %% Tail call sum_tail/2, whose entry label is 4. {call_only,2,{f,4}}. %% sum_tail/2, entry label is 4 {function, sum_tail, 2, 4}. {label,3}. {func_info,{atom,primer},{atom,sum_tail},2}. {label,4}. %% Test whether we have a non-empty list, and jump to %% the base case at label 5 if we don't. {test,is_nonempty_list,{f,5},[{x,0}]}. %% Unpack the list in the first argument, placing the %% head in {x,2} and the tail in {x,0}. {get_list,{x,0},{x,2},{x,0}}. %% Add the head and our accumulator (remember that the %% second function argument is in {x,1}), and place %% the result in {x,1}. %% %% A fail label of 0 means that we want the %% instruction to throw an exception on error, rather %% than jump to a given label. {gc_bif,'+',{f,0},3,[{x,2},{x,1}],{x,1}}. %% Tail-call ourselves to handle the rest of the list, %% the arguments are already in the right registers. {call_only,2,{f,4}}. {label,5}. %% Test whether our argument was the empty list. If %% not, we jump to label 3 to raise a function_clause %% exception. {test,is_nil,{f,3},[{x,0}]}. %% Return our accumulator. {move,{x,1},{x,0}}. return.
很简单,不是吗?
不过,我忽略了一个小细节;加法指令中神秘的数字 3
。这个数字告诉我们,如果我们需要更多内存,有多少 X
寄存器保存着活动数据,以便在其余的寄存器作为垃圾丢弃时可以保留它们。因此,在此指令之后引用较高的 X
寄存器是不安全的,因为它们的内容可能无效(在这种情况下是 {x,3}
及以上)。
函数调用类似;每当我们调用或从函数返回时,我们可能会安排自己退出,并且只有当我们这样做时才会保留函数参数/返回值。这意味着,即使您确定被调用的函数没有触及某个寄存器,调用后除 {x,0}
之外的所有 X
寄存器都无效。
这就是 Y
寄存器发挥作用的地方。让我们以前面的例子为例,并使其成为尾递归
sum_body([Head | Tail]) ->
Head + sum_body(Tail);
sum_body([]) ->
0.
{function, sum_body, 1, 7}. {label,6}. {func_info,{atom,primer},{atom,sum_body},1}. {label,7}. {test,is_nonempty_list,{f,8},[{x,0}]}. %% Allocate a stack frame with a single Y register. %% Since this instruction may need more memory, we %% tell the garbage collector that we currently have %% one live X register (our list argument in {x,0}). {allocate,1,1}. %% Unpack the list, placing the head in {y,0} and %% the tail in {x,0}. {get_list,{x,0},{y,0},{x,0}}. %% Body-call ourselves. Note that while this kills all %% X registers, it leaves Y registers alone so our %% head is still valid. {call,1,{f,7}}. %% Add the head to our return value and store the %% result in {x,0}. {gc_bif,'+',{f,0},1,[{y,0},{x,0}],{x,0}}. %% Deallocate our stack frame and return. {deallocate,1}. return. {label,8}. {test,is_nil,{f,6},[{x,0}]}. %% Return the integer 0. {move,{integer,0},{x,0}}. return.
注意,现在我们处于堆栈帧中,调用指令是如何变化的?有三种不同的调用指令
call
:如示例中的普通调用。当被调用的函数返回时,控制流将恢复到下一条指令。call_last
:当存在堆栈帧时的尾调用。当前帧将在调用之前被释放。call_only
:当没有堆栈帧时的尾调用。
这些指令中的每一个都有一个用于调用其他模块中的函数的变体(例如 call_ext
),但除此之外,它们是相同的。
到目前为止,我们只看了使用术语,那么如何创建它们呢?让我们来看一下
create_tuple(Term) ->
{hello, Term}.
{function, create_tuple, 1, 10}. {label,9}. {func_info,{atom,primer},{atom,create_tuple},1}. {label,10}. %% Allocate the three words needed for a 2-tuple, with %% a liveness annotation of 1 indicating that {x,0} %% is alive in case we need to GC. {test_heap,3,1}. %% Create the tuple and place the result in {x,0} {put_tuple2,{x,0},{list,[{atom,hello},{x,0}]}}. return.
这有点神奇,因为有一个看不见的寄存器用于内存分配,但分配很少与使用相差甚远,并且通常很容易追踪。同样的原则也适用于列表(consing)、浮点数和 fun 函数,遵循 PR 2765。
更复杂的类型,如映射、大整数、引用等,由特殊指令创建,这些指令可能会自行 GC(或在“堆片段”中分配到堆之外),因为它们的大小无法提前静态确定。
现在让我们看看一些不常见的:异常。
exception() ->
try
external:call()
catch
throw:example -> hello
end.
{function, exception, 0, 12}. {label,11}. {func_info,{atom,primer},{atom,exception},0}. {label,12}. {allocate,1,0}. %% Place a catch tag in {y,0}. If an exception is %% raised while this tag is the most current one, %% the control flow will resume at {f,13} in this %% stack frame. {'try',{y,0},{f,13}}. {call_ext,0,{extfunc,external,call,0}}. %% Deactivate the catch tag before returning with the %% result from the call. {try_end,{y,0}}. {deallocate,1}. return. {label,13}. %% Uh oh, we've got an exception. Kill the catch tag %% and place the exception class in {x,0}, the error %% reason/thrown value in {x,1}, and the stack trace %% in {x,2}. {try_case,{y,0}}. %% Return 'hello' if the user threw 'example' {test,is_eq_exact,{f,14},[{x,0},{atom,throw}]}. {test,is_eq_exact,{f,14},[{x,1},{atom,example}]}. {move,{atom,hello},{x,0}}. {deallocate,1}. return. {label,14}. %% Otherwise, rethrow the exception since no catch %% clause matched. {bif,raise,{f,0},[{x,2},{x,1}],{x,0}}.
到现在为止,您可能已经注意到控制流是如何只向前移动的;就像 Erlang 本身一样,循环的唯一方法是通过递归。唯一的例外是 receive 构造,它可能会循环直到收到匹配的消息
selective_receive(Ref) ->
receive
{Ref, Result} -> Result
end.
{function, selective_receive, 1, 16}. {label,15}. {func_info,{atom,primer},{atom,selective_receive},1}. {label,16}. {allocate,1,1}. %% We may be scheduled out while waiting for a %% message, so we'll preserve our Ref in {y,0}. {move,{x,0},{y,0}}. {label,17}. %% Pick the next message from the process' message box %% and place it in {x,0}, jumping to label 19 if the %% message box is empty. {loop_rec,{f,19},{x,0}}. %% Does it match our pattern? If not, jump to label 18 %% and try the next message. {test,is_tuple,{f,18},[{x,0}]}. {test,test_arity,{f,18},[{x,0},2]}. {get_tuple_element,{x,0},0,{x,1}}. {test,is_eq_exact,{f,18},[{x,1},{y,0}]}. %% We've got a match, extract the result and remove %% the message from the mailbox. {get_tuple_element,{x,0},1,{x,0}}. remove_message. {deallocate,1}. return. {label,18}. %% The message didn't match, loop back to handle our %% next message. Note that the current message remains %% in the inbox since a different receive may be %% interested in it. {loop_rec_end,{f,17}}. {label,19}. %% Wait until the next message arrives, returning to %% the start of the loop when it does. If there's a %% timeout involved, it will be handled here. {wait,{f,17}}.
没有更多内容了,如果您对上面的示例感到满意,那么您应该可以顺利地阅读 JIT 系列文章。
如果您好奇有哪些指令,您可以在 genop.tab 中找到每个指令的简要描述。