The architecture package¶
Supported CPU architectures are implemented in this package as subpackages and all use the arch.core generic classes. The interface to a CPU used by system classes is implemented as a cpu_XXX.py module, usually in the architecture’s subpackage.
This CPU module will:
- provide the CPU environment (registers and other internals)
- provide an instance of the arch.core.disassembler class, which requires to:
  - define an instruction class based on arch.core.instruction,
  - define the arch.core.ispec of every instruction for the generic decoder,
  - and define the semantics of every instruction with cas.expressions.
- optionally define the output assembly format, and the GNU as (or any other) assembly parser.
- optionally define the function PC(), which tells generic analysis which register represents the instruction pointer.
A simple example is provided by the arch.arm.v8 architecture, which implements a model of ARM AArch64. The interface CPU module is arch.arm.cpu_armv8, which imports everything from the arch.arm.v8 subpackage.
Adding support for a new cpu module¶
The cpu environment¶
It all starts with the definition of the cpu environment in a dedicated module.
This module defines registers as instances of cas.expressions.reg, and associated register slices with cas.expressions.slc if necessary.
For example, x86 register eax and its slices are defined in arch.x86.env as:
eax = reg("eax",32)
ax = slc(eax, 0, 16, "ax")
al = slc(eax, 0, 8 , "al")
ah = slc(eax, 8, 8 , "ah")
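To make the (position, size) arguments of the slices concrete, here is a minimal sketch that applies the same pairs to a concrete 32-bit value. The helper bit_slice is hypothetical and is not amoco's slc, which builds a symbolic expression rather than computing an integer.

```python
def bit_slice(value: int, pos: int, size: int) -> int:
    """Return `size` bits of `value` starting at bit `pos` (bit 0 is the LSB).

    Hypothetical helper for illustration only, not part of amoco.
    """
    return (value >> pos) & ((1 << size) - 1)


eax = 0x12345678
ax = bit_slice(eax, 0, 16)  # low 16 bits, like slc(eax, 0, 16, "ax")
al = bit_slice(eax, 0, 8)   # low byte, like slc(eax, 0, 8, "al")
ah = bit_slice(eax, 8, 8)   # second byte, like slc(eax, 8, 8, "ah")

print(hex(ax), hex(al), hex(ah))
```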
In order to improve code analysis and views, some registers should be bound to their special cas.expressions.regtype, using one of the dedicated callables or context managers.
For example, the stack pointer should be bound to regtype 'STACK' using:
esp = is_reg_stack(reg('esp',32))
or alternatively using a context manager:
with is_reg_stack:
    esp = reg('esp',32)
Defined regtypes are:
cas.expressions.is_reg_pc
cas.expressions.is_reg_flags
cas.expressions.is_reg_stack
cas.expressions.is_reg_other
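The dual usage described above (callable or context manager) can be sketched with a toy model. The classes RegTag and Reg below are assumptions made for illustration; they do not reproduce how amoco implements regtype.

```python
class RegTag:
    """Hypothetical tag usable either as a callable or as a context manager."""

    _active = []  # stack of tags currently entered through `with`

    def __init__(self, name):
        self.name = name

    def __call__(self, register):
        # Callable form: tag an already created register and return it.
        register.type = self.name
        return register

    def __enter__(self):
        RegTag._active.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        RegTag._active.pop()
        return False  # never swallow exceptions


class Reg:
    """Toy register that picks up the innermost active tag, if any."""

    def __init__(self, name, size):
        self.name, self.size = name, size
        self.type = RegTag._active[-1].name if RegTag._active else "OTHER"


is_stack = RegTag("STACK")

esp = is_stack(Reg("esp", 32))  # callable form
with is_stack:                  # context-manager form
    ebp = Reg("ebp", 32)
eax = Reg("eax", 32)            # created outside any tag
```

The context-manager form is convenient when several registers of the same type are declared in a row.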
Once all needed registers are defined, it is also recommended to define an ordered list called registers, which will be used by emulator instances for register views.
Finally, the cpu environment sometimes also needs to define some
internal parameters that change the way instructions are decoded or the
memory endianness. For example, the arch.arm.v7.env module defines
internals for isetstate
to change the instruction set from ARM to
Thumb, and endianstate
to change endianness.
These internal parameters differ from regular registers by the fact
that they are not defined as expressions and thus cannot be symbolic.
Instructions specifications¶
The instructions’ specifications are then defined in a module as well.
An instruction’s specification is an instance of arch.core.ispec that decorates a function which performs the setup of an instruction instance.
The specification describes how the instruction is decoded out of bytes in a way that allows the decorated function to set up the instruction’s operands and any other characteristics from the decoded values. This description makes it possible to follow the CPU datasheet’s instruction manual very closely. Moreover, thanks to how decorators work, several specs can share the same setup function.
For example, we have in the MIPS R3000 instructions’ spec module:
@ispec("32<[ 001100 rs(5) rt(5) imm(16) ]", mnemonic="ANDI")
@ispec("32<[ 001101 rs(5) rt(5) imm(16) ]", mnemonic="ORI")
@ispec("32<[ 001110 rs(5) rt(5) imm(16) ]", mnemonic="XORI")
def mips1_dri(obj, rs, rt, imm):
    src1 = env.R[rs]
    imm = env.cst(imm, 32)
    dst = env.R[rt]
    obj.operands = [dst, src1, imm]
    obj.type = type_data_processing
Here, obj is an instruction instantiated by the disassembler if the decoded bytes match one of these spec definitions. In that case, the setup function is called with arguments rs, rt and imm being ints decoded from the corresponding bits (see arch.core.ispec below.)
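As a rough illustration of what the format string "32<[ 001100 rs(5) rt(5) imm(16) ]" describes, the following sketch extracts the same fields by hand with shifts and masks. The function decode_i_type and the MNEMONICS table are hypothetical names; the real decoding is performed by the ispec machinery.

```python
def decode_i_type(word: int) -> dict:
    """Split a 32-bit word into the fields named in the spec string (MSB first)."""
    return {
        "opcode": (word >> 26) & 0x3F,  # the 6 fixed bits
        "rs": (word >> 21) & 0x1F,      # rs(5)
        "rt": (word >> 16) & 0x1F,      # rt(5)
        "imm": word & 0xFFFF,           # imm(16)
    }


# Fixed opcode bits taken from the three specs above.
MNEMONICS = {0b001100: "ANDI", 0b001101: "ORI", 0b001110: "XORI"}

word = 0x312800FF
fields = decode_i_type(word)
mnemonic = MNEMONICS.get(fields["opcode"])
print(mnemonic, fields["rs"], fields["rt"], hex(fields["imm"]))
```

Because the three specs differ only in their fixed bits, one field extraction (and therefore one setup function) serves all of them.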
Any instruction setup should define at least an obj.operands list and should indicate one of the following obj.type:
- type_data_processing, which are well-defined instructions,
- type_control_flow, which mark default ending of assembly blocks,
- type_cpu_state, which may change the cpu internal registers,
- type_system, which have usually no impact on code semantics,
- type_other
The cpu disassembler¶
When the specification module is done, the cpu disassembler can be instantiated.
First a new local instruction class should be derived from the generic arch.core.instruction with:
from amoco.arch.core import instruction
instruction_X = type("instruction_X", (instruction,), {})
Then, a disassembler instance is obtained with:
from amoco.arch.core import disassembler
from amoco.arch.X import spec_X, spec_thumb
disassemble = disassembler([spec_X], iclass=instruction_X)
The first argument is the list of available specifications. Most architectures have only one mode but some like ARM can switch from a default mode (ARM) to an alternate mode like Thumb (see the mode argument in the class definition.) The second is our new instruction class.
By default, disassemblers will fetch instructions in little-endian, but the endian parameter allows fetching in big-endian. For example the ARMv7 architecture’s disassembler is:
mode = lambda: internals["isetstate"]
endian = lambda: 1 if internals["ibigend"] == 0 else -1
disassemble = disassembler([spec_armv7, spec_thumb],
                           instruction_armv7,
                           mode,
                           endian)
which allows the semantics to possibly change both the mode and the instructions’ endianness dynamically.
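The effect of a dynamic endian callable on instruction fetching can be sketched as follows. The function fetch_word is a hypothetical stand-in for the disassembler's fetch step; only the 1 / -1 convention and the internals["ibigend"] lookup come from the text above.

```python
def fetch_word(data: bytes, offset: int, endian: int) -> int:
    """Read a 32-bit word, with endian following the convention 1: little, -1: big."""
    chunk = data[offset:offset + 4]
    if len(chunk) < 4:
        raise ValueError("not enough bytes for a 32-bit instruction")
    return int.from_bytes(chunk, "little" if endian == 1 else "big")


# The lambda is evaluated on every fetch, so a change of the internal
# parameter takes effect on the next instruction.
internals = {"ibigend": 0}
endian = lambda: 1 if internals["ibigend"] == 0 else -1

raw = bytes([0xFF, 0x00, 0x28, 0x31])
little = fetch_word(raw, 0, endian())
internals["ibigend"] = 1  # e.g. set by the semantics of an instruction
big = fetch_word(raw, 0, endian())
print(hex(little), hex(big))
```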
Instructions semantics¶
An instruction’s semantics is a function associated to the instruction’s
mnemonic which operates on a cas.mapper.mapper
object.
The function’s name should be “i_XXX” for mnemonic “XXX”.
The mapper argument enables transitions from a state to another state.
For example, the semantics of all MIPS R3000 AND
instructions is:
@__npc
def i_AND(ins, fmap):
    dst, src1, src2 = ins.operands
    if dst is not zero:
        fmap[dst] = fmap(src1&src2)
The first argument is the disassembled instruction object and the second argument is the mapper (i.e. the state).
We simply create local variables from the operands list and then update the state according to these operands: the mapper is modified by setting the first operand expression to the mapper’s evaluation of the cas.expressions.op formed by src1 & src2.
Of course, since we want symbolic semantics, these functions might end up being quite complex, especially for conditional behavior, as in the case of this unusual unaligned load word MIPS R3000 instruction:
@__npc
def i_LWL(ins, fmap):
    dst, base, src = ins.operands
    addr = base+src
    if dst is not zero:
        fmap[dst[24:32]] = fmap(mem(addr,8))
        cond1 = (addr%4)!=0
        fmap[dst[16:24]] = fmap(tst(cond1,mem(addr-1,8),dst[16:24]))
        addr = addr - 1
        cond2 = cond1 & ((addr%4)!=0)
        fmap[dst[8:16]] = fmap(tst(cond2,mem(addr-1,8),dst[8:16]))
        fmap[dst] = fmap[dst].simplify()
Here, the number of bytes read from memory depends on the word alignment of the address value. This instruction is thus normally coupled with a LWR, which reads the rest of the bytes across the word boundary. In concrete semantics, this is quite simple to write since address alignment is always computable and thus 3 cases are possible. In symbolic semantics, things are trickier since the address is symbolic, and thus the resulting writeback to the dst register is a symbolic expression that must take into account the 3 cases at once.
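For comparison, a concrete counterpart of the three cases might look like the sketch below. It is a direct transcription of the symbolic i_LWL code into plain integer arithmetic, under the same simplifications; the function lwl_concrete and the dict-based memory are assumptions made for illustration.

```python
def lwl_concrete(dst: int, mem: dict, addr: int) -> int:
    """Concrete version of the i_LWL logic: 1, 2 or 3 bytes are merged into dst."""
    # dst[24:32] always receives the byte at addr.
    dst = (dst & 0x00FFFFFF) | (mem[addr] << 24)
    if addr % 4 != 0:  # cond1
        dst = (dst & 0xFF00FFFF) | (mem[addr - 1] << 16)
        addr -= 1
        if addr % 4 != 0:  # cond2 (only meaningful when cond1 holds)
            dst = (dst & 0xFFFF00FF) | (mem[addr - 1] << 8)
    return dst


memory = {0x1000: 0xAA, 0x1001: 0xBB, 0x1002: 0xCC}
old = 0x11223344
results = [lwl_concrete(old, memory, a) for a in (0x1000, 0x1001, 0x1002)]
print([hex(r) for r in results])
```

In the symbolic version the two if statements cannot be resolved, which is why they become tst expressions nested inside the written value.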
Updating the cpu instruction pointer¶
Now, an instruction’s semantics must also update the cpu PC().
In the MIPS case, this is performed by the __npc decorator, which updates both pc and npc in order to handle delay slots.
Architectures without delay slots can just advance their program counter by the length of the instruction. Architectures with delay slots can always handle delayed branches by relying on intermediate (hidden) program counters. This is the case for arch.sparc and arch.mips where __npc does:
pc <- npc
npc <- npc+4
and since branch instructions only have an effect on npc once they have been processed, the next instruction to execute (the one located at pc) is still the one just after the branch instruction.
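The pc/npc mechanism can be tried out on a concrete toy state. In this sketch the decorator npc, the dict-based state and the three-instruction program are all hypothetical; it only assumes that the pc/npc shift happens before the wrapped semantics runs.

```python
def npc(semantics):
    """Hypothetical analogue of __npc: shift pc/npc, then run the semantics."""
    def wrapper(state, *operands):
        state["pc"] = state["npc"]
        state["npc"] = state["npc"] + 4
        semantics(state, *operands)
    return wrapper


@npc
def i_J(state, target):
    state["npc"] = target  # a branch only redirects npc, never pc


@npc
def i_ADDIU(state, dst, src, imm):
    state[dst] = (state[src] + imm) & 0xFFFFFFFF


program = {
    0x100: (i_J, (0x200,)),
    0x104: (i_ADDIU, ("t0", "t0", 1)),  # delay slot: executed
    0x108: (i_ADDIU, ("t2", "t2", 1)),  # skipped by the jump
    0x200: (i_ADDIU, ("t1", "t1", 1)),  # jump target
}

state = {"pc": 0x100, "npc": 0x104, "t0": 0, "t1": 0, "t2": 0}
trace = []
for _ in range(3):
    address = state["pc"]
    trace.append(address)
    handler, operands = program[address]
    handler(state, *operands)

print([hex(a) for a in trace])
```

The trace shows the instruction in the delay slot running between the jump and its target, without any special case in the execution loop.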
However, special care must be taken to avoid pitfalls. A common mistake is to believe that the delay slot instruction is executed before the branch instruction, as if the two instructions were simply swapped. This is not true. The branch effectively occurs after the delay slot, but its operands are still evaluated before the delay slot has had time to execute! For example the MIPS R3000 sequence:
liu t7, 0x5
liu t6, 0x2
bne t7, t6, *somewhere
addiu t7, t7, -0x3
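will lead to a branch taken, since bne compares t7 (0x5) with t6 (0x2) before addiu updates t7; swapping the two instructions would wrongly give a branch not taken. See pipelining discussion below for details…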
A Note on cpu pipelining and cycle-accurate emulation¶
For most architectures, the instruction parallelism introduced by the underlying pipeline does not interfere with the semantics.
This means that, for example, assuming R1=0, R2=1, R3=1, the generic case of:
OR R1, R2, R3
ADD R4, R1, 1
should obviously lead to R4=2 anyway, because pipelining is implemented to improve performance but shouldn’t have any impact on semantics.
Hence, we can always emulate instructions as if no parallelism existed. Right? Well, not exactly…
All pipelines have pipeline hazards, i.e. situations that could lead to undefined behaviors if not handled correctly.
In our example above, the R1 register is really updated after the ALU has performed its operation on the R2 and R3 values.
Meanwhile, the ADD instruction wants to read the R1 value as soon as the instruction is decoded (after it was fetched), and would consequently read its value before it is updated.
Thus, pipelines have internal mechanisms to detect this kind of situation and either stall the pipeline (wait for R1 to be written back before being used) or forward things back to other stages as soon as possible.
In this case, the ALU forwards its result immediately back to the ALU entry multiplexer, before R1 is updated later.
Unfortunately, some old architectures like MIPS R3000 handled only a limited set of these pipeline hazards and heavily relied on the compiler to avoid some instruction sequences (usually by inserting nops.) In the MIPS R3000 architecture, the above case is handled correctly unless a load/store is involved, as in:
lbu v0, 0x1(a1)
nop
sll v0, v0, 0x8
Here, the compiler has inserted a nop to ensure that the loaded byte has been fetched and can be forwarded to the ALU for sll.
Hence, as long as we emulate code produced by compliant compilers, we can still ignore the underlying pipeline operations. But this is no longer true in the general case.
Since most of the time we can’t make this assumption, instructions can’t formally be emulated as if no parallelism existed.
If we ever have MIPS R3000 code with:
lbu v0, 0x1(a1)
sll v0, v0, 0x8
then the resulting mapper is not v0 <- mem(a1+0x1,8)<<8 but rather something that highly depends on the involved pipeline interlocking mechanism, most likely v0 <- v0<<8.
As for delay slots of branch instructions, which can be handled with an additional npc register, we can always simulate the pipeline delay by introducing a kind of hidden “register”.
In amoco the mapper has an internal delayed attribute that allows explicit delayed updates.
(These updates are triggered by explicit calls to mapper.update_delayed(), usually right in the middle of every instruction, as if the result of the delayed load was forwarded to the current ALU stage.)
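The idea of delayed updates can be illustrated on concrete values with a toy state. ToyState and the three handlers below are hypothetical and much simpler than amoco's mapper; the sketch only assumes that each instruction reads its operands first, then lets pending loads land, then writes its own result.

```python
class ToyState:
    """Concrete register state with a queue of delayed writes (toy model)."""

    def __init__(self, regs, mem):
        self.regs = dict(regs)
        self.mem = dict(mem)
        self.delayed = []  # pending (register, value) writes

    def update_delayed(self):
        for reg, value in self.delayed:
            self.regs[reg] = value
        self.delayed = []


def i_LBU(state, dst, base, offset):
    value = state.mem[state.regs[base] + offset]  # operands are read first
    state.update_delayed()                        # earlier loads land now
    state.delayed.append((dst, value))            # this load lands one step later


def i_SLL(state, dst, src, shift):
    value = state.regs[src]                       # may still be the stale value
    state.update_delayed()
    state.regs[dst] = (value << shift) & 0xFFFFFFFF


def i_NOP(state):
    state.update_delayed()


def run(sequence):
    state = ToyState({"v0": 0x11, "a1": 0x1000}, {0x1001: 0xAB})
    for handler, operands in sequence:
        handler(state, *operands)
    state.update_delayed()
    return state.regs["v0"]


with_nop = run([(i_LBU, ("v0", "a1", 1)), (i_NOP, ()), (i_SLL, ("v0", "v0", 8))])
without_nop = run([(i_LBU, ("v0", "a1", 1)), (i_SLL, ("v0", "v0", 8))])
print(hex(with_nop), hex(without_nop))
```

With the nop the shift sees the loaded byte; without it the shift sees the old content of v0, which corresponds to the v0 <- v0<<8 outcome mentioned above.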
Instructions format¶
Now that instructions specifications and semantics are defined, it is
recommended to define at least one formatter to print
instructions according to the CPU’s Instruction Set Assembly manual.
Available formatters for a CPU ISA are instances of the arch.core.Formatter class. These formatters are initialized from a dict object that maps an instruction’s mnemonic or setup function name to an iterable of formatting functions operating on the instruction object.
For example:
format_default = (mnemo, opers)
MIPS_full_formats = {
    "mips1_loadstore": (mnemo, opers_mem),
    "mips1_jump_abs": (mnemo, opers),
    "mips1_jump_rel": (mnemo, opers_rel),
    "mips1_branch": (mnemo, opers_adr),
}
MIPS_full = Formatter(MIPS_full_formats)
MIPS_full.default = format_default
Here, the available format is MIPS_full, instantiated from the MIPS_full_formats dict, which maps spec setup functions to their corresponding formatting tuples.
Functions mnemo and opers take the instruction and return a Pygments-compatible list of tokens if support for pretty-printing is implemented, or simply a string. When an instruction is printed, the formatter starts by matching its mnemonic or its setup function, or takes the default formatting iterable, and then joins all outputs from the iterables.
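The lookup-then-join behavior can be sketched with a small stand-in. ToyFormatter, the dict-based instructions and the three formatting functions are hypothetical, return plain strings only, and do not reflect the actual arch.core.Formatter implementation.

```python
class ToyFormatter:
    """Hypothetical formatter: picks a tuple of functions and joins their outputs."""

    def __init__(self, formats):
        self.formats = formats
        self.default = ()

    def __call__(self, ins):
        # Try the mnemonic first, then the setup function name, then the default.
        parts = self.formats.get(
            ins["mnemonic"], self.formats.get(ins["setup"], self.default)
        )
        return "".join(part(ins) for part in parts)


def mnemo(ins):
    return ins["mnemonic"].lower().ljust(8)


def opers(ins):
    return ", ".join(str(op) for op in ins["operands"])


def opers_mem(ins):
    reg, base, offset = ins["operands"]
    return f"{reg}, {offset:#x}({base})"


toy_full = ToyFormatter({"mips1_loadstore": (mnemo, opers_mem)})
toy_full.default = (mnemo, opers)

load = {"mnemonic": "LBU", "setup": "mips1_loadstore", "operands": ["v0", "a1", 1]}
andi = {"mnemonic": "ANDI", "setup": "mips1_dri", "operands": ["t0", "t1", 255]}
print(toy_full(load))
print(toy_full(andi))
```

Keying the dict on setup function names means one entry covers every mnemonic that shares that setup function.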
The cpu module¶
Finally, the cpu module can be fully created. This module should import everything from the architecture’s environment and define its disassembler as shown above.
The semantics is associated with the instruction class through the arch.core.instruction.set_uarch(dict) method, which takes a mapping from mnemonics to the corresponding instruction semantics function.
Thus, in most cpu modules this binding is done with:
from .asm import *
uarch = dict(filter(lambda kv: kv[0].startswith("i_"), locals().items()))
instruction_X.set_uarch(uarch)
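To see what the filter expression keeps, the sketch below applies it to an explicit dict instead of locals(). The three functions and the namespace dict are placeholders introduced for illustration.

```python
def i_AND(ins, fmap):
    """Placeholder semantics function."""


def i_ORI(ins, fmap):
    """Placeholder semantics function."""


def helper(x):
    """Not a semantics function: its name lacks the i_ prefix."""
    return x


# Stand-in for what locals() would contain after `from .asm import *`.
namespace = {"i_AND": i_AND, "i_ORI": i_ORI, "helper": helper}

uarch = dict(filter(lambda kv: kv[0].startswith("i_"), namespace.items()))
print(sorted(uarch))
```

Only names starting with i_ survive, so helpers imported from the semantics module are not mistaken for instruction semantics.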
The chosen formatter is bound to the instruction class with:
from .formats import X_full
instruction_X.set_formatter(X_full)
(Optionally, if not already defined in the environment, the PC() function is defined to return the instruction pointer.)
Note that whenever a disassembler is available, the entire architecture ISA decision tree can be displayed with:
>>> from amoco.ui.views import archView
>>> from amoco.arch.mips.cpu_r3000LE import disassemble
>>> print(archView(disassemble))
─[& f0000000 == 0]
│─[& fc000000 == 0]
│ │─[& fc00003f == 8]
│ │ │─JR : 32<[ 000000 rs(5) 00000 00000 00000 001000]
│ │─[& fc00003f == 12]
│ │ │─MFLO : 32<[ 000000 00000 00000 rd(5) 00000 010010 ]
│ │─[& fc00003f == 10]
│ │ │─MFHI : 32<[ 000000 00000 00000 rd(5) 00000 010000 ]
│ │─[& fc00003f == 13]
│ │ │─MTLO : 32<[ 000000 rs(5) 00000 00000 00000 010011 ]
│ │─[& fc00003f == 11]
│ │ │─MTHI : 32<[ 000000 rs(5) 00000 00000 00000 010001 ]
│ │─[& fc00003f == 19]
│ │ │─MULTU : 32<[ 000000 rs(5) rt(5) 00000 00000 011001]
│ │─[& fc00003f == 18]
│ │ │─MULT : 32<[ 000000 rs(5) rt(5) 00000 00000 011000]
│ │─[& fc00003f == 1b]
│ │ │─DIVU : 32<[ 000000 rs(5) rt(5) 00000 00000 011011]
│ │─[& fc00003f == 1a]
│ │ │─DIV : 32<[ 000000 rs(5) rt(5) 00000 00000 011010]
│ │─[& fc00003f == 9]
│ │ │─JALR : 32<[ 000000 rs(5) 00000 rd(5) 00000 001001]
│ │─[& fc00003f == 2b]
│ │ │─SLTU : 32<[ 000000 rs(5) rt(5) rd(5) 00000 101011]
│ │─[& fc00003f == 2a]
│ │ │─SLT : 32<[ 000000 rs(5) rt(5) rd(5) 00000 101010]
│ │─[& fc00003f == 6]
│ │ │─SRLV : 32<[ 000000 rs(5) rt(5) rd(5) 00000 000110]
│ │─[& fc00003f == 7]
│ │ │─SRAV : 32<[ 000000 rs(5) rt(5) rd(5) 00000 000111]
│ │─[& fc00003f == 4]
│ │ │─SLLV : 32<[ 000000 rs(5) rt(5) rd(5) 00000 000100]
│ │─[& fc00003f == 26]
│ │ │─XOR : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100110]
│ │─[& fc00003f == 25]
│ │ │─OR : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100101]
│ │─[& fc00003f == 27]
│ │ │─NOR : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100111]
│ │─[& fc00003f == 24]
│ │ │─AND : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100100]
│ │─[& fc00003f == 23]
│ │ │─SUBU : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100011]
│ │─[& fc00003f == 21]
│ │ │─ADDU : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100001]
│ │─[& fc00003f == 22]
│ │ │─SUB : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100010]
│ │─[& fc00003f == 20]
│ │ │─ADD : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100000]
│ │─[& fc00003f == 2]
│ │ │─SRL : 32<[ 000000 00000 rt(5) rd(5) sa(5) 000010 ]
│ │─[& fc00003f == 3]
│ │ │─SRA : 32<[ 000000 00000 rt(5) rd(5) sa(5) 000011 ]
│ │─[& fc00003f == 0]
│ │ │─SLL : 32<[ 000000 00000 rt(5) rd(5) sa(5) 000000 ]
│ │─[& fc00003f == c]
│ │ │─SYSCALL : 32<[ 000000 .code(20) 001100]
│ │─[& fc00003f == d]
│ │ │─BREAK : 32<[ 000000 .code(20) 001101]
│─[& fc000000 == 4000000]
│ │─BLTZAL : 32<[ 000001 rs(5) 10000 ~imm(16) ]
│ │─BLTZ : 32<[ 000001 rs(5) 00000 ~imm(16) ]
│ │─BGEZAL : 32<[ 000001 rs(5) 10001 ~imm(16) ]
│ │─BGEZ : 32<[ 000001 rs(5) 00001 ~imm(16) ]
│─[& fc000000 == c000000]
│ │─JAL : 32<[ 000011 t(26)]
│─[& fc000000 == 8000000]
│ │─J : 32<[ 000010 t(26)]
─[& f0000000 == 40000000]
│─[& f2000000 == 40000000]
│ │─MTC : 32<[ 0100 .z(2) 00100 rt(5) rd(5) 00000000000 ]
│ │─CTC : 32<[ 0100 .z(2) 00110 rt(5) rd(5) 00000000000 ]
│ │─MFC : 32<[ 0100 .z(2) 00000 rt(5) rd(5) 00000000000 ]
│ │─CFC : 32<[ 0100 .z(2) 00010 rt(5) rd(5) 00000000000 ]
│─[& f2000000 == 42000000]
│ │─COP : 32<[ 0100 .z(2) 1 .cofun(25) ]
─[& f0000000 == 30000000]
│─LUI : 32<[ 001111 00000 rt(5) imm(16) ]
│─XORI : 32<[ 001110 rs(5) rt(5) imm(16) ]
│─ORI : 32<[ 001101 rs(5) rt(5) imm(16) ]
│─ANDI : 32<[ 001100 rs(5) rt(5) imm(16) ]
─[& f0000000 == 10000000]
│─BLEZ : 32<[ 000110 rs(5) 00000 ~imm(16) ]
│─BGTZ : 32<[ 000111 rs(5) 00000 ~imm(16) ]
│─BNE : 32<[ 000101 rs(5) rt(5) ~imm(16) ]
│─BEQ : 32<[ 000100 rs(5) rt(5) ~imm(16) ]
─[& f0000000 == 20000000]
│─SLTIU : 32<[ 001011 rs(5) rt(5) ~imm(16) ]
│─SLTI : 32<[ 001010 rs(5) rt(5) ~imm(16) ]
│─ADDIU : 32<[ 001001 rs(5) rt(5) ~imm(16) ]
│─ADDI : 32<[ 001000 rs(5) rt(5) ~imm(16) ]
─[& f0000000 == b0000000]
│─SWR : 32<[ 101110 base(5) rt(5) offset(16) ]
─[& f0000000 == 90000000]
│─LWR : 32<[ 100110 base(5) rt(5) offset(16) ]
│─LHU : 32<[ 100101 base(5) rt(5) offset(16) ]
│─LBU : 32<[ 100100 base(5) rt(5) offset(16) ]
─[& f0000000 == a0000000]
│─SWL : 32<[ 101010 base(5) rt(5) offset(16) ]
│─SW : 32<[ 101011 base(5) rt(5) offset(16) ]
│─SH : 32<[ 101001 base(5) rt(5) offset(16) ]
│─SB : 32<[ 101000 base(5) rt(5) offset(16) ]
─[& f0000000 == 80000000]
│─LWL : 32<[ 100010 base(5) rt(5) offset(16) ]
│─LW : 32<[ 100011 base(5) rt(5) offset(16) ]
│─LH : 32<[ 100001 base(5) rt(5) offset(16) ]
│─LB : 32<[ 100000 base(5) rt(5) offset(16) ]
─[& f0000000 == e0000000]
│─SWC : 32<[ 1110 .z(2) base(5) rt(5) offset(16) ]
─[& f0000000 == c0000000]
│─LWC : 32<[ 1100 .z(2) base(5) rt(5) offset(16) ]
If several specification modes are provided, they are listed one after the other.
arch/core.py¶
The architecture’s core module implements essential classes for the definition of new cpu architectures:
- the instruction class models cpu instructions decoded by the disassembler.
- the disassembler class implements the instruction decoding logic based on provided specifications.
- the ispec class is a function decorator used to define the specification of an instruction.
- the Formatter class is used for instruction pretty printing.
- class arch.core.icore(istr=b'')
  This is the core class for the generic parent instruction class below. It defines the mandatory API for all instructions.
  - type: one of (type_data_processing, type_control_flow, type_cpu_state, type_system, type_other) or type_undefined (default) or type_unpredictable. Type: int
  - spec: the specification that was decoded by the disassembler to instantiate this instruction. Type: ispec
  - misc: a defaultdict for passing various arch-dependent infos (which returns None for undefined keys.) Type: dict
  - length: length of the instruction in bytes
- class arch.core.instruction(istr)
  The generic instruction class allows defining instructions for any cpu instruction set and provides a common API for all arch-independent methods. It extends the icore with an address attribute and formatter methods.
- class arch.core.disassembler(specmodules, iclass=<class 'arch.core.instruction'>, iset=<function disassembler.<lambda>>, endian=<function disassembler.<lambda>>)
  The generic disassembler class will decode a byte string based on provided sets of instruction specifications and various parameters like endianness and ways to select the appropriate instruction set.
  Parameters:
  - specmodules – list of python modules containing ispec decorated funcs
  - iclass – the specific instruction class based on instruction
  - iset – lambda used to select module (ispec list)
  - endian – instruction fetch endianness (1: little, -1: big)
  Attributes and methods:
  - maxlen: the length of the longest instruction found in provided specmodules.
  - iset: the lambda used to select the right specifications for decoding
  - endian: the lambda used to define endianness.
  - setup(ispecs): setup will (recursively) organize the provided ispecs list into an optimal tree so that __call__ can efficiently find the matching ispec format for a given bytestring (we don’t want to search all specs until a match, so we need to separate formats as much as possible). The output tree is (f,l) where f is the submask to check at this level and l is a defaultdict such that l[x] is the subtree of formats for which submask is x.
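A much simplified sketch of such a (submask, subtree) organization is given below. It is not amoco's algorithm: build_tree and lookup are hypothetical, specs are reduced to (name, mask, fix) triples, and the submask at each level is simply the set of bits fixed in every remaining spec.

```python
def build_tree(specs):
    """Return (submask, {key: subtree}) for inner nodes, (0, [specs]) for leaves."""
    if len(specs) <= 1:
        return (0, list(specs))
    submask = specs[0][1]
    for _, mask, _ in specs[1:]:
        submask &= mask  # keep only the bits fixed in every spec
    if submask == 0:
        return (0, list(specs))  # no common fixed bit: cannot split further
    groups = {}
    for name, mask, fix in specs:
        # Clear the bits tested at this level so deeper levels look at new bits.
        reduced = (name, mask & ~submask, fix & ~submask)
        groups.setdefault(fix & submask, []).append(reduced)
    return (submask, {key: build_tree(group) for key, group in groups.items()})


def lookup(tree, word):
    """Walk the tree with one masked comparison per level, then scan the leaf."""
    submask, node = tree
    while isinstance(node, dict):
        key = word & submask
        if key not in node:
            return None
        submask, node = node[key]
    for name, mask, fix in node:
        if word & mask == fix:
            return name
    return None


# (name, mask of fixed bits, value of fixed bits) for a few MIPS formats.
SPECS = [
    ("ANDI", 0xFC000000, 0x30000000),
    ("ORI", 0xFC000000, 0x34000000),
    ("J", 0xFC000000, 0x08000000),
    ("AND", 0xFC0007FF, 0x00000024),
    ("OR", 0xFC0007FF, 0x00000025),
]

tree = build_tree(SPECS)
print(lookup(tree, 0x312800FF), lookup(tree, 0x012A4024))
```

Each level costs one mask and one dictionary access, so the number of comparisons depends on the depth of the tree rather than on the total number of specs.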
- class arch.core.ispec(format, **kargs)
  ispec (customizable) decorator
  @ispec allows to easily define instruction decoders based on architecture specifications.
  Parameters:
  - spec (str) – a human-friendly format string that describes how the ispec object will (on request) decode a given bytestring and how it will expose various decoded entities to the decorated function in order to define an instruction.
  - **kargs – additional arguments to ispec decorator must be provided with name=value form and are declared as attributes/values within the instruction instance before calling the decorated function. See below for conventions about names.
  Attributes:
  - hook: the decorated python function to be called during decoding. The hook function name is relevant only for instructions’ formatter. See arch.core.Formatter. Type: callable
  - iattr: the dictionary of instruction attributes to add before decoding. Attributes and their values are passed from the spec’s kargs when the name does not start with an underscore. Type: dict
  - fargs: the dictionary of keywords arguments to pass to the hook. These keywords are decoded from the format or given by the spec’s kargs when name starts with an underscore. Type: dict
  - precond: an optional function that takes the instruction object as argument and returns a boolean to indicate whether the hook can be called or not. (This allows to avoid decoding when a prefix is missing for example.) Type: func
  - fix: the values of fixed bits within the format. Type: Bits
  - mask: the mask of fixed bits within the format. Type: Bits
Examples
This statement creates an ispec object with hook f, and registers this object automatically in a SPECS list object within the module where the statement is found:
@ispec("32[ .cond(4) 101 1 imm24(24) ]", mnemonic="BL", _flag=True)
def f(obj,imm24,_flag):
    [...]
When provided with a bytestring, the decode() method of this ispec object will:
- proceed with decoding ONLY if bits 27,26,25,24 are 1,0,1,1 or raise an exception
- instantiate an instruction object (obj)
- decode 4 bits at position [28,29,30,31] and provide this value as an integer in ‘obj.cond’ instruction instance attribute.
- decode 24 bits at positions 23..0 and provide this value as an integer as argument ‘imm24’ of the decorated function f.
- set obj.mnemonic to ‘BL’ and pass argument _flag=True to f.
- call f(obj,…)
- return obj
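The bit positions listed above can be checked with a hand-written equivalent. The function decode_bl is a hypothetical illustration of what the format "32[ .cond(4) 101 1 imm24(24) ]" selects; it returns a plain dict instead of an instruction object.

```python
def decode_bl(word: int) -> dict:
    """Decode a 32-bit word against the BL format, or raise if fixed bits differ."""
    if ((word >> 24) & 0xF) != 0b1011:  # bits 27..24 must be 1,0,1,1
        raise ValueError("fixed bits do not match the BL format")
    return {
        "mnemonic": "BL",
        "cond": (word >> 28) & 0xF,  # bits 31..28, stored as an attribute
        "imm24": word & 0xFFFFFF,    # bits 23..0, passed to the hook
    }


bl = decode_bl(0xEB000010)

try:
    decode_bl(0xEA000010)  # bits 27..24 are 1010 here
    rejected = False
except ValueError:
    rejected = True

print(bl, rejected)
```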
Note
The spec string format is
LEN ('<' or '>') '[' FORMAT ']' ('+' or '&' NUMBER)
- LEN is either an integer that represents the bit length of the instruction or ‘*’. Length must be a multiple of 8, ‘*’ is used for a variable length instruction.
- FORMAT is a series of directives (see below.) Each directive represents a sequence of bits ordered according to the spec direction: ‘<’ (default) means that directives are ordered from MSB (bit index LEN-1) to LSB (bit index 0) whereas ‘>’ means LSB to MSB.
The spec string is optionally terminated with ‘+’ to indicate that it represents an instruction prefix, or by ‘&’ NUMBER to indicate that the instruction has a suffix of NUMBER more bytes to decode some of its operands. In the prefix case, the bytestring matching the ispec format is stacked temporarily until the rest of the bytestring matches a non prefix ispec. In the suffix case, only the spec bytestring is used to define the instruction but the read_instruction() fetcher will provide NUMBER more bytes to the xdata() method of the instruction.
The directives defining the FORMAT string are used to associate symbols to bits located at dedicated offsets within the bitstring to be decoded. A directive has the following syntax:
- ‘-’ (indicates that current bit position is not decoded)
- ‘0’ (indicates that current bit position must be 0)
- ‘1’ (indicates that current bit position must be 1)
or
type SYMBOL location
where:
- type is an optional modifier char with possible values:
  - ‘.’ indicates that the SYMBOL will be an attribute of the instruction.
  - ‘~’ indicates that the decoded value will be returned as a Bits instance.
  - ‘#’ indicates that the decoded value will be returned as a string of [01] chars.
  - ‘=’ indicates that decoding should end at current position (overlapping)
  if not present, the SYMBOL will be passed as a keyword argument to the function with value decoded as an integer.
- SYMBOL: is a mandatory string matching regex [A-Za-z_][0-9A-Za-z_]*
- location: is an optional string matching the following expressions:
  - ( len ): indicates that the value is decoded from the next len bits starting from the current position of the directive within the FORMAT string.
  - (*): indicates a variable length directive for which the value is decoded from the current position with all remaining bits in the FORMAT. If the LEN is also variable then all remaining bits from the instruction buffer input string are used.
  default location value is (1).
The special directive {byte} is a shortcut for 8 fixed bits. For example 8>[{2f}] is equivalent to 8>[ 1111 0100 ], or 8<[ 0010 1111 ].
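The equivalence stated for the {byte} shortcut follows from the direction of the spec, as this small sketch shows. The helper byte_directive_bits is hypothetical and only expands a hexadecimal byte into the bit string that would appear in the FORMAT.

```python
def byte_directive_bits(hex_byte: str, direction: str = "<") -> str:
    """Expand a {byte} shortcut into 8 fixed bits, grouped by 4 for readability."""
    bits = format(int(hex_byte, 16), "08b")  # written MSB first
    if direction == ">":
        bits = bits[::-1]  # '>' lists directives from LSB to MSB
    return f"{bits[:4]} {bits[4:]}"


msb_first = byte_directive_bits("2f", "<")
lsb_first = byte_directive_bits("2f", ">")
print(msb_first, "|", lsb_first)
```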
- class arch.core.Formatter(formats)
  Formatter is used for instruction pretty printing.
  Basically, a Formatter object is created from a dict associating a key with a list of functions or format string. The key is either one of the mnemonics or possibly the name of a @ispec-decorated function (this allows to group formatting styles rather than having to declare formats for every possible mnemonic.) When the instruction is printed, the formatting list elements are “called” and concatenated to produce the output string.