The architecture package¶

Supported CPU architectures are implemented in this package as subpackages and all use the arch.core generic classes. The interface to a CPU used by system classes is implemented as a cpu_XXX.py module usually in the architecture’s subpackage.

This CPU module will:

provide the CPU environment (registers and other internals)
provide an instance of the arch.core.disassembler class, which requires you to:
- define an instruction class based on arch.core.instruction,
- define the arch.core.ispec of every instruction for the generic decoder,
- and define the semantics of every instruction with cas.expressions.
optionally define the output assembly format, and the GNU as (or any other) assembly parser.
optionally define the function PC(), which tells generic analyses which register represents the instruction pointer.

A simple example is provided by the arch.arm.v8 architecture, which implements a model of ARM AArch64. The interface CPU module is arch.arm.cpu_armv8, which imports everything from the arch.arm.v8 subpackage.

Adding support for a new cpu module¶

The cpu environment¶

It all starts with the definition of the cpu environment in a dedicated module. This module defines registers as instances of cas.expressions.reg, and associated register slices with cas.expressions.slc if necessary. For example, x86 register eax and its slices are defined in arch.x86.env as:

eax = reg("eax",32)
ax  = slc(eax, 0, 16, "ax")
al  = slc(eax, 0, 8 , "al")
ah  = slc(eax, 8, 8 , "ah")
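To make the slice arguments concrete, here is a minimal sketch in plain Python (it does not use amoco) that extracts the same bit ranges from an integer. The helper name slice_bits is hypothetical, and the sketch assumes that slc(x, pos, size) selects size bits starting at bit pos, as the ax, al and ah definitions above suggest.

```python
def slice_bits(value, pos, size):
    """Return `size` bits of `value` starting at bit index `pos` (LSB is 0)."""
    return (value >> pos) & ((1 << size) - 1)

eax_value = 0x12345678
ax_value = slice_bits(eax_value, 0, 16)   # low 16 bits
al_value = slice_bits(eax_value, 0, 8)    # low byte
ah_value = slice_bits(eax_value, 8, 8)    # second byte
print(hex(ax_value), hex(al_value), hex(ah_value))
```

In amoco the slices are symbolic expressions rather than integers, so this only illustrates which bits each slice covers.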

In order to improve code analysis and views, some registers should be bound to their special cas.expressions.regtype, using one of the dedicated callables or context managers. For example, the stack pointer should be bound to regtype 'STACK' using:

esp = is_reg_stack(reg('esp',32))

or alternatively using a context manager:

with is_reg_stack:
    esp = reg('esp',32)

Defined regtypes are:

cas.expressions.is_reg_pc

cas.expressions.is_reg_flags

cas.expressions.is_reg_stack

cas.expressions.is_reg_other

Once all needed registers are defined, it is also recommended to define an ordered list called registers, which will be used by emulator instances for register views.

Finally, the cpu environment sometimes also needs to define some internal parameters that change the way instructions are decoded or the memory endianness. For example, the arch.arm.v7.env module defines internals for isetstate to change the instruction set from ARM to Thumb, and endianstate to change endianness. These internal parameters differ from regular registers by the fact that they are not defined as expressions and thus cannot be symbolic.

Instructions specifications¶

The instructions’ specifications are then defined in a module as well. An instruction’s specification is an instance of arch.core.ispec that decorates a function which sets up an instruction instance. The specification describes how the instruction is decoded out of bytes, in a way that allows the decorated function to set up the instruction’s operands and any other characteristics from the decoded values. This description makes it possible to follow the instruction set manual of the CPU datasheet very closely. Moreover, thanks to how decorators work, several specs can share the same setup function. For example, we have in the MIPS R3000 instructions’ spec module:

@ispec("32<[ 001100 rs(5) rt(5) imm(16) ]", mnemonic="ANDI")
@ispec("32<[ 001101 rs(5) rt(5) imm(16) ]", mnemonic="ORI")
@ispec("32<[ 001110 rs(5) rt(5) imm(16) ]", mnemonic="XORI")
def mips1_dri(obj, rs, rt, imm):
    src1 = env.R[rs]
    imm = env.cst(imm, 32)
    dst = env.R[rt]
    obj.operands = [dst, src1, imm]
    obj.type = type_data_processing

Here, obj is an instruction instantiated by the disassembler if the decoded bytes match one of these spec definitions. In that case, the setup function is called with arguments rs, rt and imm being ints decoded from the corresponding bits (see arch.core.ispec below). Any instruction setup should define at least an obj.operands list and should indicate one of the following obj.type values:

type_data_processing, which are well-defined instructions,

type_control_flow, which mark default ending of assembly blocks,

type_cpu_state, which may change the cpu internal registers,

type_system, which have usually no impact on code semantics,

type_other
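As an illustration of what the decoded arguments of a setup function look like, the following plain-Python sketch (independent of amoco) extracts rs, rt and imm from a 32-bit word laid out as in the ANDI spec above. The function name decode_andi_fields and the sample word are assumptions made for this example only.

```python
def decode_andi_fields(word):
    """Split a 32-bit word laid out as: 001100 rs(5) rt(5) imm(16)."""
    if (word >> 26) != 0b001100:
        raise ValueError("not an ANDI encoding")
    rs = (word >> 21) & 0x1F     # 5 bits after the 6 fixed opcode bits
    rt = (word >> 16) & 0x1F     # next 5 bits
    imm = word & 0xFFFF          # low 16 bits
    return rs, rt, imm

# Bit layout of the sample: 001100 01001 01000 0000000011111111
sample = (0b001100 << 26) | (9 << 21) | (8 << 16) | 0x00FF
rs, rt, imm = decode_andi_fields(sample)
print(hex(sample), rs, rt, hex(imm))
```

With a real spec, the generic decoder performs this extraction from the format string, and the setup function only receives the resulting integers.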

The cpu disassembler¶

When the specification module is done, the cpu disassembler can be instantiated. First, a new local instruction class should be derived from the generic arch.core.instruction with:

from amoco.arch.core import instruction
instruction_X = type("instruction_X", (instruction,), {})

Then, a disassembler instance is obtained with:

from amoco.arch.core import disassembler
from amoco.arch.X import spec_X
disassemble = disassembler([spec_X], iclass=instruction_X)

The first argument is the list of available specifications. Most architectures have only one mode, but some, like ARM, can switch from a default mode (ARM) to an alternate mode like Thumb (see the iset argument in the class definition). The second is our new instruction class. By default, disassemblers fetch instructions in little-endian order, but the endian parameter allows fetching in big-endian order. For example, the ARMv7 architecture’s disassembler is:

mode = lambda: internals["isetstate"]
endian = lambda: 1 if internals["ibigend"] == 0 else -1
disassemble = disassembler([spec_armv7, spec_thumb],
                           instruction_armv7,
                           mode,
                           endian)

which allows the semantics to possibly change both the mode and the instructions’ endianness dynamically.
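The reason for passing callables rather than plain values can be sketched without amoco. In the toy code below, the internals dict, the fetch_word helper and the sample bytes are all hypothetical; the point is only that a lambda is evaluated at fetch time, so a state change made after the lambdas were created is still taken into account.

```python
internals = {"isetstate": 0, "ibigend": 0}   # toy copy of the state used above

mode = lambda: internals["isetstate"]
endian = lambda: 1 if internals["ibigend"] == 0 else -1

def fetch_word(data, endian):
    """Read a 4-byte word, asking `endian` for the byte order at call time."""
    order = "little" if endian() == 1 else "big"
    return int.from_bytes(data[:4], order)

raw = bytes([0x01, 0x02, 0x03, 0x04])
little = fetch_word(raw, endian)
internals["ibigend"] = 1      # state changes after the lambdas were created
big = fetch_word(raw, endian)
internals["isetstate"] = 1    # the mode lambda follows the state as well
print(hex(little), hex(big), mode())
```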

Instructions semantics¶

An instruction’s semantics is a function associated with the instruction’s mnemonic, which operates on a cas.mapper.mapper object. The function’s name should be “i_XXX” for mnemonic “XXX”. The mapper argument enables transitions from one state to another. For example, the semantics of all MIPS R3000 AND instructions is:

@__npc
def i_AND(ins, fmap):
    dst, src1, src2 = ins.operands
    if dst is not zero:
        fmap[dst] = fmap(src1&src2)

The first argument is the disassembled instruction object and the second argument is the mapper (i.e. the state). We simply create local variables from the operands list and then update the state according to these operands: the mapper is modified by setting the first operand expression to the mapper’s evaluation of the cas.expressions.op formed by src1 & src2.

Of course, since we want symbolic semantics, these functions might end up being quite complex, especially when conditions are involved. This is the case, for example, for this unusual unaligned load word instruction of the MIPS R3000:

@__npc
def i_LWL(ins, fmap):
    dst, base, src = ins.operands
    addr = base+src
    if dst is not zero:
        fmap[dst[24:32]] = fmap(mem(addr,8))
        cond1 = (addr%4)!=0
        fmap[dst[16:24]]  = fmap(tst(cond1,mem(addr-1,8),dst[16:24]))
        addr = addr - 1
        cond2 = cond1 & ((addr%4)!=0)
        fmap[dst[8:16]] = fmap(tst(cond2,mem(addr-1,8),dst[8:16]))
        fmap[dst] = fmap[dst].simplify()

Here, the number of bytes read from memory depends on the word alignment of the address value. This instruction is thus normally coupled with an LWR, which reads the rest of the bytes across the word boundary. In concrete semantics, this is quite simple to write since address alignment is always computable and thus 3 cases are possible. In symbolic semantics, things are trickier since the address is symbolic, and thus the resulting writeback to the dst register is a symbolic expression that must take all 3 cases into account at once.

Updating the cpu instruction pointer¶

Now, an instruction’s semantics must also update the cpu PC(). In the MIPS case, this is performed by the __npc decorator, which updates pc and npc to handle delay slot cases. Architectures without delay slots can just advance their program counter by the length of the instruction. Architectures with delay slots can always handle delayed branches by relying on intermediate (hidden) program counters. This is the case for arch.sparc and arch.mips, where __npc does:

pc  <- npc
npc <- npc+4

and since branch instructions only have an effect on npc once they have been processed, the next instruction to execute (the one located at pc) is still the one just after the branch instruction.
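The pc/npc update can be sketched with a toy interpreter in plain Python. This is not amoco's implementation: the run_delay_slot function, the tuple-based instruction encoding and the register names are assumptions chosen for illustration, and the state is concrete rather than symbolic.

```python
def run_delay_slot(program, steps):
    """Execute `steps` instructions of a toy program keyed by address."""
    regs = {}
    pc, npc = 0, 4
    for _ in range(steps):
        op, *args = program[pc]
        # Step both counters first: pc <- npc, npc <- npc + 4.
        pc, npc = npc, npc + 4
        if op == "li":
            regs[args[0]] = args[1]
        elif op == "addiu":
            regs[args[0]] = regs[args[1]] + args[2]
        elif op == "beq":
            # Operands are read now; the branch only redirects npc.
            if regs[args[0]] == regs[args[1]]:
                npc = args[2]
    return regs, pc

program = {
    0:  ("li", "a", 1),
    4:  ("li", "b", 1),
    8:  ("beq", "a", "b", 20),
    12: ("addiu", "a", "a", 1),   # delay slot, always executed
    16: ("li", "c", 99),          # skipped when the branch is taken
    20: ("li", "d", 7),
}
regs, pc = run_delay_slot(program, 5)
print(regs, pc)
```

In this run the delay slot instruction at address 12 executes after the branch has been processed, and the instruction at address 16 is never reached because the branch had already redirected npc to 20.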

However, special care must be taken to avoid pitfalls. A common mistake is to believe that the delay slot instruction is executed before the branch instruction, as if the two instructions were simply swapped. This is not true. The branch does take effect after the delay slot, but its operands are evaluated before the delay slot instruction has had time to execute! For example, the MIPS R3000 sequence:

liu   t7, 0x5
liu   t6, 0x2
bne   t7, t6, *somewhere
addiu t7, t7, -0x3

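will lead to the branch being taken. With t7=0x5 and t6=0x2 loaded by the first two instructions, bne compares the two registers while t7 still holds 0x5, and the addiu in the delay slot only sets t7 to 0x2 afterwards. (If the two instructions were simply swapped, t7 would equal t6 and the branch would not be taken.) See the pipelining discussion below for details.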

A Note on cpu pipelining and cycle-accurate emulation¶

For most architectures, the instruction parallelism introduced by the underlying pipeline does not interfere with the semantics. This means, for example, that assuming R1=0, R2=1, R3=1, the generic case of:

OR   R1, R2, R3
ADD  R4, R1, 1

should obviously lead to R4=2 anyway, because pipelining is implemented to improve performance but shouldn’t have any impact on semantics. Hence, we can always emulate instructions as if no parallelism existed. Right? Well, not exactly…

All pipelines have pipeline hazards, i.e. situations that could lead to undefined behaviors if not handled correctly. In our example above, the R1 register is really updated after the ALU has performed its operation on the R2 and R3 values. Meanwhile, the ADD instruction wants to read the R1 value as soon as the instruction is decoded (after it was fetched), and would consequently read its value before it is updated. Thus, pipelines have internal mechanisms to detect these kinds of situations and either stall the pipeline (wait for R1 to be written back before being used) or forward results back to other stages as soon as possible. In this case, the ALU forwards its result immediately back to the ALU entry multiplexer, and R1 is updated later.

Unfortunately, some old architectures like the MIPS R3000 handled only a limited set of these pipeline hazards and heavily relied on the compiler to avoid some instruction sequences (usually by inserting nops). In the MIPS R3000 architecture, the above case is handled correctly unless a load/store is involved, like in:

lbu v0, 0x1(a1)
nop
sll v0, v0, 0x8

Here, the compiler has inserted a nop to ensure that the loaded byte has been fetched and can be forwarded to the ALU for sll. Hence, as long as we emulate code produced by compliant compilers, we can still ignore the underlying pipeline operations. But this is no longer true in the general case. Since most of the time we can’t make this assumption, instructions can’t formally be emulated as if no parallelism existed. If we ever have MIPS R3000 code with:

lbu v0, 0x1(a1)
sll v0, v0, 0x8

then the resulting mapper is not v0 <- mem(a1+0x1,8)<<8 but rather something that highly depends on the involved pipeline interlocking mechanism, most likely v0 <- v0<<8.

As with the delay slots of branch instructions, which can be handled with an additional npc register, we can always simulate the pipeline delay by introducing a kind of hidden “register”. In amoco, the mapper has an internal delayed attribute that allows explicit delayed updates. These updates are triggered by explicit calls to mapper.update_delayed(), usually right in the middle of every instruction, as if the result of the delayed load was forwarded to the current ALU stage.
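The idea of a delayed update can be sketched with a small concrete model. Everything below is hypothetical (the ToyState class, the tuple-based instructions, the initial register and memory values) and only borrows the names delayed and update_delayed() from the description above; it is not the amoco mapper.

```python
class ToyState:
    """Concrete register/memory state with a queue of delayed writes."""
    def __init__(self, regs, mem):
        self.regs = dict(regs)
        self.mem = dict(mem)
        self.delayed = []

    def update_delayed(self):
        """Commit pending load results to the register file."""
        for name, value in self.delayed:
            self.regs[name] = value
        self.delayed = []

def step(state, ins):
    op = ins[0]
    if op == "lbu":
        _, dst, addr = ins
        state.update_delayed()
        state.delayed.append((dst, state.mem[addr] & 0xFF))
    elif op == "sll":
        _, dst, src, shift = ins
        value = (state.regs[src] << shift) & 0xFFFFFFFF  # operands read first
        state.update_delayed()                           # pending load lands here
        state.regs[dst] = value
    elif op == "nop":
        state.update_delayed()

def run_loads(program):
    state = ToyState({"v0": 1}, {0x101: 0xAB})
    for ins in program:
        step(state, ins)
    state.update_delayed()
    return state.regs["v0"]

with_nop = run_loads([("lbu", "v0", 0x101), ("nop",), ("sll", "v0", "v0", 8)])
without_nop = run_loads([("lbu", "v0", 0x101), ("sll", "v0", "v0", 8)])
print(hex(with_nop), hex(without_nop))
```

With the nop, sll shifts the loaded byte. Without it, sll reads the old value of v0 before the load result is committed, which matches the v0 <- v0<<8 outcome discussed above.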

Instructions format¶

Now that instruction specifications and semantics are defined, it is recommended to define at least one formatter to print instructions according to the CPU’s instruction set assembly manual. Available formatters for a CPU ISA are instances of the arch.core.Formatter class. These formatters are created from a dict object that maps an instruction’s mnemonic or setup function name to an iterable of formatting functions operating on the instruction object. For example:

format_default = (mnemo, opers)

MIPS_full_formats = {
    "mips1_loadstore": (mnemo, opers_mem),
    "mips1_jump_abs": (mnemo, opers),
    "mips1_jump_rel": (mnemo, opers_rel),
    "mips1_branch": (mnemo, opers_adr),
}

MIPS_full = Formatter(MIPS_full_formats)
MIPS_full.default = format_default

Here, the available format is MIPS_full, instantiated from the MIPS_full_formats dict, which maps spec setup functions to their corresponding formatting tuples. The functions mnemo and opers take the instruction and return a Pygments-compatible list of tokens if support for pretty-printing is implemented, or simply a string. When an instruction is printed, the formatter starts by matching its mnemonic or its setup function, or takes the default formatting iterable, and then joins all outputs from the iterable’s functions.
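The lookup-then-join behaviour described here can be sketched with a toy formatter that returns plain strings. The ToyFormatter class, the hook_name attribute, the opers_mem layout and the stand-in instruction objects are all assumptions for this example; the real arch.core.Formatter may differ in its details.

```python
from types import SimpleNamespace

def mnemo(i):
    return i.mnemonic.lower().ljust(8)

def opers(i):
    return ", ".join(str(op) for op in i.operands)

def opers_mem(i):
    # Hypothetical "rt, offset(base)" layout for loads and stores.
    rt, base, offset = i.operands
    return "%s, %d(%s)" % (rt, offset, base)

class ToyFormatter:
    def __init__(self, formats):
        self.formats = formats
        self.default = (mnemo, opers)

    def __call__(self, i):
        # Match on the mnemonic first, then on the setup function name,
        # and fall back to the default formatting tuple.
        fmt = self.formats.get(i.mnemonic,
                               self.formats.get(i.hook_name, self.default))
        return "".join(f(i) for f in fmt)

toy_full = ToyFormatter({"mips1_loadstore": (mnemo, opers_mem)})
lw = SimpleNamespace(mnemonic="LW", hook_name="mips1_loadstore",
                     operands=["t0", "sp", 4])
andi = SimpleNamespace(mnemonic="ANDI", hook_name="mips1_dri",
                       operands=["t0", "t1", 255])
print(toy_full(lw))
print(toy_full(andi))
```

Keying the dict on setup function names keeps the table short: one entry covers every mnemonic that shares that setup function.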

The cpu module¶

Finally, the cpu module can be fully created. This module should import all from the architecture’s environment and define its disassembler as shown above.

The semantics is associated with the instruction class through the arch.core.instruction.set_uarch() class method, which takes a mapping from mnemonics to the corresponding instruction semantics function. Thus, in most cpu modules this binding is done with:

from .asm import *
uarch = dict(filter(lambda kv: kv[0].startswith("i_"), locals().items()))
instruction_X.set_uarch(uarch)
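The filter expression simply keeps every name that starts with "i_". The following standalone sketch applies the same expression to an explicit dict instead of locals(); the placeholder functions and the namespace dict are hypothetical.

```python
def i_AND(ins, fmap):
    pass  # placeholder semantics

def i_ORI(ins, fmap):
    pass  # placeholder semantics

def helper():
    pass  # not a semantics function, so it must be ignored

namespace = {"i_AND": i_AND, "i_ORI": i_ORI, "helper": helper, "zero": 0}
uarch = dict(filter(lambda kv: kv[0].startswith("i_"), namespace.items()))
print(sorted(uarch))
```

Note that the keys of the resulting dict are the function names, prefix included.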

The chosen formatter is bound to the instruction class with:

from .formats import X_full
instruction_X.set_formatter(X_full)

(Optionally, if not already defined in the environment, the PC() function is defined to return the instruction pointer.)

Note that whenever a disassembler is available, the entire architecture ISA decision tree can be displayed with:

>>> from amoco.ui.views import archView
>>> from amoco.arch.mips.cpu_r3000LE import disassemble
>>> print(archView(disassemble))
─[& f0000000 == 0]
  │─[& fc000000 == 0]
  │  │─[& fc00003f == 8]
  │  │  │─JR              : 32<[ 000000 rs(5) 00000 00000 00000 001000]
  │  │─[& fc00003f == 12]
  │  │  │─MFLO            : 32<[ 000000 00000 00000 rd(5) 00000 010010 ]
  │  │─[& fc00003f == 10]
  │  │  │─MFHI            : 32<[ 000000 00000 00000 rd(5) 00000 010000 ]
  │  │─[& fc00003f == 13]
  │  │  │─MTLO            : 32<[ 000000 rs(5) 00000 00000 00000 010011 ]
  │  │─[& fc00003f == 11]
  │  │  │─MTHI            : 32<[ 000000 rs(5) 00000 00000 00000 010001 ]
  │  │─[& fc00003f == 19]
  │  │  │─MULTU           : 32<[ 000000 rs(5) rt(5) 00000 00000 011001]
  │  │─[& fc00003f == 18]
  │  │  │─MULT            : 32<[ 000000 rs(5) rt(5) 00000 00000 011000]
  │  │─[& fc00003f == 1b]
  │  │  │─DIVU            : 32<[ 000000 rs(5) rt(5) 00000 00000 011011]
  │  │─[& fc00003f == 1a]
  │  │  │─DIV             : 32<[ 000000 rs(5) rt(5) 00000 00000 011010]
  │  │─[& fc00003f == 9]
  │  │  │─JALR            : 32<[ 000000 rs(5) 00000 rd(5) 00000 001001]
  │  │─[& fc00003f == 2b]
  │  │  │─SLTU            : 32<[ 000000 rs(5) rt(5) rd(5) 00000 101011]
  │  │─[& fc00003f == 2a]
  │  │  │─SLT             : 32<[ 000000 rs(5) rt(5) rd(5) 00000 101010]
  │  │─[& fc00003f == 6]
  │  │  │─SRLV            : 32<[ 000000 rs(5) rt(5) rd(5) 00000 000110]
  │  │─[& fc00003f == 7]
  │  │  │─SRAV            : 32<[ 000000 rs(5) rt(5) rd(5) 00000 000111]
  │  │─[& fc00003f == 4]
  │  │  │─SLLV            : 32<[ 000000 rs(5) rt(5) rd(5) 00000 000100]
  │  │─[& fc00003f == 26]
  │  │  │─XOR             : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100110]
  │  │─[& fc00003f == 25]
  │  │  │─OR              : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100101]
  │  │─[& fc00003f == 27]
  │  │  │─NOR             : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100111]
  │  │─[& fc00003f == 24]
  │  │  │─AND             : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100100]
  │  │─[& fc00003f == 23]
  │  │  │─SUBU            : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100011]
  │  │─[& fc00003f == 21]
  │  │  │─ADDU            : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100001]
  │  │─[& fc00003f == 22]
  │  │  │─SUB             : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100010]
  │  │─[& fc00003f == 20]
  │  │  │─ADD             : 32<[ 000000 rs(5) rt(5) rd(5) 00000 100000]
  │  │─[& fc00003f == 2]
  │  │  │─SRL             : 32<[ 000000 00000 rt(5) rd(5) sa(5) 000010 ]
  │  │─[& fc00003f == 3]
  │  │  │─SRA             : 32<[ 000000 00000 rt(5) rd(5) sa(5) 000011 ]
  │  │─[& fc00003f == 0]
  │  │  │─SLL             : 32<[ 000000 00000 rt(5) rd(5) sa(5) 000000 ]
  │  │─[& fc00003f == c]
  │  │  │─SYSCALL         : 32<[ 000000 .code(20) 001100]
  │  │─[& fc00003f == d]
  │  │  │─BREAK           : 32<[ 000000 .code(20) 001101]
  │─[& fc000000 == 4000000]
  │  │─BLTZAL          : 32<[ 000001 rs(5) 10000 ~imm(16) ]
  │  │─BLTZ            : 32<[ 000001 rs(5) 00000 ~imm(16) ]
  │  │─BGEZAL          : 32<[ 000001 rs(5) 10001 ~imm(16) ]
  │  │─BGEZ            : 32<[ 000001 rs(5) 00001 ~imm(16) ]
  │─[& fc000000 == c000000]
  │  │─JAL             : 32<[ 000011 t(26)]
  │─[& fc000000 == 8000000]
  │  │─J               : 32<[ 000010 t(26)]
─[& f0000000 == 40000000]
  │─[& f2000000 == 40000000]
  │  │─MTC             : 32<[ 0100 .z(2) 00100 rt(5) rd(5) 00000000000 ]
  │  │─CTC             : 32<[ 0100 .z(2) 00110 rt(5) rd(5) 00000000000 ]
  │  │─MFC             : 32<[ 0100 .z(2) 00000 rt(5) rd(5) 00000000000 ]
  │  │─CFC             : 32<[ 0100 .z(2) 00010 rt(5) rd(5) 00000000000 ]
  │─[& f2000000 == 42000000]
  │  │─COP             : 32<[ 0100 .z(2) 1 .cofun(25) ]
─[& f0000000 == 30000000]
  │─LUI             : 32<[ 001111 00000 rt(5) imm(16) ]
  │─XORI            : 32<[ 001110 rs(5) rt(5) imm(16) ]
  │─ORI             : 32<[ 001101 rs(5) rt(5) imm(16) ]
  │─ANDI            : 32<[ 001100 rs(5) rt(5) imm(16) ]
─[& f0000000 == 10000000]
  │─BLEZ            : 32<[ 000110 rs(5) 00000 ~imm(16) ]
  │─BGTZ            : 32<[ 000111 rs(5) 00000 ~imm(16) ]
  │─BNE             : 32<[ 000101 rs(5) rt(5) ~imm(16) ]
  │─BEQ             : 32<[ 000100 rs(5) rt(5) ~imm(16) ]
─[& f0000000 == 20000000]
  │─SLTIU           : 32<[ 001011 rs(5) rt(5) ~imm(16) ]
  │─SLTI            : 32<[ 001010 rs(5) rt(5) ~imm(16) ]
  │─ADDIU           : 32<[ 001001 rs(5) rt(5) ~imm(16) ]
  │─ADDI            : 32<[ 001000 rs(5) rt(5) ~imm(16) ]
─[& f0000000 == b0000000]
  │─SWR             : 32<[ 101110 base(5) rt(5) offset(16) ]
─[& f0000000 == 90000000]
  │─LWR             : 32<[ 100110 base(5) rt(5) offset(16) ]
  │─LHU             : 32<[ 100101 base(5) rt(5) offset(16) ]
  │─LBU             : 32<[ 100100 base(5) rt(5) offset(16) ]
─[& f0000000 == a0000000]
  │─SWL             : 32<[ 101010 base(5) rt(5) offset(16) ]
  │─SW              : 32<[ 101011 base(5) rt(5) offset(16) ]
  │─SH              : 32<[ 101001 base(5) rt(5) offset(16) ]
  │─SB              : 32<[ 101000 base(5) rt(5) offset(16) ]
─[& f0000000 == 80000000]
  │─LWL             : 32<[ 100010 base(5) rt(5) offset(16) ]
  │─LW              : 32<[ 100011 base(5) rt(5) offset(16) ]
  │─LH              : 32<[ 100001 base(5) rt(5) offset(16) ]
  │─LB              : 32<[ 100000 base(5) rt(5) offset(16) ]
─[& f0000000 == e0000000]
  │─SWC             : 32<[ 1110 .z(2) base(5) rt(5) offset(16) ]
─[& f0000000 == c0000000]
  │─LWC             : 32<[ 1100 .z(2) base(5) rt(5) offset(16) ]

If several specification modes are provided, they are listed one after the other.

arch/core.py¶

The architecture’s core module implements essential classes for the definition of new cpu architectures:

the instruction class models cpu instructions decoded by the disassembler.
the disassembler class implements the instruction decoding logic based on provided specifications.
the ispec class is a function decorator used to define the specification of an instruction.
the Formatter class is used for instruction pretty printing.

class arch.core.icore(istr=b'')[source]¶

This is the core class for the generic parent instruction class below. It defines the mandatory API for all instructions.

bytes¶

instruction’s bytes

Type:	bytes

type¶

one of (type_data_processing, type_control_flow, type_cpu_state, type_system, type_other) or type_undefined (default) or type_unpredictable.

Type:	int

spec¶

the specification that was decoded by the disassembler to instantiate this instruction.

Type:	ispec

mnemonic¶

the mnemonic string as defined by the specification.

Type:	str

operands¶

the list of operands’ expressions.

Type:	list

misc¶

a defaultdict for passing various arch-dependent infos (which returns None for undefined keys.)

Type:	dict

classmethod set_uarch(uarch)[source]¶: class method to define the instructions’ semantics uarch dict

typename()[source]¶: returns the instruction’s type as a string

length¶: length of the instruction in bytes

class arch.core.instruction(istr)[source]¶

The generic instruction class allows instructions to be defined for any cpu instruction set and provides a common API for all arch-independent methods. It extends icore with an address attribute and formatter methods.

address¶

the memory address where this instruction has been disassembled.

Type:	cst

classmethod set_formatter(f)[source]¶: classmethod that defines the formatter for all instances

static formatter(i, toks=False)[source]¶: default formatter if no formatter has been set, will return the highlighted list from tokens for raw mnemonic, and comma-separated operands expressions.

toks()[source]¶: returns the (unjoined) list of formatted tokens.

exception arch.core.InstructionError(i)[source]¶

exception arch.core.DecodeError[source]¶

class arch.core.disassembler(specmodules, iclass=<class 'arch.core.instruction'>, iset=<function disassembler.<lambda>>, endian=<function disassembler.<lambda>>)[source]¶

The generic disassembler class will decode a byte string based on provided sets of instruction specifications and various parameters like endianness and ways to select the appropriate instruction set.

Parameters:

specmodules – list of python modules containing ispec decorated funcs
iclass – the specific instruction class based on `instruction`
iset – lambda used to select module (ispec list)
endian – instruction fetch endianness (1: little, -1: big)

maxlen¶: the length of the longest instruction found in provided specmodules.

iset¶: the lambda used to select the right specifications for decoding

endian¶: the lambda used to define endianness.

specs¶: the tree of ispec objects that defines the cpu architecture.

setup(ispecs)[source]¶: setup will (recursively) organize the provided ispecs list into an optimal tree so that __call__ can efficiently find the matching ispec format for a given bytestring (we don’t want to search all specs until a match, so we need to separate formats as much as possible). The output tree is (f,l) where f is the submask to check at this level and l is a defaultdict such that l[x] is the subtree of formats for which submask is x.
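A single level of such a tree can be sketched as follows. This is a simplified illustration, not the actual setup() algorithm: it builds one (f, l) pair by taking the bits that are fixed in every spec, and the SPECS tuples, build_level and lookup are names invented for this example (the mask and fix values are derived from the MIPS formats listed earlier).

```python
from collections import defaultdict

# (name, mask of fixed bits, value of fixed bits) for four 32-bit MIPS formats
SPECS = [
    ("ANDI", 0xFC000000, 0x30000000),
    ("ORI",  0xFC000000, 0x34000000),
    ("J",    0xFC000000, 0x08000000),
    ("JR",   0xFC1FFFFF, 0x00000008),
]

def build_level(specs):
    """Return (f, l): f is the mask common to all specs, l groups them by value."""
    f = 0xFFFFFFFF
    for _, mask, _ in specs:
        f &= mask
    l = defaultdict(list)
    for spec in specs:
        l[spec[2] & f].append(spec)
    return f, l

def lookup(tree, word):
    """Check only the specs filed under the word's submask value."""
    f, l = tree
    candidates = l.get(word & f, [])
    return [name for name, mask, fix in candidates if (word & mask) == fix]

tree = build_level(SPECS)
print(hex(tree[0]), lookup(tree, 0x312800FF), lookup(tree, 0x03E00008))
```

A recursive version would apply the same split again inside each group, using the bits that remain fixed within that group.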

class arch.core.ispec(format, **kargs)[source]¶

ispec (customizable) decorator

@ispec makes it easy to define instruction decoders based on architecture specifications.

Parameters:

spec (str) – a human-friendly format string that describes how the ispec object will (on request) decode a given bytestring and how it will expose various decoded entities to the decorated function in order to define an instruction.
**kargs – additional arguments to ispec decorator must be provided with name=value form and are declared as attributes/values within the instruction instance before calling the decorated function. See below for conventions about names.

format¶

the spec format passed as argument (see Note below).

Type:	str

hook¶

the decorated python function to be called during decoding. The hook function name is relevant only for instructions’ formatter. See arch.core.Formatter.

Type:	callable

iattr¶

the dictionary of instruction attributes to add before decoding. Attributes and their values are passed from the spec’s kargs when the name does not start with an underscore.

Type:	dict

fargs¶

the dictionary of keywords arguments to pass to the hook. These keywords are decoded from the format or given by the spec’s kargs when name starts with an underscore.

Type:	dict

precond¶

an optional function that takes the instruction object as argument and returns a boolean to indicate whether the hook can be called or not. (This allows to avoid decoding when a prefix is missing for example.)

Type:	func

size¶

the bit length of the format (LEN value)

Type:	int

fix¶

the values of fixed bits within the format

Type:	Bits

mask¶

the mask of fixed bits within the format

Type:	Bits

Examples

This statement creates an ispec object with hook f, and registers this object automatically in a SPECS list object within the module where the statement is found:

@ispec("32[ .cond(4) 101 1 imm24(24) ]", mnemonic="BL", _flag=True)
def f(obj,imm24,_flag):
    [...]

When provided with a bytestring, the decode() method of this ispec object will:

proceed with decoding ONLY if bits 27,26,25,24 are 1,0,1,1 or raise an exception

instantiate an instruction object (obj)

decode 4 bits at position [28,29,30,31] and provide this value as an integer in ‘obj.cond’ instruction instance attribute.

decode 24 bits at positions 23..0 and provide this value as an integer as argument ‘imm24’ of the decorated function f.

set obj.mnemonic to ‘BL’ and pass argument _flag=True to f.

call f(obj,…)

return obj
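The steps above can be mirrored by hand with integer operations. The sketch below is independent of amoco: the MASK and FIX constants are worked out from the format string of the example, and decode_bl is a hypothetical helper that returns the two decoded values instead of building an instruction object.

```python
MASK = 0x0F000000   # bits 27..24 are the only fixed bits of the format
FIX = 0x0B000000    # and they must read 1,0,1,1

def decode_bl(word):
    if (word & MASK) != FIX:
        raise ValueError("word does not match the BL format")
    cond = (word >> 28) & 0xF       # would become obj.cond
    imm24 = word & 0xFFFFFF         # would be passed as the imm24 argument
    return cond, imm24

cond, imm24 = decode_bl(0xEB000010)
print(hex(cond), hex(imm24))
```

The pair (MASK, FIX) plays the role of the mask and fix attributes described above: a word matches the format when its masked bits equal the fixed value.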

Note

The spec string format is LEN ('<' or '>') '[' FORMAT ']' ('+' or '&' NUMBER)

LEN is either an integer that represents the bit length of the instruction or ‘*’.

Length must be a multiple of 8, ‘*’ is used for a variable length instruction.

FORMAT is a series of directives (see below.)

Each directive represents a sequence of bits ordered according to the spec direction : ‘<’ (default) means that directives are ordered from MSB (bit index LEN-1) to LSB (bit index 0) whereas ‘>’ means LSB to MSB.

The spec string is optionally terminated with ‘+’ to indicate that it represents an instruction prefix, or by ‘&’ NUMBER to indicate that the instruction has a suffix of NUMBER more bytes to decode some of its operands. In the prefix case, the bytestring matching the ispec format is stacked temporarily until the rest of the bytestring matches a non prefix ispec. In the suffix case, only the spec bytestring is used to define the instruction but the read_instruction() fetcher will provide NUMBER more bytes to the xdata() method of the instruction.

The directives defining the FORMAT string are used to associate symbols to bits located at dedicated offsets within the bitstring to be decoded. A directive has the following syntax:

- (indicates that current bit position is not decoded)
0 (indicates that current bit position must be 0)
1 (indicates that current bit position must be 1)

or

type SYMBOL location where:
- type is an optional modifier char with possible values:
  - . indicates that the SYMBOL will be an attribute of the instruction.
  - ~ indicates that the decoded value will be returned as a Bits instance.
  - # indicates that the decoded value will be returned as a string of [01] chars.
  - = indicates that decoding should end at current position (overlapping)
  if not present, the SYMBOL will be passed as a keyword argument to the function with value decoded as an integer.
- SYMBOL: is a mandatory string matching regex [A-Za-z_][0-9A-Za-z_]*
- location: is an optional string matching the following expressions:
  - ( len ) : indicates that the value is decoded from the next len bits starting from the current position of the directive within the FORMAT string.
  - (*) : indicates a variable length directive for which the value is decoded from the current position with all remaining bits in the FORMAT. If the LEN is also variable then all remaining bits from the instruction buffer input string are used.
  default location value is (1).

The special directive {byte} is a shortcut for 8 fixed bits. For example 8>[{2f}] is equivalent to 8>[ 1111 0100 ], or 8<[ 0010 1111 ].
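To show how directives map to bit positions, here is a toy parser for a subset of the FORMAT syntax. It is a sketch under several assumptions and not the ispec implementation: it only handles the '<' direction, fixed-length formats, the fixed-bit directives and SYMBOL(len) directives with the '.', '~' and '#' modifiers (which it ignores). The '=' modifier, (*) locations and {byte} shortcuts are left out, and the names parse_format and DIRECTIVE are hypothetical.

```python
import re

DIRECTIVE = re.compile(r"([.~#]?)([A-Za-z_][0-9A-Za-z_]*)(?:\((\d+)\))?$")

def parse_format(length, fmt):
    """Toy parser for the '<' direction: directives run from MSB to LSB."""
    pos = length            # number of bits not yet consumed
    mask = fix = 0
    fields = {}
    for token in fmt.split():
        if set(token) <= set("01-"):
            for ch in token:
                pos -= 1
                if ch != "-":               # '-' means "not decoded"
                    mask |= 1 << pos
                    fix |= int(ch) << pos
        else:
            m = DIRECTIVE.match(token)
            if m is None:
                raise ValueError("unsupported directive: %r" % token)
            size = int(m.group(3) or 1)     # default location is (1)
            pos -= size
            fields[m.group(2)] = (pos, size)
    if pos != 0:
        raise ValueError("directives do not cover %d bits" % length)
    return mask, fix, fields

mask, fix, fields = parse_format(32, "001100 rs(5) rt(5) imm(16)")
print(hex(mask), hex(fix), fields)
```

Each entry of fields gives the lowest bit index and the width of a symbol, so a value is recovered with (word >> pos) & ((1 << size) - 1).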

class arch.core.Formatter(formats)[source]¶

Formatter is used for instruction pretty printing

Basically, a Formatter object is created from a dict associating a key with a list of functions or format string. The key is either one of the mnemonics or possibly the name of a @ispec-decorated function (this allows to group formatting styles rather than having to declare formats for every possible mnemonic.) When the instruction is printed, the formatting list elements are “called” and concatenated to produce the output string.