The system package¶
Modules of this package implement all classes that relate to operating system specific operations as well as userland stubs or high-level language structures.
system/core.py¶
This module defines all task/process core classes related to binary format and
execution inherited by all system specific execution classes of
the amoco.system
package.
-
class
system.core.
CoreExec
(p, cpu=None)[source]¶ This class implements the base class for Task(s). CoreExec instances, or Tasks, represent a memory-mapped binary executable program and provide the generic instruction and data fetchers as well as the mandatory API for the amoco.emu or amoco.sa analysis classes. Most of the amoco.system modules use this base class to implement an OS-specific Task class (see Linux/x86, Win32/x86, etc).
-
bin
¶ the program executable format object. Currently supported formats are provided in system.elf (Elf32/64), system.pe (PE) and system.utils (HEX/SREC).
-
cpu
¶ reference to the architecture cpu module, which provides generic access to the PC() program counter as well as to the CPU registers and disassembler.
-
OS
¶ optional reference to the OS associated to the child Task.
-
state
¶ the mapper instance that represents the current state of the executable program, including the mapping of registers as well as the MemoryMap instance that represents the virtual memory of the program.
-
read_data
(vaddr, size)[source]¶ fetch size data bytes at virtual address vaddr, returned as a list of items being either raw bytes or symbolic expressions.
-
read_instruction
(vaddr, **kargs)[source]¶ fetch instruction at virtual address vaddr, returned as a cpu.instruction instance, or as cpu.ext in case an external expression is found at vaddr or vaddr is an external symbol.
Raises MemoryError in case vaddr is not mapped, and returns None if disassembler fails to decode bytes at vaddr.
Note: Returning a cpu.ext expression means that this instruction starts an external stub function. It is the responsibility of the fetcher (emulator or analyzer) to eventually call the stub to modify the state mapper.
-
getx
(loc, size=8, sign=False)[source]¶ high level method to get the expression value associated to left-value loc (register or address). The returned value is an integer if the expression is constant, or a symbolic expression instance otherwise. The input loc is either a register string, an integer address, or an instance of the associated expressions. Optionally, the sign flag of the returned expression can be adjusted by the sign argument.
-
setx
(loc, val, size=0)[source]¶ high level method to set the expression value associated to left-value loc (register or address). The value is either an integer or a symbolic expression instance. The input loc is either a register string, an integer address, or an instance of the associated expressions. Optionally, the size of the loc expression can be adjusted by the size argument.
-
-
class
system.core.
DefineStub
(obj, refname, default=False)[source]¶ decorator to define a stub for the given ‘refname’ library function.
-
class
system.core.
BinFormat
[source]¶ Base class for binary format API, just to define default attributes and recommended properties. See elf.py, pe.py and macho.py for example of child classes.
-
class
system.core.
shellcode
(dataio)[source]¶ This is the most basic file format for executable binary code. It provides zero information about the targeted architecture, entrypoints, or any other data or code dependencies.
-
class
system.core.
DataIO
(f)[source]¶ This class simply wraps a binary file or a bytes string and implements both the file and bytes interface. It allows an input to be provided as files of bytes and manipulated as either a file or a bytes object.
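As a rough illustration of the dual file/bytes interface described above, the following sketch wraps either a bytes string or a binary file object. The class name `DataIOSketch` and its methods are hypothetical and do not reproduce amoco's implementation.

```python
import io


class DataIOSketch:
    """Hypothetical wrapper exposing both a file-like and a bytes-like view."""

    def __init__(self, f):
        # Accept raw bytes or any binary file-like object.
        if isinstance(f, (bytes, bytearray)):
            self.f = io.BytesIO(bytes(f))
        else:
            self.f = f

    # File interface.
    def read(self, size=-1):
        return self.f.read(size)

    def seek(self, offset, whence=io.SEEK_SET):
        return self.f.seek(offset, whence)

    def tell(self):
        return self.f.tell()

    # Bytes interface: indexing and slicing without moving the file cursor.
    def __getitem__(self, i):
        pos = self.f.tell()
        try:
            self.f.seek(0)
            return self.f.read()[i]
        finally:
            self.f.seek(pos)

    def __len__(self):
        pos = self.f.tell()
        try:
            return self.f.seek(0, io.SEEK_END)
        finally:
            self.f.seek(pos)


d = DataIOSketch(b"\x7fELF\x02\x01")
magic = d.read(4)   # file-style access
ei_class = d[4]     # bytes-style access, the cursor stays at offset 4
```

Re-reading the whole input on every index is wasteful; it is only meant to show how the two views can coexist on one object.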
-
system.core.
read_program
(filename)[source]¶ Identifies the program header and returns an ELF, PE, Mach-O or DataIO.
Parameters: filename (str) – the program to read. Returns: an instance of currently supported program format (ELF, PE, Mach-O, HEX, SREC)
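Header identification of this kind usually relies on the magic bytes at the start of the file. The sketch below is a heuristic written for illustration only: the function name `identify` and the returned labels are assumptions, and the checks amoco actually performs may differ.

```python
ELF_MAGIC = b"\x7fELF"
PE_MAGIC = b"MZ"  # DOS header that precedes the PE header
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit, both byte orders
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit, both byte orders
    b"\xca\xfe\xba\xbe",                       # fat (multi-arch) binary
}


def identify(data):
    """Return a format name guessed from the first bytes of `data`."""
    if data[:4] == ELF_MAGIC:
        return "ELF"
    if data[:4] in MACHO_MAGICS:
        return "Mach-O"
    if data[:2] == PE_MAGIC:
        return "PE"
    if data[:1] == b":":   # Intel HEX records start with a colon
        return "HEX"
    if data[:1] == b"S":   # Motorola S-records start with 'S'
        return "SREC"
    return "raw"
```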
-
class
system.core.
DefineLoader
(fmt, name='')[source]¶ A decorator that registers a system-specific loader at the point where it is implemented. All loaders are stored in the class-global LOADERS dict.
Example
@DefineLoader('elf', elf.EM_386)
def loader_x86(p):
    ...
Here, a reference to the function loader_x86 is stored in LOADERS['elf'][elf.EM_386].
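The registration mechanism can be pictured with a small self-contained decorator class. This is a sketch of the general pattern, not amoco's code: `DefineLoaderSketch` is a hypothetical name, and `EM_386` is redefined locally with its standard ELF value.

```python
class DefineLoaderSketch:
    """Hypothetical registry decorator; LOADERS is shared by all instances."""

    LOADERS = {}

    def __init__(self, fmt, name=""):
        self.fmt = fmt
        self.name = name

    def __call__(self, f):
        # Store the function under LOADERS[fmt][name] and return it unchanged.
        self.LOADERS.setdefault(self.fmt, {})[self.name] = f
        return f


EM_386 = 3  # ELF e_machine value for Intel 80386


@DefineLoaderSketch("elf", EM_386)
def loader_x86(p):
    return ("x86", p)
```

Because the decorator returns the function unchanged, the loader remains directly callable while also being reachable through the registry.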
-
system.core.
load_program
(f, cpu=None)[source]¶ Detects the program format header (ELF/PE/Mach-O/HEX/SREC) and maps the program in abstract memory, loading the associated “system” (linux/win) and “arch” (x86/arm) based on header information.
Parameters: f (str) – the program filename or string of bytes. Returns: a Task, ELF/PE (old CoreExec interfaces) or RawExec instance.
system/memory.py¶
This module defines all Memory related classes.
The main class of amoco’s Memory model is MemoryMap. It provides a way to represent both concrete and abstract symbolic values located in the virtual memory space of a process. In order to allow addresses to be symbolic as well, the MemoryMap is organised as a collection of MemoryZone objects. A zone holds values located at addresses that are integer offsets relative to a symbolic expression. A default zone, whose related address is set to None, holds values at concrete (virtual) addresses in every MemoryMap.
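To make the zone idea concrete, here is a minimal sketch in which each zone is a plain dict of byte values indexed by integer offset, and zones are keyed by their base expression. The class `MemoryMapSketch` is hypothetical, and the symbolic base is stood in for by the string "esp"; amoco's real classes store datadiv objects and expression instances instead.

```python
class MemoryMapSketch:
    """Hypothetical map: one dict of {offset: byte} per base expression."""

    def __init__(self):
        # The None zone holds concrete (virtual) addresses.
        self.zones = {None: {}}

    def write(self, rel, offset, data):
        zone = self.zones.setdefault(rel, {})
        for i, b in enumerate(data):
            zone[offset + i] = b

    def read(self, rel, offset, size):
        zone = self.zones.get(rel, {})
        # Unmapped bytes are reported as None in this sketch.
        return [zone.get(offset + i) for i in range(size)]


m = MemoryMapSketch()
m.write(None, 0x400000, b"\x90\xc3")      # concrete address
m.write("esp", -4, b"\x01\x02\x03\x04")   # address esp-4, with esp symbolic
```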
-
class
system.memory.
MemoryMap
[source]¶ Provides a way to represent concrete and abstract symbolic values located in the virtual memory space of a process. A MemoryMap is organised as a collection of MemoryZone objects.
-
_zones
¶ dictionary of zones, keys are the related address expressions.
-
locate
(address)¶ returns the memory object that maps the provided address expression.
-
reference
(address)[source]¶ returns a pair (rel, offset) derived from the given address (an integer, a string or an expression), which allows a candidate zone to be found within memory.
-
write
(address, expr, endian=1)[source]¶ writes given expression at given (possibly symbolic) address. Default endianness is ‘little’. Use endian=-1 to indicate big endian convention.
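The endian convention (1 for little, -1 for big) can be illustrated on a concrete integer. The helper `encode` below is a hypothetical name used only for this example.

```python
def encode(value, size, endian=1):
    """Encode an integer on `size` bytes; endian=1 is little, -1 is big."""
    order = "little" if endian == 1 else "big"
    return value.to_bytes(size, order)


little = encode(0x11223344, 4)           # least significant byte first
big = encode(0x11223344, 4, endian=-1)   # most significant byte first
```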
-
-
class
system.memory.
MemoryZone
(rel=None)[source]¶ A MemoryZone contains mo objects at addresses that are integer offsets relative to a symbolic expression. A default zone with related address set to None holds values at concrete addresses in every MemoryMap.
Parameters: rel (exp) – the relative symbolic expression, defaults to None.
-
rel
¶ the relative symbolic expression, or None.
-
_map
¶ the ordered list of mo objects of this zone.
-
range
()[source]¶ returns the lowest and highest addresses currently used by mo objects of this zone.
-
locate
(vaddr)[source]¶ if the given address is within range, return the index of the corresponding mo object in _map, otherwise return None.
-
read
(vaddr, l)[source]¶ reads l bytes starting at vaddr. Returns a list of datadiv values; unmapped areas are returned as bottom exp.
-
-
class
system.memory.
mo
(vaddr, data, endian=1)[source]¶ A mo object essentially associates a datadiv with a memory offset, and provides methods to detect whether an address is located within this object and to read or write bytes at a given address. The offset is relative to the start of the MemoryZone in which the mo object is stored.
-
vaddr
¶ a python integer that represents the offset within the memory zone that contains this memory object (mo).
-
data
¶ the datadiv object located at this offset.
-
trim
(vaddr)[source]¶ if this mo contains data at the given offset, cut out this data and point the current object to this offset. Note that a trim is generally the result of data being overwritten by another mo.
-
-
class
system.memory.
datadiv
(data, endian)[source]¶ A datadiv represents any data within memory, including symbolic expressions.
Parameters: - data – either a string of bytes or an amoco expression.
- endian – either -1 or 1, used when data is a symbolic expression; 1 is for little-endian, -1 for big-endian.
-
val
¶ the reference to the data object.
-
_is_raw
¶ a flag indicating that the data object is a string of bytes.
-
cut
(l)[source]¶ cut out the first l bytes of the current data, keeping only the remaining part of the data.
-
system.memory.
mergeparts
(P)[source]¶ This function will detect every contiguous raw datadiv objects in the input list P, and will return a new list where these objects have been merged into a single raw datadiv object.
Parameters: P (list) – input list of datadiv objects. Returns: the list after raw datadiv objects have been merged. Return type: list
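The merging step can be sketched with plain Python values. In this illustration raw datadiv objects are represented by bytes and anything else stands for a symbolic expression; the function name `merge_raw_parts` is hypothetical.

```python
def merge_raw_parts(parts):
    """Merge adjacent bytes items of `parts`; other items are kept as is."""
    merged = []
    for p in parts:
        if isinstance(p, bytes) and merged and isinstance(merged[-1], bytes):
            merged[-1] += p      # extend the previous raw chunk
        else:
            merged.append(p)     # first raw chunk, or a symbolic item
    return merged


sym = object()  # stand-in for a symbolic expression
result = merge_raw_parts([b"\x01", b"\x02", sym, b"\x03", b"\x04"])
```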
system/structs.py¶
The system structs module implements classes that make it easy to define, encode and decode C structures (or unions), as well as formatters to print various fields according to given types such as hex numbers, dates, defined constants, etc.
This module extends the capabilities of struct by allowing formats to include more than just the basic types and by adding named fields. It extends ctypes as well by allowing formatted printing and “non-static” decoding, where the way a field is decoded depends on previously decoded fields.
Module system.imx6 uses these classes to decode HAB structures and thus allows precise checks on how the boot stages are verified.
For example, the HAB Header class is defined with:
@StructDefine("""
B : tag
H :> length
B : version
""")
class HAB_Header(StructFormatter):
    def __init__(self,data="",offset=0):
        self.name_formatter('tag')
        self.func_formatter(version=self.token_ver_format)
        if data:
            self.unpack(data,offset)
    @staticmethod
    def token_ver_format(k,x,cls=None):
        return highlight([(Token.Literal,"%d.%d"%(x>>4,x&0xf))])
Here, the StructDefine decorator is used to provide the definition of the fields of the HAB Header structure to the HAB_Header class.
The tag Field is an unsigned byte; the StructFormatter utilities inherited by the class register it with name_formatter(), which allows the decoded byte value from data to be represented by its constant name.
This name is obtained from constants defined with:
with Consts('tag'):
    HAB_TAG_IVT = 0xd1
    HAB_TAG_DCD = 0xd2
    HAB_TAG_CSF = 0xd4
    HAB_TAG_CRT = 0xd7
    HAB_TAG_SIG = 0xd8
    HAB_TAG_EVT = 0xdb
    HAB_TAG_RVT = 0xdd
    HAB_TAG_WRP = 0x81
    HAB_TAG_MAC = 0xac
The length field is a big-endian unsigned short integer with the default formatter, and the version field is an unsigned byte with a dedicated formatter function that extracts the major/minor versions from the byte nibbles.
This makes it possible to decode and print the structure from provided data:
In [3]: h = HAB_Header('\xd1\x00\x0a\x40')
In [4]: print(h)
[HAB_Header]
tag :HAB_TAG_IVT
length :10
version :4.0
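For comparison, the same four bytes can be decoded with the standard struct module alone. This sketch is independent of amoco: the function `decode_hab_header` and the partial `HAB_TAGS` table are written here for illustration, using the field layout and constants given above.

```python
import struct

# Subset of the tag constants listed above, as a value -> name table.
HAB_TAGS = {0xD1: "HAB_TAG_IVT", 0xD2: "HAB_TAG_DCD", 0xD4: "HAB_TAG_CSF"}


def decode_hab_header(data, offset=0):
    # ">BHB": unsigned byte, big-endian unsigned short, unsigned byte.
    tag, length, version = struct.unpack_from(">BHB", data, offset)
    return {
        "tag": HAB_TAGS.get(tag, hex(tag)),
        "length": length,
        # Major version in the high nibble, minor version in the low nibble.
        "version": "%d.%d" % (version >> 4, version & 0xF),
    }


h = decode_hab_header(b"\xd1\x00\x0a\x40")
```

The struct format string carries no field names or formatters, which is the gap that the StructDefine and StructFormatter classes are described as filling.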
-
class
system.structs.
Consts
(name)[source]¶ Provides a contextmanager to map constant values with their names in order to build the associated reverse-dictionary.
All reverse-dicts are stored inside the Consts class definition. For example, if you declare variables in a Consts('example') with-scope, the reverse-dict will be stored in Consts.All['example']. When StructFormatter looks up a variable name matching a given value for the attribute ‘example’, it will get Consts.All['example'][value].
Note: To avoid attribute name conflicts, the lookup key is always prefixed with the structure class name (or the ‘alt’ field of the structure class). Hence, the above ‘tag’ constants could have been defined as:
with Consts('HAB_header.tag'):
    HAB_TAG_IVT = 0xd1
    HAB_TAG_DCD = 0xd2
    HAB_TAG_CSF = 0xd4
    HAB_TAG_CRT = 0xd7
    HAB_TAG_SIG = 0xd8
    HAB_TAG_EVT = 0xdb
    HAB_TAG_RVT = 0xdd
    HAB_TAG_WRP = 0x81
    HAB_TAG_MAC = 0xac
Or the structure definition could have defined an ‘alt’ attribute:
@StructDefine("""
B : tag
H :> length
B : version
""")
class HAB_Header(StructFormatter):
    alt = 'hab'
    [...]
in which case the variables could have been defined with:
with Consts('hab.tag'):
    [...]
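The reverse-dictionary idea can be sketched without a context manager. Everything below is hypothetical (`ALL`, `define_consts`, `lookup`); in particular, the lookup order, which tries the qualified scope first and the bare attribute name second, is an assumption made for this example and not a statement about amoco's implementation.

```python
ALL = {}  # scope name -> {value: constant name}


def define_consts(scope, **consts):
    """Record the reverse mapping value -> name for the given scope."""
    ALL.setdefault(scope, {}).update({v: k for k, v in consts.items()})


def lookup(prefix, field, value):
    """Try the qualified scope first, then the bare field name (assumed order)."""
    for scope in ("%s.%s" % (prefix, field), field):
        name = ALL.get(scope, {}).get(value)
        if name is not None:
            return name
    return value  # no symbolic name known for this value


define_consts("tag", HAB_TAG_IVT=0xD1, HAB_TAG_DCD=0xD2)
define_consts("hab.tag", HAB_TAG_CSF=0xD4)
```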
-
system.structs.
token_default_fmt
(k, x, cls=None, fmt=None)[source]¶ The default formatter just prints value ‘x’ of attribute ‘k’ as a literal token python string
-
system.structs.
token_address_fmt
(k, x, cls=None, fmt=None)[source]¶ The address formatter prints value ‘x’ of attribute ‘k’ as an address token hexadecimal value
-
system.structs.
token_constant_fmt
(k, x, cls=None, fmt=None)[source]¶ The constant formatter prints value ‘x’ of attribute ‘k’ as a constant token decimal value
-
system.structs.
token_mask_fmt
(k, x, cls=None, fmt=None)[source]¶ The mask formatter prints value ‘x’ of attribute ‘k’ as a constant token hexadecimal value
-
system.structs.
token_name_fmt
(k, x, cls=None, fmt=None)[source]¶ The name formatter prints value ‘x’ of attribute ‘k’ as a name token variable symbol matching the value
-
system.structs.
token_flag_fmt
(k, x, cls, fmt=None)[source]¶ The flag formatter prints value ‘x’ of attribute ‘k’ as a name token variable series of symbols matching the flag value
-
system.structs.
token_datetime_fmt
(k, x, cls=None, fmt=None)[source]¶ The date formatter prints value ‘x’ of attribute ‘k’ as a date token UTC datetime string from timestamp value
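The conversions behind the address, flag and date formatters listed above can be sketched as plain string functions. These helpers are hypothetical and return ordinary strings, whereas the formatters documented here produce highlighted tokens; the `PERMS` table is invented for the example.

```python
from datetime import datetime, timezone


def fmt_address(x):
    """Render an integer as a hexadecimal address string."""
    return "0x%x" % x


def fmt_flags(x, names):
    """Join the names of all bits of `names` that are set in x."""
    return ",".join(n for bit, n in sorted(names.items()) if x & bit)


def fmt_datetime(x):
    """Render a POSIX timestamp as a UTC date string."""
    dt = datetime.fromtimestamp(x, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S")


PERMS = {0x1: "X", 0x2: "W", 0x4: "R"}  # hypothetical flag names
```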
-
class
system.structs.
Field
(ftype, fcount=0, fname=None, forder=None, falign=0, fcomment='')[source]¶ A Field object defines an element of a structure, associating a name to a structure typename and a count. A count of 0 means that the element is an object of type typename, a count>0 means that the element is a list of objects of type typename of length count.
-
count
¶ A count of 0 means that the element is an object of type typename, a count>0 means that the element is a list of length count of objects of type typename
Type: int=0
-
type
¶ getter for the type associated with the field’s typename.
Type: StructFormatter
-
unpack
(data, offset=0)[source]¶ unpacks data from the given offset using the field’s internal byte ordering. Returns the object (if count is 0) or the list of objects of type typename.
-
pack
(value)[source]¶ packs the value with the internal order and returns the byte string according to type typename.
-
format
()[source] a (non-Raw)Field format is always returned as matching a finite-length string.
-
unpack
(data, offset=0)[source] returns a (sequence of count) element(s) of its self.type
-
-
class
system.structs.
RawField
(ftype, fcount=0, fname=None, forder=None, falign=0, fcomment='')[source]¶ A RawField is a Field associated to a raw type, i.e. an internal type matching a standard C type (u)int8/16/32/64, floats/double, (u)char. Unlike a generic Field, which essentially forwards the unpack call to its subtype, a RawField relies on the struct package to return the raw unpacked value.
-
class
system.structs.
BitField
(ftype, fcount=0, fname=None, forder=None, falign=0, fcomment='')[source]¶ A BitField is a 0-count RawField with additional subnames and subsizes that allow the raw type to be unpacked into several named values, each of a given bit size.
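Splitting a raw integer into named bit ranges can be sketched as follows. The function `unpack_bits` is hypothetical, and assigning the lowest bits to the first name is an assumption of this example; the bit order used by amoco is not stated here.

```python
def unpack_bits(value, fields):
    """Split `value` into named sub-values, lowest bits first.

    `fields` is a list of (name, bit_size) pairs.
    """
    out = {}
    for name, size in fields:
        out[name] = value & ((1 << size) - 1)  # keep the low `size` bits
        value >>= size                          # move on to the next field
    return out


# One byte holding a 4-bit minor and a 4-bit major version number.
parts = unpack_bits(0x40, [("minor", 4), ("major", 4)])
```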
-
class
system.structs.
VarField
(ftype, fcount=0, fname=None, forder=None, falign=0, fcomment='')[source]¶ A VarField is a RawField with variable length, associated with a termination condition that will end the unpack method. An instance of VarField has an infinite size() unless it has been unpacked with data.
-
class
system.structs.
CntField
(ftype, fcount=0, fname=None, forder=None, falign=0, fcomment='')[source]¶ A CntField is a RawField where the amount of elements to unpack is provided as first bytes, encoded as either a byte/word/dword.
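The two variable-size strategies described for VarField and CntField, a termination condition and a leading element count, can be sketched on raw bytes. Both helpers are hypothetical; keeping the terminator in the result and reading the count as little-endian are choices made for this example only.

```python
import struct


def unpack_until(data, offset=0, terminator=b"\x00"):
    """Variable-length field: read bytes up to and including the terminator."""
    end = data.index(terminator, offset) + len(terminator)
    return data[offset:end]


def unpack_counted(data, offset=0, prefix="B"):
    """Counted field: a length prefix (B, H or I) followed by that many bytes."""
    fmt = "<" + prefix
    (count,) = struct.unpack_from(fmt, data, offset)
    start = offset + struct.calcsize(fmt)
    return data[start:start + count]


blob = b"name\x00\x03abcXYZ"
name = unpack_until(blob)           # stops after the first NUL byte
payload = unpack_counted(blob, 5)   # one-byte count, then the counted bytes
```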
-
class
system.structs.
StructDefine
(fmt, **kargs)[source]¶ StructDefine is a decorator class used for defining structures by parsing a simple intermediate language input decorating a StructFormatter class.
-
class
system.structs.
UnionDefine
(fmt, **kargs)[source]¶ UnionDefine is a decorator class based on StructDefine, used for defining unions.
-
class
system.structs.
StructCore
[source]¶ StructCore is a parent class for all user-defined structures based on a StructDefine format. This class essentially contains the packing and unpacking logic of the structure.
Note: It is mandatory that any class that inherits from StructCore can be instantiated with no arguments.
-
class
system.structs.
StructFormatter
[source]¶ StructFormatter is the parent class for all user-defined structures based on a StructDefine format. It inherits the core logic from its StructCore parent and provides all formatting facilities to pretty print the structures based on whether the field is declared as a named constant, an integer or hex value, a pointer address, a string or a date.
Note: Since it inherits from StructCore, it is mandatory that any child class can be instantiated with no arguments.
-
class
system.structs.
StructMaker
[source]¶ The StructMaker class is a StructFormatter equipped with methods to interactively define and adjust fields at some given offsets or when some given sample bytes match a given value.
-
system.structs.
StructFactory
(name, fmt, **kargs)[source]¶ Returns a StructFormatter class built from the given name and format
system/elf.py¶
The system elf module implements Elf classes for both 32- and 64-bit executable formats.
-
exception
system.elf.
ElfError
(message)[source]¶ ElfError is raised whenever Elf object instance fails to decode required structures.
-
class
system.elf.
Elf
(f)[source]¶ This class takes a DataIO object (i.e. an opened file or BytesIO instance) and decodes all ELF structures found in it.
-
entrypoints
¶ list of entrypoint addresses.
Type: list of int
-
Phdr
¶ the list of ELF Program header structures.
Type: list of Phdr
-
Shdr
¶ the list of ELF Section header structures.
Type: list of Shdr
-
dynamic
¶ True if the binary wants to load dynamic libs.
Type: Bool
-
functions
¶ a list of function names gathered from internal definitions (if not stripped) and import names.
Type: list
-
getinfo
(target)[source]¶ target is either an address provided as str or int, or a symbol str searched in the functions dictionary.
- Returns a triplet with:
- section index (0 is error, -1 is a dynamic call)
- offset into section (idem)
- base virtual address (0 for dynamic calls)
-
system/pe.py¶
The system pe module implements the PE class, which supports both 32- and 64-bit executable formats.
-
exception
system.pe.
PEError
(message)[source]¶ PEError is raised whenever PE object instance fails to decode required structures.
-
class
system.pe.
PE
(data)[source]¶ This class takes a DataIO object (i.e. an opened file or BytesIO instance) and decodes all PE structures found in it.
-
entrypoints
¶ list of entrypoint addresses.
Type: list of int
-
Opt
¶ the Optional Header
Type: OptionalHdr
-
sections
¶ list of PE sections.
Type: list of SectionHdr
-
functions
¶ a list of function names gathered from internal definitions (if not stripped) and import names.
Type: list
-
tls
¶ the Thread Local Storage table (or None).
Type: TlsTable
-
locate
(addr, absolute=False)[source]¶ - returns a tuple with:
- the section that holds addr (rva or absolute), or 0 or None.
- the offset within the section (or addr or 0).
Note
If returned section is 0, then addr is within SizeOfImage, but is not found within any sections. Then offset is addr. If returned section is None, then addr is not mapped at all, and offset is set to 0.
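The three possible outcomes described in the note can be reproduced with a small stand-alone function. The section representation below, a list of (name, virtual_address, virtual_size) tuples, and the function signature are hypothetical and only mirror the documented return convention.

```python
def locate(sections, size_of_image, addr):
    """Return (section, offset) following the convention described above.

    `sections` is a list of (name, virtual_address, virtual_size) tuples.
    """
    for s in sections:
        name, va, vsize = s
        if va <= addr < va + vsize:
            return s, addr - va
    if 0 <= addr < size_of_image:
        return 0, addr    # inside the image but outside every section
    return None, 0        # not mapped at all


SECTIONS = [(".text", 0x1000, 0x800), (".data", 0x2000, 0x200)]
```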
-
getdata
(addr, absolute=False)[source]¶ get section bytes from given virtual address to end of mapped section.
-
loadsegment
(S, pagesize=0, raw=False)[source]¶ returns a dict {base: bytes} (or only bytes if the optional arg raw is True), indicating that the data bytes of section S (padded and extended to pagesize bounds) need to be mapped at virtual base address.
Note
If S is 0, returns base=0 and the first Opt.SizeOfHeaders bytes.
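Padding section data out to page boundaries can be sketched as below. The helper `pad_to_page` is hypothetical; aligning the base downward and filling with zero bytes are assumptions of this example, not documented behaviour of loadsegment.

```python
def pad_to_page(base, data, pagesize=0x1000):
    """Return {aligned_base: bytes} with data extended to page boundaries."""
    start = base - (base % pagesize)      # round the base down
    end = base + len(data)
    end += -end % pagesize                # round the end up
    padded = b"\x00" * (base - start) + data
    padded += b"\x00" * (end - start - len(padded))
    return {start: padded}


seg = pad_to_page(0x401010, b"\xcc" * 0x20)
```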
-
system/macho.py¶
The system macho module implements the Mach-O executable format parser.
-
exception
system.macho.
MachOError
(message)[source]¶ MachOError is raised whenever MachO object instance fails to decode required structures.
-
class
system.macho.
MachO
(f)[source]¶ This class takes a DataIO object (i.e. an opened file or BytesIO instance) and decodes all Mach-O structures found in it.
-
entrypoints
¶ list of entrypoint addresses.
Type: list of int
-
header
¶ the Mach header structure.
Type: struct_mach_header
-
archs
¶ the list of MachO instances in case the provided binary file is a “fat” format.
Type: list of MachO
-
dynamic
¶ True if the binary wants to load dynamic libs.
Type: Bool
-
dyld_info
¶ a container with dyld_info attributes rebase, bind, weak_bind, lazy_bind and export.
Type: container
-
read_fat_arch
(a)[source]¶ takes a struct_fat_arch instance and sets its ‘bin’ attribute to the corresponding MachO instance.
-
system/utils.py¶
The system utils module implements various binary file formats such as Intel HEX or Motorola SREC, commonly used for programming MCUs, EEPROMs, etc.
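As an illustration of what decoding such a format involves, the sketch below parses a single Intel HEX record and verifies its checksum. The function `parse_ihex_record` is written for this example and is not part of the utils module's documented API.

```python
def parse_ihex_record(line):
    """Decode one Intel HEX record into (address, record_type, data)."""
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:].strip())
    # All bytes, checksum included, must sum to zero modulo 256.
    if sum(raw) & 0xFF:
        raise ValueError("bad checksum")
    count = raw[0]                              # number of data bytes
    address = int.from_bytes(raw[1:3], "big")   # 16-bit load offset
    rtype = raw[3]                              # 0 = data, 1 = end of file
    data = raw[4:4 + count]
    return address, rtype, data


record = parse_ihex_record(":0300300002337A1E")
eof = parse_ihex_record(":00000001FF")
```

A complete reader would also handle the extended address record types, which shift the base of subsequent data records.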