23. Assembly Language

  • With the ability to make decisions, the ESAP system is now complete

  • However, using the ESAP system is not particularly easy as it’s programmed with machine code

    • Writing programs for the ESAP system is tedious

    • Machine code is prone to errors and requires memorizing bit patterns

  • A solution to this problem is to improve the way programming is done

    • Instead of binary patterns or hex numbers, English mnemonics can be used

    • Other quality-of-life features can be included, such as separating the opcode mnemonic from the operand value

  • However, the ESAP system ultimately requires machine code

23.1. Assembler

  • An assembly language is a very low-level programming language

    • Often referred to as assembly

    • Assembly is strongly tied to a specific system design, the underlying hardware, and its machine language

    • For example, ARM assembly is specific to the instruction set of ARM processors

  • An assembler is a tool to convert assembly language to machine code

  • When compared to machine code, it enables a more human-centric way of programming the system

    • Assembly is effectively programming in machine code, but with a few nice features

  • Assembly languages have many benefits over programming in machine code, but two key features are

    • Mnemonics for referring to specific instructions

      • For example, consider loading the value 5 into register A

      • Instead of the machine code 0b00100101, one could write LDAD 5

      • The mnemonic would mean the same thing, and would be translated to the machine code

      • But the mnemonic is much easier to remember and mentally parse

    • Labels/symbolic representation for memory addresses

      • For example, memory addresses could be labelled and referenced by their label

      • This would make referencing memory addresses for jumps and loading from RAM easier

      • Removes the need to remember specific memory addresses

      • Also removes the need to constantly update addresses when lines are added to or removed from the program

  • An assembler would take the assembly language and translate it, or assemble it, to the corresponding machine code

    • It would replace the mnemonics with their opcode bit patterns and translate literals to their binary/hex values

    • It would replace all labels within the assembly with their corresponding memory addresses

  • Typically, each statement in assembly has a 1-to-1 mapping to a statement in machine code

  • Despite its simplicity, it improves the programming experience and allows for a small amount of abstraction
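
The label replacement described above can be sketched in Python as a two-pass process. This is only an illustration under stated assumptions: the `name:` label syntax and the `resolve_labels` function are hypothetical and are not part of the ESAP assembler, which does not support labels.

```python
def resolve_labels(lines):
    """Replace label operands with numeric addresses (hypothetical 'name:' label syntax)."""
    # First pass: record the address each label refers to.
    # A label line does not occupy memory, so it takes the address of the next instruction.
    addresses = {}
    instructions = []
    for line in lines:
        if line.endswith(":"):
            addresses[line[:-1]] = len(instructions)
        else:
            instructions.append(line)

    # Second pass: swap any operand that is a known label for its address.
    resolved = []
    for instruction in instructions:
        parts = instruction.split()
        if len(parts) == 2 and parts[1] in addresses:
            parts[1] = str(addresses[parts[1]])
        resolved.append(" ".join(parts))
    return resolved


program = ["LDAD 0", "LDBD 1", "loop:", "ADAB", "JMPA loop"]
print(resolve_labels(program))
```

Because the addresses are computed from the position of each instruction, adding or removing a line before `loop:` would change the resolved address automatically.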

[Figure: assembly language (left) assembled to machine code (right)]

An assembler is a tool used to translate assembly language to machine code. The left hand side shows an example of some assembly language making use of mnemonics. The right hand side shows the hex representation of corresponding machine code. The assembler takes the assembly language and “assembles” it to the machine code. Here, each instruction has a 1-to-1 mapping between assembly and machine code.
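
As a rough sketch of the 1-to-1 mapping, the snippet below translates a single line of assembly into one byte, assuming the ESAP layout of a 4-bit opcode followed by a 4-bit operand. The `OPCODES` subset and the `assemble_line` function are illustrative names only, not the assembler presented later.

```python
# Small subset of opcodes, used here only for illustration
OPCODES = {"LDAD": 0b0010, "HALT": 0b1111}


def assemble_line(line):
    """Translate one assembly statement to one byte of machine code."""
    mnemonic, *operand = line.split()
    # Instructions without an operand get an operand of 0
    value = int(operand[0], 0) if operand else 0
    # Opcode goes in the high 4 bits, operand in the low 4 bits
    return (OPCODES[mnemonic] << 4) | value


code = assemble_line("LDAD 5")
print(f"{code:#010b}")  # 0b00100101
```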

23.2. The ESAP Assembler

  • To make programming easier, a simple assembler will be built for the ESAP system

  • This ESAP assembler will only implement the mnemonics and interpret various literal value encodings

    • It will not make use of labels for memory addresses

  • The mnemonics make writing programs much easier

  • It will also make interpreting/reading programs easier

    • LDAR 15 versus 0b00011111 or 0x1F

  • The mnemonics for each instruction have already been discussed in a previous topic

    • Below is a table of all 16 instructions

    • This table was shown before, but did not include the conditional jump instructions

    • These conditional jump instructions are included here

Complete Instruction Set for the Current ESAP System

Bit Pattern  Hex  Mnemonic  Description
0000         0    NOOP      No Operation
0001         1    LDAR      Load A From RAM
0010         2    LDAD      Load A Direct
0011         3    LDBR      Load B From RAM
0100         4    LDBD      Load B Direct
0101         5    SAVA      Save A to RAM
0110         6    SAVB      Save B to RAM
0111         7    ADAB      Add B to A (A += B)
1000         8    SUAB      Subtract B from A (A -= B)
1001         9    JMPA      Jump Always
1010         A    JMPZ      Jump if Zero Flag Set
1011         B    JMPS      Jump if Significant/Sign Flag Set
1100         C    JMPC      Jump if Carry Flag Set
1101         D    OUTU      Output Unsigned Integer
1110         E    OUTS      Output Signed Integer
1111         F    HALT      Halt
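
To make the table concrete, the sketch below reads it in the opposite direction: it splits a byte of machine code into its opcode and operand and looks up the mnemonic. The `MNEMONICS` list and `disassemble` function are hypothetical helpers for illustration and are not part of the ESAP assembler.

```python
# Index in the list is the 4-bit opcode from the table above
MNEMONICS = [
    "NOOP", "LDAR", "LDAD", "LDBR", "LDBD", "SAVA", "SAVB", "ADAB",
    "SUAB", "JMPA", "JMPZ", "JMPS", "JMPC", "OUTU", "OUTS", "HALT",
]


def disassemble(byte):
    """Split one byte into (mnemonic, operand)."""
    opcode = byte >> 4       # high 4 bits select the instruction
    operand = byte & 0x0F    # low 4 bits hold the operand
    return MNEMONICS[opcode], operand


print(disassemble(0x1F))  # ('LDAR', 15)
```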

  • The assembler will translate literal values from various bases

    • For example, the programmer could write 0b1010, 10, or 0xA to mean ten

    • Although they all mean the same thing, one encoding may make more sense for the programmer in some context

      • Remember, code is for humans, machine code is for machines

      • An assembly language is one step away from machine code

  • Negative numbers will also be handled

    • The assembler will convert the number to a two’s complement number

  • Finally, the ESAP assembler will provide some level of error checking on the program

    • Check that the program will fit into RAM

    • Check that each line has valid syntax

    • Check for missing operands

    • Check that values are within range

  • Since an assembler is a program, a Python script can serve as the ESAP assembler

  • Below, a script created for the ESAP system’s assembler is discussed

    • This script is by no means the only way one could write an assembler

    • Its presentation serves to show the simplicity of such an assembler

    • It facilitates the additional layer of abstraction

  • A series of constants is used to simplify the code

import re
import sys

OPERATORS = {
    "NOOP": 0b0000,
    "LDAR": 0b0001,
    "LDAD": 0b0010,
    "LDBR": 0b0011,
    "LDBD": 0b0100,
    "SAVA": 0b0101,
    "SAVB": 0b0110,
    "ADAB": 0b0111,
    "SUAB": 0b1000,
    "JMPA": 0b1001,
    "JMPZ": 0b1010,
    "JMPS": 0b1011,
    "JMPC": 0b1100,
    "OUTU": 0b1101,
    "OUTS": 0b1110,
    "HALT": 0b1111,
}
HAS_OPERAND = {
    "LDAR",
    "LDAD",
    "LDBR",
    "LDBD",
    "SAVA",
    "SAVB",
    "JMPA",
    "JMPZ",
    "JMPS",
    "JMPC",
    "OUTU",
    "OUTS",
}
VALID_SYNTAX = {
    r"NOOP",
    r"LDAR\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"LDAD\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"LDBR\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"LDBD\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"SAVA\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"SAVB\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"ADAB",
    r"SUAB",
    r"JMPA\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"JMPZ\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"JMPS\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"JMPC\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"OUTU\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"OUTS\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"HALT",
    r"^-?\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
}
  • Helper functions are also included that will make the assembly loop easier to implement

def parse_number(number_string: str) -> int:
    """
    Convert a string of a number to an integer. This function works with binary (0bXXXX), hex (0xXX), and decimal
    literals, optionally preceded by a minus sign.

    :param number_string: String of a number to be converted.
    :return: Value of the string as an integer.
    """
    try:
        # Base 0 makes int() infer the base from the 0b/0x prefix, which avoids calling eval() on file contents
        number = int(number_string, 0)
    except ValueError:
        raise ValueError(f"Cannot parse operand {number_string}")
    return number


def verify_number_and_fix_negative(number: int, max_bits: int) -> int:
    """
    Verify that the number fits in the specified number of bits and convert to a signed, 2s complement binary pattern
    where necessary.

    If the number is negative, this function applies the 2s complement conversion since Python does not store negative
    integers in a 2s complement format. For example, with 4 bits, the number -7 should be converted to the 2s
    complement binary pattern 0b1001. Since Python treats all binary patterns as unsigned ints, this function returns
    the integer 9 in this case.

    :param number: Number to be verified and converted
    :param max_bits: Maximum number of bits the number can be stored in.
    :return: Decimal version of the number (may be signed int binary pattern's decimal value).
    """
    # max_bits + 2 to account for the 0b in the string
    # negative numbers are representable in max_bits - 1, but require +1 for the negative sign
    # len(bin(number + 1)) when negative to account for edge case of the min negative number needing 1 more bit
    if ((number < 0 and len(bin(number + 1)) > max_bits + 2) or
            (number >= 0 and len(bin(number)) > max_bits + 2)):
        raise ValueError(f"Data value {number} cannot be represented with {max_bits} bits.")
    if number < 0:
        # Invert the bits of the magnitude and add 1 to obtain the 2s complement pattern
        number = ((2**max_bits - 1) ^ (number * -1)) + 1
    return number
  • verify_number_and_fix_negative does two things

    • It verifies that a number can be represented with some specific number of bits

      • For example, if the number is data, it must fit in 8 bits

      • If the number is an operand, it must fit in 4 bits

    • This function also converts negative numbers to the integer representing the signed two’s complement number

      • This is necessary as Python has a peculiarity when it comes to signed integers

      • Python stores signed integers as unsigned integers with a sign flag

        • It does not store the signed integers as two’s complement numbers

      • For example -7 would be stored as 0b0111 with a sign flag

      • However, the two’s complement number for -7 is 0b1001, which is what the ESAP system expects

      • This function would convert -7 to the number representing the correct two’s complement bit pattern

      • In this case, it would return the integer 9, which corresponds to the bit pattern 0b1001

      • Here, the value 9 is not important, but the underlying bit pattern for 9 is important
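
As a cross-check of the conversion described above, the same bit pattern can be obtained by masking the number to the desired width, since Python's bitwise operators treat negative integers as if they were two's complement numbers. The function below is an alternative sketch with a hypothetical name, not the function used by the ESAP assembler; it assumes the same accepted range (the most negative signed value up to the largest unsigned value).

```python
def to_twos_complement(number, bits):
    """Return the unsigned integer whose bit pattern is `number` in `bits`-bit two's complement."""
    lowest = -(1 << (bits - 1))   # e.g. -8 for 4 bits
    highest = (1 << bits) - 1     # e.g. 15 for 4 bits (largest unsigned value)
    if not lowest <= number <= highest:
        raise ValueError(f"Data value {number} cannot be represented with {bits} bits.")
    # Masking keeps only the low `bits` bits of the two's complement pattern
    return number & ((1 << bits) - 1)


print(to_twos_complement(-7, 4))          # 9
print(bin(to_twos_complement(-7, 4)))     # 0b1001
print(hex(to_twos_complement(-10, 8)))    # 0xf6
```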

def verify_syntax_return_string(program_line: str) -> str:
    """
    Verifies that a given program line is valid syntax for the assembler. If valid, this function returns the string,
    otherwise the function raises a ValueError.

    :param program_line: A single line of the assembly program (an instruction or a data value).
    :raise ValueError: If the program line does not match a valid syntax pattern.
    :return: Returns a valid program line
    """
    for syntax in VALID_SYNTAX:
        syntax_match = re.match(syntax, program_line)
        if syntax_match:
            return syntax_match[0]
    raise ValueError(f"Invalid operator and/or operand {program_line}")
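
The behaviour of this kind of pattern check can be explored with a small, self-contained sketch. It uses only two patterns in the same style as VALID_SYNTAX, and the `first_match` helper is a hypothetical name introduced here. Note that `re.match` only anchors the start of the line, so trailing text after a valid instruction is dropped rather than rejected; `re.fullmatch` is shown as one possible stricter alternative, not as what the script does.

```python
import re

OPERAND = r"(0x[0-9a-fA-F]+|0b[01]+|[0-9]+)"
PATTERNS = [r"HALT", rf"LDAR\s+\b{OPERAND}\b"]  # illustrative subset only


def first_match(line, matcher):
    """Return the text matched by the first pattern that accepts the line, or None."""
    for pattern in PATTERNS:
        found = matcher(pattern, line)
        if found:
            return found[0]
    return None


print(first_match("LDAR 0xF", re.match))     # LDAR 0xF
print(first_match("LDAR", re.match))         # None (missing operand)
print(first_match("HALT 3", re.match))       # HALT (trailing text ignored)
print(first_match("HALT 3", re.fullmatch))   # None (whole line must match)
```
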
  • The main part of the script uses the above constants and functions

if len(sys.argv) < 2 or len(sys.argv) > 3:
    raise ValueError(f"Assembler takes 1 or 2 argument(s), {len(sys.argv) - 1} given\n"
                     f"\tUsage: assembler.py input.as [out.hex]\n"
                     f"\t\tinput.as: the source assembly file to assemble\n"
                     f"\t\tout.hex: the output hex dig file, defaults to `a.hex` (optional)\n")

file_to_assemble = sys.argv[1]
file_to_output = sys.argv[2] if len(sys.argv) == 3 else "a.hex"
  • The assembler takes one or two command line arguments

    • One argument specifies the name of the file containing the assembly code to assemble

    • The second argument, which is optional, specifies the name of the file to write the machine code to

    • This portion of the script verifies that the correct number of arguments is provided to the script

with open(file_to_assemble) as file:
    program_list = [line.strip() for line in file.readlines() if line.strip()]

if len(program_list) > 16:
    raise ValueError(f"Program length of {len(program_list)} exceeds maximum size of 16 bytes")
  • The assembler reads the assembly language code and verifies that it fits within RAM

  • The main loop of the assembler processes one instruction at a time

machine_code = []
for i, raw_program_line in enumerate(program_list):
    verified_program_line = verify_syntax_return_string(raw_program_line)
    line = verified_program_line.split()
    if line[0].isalpha():
        # Instruction: opcode in the high 4 bits, operand (or 0) in the low 4 bits
        operator = OPERATORS[line[0]]
        if line[0] not in HAS_OPERAND:
            operand = 0
        else:
            operand = parse_number(line[1])
            operand = verify_number_and_fix_negative(operand, 4)
        machine_code_line = (operator << 4) | operand
    else:
        # Data: a literal value that must fit in a full 8-bit byte
        machine_code_line = parse_number(line[0])
        machine_code_line = verify_number_and_fix_negative(machine_code_line, 8)
    machine_code.append(machine_code_line)
  • The main loop verifies the line and processes it as an instruction or data accordingly

    • If it’s an instruction, it processes the operand if necessary too

  • Finally, the assembler saves the assembled code to a file

with open(file_to_output, "w") as hex_file:
    hex_file.write("v2.0 raw\n")
    hex_file.writelines([f"0x{code:02x}\n" for code in machine_code])
  • Using the assembler is then a matter of running the script with the proper command line arguments

    • For example, python assembler.py to_assemble.esap assembled.hex

    • The file extension .esap is not necessary for the assembly language file

    • If the second argument is not set, the file is saved to a.hex by default
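
To give a feel for what the output file would contain, the sketch below assembles a short program in memory and builds the file contents as a string. It is a condensed illustration, not the script above: the `assemble` and `to_hex_file` functions and the reduced `OPCODES` table are hypothetical, the error checking is omitted, and the example program itself is made up for this purpose.

```python
# Reduced opcode table, enough for the example program below
OPCODES = {"LDAR": 0x1, "LDBR": 0x3, "ADAB": 0x7, "HALT": 0xF}


def assemble(lines):
    """Assemble instruction and data lines into a list of bytes (no error checking)."""
    codes = []
    for line in lines:
        parts = line.split()
        if parts[0] in OPCODES:
            operand = int(parts[1], 0) if len(parts) > 1 else 0
            codes.append((OPCODES[parts[0]] << 4) | operand)
        else:
            # Data line: mask to 8 bits so negative values become two's complement
            codes.append(int(parts[0], 0) & 0xFF)
    return codes


def to_hex_file(codes):
    """Build the file contents in the same layout the assembler writes."""
    return "v2.0 raw\n" + "".join(f"0x{code:02x}\n" for code in codes)


program = ["LDAR 4", "LDBR 5", "ADAB", "HALT", "5", "-3"]
print(to_hex_file(assemble(program)), end="")
```

For this program the printed lines after the `v2.0 raw` header would be 0x14, 0x35, 0x70, 0xf0, 0x05, and 0xfd, one per RAM address.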

23.3. For Next Time

  • Something?