23. Assembly Language
With the ability to make decisions, the ESAP system is now complete
However, the ESAP system is not particularly easy to use, since it must be programmed in machine code
Writing programs for the ESAP system is tedious
Machine code is error-prone and requires memorizing bit patterns
A solution to this problem is to improve the way programming is done
Instead of binary patterns or hex numbers, English mnemonics can be used
Other quality-of-life features can be included, like separating the opcode mnemonic from the operand value
However, the ESAP system ultimately requires machine code
23.1. Assembler
An assembly language is a very low-level programming language
Often referred to as assembly
Assembly is strongly tied to a specific system design, the underlying hardware, and its machine language
For example, ARM assembly is tied to the instruction set of ARM processors
An assembler is a tool to convert assembly language to machine code
Compared to machine code, it enables a more human-centric way of programming the system
Assembly is effectively programming in machine code, but with a few nice features
Assembly languages have many benefits over programming in machine code, but two key features are
Mnemonics for referring to specific instructions
For example, consider loading the value 5 into register A
Instead of the machine code 0b00100101, one could write LDAD 5
The mnemonic would mean the same thing, and would be translated to the machine code
But the mnemonic is much easier to remember and mentally parse
Labels/symbolic representation for memory addresses
For example, memory addresses could be labelled and referenced by their label
This would make referencing memory addresses for jumps and loading from RAM easier
Removes the need to remember specific memory addresses
Also removes the need to constantly update addresses when lines are added to or removed from the program
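To make the idea of labels concrete, here is a minimal sketch, assuming a hypothetical syntax in which a line ending in a colon (such as loop:) names the address of the next instruction. The ESAP assembler built later in this topic does not support labels; the function name resolve_labels and the label syntax are assumptions for illustration only.

```python
def resolve_labels(lines: list[str]) -> list[str]:
    """Replace label operands with RAM addresses (hypothetical label syntax)."""
    labels = {}
    instructions = []
    # First pass: record the address each label refers to
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        else:
            instructions.append(line)
    # Second pass: swap any operand that is a known label for its address
    resolved = []
    for line in instructions:
        parts = line.split()
        if len(parts) == 2 and parts[1] in labels:
            parts[1] = str(labels[parts[1]])
        resolved.append(" ".join(parts))
    return resolved


program = ["LDAD 1", "LDBD 1", "loop:", "ADAB", "JMPA loop"]
print(resolve_labels(program))  # ['LDAD 1', 'LDBD 1', 'ADAB', 'JMPA 2']
```

If instructions are inserted before loop:, only reassembly is needed. The jump target is recomputed, and no address has to be edited by hand.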
An assembler would take the assembly language and translate it, or assemble it, to the corresponding machine code
It would replace the mnemonics with their opcode bit patterns and translate literals to their binary/hex values
It would replace all labels within the assembly with their corresponding memory addresses
Typically, each statement in assembly has a 1-to-1 mapping to a statement in machine code
Despite its simplicity, it improves the programming experience and allows for a small amount of abstraction

An assembler is a tool used to translate assembly language to machine code. The left-hand side shows an example of some assembly language making use of mnemonics. The right-hand side shows the hex representation of the corresponding machine code. The assembler takes the assembly language and “assembles” it to the machine code. Here, each instruction has a 1-to-1 mapping between assembly and machine code.
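The 1-to-1 mapping can be sketched in a few lines of Python. This is a minimal sketch, assuming the ESAP instruction format of a 4-bit opcode in the high nibble and a 4-bit operand in the low nibble. Only a few mnemonics are included, and the name encode is hypothetical.

```python
# A subset of the ESAP opcodes (4-bit values)
OPCODES = {"NOOP": 0b0000, "LDAR": 0b0001, "LDAD": 0b0010, "HALT": 0b1111}


def encode(mnemonic: str, operand: int = 0) -> int:
    """Assemble one instruction: opcode in the high nibble, operand in the low nibble."""
    return (OPCODES[mnemonic] << 4) | operand


for text in ["LDAD 5", "LDAR 15", "HALT"]:
    parts = text.split()
    operand = int(parts[1]) if len(parts) > 1 else 0
    print(f"{text:<8} -> 0x{encode(parts[0], operand):02X}")
```

LDAD 5 assembles to 0x25, which is the 0b00100101 pattern from earlier.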
23.2. The ESAP Assembler
To make programming easier, a simple assembler will be built for the ESAP system
This ESAP assembler will only implement the mnemonics and interpret various literal value encodings
It will not make use of labels for memory addresses
The mnemonics make writing programs much easier
They also make programs easier to read and interpret
For example, compare LDAR 15 with 0b00011111 or 0x1F
The mnemonics for each instruction have already been discussed in a previous topic
Below is a table of all 16 instructions
This table was shown before, but did not include the conditional jump instructions
These conditional jump instructions are included here
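| Bit Pattern | Hex | Label | Description |
|---|---|---|---|
| 0b0000 | 0x0 | NOOP | No Operation |
| 0b0001 | 0x1 | LDAR | Load A From RAM |
| 0b0010 | 0x2 | LDAD | Load A Direct |
| 0b0011 | 0x3 | LDBR | Load B From RAM |
| 0b0100 | 0x4 | LDBD | Load B Direct |
| 0b0101 | 0x5 | SAVA | Save A to RAM |
| 0b0110 | 0x6 | SAVB | Save B to RAM |
| 0b0111 | 0x7 | ADAB | Add B to A |
| 0b1000 | 0x8 | SUAB | Subtract B from A |
| 0b1001 | 0x9 | JMPA | Jump Always |
| 0b1010 | 0xA | JMPZ | Jump if Zero Flag Set |
| 0b1011 | 0xB | JMPS | Jump if Significant/Sign Flag Set |
| 0b1100 | 0xC | JMPC | Jump if Carry Flag Set |
| 0b1101 | 0xD | OUTU | Output Unsigned Integer |
| 0b1110 | 0xE | OUTS | Output Signed Integer |
| 0b1111 | 0xF | HALT | Halt |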
The assembler will translate literal values from various bases
For example, the programmer could write 0b1010, 10, or 0xA to mean ten
Although they all mean the same thing, one encoding may make more sense to the programmer in a given context
Remember, code is for humans; machine code is for machines
An assembly language is one step away from machine code
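Python's built-in int can already read these prefixes. In this sketch, base 0 tells int to infer the base from a 0b, 0o, or 0x prefix, so all three spellings of ten parse to the same value.

```python
# int(text, 0) infers the base from the prefix: 0b -> binary, 0x -> hex, none -> decimal
literals = ["0b1010", "10", "0xA"]
values = [int(text, 0) for text in literals]
print(values)  # [10, 10, 10]

# A leading minus sign is also accepted, which matters for negative data values
print(int("-0b111", 0))  # -7
```

With base 0, a decimal literal with a leading zero such as "010" is rejected with a ValueError instead of being silently reinterpreted.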
Negative numbers will also be handled
The assembler will convert the number to a two’s complement number
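Here is a worked example of this conversion. It is a sketch, not the assembler's own code, and the helper names to_twos_complement and from_twos_complement are hypothetical. Masking a negative Python integer with 2**bits - 1 yields its two's complement bit pattern. Subtracting 2**bits from a pattern whose top bit is set reverses the conversion, which is roughly how a signed output would read the pattern back.

```python
def to_twos_complement(value: int, bits: int) -> int:
    """Return the unsigned integer whose bit pattern is value in two's complement."""
    # & on a negative Python int behaves as if the int had infinitely many leading 1s
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a signed two's complement value."""
    if pattern & (1 << (bits - 1)):  # sign bit is set
        return pattern - (1 << bits)
    return pattern


pattern = to_twos_complement(-7, 4)
print(pattern, bin(pattern))             # 9 0b1001
print(from_twos_complement(pattern, 4))  # -7
print(to_twos_complement(-10, 8))        # 246
```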
Finally, the ESAP assembler will provide some level of error checking on the program
Check that the program fits into RAM
Check the syntax of each line
Check for missing operands
Check that values are within range
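The checks in this list can be told apart as follows. This sketch assumes a hypothetical check_line helper, a reduced instruction table, and only non-negative instruction operands (no data lines). The script below instead folds syntax errors and missing operands into a single regular-expression check.

```python
import re

# Hypothetical, reduced tables for illustration only
OPCODES = {"LDAD": 0b0010, "ADAB": 0b0111, "HALT": 0b1111}
NEEDS_OPERAND = {"LDAD"}
LITERAL = r"0x[0-9a-fA-F]+|0b[01]+|0|[1-9][0-9]*"
RAM_SIZE = 16  # the ESAP RAM holds 16 bytes


def check_line(line: str) -> str:
    """Return the first problem found in one instruction line, or 'ok'."""
    parts = line.split()
    if not parts or parts[0] not in OPCODES:
        return "syntax error: unknown mnemonic"
    if parts[0] not in NEEDS_OPERAND:
        return "ok" if len(parts) == 1 else "syntax error: unexpected operand"
    if len(parts) < 2:
        return "missing operand"
    if len(parts) > 2 or not re.fullmatch(LITERAL, parts[1]):
        return "syntax error: bad operand"
    if int(parts[1], 0) > 0b1111:  # operands are 4 bits wide
        return "value out of range"
    return "ok"


def check_program(lines: list[str]) -> list[str]:
    """Check the RAM limit first, then each line."""
    if len(lines) > RAM_SIZE:
        return [f"program of {len(lines)} lines does not fit in {RAM_SIZE} bytes of RAM"]
    return [check_line(line) for line in lines]


print(check_program(["LDAD 5", "LDAD", "LDAX 5", "LDAD 16", "HALT"]))
```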
Since an assembler is a program, a Python script can serve as the ESAP assembler
Below, a script created for the ESAP system’s assembler is discussed
This script is by no means the only way one could write an assembler
Its presentation serves to show the simplicity of such an assembler
The assembler provides this additional layer of abstraction over machine code
A series of constants is used to simplify the code
```python
import re
import sys

# Maps each mnemonic to its 4-bit opcode
OPERATORS = {
    "NOOP": 0b0000,
    "LDAR": 0b0001,
    "LDAD": 0b0010,
    "LDBR": 0b0011,
    "LDBD": 0b0100,
    "SAVA": 0b0101,
    "SAVB": 0b0110,
    "ADAB": 0b0111,
    "SUAB": 0b1000,
    "JMPA": 0b1001,
    "JMPZ": 0b1010,
    "JMPS": 0b1011,
    "JMPC": 0b1100,
    "OUTU": 0b1101,
    "OUTS": 0b1110,
    "HALT": 0b1111,
}

# Mnemonics that require an operand
HAS_OPERAND = {
    "LDAR",
    "LDAD",
    "LDBR",
    "LDBD",
    "SAVA",
    "SAVB",
    "JMPA",
    "JMPZ",
    "JMPS",
    "JMPC",
    "OUTU",
    "OUTS",
}

# Valid line patterns; the last pattern matches a raw (possibly negative) data value
VALID_SYNTAX = {
    r"NOOP",
    r"LDAR\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"LDAD\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"LDBR\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"LDBD\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"SAVA\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"SAVB\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"ADAB",
    r"SUAB",
    r"JMPA\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"JMPZ\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"JMPS\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"JMPC\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"OUTU\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"OUTS\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
    r"HALT",
    r"^-?\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b",
}
```
Helper functions are also included that will make the assembly loop easier to implement
```python
def parse_number(number_string: str) -> int:
    """
    Convert a string containing a binary (0bXXXX), hex (0xXX), or decimal number, optionally negative, to an int.

    :param number_string: String of a number to be converted.
    :raise ValueError: If the string cannot be parsed as a number.
    :return: Value of the string as an integer.
    """
    try:
        # Base 0 infers the base from the 0b/0x prefix; unlike eval, it cannot execute arbitrary code
        number = int(number_string, 0)
    except ValueError:
        raise ValueError(f"Cannot parse operand {number_string}")
    return number
```
```python
def verify_number_and_fix_negative(number: int, max_bits: int) -> int:
    """
    Verify that the number fits in the specified number of bits and convert it to a signed, two's complement bit
    pattern where necessary.

    If the number is negative, this function applies the two's complement conversion since Python does not store
    negative integers in two's complement format. For example, with max_bits = 8 the number -10 should be converted
    to the two's complement bit pattern 0b11110110. Since Python treats all binary patterns as unsigned ints, this
    function returns the integer 246 in this case.

    :param number: Number to be verified and converted.
    :param max_bits: Maximum number of bits the number can be stored in.
    :raise ValueError: If the number cannot be represented with max_bits bits.
    :return: Integer whose bit pattern represents the number (two's complement if negative).
    """
    # max_bits + 2 to account for the 0b in the string
    # negative numbers are representable in max_bits - 1 bits, but bin() adds 1 character for the negative sign
    # len(bin(number + 1)) when negative to account for edge case of the min negative number needing 1 more bit
    if (number < 0 and len(bin(number + 1)) > max_bits + 2 or
            number >= 0 and len(bin(number)) > max_bits + 2):
        raise ValueError(f"Data value {number} cannot be represented with {max_bits} bits.")
    if number < 0:
        # Invert the bits of the magnitude, then add 1
        number = ((2**max_bits - 1) ^ number * -1) + 1
    return number
```
verify_number_and_fix_negative does two things
It verifies that a number can be represented with a specific number of bits
For example, if the number is data, it must fit in 8 bits
If the number is an operand, it must fit in 4 bits
This function also converts negative numbers to the integer whose bit pattern is the number's two's complement representation
This is necessary because Python does not represent negative integers in two's complement
Python integers have arbitrary precision and are stored as a magnitude with a separate sign, not as a fixed-width two's complement pattern
For example, bin(-7) gives '-0b111', which is the magnitude 0b0111 with a minus sign
However, the 4-bit two's complement pattern for -7 is 0b1001, which is what the ESAP system expects
This function converts -7 to the integer with the correct two's complement bit pattern; in this case, it returns 9, which corresponds to the bit pattern 0b1001
Here, the value 9 itself is not important, but its underlying bit pattern is
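The XOR-based expression in verify_number_and_fix_negative is the invert-all-bits-then-add-one recipe applied to the magnitude. As a quick sanity check, it can be compared against Python's bitwise AND masking for every negative value the function accepts. This is a sketch, not part of the assembler, and the names xor_formula and mask_formula are hypothetical.

```python
def xor_formula(number: int, max_bits: int) -> int:
    # The same expression used in verify_number_and_fix_negative
    return ((2**max_bits - 1) ^ number * -1) + 1


def mask_formula(number: int, max_bits: int) -> int:
    # Masking keeps the low max_bits bits of Python's infinite two's complement view
    return number & (2**max_bits - 1)


for bits in (4, 8):
    # Negative values the function accepts: -2**(bits - 1) up to -1
    for n in range(-(2 ** (bits - 1)), 0):
        assert xor_formula(n, bits) == mask_formula(n, bits)

print(xor_formula(-7, 4), bin(xor_formula(-7, 4)))  # 9 0b1001
```

Both forms agree, so either could be used. The masking form is shorter, while the XOR form spells out the textbook steps.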
```python
def verify_syntax_return_string(program_line):
    """
    Verifies that a given program line is valid syntax for the assembler. If valid, this function returns the matched
    string, otherwise the function raises a ValueError.

    :param program_line: A single, stripped line of the assembly program.
    :raise ValueError: If the program line does not match a valid syntax pattern.
    :return: Returns a valid program line
    """
    for syntax in VALID_SYNTAX:
        syntax_match = re.match(syntax, program_line)
        if syntax_match:
            return syntax_match[0]
    raise ValueError(f"Invalid operator and/or operand {program_line}")
```
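One behavior of re.match is worth knowing here. It only anchors the pattern at the start of the string, so a line with trailing text still matches, and verify_syntax_return_string returns just the matched prefix. The sketch below reuses the LDAR pattern from VALID_SYNTAX to compare re.match with re.fullmatch, which requires the whole line to match. Switching to re.fullmatch would be a stricter alternative, not what the script above does.

```python
import re

LDAR_PATTERN = r"LDAR\s+\b(0x[0-9a-fA-F]+|0b[0-1]+|[0-9]+)\b"

line = "LDAR 15 extra"
prefix = re.match(LDAR_PATTERN, line)
print(prefix[0])                                  # LDAR 15
print(re.fullmatch(LDAR_PATTERN, line))           # None
print(re.fullmatch(LDAR_PATTERN, "LDAR 15")[0])   # LDAR 15
```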
The main part of the script uses the above constants and functions
```python
if len(sys.argv) < 2 or len(sys.argv) > 3:
    raise ValueError(f"Assembler takes 1 or 2 argument(s), {len(sys.argv) - 1} given\n"
                     f"\tUsage: assembler.py input.as [out.hex]\n"
                     f"\t\tinput.as: the source assembly file to assemble\n"
                     f"\t\tout.hex: the output hex dig file, defaults to `a.hex` (optional)\n")

file_to_assemble = sys.argv[1]
file_to_output = sys.argv[2] if len(sys.argv) == 3 else "a.hex"
```
The assembler takes one or two command line arguments
One argument specifies the name of the file containing the assembly code to assemble
The second argument, which is optional, specifies the name of the file to write the machine code to
This portion of the script verifies that the correct number of arguments is provided to the script
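The manual sys.argv check works. As an alternative sketch, Python's standard argparse module can express the same interface, with one required input file and one optional output file that defaults to a.hex, and it generates the usage message automatically. This parser only mirrors the script's arguments and is not what the original script uses. On bad input, argparse prints the usage and exits with status 2 instead of raising a ValueError.

```python
import argparse

parser = argparse.ArgumentParser(description="Assemble an ESAP program.")
parser.add_argument("input", help="the source assembly file to assemble")
parser.add_argument("output", nargs="?", default="a.hex",
                    help="the output hex file (default: a.hex)")

# parse_args() normally reads sys.argv; explicit lists are passed here for demonstration
args = parser.parse_args(["to_assemble.esap"])
print(args.input, args.output)  # to_assemble.esap a.hex
args = parser.parse_args(["to_assemble.esap", "assembled.hex"])
print(args.input, args.output)  # to_assemble.esap assembled.hex
```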
```python
with open(file_to_assemble) as file:
    program_list = [line.strip() for line in file.readlines() if line.strip()]

if len(program_list) > 16:
    raise ValueError(f"Program length of {len(program_list)} exceeds maximum size of 16 bytes")
```
The assembler reads the assembly language code and verifies that it fits within RAM
The main loop of the assembler processes one instruction at a time
```python
machine_code = []
for i, raw_program_line in enumerate(program_list):
    verified_program_line = verify_syntax_return_string(raw_program_line)
    line = verified_program_line.split()
    if line[0].isalpha():
        # Instruction: look up the opcode and encode the operand (0 if the instruction takes none)
        operator = OPERATORS[line[0]]
        if line[0] not in HAS_OPERAND:
            operand = 0
        else:
            operand = parse_number(line[1])
            operand = verify_number_and_fix_negative(operand, 4)
        # Opcode in the high nibble, operand in the low nibble
        machine_code_line = (operator << 4) | operand
    else:
        # Data: the whole line is a single 8-bit value
        machine_code_line = parse_number(line[0])
        machine_code_line = verify_number_and_fix_negative(machine_code_line, 8)
    machine_code.append(machine_code_line)
```
The main loop verifies the line and processes it as an instruction or data accordingly
If it’s an instruction, it processes the operand if necessary too
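To see the loop end to end without files, here is a condensed sketch that applies the same steps to an in-memory list. It assumes simplified stand-ins for the script's helpers: a subset of OPERATORS, int(text, 0) for parsing, and masking in place of the XOR formula for negatives. It also skips syntax checking. The function names to_bits and assemble are hypothetical.

```python
# Simplified stand-ins for the script's constants (subset of instructions)
OPERATORS = {"LDAD": 0b0010, "LDBD": 0b0100, "ADAB": 0b0111, "HALT": 0b1111}
HAS_OPERAND = {"LDAD", "LDBD"}


def to_bits(number: int, max_bits: int) -> int:
    """Apply the same range rule as verify_number_and_fix_negative, then mask to two's complement."""
    if not -(2 ** (max_bits - 1)) <= number < 2**max_bits:
        raise ValueError(f"Data value {number} cannot be represented with {max_bits} bits.")
    return number & (2**max_bits - 1)


def assemble(program_list: list[str]) -> list[int]:
    """Hypothetical in-memory version of the main loop (no syntax checking)."""
    machine_code = []
    for raw_line in program_list:
        line = raw_line.split()
        if line[0].isalpha():
            # Instruction: opcode in the high nibble, operand (or 0) in the low nibble
            operand = to_bits(int(line[1], 0), 4) if line[0] in HAS_OPERAND else 0
            machine_code.append((OPERATORS[line[0]] << 4) | operand)
        else:
            # Data: the whole line is one 8-bit value
            machine_code.append(to_bits(int(line[0], 0), 8))
    return machine_code


program = ["LDAD 3", "LDBD 0b0100", "ADAB", "HALT", "-2"]
print([f"0x{code:02x}" for code in assemble(program)])
# ['0x23', '0x44', '0x70', '0xf0', '0xfe']
```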
Finally, the assembler saves the assembled code to a file
```python
with open(file_to_output, "w") as hex_file:
    hex_file.write("v2.0 raw\n")
    hex_file.writelines([f"0x{code:02x}\n" for code in machine_code])
```
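The write step produces a v2.0 raw header line followed by one two-digit hex byte per line. This sketch shows the result using io.StringIO in place of a real file, with a hypothetical list of machine_code values.

```python
import io

machine_code = [0x23, 0x44, 0x70, 0xF0]  # hypothetical assembled program

hex_file = io.StringIO()  # stands in for open(file_to_output, "w")
hex_file.write("v2.0 raw\n")
hex_file.writelines([f"0x{code:02x}\n" for code in machine_code])
print(hex_file.getvalue(), end="")
```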
Using the assembler is then a matter of running the script with the proper command line arguments
For example,
python assembler.py to_assemble.esap assembled.hex
The file extension .esap is not required for the assembly language file; the script reads the file regardless of its extension
If the second argument is not set, the machine code is saved to a.hex by default
23.3. For Next Time
Something?