- A+

I am working on an assembly language program for a 6502 cpu, and am finding that I need a fast-as-possible divide-by-seven routine, in particular one which could take a 16-bit dividend.

I am familiar with the routines found here, but generalizing the divide-by-seven routine found there is quite complicated, and a cursory examination of the general algorithm (using integer division)

x/7 ~= (x + x/8 + x/64 ... )/8

indicates that to handle a 16-bit range, it would likely take over 100 cycles to complete because of the 6502's single accumulator register and the relative slowness of individual memory bit shifts on the 6502.

I thought that a lookup table might help, but on the 6502, I am of course restricted to lookup tables that are 256 bytes or fewer. To that end one could assume the existence of two 256-byte lookup tables, xdiv7 and xmod7, which, when using an unsigned one-byte value as an index into the table, can quickly obtain the result of a byte divided by 7 or modulo 7, respectively. I am not sure how I could utilize these to find the values for the full 16-bit range, however.

In parallel, I also need a modulo 7 algorithm, although ideally whatever solution that can be worked out with the division will produce a mod7 result as well. If additional precomputed tables are needed, I am amenable to adding those, as long as the total memory requirements for all tables does not exceed about 3k.

Although I ultimately need a signed division algorithm, an unsigned one would suffice, because I can generalize that to a signed routine as needed.

Any assistance would be greatly appreciated.

Note: As @Damien_The_Unbeliever pointed out in the comments, the `upperHigh`

and `lowerLow`

tables are identical. So they can be combined into a single table. However, that optimization will make the code harder to read, and the explanation harder to write, so combining the tables is left as an exercise for the reader.

The code below shows how to generate the quotient and remainder when dividing a 16-bit unsigned value by 7. The simplest way to explain the code (IMO) is with an example, so let's consider dividing `0xa732`

by 7. The expected result is:

`quotient = 0x17e2 remainder = 4 `

We start by considering the input as two 8-bit values, the `upper`

byte and the `lower`

byte. The `upper`

byte is `0xa7`

and the `lower`

byte is `0x32`

.

We compute a quotient and remainder from the `upper`

byte:

`0xa700 / 7 = 0x17db 0xa700 % 7 = 3 `

So we need three tables:

`upperHigh`

stores the high byte of the quotient:`upperHigh[0xa7] = 0x17`

`upperLow`

stores the low byte of the quotient:`upperLow[0xa7] = 0xdb`

`upperRem`

stores the remainder:`upperRem[0xa7] = 3`

And we compute the quotient and remainder from the `lower`

byte:

`0x32 / 7 = 0x07 0x32 % 7 = 1 `

So we need two tables:

`lowerLow`

stores the low byte of the quotient:`lowerLow[0x32] = 0x07`

`lowerRem`

stores the remainder:`lowerRem[0x32] = 1`

Now we need to assemble the final answer. The remainder is the sum of two remainders. Since each remainder is in the range [0,6] the sum is in the range [0,12]. So we can use two 13 byte lookups to convert the sum into a final remainder and a carry.

The low byte of the quotient is the sum of that carry and the values from the `lowerLow`

and `upperLow`

tables. Note that the sum may generate a carry into the high byte.

The high byte of the quotient is the sum of that carry and the value from the `upperHigh`

table.

So, to complete the example:

`remainder = 1 + 3 = 4 // simple add (no carry in) lowResult = 0x07 + 0xdb = 0xe2 // add with carry from remainder highResult = 0x17 // add with carry from lowResult `

The assembly code to implement this consists of 7 table lookups, an add-without-carry instruction and two add-with-carry instructions.

`#include <stdio.h> #include <stdint.h> uint8_t upperHigh[256]; // index:(upper 8 bits of the number) value:(high 8 bits of the quotient) uint8_t upperLow[256]; // index:(upper 8 bits of the number) value:(low 8 bits of the quotient) uint8_t upperRem[256]; // index:(upper 8 bits of the number) value:(remainder when dividing the upper bits by 7) uint8_t lowerLow[256]; // index:(lower 8 bits of the number) value:(low 8 bits of the quotient) uint8_t lowerRem[256]; // index:(lower 8 bits of the number) value:(remainder when dividing the lower bits by 7) uint8_t carryRem[13] = { 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 }; uint8_t combinedRem[13] = { 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5 }; void populateLookupTables(void) { for (uint16_t i = 0; i < 256; i++) { uint16_t upper = i << 8; upperHigh[i] = (upper / 7) >> 8; upperLow[i] = (upper / 7) & 0xff; upperRem[i] = upper % 7; uint16_t lower = i; lowerLow[i] = lower / 7; lowerRem[i] = lower % 7; } } void divideBy7(uint8_t upperValue, uint8_t lowerValue, uint8_t *highResult, uint8_t *lowResult, uint8_t *remainder) { uint8_t temp = upperRem[upperValue] + lowerRem[lowerValue]; *remainder = combinedRem[temp]; *lowResult = upperLow[upperValue] + lowerLow[lowerValue] + carryRem[temp]; uint8_t carry = (upperLow[upperValue] + lowerLow[lowerValue] + carryRem[temp]) >> 8; // Note this is just the carry flag from the 'lowResult' calcaluation *highResult = upperHigh[upperValue] + carry; } int main(void) { populateLookupTables(); uint16_t n = 0; while (1) { uint8_t upper = n >> 8; uint8_t lower = n & 0xff; uint16_t quotient1 = n / 7; uint16_t remainder1 = n % 7; uint8_t high, low, rem; divideBy7(upper, lower, &high, &low, &rem); uint16_t quotient2 = (high << 8) | low; uint16_t remainder2 = rem; printf("n=%u q1=%u r1=%u q2=%u r2=%u", n, quotient1, remainder1, quotient2, remainder2); if (quotient1 != quotient2 || remainder1 != remainder2) printf(" **** failed ****"); printf("/n"); n++; if (n == 0) break; } } `