Can we say that an x86 CPU has data types?


An x86 CPU has some instructions that deal with integers and floating-point numbers.

For example, the INC instruction increments an integer (which can be stored in memory or in a register) by 1, so the INC instruction "knows" that it should interpret the bits that it is manipulating as an integer. So can we say that an x86 CPU has data types (in the same way we can say that C++ has data types)? Or, in order for us to be able to say that, should an x86 CPU provide other features like type safety (which it doesn't provide)?

 


Yes, asm has operations that work with data in different formats, and you could call those types. But there is zero type safety. That's a good way to express it.

so the INC instruction "knows" that it should interpret the bits that it is manipulating as an integer.

But that's a clumsy way to express this. INC doesn't "know" anything; it just feeds the operand to a binary adder in an ALU. It's completely up to the programmer (or compiler) to use the right instructions in the right order on the right bytes to get the desired result, e.g. to implement high-level variables with types.

Every asm instruction does what it says on the tin, no more, no less. The Operation section of each entry in the instruction-set reference manual documents the full effect the instruction has on the architectural state of the machine, including FLAGS and possible exceptions, e.g. for inc. A more complicated instruction, BMI2 pdep r32a, r32b, r/m32, has more interesting pseudocode (and diagrams) that shows where each bit gets deposited. Intel's PDF that these entries are extracted from has an intro section that explains notation like CF ← Bit(BitBase, BitOffset); for bts (bit test-and-set).
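
To make "where each bit gets deposited" concrete, here is a rough C model of what pdep computes. It is only a sketch written for this answer, not Intel's pseudocode; the name pdep32_model is made up, and it ignores everything except the result value.

```c
#include <stdint.h>

/* Hypothetical model of BMI2 `pdep dst, src, mask` for 32-bit operands:
   the low bits of src are deposited, in order, at the positions of the
   set bits in mask. All other result bits are zero. */
static uint32_t pdep32_model(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    uint32_t src_bit = 1;                       /* next source bit to consume */
    while (mask != 0) {
        uint32_t lowest = mask & (~mask + 1u);  /* isolate lowest set mask bit */
        if (src & src_bit)
            result |= lowest;                   /* deposit a 1 at that position */
        mask &= mask - 1u;                      /* clear that mask bit */
        src_bit <<= 1;
    }
    return result;
}
```

For example, with mask 0xF0 the four low bits of the source land in bits 4..7 of the result.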


Everything is just bytes (including pointers, and floats, integers, strings, and even code in a von Neumann architecture like x86). (Or on machines with some things that aren't a multiple of 1 byte, everything is just bits.)
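
A small C sketch of the "just bytes" point: the same 4 bytes can be handed to an integer add even though they were produced as a float. The helper names are made up for illustration, and the comments about movd only indicate roughly what the asm equivalent would be. It assumes float is IEEE binary32, as it is on x86.

```c
#include <stdint.h>
#include <string.h>

/* Copy the 4 bytes of a float into an integer, roughly `movd eax, xmm0`. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* byte copy: no conversion happens */
    return u;
}

/* Copy 4 bytes back into a float, roughly `movd xmm0, eax`. */
static float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Integer-increment the bit pattern of a float. The adder does not care
   that the bytes "are" a float; for a positive finite input this happens
   to yield the next representable float. */
static float inc_float_bits(float f) {
    return bits_to_float(float_bits(f) + 1u);
}
```

Whether the result is meaningful is entirely the programmer's problem, which is the sense in which there is no type safety.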

Nothing will magically scale indices by a type width for you. (Although AVX512 does use scaled disp8 in addressing modes, so an 8-bit displacement can encode -128..+127 times the vector width, instead of only that many bytes. In source-level assembly, you still write byte offsets, and it's up to the assembler to use a more compact machine-code encoding when possible.)
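
For contrast with C, where the compiler scales an index by the element size, here is a sketch of the two views. The function names are hypothetical. In asm you would write the scale factor yourself, e.g. as part of an addressing mode like [rdi + rsi*4], or compute a byte offset by hand.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What C does for arr[i]: the compiler scales i by sizeof(int32_t). */
static int32_t load_c_style(const int32_t *arr, size_t i) {
    return arr[i];
}

/* What the machine sees: a base address plus a byte offset that the
   programmer (or compiler) already computed. Nothing checks that the
   offset is a multiple of 4 or that the bytes hold an int32_t. */
static int32_t load_asm_style(const void *base, size_t byte_offset) {
    int32_t v;
    memcpy(&v, (const unsigned char *)base + byte_offset, sizeof v);
    return v;
}
```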

If you want to use inc al on the low byte of a pointer to cycle through the first 256 bytes of an (aligned) array, that's totally fine. (And efficient on CPUs other than P6-family where you'll get a partial-register stall when reading the full register.)
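
Here is a C sketch of that trick, assuming the array is 256-byte aligned and that converting between pointers and uintptr_t round-trips in the obvious way (true for typical x86 compilers, but implementation-defined in ISO C). The function name is made up.

```c
#include <stdint.h>

/* Advance a pointer by incrementing only its low byte, like `inc al`
   when the pointer lives in RAX. The upper bytes are left alone, so the
   pointer wraps around inside one 256-byte aligned block. */
static unsigned char *inc_low_byte(unsigned char *p) {
    uintptr_t addr = (uintptr_t)p;
    uint8_t low = (uint8_t)addr;               /* AL */
    low = (uint8_t)(low + 1);                  /* inc al: 0xFF wraps to 0x00 */
    addr = (addr & ~(uintptr_t)0xFF) | low;    /* rest of RAX unchanged */
    return (unsigned char *)addr;
}
```

No C compiler would let you do this to a pointer without casts; in asm it is just an 8-bit increment on a register that happens to hold an address.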


It's true to some degree that x86 has native support for many types. Most integer instructions come in byte, word, dword and qword operand sizes. And of course there are FP instructions (float / double / long double), and even the mostly-obsolete BCD stuff.

If you care about signed vs. unsigned overflow, you look at OF or CF respectively. (So signed vs. unsigned integer is a matter of which flags you look at after the fact for most instructions, because add / sub are the same binary operation for unsigned and 2's complement).
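
A sketch of an 8-bit add that models the result plus the two flags. The struct and function names are made up for illustration, and only CF and OF are modelled (the real instruction also sets ZF, SF, AF and PF).

```c
#include <stdbool.h>
#include <stdint.h>

struct add8_result {
    uint8_t sum;  /* the 8 result bits, identical for signed and unsigned */
    bool cf;      /* carry flag: set on unsigned overflow */
    bool of;      /* overflow flag: set on signed (2's complement) overflow */
};

/* Hypothetical model of `add r8, r8`: one binary addition, two flags. */
static struct add8_result add8(uint8_t a, uint8_t b) {
    struct add8_result r;
    unsigned wide = (unsigned)a + (unsigned)b;
    r.sum = (uint8_t)wide;
    r.cf = wide > 0xFF;
    /* Signed overflow: both inputs have a sign bit that differs from the
       sign bit of the result. */
    r.of = ((a ^ r.sum) & (b ^ r.sum) & 0x80) != 0;
    return r;
}
```

For example, 0xFF + 1 sets CF but not OF (255 + 1 overflows as unsigned, while -1 + 1 = 0 is fine as signed), and 0x7F + 1 sets OF but not CF.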

But widening multiply, and divide, do come in signed and unsigned versions. One-operand imul vs. mul (and BMI2 mulx) do signed or unsigned N x N => 2N-bit multiplication. (But often you don't need the high-half result and can simply use the more efficient imul r32, r/m32 (or other operand size). The low half of a multiply is the same binary operation for a signed or unsigned interpretation of the inputs; only the high half differs depending on whether the MSB of the inputs has a positive or negative place-value.)
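
A quick C sketch of the low-half / high-half point, using 64-bit arithmetic to stand in for the EDX:EAX result of a 32-bit widening multiply. The function names are hypothetical.

```c
#include <stdint.h>

/* Unsigned 32 x 32 => 64-bit multiply, like one-operand `mul r32`. */
static uint64_t mul_u32(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}

/* Signed 32 x 32 => 64-bit multiply, like one-operand `imul r32`. */
static int64_t mul_s32(int32_t a, int32_t b) {
    return (int64_t)a * b;
}
```

With the input bit patterns 0xFFFFFFFF and 2, both versions produce 0xFFFFFFFE in the low half, but the high half is 1 for the unsigned interpretation (4294967295 * 2) and 0xFFFFFFFF for the signed one (-1 * 2).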


It's not always a good idea to use the same operand size as the C++ data type you're implementing. e.g. 8-bit and 16-bit can often be calculated with 32-bit operand-size, avoiding any partial-register issues. For add/sub, carry only propagates from LSB to MSB, so you can do 32-bit operations and only use the low 8 bits of the result. (Unless you need to right-shift or something.) And of course 8-bit operand size for cmp can be handy, but that doesn't write any 8-bit registers.
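
A sketch of why wide add is safe but right shift is not, with made-up function names. The inputs deliberately carry garbage in bits 8..31, as a register might after a byte load that didn't zero-extend.

```c
#include <stdint.h>

/* Add two 8-bit values using a 32-bit operation and truncate afterwards.
   Garbage in bits 8..31 of the inputs cannot affect bits 0..7 of the sum,
   because carry only propagates upwards. */
static uint8_t add8_via32(uint32_t a_with_garbage, uint32_t b_with_garbage) {
    uint32_t wide = a_with_garbage + b_with_garbage;  /* like add eax, edx */
    return (uint8_t)wide;                             /* only AL is used */
}

/* Right shift is different: high garbage moves down into the low byte. */
static uint8_t shr8_via32(uint32_t a_with_garbage, unsigned count) {
    return (uint8_t)(a_with_garbage >> count);
}
```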


x86 data types/formats include much more than just integer:

  • signed 2's complement and unsigned binary integer
  • IEEE float and double, with SSE and SSE2, and x87 memory operands.
  • half-precision 16-bit float (vcvtph2ps and the reverse): load/store only. Some Intel CPUs have half-precision mul/add support in the GPU, but the x86 IA cores can only convert to save memory bandwidth and use at least float for vector FP math instructions.
  • 80-bit extended precision with x87
  • 80-bit BCD with x87 fbstp
  • packed and unpacked BCD, supported by the AF flag (nibble-carry) and instructions like DAA (packed-BCD decimal adjust AL after addition) and AAA (ASCII adjust after addition: for unpacked BCD in AL, AH). These are not available in 64-bit mode.
  • bitmaps with bt/bts/etc: bts [rdi], eax can select a bit outside the dword at rdi. Unlike with a register destination, the bit-index is not masked with &0x1f (https://www.felixcloutier.com/x86/bt). (This is why bt/bts/etc mem,reg is so many uops, while reg,reg and mem,immediate are not bad).
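
The difference between the memory and register forms of bts from the last bullet can be sketched in C like this. The function names are made up, the model only tracks which bit is selected and the old value that would go into CF, and it says nothing about which bytes real hardware actually loads and stores. It addresses by byte, which selects the same bit as dword addressing on a little-endian machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of `bts [base], r32`: the signed bit index is NOT
   masked, so it can select a bit far outside the dword at base,
   including at lower addresses for a negative index. */
static bool bts_mem_model(unsigned char *base, int32_t bit_index) {
    int64_t idx = bit_index;
    /* floor(idx / 8), done by hand so negative indices round down */
    int64_t byte_off = idx >= 0 ? idx / 8 : -((-idx + 7) / 8);
    unsigned bit = (unsigned)(idx - byte_off * 8);   /* 0..7 */
    unsigned char mask = (unsigned char)(1u << bit);
    bool old = (base[byte_off] & mask) != 0;         /* would go to CF */
    base[byte_off] |= mask;
    return old;
}

/* Hypothetical model of `bts r32, r32`: the index is masked with 0x1f. */
static bool bts_reg_model(uint32_t *reg, uint32_t bit_index) {
    uint32_t mask = UINT32_C(1) << (bit_index & 0x1f);
    bool old = (*reg & mask) != 0;
    *reg |= mask;
    return old;
}
```

With a bit index of 35, the register form sets bit 3 of the register, while the memory form sets bit 3 of the byte at base + 4.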

See also How to read the Intel Opcode notation for a list of all notation used in Intel's instruction-set reference manual. e.g. r/m8 is an 8-bit integer register or memory location. imm8 is an 8-bit immediate. (Typically sign-extended to the operand-size if that's larger than 8.)
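
As a small illustration of the imm8 sign-extension, here is a sketch of what something like add r/m32, imm8 does with its immediate. The function names are hypothetical and flags are ignored.

```c
#include <stdint.h>

/* Sign-extend an 8-bit immediate to a 32-bit operand size: copy bit 7
   into bits 8..31. */
static uint32_t sext_imm8_to_32(uint8_t imm8) {
    return (imm8 & 0x80) ? (0xFFFFFF00u | imm8) : (uint32_t)imm8;
}

/* Model of `add r32, imm8`: the encoded byte 0xFF means -1, not 255. */
static uint32_t add_r32_imm8(uint32_t reg, uint8_t imm8) {
    return reg + sext_imm8_to_32(imm8);
}
```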

The manual uses m32fp for x87 FP memory operands, vs. m32int for x87 fild / fistp (integer load/store), and other integer-source x87 instructions like fiadd.

Also stuff like m16:64, a far pointer in memory (segment:offset), e.g. as an operand for an indirect far jmp or far call. It would certainly be reasonable to count far pointers as a "type" that x86 supports. There are instructions like lgs rdi, [rsi] that load gs:rdi from the 2+8 byte operand pointed to by rsi. (More usually used in 16-bit code, of course.)

m128 / xmm might not be what you'd really call a "data type" though; no SIMD instructions actually treat the operand as a 128-bit or 512-bit integer. 64-bit elements are the largest for anything except shuffles. (Or pure bitwise operations, but that's really 128 separate AND operations in parallel, no interaction between neighbouring bits at all.)
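
To show the difference between a vector of 64-bit elements and a real 128-bit integer, here is a sketch that models an xmm register as two 64-bit halves. The struct and function names are made up; paddq_model mimics the element-wise behaviour of paddq, and add_u128_model is what a hypothetical 128-bit integer add would have to do instead.

```c
#include <stdint.h>

struct vec128 { uint64_t lo, hi; };

/* Like `paddq`: two independent 64-bit adds. The carry out of the low
   element is discarded, not propagated into the high element. */
static struct vec128 paddq_model(struct vec128 a, struct vec128 b) {
    struct vec128 r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* What treating the register as one 128-bit integer would require:
   carry from the low half into the high half. */
static struct vec128 add_u128_model(struct vec128 a, struct vec128 b) {
    struct vec128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);   /* (r.lo < a.lo) is the carry */
    return r;
}
```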
