Serving the Quantitative Finance Community

  • 1
  • 6
  • 7
  • 8
  • 9
  • 10
 
User avatar
rmax
Topic Author
Posts: 374
Joined: December 8th, 2005, 9:31 am

Programming Education

September 29th, 2014, 7:45 am

QuoteOriginally posted by: outrunThis guy building a CPU simulator immediately made me think of you rmax!Thanks for this. Interesting article/blog. Make me think even more that what I am attempting to do is a big ask!
 
User avatar
rmax
Topic Author
Posts: 374
Joined: December 8th, 2005, 9:31 am

Programming Education

October 17th, 2014, 4:48 pm

Preview. Comments welcome.Quote2.3 Addressing ModesThe CPU has a number of different addressing modes. These are set in bit 23 and bit 24 of the instruction. A brief description of the addressing modes follows:2.3.1 ImmediateThis addressing mode is used when loading data into registers, or executing a relative jump.LDR A, #1234H ; Load Register A with the number 1234HEX (4660DEC)BNE #-1234 ; Relative Branch when the Not Equal flag is set by - 1234DEC i.e. subtract 1235 from the program counter This addressing mode will always expect there to be a second byte for the CPU to read to contain the additional information.2.3.2 RegisterThis addressing mode is used when only the register value is used, and hence the entire instruction is contained in a single word.ADD B, A ; Add the contents of register B to the contents of register ABNE A ; Execute a relative branch when the not equal flag is set 2.3.3 DirectThis addressing mode is used when the contents of the operands are a memory pointers rather than the contents of the memory.ADD A, &B ; Add the contents of the memory location pointed to by register B to the contents of register A and store in register A BNE &B ; Branch when the not equal flag is set to the memory location held in register B2.3.4 Direct OffsetUses the offset operand to adjust the from register pointed location. ADD C, &A, &B ; Add the memory contents pointed to by register A to the memory contents pointed to by B + C and store the result in the memory location pointed to by register ANote this mode can only be used when with the FROM register when it is in Direct mode (e.g. ADD C, &A, B is invalid, but ADD &C, &A, &B is valid)2.4 One Word Instructions ? Register Operations only2.4.1 NOP ? No OPerationThis command executed no operations on the CPU.2.4.2 RSR ? Return from SubroutineReturn from the sub routine by popping the Flag register from the stack and popping the PC register from the stack, setting the PC to the new value.2.4.3 - Deleted2.4.4 ADD ? ADD RegistersAdd two registers or the contents of memory locations pointed to by the registers together.Addressing Mode Instruction CommentRegister ADD A, B ; Add contents from Register B to Register ARegister, Direct ADD A, &B ; Add the contents of the memory location pointed to by Register B to Register ADirect ADD &A, &B ; Add the contents of memory location pointed to by register B to the contents pointed to by register A and store in the memory location pointed to by register AOffset Direct ADD C, A, &B ; Add the contents of Register C to Register B to get a new memory locationInvalid ADD C, &A, BADD &C, &A, B ; Invalid. This implies that B + some combination of C is added to memory location pointed at by register A ADD A, &12345 ; Invalid. Register addressing onlyOn Overflow the To Register (or memory location) will contain the From OR To and the CARRY Status flag will be set. 2.4.5 SUB ? Subtract RegistersSubtract two registers or the contents of two Registers or memory locations pointed to by the registers.2.4.6 MUL ? Multiply RegistersMultiply two registers or the contents of two Registers or memory locations pointed to by the registers.2.4.7 DIV ? Divide Divide two registers or the contents of two Registers or memory locations pointed to by the registers.2.4.8 MOD ? Modulos operation Take the modulus of two registers or the contents of two Registers or memory locations pointed to by the registers returning the modulus in the TO register or memory location.2.4.9 AND ? Bitwise AND Logical AND two registers or the contents of two Registers or memory locations pointed to by the registers.2.4.10 ORR ? Bitwise ORLogical OR two registers or the contents of two Registers or memory locations pointed to by the registers2.4.11 XOR? Bitwise XOR Logical Exclusive OR two registers or the contents of two Registers or memory locations pointed to by the registers.2.4.12 NOT ? Bitwise NOT NOT ALL registers or the contents of two Registers or memory locations pointed to by the registers.This instruction can only be used with a single FROM register operand.NOT A ; Bitwise flip of all bits in Register ANOT &A ; Bitwise flip of the memory address pointed to be register A2.4.13 ROL ? Rotate bits left Rotate Bits left in two Registers or memory locations pointed to by the registers by 1 bit.ROL AROL &A2.4.14 ROR ? Rotate bits right Rotate Bits right in two Registers or memory locations pointed to by the registers by 1 bit.ROR AROR &A2.4.15 PSH ? Push Register Push onto the stack the From register or memory locations pointed to by the register.PUSH APUSH &A2.4.16 POP ? Pop register Pop from the stack the to the From register or memory locations pointed to by the register.POP APOP &A2.4.17 CMP ? Compare contentsCompare the contents of the From and TO register or memory locations pointed to by the register and set the flags accordingly.2.5 Two Word Instructions2.5.1 Bxx ? Branch InstructionsBit 23 determines if the Branch instruction will be relative, or absolute.If the Register is set, then register addressing is used, if it is not set, then the CPU will read the next instruction from instruction memory and use that to execute the branch.Type of branch contained in bit 20 to bit 13.Example:BNE &1234 ; Branch to address &1234BNE #-124 ; Branch relative -124BNE #A ; Branch relative contents in Register ABNE &A ; Branch to relative addressBranch Flags:Bit Branch Name Description, when set0 N/A Reserved Not used1 BRC Carry An instruction requires a carry operation (i.e. the result is greater than can be held in a WORD)2 BDZ Divide by Zero Processor encounter a divide by zero exception3 BNE Not Equal Result of the instruction sets are not equal e.g. after comparing the contents of two registers4 BEQ Equal Result of the instruction sets the Equal flag e.g. after comparing the contents of two registers5 BGT Greater than Result of the instruction sets the Greater than flag e.g. after comparing the contents of two registers6 BLT Less than Result of the instruction sets the Less than flag e.g. after comparing the contents of two registers7 BOV Overflow Instruction returns result that is either greater than 32 bits or 31 bits signedAn extension will allow more than one condition to be tested at one time.2.5.2 LDR ? LoadLoad the contents either immediate value or memory location into the register from the data store memory.LDR A, #42 ; load 42 into register ALDR &A, #42 ; load 42 into the memory location pointed to by A2.5.3 STR ? Store Load the contents either immediate value or memory location into the register from the data store memory.STR A, B ; store register B into register ASTR A, B ; store register B into register ASTR &A, #42 ; Store 42 into the memory location pointed to by Register A2.5.4 JMP ? JumpJump to the location pointed by the register. Similar to the Branch instruction but can only be implemented through register and addressing mode. JMP A ; Jump to relative AJMP &A ; jump to the location pointed at by A2.5.5 JSR ? Jump SubroutineSimilar to JMP, but pushing the PC and FLAG onto the stack prior to changing the PC.2.5.6 MOV ? Move ContentsMOV A, B ; Move register contents B into register AMOV &A, B ; Move registers contests B into memeory location pointed at by AMOV A, &B ; Move register contents of memory location pointed at by B to register AMOV &A, &B ; Move contents of memory location pointed at by B to memory location pointed at by AMOV A, #42 ; Load #42 into register AMOV &A, #42 ; Load #42 into memory contents pointed at by AMOV A, B, C ; Not allowed as B is not a memory pointerMOV A,&B, C ;Store C into memory location pointed at by B + AMOV A, &B, #42 Store 42 into memory location pointed at by B + AMOV &12345, #42 ; illegal
 
User avatar
rmax
Topic Author
Posts: 374
Joined: December 8th, 2005, 9:31 am

Programming Education

October 18th, 2014, 2:35 pm

I will publish the instruction layout when I get a chance. But I have a 32bit word instruction where bits 32 - 26 are used to define the instruction and bits 1 - 12 (I am writing from memory at the moment so will have to check my notes) are used either to define the registers if using register addressing or other things in specific cases. One specific case is if you are doing a branch instruction that is a relative jump. Then the bits are used to define the size if jump, using twos compliment for negative jumps.The interesting thing is that you can reduce everything to a single instructor set architecture. But the idea behind this was it would be educational (initially at least - I do have other thoughts as to what could be done with it).
 
User avatar
Traden4Alpha
Posts: 3300
Joined: September 20th, 2002, 8:30 pm

Programming Education

October 18th, 2014, 3:30 pm

WOW! You've done a lot of work! But perhaps this is much more complex than required and maybe even so complex that it harms your educational goals. All the addressing modes are nice, but maybe they are too confusing for a beginning programmer?I'm especially surprised you included MUL and DIV in the instruction set. Yes, they help the processor have a convenient level of math functionality (like a 4-function calculator). Yet learning to create your own algorithms for MUL and DIV provide the beginning student with a lot of great lessons. Everyone learns the paper-and-pencil algorithm for these two functions so it provides a great opportunity for learning how to translate a human step-by-step process into a machine one. Writing one's own MUL and DIV are also a great introduction to the challenge of representing numerical systems with only 32-bit integers (e.g., What's 70,000 times 70,000? What's 2 divided by 3? Is it 0, 1, or does one need some kind of fixed point representation?)What about emulating a specific processor such as the Atmel AVR. This has the advantages that: 1) the user learns a useful assembly language; 2) the code can be loaded into an Arduino board where it will actually do something; 3) there's compilers for help you or the student convert higher level language algorithms into assembly for fun, edification, and modification; 4) there's off-the-shelf emulators so you can test the output of your emulator against a "correct" standard.P.S. I notice your instruction set lacks two very common instructions: increment and decrement.
Last edited by Traden4Alpha on October 17th, 2014, 10:00 pm, edited 1 time in total.
 
User avatar
Traden4Alpha
Posts: 3300
Joined: September 20th, 2002, 8:30 pm

Programming Education

October 19th, 2014, 12:01 pm

QuoteOriginally posted by: outrunI would like to learn more about the addressing mode, how it works on the transistor level.How does the transistor diagram look of the bit in the instruction cycle that fetches the instruction that's stored at the instruction pointer location? The fetch step.?It depends on what generation processor you're talking about.For early generation machines, there's just a array of AND gates in which each gate is connected to one bit of the PC register and a control signal line with the output going into a respective bit on an internal bus in the processor. That bus goes by the various registers, address line outputs, data line outputs, etc. A similar pattern of gates connects the internal bus to each of these other "destinations." By simultaneously sending a 1 down the control line for the gates between the PC register and the bus as well as sending a 1 down the control line for the gates between the bus and the address lines, the contents of the PC register appear on the address lines of the processor which starts the fetching of that location in RAM.Later machines become much more complicated for two interrelated reasons. The early generation machines had a lot of circuits with high fan-out such as the control lines for those arrays of control gates in which one control line is electronically driving 8, 16, or 32 other circuits. Propagation time became a problem as clock speeds increased so all these circuits were redesigned so that no single transistor had to drive more than a few other nearby transistors. That change fragmented the internal circuitry but enabled the development of pipelines. In early CPUs, it took about 3 to 12 clock cycles to complete each individual instruction -- an 8 MHz CPU chip might average only 1 MIPS. Pipelining let the CPU start the next instruction before the earlier one was complete -- instructions/clock cycle increased but so did the latency of each instruction. Some of the Pentium designs have pipelines that are 31 clock cycles long. A related development was superscalar processors in which multiple instructions could be handled in parallel (e.g., A+B->C and D+E->F) at the same time. Both the circuits and code for pipelined and superscalar are really really messy because of all the potential dependencies in time and circuit resource elements. If one wants to add registers A and B together into register C and then use the contents of C for something else, the instructions need to be timed so that the instruction that accesses C executes a sufficient number clock cycles down stream of the instruction that calculated C -- the new value of C needs to be in the register before C is read by the next instruction. There's a reason that all the modern CPUs have 1 billion transistors or more!Both you and rmax might enjoy 4004.com which has schematics for the Intel 4004.
 
User avatar
Traden4Alpha
Posts: 3300
Joined: September 20th, 2002, 8:30 pm

Programming Education

October 19th, 2014, 4:19 pm

QuoteOriginally posted by: outrunThanks T4A!Suppose I have a 32 byte memory with instructions. The instruction counter is at 19, ie 10011. Does this mean that there are 32 XOR masks -one for each byte in the memory- that get XORed wit the instruction counter. The one that all all the bits AND to 1 will feed that 1 across all bits in the memory location, AND it with the bits, and that will give 8 bit outputs, which is effectively the fetch of byte nr 19?Does this scale?Each of the 32 stored bits in the instruction counter has a circuit that leads to it's respective line on the 32 lines of the bus which connects to respective bits in other 32-bit registers, 32-bit address lines, etc.I'm not sure I understand the XOR part. Shouldn't it be an AND?The scaling is linear.
 
User avatar
Traden4Alpha
Posts: 3300
Joined: September 20th, 2002, 8:30 pm

Programming Education

October 19th, 2014, 7:40 pm

QuoteOriginally posted by: outrunQuoteOriginally posted by: Traden4AlphaQuoteOriginally posted by: outrunThanks T4A!Suppose I have a 32 byte memory with instructions. The instruction counter is at 19, ie 10011. Does this mean that there are 32 XOR masks -one for each byte in the memory- that get XORed wit the instruction counter. The one that all all the bits AND to 1 will feed that 1 across all bits in the memory location, AND it with the bits, and that will give 8 bit outputs, which is effectively the fetch of byte nr 19?Does this scale?Each of the 32 stored bits in the instruction counter has a circuit that leads to it's respective line on the 32 lines of the bus which connects to respective bits in other 32-bit registers, 32-bit address lines, etc.I'm not sure I understand the XOR part. Shouldn't it be an AND?The scaling is linear.The XOR was about that "a circuit that leads to it's respective line". I though each line had its own XOR mask, but now I realize it can be done with a binary tree.I fear I've confused things. Bit #1 of the instruction counter goes to input A of AND #1. The output of AND gate #1 goes to bus line #1. Bit #2 of the instruction counter goes to input A of AND gate #2. The output of AND gate #2 goes to bus line #2. .... Bit #32 of the instruction counter goes to input A of AND gate #32. The output of AND gate #32 goes to bus line #32. The control signal line that triggers the intersection counter being put on the base is a single wire that feeds to input B of gate #1, #2, .... #32. It's a set of 32 parallel stored bits going to 32 parallel AND gates that go to 32 parallel bus lines.
 
User avatar
Traden4Alpha
Posts: 3300
Joined: September 20th, 2002, 8:30 pm

Programming Education

October 20th, 2014, 10:55 am

QuoteOriginally posted by: outrunNow it's even more confusing: The instruction counter is 5 bits (not 32 like in your example), the memory has 32 addresses, each address stores 1 byte (8 bits). The mapping from 5 bits in the counter to a single one and 31 zero's on the 32 adresses is what I'm talking about. That goes is a tree like fashion?Given a 5-bit instruction counter: there would be only 5 lines of the bus carrying address bits, only 5 lines going to the address pins of the chip, and 5 circuit board traces going to the RAM chip(s). The RAM circuitry would contain the tree that you mentioned that maps the 5 bits of the address into the 32 possible storage locations.If one wanted to have a 6-bit instruction counter, then one would add one more bus line inside the chip and address pin outside the chip. It's the RAM circuitry that implements the 2^N scaling between the N address bits and the 2^N memory locations. Of course, if you are talking about internal register address modes in which specific bits of an instruction encode for a specific register, then the decoding circuitry would be internal to the chip and adjacent to the registers.
Last edited by Traden4Alpha on October 19th, 2014, 10:00 pm, edited 1 time in total.