02/15/02
common_fpga_design_notes.txt

XTRP / FPGA (....._X)

1. State Machine Intro
2. Primary/Secondary State Machine Handshake
   2.1 Initial Request
   2.2 Acknowledge Response
3. Secondary State Machine
   3.1 Root - State OPA
   3.2 Permit - State OPB
   3.3 Read Path - States OPG, OPH
       3.3.1 L2 Readout
       3.3.2 L2PASS Implementation
   3.4 Write Path - States OPC, OPD, OPE, OPF
4. Address Decode
5. Implementation of Logic Constructs
   5.1 Register
   5.2 Low-active write strobe
   5.3 High-active write strobe
   5.4 Low-active write pulse
   5.5 Low-active read pulse
6. Level 1 Pipeline & Level 2 Buffer
   6.1 Initiation
   6.2 Write Address
   6.3 Delay Counter
   6.4 Read Address
   6.5 Delay-count Groups
   6.6 Address Mux
   6.7 BIN register
   6.8 Level 2 Buffer


1. State Machine Intro

WVMEIF_X implements the Primary State Machine, which handles all the
interaction between the VME Bus and the XTRP Data Board. A set of
secondary state machines handles the communication between the primary
state machine and primitive logic operations on the board.

Since only the WVMEIF_X FPGA boots up with a serial PROM, which contains
the primary state machine, it is referred to as the primary FPGA. All
other FPGAs (WOPCOD_X, WPIPE_X and WTRACK_X) are referred to as
secondary FPGAs. All FPGAs, including WVMEIF_X, implement a secondary
state machine to handle requests for data from the primary state machine
in WVMEIF_X.

                                      /-->[ WPIPE_X  ]
  VME Bus <----->[ WVMEIF_X ]<----------->[ WOPCOD_X ]
                                      \-->[ WTRACK_X ]


2. Primary/Secondary State Machine Handshake

The following diagram illustrates the time-sequence of handshake events:

  A. (WVMEIF_X)  VREQ_L = 0, xxx_SPC_L = 0
                 {Wxxx_X executes logic}
  B. (Wxxx_X)    VACK_L = 0
  C. (WVMEIF_X)  VREQ_L = 1
  D. (Wxxx_X)    VACK_L = 1

2.1 Initial Request

The handshaking protocol between the primary and a secondary state
machine starts with a request for data from the primary state machine,
VREQ_L = 0.
The assertion of VREQ_L is concurrent with the following: a read/write
signal, address-space status signals, and an internal data board address
bus. These signals are described below:

Read/Write Signal:

  Signal          Value  Description
  ------          -----  -----------
  VWR_L           = 0    Write operation
                  = 1    Read operation

Address-space Status Signals:

  Signal          Value  Description                       FPGA
  ------          -----  -----------                       ----
  GLOBAL_SPC_L    = 0    Access in Global Space            WOPCOD_X
                  = 1    Access not in Global Space
  RAM_SPC_L       = 0    Access in RAM Space               WOPCOD_X
                  = 1    Access not in RAM Space
  PIPE_SPC_L(0)   = 0    Access in PIPE FPGA #0 Space      WPIPE_X (0)
                  = 1    Access not in PIPE FPGA #0 Space
  PIPE_SPC_L(1)   = 0    Access in PIPE FPGA #1 Space      WPIPE_X (1)
                  = 1    Access not in PIPE FPGA #1 Space
  PIPE_SPC_L(2)   = 0    Access in PIPE FPGA #2 Space      WPIPE_X (2)
                  = 1    Access not in PIPE FPGA #2 Space
  PIPE_SPC_L(3)   = 0    Access in PIPE FPGA #3 Space      WPIPE_X (3)
                  = 1    Access not in PIPE FPGA #3 Space
  PIPE_SPC_L(4)   = 0    Access in PIPE FPGA #4 Space      WPIPE_X (4)
                  = 1    Access not in PIPE FPGA #4 Space
  PIPE_SPC_L(5)   = 0    Access in PIPE FPGA #5 Space      WPIPE_X (5)
                  = 1    Access not in PIPE FPGA #5 Space
  PIPE_SPC_L(6)   = 0    Access in PIPE FPGA #6 Space      WPIPE_X (6)
                  = 1    Access not in PIPE FPGA #6 Space
  PIPE_SPC_L(7)   = 0    Access in PIPE FPGA #7 Space      WPIPE_X (7)
                  = 1    Access not in PIPE FPGA #7 Space
  TRACK_SPC_L     = 0    Access in Track Space             WTRACK_X
                  = 1    Access not in Track Space

Each address-space region defined in WVMEIF_X is unique and no regions
have overlapping addresses. At most one Address-space Status Signal
should be asserted for each VME transaction. Each signal is dedicated to
a single FPGA algorithm, so each secondary state machine needs to
monitor only one Address-space Status Signal (two in the case of
WOPCOD_X).

Each secondary FPGA must derive a signal ACCESS_L from the logical AND
of all address-space status signals to which it must respond. Because
the signals are active low, the AND goes to logic 0 as soon as any one
of them is asserted. Only WOPCOD_X has a non-trivial solution for its
ACCESS_L (RAM_SPC_L * GLOBAL_SPC_L).
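The ACCESS_L derivation can be illustrated with a minimal Python sketch.
The function name derive_access_l is hypothetical and not part of the
design; signals are modelled as integers 0 and 1, and the only rule
taken from the text is that ACCESS_L is the AND of the active-low
space signals an FPGA responds to.

```python
def derive_access_l(*spc_l):
    """AND together active-low address-space status signals (0 or 1).

    The result is 0 (asserted) when any input is 0, and 1 (released)
    only when every input is 1. Hypothetical helper for illustration.
    """
    access_l = 1
    for signal in spc_l:
        access_l &= signal
    return access_l


# WOPCOD_X responds to two spaces: RAM_SPC_L * GLOBAL_SPC_L
wopcod_access_l = derive_access_l(1, 0)   # Global Space access -> 0

# A pipe FPGA responds to one space, so ACCESS_L is just that signal
wpipe3_access_l = derive_access_l(1)      # no access -> 1
```

For the single-space FPGAs the function reduces to a pass-through,
which is why only WOPCOD_X is called non-trivial.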
Internal Address Bus:

VADDR(x:2) is the address bus output by WVMEIF_X. It is distributed to
the secondary FPGAs, and each algorithm uses a different number of its
bits. See WVMEIF_X design notes for the computation of this bus.

2.2 Acknowledge Response

Once a secondary state machine has run its course of implementing the
primitive logic operations for a given VME transaction, it responds by
asserting its VACK_L signal, VACK_L = 0. This signal remains asserted
until the primary state machine releases VREQ_L, VREQ_L = 1. Upon
sensing this release, the secondary state machine may restore itself to
its root state, which releases VACK_L to logic 1.


3. Secondary State Machine

Every secondary state machine is a set of states with next-state
equations. Each state is implemented by its own flip-flop, whose value
is controlled by a next-state equation. The set of flip-flops is
updated on the rising edge of OSCCLK, a 10 MHz clock signal derived from
OSC. Only one of the flip-flop output signals, STATEOPx, is logic 1 at
any given time, indicating the current state of the state machine.

  State  Init  OPx' = 1 if...  (next-state equations)
  -----  ----  ----------------------------------------------------
  OPA    1     ((OPF + OPH) * VREQ_L) + (OPA * (VREQ_L + ACCESS_L))
  OPB    0     OPA * (!VREQ_L * !ACCESS_L)
  OPC    0     OPB * (PERMIT * !VWR_L)
  OPD    0     OPC
  OPE    0     OPD
  OPF    0     OPE + (OPF * !VREQ_L) + (OPB * !PERMIT)
  OPG    0     OPB * (PERMIT * VWR_L)
  OPH    0     OPG + (OPH * !VREQ_L)

3.1 Root - State OPA

The root state of the state machine is STATEOPA. The root flip-flop is
implemented with an FDP symbol, which outputs a logic 1 immediately
after the FPGA exits its Configuration Phase. All other flip-flops are
implemented with FD symbols, which initially output logic 0.

The secondary state machine remains in state OPA until it receives
assertions of both the request line, VREQ_L = 0, and the access line,
ACCESS_L = 0. VREQ_L and ACCESS_L are described above.
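The next-state table above can be exercised with a small one-hot
simulation. This is a sketch under stated assumptions, not the FPGA
implementation: the names next_state and trace are hypothetical, the
OPH term is taken as OPG + (OPH * !VREQ_L) by analogy with the OPF
equation (a one-hot machine can never have OPG and OPH set together),
and any gating of the OPG exit by L2PASS is left out.

```python
STATES = ("OPA", "OPB", "OPC", "OPD", "OPE", "OPF", "OPG", "OPH")


def inv(x):
    """Logical NOT for 0/1 integers."""
    return 1 - x


def next_state(s, vreq_l, access_l, vwr_l, permit):
    """Evaluate every next-state equation once (one OSCCLK rising edge).

    s maps each state name to 0 or 1; '*' becomes '&' and '+' becomes '|'.
    """
    return {
        "OPA": ((s["OPF"] | s["OPH"]) & vreq_l)
               | (s["OPA"] & (vreq_l | access_l)),
        "OPB": s["OPA"] & inv(vreq_l) & inv(access_l),
        "OPC": s["OPB"] & permit & inv(vwr_l),
        "OPD": s["OPC"],
        "OPE": s["OPD"],
        "OPF": s["OPE"] | (s["OPF"] & inv(vreq_l)) | (s["OPB"] & inv(permit)),
        "OPG": s["OPB"] & permit & vwr_l,
        # Assumption: '+' between the two terms, see lead-in.
        "OPH": s["OPG"] | (s["OPH"] & inv(vreq_l)),
    }


def trace(vreq_l_seq, access_l, vwr_l, permit):
    """Clock the machine once per VREQ_L sample, starting in root state OPA.

    Returns the name of the active state after each clock edge.
    """
    s = {name: int(name == "OPA") for name in STATES}
    visited = []
    for vreq_l in vreq_l_seq:
        s = next_state(s, vreq_l, access_l, vwr_l, permit)
        active = [name for name in STATES if s[name]]
        assert len(active) == 1, "one-hot property violated"
        visited.append(active[0])
    return visited
```

With VREQ_L held at 0 for seven clocks and then released,
trace([0]*7 + [1], access_l=0, vwr_l=0, permit=1) walks the write
path OPB, OPC, OPD, OPE, OPF, OPF, OPF and returns to OPA. Setting
vwr_l=1 gives the read path through OPG and OPH, and permit=0 sends
OPB straight to OPF.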
State OPA will only exit to state OPB.

3.2 Permit - State OPB

State OPB provides the mechanism to decide whether to implement the
current VME transaction or to bypass it. A custom decision-making
process, incorporating such variables as error status, mode status or
other parameters, must produce a value for the PERMIT signal. The
simplest case is to tie PERMIT to VCC or GND.

For PERMIT = 0, state OPB exits directly to state OPF and the VME
transaction is bypassed. For PERMIT = 1, one of two paths, Read or
Write, is selected depending on the value of VWR_L. For VWR_L = 1, the
Read path is selected and state OPB exits to state OPG. For VWR_L = 0,
the Write path is selected and state OPB exits to state OPC.

3.3 Read Path - States OPG, OPH

Most read operations do not require more than one clock cycle to set up
all the control signals to read data from the Data Board via the VME
Data Bus. State OPG serves as a one-clock-period buffer between setting
up the necessary control signals and responding to the primary state
machine that the operation is done.

State OPG exits to state OPH, except for L2 Readouts. In state OPH
VACK_L is asserted to logic 0. The state machine waits in state OPH
until VREQ_L = 1, whereupon it returns to the root state, OPA.

The internal FPGA data bus, VDATAO(), is asserted onto the Data Board
Data Bus, VDATA(), while the state machine is in state OPG or OPH. This
is the only time the FPGA algorithm may attempt to assert data onto
VDATA(). This assertion is carried out by VDATA_E = 1.

      VDATA_E ------|
                    |
      VDATAO() ----[>---- VDATA()

3.3.1 L2 Readout

Reads of data from Level 2 space, whether header data or data stored in
Level 2 Buffers, will occur while the experiment is running. There is a
chance that Level 1 Accepts (L1A) or the Level 2 Token Ring Readout
(Token) may collide with a VME Level 2 Readout (L2 Readout). The L1A
and Token are not synchronized to L2 Readout and must supersede it.
All 3 operations (L1A, Token and L2 Readout) access the address lines of
the L2 Buffer RAM. Only one access may be employed at a time. Only the
Pipe FPGA needs to be concerned about L1A and Token conflicts, and that
is covered in the Pipe FPGA documentation. Pipe, Track and Trig FPGAs
must keep L1A and Token accesses safeguarded against L2 Readouts.

Each FPGA that employs L2 Readouts can identify such transactions by
comparing the VADDR bus to pre-determined L2 addresses. All such FPGAs
issue a signal, L2DB_SSPC, during the VME transaction to identify L2
Readouts. Once state OPG is reached, a blocking signal, L2PASS, must be
asserted to continue to state OPH. For non-L2 Readouts, L2PASS is
always asserted; for L2 Readouts, L2PASS must be generated.

3.3.2 L2PASS Implementation

A mini-state machine is employed to generate L2PASS. It is clocked on
the falling edge of OSCCLK, as opposed to the active rising edge of
OSCCLK used by the secondary state machine. Up to 3 consecutive checks
are made on marker signals that indicate an active L1A or Token access.
All 3 checks must fail to detect L1A and Token activity, or else the
sequence starts over. For example, the Pipe FPGA must have all three
signals LOADHB, TKFLAGL and L1A_H inactive in order to fail to detect
any L1A or Token activity. The Level 2 Buffer RAM must have
prioritizing logic to make L2 Readout yield to L1A or Token.

The mini-state machine is composed of three states, OPGFLAG1, OPGFLAG2
and OPGFLAG3, that track the progress of consecutive fail-to-detect
checks. Why 3? A window of opportunity must be found to allow access
to the address lines of the L2 Buffer RAM. The 2nd check is the
critical time at which data from the L2 Buffer RAM is latched into a
repository to await the remainder of the VME transaction. The
preceding check and the succeeding check guarantee that the L2 Buffer
RAM address lines were free for +/- 100 ns from the critical access
time.
Any L1A or Token access that would conflict with the critical access
time would also violate one of the 3 checks. Nor is there any
opportunity for a glitch condition in which an L1A or Token access is
ending or starting at the critical access time.

By the time state OPGFLAG3 is reached, all 3 checks have been made, data
from the L2 Buffer has been latched into the repository, and L2PASS is
asserted. The Secondary State Machine finishes the read operation by
entering state OPH, and the VME transaction terminates with the data in
the repository being read back by the Controller.

3.4 Write Path - States OPC, OPD, OPE, OPF

Some write operations require several clock cycles to perform the
necessary sequences of level and edge assertions to store data. The
string of states OPC, OPD and OPE provides 3 clock periods to execute
these sequences.

State OPE exits to state OPF. In state OPF, as in state OPH of the Read
Path, VACK_L is asserted to logic 0 and the state machine waits for
VREQ_L = 1, whereupon it returns to the root state, OPA.


4. Address Decode

An attempt is made to reduce the propagation time needed to decode
discrete operations from the VADDR() bus. Higher-order address bits and
the address-space status signal are gated together and latched at the
transition between state OPA and state OPB. These latched signals
enable 4-to-16 decoders that decode the low-order address bits. The
outputs of the decoders are atomic operation signals. Each FPGA
algorithm uses the atomic operation signals to carry out the VME
transaction.


5. Implementation of Logic Constructs

  PRE - Preset
  CLR - Clear
  C   - Clock
  CE  - Clock Enable
  D   - Data
  Q   - Output

5.1 Register

The logic is built around FDPE & FDCE flip-flops:

  PRE: = GND, initializes signal to logic 1 at power-up. (FDPE)
    or
  CLR: = GND, initializes signal to logic 0 at power-up. (FDCE)
  C:   = OSCCLK
  CE:  = OP_xxx * NEXTOPD
  D:   = VDATAI(x);

A simple architecture that stores the VDATAI bus into the register if
OP_xxx is asserted when entering state OPD.
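The clock-enable behaviour of the register can be pictured with a short
behavioural model. This is only a sketch: the class ClockEnabledFF and
the helper register_bit_step are hypothetical names, one object models
one bit of the register, and the power-up value is passed in (1 for
the preset-type FDPE, 0 for the clear-type FDCE).

```python
class ClockEnabledFF:
    """Behavioural model of a D flip-flop with clock enable (one bit)."""

    def __init__(self, init):
        # Power-up value: 1 models an FDPE, 0 models an FDCE.
        self.q = init

    def clock(self, d, ce):
        """Apply one rising edge of OSCCLK; D is captured only if CE = 1."""
        if ce:
            self.q = d
        return self.q


def register_bit_step(ff, vdatai_bit, op_xxx, nextopd):
    """One clock edge of a register bit: CE = OP_xxx * NEXTOPD, D = VDATAI(x)."""
    return ff.clock(d=vdatai_bit, ce=op_xxx & nextopd)


bit = ClockEnabledFF(init=0)
register_bit_step(bit, 1, op_xxx=1, nextopd=0)   # not entering OPD: holds 0
register_bit_step(bit, 1, op_xxx=1, nextopd=1)   # entering OPD: stores 1
register_bit_step(bit, 0, op_xxx=0, nextopd=1)   # other operation: holds 1
```

The stored value changes only on the single edge where the operation
signal and NEXTOPD coincide, which is the whole point of the construct.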
See 'Internal Reads' for reading registers.

5.2 Low-active write strobe

The logic is built around an FDP flip-flop:

  PRE: = GND, initializes signal to logic 1 at power-up.
  C:   = OSCCLK
  D:   = ~(OP_xxx * NEXTOPD);

This is designed to produce a short pulse of logic 0 lasting only one
clock cycle. Only if the necessary operation signal OP_xxx is asserted
upon entering state OPD will the flip-flop output logic 0. Entering the
next state, OPE, forces the flip-flop back to logic 1.

5.3 High-active write strobe

The logic is built around an FDC flip-flop:

  CLR: = GND, initializes signal to logic 0 at power-up.
  C:   = OSCCLK
  D:   = OP_xxx * NEXTOPD

This is designed to produce a short pulse of logic 1 lasting only one
clock cycle. Only if the necessary operation signal OP_xxx is asserted
upon entering state OPD will the flip-flop output logic 1. Entering the
next state, OPE, forces the flip-flop back to logic 0.

5.4 Low-active write pulse

The logic is built around an FDPE flip-flop:

  PRE: = GND, initializes signal to logic 1 at power-up.
  C:   = OSCCLK
  CE:  = (OP_xxx * NEXTOPC) + NEXTOPF
  D:   = ~(OP_xxx * NEXTOPC);

This construct enables the clock only twice during a loop through the
secondary state machine. The first occasion is upon entering state OPC:
logic 0 is latched into the flip-flop only if the necessary operation
signal OP_xxx was asserted. The flip-flop retains this logic state only
until entering state OPF, the second occasion on which the clock enable
is asserted. On this occasion the D line is guaranteed to be logic 1,
so the flip-flop is reset to logic 1.

5.5 Low-active read pulse

The logic is built around an FDPE flip-flop:

  PRE: = GND, initializes signal to logic 1 at power-up.
  C:   = OSCCLK
  CE:  = (OP_xxx * NEXTOPG) + NEXTOPA
  D:   = ~(OP_xxx * NEXTOPG);

This construct enables the clock only twice during a loop through the
secondary state machine. The first occasion is upon entering state OPG:
logic 0 is latched into the flip-flop only if the necessary operation
signal OP_xxx was asserted.
The flip-flop retains this logic state only until entering state OPA,
the second occasion on which the clock enable is asserted. On this
occasion the D line is guaranteed to be logic 1, so the flip-flop is
reset to logic 1.


6. Level 1 Pipeline & Level 2 Buffer

The Level 1 Pipeline (L1P) is based on a dual RAM architecture.
Essentially, while data is being written into one RAM, the other RAM is
available for readout. The point of conflict that is avoided is the use
of the address lines. One cannot simultaneously read from and write to
two different addresses that share the same address bus. The Level 2
Buffer (L2B) is also RAM-based.

The depth of L1P is 32 words, implemented in dual 16-deep RAMs with
4-bit addresses. This provides sufficient cover for the delay required
for XTRP to hold data in the L1P prior to arrival of a Level 1 Accept.
The L2B need only be 4 words: 1 word/L2 Buffer.

6.1 Initiation

At least in the WPIPEx FPGAs, the XFT Bunch Zero Marker starts the L1P.
Other FPGAs may start the L1P as soon as power and clocks are available.
Whatever the case, some initiator signal, B0_XFTL in WPIPEx, for
example, is output when the conditions to start the L1P are satisfied.
(In WPIPEx, the start condition is found when the XFT Bunch Zero Marker,
RAWTRK(8) - word 3, is latched while asserted with the rising edge of
CLK132_0.)

6.2 Write Address

The method of writing data into the L1P is to write data into one RAM
followed by writing data into the other RAM. The same address can be
used for both RAMs across two clock cycles. The write address, WRADDR,
is implemented with a 4-bit counter. The counter need only be
incremented every other clock cycle. The signal WRADDRCE, write address
clock enable, is logically flipped every clock cycle.

6.3 Delay Counter

The depth of the L1P is programmable. The implementation of 32 RAM
addresses is static, but the interpretation of the RAM is dynamic.
On the Data Boards, the VME-programmable DELAY register found in
WOPCOD_X sets the depth of the L1P. Prior to initiation, a delay
counter is loaded with the DELAY value. Once the L1P starts, B0_XFTL=1,
the delay counter is decremented until it reaches a terminal count,
DELCNTTC. The time between the assertion of B0_XFTL and DELCNTTC is
effectively the delay between when data is stored into the L1P and when
that data is extracted.

The programmed depth of L1P is also the length of Delay-count Groups.
See below.

6.4 Read Address

The read address, RDADDR, points to where in the L1P RAM data will be
extracted. As with the write address, RDADDR is implemented with a
4-bit counter. It too need only update the RAM address every other
clock cycle. The RDADDRCE signal, read address clock enable, is
logically flipped every clock cycle. RDADDRCE is held de-asserted until
the delay counter has reached its terminal count, DELCNTTC=1, together
with assertion of the initiator signal, B0_XFTL=1.

  B0_XFTL  __------...------
             ^start of Write Counter
             ^start of Delay Counter

  DELCNTL  ________...___---
                         ^Delay Terminus Reached

             <           >
             Time delay between Write and Read Counters

6.5 Delay-count Groups

For odd DELAY values, the alternation of a write to one RAM followed by
a write to the opposing RAM could continue indefinitely. A problem
arises in the case of even DELAY values. Suppose a DELAY of 2 is used;
the write sequence would be: 1st write - RAM A, address 0; 2nd write -
RAM B, address 0; 3rd write - RAM A, address 1; etc. But coincident
with the 3rd write (RAM A, address 1), data would need to be extracted
from RAM A, address 0. That presents an address conflict for RAM A.

The concept of Delay-count Groups gets around the address conflict. The
duration of a Delay-count Group is, arbitrarily and conveniently, the
DELAY value taken as a number of clock cycles. A DELAY value of 5
yields a Delay-count Group duration of 5 clock cycles.
The first write of a Delay-count Group is alternated between the 2 RAMs,
regardless of the last write of the previous Delay-count Group. See
examples below...

  DELAY=5

  delay-count group 1|delay-count group 2|...
                     |                   |
  (W)     (W)     (W)|(R) (W) (R) (W) (R)|(W)
  A0      A1      A2 |A0  A3  A1  A4  A2 |A5
      (W)     (W)    |(W) (R) (W) (R) (W)|(R)
      B0      B1     |B2  B0  B3  B1  B4 |B2

  DELAY=6

  delay-count group 1    | delay-count group 2   |...
                         |                       |
  (W)     (W)     (W)    |(R) (W) (R) (W) (R) (W)|(W)
  A0      A1      A2     |A0  A3  A1  A4  A2  A5 |A6
      (W)     (W)     (W)|(W) (R) (W) (R) (W) (R)|(R)
      B0      B1      B2 |B3  B0  B4  B1  B5  B2 |B3

  Key: (W) - write
       (R) - read
       A0  - RAM A, Address 0

For both even and odd DELAY values, the read/write address conflict is
avoided. A GRPCNT signal alternates logic value upon completion of each
delay-count group.

6.6 Address Mux

The L1P RAMs could be VME accessible, so in most applications the VME
Address Bus needs some way to be asserted onto the RAM address lines.
The read and write addresses and VADDR are routed to 4-to-1 muxes with
appropriate selection lines to map any of the sources to either of the
RAMs.

6.7 BIN register

All data headed for the L1P is latched into a BIN register. The
register serves as a staging ground to line up all of the data bits to a
common clock. Additionally, a DELAY of 0 would need to bypass the RAM
architecture entirely. The BIN register sources the data inputs of the
dual RAMs.

6.8 Level 2 Buffer

The end of the L1P is referenced by the read address. Which RAM is read
from is determined by glue logic dependent on WRADDRCE and GRPCNT. This
glue logic provides the selection lines of a 3-to-1 mux sourced by the
BIN register and the data outputs of both RAMs. The output bus of the
mux, LVL1D, represents the end of the L1P.

When a Level 1 Accept arrives, LVL1D is latched into the Level 2 Buffer.
If no Level 1 Accept arrives, the data present on LVL1D is ignored.
Note that a copy is still retained in the dual RAMs. It will not be
overwritten until the write address targets that RAM address.
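
The Delay-count Group scheme of section 6.5 can be checked with a small
scheduling model. This is a sketch under stated assumptions rather
than the FPGA logic: the function names are hypothetical, the write
address is modelled as the clock-cycle number divided by two (modulo
16, per the 4-bit WRADDR of section 6.2), each read fetches the word
written DELAY cycles earlier, and DELAY is at least 1 (a DELAY of 0
bypasses the RAMs, see section 6.7).

```python
def ram_for(cycle, delay, use_groups):
    """Return 'A' or 'B': the RAM written on the given clock cycle."""
    if not use_groups:
        # Plain alternation A, B, A, B, ... with no regard to groups.
        return "AB"[cycle % 2]
    # First write of each delay-count group alternates between the RAMs;
    # inside a group the writes alternate cycle by cycle.
    group, pos = divmod(cycle, delay)
    return "AB"[(group + pos) % 2]


def l1p_schedule(delay, n_cycles, use_groups=True):
    """List of (write, read) accesses per cycle as (ram, address) tuples.

    read is None until DELAY cycles have elapsed.
    """
    rows = []
    for n in range(n_cycles):
        write = (ram_for(n, delay, use_groups), (n // 2) % 16)
        read = None
        if n >= delay:
            src = n - delay  # cycle on which the word being read was written
            read = (ram_for(src, delay, use_groups), (src // 2) % 16)
        rows.append((write, read))
    return rows


def conflicts(rows):
    """Cycles on which the read and the write target the same RAM."""
    return [n for n, (write, read) in enumerate(rows)
            if read is not None and read[0] == write[0]]


# DELAY=2 without groups: the 3rd write (cycle 2) collides in RAM A.
naive_conflicts = conflicts(l1p_schedule(2, 40, use_groups=False))

# With groups, no DELAY from 1 to 8 produces a collision in this model.
grouped_ok = all(conflicts(l1p_schedule(d, 64)) == [] for d in range(1, 9))
```

In this model l1p_schedule(5, 11)[5] is (("B", 2), ("A", 0)), the first
column of delay-count group 2 in the DELAY=5 example, and
l1p_schedule(6, 13)[6] is (("B", 3), ("A", 0)), the first column of
group 2 in the DELAY=6 example.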