Intel Acceleration Stack for Intel® Xeon® CPU with FPGAs Core Cache Interface (CCI-P) Reference Manual

ID 683193
Date 11/04/2019

1.2. Introduction

CCI-P is a host interface bus for an Accelerator Functional Unit (AFU) with separate header and data wires. It is intended for connecting an AFU to an FPGA Interface Unit (FIU) within an FPGA. This document defines the CCI-P protocol and signaling interface, including request types, header formats, timing diagrams, and the memory model.

In addition to the CCI-P signaling and protocol, this document also describes:
  1. Mandatory AFU registers required to design a CCI-P compliant AFU.
  2. Device Feature Lists (DFLs)—A standard for register organization that promotes modular design and easy enumeration of AFU features from software (see the sketch following this list).
  3. Intel® FPGA Basic Building Blocks (BBBs)—An architecture for defining reusable FPGA libraries that may consist of hardware and software modules.
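A DFL lends itself to a simple software enumeration loop. The following sketch walks a Device Feature List over an MMIO-mapped AFU region. The pointer name afu_mmio_base and the bit positions used for the header type, end-of-list flag, next-DFH offset, and feature ID are assumptions of this sketch (consult the DFH definition elsewhere in this manual for the authoritative layout), so treat it as illustrative rather than a drop-in implementation.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative DFH field extraction; the bit positions here are assumptions
 * of this sketch, not the authoritative layout from this manual. */
#define DFH_TYPE(dfh)       (((dfh) >> 60) & 0xF)       /* header type             */
#define DFH_EOL(dfh)        (((dfh) >> 40) & 0x1)       /* 1 = last feature        */
#define DFH_NEXT_OFF(dfh)   (((dfh) >> 16) & 0xFFFFFF)  /* byte offset to next DFH */
#define DFH_FEATURE_ID(dfh) ((dfh) & 0xFFF)             /* feature identifier      */

/* Walk the feature list starting at offset 0x0 of the AFU's MMIO region.
 * afu_mmio_base is assumed to point at an already-mapped MMIO region. */
static void walk_dfl(volatile uint64_t *afu_mmio_base)
{
    uint64_t offset = 0;

    for (;;) {
        uint64_t dfh = afu_mmio_base[offset / sizeof(uint64_t)];

        printf("DFH @0x%06" PRIx64 ": type=%" PRIu64 ", feature_id=0x%03" PRIx64 "\n",
               offset, DFH_TYPE(dfh), DFH_FEATURE_ID(dfh));

        if (DFH_EOL(dfh) || DFH_NEXT_OFF(dfh) == 0)
            break;                       /* end of the feature list */
        offset += DFH_NEXT_OFF(dfh);     /* follow the link to the next feature */
    }
}

Software typically matches the discovered feature IDs against the features it knows how to drive and skips the rest, which is what makes this register organization easy to enumerate and extend.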

CCI-P offers an abstraction layer that can be implemented on top of a variety of platform interfaces, such as PCIe and UPI, thereby enabling interoperability of CCI-P compliant AFUs across platforms.

The table below summarizes the features unique to the CCI-P interface for the AFU.

Table 5.   CCI-P Features
MMIO Request—CPU read/write to AFU I/O memory (see the first sketch following this table)
  • MMIO read payload—4B, 8B
  • MMIO write payload—4B, 8B, 64B
    • MMIO writes may be combined by the x86 write-combining buffer.
    • 64B MMIO writes require a CPU capable of generating 64B writes.
    • The CPU in the Integrated FPGA Platform can use AVX-512 to generate 64B MMIO writes.
Memory Request—AFU read or write to memory (see the second sketch following this table)
  • Addressing mode—Physical addressing
  • Addressing width—42 bits (CL-aligned addresses)
  • Data lengths—64 bytes (1 CL), 128 bytes (2 CLs), 256 bytes (4 CLs)
  • Byte addressing—Not supported
FPGA Caching Hint (Integrated FPGA Platform only)—The AFU can ask the FIU to cache the CL in a specific state. For requests directed to VL0, the FIU attempts to cache the data in the requested state, given as a hint. Cache hint requests on VH0 and VH1 are ignored, except for WrPush_I.
Note: The caching hint is only a hint and provides no guarantee of the final cache state. Ignoring a cache hint affects performance but does not affect functionality.
  • <request>_I—No intention to cache
  • <request>_S—Desire to cache in shared (S) state
  • <request>_M—Desire to cache in modified (M) state
Virtual Channels (VC)—Physical links are presented to the AFU as virtual channels. The AFU can select the virtual channel for each memory request.
  • VL0—Low latency virtual channel (mapped to UPI; Integrated FPGA Platform only).
  • VH0—High latency virtual channel (mapped to PCIe0). This virtual channel is tuned to handle large data transfers.
  • VH1—High latency virtual channel (mapped to PCIe1). This virtual channel is tuned to handle large data transfers (Integrated FPGA Platform only).
  • Virtual Auto (VA)—FIU implements a policy optimized to achieve maximum cumulative bandwidth across all available physical links.
    • Latency—Expect to see high variance
    • Bandwidth—Expect to see high steady state bandwidth
UMsg (Integrated FPGA Platform only)—Unordered notification directed from the CPU to the AFU
  • UMsg data payload—64B
  • Number of UMsgs supported—8 per AFU
Response Ordering—Out-of-order responses
Upstream Requests—Yes
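The MMIO row above can be exercised from host software once the AFU's MMIO region has been mapped into the process. The sketch below shows a 4B write, an 8B write, and a 64B write issued with an AVX-512 store. The register offsets and the mmio_base pointer are hypothetical, and whether the 512-bit store actually reaches the AFU as a single 64B transaction also depends on the CPU and on how the region is mapped, as noted in the table.

#include <immintrin.h>
#include <stdint.h>

/* Hypothetical AFU register offsets, used only for this sketch. */
#define EXAMPLE_CSR_4B   0x020   /* 4B control register       */
#define EXAMPLE_CSR_8B   0x028   /* 8B control register       */
#define EXAMPLE_CSR_64B  0x040   /* 64B-aligned wide register */

static void mmio_write_examples(volatile uint8_t *mmio_base,
                                const uint8_t payload[64])
{
    /* 4B and 8B MMIO writes are ordinary stores to the mapped region. */
    *(volatile uint32_t *)(mmio_base + EXAMPLE_CSR_4B) = 0x1u;
    *(volatile uint64_t *)(mmio_base + EXAMPLE_CSR_8B) = 0x0123456789abcdefULL;

    /* 64B MMIO write: a single 512-bit store to a 64B-aligned offset.
     * Requires a CPU capable of generating 64B writes (e.g., via AVX-512). */
    __m512i v = _mm512_loadu_si512((const void *)payload);
    _mm512_store_si512((void *)(mmio_base + EXAMPLE_CSR_64B), v);

    /* Make the stores globally visible before any dependent reads. */
    _mm_sfence();
}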
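The Memory Request, FPGA Caching Hint, and Virtual Channels rows describe choices an AFU makes per request. As a software-side illustration only, the sketch below collects them into a plain C descriptor and applies the addressing rules from the table (CL-aligned 42-bit CL addresses; 1, 2, or 4 CL data lengths; no byte addressing). The struct and field names are inventions of this sketch, not the CCI-P request header encoding, and the natural-alignment check for multi-CL requests is an assumption rather than something stated in the table.

#include <stdbool.h>
#include <stdint.h>

/* Virtual channel selection from the table: VA lets the FIU choose links. */
enum vc_sel { VC_VA, VC_VL0, VC_VH0, VC_VH1 };

/* Caching hint variants: no intent (_I), shared (_S), modified (_M). */
enum cache_hint { HINT_I, HINT_S, HINT_M };

/* Illustrative per-request bookkeeping; not the CCI-P header format. */
struct mem_request {
    uint64_t        cl_addr; /* 42-bit CL address (byte address >> 6)        */
    unsigned        cl_len;  /* 1, 2, or 4 cache lines (64B, 128B, 256B)     */
    enum vc_sel     vc;      /* VL0, VH0, VH1, or VA                         */
    enum cache_hint hint;    /* a hint only; ignored on VH0/VH1 except WrPush_I */
};

/* Returns false when the request would violate the addressing rules. */
static bool make_mem_request(struct mem_request *req, uint64_t byte_addr,
                             unsigned cl_len, enum vc_sel vc,
                             enum cache_hint hint)
{
    if (cl_len != 1 && cl_len != 2 && cl_len != 4)
        return false;                            /* unsupported data length        */
    if (byte_addr % 64 != 0)
        return false;                            /* byte addressing not supported  */

    uint64_t cl = (byte_addr / 64) & ((1ULL << 42) - 1);
    if (cl % cl_len != 0)
        return false;                            /* assumed natural alignment      */

    req->cl_addr = cl;
    req->cl_len  = cl_len;
    req->vc      = vc;
    req->hint    = hint;
    return true;
}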