
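Design and Verification of a High-Performance SNC for New-Generation Many-Core Processors*

2021-09-15 08:35:14, Xu Haiwen
Computer & Digital Engineering, 2021, No. 8
Keywords: computer design

Xu Haiwen, Zhang Yang

(College of Computer, National University of Defense Technology, Changsha 410073)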

1 Introduction

Since Stanford researchers proposed the "single-chip multiprocessor" [1~2], also known as the "multi-core processor", in the 1990s, multi-core architecture has gradually become the mainstream of general-purpose processors. Its development has followed Moore's law [3], and processors have entered the so-called "multi-core era" [4], with 32 or even hundreds of cores on a chip. At present, the mainstream multi-core processor structure is the heterogeneous fusion of "general-purpose DSP [5] cores + application-specific cores". The application-specific cores may use homogeneous or heterogeneous multi-core structures. A homogeneous many-core structure has more than 32 cores, and its cores are interconnected through an on-chip network.

As the number of cores increases, the complexity of the interconnect between the cores and the on-chip network multiplies, which sharply increases the interconnect area. This in turn increases long-wire delay and lowers the achievable clock frequency. At present, processor designs mainly address this problem with a super-node [6] structure. A super node is composed of several cores, and data interaction between these cores and the on-chip network [7] is handled by a super node controller (SNC). This structure reduces interconnect area, avoids the impact of long-wire delay, and also reduces the complexity of processor design and verification. In this work, we design a high-performance super node controller. The main innovations of our work are as follows.

1) A design compatible with the Advanced eXtensible Interface (AXI) [8];

2) Reduced bandwidth pressure on the on-chip network;

3) Reduced bandwidth pressure inside the core;

4) Programmer-friendly: it handles unaligned memory access [9], especially in transfer mode.

2 SNC Design

2.1 Overall Structure of SNC Based on AXI Bus

The AXI bus protocol, introduced by ARM, defines a high-performance, high-bandwidth, low-latency on-chip bus. It is widely adopted throughout the semiconductor industry. The key features of the AXI protocol are:

1)Separate address/control and data phases;

2)Support for unaligned data transfers using byte strobes;

3)Burst-based transactions with only start address issued;

4) Separate read and write data channels to enable low-cost Direct Memory Access (DMA) [10];

5) Ability to issue multiple outstanding addresses.
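To make the byte-strobe and start-address-only features concrete, the following Python sketch computes the per-beat address and write strobe (WSTRB) of an INCR burst using the address and byte-lane equations from the AXI specification. The function name `incr_burst_beats` is hypothetical, only INCR bursts are modeled (FIXED and WRAP are not), and the sketch illustrates the protocol rather than any part of the SNC.

```python
def incr_burst_beats(start_addr: int, size_bytes: int, burst_len: int, bus_bytes: int):
    """Return [(address, wstrb), ...] for each beat of an AXI INCR burst.

    size_bytes: bytes per transfer (2**AxSIZE); bus_bytes: data bus width in bytes.
    Only the start address is issued on the bus; later addresses are derived.
    """
    if bus_bytes % size_bytes != 0:
        raise ValueError("transfer size must divide the data bus width")
    aligned = (start_addr // size_bytes) * size_bytes
    beats = []
    for n in range(burst_len):
        # First beat keeps the (possibly unaligned) start address;
        # later beats step from the size-aligned address.
        addr = start_addr if n == 0 else aligned + n * size_bytes
        lane_base = (addr // bus_bytes) * bus_bytes
        lower = addr - lane_base
        upper = aligned + n * size_bytes - lane_base + size_bytes - 1
        # One WSTRB bit per active byte lane, from lane `lower` to lane `upper`
        wstrb = ((1 << (upper - lower + 1)) - 1) << lower
        beats.append((addr, wstrb))
    return beats
```

For example, a 3-beat, 4-byte burst on a 32-bit bus starting at the unaligned address 0x1003 enables only byte lane 3 on the first beat (WSTRB 0b1000) and all four lanes on the following beats.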

These features allow the AXI bus to run at higher clock frequencies and to deliver higher data throughput at a given clock frequency. The AXI bus also offers higher data bus bandwidth, and chip designs based on the AXI protocol gain flexibility and portability.

Fig.1 Schematic diagram of data interaction between DSP cores and on-chip network

The SNC design is based on the AXI bus protocol. As shown in Fig.1, the super node consists of 4 DSP cores, a fast synchronization unit [11] and an SNC. The SNC is the relay control center for data interaction between the 4 DSP cores and the on-chip network. It performs data storage, arbitration, transfer, distribution, and unicast and broadcast [12] operations.

Fig.2 Four Data Channels in the SNC

As shown in Fig.2, the SNC has 4 data transmission channels: the read address channel, the read data channel, the write channel and the write response channel. The SNC's write channel connects to both the write address channel and the write data channel of the on-chip network, while the SNC's other three channels connect to the corresponding three channels of the on-chip network.

Read Address Channel: As shown in Fig.2(a), in the read address channel the 4 DSP cores act as masters and initiate read requests to the SNC. The SNC then packages, stores, arbitrates and distributes the request data sent by the 4 DSP cores and forwards it to the on-chip network, which acts as the slave.

Write Channel: As shown in Fig.2(c), in the write channel the 4 DSP cores act as masters and initiate write addresses and write data to the SNC. The SNC then packages, stores, arbitrates (the arbitration also covers read data transferred from the read data channel) and distributes the data sent by the 4 DSP cores, and forwards it to the on-chip network, which acts as the slave.

Read Data Channel: As shown in Fig.2(d), in the read data channel the on-chip network acts as the master and the DSP cores act as slaves. When a read request is answered, the read return data is packaged, stored, transferred, unicast or broadcast in the SNC, and then sent back to the corresponding DSP cores.

Write Response Channel: As shown in Fig.2(b), in the write response channel the on-chip network acts as the master and initiates write responses to the SNC. The SNC then packages, stores, unicasts and distributes the responses and sends each one to the corresponding DSP core, which acts as the slave.

2.2 Synchronous FIFO Design

In this article, the SNC uses synchronous FIFOs [12~13], first-in first-out data buffers, for data storage. A synchronous FIFO consists of two parts: an address control part and a data storage part. Data can only be written and read sequentially, and the data address is advanced automatically by internal read/write pointers; unlike a normal memory, a specific entry cannot be read or written through an address line. The synchronous FIFO has two address pointers: the write pointer points to the next free location to be written, and the read pointer points to the next unread location.

Fig.3 Synchronous FIFO Read and Write Process

The read and write process of a synchronous FIFO with depth 4 is shown in Fig.3. When the FIFO is empty, the read pointer and the write pointer both point to the first memory location. Each write advances the write pointer to the next location. After three writes, the write pointer points to the last memory location; after the fourth consecutive write, it wraps back to the first location and the FIFO status becomes full. Reading works the same way: each read moves the read pointer to the next location until all data has been read, at which point the read pointer has wrapped back to the first location and the FIFO status is empty.
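The pointer behavior described above can be modeled in a few lines of Python. This is a behavioral sketch, not the SNC's RTL, and the class name `SyncFifo` is hypothetical. Because the read and write pointers are equal both when the FIFO is empty and when it is full, the sketch assumes an occupancy counter to tell the two cases apart (hardware designs often use an extra pointer bit instead).

```python
class SyncFifo:
    """Behavioral model of a synchronous FIFO with wrapping read/write pointers."""

    def __init__(self, depth: int = 4):
        self.depth = depth
        self.mem = [None] * depth  # data storage part
        self.wr_ptr = 0            # next location to be written
        self.rd_ptr = 0            # next location to be read
        self.count = 0             # occupancy: separates full from empty when wr_ptr == rd_ptr

    @property
    def empty(self) -> bool:
        return self.count == 0

    @property
    def full(self) -> bool:
        return self.count == self.depth

    def write(self, data) -> bool:
        """Write one item at the write pointer; return False if the FIFO is full."""
        if self.full:
            return False
        self.mem[self.wr_ptr] = data
        self.wr_ptr = (self.wr_ptr + 1) % self.depth  # wrap after the last location
        self.count += 1
        return True

    def read(self):
        """Read the oldest item at the read pointer; raise if the FIFO is empty."""
        if self.empty:
            raise IndexError("read from empty FIFO")
        data = self.mem[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth  # wrap after the last location
        self.count -= 1
        return data
```

Writing four items into a `SyncFifo(4)` leaves `wr_ptr` back at 0 with `full` true; reading all four returns them in order and leaves `rd_ptr` at 0 with `empty` true, matching the depth-4 walkthrough.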

2.3 Data Arbitrator Design

In this paper, the SNC uses round-robin arbitration [14~15], which proceeds in two loops: the priority-signal loop and the output loop. A 2-bit PRI signal controls the priority of the 4 channels (channel0, channel1, channel2 and channel3) in the 4-to-1 arbiter. The cycling of the PRI signal is shown in Fig.4. When PRI=00, channel0 has the highest priority, and if channel0's data is valid, it is output first. PRI is then incremented to 01; after four consecutive operations, PRI returns to 00 and the cycle repeats. For a fixed priority, such as PRI=00, the output loop is shown in Fig.5: channel0 has the highest priority and its data is output if valid; if it is invalid, channel1 is checked next, and so on.
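A minimal behavioral sketch of the two loops follows. It assumes PRI advances by one after every arbitration cycle whether or not a channel was granted, since the text does not say what happens when no channel is valid; the class name `RoundRobinArbiter4` is hypothetical. This fixed rotation follows the description above and differs from the common variant in which priority moves to the channel after the last winner.

```python
class RoundRobinArbiter4:
    """4-to-1 arbiter with a 2-bit rotating priority pointer (PRI)."""

    def __init__(self):
        self.pri = 0  # 2-bit PRI: index of the highest-priority channel this cycle

    def arbitrate(self, valid):
        """Return the granted channel index (or None) for one arbitration cycle.

        valid: sequence of 4 booleans for channel0..channel3.
        """
        grant = None
        # Output loop: start at channel PRI and scan the others in circular order
        for offset in range(4):
            ch = (self.pri + offset) % 4
            if valid[ch]:
                grant = ch
                break
        # Priority loop: PRI steps 00 -> 01 -> 10 -> 11 -> 00 (assumed: every cycle)
        self.pri = (self.pri + 1) % 4
        return grant
```

With all four channels valid, five consecutive calls grant channels 0, 1, 2, 3 and then 0 again.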

Fig.4 PRI Signal

Fig.5 Output process at PRI=00

2.4 The Unicast and Broadcast Design

The unicast operation of the read data channel sends read return data to the DSP core identified by the ID signal returned by the on-chip network. The broadcast operation sends read return data to the DSP cores selected by the 4-bit VECTOR signal returned by the on-chip network. The VECTOR signal has higher priority than the ID signal: when both are valid, the ID signal is ignored, and when VECTOR is invalid, the ID signal determines the destination. The correspondence between the ID signal and the DSP cores is shown in Table 1.

Table 1 Correspondence of ID, PRI and DSP Cores

To simplify the design of the read data channel, the SNC merges unicast and broadcast into a single transmission mode with two phases: a conversion phase and a transmission phase. In the conversion phase, the ID signal of a unicast is converted into a VECTOR signal according to Table 1. The transmission phase then broadcasts the data.
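The conversion phase can be sketched as follows. Table 1 is not reproduced here, so the sketch assumes a one-hot mapping in which ID n selects DSP core n (VECTOR bit n); the function and argument names are hypothetical.

```python
def destination_vector(vector_valid: bool, vector: int, id_valid: bool, core_id: int) -> int:
    """Conversion phase: produce the 4-bit destination VECTOR for read return data.

    VECTOR has priority over ID. When only ID is valid, it is converted to a
    one-hot VECTOR (assumed mapping: ID n -> bit n, i.e. DSP core n).
    """
    if vector_valid:
        return vector & 0xF      # broadcast: the ID signal is ignored
    if id_valid:
        if not 0 <= core_id < 4:
            raise ValueError("ID must select one of 4 DSP cores")
        return 1 << core_id      # unicast expressed as a single-bit VECTOR
    return 0                     # no valid destination
```

Because a unicast becomes a VECTOR with exactly one bit set, the transmission phase can treat every packet as a broadcast.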

Table 2 SNC Synthesis Performance Indicators

The broadcast process is shown in Fig.6. The broadcast data is first copied to data0, data1, data2 and data3. The VECTOR signal then determines which DSP cores are to receive it. For each DSP core that is not a destination, the corresponding Count signal is set high. For each destination core, the SNC checks whether the core has received the data: if it has, its Count signal is set high; if not, the loop continues. When Count0, Count1, Count2 and Count3 are all high, the broadcast is complete, and the next data item is broadcast or unicast.
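The Count-based completion loop of the transmission phase can be sketched in Python as below. The `ready_per_cycle` input is a hypothetical stand-in for however a DSP core signals that it has received its copy; the sketch only illustrates the Count bookkeeping, not the SNC's handshake.

```python
def broadcast(data, vector: int, ready_per_cycle):
    """Transmission phase: deliver `data` to every core selected by the 4-bit VECTOR.

    ready_per_cycle: iterable of 4-bool tuples modeling whether each DSP core
    accepts data in a given cycle (hypothetical handshake stand-in).
    Returns (delivered, cycles): the copy each core received and the cycles used.
    """
    copies = [data] * 4  # data0..data3
    # Cores not selected by VECTOR start with their Count signal already high
    count = [not ((vector >> i) & 1) for i in range(4)]
    delivered = [None] * 4
    cycles = 0
    for ready in ready_per_cycle:
        if all(count):
            break
        cycles += 1
        for i in range(4):
            if not count[i] and ready[i]:
                delivered[i] = copies[i]
                count[i] = True  # core i has received its copy
    if not all(count):
        raise RuntimeError("broadcast did not complete")
    return delivered, cycles
```

A unicast converted to a one-hot VECTOR takes the same path and completes as soon as its single destination core accepts the data.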

Fig.6 Transmission Phase

2.5 Data Transfer Unit Design

As shown in Fig.7(a), in the previous SNC design, when the DMA moves data from one location outside the DSP core to another location outside the DSP core, the SNC is only a data transmission channel, and the DMA's data movement is divided into three phases: the DMA sends a read request to storage 1, the DMA receives the read return data, and the DMA writes the read return data to storage 2.

Fig.7 DMA Data Transfer Process from Outside the Core to Outside the Core

As shown in Fig.7(b), to relieve bandwidth pressure on the DSP core when the DMA moves data, we added a transfer mode to the SNC design. In this mode, the data returned from storage is not sent to the DSP core. The transfer mode therefore reduces the data bandwidth pressure on the DSP core and shortens the data transmission path when the DMA moves data from outside the DSP core to outside the DSP core.

During a transfer, the read return address may not meet the address alignment requirement of the destination storage space. The read return data may therefore need further processing, such as data alignment and masking. The transfer process is as follows. First, check whether the return address meets the alignment requirement of the storage. If it does, output the read return data directly. If not, process the read return data according to the alignment requirement. After this processing, the read return data is split into two data words, data0 and data1. The mask signal carried with each word determines whether it is valid: a valid word is output and an invalid word is discarded.
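One way to realize the data0/data1 split is sketched below under stated assumptions: the read return data is one bus-width word of W bytes, the destination requires W-aligned writes, byte lane i carries byte i, and the mask is a per-byte valid mask. The function name is hypothetical, and the SNC's actual alignment logic may differ. Under these assumptions an unaligned word straddles two aligned destination words, while an aligned word yields a data1 whose mask is zero, so it is discarded.

```python
def split_for_alignment(dest_addr: int, data: bytes):
    """Split one W-byte word of read return data into data0/data1 for aligned writes.

    dest_addr: address the data is to be written to (may be unaligned).
    Returns a list of (aligned_addr, word, byte_mask), keeping only words whose
    mask marks at least one valid byte.
    """
    width = len(data)
    offset = dest_addr % width
    base = dest_addr - offset
    # data0: the first (width - offset) bytes fill the upper lanes of the first aligned word
    word0 = bytes(offset) + data[: width - offset]
    mask0 = ((1 << (width - offset)) - 1) << offset
    # data1: the remaining `offset` bytes fill the lower lanes of the next aligned word
    word1 = data[width - offset:] + bytes(width - offset)
    mask1 = (1 << offset) - 1  # zero when the address is already aligned
    candidates = [(base, word0, mask0), (base + width, word1, mask1)]
    # Output valid words, discard invalid (all-zero-mask) ones
    return [c for c in candidates if c[2] != 0]
```

For an aligned destination the function returns the original word with a full mask, which corresponds to outputting the read return data unchanged.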

3 Verification and Conclusion

3.1 Verification

This article uses Cadence's NC-Verilog for register-transfer-level verification of the SNC. The SNC verification platform [16] is shown in Fig.8. It consists of 6 modules, including the DUT, Gold Model, Interface, Top and Compare. Functional verification continuously drives stimuli into both the DUT and the Gold Model and compares their outputs to confirm that the SNC functions correctly. After functional verification passed, we used Synopsys Design Compiler to perform logic synthesis of the SNC. The synthesis results are shown in Table 2: the critical path delay of the SNC is 0.36 ns, its area is 39384.46 μm², and its total power is 249.19 mW. A 0.36 ns critical path corresponds to a maximum clock frequency of about 2.78 GHz, which fits within the 0.5 ns period of a 2.0 GHz clock. The verification and synthesis results therefore show that the SNC functions correctly and can run at 2.0 GHz, which meets the design requirements.
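The DUT-versus-Gold-Model comparison is the usual scoreboard pattern. The Python sketch below illustrates it on a toy function rather than on the SNC itself: both the "DUT" and the "gold model" are hypothetical stand-ins (an ID-to-VECTOR decoder), and seeded random stimuli replace the platform's actual stimulus generator.

```python
import random


def golden_decode(core_id: int) -> int:
    """Reference (gold) model: table lookup of a hypothetical ID -> VECTOR map."""
    return {0: 0b0001, 1: 0b0010, 2: 0b0100, 3: 0b1000}[core_id]


def dut_decode(core_id: int) -> int:
    """Stand-in for the design under test: the same mapping computed with a shift."""
    return 1 << core_id


def run_scoreboard(dut, gold, n_tests: int = 1000, seed: int = 0) -> int:
    """Drive identical random stimuli into DUT and gold model; return the mismatch count."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n_tests):
        stimulus = rng.randrange(4)
        expected, actual = gold(stimulus), dut(stimulus)
        if expected != actual:
            mismatches += 1
            print(f"MISMATCH: in={stimulus} gold={expected:04b} dut={actual:04b}")
    return mismatches
```

Replacing `dut_decode` with a faulty implementation makes `run_scoreboard` report a non-zero mismatch count and print each failing stimulus.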

Fig.8 Functional Verification Platform

3.2 Conclusion

In summary, we have designed an SNC for high-performance, high-bandwidth, low-latency multi-core processors. Compared with previous super node controllers, this SNC not only implements the standard super node controller functions but also adds the following innovations:

1) The SNC is compatible with the AXI bus protocol, which gives it greater flexibility and portability.

2) We added a broadcast unit to the SNC design, which reduces bandwidth pressure on the on-chip network.

3) We added a transfer unit to the SNC design, which reduces bandwidth pressure inside the core.

4) The SNC is programmer-friendly: it handles unaligned memory access, especially in transfer mode.
