999精品在线视频,手机成人午夜在线视频,久久不卡国产精品无码,中日无码在线观看,成人av手机在线观看,日韩精品亚洲一区中文字幕,亚洲av无码人妻,四虎国产在线观看 ?

Linear (optimal) complexity direct full-wave solution of full-package problems involving over 10 million unknowns on a single CPU core

2018-10-10 03:27:32JIAODan
安徽大學學報(自然科學版) 2018年5期

JIAO Dan

(School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907, USA)

Abstract:In this paper, we demonstrated a fast direct finite-element solver of linear (optimal) complexity for analyzing large-scale system-level interconnects and their related signal and power integrity problems. This direct solver had successfully analyzed an industry product-level package and correlated with measurements in time domain. The finite-element matrix of over 15.8 million unknowns resulting from the analysis of the industry package, including both signal lines and power delivery structures, was directly solved in less than 1.6 h on a single core running at 3 GHz. Comparisons with the state-of-the-art finite element methods that employed the most advanced direct sparse solvers, and a widely used commercial iterative finite element solver, had demonstrated the clear advantages of the proposed linear-complexity direct solver in computational efficiency.

Keywords: finite-element method; signal and power integrity; time-domain method

0 Introduction

With the increase in the processing power of the CPU, the memory and system interconnect links connected to a CPU need to have an exponentially increased bandwidth in order to fully utilize the computing power. This leads to higher speed signals on each data line as well as an increase in the number of data lines. Enabling higher bandwidth brings significant challenges to the analysis and design of interconnects. To address these challenges, a full-wave modeling technology is required that can rapidly characterize the interaction between a large number of I/Os in the face of large problem sizes.

Existing fast full-wave solvers for solving large-scale problems are, in general, iterative solvers since traditional direct solvers are computationally expensive. The optimal complexity of an iterative solver isO(NrhsNitN), whereNrhsis the number of right hand sides,Nitis the number of iterations, andNis the matrix size. To analyze the interaction among a large number of circuit ports and to perform many what-if analyses for an optimal design, the number of right hand sides is proportional to the port count and the number of what-if analyses. When the number of right hand sides is large, iterative solvers become inefficient. In contrast, a direct solver has a potential of achievingO(N) complexity, which is optimal for solvingNunknowns.

Among existing full-wave solvers, the finite element method (FEM) is a popular method for analyzing circuits because of its great capability in handling both complicated materials and geometries. A traditional direct finite element solver is computationally expensive. It is shown in Ref.[1] that the optimal operation count of a direct FEM solution in exact arithmetic isO(N1.5) for 2-D problems, andO(N2) for 3-D problems. Although there have been successes in speeding up the direct finite element solution with state-of-the-art sparse matrix solvers[2-4], these solvers have not accomplishedO(N) complexity, i.e., optimal complexity, for FEM-based direct solutions of general 3-D circuit problems.

In recent years, we have successfully achieved the first direct finite-element solver of linear complexity[5-6]. This solver is capable of analyzing arbitrarily shaped 3-D circuits in inhomogeneous materials. An IBM product-level package problem[7]having over 15.8 million unknowns is directly factorized and solved for 40 right hand sides in less than 1.6 h on a single 3 GHz CPU core. The signal integrity of the package interconnects in presence of the surrounding power delivery network is analyzed. Excellent correlation with the measured data is observed. In addition, we have compared the proposed direct solver with a suite of state-of-the-art high-performance direct sparse solvers such as MUMPS 4.10.0[2], and Pardiso in Intel MKL[4]. It is shown that the proposed direct solver greatly outperforms state-of-the-art sparse solvers in both CPU time and memory consumption with good accuracy achieved.

1 Vector finite element method and mathematical background

1.1 Vector finite element method

Considering a general physical layout of a package or integrated circuit involving inhomogeneous materials and arbitrarily shaped lossy conductors, the electric fieldEsatisfies the following second-order vector wave

(1)

whereμris relative permeability,εris relative permittivity,σis conductivity,k0is free-space wave number,Z0is free-space wave impedance, andJis current density. A finite element based solution of Eq. (1) subject to pertinent boundary conditions results in the following linear system of equations

YX=B,

(2)

whereY∈N×Nis a sparse matrix, and matrixBis composed of one or multiple right hand side vectors. When the size of Eq. (2) is large, its efficient solution relies on fast and large-scale matrix solutions.

1.2 Mathematical Background

In state-of-the-art direct sparse solvers, multifrontal method[2]is a powerful algorithm. In this algorithm, the overall factorization of a sparse matrix is organized into a sequence of partial factorizations of smaller dense frontal matrices. Various ordering techniques have been adopted to reduce the number of fill-ins introduced during the direct matrix solution process. The computational cost of a multifrontal based solver depends on the number of nonzero elements in theLandUactors of the sparse matrix. In general, the complexity of a multifrontal solver is higher than linear (optimal) complexity.

Recently, it is proved in Ref.[3] that the sparse matrix resulting from a finite-element based analysis of electromagnetic problems can be represented by anH-matrix[8]without any approximation, and the inverse as well asLandUof this sparse matrix has a data-sparseH-matrix approximation with a controlled error. In anH-matrix[8], the entire matrix is partitioned into multilevel admissible blocks and inadmissible blocks. An inadmissible block keeps its original full matrix form, while an admissible block is represented by an error-controlled low-rank matrix. It is shown in Ref. [3] that anH-matrix based direct FEM solver has a complexity ofO(NlogN) in storage and a complexity ofO(Nlog2N) in CPU time for solving general 3-D circuit problems.

2 Proposed linear complexity direct fem solver for large-scale interconnect extraction and related signal and power integrity analysis

2.1 Proposed direct solver

In the proposed solver, we fully take advantage of the zeros in the original FEM matrix, and also maximize the zeros inLandUby nested dissection ordering[1]. We store the nonzero blocks inLandUwith a compact error-controlledH-matrix representation, compute these nonzero blocks efficiently by developing fastH-matrix based algorithms, while removing all the zeros inLandUfrom storage and computation. Moreover, we organize the factorization of the original 3-D finite element matrix into a sequence of factorizations of 2-D dense matrices, and thereby control the rank to follow a 2-D based growth rate, which is much slower than a 3-D based growth rate[9]for analyzing circuits operating at high frequencies. The overall algorithm has six major steps:

(1) Build cluster treeTIbased on nested dissection;

(2) Build elimination treeEIfromTI;

(3) Obtain the boundary for each node inEI;

(4) Generate theH-matrix structure for each node;

(5) Perform numerical factorization guided byEIby new fastH-matrix-based algorithms;

(6) Solve for one or multiple right hand sides based onEI.

To build cluster treeTI, we recursively partition a 3-D computational domain into separators and subdomains. Since a separatorScompletely separates two subdomainsD1andD2, the off-diagonal blocks in the FEM matrix corresponding to the interaction betweenD1andD2, denoted byYD1D2andYD2D1, are zero. More important, the same zero blocks can be preserved in theLandUfactors.

The LU factorization of the FEM matrix is a bottom-up traversal of the elimination tree as that used in the multifrontal algorithm. Note that the union of all the nodes inEIis equal to I, which is different from cluster treeTI. For each nodesin the elimination tree, we assemble a frontal matrixFsfrom the system matrixYand all the updating matricesUcwithc∈Esbeings’s children nodes. TheFscan be written as a 2×2 block matrix

(3)

in whichΦsdenotes the boundary ofs. We then apply partial LU factorization toFs, obtaining

(4)

Comparing Eq. (3) and (4), we can readily obtain the updating matrixUs

Us=FΦs,Φs-LΦs,sUs,Φs,

(5)

which is then used for the LU factorization of the frontal matrices ofs’s ancestors inEI. The entire LU factorization of matrixYis a bottom-up or post-order traversal ofEI. ForΦs, we have an important lemma

(6)

Therefore, minimizing#{Φs}, the size ofΦs, is critical in avoiding unnecessary operations on zeros. We thus first perform symbolic factorization to pre-process the elimination tree to compute the minimumΦsfor each node inEIbefore the real factorization is carried out.

2.2 Complexity and accuracy analysis

(7)

(8)

which is linear. The accuracy of such anH-matrix representation can be proved from the fact that the original FEM matrix has an exactH-representation and its inverse has an error-boundedH-representation[3].

3 Performance demonstration

3.1 Intel package interconnect with measured data

A 3-D package interconnect provided by Intel Corporation is simulated from 50 MHz to 40 GHz. Fig.1A illustrates the cross sectional view, top view, and 3-D view of the package interconnect. In Fig.1B and Fig.1C, we plot theS-parameters extracted by the proposed solver in comparison with the measure data and the results generated from a commercial FEM-based tool. Excellent accuracy of the proposed solver can be observed in the entire frequency band.

Fig.1 Simulation of a realistic Intel package interconnect example and correlation with measurements

3.2 IBM full-package problem with measurements

An IBM product-level package[7]is simulated from 100 MHz to 50 GHz to examine the capability of the proposed direct finite element solver. The package involves 92 thousand unique elements. There are eight metal layers and seven dielectric layers. We develop software to interpret the board file of the full IBM package into the geometrical and material data that can be recognized by the proposed solver, and also mesh the entire layout of the package into triangular prism elements. The layout of selected layers re-produced by our geometrical processing software can be seen from Fig.2.

Fig.2 Layout of a product-level package in different layers and 19 test structures for solver performance verification

3.2.1 Complexity and performance verification

To examine the accuracy and efficiency of the proposed direct solver, a suite of nineteen substructures of the full package is simulated. These substructures are illustrated in Fig.2C. The smallest structure occupies a package area of 500 μm in width, 500 μm in length, and 4 layers in thickness, whereas the largest structure occupies a package area of 9 500 μm by 9 500 μm. The resultant number of unknowns ranges from 31 276 to 15 850 600. Two ports at the topmost layer are excited. The important simulation parameters used in the proposed direct solver are leafsize=8 and truncation error ∈=10-6. The computer used has a single core running at 3 GHz with 64 GB memory.

The CPU time and memory cost of the proposed solver with respect to unknown numberNare shown in Fig.3 in comparison with those of the direct finite element solver that employs the most advanced direct sparse solvers provided by SuperLU 4.3, UMFPACK 5.6.2, MUMPS 4.10.0[2], and Pardiso in Intel MKL[4]. It is evident that the proposed direct finite element solver greatly outperforms the other state-of-the-art direct solvers in both CPU time and memory consumption. More important, the proposed direct solver demonstrates a clear linear complexity in both time and memory across the entire unknown range, whereas the complexity of the other direct solvers is much higher. With the optimal linear complexity achieved, the proposed direct solver is able to solve the extremely large 15.8 million unknown case using less than 1.6 h on a single core. The solution error of the proposed direct solver, measured in relative residual, is plotted in Fig.3C for all of the testing cases. Excellent accuracy is observed across the entire unknown range. Note the last point in Fig.3B is due to the fact that the computer used has only 64 GB memory.

Fig.3 Complexity and performance verification of the proposed direct solver

3.2.2 Signal and power integrity analysis and correlation with measurements

With the accuracy and efficiency of the proposed direct solver validated, next, we analyze the signal and power integrity of the IBM product-level package and correlate our analysis with measured data. First, the frequency-domain S-parameters from 100 MHz to 30 GHz were generated for 16 ports assigned to 2 interconnects located in the full package. The near-end ports are placed on the topmost layer (chip side), while the far-end ports are at the bottom-most layer (BGA side). The entire stack of 8 metal layers and 7 inter-layer dielectrics are simulated. The resultant number of unknowns is 3 149 880.

The proposed direct solver only takes less than 3.3 h and 29 GB peak memory at each frequency to extract the 16 by 16S-parameter matrix, i.e. solutions for 16 right hand sides. The crosstalk between the near end of line 6 (port 9) and the far end of line 2 (port 8) with all the other ports left open is plotted in Fig.4 from 100 MHz to 30 GHz. The measurement of the structure is performed in time domain[7]. As shown in Fig.4C, very good agreement is observed between the time-domain voltage obtained from the proposed solver and the measured data.

Fig.4 Time domain correlation with measurements

4 Conclusion

In this paper, we develop a linear-complexity direct finite element solver to analyze large-scale interconnects and related system-level signal and power integrity problems. Comparisons with measurements and state-of-the-art sparse solvers have demonstrated the superior performance of the proposed direct solver in accuracy, efficiency and capacity.


登錄APP查看全文

主站蜘蛛池模板: 亚洲区第一页| 国产精品内射视频| 久久亚洲中文字幕精品一区| 亚洲男人的天堂在线| 欧美区一区二区三| 天天综合网亚洲网站| 午夜在线不卡| 亚洲精品在线91| 91日本在线观看亚洲精品| 无套av在线| 5555国产在线观看| 综合色区亚洲熟妇在线| 91视频国产高清| 亚洲欧洲一区二区三区| 在线综合亚洲欧美网站| 亚洲成人一区二区三区| 狠狠干综合| 特级毛片免费视频| 狠狠色成人综合首页| 欧美在线视频a| 亚洲欧美自拍中文| 亚洲精品无码高潮喷水A| 亚洲伊人天堂| 秋霞国产在线| 亚洲系列中文字幕一区二区| 国产精品亚洲天堂| 狠狠综合久久久久综| 人妻丰满熟妇AV无码区| 国产丝袜一区二区三区视频免下载 | 午夜丁香婷婷| 国产精品免费入口视频| 亚洲伦理一区二区| 成人国产精品视频频| 亚洲有无码中文网| 国产18在线播放| 一级毛片在线播放| 成人免费网站在线观看| 久久免费看片| 永久免费av网站可以直接看的| 久久中文字幕2021精品| a免费毛片在线播放| 成人一级黄色毛片| 亚洲久悠悠色悠在线播放| 亚洲高清无在码在线无弹窗| 亚洲一区色| 97久久人人超碰国产精品| 色有码无码视频| 无码aaa视频| 亚洲精品人成网线在线| 亚洲一级无毛片无码在线免费视频| 国产毛片久久国产| 久久一日本道色综合久久| 亚洲成人福利网站| 毛片免费在线视频| 伊人91视频| 国产精品极品美女自在线网站| 久热这里只有精品6| 亚洲第一成年网| 色欲色欲久久综合网| 91精品国产福利| 日本久久网站| 欧美一区精品| 亚洲欧美在线综合图区| 国内精品免费| 亚洲人成日本在线观看| 亚洲国模精品一区| 欧美午夜在线视频| 国产精品午夜电影| 99热最新网址| 久青草网站| 精品人妻一区无码视频| 视频国产精品丝袜第一页| 国产青榴视频在线观看网站| 成人a免费α片在线视频网站| 国产第一福利影院| 天堂在线www网亚洲| 无码一区二区三区视频在线播放| 蜜臀AVWWW国产天堂| 亚洲成人网在线观看| 久久久久久久97| 青青青伊人色综合久久| 亚洲av片在线免费观看|