
RCache: A Read-Intensive Workload-Aware Page Cache for NVM Filesystem

2023-05-08
ZTE Communications, 2023, Issue 1

TU Yaofeng, ZHU Bohong, YANG Hongzhang, HAN Yinjun, SHU Jiwu

(1. State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen 518055, China; 2. ZTE Corporation, Shenzhen 518057, China; 3. Tsinghua University, Beijing 100084, China)

Abstract: Byte-addressable non-volatile memory (NVM), as a new participant in the storage hierarchy, gives extremely high performance in storage, which forces changes in current filesystem designs. The page cache, once a significant mechanism filling the performance gap between Dynamic Random Access Memory (DRAM) and block devices, is now a liability that heavily hinders the write performance of NVM filesystems. Therefore, state-of-the-art NVM filesystems leverage the direct access (DAX) technology to bypass the page cache entirely. However, DRAM still provides higher bandwidth than NVM, which prevents skewed read workloads from benefiting from the higher bandwidth of DRAM and leads to sub-optimal performance for the system. In this paper, we propose RCache, a read-intensive workload-aware page cache for NVM filesystems. Different from traditional caching mechanisms where all reads go through DRAM, RCache uses a tiered page cache design, including assigning DRAM and NVM to hot and cold data separately, and reading data from both sides. To avoid copying data to DRAM on the critical path, RCache migrates data from NVM to DRAM in a background thread. Additionally, RCache manages data in DRAM in a lock-free manner for better latency and scalability. Evaluations on Intel Optane Data Center (DC) Persistent Memory Modules show that, compared with NOVA, RCache achieves 3 times higher bandwidth for read-intensive workloads and introduces little performance loss for write operations.

Keywords: storage system; file system; persistent memory

1 Introduction

In 2019, Intel released the first commercially available non-volatile memory (NVM) device, Intel Optane DC Persistent Memory[1]. Compared with Dynamic Random Access Memory (DRAM), byte-addressable non-volatile memory provides comparable performance and similar interfaces (e.g., Load/Store) along with data persistence at the same time. Because of this unique combination of features, NVM offers a great performance advantage for storage systems and creates an urgent need to reform their old architectures. Refs. [2–11] re-architected old storage systems to better accommodate NVM and reported significant performance boosts that endorsed these design choices.

Among these novel designs, bypassing the page cache in kernel space is a popular choice. The page cache in Linux used to be an effective mechanism to bridge the performance gap between DRAM and block devices. Since NVM has performance close to that of DRAM, the page cache itself imposes severe performance loss on an NVM filesystem, because it introduces an extra data copy at every file operation and leads to write amplification on NVM. Therefore, the legacy page cache in the Linux kernel has become a liability for NVM systems. For the above reasons, recent work simply deployed the DAX[12] technology to bypass the page cache entirely[12–17]. With the DAX technology, NVM filesystems access the address space of NVM directly, without the necessity of filling the page cache first, which reduces the latency of filesystem operations significantly.

However, although NVM achieves bandwidth and latency of the same order of magnitude as DRAM, DRAM still provides bandwidth several times higher and latency fairly lower than NVM. Therefore, the DAX approach reduces extra data copies and achieves fast write performance at the cost of cached reads, especially for read-intensive workloads[18–20]. The page cache provides benefits for reading but has severe performance impacts on writing because of the extra data copy and write amplification, while the DAX approach is efficient for writing due to direct access to NVM but fails to utilize DRAM bandwidth for reading. Therefore, in order to utilize DRAM bandwidth while avoiding extra data copies and write amplification, the page cache should be redesigned to allow both direct access and cached reads.

In this paper, we propose RCache, a read-intensive workload-aware page cache for the NVM filesystem. RCache aims to provide fast read performance for read-intensive workloads while avoiding significant performance loss for write operations. To achieve this, RCache assigns DRAM and NVM to hot and cold data separately, and reads data from both sides. Our major contributions are summarized as follows.

• We propose a read-intensive workload-aware page cache design for the NVM filesystem. RCache uses a tiered page cache design, reading hot data from DRAM and accessing cold data directly from NVM, to utilize DRAM bandwidth for reading while preserving fast write performance. In addition, RCache offloads the data copy from NVM to DRAM to a background thread, in order to remove a major setback of the caching mechanism from the critical path.

• RCache introduces a hash-based page cache design to manage the page cache in a lock-free manner using atomic instructions for better scalability.

• We implement RCache and evaluate it on servers with Intel DC Persistent Memory Modules. Experimental results show that RCache effectively utilizes the bandwidth of DRAM with little performance cost for managing the page cache and outperforms the state-of-the-art DAX filesystem under read-intensive workloads.

2 Background and Motivation

2.1 Non-Volatile Memory

Byte-addressable NVM technologies, including Phase-change Memory (PCM)[22–24], ReRAM, and Memristor[21], have been intensively studied in recent years. These NVMs provide comparable performance and a similar interface to DRAM, while persisting data after power-off like block devices. Therefore, NVMs are promising candidates for providing persistent storage ability at the main memory level. Recently, Intel released Optane DC Persistent Memory Modules (DCPMM)[1], the first commercially available persistent memory product. Currently, the product comes in three capacities: 128 GB, 256 GB, and 512 GB. Previous studies show that a single DCPMM provides bandwidths of at most 6.6 GB/s for read and 2.3 GB/s for write. Note that these bandwidths are of the same order of magnitude as DRAM's, but still considerably lower[25].

2.2 Page Cache and DAX Filesystem

Page cache is an important component of Linux kernel filesystems. In brief, the page cache consists of a set of pages in DRAM and the corresponding metadata structures. The page cache is accessed only by the operating system in the context of a filesystem call and acts as a transparent layer to user applications. For a write system call, the operating system writes data to pages in the page cache, which cannot guarantee the persistence of the data. To guarantee persistence, the operating system needs to flush all dirty pages in the page cache to the storage devices, typically within an fsync system call. For a read system call, the operating system first reads data from the page cache; if the data are not present, it further reads them from the storage devices. Note that this may involve loading data into the page cache, depending on the implementation. In the current implementation, the operating system maintains an individual radix tree for each opened file.
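As a concrete illustration of this contract, the following user-space C sketch pairs a write system call with an fsync; the file path is illustrative.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char buf[] = "hello, page cache";
    int fd = open("/mnt/ext4/data.log", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* write() only dirties DRAM pages in the page cache;
     * the data are not yet persistent. */
    if (write(fd, buf, strlen(buf)) < 0) { perror("write"); return 1; }

    /* fsync() flushes those dirty pages to the storage device;
     * only after it returns are the data durable. */
    if (fsync(fd) < 0) { perror("fsync"); return 1; }

    close(fd);
    return 0;
}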

As for the DAX filesystem: the page cache is extremely useful for block devices with much higher access latency than DRAM, but not suitable for NVM devices with access latency comparable to DRAM. As mentioned before, to ensure data persistence, the user must issue an fsync system call after a write system call, which brings substantial latency to persisting data in an NVM filesystem. Therefore, state-of-the-art NVM filesystems leverage the DAX technology to bypass the page cache entirely and achieve persistence immediately when the write system call returns. In a DAX filesystem, read/write system calls do not access the page cache at all; instead, data are loaded/stored from/to the NVM directly using a memory interface. The DAX technology eliminates the extra data copy and accomplishes lower-cost data persistence.
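From user space, DAX-style direct access can be illustrated with a memory mapping, as in the sketch below. The mount point is illustrative; MAP_SYNC requires a DAX-mounted filesystem and a reasonably recent kernel, and the file is assumed to already exist and be at least one page long.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Assumes /mnt/pmem is a DAX-mounted NVM filesystem. */
    int fd = open("/mnt/pmem/data", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Loads and stores hit the NVM pages directly; no page cache
     * is involved. (Durability additionally needs a CPU cache flush.) */
    memcpy(p, "direct", 6);

    munmap(p, 4096);
    close(fd);
    return 0;
}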

2.3 Issue of DAX and Page Cache

The performance of NVM is close to that of DRAM but not equal to it. We measure the read and write latency of two different filesystems (NOVA[17] and EXT4[26]) representing two different mechanisms (DAX and page cache). Fig. 1(a) shows that the read latency of DAX is much higher than that of the page cache (4 kB sequential read). Fig. 1(b) shows that the write latency of DAX is much lower than that of the page cache (4 kB sequential write).

To sum up, the DAX technology prevents read operations from benefiting from the much higher bandwidth of DRAM in the NVM filesystem, while the presence of the page cache significantly increases the latency of write operations with immediate data persistence. To overcome this, the page cache mechanism needs to be redesigned.

3 RCache Design

3.1 Overview

We build RCache, a read-intensive workload-aware page cache for the NVM filesystem, for servers with non-volatile memory to accelerate read-intensive workloads. The goal is to benefit from DRAM bandwidth for read operations without inducing notable latency for data persistence.

1) RCache assigns DRAM and NVM to hot and cold data separately, and allows cached reads and direct reads from NVM to coexist. Furthermore, RCache offloads the data copy to a background thread to relieve pressure on the critical path.

2) In addition, RCache deploys a lock-free page cache using a hash table to further reduce the performance cost of cache coherence management.

The architecture of RCache is described in Fig. 2. RCache keeps an individual cache structure for each opened file. The page cache consists of a set of DRAM pages and a cache entry table containing a certain number of cache entries in DRAM. A cache entry represents a DRAM page and carries the information RCache needs to manage the cache and locate data given a logical block number. As shown in Fig. 3, a cache entry carries a validation flag to indicate the status of this cache entry, a timestamp for the least recently used (LRU) algorithm, a Blocknr to indicate the logical block number that the entry represents, a DRAM page pointer to the actual cache page in DRAM, and an NVM page pointer to the actual data page in NVM.
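Read literally from this description, the entry might look like the following C sketch. The field names are our own, and packing the state and a reader count into one atomic word (used by the lock-free protocol in Section 3.3) is our assumption; the paper specifies only the fields' roles.

#include <stdatomic.h>
#include <stdint.h>

enum entry_state {          /* packed into the low bits of `flag` */
    ENTRY_INVALID = 0,
    ENTRY_INUSE,            /* owned by a reader or an inserter */
    ENTRY_PREP,             /* metadata filled, awaiting data copy */
    ENTRY_COPY,             /* background thread copying NVM -> DRAM */
    ENTRY_READY,            /* DRAM page valid and readable */
};

struct cache_entry {
    _Atomic uint64_t flag;  /* validation flag: state + reader count */
    uint64_t timestamp;     /* last access time, for LRU eviction */
    uint64_t blocknr;       /* logical block number this entry maps */
    void    *dram_page;     /* cached copy of the block in DRAM */
    void    *nvm_page;      /* authoritative data page in NVM */
};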

▲Figure 1.Performance comparison between different hardware and different filesystem settings

▲Figure 2.RCache architecture

3.2 Tiered Page Cache Design

As shown in Fig. 2, the page cache is accessed in two contexts: a read/write system call and a background thread.

For a read operation, the operating system accesses the page cache first. If the data required by the user are present and valid in the page cache, the operating system copies them directly from the cached page in DRAM to the user's buffer; if a cache miss happens, the operating system falls back to the legacy procedure, reading the data directly from the NVM, and inserts a new entry for the data into the page cache. For cache insertion, since reading the full data blocks into the page cache would introduce extra data copies and thus higher latency, RCache inserts only a small cache entry carrying a pointer to the physical block instead of the actual data blocks. A sketch of this read path is given below.
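The following C sketch follows our reading of this path, assuming the cache_entry layout from Section 3.1. The helper names (cache_lookup, nvm_block_addr, block_is_hot, cache_insert_entry) and the file_cache type are hypothetical stand-ins for RCache internals the paper does not name; entry_acquire_reader/entry_release_reader are sketched in Section 3.3.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

struct file_cache;          /* per-file cache structure (Fig. 2) */
struct cache_entry *cache_lookup(struct file_cache *fc, uint64_t blocknr);
bool entry_acquire_reader(struct cache_entry *e);   /* Ready -> In use */
void entry_release_reader(struct cache_entry *e);   /* In use -> Ready */
void *nvm_block_addr(struct file_cache *fc, uint64_t blocknr);
bool block_is_hot(struct file_cache *fc, uint64_t blocknr);
void cache_insert_entry(struct file_cache *fc, uint64_t blocknr, void *nvm);

ssize_t rcache_read_block(struct file_cache *fc, uint64_t blocknr,
                          void *user_buf, size_t len)
{
    struct cache_entry *e = cache_lookup(fc, blocknr);

    /* Hit: copy from the DRAM page while holding a reader reference. */
    if (e && entry_acquire_reader(e)) {
        memcpy(user_buf, e->dram_page, len);
        entry_release_reader(e);
        return (ssize_t)len;
    }

    /* Miss: DAX-style direct read from the NVM page. */
    void *nvm = nvm_block_addr(fc, blocknr);
    memcpy(user_buf, nvm, len);

    /* Insert only a small pointer-carrying entry; the background
     * thread performs the actual NVM -> DRAM copy off the critical
     * path (Section 3.2). */
    if (block_is_hot(fc, blocknr))
        cache_insert_entry(fc, blocknr, nvm);
    return (ssize_t)len;
}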

For a write operation, the operating system needs to invalidate all cached pages affected by the write before returning to users. We further explain why the invalidation procedure is lightweight in Section 3.3.

RCache depends on a background kernel thread to finish the management of the cache. As described above, in the read operation RCache only inserts cache entries into the page cache. In the background thread context, once a pending cache entry is discovered, RCache first allocates a DRAM page to cache the data, and then copies the data from the NVM block to the DRAM page according to the cache entry. Finally, RCache declares the validity of the cache entry by switching the validation flag atomically. Note that only after RCache sets the validation flag in the cache entry to valid does the entry become available to the read/write context.
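A minimal sketch of this background copier follows, reusing the cache_entry layout above; next_prep_entry and alloc_dram_page are hypothetical helpers, and a real kernel thread would sleep rather than spin when no work is pending.

#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 4096

struct cache_entry *next_prep_entry(struct file_cache *fc);
void *alloc_dram_page(void);

void rcache_copier(struct file_cache *fc, _Atomic bool *stop)
{
    while (!atomic_load(stop)) {
        struct cache_entry *e = next_prep_entry(fc);
        if (!e)
            continue;                     /* nothing pending */

        uint64_t prep = ENTRY_PREP;
        /* Claim the entry: Prep -> Copy.  Fails if a writer already
         * invalidated it, in which case we simply skip it. */
        if (!atomic_compare_exchange_strong(&e->flag, &prep, ENTRY_COPY))
            continue;

        e->dram_page = alloc_dram_page();
        memcpy(e->dram_page, e->nvm_page, PAGE_SIZE);

        /* Publish: Copy -> Ready.  Only now may readers use it. */
        atomic_store(&e->flag, ENTRY_READY);
    }
}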

3.3 Lock-Free Cache Management

▲Figure 3.Cache structure and status shifting paradigm

The decoupled cache mechanism splits cache management into two separate and concurrent contexts, which makes coordination across all units more expensive since it leads to more cross-core communication. Therefore, RCache deploys a lock-free cache management procedure to minimize the impact. RCache operates on cache entries by manipulating the validation flag atomically using Compare-and-Swap (CAS) instructions. In the current implementation, a cache entry switches among five states, and Fig. 3 depicts the transition diagram among them. Initially, all cache entries are invalid. To insert a cache entry, RCache first acquires control of a candidate entry by setting its validation flag to "In use" atomically using CAS, which prevents other threads from operating on this entry. Then, RCache fills in the necessary information (e.g., the block number and the NVM page pointer) and changes the status to "Prep", which tells the background thread that this entry has all the information needed and is ready for data copy. From the background thread's view, before copying data from persistent memory to DRAM, it first sets the status of the cache entry to "Copy", and then initiates the data copy. When the copy completes, the background thread sets the status to "Ready" using a CAS instruction on the validation flag, and only at this point is the entry available for read operations. To write data into a certain page, if the cache hits, RCache needs to invalidate the cache entry representing this page by switching its status to "Invalid" by CAS. Note that RCache never invalidates an "In use" cache entry: the "In use" status only exists in the context of a read syscall, and since the file is locked during write operations, this situation never happens. To read data from a cache entry, RCache first switches the status from "Ready" to "In use" using CAS, then copies data from the DRAM page to the user buffer, and finally changes the status back to "Ready". On its own, this could lead to an inconsistent status where users might be given wrong data, since several threads may read from the same cache entry concurrently. Therefore, RCache embeds an additional counter in the validation flag: when a reader wants to read the cache entry, it must increase this counter, and when it finishes reading, it must decrease it. Only the last reader switches the status back to "Ready".
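Under the assumption that the state and the reader counter share one 64-bit validation flag (the bit layout below is ours, not the paper's), the reader-side protocol and the writer-side invalidation can be sketched as follows, reusing the cache_entry layout from Section 3.1.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define STATE_MASK 0xFFull     /* low byte of the flag: entry state */
#define READER_ONE 0x100ull    /* one reader, counted in the upper bits */

/* Ready -> In use (or join existing readers); fails if the entry is
 * Invalid, Prep, or Copy, in which case the caller reads from NVM. */
bool entry_acquire_reader(struct cache_entry *e)
{
    uint64_t old = atomic_load(&e->flag);
    for (;;) {
        uint64_t st = old & STATE_MASK;
        if (st != ENTRY_READY && st != ENTRY_INUSE)
            return false;
        uint64_t new = ((old & ~STATE_MASK) + READER_ONE) | ENTRY_INUSE;
        if (atomic_compare_exchange_weak(&e->flag, &old, new))
            return true;
    }
}

/* Drop a reader reference; only the last reader restores "Ready". */
void entry_release_reader(struct cache_entry *e)
{
    uint64_t old = atomic_load(&e->flag);
    for (;;) {
        uint64_t readers = (old & ~STATE_MASK) - READER_ONE;
        uint64_t new = readers | (readers ? ENTRY_INUSE : ENTRY_READY);
        if (atomic_compare_exchange_weak(&e->flag, &old, new))
            return;
    }
}

/* Writer-side invalidation on a cache hit.  "In use" entries never
 * reach this point because the file is locked during writes. */
bool entry_invalidate(struct cache_entry *e)
{
    uint64_t ready = ENTRY_READY;
    return atomic_compare_exchange_strong(&e->flag, &ready, ENTRY_INVALID);
}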

3.4 Implementation

We implement RCache on NOVA, a state-of-the-art NVM filesystem developed with the DAX technology. We keep the metadata and data layout of NOVA intact and add extra logic for managing the cache in the context of the read/write procedure. We launch the background thread in the kernel at the mount phase and reclaim it during the unmount phase. To track the hotness of a block, we extend the block index in NOVA, adding an extra counter to each leaf node of the radix tree. We insert a block into the cache only when it is accessed more times than a threshold within a time window; the threshold and the time window are predefined.
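A sketch of this hotness test follows; the threshold and window constants are illustrative, since the paper states only that they are predefined, and the leaf layout is our assumption.

#include <stdbool.h>
#include <stdint.h>

#define HOT_THRESHOLD 8               /* accesses needed within a window */
#define WINDOW_NS     1000000000ull   /* 1 s window, illustrative */

struct block_leaf {                   /* extended radix-tree leaf node */
    uint64_t window_start_ns;         /* when the current window began */
    uint32_t access_count;            /* accesses seen in this window */
};

bool block_is_hot_leaf(struct block_leaf *leaf, uint64_t now_ns)
{
    if (now_ns - leaf->window_start_ns > WINDOW_NS) {
        leaf->window_start_ns = now_ns;   /* start a fresh window */
        leaf->access_count = 0;
    }
    return ++leaf->access_count > HOT_THRESHOLD;
}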

4 Evaluation

In this section, we first evaluate RCache's read/write latency, then the read performance under a read-intensive workload, and finally the read performance under a skewed read-intensive workload.

4.1 Experimental Setup

We implement RCache and evaluate its performance on a server with Intel Optane DCPMM. The server has 192 GB of DRAM, two Intel Xeon Gold 6240M processors (2.6 GHz, 36 cores per processor), and 1536 GB of Intel Optane DC Persistent Memory Modules (6×256 GB). Because cross-non-uniform memory access (NUMA) traffic has a huge impact on performance[27], throughout the entire evaluation we only utilize the NVMs on one NUMA node to deploy RCache and the other file systems (i.e., only 768 GB of NVM on this server). The server runs Ubuntu 18.04 with Linux kernel 4.15.

Table 1 lists the file systems for comparison. We build all filesystems on the same NVM device with a PMem driver. For EXT4, we build it following the traditional procedure with the page cache involved. For NOVA and RCache, since RCache shares most of its filesystem routines with NOVA, we deploy both of them on an NVM device with a PMem driver and DAX enabled.

For the latency test, we use custom micro-benchmarks; for the bandwidth evaluation, we use Fxmark[28], a benchmark designed to evaluate the scalability of file systems. In this evaluation, we use three of its sub-benchmarks, namely DRBL, DRBM, and DWAL.

▼Table 1.Evaluated file systems

4.2 Overall Performance

To evaluate the read/write performance, we use a custom micro-benchmark. The evaluation on each filesystem spawns only one thread. We first create a 64 MB file, then issue 100 000 4 kB read/write requests, and finally calculate the average latency. Since EXT4 does not ensure data persistence in the write system call, we issue an fsync after each write system call to preserve data persistence. Fig. 4 shows the read/write latency of the three evaluated filesystems.
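For reference, the write side of such a micro-benchmark might look like the following user-space sketch; the path is illustrative, and the per-request fsync matches the persistence requirement stated above.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define REQS      100000
#define REQ_SIZE  4096
#define FILE_SIZE (64ull << 20)        /* 64 MB file */

int main(void)
{
    char buf[REQ_SIZE];
    memset(buf, 0xab, sizeof(buf));

    int fd = open("/mnt/nvm/bench.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0 || ftruncate(fd, FILE_SIZE) < 0) { perror("setup"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < REQS; i++) {
        off_t off = (off_t)(i % (FILE_SIZE / REQ_SIZE)) * REQ_SIZE;
        if (pwrite(fd, buf, REQ_SIZE, off) != REQ_SIZE) { perror("pwrite"); return 1; }
        fsync(fd);                     /* persistence after each request */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg write latency: %.0f ns\n", ns / REQS);
    close(fd);
    return 0;
}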

For read operations, EXT4 shows the lowest latency, and the latency of RCache is close to that of EXT4 and much lower than that of NOVA. This is because RCache utilizes the DRAM bandwidth to accelerate reads.

To evaluate the read bandwidth under a read-intensive workload, we use the DRBL sub-benchmark from Fxmark. DRBL first creates a 64 MB file for each thread and then issues sequential read operations to the filesystem. We run the evaluation for 20 s; if a read operation reaches the tail of the file, the next read starts at the beginning of the file. From Fig. 5(a) we can see that RCache shows much better read performance than NOVA, close to that of EXT4.

4.3 Read Performance Under Skewness

We evaluate the read performance under a skewed workload. We modify the DRBL benchmark so that, instead of reading files sequentially, each thread issues read requests at offsets controlled by a random variable that follows a normal distribution. Fig. 5(b) shows that both EXT4 and RCache achieve even better performance than in Fig. 5(a). This is because under the skewed workload the hot pages are more likely to be held in the L3 cache, which yields better performance. On the other hand, since NOVA does not utilize DRAM for better read performance, its read bandwidth is much lower than that of EXT4 or RCache.

▲Figure 4.Read and write latency of different filesystems

▲Figure 5.Read bandwidth under the read-intensive workload of different filesystems

5 Conclusions

The traditional page cache in the Linux kernel benefits read workloads but cannot fit into an NVM filesystem because it causes extra data copies and write amplification. By bypassing the page cache, a DAX filesystem achieves better write performance but gives up the opportunity of cached reads. Therefore, in this paper we propose RCache, a read-intensive workload-aware page cache for NVM filesystems. RCache uses a tiered page cache design, assigning DRAM and NVM to hot and cold data separately and reading data from both sides, so that cached reads and direct access can coexist. In addition, to avoid copying data to DRAM on the critical path, RCache migrates data from NVM to DRAM in a background thread. Furthermore, RCache manages data in DRAM in a lock-free manner for better latency and scalability. Evaluations on Intel Optane DC Persistent Memory Modules show that, compared with NOVA, RCache achieves 3 times higher bandwidth for read-intensive workloads and introduces little performance loss for write operations.
