
Chromosome-level genome assembly of the Chinese longsnout catfish Leiocassis longirostris

2021-08-16 08:09:44 Wen-Ping He, Jian Zhou, Zhe Li
Zoological Research, 2021, Issue 4

The Chinese longsnout catfish (Leiocassis longirostris Günther) is one of the most economically important freshwater fish in China. As wild populations have declined sharply in recent years, it is also a valuable model for research on sexual dimorphism, comparative biology, and conservation. However, the current lack of high-quality chromosome-level genome information for the species hinders the advancement of comparative genomic analysis and evolutionary studies. Therefore, we constructed the first high-quality chromosome-level reference genome for L. longirostris. The total genome was 703.19 Mb, with 389 contigs and a contig N50 length of 4.29 Mb. Using high-throughput chromosome conformation capture (Hi-C) data, the genome sequences (685.53 Mb) were scaffolded into 26 chromosomes ranging from 17.36 to 43.97 Mb, resulting in a chromosomal anchoring rate for the genome of 97.44%. In total, 23 708 protein-coding genes were identified in the genome. Phylogenetic analysis indicated that L. longirostris and its closest related species Pelteobagrus fulvidraco diverged approximately 26.6 million years ago. This high-quality reference genome of L. longirostris should pave the way for future genomic comparisons and evolutionary research.

Leiocassis longirostris (also named Jiangtuan) belongs to the family Bagridae, which contains more than 220 species (Ferraris, 2007), and the order Siluriformes. It is a semi-migratory and commercially important freshwater species found in China, especially in the Huaihe, Liaohe, Minjiang, Yangtze, and Pearl rivers, and in the western regions of the Korean Peninsula (Shen et al., 2014; Wang et al., 2006; Zhu et al., 2005). In recent years, wild populations of L. longirostris have experienced a rapid decline due to over-fishing, water pollution, hydropower construction, and other human activities (Liang et al., 2016; Luo et al., 2000; Wang et al., 2006; Xiao & Yang, 2009). Thus, to facilitate conservation and evolutionary research, we constructed the first high-quality chromosome-level reference genome for L. longirostris using BGISEQ-500, Nanopore, and high-throughput chromosome conformation capture (Hi-C) technologies.

One healthy adult female L. longirostris (Figure 1A) collected from a farm at the Sichuan Academy of Agricultural Sciences in Meishan, Sichuan Province, China, was used for genome sequencing. Muscle tissue was collected for DNA extraction after treatment with the anesthetic tricaine (MS-222). Genomic DNA for BGISEQ-500 and Nanopore sequencing was isolated using standard chloroform-isoamyl alcohol extraction procedures (Sambrook et al., 1989). DNA quality and quantity were measured using a NanoDrop One UV-Vis spectrophotometer (Thermo Fisher Scientific, USA) and Qubit 3.0 fluorometer (Invitrogen, USA), respectively.

A DNA library (200–400 bp insert size) was constructed following the manufacturer’s instructions as described in a previous study (Huang et al., 2017). The library was then sequenced following the BGISEQ-500 protocols (Huang et al., 2017). The short-read data obtained from the BGISEQ-500 platform were filtered using SOAPnuke v1.5.2 (Chen et al., 2018). The adapter sequences were removed from the reads, and paired reads with more than 10% ambiguous or low-quality (Phred score < 5) bases were discarded, with BLAST v2.2.31 applied for the evaluation of sample contamination (Altschul et al., 1990). As a result, we obtained a total of 64.11 Gb short reads (Supplementary Table S1). Using Jellyfish v2.2.6 (Marçais & Kingsford, 2011), the K-mer frequency distribution was calculated. The Jellyfish results were subsequently delivered to GenomeScope (Vurture et al., 2017). Using a K-mer size of 17, the K-mer frequency distribution for L. longirostris was obtained (Supplementary Figure S1). As a result, the genome size of L. longirostris was estimated to be 688.99 Mb, with heterozygosity, repeat content, and GC content of 0.35%, 42.53%, and 38.43%, respectively.
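The idea behind K-mer-based genome size estimation can be illustrated with a minimal Python sketch. This is not how Jellyfish or GenomeScope work internally: it counts K-mers naively in memory, does not collapse reverse complements, and uses the simple "total K-mers divided by peak depth" rule rather than GenomeScope's model fitting. The function names and the `min_depth` cutoff are assumptions made for illustration.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer across an iterable of read strings, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts

def kmer_spectrum(counts):
    """Map depth -> number of distinct k-mers observed at that depth."""
    return Counter(counts.values())

def estimate_genome_size(spectrum, min_depth=2):
    """Estimate genome size as total k-mers / peak depth.

    Depths below min_depth are ignored because they are dominated by
    sequencing errors (an assumed, illustrative cutoff).
    """
    usable = {d: n for d, n in spectrum.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth
```

For real data, `reads` would be the cleaned short reads and `k` would be 17, as in the text; a dedicated counter is needed at that scale because an in-memory `Counter` does not fit tens of gigabases.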

Figure 1 Genome analysis of L. longirostris

For Nanopore sequencing, we prepared a library using a Ligation Sequencing Kit (Oxford Nanopore Technologies, UK, SQK-LSK109) according to the manufacturer’s instructions. The library was sequenced using the Nanopore GridION X5 sequencer (Oxford Nanopore Technologies, UK) with flow cell R9.4 on five flow cells. Base calling was performed using Guppy v2.0.8 with default parameters, and reads were filtered for mean_qscore_template ≥ 7. NanoPlot v1.0.0 (De Coster et al., 2018) was then used to filter the Nanopore reads. For the construction of the Hi-C library, 1 g of muscle tissue was used to prepare a library according to previously established protocols (Rao et al., 2014). The library was then sequenced on a BGISEQ-500 sequencer (BGI Genomics, China) using 100 bp paired-end sequencing.
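The mean-quality threshold applied to the Nanopore reads can be sketched as follows. This example assumes the mean read quality is obtained by averaging per-base error probabilities and converting the result back to the Phred scale, which is our understanding of how such per-read scores are usually defined; it is not taken from the Guppy source. The function names are hypothetical.

```python
import math

def mean_qscore(phred_scores):
    """Mean read quality: average the per-base error probabilities, then convert back to Phred."""
    if not phred_scores:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in phred_scores) / len(phred_scores)
    return -10 * math.log10(mean_error)

def passes_filter(phred_scores, threshold=7.0):
    """Keep a read only if its mean quality reaches the threshold (7 in the text)."""
    return mean_qscore(phred_scores) >= threshold
```

Averaging in probability space matters: a read with per-base scores of 20 and 0 has an arithmetic mean of 10, but a probability-based mean below 3, so it would be discarded at a threshold of 7.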

For transcriptome sequencing, the liver tissues of 15 L. longirostris individuals collected from the same farm were used for RNA extraction with TRIzol reagent (Invitrogen, USA), followed by treatment with DNase I (Invitrogen, USA) to remove genomic DNA. RNA concentration and integrity were measured using a Qubit RNA Assay Kit and Qubit 2.0 fluorometer (Life Technologies, USA) and an RNA Nano 6000 Assay Kit with the Agilent Bioanalyzer 2100 system (Agilent Technologies, USA), respectively. Three RNA sequencing libraries (five fish per library) with an insert size of 250–300 bp were prepared using a NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer’s protocols, and then sequenced on the Illumina HiSeq X Ten platform (Illumina Inc., USA) as 150 bp paired-end reads. The raw RNA-seq reads were cleaned and assembled as described previously (Ye et al., 2018).

Using the Nanopore sequencing platform, we obtained 43.23 Gb long reads, with an expected average sequencing coverage of 61.48× for genome assembly (Supplementary Table S1). We then performed de novo genome assembly using Canu v1.8 (Koren et al., 2017) following the correction, trimming, and contig construction steps. After contig assembly, three rounds of contig sequence polishing were performed with cleaned genomic short reads using Pilon v1.23 (Walker et al., 2014). Purge Haplotigs v1.0.3 (Roach et al., 2018) was used to produce an improved and deduplicated assembly. Finally, we obtained the assembled genome of L. longirostris, which was 703.19 Mb in length, with 389 contigs and an N50 contig size of 4.29 Mb. This is a medium-sized genome among other sequenced catfish genomes (Table 1; Supplementary Table S2). We performed genome assembly quality control using the distribution of GC_depth. The GC_depth scatter plots demonstrated a Poisson distribution, indicating that this genome had no significant contamination. The overall GC-content of 39.67% in the L. longirostris genome was slightly higher than that of the walking catfish (Clarias batrachus) (Li et al., 2018) and common carp (Cyprinus carpio) but much lower than that of most teleost genomes (Xu et al., 2014). The completeness of the assembled L. longirostris genome was estimated using BUSCO v3.0.2 (Simão et al., 2015) with the actinopterygii_odb9 database. As a result, 4 293 (93.6%) of the 4 584 BUSCO genes were completely identified in the genome, including 4 109 (89.6%) single-copy and 184 (4.0%) duplicated genes. These results suggest high genome assembly completeness.
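Two of the assembly statistics quoted above, the contig N50 and the sequencing coverage, follow from simple definitions that can be sketched in Python. The function names are hypothetical, and the sketch assumes the common definition of N50 as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length

def coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome size (same units)."""
    return total_bases / genome_size
```

With the figures from the text, `coverage(43.23, 0.70319)` (both in Gb) rounds to 61.48, matching the reported coverage relative to the 703.19 Mb assembly.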

For chromosome-level assembly of the L. longirostris genome, Hi-C reads were first filtered using HiC-Pro v2.8.0 (Servant et al., 2015). Juicer v1.5 (Durand et al., 2016a) was then used to analyze the Hi-C datasets, and 3D-DNA v170123 was used to anchor the genome assembly to the chromosomes (Dudchenko et al., 2017) with parameters “-m haploid -s 0 -c 26”. The contact matrix of the L. longirostris contigs was mapped using Juicebox v1.11.08 (Durand et al., 2016b) (Figure 1B). A total of 126.35 Gb clean Hi-C reads were obtained, and 685.53 Mb (97.44% of total genome) genome sequences were successfully scaffolded into 26 pseudochromosomes. The number of chromosome scaffolds is consistent with previous research on karyotypes of L. longirostris (2n=52; Hong & Zhou, 1984). The lengths of chromosomes ranged from 17.36 Mb to 43.97 Mb (Supplementary Table S3). The scaffold N50 of the chromosome-level assembly was 28.03 Mb (Table 1).

For the annotation of repetitive sequences, we used RepeatModeler v1.0.10 (Bao & Eddy, 2002), which employs two complementary computational methods, i.e., RECON v1.08 and RepeatScout v1.0.5 (RepeatScout, RRID:SCR_014653) (Price et al., 2005), to identify repeat element boundaries and family relationships from sequence data. Subsequently, the outputs from the RepeatModeler and RepBase v21.01 library were combined and used for further characterization of transposable elements (TEs), many of which are not repetitive, and other repeats by homology-based methods, including identification with RepeatMasker v4.0.7, rmblast-2.2.28 (RRID:SCR_012954). Using RepBase-based homology and de novo methods, 239.11 Mb (33.99% of total genome) repetitive elements were identified, with DNA transposons (146.40 Mb, 20.81%) being the most abundant type in the genome (Supplementary Table S4-1). The proportion of repetitive elements in L. longirostris is similar to that in the Glyptosternon maculatum genome (33.96%) (Liu et al., 2018) and higher than that of most teleost genomes (Supplementary Table S4-2).

Combined homology-, de novo-, and transcriptome-based methods were used for gene prediction in the genome. The protein sequences of nine fish species, including Danio rerio, Gasterosteus aculeatus, Ictalurus punctatus, Larimichthys crocea, Oreochromis niloticus, Oryzias latipes, Pangasianodon hypophthalmus, Tachysurus fulvidraco, and Takifugu rubripes, were downloaded from the Ensembl database and mapped onto the assembled L. longirostris genome using BLASTN. Subsequently, GeneWise v2.2.0 (Birney et al., 2004) with default options was used for homologous annotation. For de novo prediction, Augustus v3.1.0 (Stanke & Waack, 2003) was used to predict gene models. In addition, RNA-seq data were aligned to the assembled L. longirostris genome to predict gene coding regions. The gene models were then predicted by combining the above homology-, de novo-, and transcriptome-based information using PASA v2.3.3 (Haas et al., 2003). Various databases, including SwissProt (Boeckmann et al., 2003), Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa & Goto, 2000), TrEMBL (Boeckmann et al., 2003), InterPro (Zdobnov & Apweiler, 2001), and Gene Ontology (GO) (Ashburner et al., 2000), were used to functionally annotate the predicted protein-coding genes, and GLEAN (Elsik et al., 2007) was used to create a consensus gene set. Finally, a total of 23 708 protein-coding genes were identified in the L. longirostris genome (Supplementary Table S5), of which 21 692, 20 072, 23 114, 21 169, and 16 638 protein-coding genes were annotated in the SwissProt, KEGG, TrEMBL, InterPro, and GO databases, respectively (Supplementary Table S6 and Figure S2). BUSCO was also used to test the completeness of the genome annotation with the actinopterygii_odb9 database, which showed that 92.4% complete and 4.0% fragmented conserved single-copy orthologs were predicted for L. longirostris.

Table 1 Summary of sequenced catfish genomes

For non-coding RNAs, microRNA (miRNA) and small nuclear RNA (snRNA) were predicted using INFERNAL v1.1 (Nawrocki & Eddy, 2013) and the Rfam database (Kalvari et al., 2018). Transfer RNA (tRNA) and ribosomal RNA (rRNA) were identified using tRNAscan-SE v1.3.1 (Lowe & Eddy, 1997) and RNAmmer v1.2 (Lagesen et al., 2007), respectively. After analysis, 422 miRNAs, 2 118 tRNAs, 1 838 rRNAs, and 1 925 snRNAs were annotated in the L. longirostris genome (Supplementary Table S7).

To identify gene families, protein sequences from the longest transcripts of each gene from L. longirostris and 10 other fish species, including D. rerio, Astyanax mexicanus, G. aculeatus, G. maculatum, I. punctatus, Lepisosteus oculatus, Oreochromis niloticus, Oryzias latipes, Pelteobagrus fulvidraco, and T. rubripes, were aligned using BLASTP with an e-value threshold of 1e-5. OrthoMCL v1.4 (Li et al., 2003) was then used to construct gene families. A total of 19 438 gene families and 3 585 single-copy ortholog families were identified among the 11 species, with 68 gene families specific to L. longirostris (Supplementary Table S8). In addition, 11 729 (89.1%) gene families were shared by the four catfish species, with 301 gene families specific to L. longirostris (Supplementary Figure S3).
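Once gene families have been built, selecting single-copy ortholog families and species-specific families is a matter of counting members per species. The Python sketch below illustrates only that bookkeeping step; it does not reproduce OrthoMCL's clustering, and the input layout (a dictionary mapping a family ID to a list of `(species, gene)` pairs) and the function names are assumptions made for illustration.

```python
from collections import Counter

def single_copy_families(families, species):
    """Return IDs of families that contain exactly one gene from every listed species."""
    wanted = set(species)
    result = []
    for fam_id, members in families.items():
        per_species = Counter(sp for sp, _gene in members)
        if set(per_species) == wanted and all(n == 1 for n in per_species.values()):
            result.append(fam_id)
    return result

def species_specific_families(families, target):
    """Return IDs of non-empty families whose members all come from the target species."""
    return [
        fam_id
        for fam_id, members in families.items()
        if members and all(sp == target for sp, _gene in members)
    ]
```

Applied to the 11 species in the text, the first function would correspond to the set of single-copy ortholog families used for phylogenetic reconstruction, and the second to the families reported as specific to L. longirostris.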

To investigate the phylogenetic relationships of L. longirostris with the above 10 fish species, the shared single-copy genes were aligned by MUSCLE v3.8.31 (Edgar, 2004). RAxML v8.2.1163 (Stamatakis, 2014) was then employed to construct a phylogenetic tree with the -m PROTGAMMAAUTO model and 100 bootstrap replicates. MCMCTREE v3.8.31 (Yang, 2007) was used to estimate divergence time based on the “correlated molecular clock” and “HKY85” models. Phylogenetic analysis indicated that L. longirostris and P. fulvidraco, which are both from the family Bagridae, were clustered onto one branch, and L. longirostris was close to the P. fulvidraco, G. maculatum, and I. punctatus clades, which belong to the order Siluriformes. These results are similar to previous phylogenetic analyses based on the mitochondrial genome of L. longirostris (Liu et al., 2019). Our results also showed that L. longirostris diverged ~26.2 million years ago from its closest related species P. fulvidraco (Figure 1C). Furthermore, phylogenetic analysis estimated that I. punctatus diverged from P. fulvidraco around 82.2 million years ago, consistent with the 81.9 million years reported in a previous study (Gong et al., 2018). Collinearity analysis of chromosomes between L. longirostris and I. punctatus was performed using LASTZ v1.02.00 (Harris, 2007) with parameters “T=2 C=2 H=2000 Y=3400 L=6000 K=2200”. As a result, all 26 pseudochromosomes of L. longirostris displayed high homology with the corresponding chromosomes of I. punctatus (Figure 1D), suggesting a high-quality L. longirostris genome assembly.

In the present study, the first chromosome sequences for L. longirostris were constructed using a combination of BGISEQ-500, Nanopore, and Hi-C technologies. The reference genome exhibited high quality in terms of continuity and completeness. This study should improve our understanding of the L. longirostris genome and provide valuable chromosomal information for genomic comparisons and evolutionary research among important aquaculture species.

DATA AVAILABILITY

The raw genome and RNA sequencing data were deposited in the National Center for Biotechnology Information (NCBI) database under accession No. PRJNA692071.

SUPPLEMENTARY DATA

Supplementary data to this article can be found online.

COMPETING INTERESTS

The authors declare that they have no competing interests.

AUTHORS’ CONTRIBUTIONS

W.P.H., H.L., J.Z., and H.Y. designed the experiments; W.P.H., H.L., J.Z., Z.L., T.S.J., C.H.L., Y.J.Y., M.B.X., and C.W.Z. performed the experiments and analyzed data; W.P.H., G.J.L., H.Y.X., and H.Y. wrote the paper. All authors read and approved the final version of the manuscript.
