Xiyu Li,Kixin Zhng,Xu Sun,Shnshn Hung,Jijing Wng,Chng Yng,Mhfishn Siyl,Cho Wng,Chunln Guo,Xiping Hu,Wen-Xi Li,*,Hilong Ning,*
aKey Laboratory of Soybean Biology,Ministry of Education/Key Laboratory of Soybean Biology and Breeding/Genetics,Ministry of Agriculture,Northeast Agricultural University,Harbin 150030,Heilongjiang,China
bYancheng Institute of Technology,Yancheng 224051,Jiangsu,China
cKey Laboratory of Crop Biotechnology Breeding of the Ministry of Agriculture,Beidahuang Kenfeng Seed Co.,Ltd.,Harbin 150030,Heilongjiang,China
ABSTRACT
Soybean[Glycine max(L.)Merr.]is an economically important crop worldwide and one of the main sources of human-edible oil,which accounts for about 20% of the seed’s dry weight.Increasing oil content is a primary objective of soybean breeding programs.Oil content is a quantitative trait controlled by multiple genes and greatly affected by the environment.In the last 30 years,based on oil content at maturity in multiple environmental and genetic backgrounds,259 quantitative trait loci(QTL)and 78 quantitative trait nucleotides(QTNs)for oil in soybean have been identified by QTL mapping and genomewide association study(GWAS)(https://www.soybase.org/).
In previous studies[1–3],QTL were identified mainly in separate populations derived from biparental crosses.However,the richness of allelic and phenotypic variation in biparental populations is limited.To overcome this limitation,multiparent advanced generation intercross(MAGIC)populations have been developed in plant and animal species[4,5].In comparison with populations derived from biparental crosses,the large number of parents used for MAGIC populations increases the diversity of alleles and phenotypes,increase mapping accuracy and precision,and improve QTL detection efficiency with the large number of recombination events accumulated between parents of MAGIC populations increasing QTL mapping resolution.As the simplest form of MAGIC population,a four-way recombinant inbred line(FW-RIL)population is the easiest to develop and offers the same advantages as MAGIC populations.This population type has been used for QTL mapping and GWAS in soybean[6–13].
In recent years,linkage and linkage disequilibrium(LD)mapping based on single-nucleotide polymorphism(SNP)have been applied in various plant species[14].Individual linkage analysis employs only a genetic map constructed with a set of progeny in combination with the phenotypes.This approach relies on the genetic diversity of parents,and its QTL detection power varies among populations.Putative QTL are often in large confidence intervals,making it hard to identify candidate genes.GWAS can overcome the limits of QTL analysis by narrowing the candidate region,but at the cost of an increased false-positive rate.Accordingly,a combination of the two analyses is often applied to improve the accuracy of QTL mapping.
The objectives of the present study were to identify QTL and QTN for oil content by QTL analysis in an FW-RIL population,to identify QTN by GWAS in a germplasm panel,and to identify candidate genes involved in oil synthesis.
Two genetic populations were used. An FW-RIL population was constructed using four parents with differing oil content(OC): Kenfeng 14 (OC 20.34%), Kenfeng 15 (OC 22.76%), Kenfeng 19(OC 19.05%), and Heinong 48 (OC 19.26%). The detailed construction process has been described by Ning et al. [6]. A germplasm panel was constructed from 455 soybean accessions including 4 landraces and 387 domestic and 44 foreign cultivars (Table S1).
The four parents and 144 FW-RILs were planted in 10 environments (E1–E10) distinguished by year, location, planting density, and sowing date (Table S2). The 455 accessions of the germplasm panel were planted in 2018 and 2019 in Harbin. The field experiments for the FW-RILs and germplasm panel were laid out in a randomized complete block design with three replications. A plot of three rows (3 m long × 0.65 m wide) was planted for each line.The germplasm population was planted in a randomized complete block design with three replications.A plot containing three rows(3 m long,0.65 m wide)was planted for each line,with 7 cm between adjacent seeds.The seeds of 10 mature plants from the middle row of each plot were harvested and the oil content of the dry(with about 10%water)seeds was determined with an Infratec 1241 NIR Grain Analyser(FOSS,Denmark).
The third tender trifoliate leaves were collected from each FWRIL and accession of the germplasm panel.and stored in liquid nitrogen.DNA was extracted from the samples using the CTAB method.The DNA concentration was determined with a UV752N spectrophotometer(Shanghai Jingke Science Instrument Co.Ltd.).SNP genotyping of FW-RIL was performed at Beijing Boao Biotechnology Co.Ltd.,using the SoySNP660K BeadChip,and SNP genotyping of the germplasm population was performed at Beidahuang Kenfeng Seed Co.,Ltd.,using the SoySNP180K BeadChip.A total of 109,676 and 63,306 high quality SNPs remained after removal of SNPs with AA or BB frequency=0,secondary gene frequency<0.05,or sample loss rate>0.1,and SNPs that did not show clearly defined clusters.
SNP markers showing polymorphism among the parents was selected with a physical distance of about 100 kb for the construction of a FW-RIL linkage map.A total of 2232 highquality SNPs on 20 chromosomes were used to construct a map using GAPL 1.0 software(http://www.isbreeding.net/software/)[15].Markers were grouped by anchor and ordered via a combined algorithm of nearest neighbor and the two-opt algorithm of the Traveling Salesman Problem[16].
Based on the mean phenotypic values of the three replicates per line in each environment,a descriptive analysis and correlation analysis were performed.An analysis of variance(ANOVA)and estimation of broad-sense heritability were performed using the phenotypic values of the three replicates per line in single and multiple environments.The statistical model of ANOVA for multiple environments was as follows:





The oil content of each line in every environment and the linkage map were used for QTL analysis via inclusive complete interval mapping(ICIM)[17,18].QTL were identified using a log of odds(LOD)threshold at 2.5.QTL across different environments were declared to be the same when their 1-LOD confidence intervals overlapped.
Zhang et al.[8]have described the population structure and linkage disequilibrium(LD)of the FW-RIL population,and the analytical methods of population structure and LD for the germplasm panel were the same as for the FW-RIL population.Analyses of LD and population structure were also performed with the software TASSEL 5.0[19]and STRUCTURE 2.3.4[20],respectively;and the physical distance of LD decay was estimated as the position where r2dropped to half of its maximum value.Based on the structure and LD results,association analyses of these two populations were performed.Five multi-locus GWAS methods in the R package mrMLM.GUI 3.0(http://cran.ms.unimelb.edu.au/web/packages/mrMLM.GUI/index.html) were used to identify QTN,including a multi-locus random-SNPeffect mixed linear model(mrMLM)[21],fast multi-locus random-effect mixed linear model(FASTmrMLM)[22],fast multi-locus random-SNP-effect efficient mixed model association(FASTmrEMMA)[23],polygenic-background-control-based least angle regression plus empirical Bayes(pLARmEB)[24],and iterative sure independence screening EM-Bayesian LASSO algorithm(ISIS EM-BLASSO)[22].The critical P-value parameters for these methods at the first stage were set to 0.01 except for FASTmrEMMA,where the critical P-value was set to 0.005,and the critical LOD score was set to 3 for significant QTN at the last stage.
QTL and QTN were named as follows:q+Fl/Fn/G+OCchromosome name-sequence number,where q represents QTL;Fl and Fn represent respectively QTL and QTN detected in FW-RIL;G represents QTN detected in the germplasm panel;and OC represents oil content;number represents sequence on its chromosome.QTL that were mapped to the same marker region were given the same sequence number.
Linkage maps were drawn with MapChart 2.1(https://www.wur.nl/en).
Target QTN were identified from significant QTNs in the panel by colocation with FW-RILs,and then genes in the LD decay distances of target QTN were identified in Phytozome(https://phytozome.jgi.doe.gov/).Among them,those expressed in seeds were identified in the Bio-Analytic Resource for Plant Biology(BAR)(http://www.bar.utoronto.ca).Finally,candidate genes involved in oil synthesis were identified by pathway analysis in the Kyoto Encyclopedia of Genes and Genomes(KEGG)(https://www.kegg.jp/).
Oil content followed a normal distribution with low skewness in the FW-RIL and germplasm populations, but the different direction of skewness in the FW-RILs showed the influence of environment on the overall variation of oil content (Table 1). Kurtosis was higher in the germplasm panel than in the FW-RILs, indicating its higher genetic diversity.
ANOVA (Table S3) showed that the variation among genotypes and genotype × environment effects were significant at the 0.01 level, indicating that the population was suitable for QTL mapping studies and could potentially reveal a variety of QTL in different environments.
The 2332 SNPs were distributed on 20 linkage groups covering 3540 cM. The mean marker interval length on chromosomes ranged from 1.92 to 10.93 cM with a grand mean of 4.09 cM.
Among the 10 environments, 59 QTL associated with oil content were identified on 17 chromosomes (not on chromosomes 2, 16, or 20) explaining 3.36%–14.91% of phenotypic variance (Figs. 1 and 2, Table S4). In ten environments, 6, 4, 4, 7,6, 5, 10, 5, 10, and 8 QTL were detected, explaining respectively 5.61%–9.36%, 6.37%–11.27%, 5.27%–11.32%, 3.36%–9.89%,6.59%–14.79%, 5.79%–7.72%, 5.80%–14.29%, 5.19%–8.27%,3.56%–14.91%, 3.44%–14.16% of phenotypic variance (Fig. 3 left).
Five QTL (qFlOC-1-4, qFlOC-9-1, qFlOC-13-2, qFlOC-15-1,and qFlOC-18-3) were identified in more than two environments. The parent Kenfeng 14 carried positive additive effect alleles for 25 QTL, Kenfeng 15 for 39, Heinong 48 for 31, and Kenfeng 19 for 34. There were 25, 40, 32, and 35 allelic genotypes with positive additive effects increasing oil content by 0.02%–0.95%, 0.004%–2.84%, 0.01%–0.71% and 0.002%–0.32% in Kenfeng 14, Kenfeng 15, Heinong 48, and Kenfeng 19, respectively (Table 2).
A total of 44 QTN were detected on 18 chromosomes (all but chromosomes 5 or 8) by association analysis (Figs. 1 and 2, Table S5),including 4,3,5, 2,8,7,3,2,0 and 10 significant QTN in the 10 environments, and the proportion of phenotypic variation explained by the QTN ranged from 6.47% to 7.29%,from 3.43% to 9.74%, from 3.54% to 6.33%, from 5.21% to 14.16%, from 4.32% to 12.91%, from 1.10% to 7.82%, from 5.62% to 8.06%, from 4.97% to 4.99%, and from 0, 3.06% to 13.65% (Fig. 3 right). Of 44 QTN, nine were located in six physical regions defined by QTL intervals: qFnOC-1-2 and qFnOC-1-3 in qFlOC-1-3, qFnOC-6-6 and qFnOC-6-7 in qFlOC-6-1, qFnOC-9-1 and qFnOC-9-2 in qFlOC-9-2,qFnOC-11-1 in qFlOC-11-2, qFnOC-13-1 in qFlOC-13-1,qFnOC-17-1 in qFlOC-17-2 (Fig. 2).

Table 1–Descriptive analysis of oil content in two populations in m ultiple environments.
The analysis of population structure of germplasm revealed the presence of two subpopulations(selected K=2)based on delta K(ΔK)values.These two subgroups contained 132(29.01%)and 323(70.99%)accessions,and the LD decay distance was estimated as 86 kb[8].
A total of 77 QTN associated with oil content were identified on 19 chromosomes(all but chromosome 4)in the germplasm population(Figs.1 and 2,Table S6),which the LOD values ranging from 3.03 to 12.09,the proportion of phenotypic variance explained by the QTN ranged from 0.09% to 11.56%.Among these QTN,40 and 37 significant QTN were identified in 2018 and 2019,respectively(Fig.4).
The 103 QTL and QTN identified by the FW-RIL QTL and association analyses were compared with the 77 QTN identified by association analysis in the germplasm panel(Fig.2).A total of 17 attenuation regions of QTN detected in the germplasm panel overlapped 9 QTL intervals identified in FW-RIL:qGOC-1-3 and qGOC-1-4 in the interval of qFnOC-1-3;qGOC-8-1,qGOC-8-2 and qGOC-8-3 in the interval of qFlOC-8-2;qFlOC-9-2 and qGOC-9-3 in the interval of qFlOC-9-2;qGOC-10-3,qGOC-10-4 and qGOC-10-5 in the interval of qFlOC-10-2;qGOC-13-2 and qGOC-13-3 in the interval of qFlOC-13-1;qGOC-14-4 in the interval of qFlOC-14-2;qGOC-17-1,qGOC-17-2 in the interval of qFlOC-17-1;qGOC-18-1 in the interval of qGOC-18-1;qGOC-19-2 in the interval of qFlOC-19-1.
The LD decay distance of the germplasm panel was used to select candidate genes within the attenuation distance of each QTN.A total of 24 candidate genes were found,of which 20 genes were expressed in seed during the formation of seed oil.In the pathway analysis,6 of 20 genes were assigned to 10 pathways(Tables 3 and 4).
According to the annotation and metabolic function information,two genes(Glyma.08G045500.1 and Glyma.08g047000.1)were selected as candidate genes involved in oil synthesis.ABA content influences the accumulation of assimilates in soybean seeds.Glyma.10G159900.1 is involved in regulation of the signal transduction of the endogenous hormone ABA in plants,thereby controlling oil increase.Glyma.10G159800.1,as a cysteine synthase,is involved in the biosynthesis of cysteine and methionine.Because substrate competition occurs during protein and oil synthesis in soybean,the expression of this gene can inhibit oil accumulation in soybean.Further investigation may reveal whether and how either gene influences oil content in soybeans.
To compare QTL detected in this study with previously identified QTL, we integrated the newly identified QTL into a publicly available soybean genome map and organized them into Tables S3, S4, and S5.

Table 2–Frequency of allelic effects in the four parents.
Among the 59 identified QTL in FW-RIL,56 QTL have been detected in previous studies.Among the remaining three QTL,two are near the identified QTL.Of the 44 QTN identified in the fourway population,35 are consistent with previously reported QTL.Six of the remaining nine are located near areas identified in previous studies.In the germplasm population,65 of 77 QTN have been previously reported.Four QTN that were consistent across the two populations(Fig.2)overlapped previously mapped regions[25–28],further supporting the reliability of the mapping results.

Fig.1–Frequency of QTL and QTN influencing oil content on 20 chromosomes in two populations.Top,QTL in the FW-RILs;middle,QTN in the FW-RILs;bottom,QTN in the germplasm panel.

Fig.2–Distribution of QTL and QTNs for oil content identified in FW-RIL and germplasm populations on genome map.Black bars represent QTL in FW-RIL.Lines on chromosome with italic and upright letters indicate QTN in the FW-RIL and germplasm populations.The ruler unit is Mb.Chr.1 represents the first chromosome,and likewise for all chromosomes.
Compared with a conventional biparental mapping population(F2,backcross,double haploid),an FW-RIL population uses the genetic diversity of four parents.Thus,the density and coverage of the linkage map will be increased in an FWRIL population[29].When a QTL is mapped using a biparental populations,similar allelic additive effects from the two parents will prevent its detection.In an FW-RIL population derived from four parents,as long as there are differences in the additive effect values between the two parents,a QTL can be detected.An example is qFlOC-9-1,for which the additive effects of alleles from the four parents were-0.27,0.20,-0.13,and 0.20.Given the similarity of additive effects from the second and fourth parents,this QTL would not have been identified in a biparental population derived from a Kenfeng 15×Kenfeng 19 cross.The large difference in additive effects from the other parents increased the power of QTL detection.

Fig.3–Percentage of phenotype variation explained by QTL(left)and QTN(right)detected in FW-RIL.

Fig.4–Frequency of QTN with various ratio of phenotype variation explanation detected in the germplasm panel.
Use of an FW-RIL population revealed genomic regions controlling target traits by linkage and association mapping.The difference in the statistical principles and genetic bases of these two approaches leads to difference in the results.In the present study,59 QTL and 44 QTN were detected by linkage(ICIM)and association(multiple-locus GWAS)analysis,and nine QTN were located in six QTL intervals.Thus,the combination of linkage and association analysis increased the detection power of QTL and QTN influencing oil content for FW-RIL.In summary,FW-RIL populations can be used to improve QTL detection.
Like conventional biparental populations,an FW-RIL population shows advantages and disadvantages in comparison with a germplasm panel.As advantages,the heritability and kurtosis in the panel were higher than those in the FWRIL population,indicating that the genetic base of the panel was different from those of the FW-RIL.Furthermore,among 59 QTL and 44 QTN detected in the FW-RIL,only nine QTL intervals co-localized in the germplasm panel and the remaining QTL were specific for FW-RIL,showing that a special molecular mechanism controlling oil synthesis was present in FW-RIL.As a disadvantage,of 59 QTL intervals,22(37%)covered a genomic tract of more than1 Mb,too large a region to search for a candidate gene.
CRediT authorship contribution statement
Wen-Xia Li and Hailong Ning conceived and designed the experiments. Xiyu Li, Kaixin Zhang, Xu Sun, Jiajing Wang,Chang Yang, and Mahfishan Siyal performed the field experiments and quality analysis. Shanshan Huang, Chao Wang,Chunlan Guo, and Xiping Hu performed the genome sequencing.Xiyu Li, Kaixin Zhang, and Hailong Ning analyzed and interpreted the results. Xiyu Li and Hailong Ning drafted and revised the manuscript.
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2020.07.004.

Table 3–Details of four genes annotated in the KEGG database.

Table 4–Six genes with pathway information.
Declaration of competing interest
The authors declare that there are no conflicts of interest.
Acknowledgments
This research was financially supported by Hundredthousand and Million Project of Heilongjiang Province for Engineering and Technology Science(2019ZX16B01).We thank Professor Jiankang Wang(Institute of Crop Sciences,Chinese Academy of Agricultural Sciences)for valuable suggestions.