Application of Random Forest Regressions on Stellar Parameters of A-type Stars and Feature Extraction*

2022-05-23 08:45:34 Shu-Xin Chen, Wei-Min Sun, and Ying He
Research in Astronomy and Astrophysics, 2022, Issue 2

Shu-Xin Chen, Wei-Min Sun, and Ying He

1 Qiqihar University, Qiqihar 161006, China

2 Key Lab of In-fiber Integrated Optics, Ministry Education of China, Harbin Engineering University, Harbin 150009, China; sunweimin@hrbeu.edu.cn

3 Department of Computer Science and Technology, Tianjin Ren’ai College, Tianjin 301636, China

Received 2020 September 16; revised 2021 November 28; accepted 2021 November 29; published 2022 February 2

Abstract Measuring the stellar parameters of A-type stars is more difficult than for FGK stars because of the sparse features in their spectra and the degeneracy between effective temperature (Teff) and gravity (log g). Modeling the relationship between fundamental stellar parameters and features through machine learning is possible because we can exploit big data rather than a sparse set of known features. Once the model is successfully trained, it can be an efficient approach for predicting Teff and log g for A-type stars, especially when there is large uncertainty in the continuum caused by flux calibration or extinction. In this paper, A-type stars are selected from LAMOST DR7 with a signal-to-noise ratio greater than 50 and Teff within 7000 to 10,000 K. We apply the Random Forest (RF) algorithm, one of the most widely used machine learning algorithms, to establish the regression relationship between the flux at all wavelengths and the corresponding stellar parameters Teff and log g, respectively. The trained RF model not only regresses the stellar parameters but also ranks the wavelengths by their sensitivity to the parameters. According to the rankings, we define line indices by merging adjacent wavelengths. The objectively defined line indices in this work are amendments to the Lick indices and include some weak lines. We use the Support Vector Regression algorithm with our newly defined line indices to measure the temperature and gravity, and use some common stars from Simbad to evaluate our result. In addition, the Gaia Hertzsprung-Russell diagram is used to check the accuracy of Teff and log g.

Key words: methods: data analysis – surveys – stars: early-type – stars: abundances

1. Introduction

The A-type stars encompass a bewildering array of stellar types, and many horizontal-branch stars shown in the A-type star region on the Hertzsprung-Russell (HR) diagram suggest their evolutionary states. The fundamental stellar atmospheric parameters (Teff and log g) are the basis for the astrophysical study of A-type stars, and these parameters are often estimated from the strong Balmer lines. For low-resolution spectra, the line index is an effective way to extract spectral features and has been widely used in astronomical research. Cenarro (2001) used line indices to calculate the Ca II flux and measured stellar atmospheric parameters to determine the effective temperature. Covey et al. (2007) wrote IDL programs that use the Hammer line indices to classify stellar spectra automatically. Yi et al. (2014) added features extracted from the spectra with the Random Forest (RF) algorithm to Covey's program as new feature indices, applied them to the spectral classification of M dwarfs, and showed that the improved feature indices perform better in that classification. Inspired by the work of Yi et al. (2014), we apply RF to A-type stars to define new spectral line indices representing features of low-resolution spectra. These line indices are defined specifically for A-type stars and are sensitive to their stellar parameters.

Among all definitions of line index systems, the Lick system is one of the most widely used and is applied in many fields of spectral analysis. The line indices for A-type stars released by LAMOST were calculated following the definition of the Lick system, which includes most of the prominent absorption lines. Hou et al. (2014) described the details of the lines of A-type stars in low-resolution spectra. The advantage of using Lick indices is that errors in flux calibration and radial velocity measurement can be ignored, and noise has little effect on the line indices. Tan et al. (2013) used line indices as the training features for sky survey data when measuring stellar atmospheric physical parameters, and obtained their best regression model by training a linear regression. Wang et al. (2014) used the Lick line indices and applied partial least-squares regression to measure the atmospheric physical parameters. The results of the partial least-squares regression model are consistent with the parameters released by the Sloan Stellar Parameter Pipeline (SSPP), and the method also reduces the computational complexity and speeds up the training process. Pan et al. (2015) pointed out that spectral lines differ in their sensitivity to the effective temperature of main-sequence stars. They used line indices as the input of Support Vector Machines (SVM) to classify stars.

However, the Lick system contains only strong lines, which are not enough for the correct parameterization of A-type stars. We are therefore motivated to estimate Teff and log g accurately for A-type stars and to find relatively weak features that are sensitive to the stellar parameters. To obtain these possible additional features, we use the decision-tree-based RF algorithm to extract features beyond the Balmer lines, Ca II H and K, etc. RF is a regression method that has been used in several astronomical studies. For example, Bai et al. (2019) applied RF to the regression of stellar effective temperature for the second Gaia data release, with a precision of about 191 K, based on a combination of stars from four spectroscopic surveys.

In this work, we use the full-wavelength A-type spectra released in LAMOST DR7 as input to the RF algorithm to establish the regression model for the stellar parameters. We then rank the wavelengths according to their sensitivity to the parameters and obtain the most sensitive lines. We define line indices for these lines and compare them to the Lick indices. Using the newly defined indices, we employ Support Vector Regression (SVR) to estimate the stellar parameters of A-type stars. The temperature and gravity from our method agree with those from LAMOST. Cross-matching with Simbad, we obtain around 200 common stars with published parameters, and we compare the parameters for these common stars. In addition, we calculate the absolute magnitudes of the stars from Gaia parallaxes and use the HR diagram to check our result.

The article is organized as follows. In Section 2, we introduce the LAMOST data we used. In Section 3, we present the application of RF regression to deriving Teff, log g and [Fe/H] of A-type stars from full spectra, and the definition of specific line indices for parameter determination of A-type stars. Section 4 introduces the application of SVR to estimate stellar parameters using our defined indices and presents HR and Keil diagrams to check the parameters we compute. Section 5 summarizes the work in this paper.

2. Data

2.1. LAMOST Released Spectra of A-type Stars

The published LAMOST DR7 catalog includes 599,762 A-type star spectra, which were obtained during the pilot survey and seven years of regular survey. The A-type star catalog is available in two formats, FITS and CSV. The full spectra, ranging from 3700 to 8800 Å, are used as input to the RF algorithm in the first run. The class of these stars contains both the spectral type and the luminosity class provided by the LAMOST analysis pipeline. We also compare our defined index system with the line indices published in the LAMOST LRS Line-Index Catalog of A-Type Stars. The comparison includes kp12, Halpha12, and Hgamma12, which are the indices for Ca II K, Hα, and Hγ. Teff and log g are taken from the LAMOST LRS Stellar Parameter Catalog of A, F, G, and K Stars, which includes parameters for 114,208 A-type spectra. Cross-matching with Gaia EDR3, we obtained 108,581 stars with good parallaxes. We also remove spectra classified as A-type but with a temperature lower than 7000 K. An example, “spec-55859-f5907_sp15-081.fits”, is shown in Figure 1; its effective temperature is 6833 K and its class is A9V. Thus, we selected A-type stellar data with temperatures from 7000 to 12,500 K and S/N greater than 50.

Figure 1. A pipeline-classified A9V-type star, “spec-55859-f5907_sp15-081.fits”, whose Teff is 6833.16 K.

Figure 2. OOB (Out-Of-Bag) error and decision tree number in the random forest.

Figure 3. Distribution of the three physical parameters Teff (effective temperature), log g (surface gravity), and [Fe/H] (chemical abundance) from A-type stellar spectra published by LAMOST.

Figure 4. Checking Teff and log g on both the HR diagram (left panel) and the Keil diagram (right panel). Red dots in both panels represent A-type stars with parameters estimated through line indices.

2.2. Removing Contamination of Negative Index Values

To obtain a robust relationship between stellar atmospheric parameters and spectral features for A-type stars, we need a clean sample that is not affected by emission lines from stellar disks or from the exchange of material between binaries. We checked the line indices of A-type stars released by LAMOST and removed those spectra having negative index values.
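The cuts described in Sections 2.1 and 2.2 can be sketched as a single boolean mask. This is a minimal illustration, not the authors' code: the function name `select_clean_sample` and the array layout (one row of line indices per spectrum) are assumptions, and the thresholds are the ones quoted in the text.

```python
import numpy as np

def select_clean_sample(teff, snr, line_indices,
                        teff_range=(7000.0, 12500.0), min_snr=50.0):
    """Return a boolean mask of spectra that pass all three cuts.

    teff, snr     : 1D arrays, one value per spectrum (hypothetical layout)
    line_indices  : 2D array, one row of catalog line indices per spectrum
    """
    teff = np.asarray(teff, dtype=float)
    snr = np.asarray(snr, dtype=float)
    line_indices = np.atleast_2d(np.asarray(line_indices, dtype=float))

    # Temperature window and signal-to-noise cut quoted in Section 2.1.
    in_range = (teff >= teff_range[0]) & (teff <= teff_range[1])
    high_snr = snr > min_snr
    # A negative index signals emission; NaN values also fail this test.
    non_negative = np.all(line_indices >= 0.0, axis=1)
    return in_range & high_snr & non_negative
```

The mask can then be applied to the flux matrix and to the parameter arrays together, so that spectra and labels stay aligned.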

3. Random Forest Prediction Analysis

The random forest (RF) algorithm is an ensemble learning method in machine learning that combines many supervised prediction models. It handles high-dimensional data sets well and can accommodate thousands of input variables. The model can output the importance of each variable, which can be used to select the variables of the data set. Each decision tree depends on a corresponding random vector. All the vectors are independent and identically distributed, and identifying the most important variables reduces the dimensionality. Finally, the results of the individual trees are aggregated, which improves the accuracy of the prediction model. RFs can maintain accuracy even when a large fraction of the data is missing.

3.1. Random Sampling in the Whole Dataset

From the total A-type data set of around 80 thousand spectra described in Section 2.1, we randomly sample spectra to train the model. Section 3.3 introduces how the regression is realized with an RF. When no separate validation set is held out, the out-of-bag prediction error can be calculated instead: each tree predicts the sample points that were not used when that tree was grown, and the out-of-bag prediction error is obtained by comparing these predictions with the true values.

3.2. Normalization

Before establishing the RF model, we remove the pseudo-continuum of each spectrum so that only the spectral lines remain. We fit each spectrum with a ninth-order polynomial, remove the points lying more than 3σ from the fitted curve, and repeat the fit iteratively four times. The intensity of each spectrum is then rectified by dividing the observed spectrum by the pseudo-continuum.
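The normalization step can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `normalize_spectrum` is hypothetical, the clipping is taken to be two-sided, and a Chebyshev basis on rescaled wavelengths is used only to keep the ninth-order fit numerically stable (the paper does not say which polynomial basis was used).

```python
import numpy as np
from numpy.polynomial import chebyshev

def normalize_spectrum(wave, flux, order=9, n_iter=4, clip_sigma=3.0):
    """Divide a spectrum by an iteratively sigma-clipped polynomial fit.

    Returns (normalized_flux, pseudo_continuum).
    """
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    # Map wavelengths to [-1, 1] so a ninth-order fit stays well conditioned.
    x = 2.0 * (wave - wave.min()) / (wave.max() - wave.min()) - 1.0

    keep = np.ones(flux.shape, dtype=bool)
    for _ in range(n_iter):
        coeffs = chebyshev.chebfit(x[keep], flux[keep], order)
        continuum = chebyshev.chebval(x, coeffs)
        resid = flux - continuum
        sigma = np.std(resid[keep])
        # Reject points farther than clip_sigma from the current fit.
        keep = np.abs(resid) <= clip_sigma * sigma
    return flux / continuum, continuum
```

After this step the continuum sits near 1 and absorption lines appear as dips below it, which is the form fed to the regression.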

3.3. Random Forest Algorithm

All random vectors in the RF are independent and identically distributed. Random forests randomize both the column variables and the row observations of the data set to generate multiple trees, and the results of these trees are then aggregated. Compared to neural networks, RFs reduce computation and improve prediction accuracy. Moreover, the algorithm is not sensitive to multicollinearity, and it is sufficiently robust to process missing data and unbalanced data.

The RF algorithm for prediction and regression draws N randomly selected sample units from the original data to generate each decision or regression tree, and at each node randomly selects m < M variables as the candidates for splitting that node. The number of candidate variables is the same at every node. The full-wavelength spectra are the input of the RF, and the results of all decision or regression trees are combined to generate the predicted values. During training, multiple decision trees are generated, and each tree produces its own prediction from the input data set. The number of decision trees is a key parameter of the RF algorithm: the larger the number of trees, the better the regression results, but the longer the computation time. In this work, we used 3800 decision trees, matching the number of input spectral data points. The remaining parameters were set to their default values.

The out-of-bag (OOB) error, which is an unbiased estimate of the generalization error and approximates the result of K-fold cross-validation without its additional computation, is shown against the number of decision trees in the RF in Figure 2. About 500 trees are sufficient to realize the regression. The difference for each split is less than 1. The mean of squared residuals is 4926.627 and the Var value is 96.57, which comply with the requirements of Section 3.
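An OOB error curve like the one in Figure 2 can be sketched as follows. The use of scikit-learn's `RandomForestRegressor` is an assumption made for illustration (the paper does not name the RF implementation), and `oob_error_curve` is a hypothetical helper. Each forest size is refit from scratch, which is simple but not the cheapest approach.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def oob_error_curve(X, y, tree_counts, seed=0):
    """Return the out-of-bag mean squared error for each forest size.

    X : (n_spectra, n_pixels) normalized fluxes; y : one stellar parameter.
    Use at least a few tens of trees so every sample is out-of-bag at
    least once; otherwise scikit-learn warns that OOB scores are unreliable.
    """
    y = np.asarray(y, dtype=float)
    errors = []
    for n_trees in tree_counts:
        forest = RandomForestRegressor(
            n_estimators=n_trees,
            bootstrap=True,      # OOB estimates need bootstrap resampling
            oob_score=True,
            random_state=seed,
        )
        forest.fit(X, y)
        # oob_prediction_ holds, per sample, the mean prediction of the
        # trees that did not see that sample during training.
        errors.append(float(np.mean((y - forest.oob_prediction_) ** 2)))
    return errors
```

Plotting the returned errors against `tree_counts` shows where the curve flattens, which is how a sufficient number of trees can be read off.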

We rank the wavelengths according to their importance for the parameters and then identify the spectral lines on which the first 30 feature points for Teff, log g and [Fe/H] are located by searching the line table of Moore et al. (1966). The details are listed in Table 1. We list only the main elements contained in the spectral lines at low resolution. The first column lists the feature ID. To make the table more concise, features that fall on the same absorption line are placed in the same entry. The second column shows the name of the line on which the feature points are located. The third column lists the vacuum wavelength corresponding to each spectral line. The fourth column shows the importance of the corresponding feature as determined by the RF algorithm.
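The ranking and the merging of feature points that fall on the same line can be sketched as below. Both helpers are hypothetical: `importances` stands for a per-pixel importance array from a trained forest (for example `feature_importances_` in scikit-learn), and treating pixels as belonging to one line when their indices differ by at most `max_gap` is an assumption, since the paper matches features against a line table instead.

```python
import numpy as np

def top_features(importances, wave, k=30):
    """Return (pixel, wavelength, importance) for the k most important pixels."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1][:k]   # descending importance
    return [(int(i), float(wave[i]), float(importances[i])) for i in order]

def group_adjacent(pixels, max_gap=1):
    """Merge pixel indices separated by at most max_gap into (start, end) runs."""
    groups = []
    for i in sorted(pixels):
        if groups and i - groups[-1][1] <= max_gap:
            groups[-1][1] = i          # extend the current run
        else:
            groups.append([i, i])      # start a new run
    return [tuple(g) for g in groups]
```

For example, `group_adjacent([5, 6, 7, 20, 22, 40])` gives `[(5, 7), (20, 20), (22, 22), (40, 40)]`, so three neighboring pixels collapse into one candidate feature.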

Table 1 Identification of Elements Sensitive to Parameters Based on the Location of the First 30 Feature Points

Table 2 List of Three Newly Defined Line Indices for Parameter Regression

Table 3 Effective Temperature Teff as Predicted by Random Forest Algorithm with Three New Indices as Input

As listed in Table 1, we group adjacent wavelengths into spectral features. To obtain the lines most sensitive to the three parameters, we consider the top one or two features for each parameter. We then define the three most important features: Ca II K at 3933 Å, a blended feature of Co I, Mn I, Cr I, and V I lines ranging from 4109 to 4112 Å, and Sr II at 4077 Å. The detailed definitions are listed in Table 2, including the feature name, the index bandpass, and the two sidebands.
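An index defined by a bandpass and two sidebands can be measured in the Lick style, as an equivalent width relative to a straight pseudo-continuum drawn between the sideband means. The sketch below is an illustration under that assumption; `line_index` is a hypothetical helper, and the actual bandpass limits must come from Table 2, which is not reproduced here.

```python
import numpy as np

def line_index(wave, flux, band, blue, red):
    """Equivalent-width style index, in the units of wave.

    band, blue, red : (lo, hi) wavelength limits of the index bandpass and
    of the blue and red sidebands. Positive values mean absorption.
    """
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)

    def band_mean(lo, hi):
        sel = (wave >= lo) & (wave <= hi)
        return wave[sel].mean(), flux[sel].mean()

    xb, fb = band_mean(*blue)
    xr, fr = band_mean(*red)
    sel = (wave >= band[0]) & (wave <= band[1])
    w, f = wave[sel], flux[sel]
    # Straight line through the two sideband means.
    continuum = fb + (fr - fb) * (w - xb) / (xr - xb)
    depth = 1.0 - f / continuum
    # Trapezoidal integration of the fractional depth over the bandpass.
    return float(np.sum(0.5 * (depth[1:] + depth[:-1]) * np.diff(w)))
```

With this sign convention an emission feature yields a negative index, which is the signature used in Section 2.2 to reject contaminated spectra.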

3.4. Random Forest with Newly Defined Line Indices

In the RF algorithm, each tree grows to its maximum extent, and there is no branch-pruning process. Using training data that perform better in regression analysis can result in improved learning model characteristics. In this step, we built an RF temperature model using Ca II K, the blended Co I, Mn I, Cr I, V I feature, and Sr II as input rather than the full spectra. The predicted effective temperatures are shown in Table 3.

4. Verification with SVR Algorithm

SVR is a regression algorithm that focuses on controlling the overall error and is less affected by outliers than algorithms such as linear regression. SVR builds a hyperplane in an N-dimensional vector space and aims to keep the data points within a margin around that hyperplane. We applied the SVR algorithm from the software package Sklearn with the newly defined line indices as input. Compared with the LAMOST stellar parameter catalog, the precision is 123 K for Teff, 0.32 dex for log g, and 0.28 dex for [Fe/H], respectively.
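A regression of this kind can be sketched with scikit-learn as below. The kernel, `C`, `epsilon`, and the standardization of both inputs and target are assumptions chosen for illustration, since the paper does not report its SVR settings. The helper names `fit_svr` and `residual_scatter` are hypothetical, and taking the standard deviation of the residuals as the "precision" is likewise an assumption.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def fit_svr(indices, target, C=10.0, epsilon=0.1):
    """Fit an RBF-kernel SVR mapping line indices to one stellar parameter.

    indices : (n_stars, n_indices) array; target : (n_stars,) array.
    """
    # Standardize the inputs, and also the target, because epsilon and C
    # are scale dependent and Teff is of order 10^4 K.
    regressor = make_pipeline(
        StandardScaler(), SVR(kernel="rbf", C=C, epsilon=epsilon)
    )
    model = TransformedTargetRegressor(
        regressor=regressor, transformer=StandardScaler()
    )
    model.fit(indices, target)
    return model

def residual_scatter(pred, truth):
    """Standard deviation of (predicted - catalog) values."""
    return float(np.std(np.asarray(pred) - np.asarray(truth)))
```

Calling `fit_svr` once per parameter (Teff, log g, [Fe/H]) and evaluating `residual_scatter` on stars held out from training gives a number comparable in spirit to the precisions quoted above.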

4.1. Verification with Gaia Data

We cross-match our sample with Gaia using Topcat to obtain the parallaxes of these A-type stars, and then calculate their absolute magnitudes. We plot them on both the HR and Keil diagrams to verify the regression results, as shown in Figure 4.
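The conversion from parallax to absolute magnitude can be sketched with the standard distance-modulus relation M = m + 5 log10(ϖ / mas) − 10. This sketch assumes a direct inversion of the parallax and no extinction correction; whether the authors applied either refinement is not stated. The function name is hypothetical.

```python
import numpy as np

def absolute_magnitude(app_mag, parallax_mas):
    """Absolute magnitude from apparent magnitude and parallax in mas.

    Non-positive parallaxes have no defined distance and return NaN.
    """
    m = np.asarray(app_mag, dtype=float)
    plx = np.asarray(parallax_mas, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        # d[pc] = 1000 / plx[mas], so 5 log10(d) - 5 = 10 - 5 log10(plx).
        abs_mag = m + 5.0 * np.log10(plx) - 10.0
    return np.where(plx > 0, abs_mag, np.nan)
```

A star at 100 pc (parallax 10 mas) with apparent magnitude 10 gets absolute magnitude 5, and a star at 10 pc keeps its apparent magnitude.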

4.2. HR Diagram of A-type Stars

A schematic representation of how rotation affects the position of a star in the HR diagram is shown in Figure 4. Rotation displaces a star in the HR diagram, and a rotating star generally appears to lie above the main sequence. Consider a star seen in the equatorial plane. If it were possible to increase this star’s rotational velocity, we would see it move to the right and down, that is, toward cooler temperatures and lower luminosities. On the other hand, a star seen pole-on would move generally upward in the HR diagram, toward higher luminosities. Neither of these paths is necessarily parallel to the main sequence, and so a rapidly rotating main-sequence star, whatever its orientation, tends to lie above the main sequence. Subtle differential effects have been detected in the spectra and photometry of rapidly rotating A-type and early F-type stars, even those seen pole-on.

5. Discussion

Because line indices are not seriously affected by noise, they are a good feature representation of stellar spectra, especially at low S/N. In this work, we redefine a line index system using the RF algorithm. We apply the system to LAMOST DR7 and obtain very good prediction performance. The indices are verified with SVR, and their correctness is checked with Gaia data. The result shows that RFs are a very useful tool for feature extraction from high-dimensional data. For unbalanced data sets, RFs provide an effective way to balance the errors across the data set. By using our newly defined line index system to predict the stellar parameters of A-type stars, we can avoid the effects of interstellar extinction and parameter degeneracy.

Acknowledgments

We are very grateful to the anonymous referee for many useful comments and suggestions. This work was funded by the Joint Research Fund in Astronomy (Grant No. U2031142) under cooperative agreement between the National Natural Science Foundation of China (NSFC) and Chinese Academy of Sciences (CAS); the National Science Foundation for Young Scientists of China (Grant No. 11803013) and Technology Innovation Center of Agricultural Multi-Dimensional Sensor Information Perception, Heilongjiang Province.

This research uses data obtained through the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), which is funded by the National Astronomical Observatories, Chinese Academy of Sciences.
