
Q-learning-based energy transmission scheduling over a fading channel

2021-01-12 11:24:26

Wang Zhiwei Wang Junbo Yang Fan Lin Min

(1 School of Cyber Science and Engineering, Southeast University, Nanjing 210096, China)
(2 School of Information Science and Engineering, Southeast University, Nanjing 210096, China)
(3 School of Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)

Abstract: To solve the problem of energy transmission in the Internet of Things (IoT), energy transmission scheduling over a Rayleigh fading channel in an energy harvesting system (EHS) with a dedicated energy source (ES) is considered. According to the channel state information (CSI) and the battery state, the charging duration of the battery is determined so as to jointly minimize the energy consumption of the ES and the battery's deficit charges and overcharges during energy transmission. The joint optimization problem is then formulated using the weighted sum method. Drawing on ideas from the Q-learning algorithm, a Q-learning-based energy scheduling algorithm is proposed to solve this problem. The algorithm is compared with a constant strategy and an on-demand dynamic strategy in terms of energy consumption and the battery's deficit charges and overcharges. The simulation results show that the proposed Q-learning-based energy scheduling algorithm can effectively improve system stability in terms of the battery's deficit charges and overcharges.

Key words:energy harvesting; channel state information; Q-learning; transmission scheduling

With the rapid development of the Internet of Things (IoT), energy harvesting has been regarded as a favorable means of powering the numerous sensors in the emerging IoT[1]. Owing to key advantages such as freedom from pollution, long lifetime, and energy self-sustainability, energy harvesting systems (EHSs) are competitive in a wide spectrum of applications[2].

The EHS generally consists of an antenna, either separate from or shared with the data-communication antenna; an energy harvesting device (EHD), which converts the RF signal from energy sources (ESs) into power; and a battery that stores the harvested energy[3]. According to the type of ES, RF-based energy harvesting systems can be classified into two categories: EHSs with ambient ESs and EHSs with a dedicated ES[3].

Recent research on the EHS has mainly focused on how to effectively utilize energy from ambient or dedicated ESs[4-6]. In Ref.[4], an energy neutrality theorem for the EHN was proposed, and it was proved that perpetual operation can be achieved by maintaining the energy neutrality of the EHN. An adaptive duty cycle (ADC) control method was then further proposed to assign the duty cycle online and thus achieve the perpetual operation of the EHN. In Ref.[5], a reinforcement learning-based energy management scheme was proposed to achieve the sustainable operation of the EHN. In Ref.[6], a fuzzy Q-learning-based power management scheme was proposed for the EHN under energy neutrality criteria; to achieve sustainable operation, the duty cycle is decided by a fuzzy inference system. In fact, all of this research adjusts power in EHSs with ambient ESs to maximize the utilization of the harvested energy. However, due to the lack of contact between the ESs and the EHDs, the energy transmission period in an EHS with ambient ESs is less controllable and less stable. In contrast, in an EHS with a dedicated ES, the process of energy transmission can be scheduled effectively, because the dedicated ES is installed specifically to power the EHDs. Hence, some research has begun to focus on the EHS with a dedicated ES. In Ref.[3], a two-step dual tunnel energy requesting (DTER) strategy was proposed to minimize the energy consumption at both the EHD and the ES for timely data transmission. However, these existing strategies did not consider the exhaustion or overflow of the battery's energy during transmission. Hence, this paper concentrates on online energy management strategies that improve system stability in terms of the battery's deficit charges and overcharges.

In this paper, a Q-learning-based energy transmission scheduling algorithm is proposed for the EHS with a dedicated ES. Based on the basic theory of the Q-learning algorithm[7], the scheduling algorithm decreases energy consumption by adjusting the transmitted energy. Using the energy scheduling scheme in this paper, the EHS can adjust the transmitted energy of the ES in a timely and effective manner. First, the system model of the EHS is presented in detail. Then, a multi-objective optimization problem is formulated to improve system performance in terms of the battery's deficit charges and overcharges. Next, a Q-learning-based scheduling algorithm is proposed for the optimization problem. Finally, the simulation results and conclusions are presented, respectively.

1 System Model

Consider an RF-based EHS, where the EHD requests and harvests energy from the ES, as shown in Fig.1. The harvested energy stored in the EHD's battery is consumed to send out data. Moreover, the system time is assumed to be equally divided into N time slots, and T_n (1 ≤ n ≤ N), the duration of time slot n, is constant and selected to be less than the channel coherence time. Therefore, the channel state remains invariant over each time slot but varies across successive time slots. Assume that the fading of the wireless channel follows a correlated Rayleigh fading channel model[8]. Using the ellipsoidal approximation, the CSI can be deterministically modeled as[9]

$g_n = h_n \cdot 10^{-v_n/10}, \quad |v_n| \le \theta$

(1)

where v_n is the uncertain parameter and θ denotes the uncertainty bound, a non-negative constant; g_n and h_n denote the actual and estimated channel gains at time slot n, respectively.

Fig.1 The energy-harvesting system
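To make the channel model concrete, the following Python sketch generates a correlated Rayleigh fading sequence via a first-order autoregressive recursion on the complex amplitude (one common way to realize a correlated fading model such as that of Ref.[8]) and applies the multiplicative uncertainty of Eq.(1). The correlation coefficient rho, the uncertainty bound theta, and the function name are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def correlated_rayleigh_gains(n_slots, rho=0.9, theta=1.0):
    """Sketch: estimated power gains h_n from a correlated Rayleigh
    model (AR(1) on the complex amplitude), plus the ellipsoidal
    uncertainty of Eq.(1): g_n = h_n * 10^(-v_n/10), |v_n| <= theta."""
    a = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    h, g = np.empty(n_slots), np.empty(n_slots)
    for n in range(n_slots):
        w = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
        a = rho * a + np.sqrt(1 - rho**2) * w   # correlation across slots
        h[n] = np.abs(a) ** 2                   # estimated power gain
        v = rng.uniform(-theta, theta)          # bounded uncertain parameter
        g[n] = h[n] * 10 ** (-v / 10)           # actual gain, Eq.(1)
    return h, g

h, g = correlated_rayleigh_gains(1000)
```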

$V_n = V_m\left(1 - e^{-t'/(RC)}\right)$

(2)

(3)

(4)

(5)

where t′ is the time consumed in charging the battery's voltage from 0 to V_n, and V_m is the maximum voltage that the battery can reach. R and C are the resistance and capacitance of the charging circuit in the EHD, respectively. Eq.(2) states that the battery needs time t′ for its voltage to rise from 0 to V_n, while Eq.(3) states that the voltage changes from V_n to V_n + ΔV_n after energy harvesting at time slot n. Eq.(4) and Eq.(5) reflect the relationship between the battery's voltage and its stored energy. Using Eq.(2) to Eq.(5), the charging duration can be derived as

(6)

(7)

where p_th denotes the charging power of the battery.
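As a worked example of the RC charging model, the sketch below evaluates Eq.(2) and inverts it to obtain the time t′ needed to reach a given voltage; the component values V_m, R, and C are placeholders rather than the paper's settings.

```python
import math

V_m = 5.0           # maximum battery voltage (placeholder value)
R, C = 100.0, 0.5   # charging-circuit resistance (ohm) and capacitance (F), placeholders

def voltage_after(t):
    """Battery voltage after charging for time t from empty, Eq.(2)."""
    return V_m * (1.0 - math.exp(-t / (R * C)))

def time_to_reach(V_n):
    """Invert Eq.(2): time t' to charge the battery from 0 V to V_n < V_m."""
    return -R * C * math.log(1.0 - V_n / V_m)

t_prime = time_to_reach(3.0)
assert abs(voltage_after(t_prime) - 3.0) < 1e-9
```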

(8)

(9)

where η represents the conversion efficiency of the battery.
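The bodies of Eqs.(8) and (9) are not reproduced above. A hedged sketch of one common battery bookkeeping consistent with the surrounding text, in which the harvested energy is scaled by the conversion efficiency η and the stored energy is clipped to the battery capacity, is:

```python
def battery_update(b_n, e_recv, e_used, eta=0.8, b_max=10.0):
    """One plausible battery-energy update (a sketch, not the paper's
    exact Eq.(8)): received energy is scaled by the conversion
    efficiency eta and the result is clipped to [0, b_max]."""
    b_next = b_n + eta * e_recv - e_used
    return min(max(b_next, 0.0), b_max)
```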

2 Problem Formulation

(10)

where υ represents the minimum capacity percentage of the battery that keeps the EHD operating normally. Meanwhile, owing to the limited storage size, the battery's energy will overflow when the received energy is too large. Therefore, how to avoid overcharging the battery should be taken into account as well. The condition for the battery's overcharge at time slot n can be described as

(11)

In most cases, it is unlikely that the three objectives can be optimized simultaneously by the same solution. Therefore, a tradeoff among the above three objectives is needed to ensure satisfactory system performance. The best-known tradeoff method is the weighted sum method[11]. Accordingly, the multi-objective optimization problem can be converted into the following minimization problem:

(12)

where E(·) is the expectation operator; I(·) is an indicator function used to mark the occurrence of overcharges or deficit charges; and τ and μ are two small positive constants that adjust the weights of the battery's deficit charges and overcharges during optimization.
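The exact expression of Eq.(12) is not reproduced above, but combining the weighted sum method with the operators defined here, the objective plausibly takes a form such as the following reconstruction (not the paper's verbatim equation), where $e_n$ is the energy transmitted by the ES in slot $n$ and $\mathcal{D}_n$ and $\mathcal{O}_n$ denote the deficit-charge and overcharge events:

```latex
\min_{\{e_n\}} \; \mathbb{E}\!\left[ \sum_{n=1}^{N}
    \Big( e_n + \tau\, I(\mathcal{D}_n) + \mu\, I(\mathcal{O}_n) \Big) \right]
```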

3 Online Scheduling Algorithm

3.1 State

The channel state and residual battery energy are continuous variables, which must be converted into discrete, finite sets. Therefore, we divide the range of each continuous variable into several intervals; values that fall into the same interval are regarded as the same. To distinguish these intervals, we label them with consecutive natural numbers, and these labels serve as the states.

In the proposed scheduling scheme, the channel states are assumed to be discrete and finite. Without loss of generality, the range of the estimated channel gain can be divided into D states, defined as

(13)

where 0 < ω_1 < ω_2 < … < ω_{D-1}. Therefore, at time slot n, the channel state can be determined as

(14)

Similarly, the residual battery energy, which is also assumed to be discrete and finite, can be divided into E states as follows:

(15)

(16)

Using the residual energy and channel states, the current composite state of the system is defined in a vector as

$S_n = \{H_n, E_n\} \in \{1,2,3,\dots,D\} \times \{1,2,3,\dots,E\}$

(17)

Eq.(17) indicates that every state maps to a unique combination of H_n and E_n.
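A minimal Python sketch of this state construction quantizes the estimated channel gain into D bins and the residual energy into E bins and combines them into one composite index as in Eq.(17). The threshold values are illustrative assumptions, since the paper's ω_d and energy thresholds are not given here, and the indices are 0-based.

```python
import numpy as np

# Illustrative quantization thresholds (assumptions, not the paper's values)
channel_edges = np.array([0.2, 0.5, 1.0, 2.0])   # D = 5 channel states
energy_edges  = np.array([2.0, 4.0, 6.0, 8.0])   # E = 5 energy states
D, E = len(channel_edges) + 1, len(energy_edges) + 1

def composite_state(h_n, b_n):
    """Map (estimated gain, residual energy) to one of D*E composite
    states, mirroring S_n = {H_n, E_n} in Eq.(17)."""
    H_n = int(np.searchsorted(channel_edges, h_n))  # channel state, cf. Eq.(14)
    E_n = int(np.searchsorted(energy_edges, b_n))   # energy state, cf. Eq.(16)
    return H_n * E + E_n                            # unique index per pair

s = composite_state(0.7, 5.3)
```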

3.2 Action

(18)

3.3 Cost

In the optimization problem Eq.(12), the objective is to save energy, avoid overflow of the battery's energy, and prevent the battery from draining. Therefore, the total cost is determined as

(19)

As different circumstances have different QoS requirements, the cost function is generic enough, by adjusting μ and τ, to satisfy different requirements in real systems.
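Since the body of Eq.(19) is not reproduced above, the following is only a hedged sketch of a per-slot cost consistent with the stated objectives: the energy spent plus indicator penalties weighted by τ and μ (the weight values are placeholders):

```python
def step_cost(e_sent, deficit, overcharge, tau=5.0, mu=2.0):
    """Per-slot cost: energy spent plus weighted indicator penalties
    (a hedged sketch of Eq.(19); tau and mu values are placeholders)."""
    return e_sent + tau * float(deficit) + mu * float(overcharge)
```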

3.4 Action selection

Using the states, actions, and cost functions defined above, the received energy at time slot n can be selected by

(20)

After selecting the proper action, the next battery energy state E_{n+1} can be determined by Eq.(8) and Eq.(16). Likewise, the next channel state H_{n+1} can be obtained from Eq.(14). Hence, combining E_{n+1} and H_{n+1}, the next state S_{n+1} is determined as well. Accordingly, the matrix Q is updated as

(21)

where α is the time-varying learning rate parameter and γ is the discount factor. The detailed procedure of the algorithm is shown in Algorithm 1.

Algorithm 1 The Q-learning-based scheduling algorithm

Step 1 Initialization.

Step 2 If rand() < ε, randomly select an action from A_n; else, select an action using Eq.(20).

Step 3 Calculate the cost using Eq.(19) and then determine the next state S_{n+1}.

Step 4 Update Q by Eq.(21).

Step 5 n = n + 1; then go to Step 2.
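Putting the pieces together, the Python sketch below follows the flow of Algorithm 1, with ε-greedy selection in Step 2 and the tabular update of Eq.(21) in Step 4, written for cost minimization (the greedy action is an argmin rather than an argmax). The environment dynamics, state and action counts, and hyperparameters are illustrative stand-ins, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 25, 8          # D*E composite states, |A_n| actions (assumed)
Q = np.zeros((n_states, n_actions))  # Q-table of expected costs
alpha, gamma, eps = 0.1, 0.9, 0.1    # learning rate, discount, exploration (assumed)

def env_step(s, a):
    """Placeholder dynamics: in the paper the cost and next state follow
    the channel/battery model of Secs. 1-3; here they are random stand-ins."""
    cost = float(a) + rng.random()           # stand-in for the Eq.(19) cost
    return cost, int(rng.integers(n_states))

s = int(rng.integers(n_states))
for n in range(10_000):
    # Step 2: epsilon-greedy; costs are minimized, so greedy = argmin
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmin(Q[s]))
    cost, s_next = env_step(s, a)            # Step 3: cost and next state
    # Step 4: tabular update of Eq.(21), written for cost minimization
    Q[s, a] += alpha * (cost + gamma * np.min(Q[s_next]) - Q[s, a])
    s = s_next                               # Step 5: advance to the next slot
```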

4 Simulation and Results

Under the same simulation environment, the proposed algorithm is compared with a constant-strategy algorithm and an on-demand dynamic-strategy algorithm[3] in terms of the battery's deficit charges, the battery's overcharges, and the total energy consumed. Each algorithm is deployed for at most 100 transmissions per trial, and the trial is repeated 1 000 times. In other words, in each trial the ES transmits energy to the EHD until the battery's energy is exhausted or more than 100 transmissions have been carried out. After the trials are completed, the simulation data are collected to analyze the performance of the algorithms.
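The evaluation protocol above can be sketched as follows; run_slot is a hypothetical per-slot simulator with placeholder dynamics, standing in for the paper's channel and battery model:

```python
import random

def run_slot(b):
    """Hypothetical per-slot simulator (placeholder dynamics, not the
    paper's model): the ES sends some energy, the EHD stores part of it
    and consumes some for data transmission."""
    e_sent = random.uniform(0.0, 1.0)
    b_next = b + 0.8 * e_sent - random.uniform(0.0, 1.0)
    return e_sent, b_next

def run_trial(max_slots=100):
    """One trial: transmit until the battery is exhausted or 100
    transmissions have been carried out, as described above."""
    b, total_energy = 5.0, 0.0         # initial battery energy is a placeholder
    for _ in range(max_slots):
        e_sent, b = run_slot(b)
        total_energy += e_sent
        if b <= 0.0:                   # battery exhausted: trial ends early
            break
    return total_energy

# Average over 1 000 independent trials, as in the evaluation protocol
mean_energy = sum(run_trial() for _ in range(1000)) / 1000
```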

4.1 Simulation settings

4.2 Performance comparison

For comparison purposes, the reference algorithms are described as follows.

Fig.2 shows the performance comparison between the proposed Q-learning algorithm and the reference algorithms. In Fig.2(a), it can be noted that the Q-learning algorithm achieves excellent performance in terms of the battery's deficit charges. As the reference algorithms do not consider the effect of the battery's deficit charges, the battery's energy cannot be prevented from becoming exhausted during trials, and the occurrence of deficit charges increases as the trials continue. In Fig.2(b), the reference algorithms slightly outperform the Q-learning algorithm in overcharges. The reason is that both the constant-strategy and on-demand-strategy algorithms account for the restriction on overcharges, so the overflow of the battery's energy never occurs during trials. In Fig.2(c), both reference algorithms consume less energy than the Q-learning algorithm, but this comes at the cost of their degraded performance in the battery's deficit charges. To sum up, although the Q-learning algorithm appears to consume more energy than the reference algorithms, it actually provides better system stability during the energy transmission period.

For theQ-learning algorithm, the size of action space can be an important factor that influences algorithm performance. To verify how action space size affects algorithm performance, the simulations of theQ-learning algorithm with different action space sizes are executed under the same simulation environment. The results are shown in Fig.3.

Fig.3 The averaged energy consumption of the Q-learning algorithms with different sizes of action space

Assume that the size of the state space is kept at 10 during the simulations. It is known that a large action space results in a longer convergence time[12], which is also demonstrated in Fig.3. Through the information accumulated over multiple iterations, the CSI is gradually learned. In other words, the Q-learning algorithm spends the first 20 or so trials learning. In practice, during these first 20 trials the system is still learning, so the derived results are not yet optimal. After this learning phase, the system grasps the best strategy for all states, and the averaged energy consumption of the ES converges to a constant value. In addition, the action space should not be made as large as possible: once the action space is large enough to attain the optimal averaged energy consumption, further enlarging it only extends the convergence time without reducing energy consumption.

5 Conclusions

1) The proposed Q-learning algorithm solves the formulated problem and achieves acceptable system performance over different Rayleigh fading channels in terms of energy consumption and the battery's deficit charges and overcharges.

2) Compared with the two reference algorithms, the Q-learning algorithm shows a significant advantage in preventing the battery's energy from becoming exhausted. From a practical viewpoint, it is worthwhile to sacrifice some energy-consumption performance in exchange for better system stability.

3) The size of the action space affects the Q-learning algorithm's performance. A small action space yields a shorter convergence time but cannot converge to the optimal solution. In fact, the Q-learning algorithm with a larger action space can effectively reduce energy consumption over a long energy transmission period.
