999精品在线视频,手机成人午夜在线视频,久久不卡国产精品无码,中日无码在线观看,成人av手机在线观看,日韩精品亚洲一区中文字幕,亚洲av无码人妻,四虎国产在线观看 ?

Analysis of Distributed and Adaptive Genetic Algorithm for Mining Interesting Classification Rules

2008-04-12 00:00:00YIYunfeiLINFangQINJun
現代電子技術 2008年10期

Abstract:Distributed genetic algorithm can be combined with the adaptive genetic algorithm for mining the interesting and comprehensible classification rules.The paper gives the method to encode for the rules,the fitness function,the selecting,crossover,mutation and migration operator for the DAGA at the same time are designed.

Keywords:classification rules;adaptive genetic algorithm;information gain;distributed genetic algorithm

CLC:TP183Document Code:B

No:[WTBZ]1004-373X(2008)10-132-04

1 Introduction

The data classification is a process of giving data object partition according to the features of group of data objects,and has widely been researched in machine learning,statistics,neural network,expert system,and etc.And mining the classification rules is also an important task in the course of Knowledge Discovery in Databases (KDD).

Genetic Algorithm [1-7] is an overall random searching method based on the theory of evaluation and molecular genetics.In this paper,we study the key technology to the implementation of genetic algorithm and put forward a particular scheme of how to determine the parameters and operations when classify,including individual encoding,fitness function design,GA operators design,and so on.But the classification rules mining methods based on traditional genetic algorithm have drawbacks as follows: producing too much uninteresting classification rules; redundant rules are too much in the optimized population; and the classification accuracy isn′t high.This paper proposes a classification rules mining method based on Distributed and Adaptive Genetic Algorithm (DAGA),which can overcome above drawbacks.

Furthermore,in order to assure that the acquired classification rules to be accuracy as well as comprehensible,we put forward a method to compute the comprehensibility of the classification rules using attribute information gain,different to other methods which evaluate the comprehensibility of the rule only by its simplicity.Thus the exports of the classification rules which are mined are more understandable and informational.

2 Use Attribute Information Gain to Compute the Comprehensibility of the Classification Rules

2.1 Concept of Information Gain

Given random variation x=xi (i=1,2,…,n),where probability of xi is pi,and it has ∑ni=1pi=1,then the definition of entropy of x is:



H(x)=-∑ni=1pilog pi(1)



The definition of the above formula is called Shannon entropy.In the formula,when we take the log to the base 2,the unit of the entropy is a bit and is called an engineering unit.

[BT(3+1]2.2 Attribute Information Gain and Its Computation Method

Information gain contains classified information about the value the presence of large amounts of information.The higher the attribute information gain the information required less used to classify the results as possible.Therefore,the attribute of the information gain in the classification rules is as high as possible.Information gain calculated as follows:

Given S is a set of sample data set,assuming different types of labeling attribute value n,definition different n types of Ci.(i=1,2,…,n).Assuming Si is located a few samples of the type of Ci,classification of samples required for a given expectations of the information given by the following formula:



I(S1,S2,…,Sn)=-∑ni=1pilog pi(2)



Give a set attributes with different values V{a1,a2,…,av},and A can be divided into sub-sets V{S1,S2,…,Sv},where includes Sj such samples in A with a value of aj.Sij is located subset of the sample Ci in Class Sj,A sub-divided into n subset by the entropy or the expectation information given by the following formula:



E(A)=∑vi=1S1j+…+SnjSI(S1j+…+Snj)(3)



The information gain of the A attribute is:



Gain(A)=I(S1,S2,…,Sn)E(A)(4)

2.3 Comprehensibility of the Classification Rules Based on Information Gain

In this paper,we define the comprehensibility of the classification rules.The comprehensibility is that a rule which can offer users more relevant to the field of knowledge and information,but not decided by the length of the rules.According to this view,we have introduced the concept of information gain attributes to construct the rules easier to understand,measure the fitness function.We define the comprehensibility of the classification rules based on the information gain as follows:



comprehensibility=∑li=1Gain(Ai)∑nj=1Gain(Aj)(5)

Where l is the length of the rule,i.e.the number of feature attributes.∑li=1Gain(Ai) symbolizes a result which is plus all the information gain of the characteristics classification rules.∑nj=1Gain(Aj) symbolizes a result which is plus all the information gain of characteristicsattributes which are used to classify the future data objects.

3 The Distributed and Adaptive Genetic Algorithm

3.1 Genetic Algorithm

Genetic algorithm [1,2] is stochastic search methods that mimic the metaphor of natural biological evolution.Genetic algorithm operates on a population of potential solutions applying the principle of survival of the fittest to produce better and better approximations to a solution.At each generation,a new set of approximations is created by the process of selecting individuals according to their level of fitness in the problem domain and breeding them together using operators borrowed from natural genetics.This process leads to the evolution of populations of individuals that are better suited to their environment than the individuals that they were created from,just as in natural adaptation.

Genetic algorithm models natural processes,such as selection,crossover,mutation,migration.Genetic algorithm works on populations of individuals instead of single solutions.In this way the search is performed in a parallel manner.

At the beginning of the computation a number of individuals are randomly initialized.The fitness function is then evaluated for these individuals.The initial generation is produced.

If the optimization criteria are not met the creation of a new generation starts.Individuals are selected according to their fitness for the production of offspring.Parents are recombined to produce offspring.All offspring will be mutated with a certain probability.The fitness of the offspring is then computed.The offspring are inserted into the population replacing the parents,producing a new generation.This cycle is performed until the optimization criteria are reached.

3.2 Distributed and Adaptive Genetic Algorithm

In the design of genetic algorithm the selection of mutation probability and crossover probability is of great importance.If the sample of the mutation probability is smaller,the new born individual will have a weaker ability and a less population variety and even easy to be early-mature.If the sample is bigger,it will destroy the fine modular in the current generation and make the genetic algorithm be similar as the random searching.If the sample of the crossover probability is smaller,the efficiency of searching will be low and even stagnation.If the sample is bigger,it will destroy the structure of the fine chromosome in the current generation.Srinvivas[7] proposed adaptive genetic algorithm.This paper presents a mutation probability broad wise adaptive turning algorithm based on individual fitness,for the individual which has a big fitness and contains many fine modular should reduce its respectively probability; for the individual which has a small fitness should increase its mutation probability.Generally,in the adaptive genetic algorithm,crossover probability (Pc) and mutation probability (Pm) are usually adaptive by the formula as follows:

 

Pc=k1(fmax-f′)fmax-favg,[WB]f≥favg

k2,f

Pm=k3(fmax-f)fmax-favg,[WB]f≥favg

k4,f



Where fmax symbolizes the highest fitness in the population,favg symbolizes the average fitness in each generation,f′ symbolizes the fitness of the two chromosomes which are going to crossover,and f symbolizes the fitness of the chromosome which is going to mutation.

Reiko Tanese[3] proposed the distributed genetic algorithm as a way of efficiently paralleling the canonical genetic algorithm.The global population is divided into several subpopulations,one per processor.Each processor runs the canonical genetic algorithm independently on its subpopulation.The only inter-processor communication occurs during the migration phase,which takes place at a regular interval: A fixed proportion of each subpopulation is selected and sends to the neighboring processor.In return,the same numbers of migrants is received from some other subpopulation and replace individuals selected according to some criteria.This migration can occur either asynchronously or after all of the processor have been synchronized.Each processor then resumes running the canonical genetic algorithm as before,until the next migration phase.

A single population genetic algorithm are powerful and perform well on a wide variety of problems,and the distributed genetic algorithm can improve the efficiency,and the adaptive genetic algorithm can maintain the diversity of the population and also can ensure the convergence.However,better results can be obtained by the distributed and adaptive genetic algorithm.This paper combines the advantages of them,based on distributed genetic algorithm,separating the global population into many sub-populations,each executes the adaptive genetic algorithm independently,choosing the promising individuals periodically and migrate them to different sub-populations.Then shows the algorithm of such a distributed and adaptive genetic algorithm.

procedure: Distributed and Adaptive Genetic Algorithm

input: GA parameters

output: best solution

begin

t←0;//t: generation number

initialize P(t) by encoding routine;//P(t):population of chromosomes

fitness eval(P) by decoding routine;

while(not termination condition) do

crossover P(t) to yield C(t);//C(t):offspring

mutation P(t) to yield C(t);

fitness eval(P) by decoding routine;

iff′≥favg  // f′: fitness of the two chromosomes which are going to crossover

then increase Pc for next generation;//Pc: crossover probability

iff≥favg  // f:fitness of the chromosome which is going to mutation

then increase Pm for next generation;//Pm: mutation probability

select P(t+1) from P(t) and C(t);

migrate P(t+1) to neighboring process;

t←t+1;

end

output best solution;

end

4 Use the DAGA to Mine the Interesting Classification

Rules

The method to mine the classification rules right now is the way based on the traditional greedy algorithm.The genetic algorithm is an adaptive optimized-overall probability searching algorithm.Comparing with the two optimizing methods like the least square method and escalator method needn′t the genetic algorithm the gradient information of the object when solving the problem of non-linear-optimization.Because it also can avoid sinking into the smallest order of the part,therefore the genetic algorithm has a wide adaptable scope and a strong robustness.At present genetic algorithm has already been widely applied to many areas because of its simplicity,prevalence,robustness,and so on.

4.1 Encoding

We use the encoding method of the Michigan′s,which is each of the single classification rules corresponding with a database records.The other hand,we have adopted for similar evolution from the initial group of individuals.As the individuals belong to the same type of groups,we just encoding for the premise of the classification rules but not encoding for the conclusion rules.It will get the corresponding classification rules when run the program each time.

As the Table 1 shows,we dispose the records of the database as follows:

A database records the basic carrier of genetic encoding into chromosomes,feature attributes Ai became coding genes.Each gene consists of four fields,namely the right range from weight(Wi),operator(Oi),value(Vi),to information gain(Gi).Where Wi symbolizes the weight,it takes on the characteristics of the current value of the individual attributes of a few percentage of the total number of individuals.Oi symbolizes the attribute associated with the operator of discrete attributes,using=and≠symbols or continuous nature,using ≤and> symbols.Vi is the attribute domain,using the binary coding method for discrete attributes.To the median value of the discrete binary attribute,we use the 1 symbolizes take corresponding attribute,and 0 symbolizes not corresponding values.Gi symbolizes the attribute information gain of the rules.Each attribute information gain in the value of pre-implementation genetic algorithm has been well calculated and stored in the individual.

4.2 Design Fitness Function.

As we known,there are 4 kinds′ rules in the database as shown in Table 2.

In order to design the fitness function,we define some formula as follows.



S(r)=n1n2n1n2+n1n2(8)

Cov(r)=n1n2n1n2+n1n2(9)

accuracy=S(r)*Cov(r)(10)



In order to mine the interesting classification rules we design the fitness function as follows:



f(w1,w2)=w1*accuracy+w2*comprehensibility(11)



Where the w1 and w2 not only satisfied with w1∈[0,1] and w2∈[0,1],but also satisfied with w1+w2=1.

4.3 Selection Operator

The simplest selection scheme is roulette-wheel selection,also called stochastic sampling with replacement.This is a stochastic algorithm and involves the following technique:

The individuals are mapped to contiguous segments of a line,such that each individual′s segment is equal in size to its fitness.A random number is generated and the individual whose segment spans the random number is selected.The process is repeated until the desired number of individuals is obtained,and we call this process as mating population.This technique is analogous to a roulette wheel with each slice proportional in size to the fitness.

We choose the scheme of roulette-wheel selection as the selection operator of the DAGA.

4.4 Crossover Operator

During the recombination of binary variables only parts of the individuals are exchanged between the individuals.Depending on the number of parts,the individuals are divided before the exchange of variables.The numbers of cross points distinguish the methods.

This paper uses the method which is called single-point crossover.In single-point crossover,crossover position k∈[1,2,…,N-1],N is the number of variables of an individual,is selected uniformly at random and the variables exchanged between the individuals about this point,then two new offspring are produced.

4.5 Mutation Operator

For binary valued individuals mutation means the flipping of variable values,because every variable has only two states.Thus,the size of the mutation step is just one.For every individual the variable value to change is chosen,mostly uniform at random.

4.6 Migration Operator

The migration model divides the population into multiple subpopulations.These subpopulations evolve independently of each other for a certain number of generations.After the isolation time a number of individuals is distributed between the subpopulation.The number of exchanged individuals that is migration rate,the selection method of the individuals for migration and the scheme of migration determines how much genetic diversity can occur in the subpopulation and the exchange of information between subpopulation.

In neighborhood migration topology,migration is made only between the nearest neighbors.However,migration may occur in either direction between subpopulations.For each subpopulation,the possible immigrants are determined,according to the desired selection method,from the adjacent subpopulations and a final selection is made from this pool of individuals.

5 Conclusions

The comprehensibility,accuracy and the interest of theclassification rules is of importance in the course of Knowledge Discovery in Databases.This paper presents the formula to calculate the comprehensibility based on the information gain of the attributes,and also gives a method to the accuracy of the classification rules,so that we can mine the interesting classification rules easily.In the paper,a new approach for mining the interesting classification rules based on Distributed and Adaptive Genetic Algorithm (DAGA) is put forward.And we give the method to encode for the rules,design the fitness function,and design the selecting,crossover,mutation and migration operator for the algorithm as well.

References

[1]Murata T,Isibuchi H,Gen M.Cellular Genetic Local Search for Multi-objective Optimization[C].Prof.of The Genetic and Evolutionary Computation Conference,2000:307-314.

[2]Ishibuchi H,Yoshida T,Murata T.Selection of Initial Solutions for Local Search in Multi-objective Genetic Local search[C].Proc.of 2002 Congress on Evolutionary Computation,2002.

[3]Tanese R.Distributed Genetic Algorithms for Function Optimization[D].University of Michigan,Ann Arbor,MI,1989.

[4]Joe Suzuki.A Markov Chain Analysis on Simple Genetic Algorithms[J].IEEE Trans.on Sys.Man.and Cyber.,1995,25(4):655-659.

[5]Srinivas M.Adaptive Probability of Crossover and Mutation in Genetic Algorithms[J].IEEE Trans.Syst.Man.Cybern.,1994,24(4):655-667.

[6]Fogarty T C.Evolutionary Computing.Proceedings of AISB Workshop on Evolutionary Computing 1996,Lecture Notes in Computer Science,Berlin,Heidelberg,New York:Springer-Verlag,1996.

[7]Yang Liuyu.An Entropy-based Adaptive Genetic Algorithm for Learning.Classification Rules,2001:790-796.

作者簡介

YI Yunfei born in 1981,from Ziyuan in Guangxi Province,teaching assistant in Hechi College,postgraduate of Computer Scicnce College in South Central University for Nationalities,research in data mining,artificial intelligence and information security.

注:本文中所涉及到的圖表、注解、公式等內容請以PDF格式閱讀原文。

主站蜘蛛池模板: 国产91视频免费| 日本成人精品视频| 特级做a爰片毛片免费69| 就去色综合| 午夜无码一区二区三区在线app| 欧美一区国产| 999国产精品永久免费视频精品久久| 片在线无码观看| 91成人在线免费视频| 欧美精品伊人久久| 在线免费不卡视频| 国产女人综合久久精品视| 亚洲一区二区三区在线视频| 青青青草国产| 日本AⅤ精品一区二区三区日| 亚洲国产精品不卡在线| 日韩欧美国产综合| 97人妻精品专区久久久久| 国产欧美视频在线观看| 四虎影视国产精品| hezyo加勒比一区二区三区| 尤物午夜福利视频| 国产欧美视频综合二区| 丁香五月激情图片| 亚洲人成影视在线观看| 一区二区欧美日韩高清免费| 天天躁狠狠躁| 亚洲福利视频网址| 欧美中文字幕在线二区| 久久久久亚洲精品成人网| 亚洲,国产,日韩,综合一区| 1024你懂的国产精品| a级免费视频| 成人中文字幕在线| 天天色综网| 日韩欧美中文字幕一本| 亚洲色图欧美视频| 综合五月天网| 性视频一区| 制服丝袜一区| 91麻豆精品国产91久久久久| 亚洲综合一区国产精品| 91福利国产成人精品导航| 全裸无码专区| 国产一级毛片高清完整视频版| 国产微拍一区二区三区四区| 一级黄色片网| 这里只有精品国产| 在线免费不卡视频| 亚洲黄网在线| 九月婷婷亚洲综合在线| 啊嗯不日本网站| 精品成人免费自拍视频| 成人福利在线免费观看| 国产欧美性爱网| 久久久久久久97| 国产一级在线观看www色| 色综合网址| h视频在线播放| 午夜精品区| 无码国内精品人妻少妇蜜桃视频| 亚洲高清日韩heyzo| 欧美精品v日韩精品v国产精品| 久久综合伊人 六十路| 亚洲综合色婷婷| 久久国产亚洲偷自| 色综合久久88色综合天天提莫 | 91视频青青草| 国产SUV精品一区二区6| 精品撒尿视频一区二区三区| 国产亚洲视频播放9000| 毛片网站在线看| 婷婷午夜天| 高清欧美性猛交XXXX黑人猛交| 国产精品亚洲天堂| 欧美另类图片视频无弹跳第一页| 成人毛片在线播放| 91精品国产情侣高潮露脸| 一级毛片高清| 国产女人在线| 99ri国产在线| 日韩国产亚洲一区二区在线观看|