
Lattice Vector Quantization Applied to Speech and Audio Coding

2012-05-22
ZTE Communications, 2012, No. 2

Minjie Xie

(ZTE USA Inc., Richardson, TX 75080, USA)

Abstract: Lattice vector quantization (LVQ) has been used for real-time speech and audio coding systems. Compared with conventional vector quantization, LVQ has two main advantages: it has a simple and fast encoding process, and it significantly reduces the amount of memory required. Therefore, LVQ is suitable for use in low-complexity speech and audio coding. In this paper, we describe the basic concepts of LVQ and its advantages over conventional vector quantization. We also describe some LVQ techniques that have been used in speech and audio coding standards of international standards developing organizations (SDOs).

Keywords: vector quantization; lattice vector quantization; speech and audio coding; transform coding

1 Introduction

Vector quantization is generally much more efficient than scalar quantization, and the efficiency of vector quantization improves as the vector dimension increases [1], [2]. With a conventional vector quantizer, however, the computational complexity of the quantization process increases exponentially as the dimension increases, and the storage requirement for the codebook can be very large [1], [3]. Lattice vector quantization (LVQ) can overcome this problem. It is a very promising signal compression technique that has advantages over conventional vector quantization [4]-[7]. In LVQ, the codebook is a finite subset of a regular-point lattice. Because of the regular structure of the lattice, the nearest codeword to the input vector can be found and indexed very efficiently. Another interesting feature of LVQ is that the codewords do not have to be stored because they can be algorithmically generated. Therefore, LVQ has a simple and fast encoding process and significantly reduces the amount of memory required. These advantages can substantially reduce the implementation complexity of a vector quantizer. Applying entropy coding to the quantization indices can further improve the efficiency of LVQ [8]-[10].

LVQ is suitable for use in low-complexity speech and audio coding. Low computational complexity is especially important in telecommunication applications such as video conferencing and mobile communications. For example, a low-complexity audio codec can free cycles for computationally intensive video coding and other audio processing, such as acoustic echo cancellation, in video conferencing systems. Low computational complexity can also extend battery life in portable devices. Various LVQ schemes have been developed for speech and audio coding in the past decades [11]-[17], and some of these are used in codecs that have been adopted as speech and audio coding standards by international standards developing organizations [18]-[23].

In section 2, we briefly review vector quantization and describe the advantages of LVQ over conventional vector quantization. In section 3, we describe some LVQ techniques used in speech and audio coding standards. In section 4, we summarize the LVQ applications presented in this paper.

2 Lattice Vector Quantization

2.1 Vector Quantization

Let x ∈ R^N be an arbitrary vector in N-dimensional Euclidean space R^N. An N-dimensional vector quantizer Q with L levels is a function that maps the input vector x into a codeword (code vector) y_i that is selected from a finite codebook C = {y_1, y_2, y_3, ..., y_L | y_i ∈ R^N}. That is,

Q(x) = y_i, if x ∈ V_i, i = 1, 2, ..., L    (1)

and

∪_{i=1}^{L} V_i = R^N, V_i ∩ V_j = ∅ for i ≠ j    (2)

where V_i is a partition (Voronoi region) of R^N. The Voronoi region V_i is a nearest-neighbor region associated with the codeword y_i and is given as

V_i = {x ∈ R^N : ||x - y_i||^2 ≤ ||x - y_j||^2 for all j ≠ i}    (3)

where

d(x) = min_i ||x - y_i||^2 = min_i Σ_{n=1}^{N} (x_n - y_{i,n})^2

is the minimum squared error and is used as the distortion measure of quantization.

The codewords y_i are usually represented by their indices i, which are used for transmission or storage. A vector quantizer Q is specified by C, V_i, and the indexing of the y_i.

The rate R of Q is measured in bits per dimension and is given by

R = (log_2 L)/N.    (4)

This rate can be used to measure the accuracy of quantization. Vector quantizers are traditionally designed by using K-means or Linde-Buzo-Gray (LBG) clustering algorithms [1], [3]. Conventional vector quantization is usually referred to as either statistical or stochastic. In statistical vector quantization, the clustering algorithm generates a locally optimal codebook based on a large training database that is related to the signal to be quantized. For a given number of codewords, L, the codebook is designed in the following steps:

Step 1. Initialize the codebook with L codewords. The initial codewords can be obtained by first calculating the centroid of the training database and then splitting the centroid into L codewords by using the clustering algorithm.

Step 2. Using the minimum-squared-error distortion, associate each input vector with a codeword to determine L partitions (Voronoi regions).

Step 3. Calculate the total average distortion for the training database. If the distortion does not vary or varies only very slightly, the final codebook is obtained. Otherwise, continue.

Step 4. Recalculate the centroid of each partition and use the obtained centroids as the new codewords. Then repeat steps 2 and 3.

In statistical vector quantization, x must be compared with all L codewords in the codebook to find the y_i that best matches x in terms of minimum squared error. The number of codewords is usually L = 2^{NR}, and finding the best codeword, that is, the nearest neighbor to x, requires (2N - 1)·2^{NR} additions, N·2^{NR} multiplications, and (2^{NR} - 1) comparisons. Storing the codebook requires a memory of N·2^{NR} units. The drawback of statistical vector quantization is the high computational complexity of the codebook search and the large amount of memory required for codebook storage. Both the computational complexity and the required memory increase exponentially as N and R increase, so it is difficult to improve vector quantization by increasing the dimension or the accuracy. This is also an important issue in the real-time implementation of statistical vector quantization.
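As a concrete illustration of steps 1-4, the following minimal Python sketch trains a codebook by centroid splitting and Lloyd iterations. The perturbation, the stopping rule, and the assumption that L is a power of 2 are illustrative choices, not those of any particular codec.

```python
import numpy as np

def lbg_train(training, L, tol=1e-4, seed=0):
    """Design an L-codeword codebook from `training` (shape [M, N]) by
    iterative splitting and Lloyd updates (steps 1-4 above). Assumes L is a
    power of 2; perturbation and stopping rule are illustrative choices."""
    rng = np.random.default_rng(seed)
    codebook = training.mean(axis=0, keepdims=True)       # step 1: global centroid
    while codebook.shape[0] < L:
        eps = 1e-3 * rng.standard_normal(codebook.shape)  # split every centroid
        codebook = np.vstack([codebook + eps, codebook - eps])
        prev = np.inf
        while True:
            # step 2: assign each training vector to its nearest codeword (MSE)
            d2 = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            idx = d2.argmin(axis=1)
            # step 3: total average distortion; stop when it barely changes
            dist = d2[np.arange(len(training)), idx].mean()
            if prev - dist <= tol * dist:
                break
            prev = dist
            # step 4: new codewords are the centroids of the partitions
            for i in range(codebook.shape[0]):
                members = training[idx == i]
                if len(members):
                    codebook[i] = members.mean(axis=0)
    return codebook
```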

2.2 Lattice Vector Quantization

From a geometric standpoint, a lattice Λ_N is a regular arrangement of points in N-dimensional Euclidean space R^N. From an algebraic standpoint, an N-dimensional lattice Λ_N is a collection of vectors y that forms a group under ordinary vector addition in R^N, that is,

Λ_N = {y = k_1 v_1 + k_2 v_2 + ... + k_N v_N}    (5)

where k_1, k_2, k_3, ..., k_N are integers, and v_1, v_2, v_3, ..., v_N are linearly independent vectors in R^N [4].

The simplest lattice is the integer lattice Z^N, in which all N components of the lattice points are integers. Another important lattice, D_N, consists of the points of Z^N whose integer components have an even sum:

D_N = {(y_1, y_2, ..., y_N) ∈ Z^N : y_1 + y_2 + ... + y_N is even}.    (6)

The Gosset lattice E8 is a well-known lattice in 8 dimensions and is defined as the union of the D8 lattice and the coset D8 + (1/2, 1/2, ..., 1/2). The Gosset lattice is given by

E8 = D8 ∪ (D8 + (1/2, 1/2, 1/2, 1/2, 1/2, 1/2, 1/2, 1/2)).    (7)

The rotated Gosset lattice RE8 is given as

RE8 = 2D8 ∪ (2D8 + (1, 1, 1, 1, 1, 1, 1, 1)).    (8)

LVQ, also called algebraic vector quantization (AVQ), is an efficient vector quantization technique in which a finite subset of lattice points is used as the codebook of the quantizer. Because of the regular structure of the lattice codebook, the nearest codeword to a given input vector can be found and indexed very efficiently. Another interesting feature of LVQ is that the lattice codebook does not have to be stored because the codewords can simply be generated using algebraic rules. Therefore, LVQ has two main advantages: it has a simple and fast encoding process, and it significantly reduces the amount of memory required. These advantages can substantially reduce the implementation complexity of a vector quantizer.

Here, we describe the advantage of LVQ in terms of computational complexity. Suppose an input vector in R^N lies in the truncated lattice that is used as the codebook of the quantizer. In LVQ based on the Z^N lattice, an N-dimensional vector is quantized by rounding each vector component individually. The fast quantizing algorithms for LVQ based on D_N and E8 are described in [24]. For D_N, quantization of an N-dimensional vector requires (2N - 1) additions, (N - 1) comparisons, and (N + 1) rounding operations. In the case of E8, quantizing an 8-dimensional vector requires 4(2N - 1) additions, 2N multiplications, (2N - 1) comparisons, and 2(N + 1) rounding operations. Table 1 shows the computational complexity of finding the best codeword for the input vector in various 8-dimensional vector quantization schemes.
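The following Python sketch illustrates the fast D_N search just described (after [24]): each component is rounded, and if the component sum is odd, the component with the largest rounding error is re-rounded in the other direction. It is a floating-point sketch of the idea, not the fixed-point routine used in the codecs.

```python
import numpy as np

def nearest_point_dn(x):
    """Nearest point of D_N (integer vectors with an even component sum) to x."""
    x = np.asarray(x, dtype=float)
    f = np.rint(x)                               # componentwise rounding to Z^N
    if int(f.sum()) % 2 == 0:                    # even sum: already in D_N
        return f
    err = x - f                                  # rounding errors
    i = int(np.argmax(np.abs(err)))              # component rounded "worst"
    f[i] += 1.0 if err[i] > 0 else -1.0          # round it the other way to fix parity
    return f
```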

As Table 1 shows, the computational complexity of statistical vector quantization increases exponentially with R, whereas the complexity of the LVQ schemes remains constant. Even at R = 1, statistical vector quantization requires 6143 operations to find the best codeword in the codebook, a computational cost much higher than those of the LVQ schemes in Table 1.

▼Table 1. Computational complexity for various 8-dimensional vector quantization schemes

In LVQ, some input vectors lie outside the lattice, which is truncated for a given rate. Additional operations are therefore performed to quantize those input vectors. However, when the quantizer is optimally designed, the number of such vectors can be very small, so the computational complexity of LVQ remains quite low. This is true for the real-time speech and audio coding applications presented in section 3.

2.3 Voronoi Codes

As defined in (3), the Voronoi region of a lattice point x consists of all points in R^N that are at least as close to x as to any other lattice point. Given a lattice Λ in R^N, the Voronoi region around the origin is called the Voronoi region of Λ and is denoted V(Λ).

A Voronoi code [25] can be defined as

C_Λ(r, a) = Λ ∩ (V(rΛ) + a)    (9)

where r = 1, 2, 3, ... and a is a small offset vector that avoids any lattice point falling on the boundary of the truncated Voronoi region. The code C_Λ(r, a) consists of all lattice points in the Voronoi region V(rΛ) shifted by a [25]. If r is a power of 2, that is, r = 2^R with integer R > 0, the code size is |Λ/rΛ| = r^N = 2^{NR}. Fig. 1 shows the Voronoi codes based on the hexagonal lattice A2 [4] for R = 2 and R = 3.

Because there are fast search and indexing algorithms for the root lattices [24], [25], Voronoi codes can be used as codebooks in low-complexity variable-rate lattice vector quantization.

3 LVQ Applied to Speech and Audio Coding Systems

LVQ has not been widely applied to real-time speech and audio coding systems because of several difficulties. These difficulties include truncating a lattice for a given rate in order to create an LVQ codebook that matches the probability density function (PDF) of the input source, quickly translating the codewords of the LVQ codebook into their indices, and quantizing the source vectors ("outliers") that fall outside the truncated lattice. In this section, we describe two LVQ approaches that have been successfully used in speech and audio coding standards [13], [16]-[23]. We describe solutions to the problem of quantizing spectral vectors in transform-based speech and audio coding.

3.1 Embedded Algebraic Vector Quantization

Embedded algebraic vector quantization (EAVQ) is an LVQ application for speech and audio coding [13]. It can be used to quantize spectral vectors in transform coding. With EAVQ as the basis, a split multirate LVQ scheme was developed [16] and has been used in several speech and audio coding standards.

3.1.1 Overview of EAVQ

EAVQ is an RE8-based variable-rate vector quantization scheme [13]. The points of RE8 fall on concentric spheres of radius 2√(2r) centered at the origin, where r is a non-negative integer that can be used as a natural index for the spheres [26], [27]. In EAVQ, the sets of lattice points on the spheres constitute quantizer codebooks, and each codebook consists of the lattice points inside a specific sphere. The EAVQ quantizer has five subquantizer codebooks, Q1, Q2, Q3, Q4, and Q5, from low rate to high rate, and these are spherically embedded. A higher-rate subquantizer codebook contains those of the lower-rate subquantizers. The subquantizers have codebook sizes of 16 (4 bits), 256 (8 bits), 4096 (12 bits), 65,536 (16 bits), and 1,048,448 (20 bits), respectively. Q_n is a 4n-bit codebook and comprises 2^{4n} codewords. Additionally, the origin vector Q0 = [0, 0, 0, 0, 0, 0, 0, 0] is used as a codeword in EAVQ. Fig. 2 shows the structure of the EAVQ codebooks.

The codewords of the codebooks are generated from appropriate permutations of the components of "leaders." A leader is a vector whose components are arranged in descending order. The nearest neighbor of an input vector is easily found by permuting the leaders, and the index of the codeword is obtained by calculating its rank according to an algebraic method [27]. In EAVQ, only the leaders and some parameters are stored as a lookup table for generating and indexing the codewords. This lookup table requires a small number of memory units.
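As a small illustration of the leader idea, the sketch below finds the permutation of a given leader that is closest to an input vector by matching sorted orders, a classical property of permutation codes; the leader used in the usage line is hypothetical and is not one of the actual EAVQ leaders.

```python
import numpy as np

def nearest_permutation(x, leader):
    """Return the permutation of `leader` closest (in MSE) to x: the largest
    leader component goes to the position of x's largest component, and so on."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x)                       # positions of x, largest first
    y = np.empty_like(x)
    y[order] = np.sort(np.asarray(leader, dtype=float))[::-1]
    return y

# usage with a hypothetical leader (for illustration only)
x = np.array([0.4, -1.2, 2.1, 0.0, -0.3, 1.0, -2.2, 0.7])
print(nearest_permutation(x, [2, 1, 1, 0, 0, 0, -1, -1]))
```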

▲Figure 1. Voronoi codes based on the hexagonal lattice A2.

▲Figure 2. Structure of the EAVQ codebooks.

An arbitrary vector in 8 dimensions, denoted x = (x1, x2, x3, x4, x5, x6, x7, x8), is quantized in the following steps [13]:

Step 1. Find the nearest neighbor y of x in RE8 by using the fast quantizing algorithm in [24] (a sketch of this search is given after these steps).

Step 2. Reorder the components of y in descending order and then find its leader y0 in a lookup table set up in advance.

Step 3. Scale down x with a predefined scaling factor when x lies outside the largest codebook Q5, then repeat steps 1 and 2 until a codeword is found in Q5.

Step 4. Compute the rank t of y as described in [27].

Step 5. Find the nearest neighbor y' of x in Q1 when x is near the origin Q0. Then select between y and y' the lattice point closest to x in terms of the mean-squared error (MSE).

Step 6. Compute the index k of the selected codeword and determine the subquantizer number n according to the sphere associated with y0.

Step 7. Stop.
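The fast RE8 search referred to in step 1 can be sketched as follows, using the coset decomposition RE8 = 2D8 ∪ (2D8 + (1, ..., 1)) given in section 2.2: the nearest point in each coset is found with the D8 rounding routine, and the closer of the two candidates is kept. The small D8 helper is repeated so that the snippet is self-contained; the fixed-point details of [24] are omitted.

```python
import numpy as np

def nearest_point_d8(x):
    """Nearest point of D8 (even component sum) to x, as in section 2.2."""
    x = np.asarray(x, dtype=float)
    f = np.rint(x)
    if int(f.sum()) % 2 != 0:
        err = x - f
        i = int(np.argmax(np.abs(err)))
        f[i] += 1.0 if err[i] > 0 else -1.0
    return f

def nearest_point_re8(x):
    """Nearest point of RE8 = 2D8 U (2D8 + (1,...,1)) to x."""
    x = np.asarray(x, dtype=float)
    ones = np.ones(8)
    y0 = 2.0 * nearest_point_d8(x / 2.0)                  # candidate in 2D8
    y1 = 2.0 * nearest_point_d8((x - ones) / 2.0) + ones  # candidate in 2D8 + 1
    return y0 if np.sum((x - y0) ** 2) <= np.sum((x - y1) ** 2) else y1
```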

The decoding algorithm is described as follows [13]:

Step 1. From the received index k and the subquantizer number n, find the leader y0 using the same lookup table as in the encoding operation.

Step 2. Compute the rank t according to k.

Step 3. Find the vector from y0 and t [27].

Step 4. Stop.

3.1.2 Application to Wideband Speech Coding

The EAVQ technique was applied to 50-7000 Hz wideband speech coding at 16 kbit/s [13]. In this application, a speech signal sampled at 16 kHz was encoded by a transform coded excitation (TCX) coder [28], and EAVQ was used to quantize the target signal in the frequency domain. The TCX coder operates on frames of 6 ms, corresponding to 96 samples at 16 kHz. For each frame, there are 96 available bits: 12 bits are used for the LPC coefficients, 12 bits for the pitch parameters, and 72 bits for quantization of the target signal. The target signal is converted into a frequency-domain representation by the discrete Fourier transform (DFT), and 48 complex coefficients are obtained.

Fig. 3 shows the principle of quantization for the target signal in the TCX coder. The norm (energy) E_x of the complex coefficient vector x is computed and quantized by a 7-bit logarithmic scalar quantizer (SQ). The 96-dimensional coefficient vector x is then normalized by the quantized norm E_xq. The normalized coefficient vector x_nrm is first scaled by a scaling factor to match the EAVQ codebook described in the previous section; the best scaling factor is obtained experimentally. Then x_nrm is split into 12 subvectors of dimension 8, and each subvector is quantized by EAVQ. Finally, the numbers of the 12 subquantizers used are entropy coded.
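A rough sketch of this gain-shape arrangement is given below; the logarithmic norm quantizer, the scaling factor, and the eavq_quantize callable are placeholders standing in for the components described above rather than the actual routines of [13].

```python
import numpy as np

def quantize_tcx_target(x, eavq_quantize, scale=1.0, log_step=0.25):
    """x: 96 real values (48 complex DFT coefficients, real/imag interleaved).
    eavq_quantize: callable mapping one 8-dim vector to (codeword, index, n).
    The logarithmic norm quantizer below is a placeholder, not the codec's SQ."""
    x = np.asarray(x, dtype=float)
    norm = float(np.linalg.norm(x))
    # placeholder logarithmic scalar quantization of the norm
    norm_q = 2.0 ** (np.round(np.log2(max(norm, 1e-6)) / log_step) * log_step)
    x_nrm = scale * x / norm_q                      # normalize, then match codebook
    codewords, indices, subq_numbers = [], [], []
    for sub in x_nrm.reshape(12, 8):                # split into 12 subvectors of dim 8
        y, idx, n = eavq_quantize(sub)
        codewords.append(np.asarray(y, dtype=float))
        indices.append(idx)
        subq_numbers.append(n)
    # the subquantizer numbers would then be entropy coded
    return norm_q, np.concatenate(codewords), indices, subq_numbers
```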

For performance evaluation, the two-dimensional statistical complex vector quantization (2D-CVQ) scheme in [29] was also used in the TCX coder with the same bit-allocation scheme to quantize the target signal in the frequency domain. The objective test results showed that EAVQ performed slightly better than 2D-CVQ [13]. In addition, memory usage in EAVQ was much less than that in 2D-CVQ.

3.1.3 Split Multirate LVQ

The split multirate LVQ scheme [16] has been used in speech and audio standards such as 3GPP AMR-WB+ [18], ITU-T G.718 [20], G.711.1 Annex D [21], G.722 Annex B [22], and MPEG unified speech and audio coding (USAC) [23].

In this scheme, a modified version of EAVQ is used as a base quantizer, and an additional quantizer called the Voronoi extension [16], [30], which is based on the same RE8 lattice, is used to extend the codebooks of the base quantizer when the nearest neighbor of the input vector lies outside these base codebooks. In the base quantizer, only the codebooks Q0, Q2, Q3, and Q4 of EAVQ are used, and some leaders of Q3 and Q4 are replaced [16]. With this modification, Q2 and Q3 are still embedded, but they are no longer subsets of Q4; Q4 is in fact a complementary subquantizer codebook to Q3.

▲Figure 3. EAVQ application to wideband speech coding in TCX.

The Voronoi extension is designed by using the Voronoi codes described in section 2.3, and its codebook size depends on the order of the extension. For an extension of order R, the codebook size is 2^{8R}. When the nearest neighbor of an input vector cannot be found in the base codebooks, the Voronoi extension is applied, and the selected lattice point is represented by the sum of two codevectors: one from the base codebook Q3 or Q4, and the other from the extended codebook. In this case, an input vector is quantized with 12 + 8R or 16 + 8R bits.
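A conceptual sketch of this split is given below: any RE8 point y can be written as y = 2^r·z + v, where z is a candidate base-codebook point and v is a Voronoi codevector of extension order r. The offset a, the in_base_codebook test, and the maximum order are placeholders; the actual procedure of [16], [30] differs in its indexing details.

```python
import numpy as np

def voronoi_extension_split(y, nearest_point_re8, in_base_codebook, max_order=4):
    """Split an RE8 point y as y = (2**r) * z + v for the smallest extension
    order r whose base part z falls in a base codebook (Q3/Q4)."""
    y = np.asarray(y, dtype=float)
    a = np.full(8, 2.0 ** -6)               # assumed small tie-breaking offset
    for r in range(1, max_order + 1):
        m = 2 ** r
        z = nearest_point_re8((y - a) / m)  # base part (an RE8 lattice point)
        v = y - m * z                       # Voronoi codevector of order r
        if in_base_codebook(z):
            return r, z, v
    return None                             # no fit: the signal would be rescaled
```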

In real speech and audio coding systems, outliers may appear when the bit budget is limited. This quantization scheme is usually combined with a gain-shape scheme [1]. The signal is normalized by a gain that is estimated over the current frame of the signal according to a predefined bit budget, and then the signal is quantized.

In the speech and audio codecs mentioned above, quantization of the signal spectral parameters and transform coefficients is briefly described in the following steps:

Step 1. Normalize the signal by an estimated gain so that it fits within the predefined bit budget.

Step 2. Split the current frame of the signal into 8-dimensional vectors.

Step 3. Find the nearest neighbor in RE8 for each vector.

Step 4. Determine whether the selected lattice point is in the base codebooks. If it is, encode the index of the codebook used and stop. Otherwise, continue.

Step 5. Apply the Voronoi extension and find the codevectors from the base codebook and the extended codebook [16], [30].

Step 6. Compute and encode the index of the codebooks used.

Step 7. Stop, or iterate over the global gain to adjust the overall bit consumption.

3.2 Fast Lattice Vector Quantization

Fast lattice vector quantization (FLVQ) is an LVQ technique applied to low-complexity audio coding and is designed to quantize transform coefficients in transform coding [17], [19].

3.2.1 Overview of Fast Lattice Vector Quantization

In FLVQ, the quantizer comprises two subquantizers: a D8-based higher-rate lattice vector quantizer (HRQ) and an RE8-based lower-rate lattice vector quantizer (LRQ). HRQ is a multirate quantizer designed to quantize the input vector at rates greater than 1 bit/dimension. LRQ quantizes the input vector at 1 bit/dimension and uses spherical codes based on RE8 as the codebook. The codebooks of the FLVQ quantizer are constructed from a finite region of the lattice and match the probability density function (PDF) of the input vectors. The codewords of HRQ are algorithmically generated, and a fast quantization algorithm is used. The LRQ codebook of 256 codewords is stored in a structured lookup table, which allows a fast method for searching and indexing the codewords.

LVQ is optimal only for uniformly distributed sources. In transform coding, the distribution of transform coefficients is usually not uniform; therefore, entropy coding, such as Huffman coding, is applied to the quantization indices of HRQ to improve the efficiency of quantization in FLVQ.

3.2.2 Higher-Rate Quantization Based on D8

HRQ is based on the Voronoi code of D8 presented in section 2.3 and is designed to quantize input vectors at 2 bit/dimension to 9 bit/dimension in increments of 1 bit/dimension. The codebook of this subquantizer is constructed from a finite region of D8 and is not stored in memory. The codewords can be generated using a simple algebraic method.

To minimize the distortion for a given rate, D8 should be truncated and scaled. The input vectors are scaled instead of the lattice codebook so that the fast-search algorithm introduced in [24] can be used, and the reconstructed vectors are then rescaled at the decoder. However, this fast-search algorithm assumes an infinite lattice, which cannot be used as the codebook in real-time audio coding systems. In other words, for a given rate, the algorithm cannot be used to quantize input vectors lying outside the truncated lattice region. Therefore, a fast method for quantizing these outliers was developed for HRQ.

For a given rate of R bit/dimension, where 2 ≤ R ≤ 9, an 8-dimensional vector x = (x1, x2, x3, x4, x5, x6, x7, x8) is quantized as follows:

Step 1. Apply a small offset a = 2^{-6} to each component of x in order to avoid any lattice point on the boundary of the truncated Voronoi region, that is, x1 = x - a, where a = (2^{-6}, 2^{-6}, ..., 2^{-6}).

Step 2. Scale x1 by the scaling factor α: x2 = αx1. For a given R, the optimal scaling factor is selected experimentally.

Step 3. In D8, find the nearest lattice point v to x2. This can be done by using the searching algorithm described in [24].

Step 4. Assume v is a codeword in the Voronoi region truncated at rate R and compute the index vector k = (k1, k2, k3, k4, k5, k6, k7, k8) of v, where 0 ≤ ki < 2^R and i = 1, 2, ..., 8. The index k is computed from v using the generator matrix G for D8 [4] (a sketch of this indexing is given after the decoding steps below).

Step 5. Compute the codeword y from k using the algorithm described in [25], then compare y with v. If y and v are exactly the same, k is the index of the best codeword for x2; stop here. Otherwise, x2 is an outlier and is quantized in the following steps:

Step 6. Scale down x2 by 2: x2 = x2/2.

Step 7. In D8, find the nearest lattice point u to x2. Then compute the index vector j of u.

Step 8. Find y from j and then compare y with u. If y is different from u, repeat steps 6 to 8; otherwise, compute w = x2/16. Because the transform coefficients are normalized in transform coding, only a few iterations are typically needed to find a codeword for the outlier in the truncated lattice.

Step 9. Compute x2 = x2 + w.

Step 10. In D8, find the nearest lattice point u to x2. Then compute j of u.

Step 11. Find y from j, then compare y with u. If y and u are exactly the same, set k = j and repeat steps 9 to 11; otherwise, k is the index of the best codeword for x2; stop.

The decoding procedure of HRQ is simple:

Step 1. Find y from the received k according to R.

Step 2. Rescale y by the same scaling factor α used in the quantization process: y1 = y/α.

Step 3. Add the same offset a used in step 1 of the quantization process to the rescaled codeword y1: y2 = y1 + a, and then stop.
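The following sketch illustrates the Voronoi indexing used in steps 4-5 and the recovery of y from k in the decoder, following Conway and Sloane [25]. The generator matrix shown is one standard choice for D8 taken from [4]; the matrix convention, the offset, and the index ordering may differ from those of the actual codec.

```python
import numpy as np

# one standard generator matrix for D8 (rows are basis vectors), after [4];
# the codec may use a different but equivalent generator
G = np.array([[-1, -1,  0,  0,  0,  0,  0,  0],
              [ 1, -1,  0,  0,  0,  0,  0,  0],
              [ 0,  1, -1,  0,  0,  0,  0,  0],
              [ 0,  0,  1, -1,  0,  0,  0,  0],
              [ 0,  0,  0,  1, -1,  0,  0,  0],
              [ 0,  0,  0,  0,  1, -1,  0,  0],
              [ 0,  0,  0,  0,  0,  1, -1,  0],
              [ 0,  0,  0,  0,  0,  0,  1, -1]], dtype=float)
G_INV = np.linalg.inv(G)
A = np.full(8, 2.0 ** -6)          # the offset a of step 1

def nearest_point_d8(x):
    """Nearest D8 point to x (see the sketch in section 2.2)."""
    x = np.asarray(x, dtype=float)
    f = np.rint(x)
    if int(f.sum()) % 2 != 0:
        err = x - f
        i = int(np.argmax(np.abs(err)))
        f[i] += 1.0 if err[i] > 0 else -1.0
    return f

def voronoi_index(v, R):
    """Index vector k of a D8 point v lying in the truncated Voronoi region."""
    return np.mod(np.rint(v @ G_INV), 2 ** R).astype(int)

def voronoi_decode(k, R):
    """Codeword y recovered from index k at rate R bit/dimension."""
    t = k @ G
    return t - (2 ** R) * nearest_point_d8((t - A) / 2 ** R)
```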

The quantization efficiency of HRQ can be further improved by Huffman coding, an entropy-coding method that is most useful when the source is unevenly distributed [31]. The transform coefficients are typically unevenly distributed; hence, Huffman coding can improve the coding efficiency. In HRQ, Huffman coding is used to encode the quantization indices k and reduce the bit requirement.

3.2.3 Lower-Rate Quantization Based on RE8

LRQ is based on the RE8 lattice presented in section 2.2 and is designed to quantize input vectors at a rate of 1 bit/dimension. The Gosset lattice E8 (RE8) is the best lattice in 8 dimensions for most purposes [4]. However, as Table 1 shows, the computational complexity of the LVQ scheme based on E8 (RE8) is more than three times that of the LVQ scheme based on D8. To reduce this complexity, LRQ uses a table-based searching method and a table-based indexing method.

In LRQ, the codebook consists of all 256 codewords of the subquantizer Q2 of EAVQ described in subsection 3.1.1. However, the codewords are arranged in a particular order so that a fast indexing method can be used (Table 2). For each 8-dimensional input vector x = (x1, x2, x3, x4, x5, x6, x7, x8), quantization is performed as follows:

Step 1. Apply an offset a = 2^{-6} to each component of the vector x: x1 = x - a, where a = (2^{-6}, 2^{-6}, ..., 2^{-6}).

Step 2. Scale x1 by the scaling factor α: x2 = αx1. The optimal scaling factor is experimentally chosen.

Step 3. Obtain the new vector x3 by reordering the components of x2 in descending order.

Step 4. In Table 3, find the vector l that best matches x3 in terms of MSE. The vectors given in Table 3 are the leaders of the codewords, and any codeword in the codebook can be generated by permutation of its leader.

▼Table 2. Codebook of the lower-rate quantizer

▼Table 3. Leaders of the codewords of LRQ

▼Table 4. Flag vectors and index offsets of the leaders

Step 5. Obtain the best codeword y by reordering the components of l in the original order.

Step 6. Find the flag vector of l in Table 4 and obtain the vector z by reordering the components of the flag vector in the original order. The flag vectors are defined as follows:

·If the leader consists of -2, 2, and 0, then -2 and 2 are indicated by 1, and 0 is indicated by 0.

·If the leader consists of -1 and 1, then -1 is indicated by 1, and 1 is indicated by 0.

Step 7. Find the index offset K related to the leader l in Table 4.

Step 8. If l is (2, 0, 0, 0, 0, 0, 0, -2) and y has the component 2 at an index lower than that of the component -2, the offset K is adjusted so that K = K + 28.

Step 9. Compute the vector dot product i = z·p^T, where p = (1, 2, 4, 8, 16, 32, 64, 128).

Step 10. From i, find the index increment j related to y in Table 5.

Step 11. Compute the index k of y: k = K + j, and then stop (a sketch of this flag-vector indexing follows Table 5).

The following are the steps taken in the decoding procedure of LRQ:

Step 1. Find the codeword y in Table 2 from the received index k.

Step 2. Rescale the codeword y by the same scaling factor α used in the quantization process: y1 = y/α.

Step 3. Add the same offset a used in step 1 of the encoding procedure to the rescaled codeword y1: y2 = y1 + a, and then stop.

▼Table 5. Index increments related to the codewords of LRQ
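The flag-vector bookkeeping of steps 6-11 can be sketched as follows: the flag vector z is derived from the codeword with the two rules above, the dot product i = z·p^T selects an entry of Table 5, and the index is k = K + j. The offset K and the increment table are placeholders here because Tables 4 and 5 are not reproduced.

```python
import numpy as np

P = np.array([1, 2, 4, 8, 16, 32, 64, 128])

def flag_vector(y):
    """Flag vector z of a codeword y, following the two rules given above."""
    y = np.asarray(y)
    if np.any(np.abs(y) == 2):                # leader built from -2, 2 and 0
        return (np.abs(y) == 2).astype(int)
    return (y == -1).astype(int)              # leader built from -1 and 1

def lrq_index(y, K, increment_table):
    """Index k = K + j, with j looked up from i = z . p (steps 9-11)."""
    z = flag_vector(y)
    i = int(z @ P)
    return K + increment_table[i]
```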

3.2.4 Application to Low-Complexity Full-Band Audio Coding

FLVQ has been applied to 20 kHz audio coding in ITU-T Recommendation G.719 [19]. ITU-T G.719 is the first full-band audio codec of the ITU-T and was developed for low-complexity full-band audio coding for high-quality conversational applications. The G.719 codec is based on transform coding and operates on frames of 20 ms, corresponding to 960 samples at a sampling rate of 48 kHz. The codec provides an audio bandwidth of 20 Hz to 20 kHz, operates from 32 kbit/s up to 128 kbit/s, and has an algorithmic delay of 40 ms. The G.719 codec features very high audio quality and extremely low computational complexity compared with other state-of-the-art audio coding algorithms. It is suitable for use in applications such as videoconferencing, telepresence, teleconferencing, streaming audio over the Internet, and IPTV.

▲Figure 4. FLVQ applied to audio coding in G.719.

▼Table 6. Computational complexity of FLVQ in G.719

In the G.719 encoder, the input audio signal sampled at 48 kHz is converted by an adaptive time-frequency transform [19] from the time domain into the frequency domain. For every 20 ms, the input audio samples are transformed into 960 transform coefficients. FLVQ is used to quantize the transform coefficients x (Fig. 4).

After the transform, the obtained transform coefficients are grouped into sub-bands of unequal length (8, 16, 24, or 32 coefficients). Because the audio bandwidth is 20 kHz, only 800 transform coefficients are used; the 160 transform coefficients representing frequencies above 20 kHz are ignored. The power p_s of each sub-band is defined as the root-mean-square value of the sub-band and is given by

p_s = √( (1/N) Σ_{i=1}^{N} x_i^2 )

where x_i are the transform coefficients of the sub-band, and N is the number of coefficients in the sub-band, that is, 8, 16, 24, or 32. The resulting spectral envelope, comprising the powers of all sub-bands, is quantized and encoded. An adaptive bit-allocation scheme based on the quantized powers of the sub-bands is used to assign the available bits in a frame among the sub-bands. The number of bits assigned to each transform coefficient can be as large as 9 bits, depending on the input signal. In each sub-band, the transform coefficients are normalized by the quantized power p_sq.
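A small sketch of the spectral-envelope computation and sub-band normalization described above follows; the sub-band grouping passed in is illustrative and is not the actual G.719 band layout.

```python
import numpy as np

def subband_powers(coeffs, band_sizes):
    """Root-mean-square power of each sub-band of the transform coefficients."""
    coeffs = np.asarray(coeffs, dtype=float)
    powers, start = [], 0
    for n in band_sizes:
        band = coeffs[start:start + n]
        powers.append(np.sqrt(np.mean(band ** 2)))
        start += n
    return np.array(powers)

def normalize_subbands(coeffs, band_sizes, powers_q):
    """Divide each sub-band by its quantized power before FLVQ."""
    out, start = np.array(coeffs, dtype=float), 0
    for n, p in zip(band_sizes, powers_q):
        out[start:start + n] /= max(float(p), 1e-9)
        start += n
    return out
```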

Each sub-band consists of one or more 8-dimensional coefficient vectors. Thus, the normalized coefficients x_n are quantized in 8-dimensional vectors by using the FLVQ scheme described above. If a sub-band is assigned 1 bit per coefficient, the lower-rate quantizer LRQ is used to quantize the normalized coefficients of that sub-band; otherwise, the coefficients are quantized by the higher-rate quantizer HRQ. The 8-dimensional coefficient vectors have a high concentration of probability around the origin; therefore, Huffman coding is an option for the quantization indices of HRQ.

When the rate is smaller than 6 bit/coefficient, the total number of bits needed for all sub-bands is computed. If the Huffman-coded bits are fewer than the allocated bits, Huffman coding is applied to the quantization indices, and a Huffman code flag is set; the saved bits are used to quantize the coefficients of the sub-bands assigned 0 bits. If the Huffman-coded bits are not fewer than the allocated bits, Huffman coding is not used, and the Huffman code flag is cleared. In each case, the Huffman code flag is transmitted as side information to the decoder. In this way, the best coding method is always used.
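This decision can be sketched as follows; the bit-counting callable is a placeholder for the codec's actual Huffman code.

```python
def choose_index_coding(indices, allocated_bits, huffman_bits_of):
    """Return (use_huffman, saved_bits): Huffman coding of the HRQ indices is
    used only when it needs fewer bits than the allocation, and the saved bits
    are then available for sub-bands that received 0 bits."""
    huffman_bits = sum(huffman_bits_of(k) for k in indices)
    if huffman_bits < allocated_bits:
        return True, allocated_bits - huffman_bits   # set the Huffman code flag
    return False, 0                                  # clear the Huffman code flag
```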

Table 6 shows the computational complexity of FLVQ in 16/32-bit fixed point for several bit rates in G.719. Computational complexity is measured in weighted million operations per second (WMOPS) using the basic operators of the ITU-T Software Tool Library STL2005 v2.2 in ITU-T G.191 [32]. The ROM memory usage of the LRQ and HRQ tables is shown in Table 7.

Low computational complexity and low storage requirements are major advantages of using FLVQ for transform-based audio coding.

In February 2008, subjective tests for the ITU-T G.719 Optimization/Characterization phase were performed by independent listening laboratories in English, French, and Spanish according to a test plan designed by the ITU-T Q7/SG12 Speech Quality Experts Group (SQEG) [33]. Statistical analysis of the test results showed that the G.719 codec met all performance requirements [34]. An additional subjective listening test for G.719 was conducted later to evaluate the quality of the codec at rates higher than those described in the ITU-T test plan [35]. These test results showed that transparency was reached for critical material at 128 kbit/s.

The computational complexity of the G.719 codec in 16/32-bit fixed point was estimated by encoding and decoding the source material used for the subjective test of the G.719 Optimization/Characterization phase. Using FLVQ, the computational complexity of G.719 is quite low, for example, 15.397 WMOPS at 32 kbit/s, 18.060 WMOPS at 64 kbit/s, and 21.000 WMOPS at 128 kbit/s [19].

▼Table 7. ROM memory usage of FLVQ in G.719 (in 16-bit words)

4 Conclusion

LVQ has many advantages and is suitable for use in low-complexity transform-based speech and audio coding.

Embedded algebraic vector quantization (EAVQ) has been applied to speech and audio coding to efficiently quantize spectral vectors in transform coding, for example, TCX coding. Based on the EAVQ technique, split multirate LVQ has been developed and successfully used in several speech and audio coding standards, including 3GPP AMR-WB+, ITU-T G.718, G.711.1 Annex D, G.722 Annex B, and MPEG unified speech and audio coding (USAC).

Fast lattice vector quantization (FLVQ) has been applied to low-complexity full-band audio coding in ITU-T Recommendation G.719 and is designed to quantize transform coefficients in transform coding. A fast encoding algorithm is used, and an efficient method for quantizing outliers has been developed; hence, the computational complexity of G.719 is quite low. In addition, Huffman coding is optionally applied to the quantization indices to further improve the efficiency of the quantizer.

Acknowledgment

The author would like to thank Dr. Stéphane Ragot for valuable comments and discussions on this paper. The author also thanks the reviewers for their helpful suggestions in improving the presentation of the paper.
