R. E. Jones, J. A. Templeton, C. M. Sanders and J. T. Ostien
Abstract: We use machine learning (ML) to infer stress and plastic flow rules using data from representative polycrystalline simulations. In particular, we use so-called deep (multilayer) neural networks (NNs) to represent the two response functions. The ML process does not choose appropriate inputs or outputs; rather, it is trained on selected inputs and outputs. Likewise, its discrimination of features is crucially connected to the chosen input-output map. Hence, we draw upon classical constitutive modeling to select inputs and enforce well-accepted symmetries and other properties. In the context of the results of numerous simulations, we discuss the design, stability and accuracy of constitutive NNs trained on typical experimental data. With these developments, we enable rapid model building in real-time with experiments, and guide data collection and feature discovery.
Keywords: Machine learning, constitutive modeling, plasticity, invariance.
Our effort to produce viable models of plasticity from trusted data draws upon traditional constitutive modeling theory and newly developed machine learning (ML) techniques.
The theory of constitutive function representation has a long history, going back to the beginnings of the Rational Mechanics movement. Much of the pioneering work was done by Rivlin, Pipkin, Smith, Spencer, Boehler, and co-workers [Spencer and Rivlin (1958a,b, 1962); Pipkin and Wineman (1963); Wineman and Pipkin (1964); Smith and Rivlin (1964); Smith (1965); Rivlin and Smith (1969); Spencer (1971, 1987); Boehler (1987)]. Later, Zheng contributed a notable monograph on the application of representation theory to anisotropy [Zheng (1994)]. Many of these results have been condensed in Spencer's monograph [Spencer (1971)], Truesdell and Noll's treatise [Truesdell and Noll (2004)], Gurtin's text [Gurtin (1982)], and the recent book by Itskov [Itskov (2007)].
The application of machine learning (ML) to engineering dates back to at least the 1980s and covers a wide variety of problems. For instance, Adeli et al. [Adeli and Yeh (1989)] applied ML to the design of steel beams; Hajela et al. [Hajela and Berke (1991)] used a ML model as a surrogate for the exact response of structures to enable fast optimization; Cheu et al. [Cheu and Ritchie (1995)] applied ML to traffic modeling; and Theocaris et al. [Theocaris and Panagiotopoulos (1993)] used it to model fracture behavior and identification. For further bibliography along these lines, see a review of neural network applications in civil engineering that appeared in 2001 [Adeli (2001)].
Research on applying ML to constitutive modeling dates back to roughly the same time period. In solid mechanics in particular, Ghaboussi et al. [Ghaboussi, Pecknold, Zhang et al. (1998)] applied a neural network (NN) to data from experiments of beam deflection. They created a model which acquired increasing fidelity as experiments progressed via hierarchical learning and the adaptive addition of new hidden layers. Furukawa et al. [Furukawa and Yagawa (1998)] constructed an "implicit" model of linear viscoplasticity with a NN based on a state space formulation, where the NN provided the driving term for plastic evolution and the elastic response was assumed to be known. Notably, they expressed a need for variety in the training data.
More recently, a number of studies have appeared comparing NN plasticity models to other models calibrated on experimental data for specific materials. Lin et al. [Lin, Zhang and Zhong (2008)] built a NN model of the flow stress of low alloy steel based on only experimentally observable quantities. Bobbili et al. [Bobbili, Ramakrishna, Madhu et al. (2015)] constructed a NN model of high strain rate Hopkinson bar tests of 7017 aluminum alloy and compared it to a Johnson-Cook model. For T24 steel, Li et al. [Li, Wang, Wei et al. (2012)] compared a NN model to a modified Zerilli-Armstrong and a strain-compensated Arrhenius-type model. They remarked on the opacity of the NN model and the need for extensive data. Desu et al. [Desu, Guntuku, Aditya et al. (2014)] made flow stress predictions of austenitic 304 stainless steel with a support vector machine construct and compared it to a NN model. Asgharzadeh et al. [Asgharzadeh, Aval and Serajzadeh (2016)] modelled the flow stress behavior of AA5086 aluminum using a NN with two hidden layers. Also, in the realm of fluid mechanics, Ling et al. [Ling and Templeton (2015); Ling, Kurzawski and Templeton (2016)], Duraisamy et al. [Tracey, Duraisamy and Alonso (2015); Duraisamy, Zhang and Singh (2015)], and Milano et al. [Milano and Koumoutsakos (2002)] have been particularly active in applying machine learning techniques to model turbulence [Wang, Wu, Ling et al. (2017)]. Unlike traditional models based on physical mechanisms and intuition, these ML models are purely data-driven and phenomenological. Recently, mathematical analysis has been applied to understanding the training and response structure of NNs, which have traditionally been treated as black boxes. The work of Tishby and co-workers [Shwartz-Ziv and Tishby (2017)] and Koh et al. [Koh and Liang (2017)] is particularly illuminating and explores the trade-offs between information compression and prediction accuracy in the training process.
In the wider context of data-driven modeling, a number of recent developments [Alharbi and Kalidindi (2015); Kirchdoerfer and Ortiz (2016); Smith, Xiong, Yan et al. (2016); Versino, Tonda and Bronkhorst (2017); Bessa, Bostanabad, Liu et al. (2017)] are also noteworthy. Alharbi et al. [Alharbi and Kalidindi (2015)] constructed a database of Fourier-transformed microstructural data and used this spectral information to drive the evolution of crystal plasticity simulations. Kirchdoerfer et al. [Kirchdoerfer and Ortiz (2016)] sought to subvert the traditional empirical model in the data-to-model-to-prediction chain and replace it with a penalization of the prediction response by its distance to the closest experimental observation/data point. This approach of directly using a database is commendable (and avoids the data interpolation which appears, for example, in Shaughnessy et al. [Shaughnessy and Jones (2016)]). The optimization was constrained by conservation principles like a Newtonian force balance and was applied to truss and elasticity problems. The authors explored the technique's robustness to noise and its convergence. Versino et al. [Versino, Tonda and Bronkhorst (2017)] applied a genetic/evolutionary algorithm and symbolic regression to model Taylor impact test data. The symbolic regression ML technique selects a best model composed of given analytic building blocks and is especially attractive since the resulting tree structure leads to a physically interpretable model based on the physics embedded in the building-block sub-models. Lastly, Bessa et al. [Bessa, Bostanabad, Liu et al. (2017)] integrated design of experiments, simulation, and machine learning in materials discovery and design. It should be noted that the Materials Genome and similar material discovery and selection efforts [Jain, Ong, Hautier et al. (2013); Saal, Kirklin, Aykol et al. (2013); Raccuglia, Elbert, Adler et al. (2016)] are a deep and active field of research, but this classification problem has minor bearing on the constitutive modeling task at hand.
In the vein of designing NN architectures suited to specific tasks, the method we adopt and generalize, the Tensor Basis Neural Network (TBNN) [Ling, Jones and Templeton (2016)], is not simply a feed-forward, deep neural network. Unlike other NN mechanics models of the components of output quantities, such as stress, TBNN models have built-in invariance properties. The TBNN formulation shifts the basis for the unknown coefficient functions from the (arbitrary) Cartesian basis of the training data to an objective basis made up of powers of the selected inputs, as representation theory [Spencer (1971); Truesdell and Noll (2004)] suggests. This comes with the cost that the coefficient functions and basis are not linearly independent, i.e. they must be trained simultaneously. This representation is akin to the Gaussian Approximation Potential (GAP) with the Smooth Overlap of Atomic Positions (SOAP) basis [Bartók and Csányi (2015)] that is gaining popularity in molecular dynamics, in that this machine-learned constitutive function uses a spectral basis to preserve rotational and permutational invariance. It also has goals in common with image transforms that embed invariance properties [Khotanzad and Hong (1990); Lowe (1999)].
Motivated by the goal of achieving on-the-fly model construction, directed sampling/experiments, and discovery of features/trends in large datasets, in this work we show how classical constitutive modeling is needed to obtain viable ML models of constitutive behavior. In Section 2, we provide the fundamentals of representation and plasticity theories and connect them with our NN formulation of the components of plasticity, namely the stress and flow rules. In Section 3, we discuss how the data to train the models is obtained, the specifics of the learning algorithm, and the time integration algorithm used to predict the plastic evolution. One of the data sets is obtained from the elastic-plastic response of an ensemble of oligo-crystalline aggregates, and so the resulting NN model can be considered a form of homogenization. The results of these developments are discussed in Section 4 and include comparisons of various model architectures. These comparisons are based on cross-validation errors and evaluations of stability and prediction accuracy in the context of adequate training. Finally, in Section 5, we discuss results and innovations, such as the generalized tensor basis architecture, the novel ways of embedding physical constraints in the formulation, and the exploration of data sufficiency, robustness, and stability in the context of commonly available data.
In this section we provide a concise overview of representation theory and how we apply it in the context of constitutive modeling by (artificial) neural networks (NNs). Specifically, we employ a generalization of the Tensor Basis Neural Network (TBNN) [Ling, Jones and Templeton (2016)] concept based on an understanding of classical representation theory. With it we construct models that represent the selected output as a function of inputs with complete generality and compact simplicity. This construction is distinct from the predominance of component-based NN constructions, for example those mentioned in the Introduction, in that basic symmetries, such as frame invariance, are built into the representation and do not need to be learned.
Representation theorems for functions of tensors have a foundation in group theory [Olver (2000); Goodman and Wallach (1998, 2009); Sattinger and Weaver (2013)], with the connection being that symmetry is described as functional invariance under group action. In mechanics, the relevant invariance under group action is with respect to rotations (and translations) of the coordinate system, which is known as material frame indifference, invariance under superposed rigid body motions, or simply objectivity. (Frame indifference is a special case of the more general principle of covariance with changes of the metric tensor [Marsden and Hughes (1983)].) This is a fundamental and exact symmetry. Practical applications of representation theory to mechanics are given in Truesdell and Noll's monograph [Truesdell and Noll (2004)] and Gurtin's text [Gurtin (1982)] and address complete, irreducible representations of general functions of physical vector and tensor arguments. For example, the scalar function f(A) of a (second order) tensor A is invariant if
f(G A G^T) = f(A)    (1)
and a (second order) tensor-valued function M(A) is objective if
M(G A G^T) = G M(A) G^T    (2)
for every member G of the orthogonal group.
Underpinning the representations of f and M are a number of theorems. The spectral theorem states that any symmetric second order tensor A has the spectral representation:
A = Σ_i λ_i a_i ⊗ a_i    (3)
composed of its eigenvalues {λ_i} and eigenvectors {a_i}, where i = 1, 2, 3. The spectral representation makes powers of A take a simple form: A^n = Σ_i λ_i^n a_i ⊗ a_i, and in particular A^0 ≡ I. The equally important Cayley-Hamilton theorem states that the tensor A satisfies its own characteristic equation:
A^3 - J_1 A^2 + J_2 A - J_3 I = 0    (4)
where {J_i} are the principal (scalar) invariants of A. The (generalized) Rivlin identities [Rivlin (1955); Rivlin and Smith (1997)] provide similar relations for multiple tensors and their joint invariants.
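Both theorems are straightforward to check numerically. The following sketch (ours, not part of the original development) verifies the Cayley-Hamilton identity, Eq. (4), and the spectral form of tensor powers for a random symmetric tensor using standard NumPy calls only.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
A = 0.5 * (A + A.T)  # symmetrize so the spectral theorem applies

# Principal invariants J1, J2, J3 of A
J1 = np.trace(A)
J2 = 0.5 * (np.trace(A) ** 2 - np.trace(A @ A))
J3 = np.linalg.det(A)

# Cayley-Hamilton, Eq. (4): A^3 - J1 A^2 + J2 A - J3 I = 0
residual = A @ A @ A - J1 * (A @ A) + J2 * A - J3 * np.eye(3)
assert np.allclose(residual, 0.0)

# Spectral form of powers, e.g. A^2 = sum_i lambda_i^2 a_i (x) a_i
lam, vecs = np.linalg.eigh(A)
A2_spectral = sum(l**2 * np.outer(v, v) for l, v in zip(lam, vecs.T))
assert np.allclose(A2_spectral, A @ A)
```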
Scalars that respect Eq. (1), such as {J_i}, are called scalar invariants and are formed from polynomials or, more generally, functions of the eigenvalues of A. Hence, f(A) reduces to
f(A) = f̂(I)    (5)
where I is a set of scalar invariants of A, and consequently f is also an invariant. A set of invariants I is considered irreducible if each of its elements cannot be represented in terms of the others, and it conveys a sense of completeness and simplicity. (In some sense, a complete set of invariants are coordinates on the manifold induced by the symmetry constraints and hence are clearly not unique in their ability to coordinatize the manifold.) Since the eigenvalues {λ_i} are costly to compute, typically traces such as {tr A = Σ_i λ_i, tr A^2 = Σ_i λ_i^2, tr A^3 = Σ_i λ_i^3} are employed as scalar invariants. Joint invariants of a functional basis for multiple arguments are formed with the help of Pascal's triangle.
For tensor-valued functions such as M(A) in Eq. (2), a power series representation
M(A) = Σ_{i=0}^∞ c_i A^i    (6)
provides a starting point for a more compact representation. The coefficient functions c_i are represented in terms of scalar invariants as in Eq. (5). This power series representation can then be reduced by application of the Cayley-Hamilton theorem (4), in the recursive form A^{j+3} = J_1 A^{j+2} - J_2 A^{j+1} + J_3 A^j. The transfer theorem (as referred to by Gurtin [Gurtin (1982)]) states that isotropic functions such as M(A) inherit the eigenvectors of their arguments, which implies that these functions are co-linear with their arguments. Also Wang's lemma (I, A, A^2 span the space of all tensors co-linear with A) is a consequence of Eq. (3) and Eq. (4), and gives a sense of completeness of the representation:
M(A) = c_0(I) I + c_1(I) A + c_2(I) A^2    (7)
Eq. (7) evokes the general representation for a symmetric tensor function of an arbitrary number of arguments in terms of a sum of scalar coefficient functions multiplying the corresponding elements of the tensor basis. The general methodology for constructing the functional basis to represent scalar functions is given in Rivlin et al. [Rivlin and Ericksen (1955)], and the corresponding methodology to construct tensor bases is developed in Wang [Wang (1969, 1970)].
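To make Wang's lemma concrete, consider the matrix exponential, an isotropic function of a symmetric tensor: by Eq. (7) it must lie in span{I, A, A^2}. The sketch below (ours; it assumes distinct eigenvalues so the Vandermonde system is invertible) recovers the coefficients c_0, c_1, c_2 directly from the eigenvalues, illustrating the transfer theorem.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
A = 0.5 * (A + A.T)

# expm(A) is an isotropic function of A, so by Eq. (7) it equals
# c0*I + c1*A + c2*A^2 for scalars depending only on the invariants.
M = expm(A)

# Match eigenvalues: exp(lam) = c0 + c1*lam + c2*lam^2 for each lam
lam, _ = np.linalg.eigh(A)
V = np.vander(lam, 3, increasing=True)  # rows [1, lam, lam^2]
c = np.linalg.solve(V, np.exp(lam))

M_basis = c[0] * np.eye(3) + c[1] * A + c[2] * (A @ A)
assert np.allclose(M, M_basis)
```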
Representation theory, like machine learning, does not determine the appropriate inputs and outputs for constitutive functions. In mechanics, there is a certain amount of fungibility to both. For instance, the (spatial) Cauchy stress can easily be transformed into the (referential) first Piola-Kirchhoff stress, and the left and right Cauchy-Green stretches have the same eigenvalues but different eigenbases. Also, any of the Seth-Hill/Doyle-Ericksen strain family [Seth (1961); Hill (1968); Doyle and Ericksen (1956)] provide equivalent information on deformation, and any of the objective rates formed from Lie derivatives [Johnson and Bammann (1984); Szabó and Balla (1989); Haupt and Tsakmakis (1989, 1996)] provide equivalent measures of the rate of deformation; however, some choices of arguments and outputs lead to greater simplicity in the representation than others.
Lastly, it is important to note that isotropic functions are not restricted to isotropic responses. The addition of a structure tensor characterizing the material symmetry to the arguments allows isotropic function theory to be applied so that the joint invariants encode anisotropies [Smith and Rivlin (1957b,a); Spencer (1982); Zhang and Rychlewski (1990); Svendsen (1994); Zheng (1994)].
Briefly, plasticity is an inelastic, history-dependent process due to dislocation motion or other dissipative phenomena. We assume the usual multiplicative decomposition [Lee (1969); Lubarda (2004)] of the total deformation gradient F into elastic (reversible) F_e and plastic (irreversible) F_p components:
F = F_e F_p    (8)
is valid. As a consequence, the velocity gradient in the current configuration, l ≡ Ḟ F^{-1}, can be additively decomposed into elastic and plastic components:
l = l_e + l_p,   l_e ≡ Ḟ_e F_e^{-1},   l_p ≡ F_e (Ḟ_p F_p^{-1}) F_e^{-1}    (9)
refer to Lubliner [Lubliner (2008)]. The assumption that F_p is pure stretch (no rotation) reduces the plastic velocity gradient L_p ≡ Ḟ_p F_p^{-1} to D_p ≡ sym L_p. The elastic deformation determines the stress, for instance the Cauchy stress T:
T = T(F_e)    (10)
and the evolution of the plastic state is determined by a flow rule, e.g.:
Ḟ_p = D_p(F_p, T) F_p    (11)
where F_p quantifies the plastic state and T the driving stress. Invariance allows the reduction of the argument of T to, for example, the objective, elastic Almansi strain e_e = ½(I - b_e^{-1}) based on the left Cauchy-Green/Finger stretch tensor b_e = F_e F_e^T. Similarly, the state variable in the flow rule can be reduced by applying invariance, for example, to b_p = F_p F_p^T. The driving stress can be attributed to the deviatoric part of the pull-back of the Cauchy stress T: σ = dev(F_e^{-1} T F_e^{-T}), which is also invariant and also coexists in the intermediate configuration with D_p. Furthermore, a deviatoric tensor basis element, such as σ, generates an isochoric flow which respects plastic incompressibility, det F_p ≡ 1. Other choices of the inputs and outputs of the stress and flow functions are discussed in the Results section. Typically both the stress and the flow are derived from potentials, despite the fact that a potential usually cannot be measured directly, to ensure elastic energy conservation for the stress and associative flow for the flow rule; however, in this work we allow for a more general flow and a non-differentiable NN model. Also worth mentioning are the complex requirements for elastic stability, see Marsden et al. [Marsden and Hughes (1983)], that we do not attempt to embed in the formulation, mainly because they require a potential.
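For reference, the following helper (our sketch; the function names are illustrative) assembles the invariant inputs just described, e_e, b_p, and σ, from the kinematic quantities F_e and F_p and the Cauchy stress T.

```python
import numpy as np

def dev(A):
    """Deviatoric part of a second order tensor."""
    return A - np.trace(A) / 3.0 * np.eye(3)

def model_inputs(F_e, F_p, T):
    """Invariant inputs for the stress and flow models: the elastic
    Almansi strain e_e, the plastic stretch b_p, and the deviatoric
    driving stress sigma (pull-back of the Cauchy stress T)."""
    b_e = F_e @ F_e.T                         # left Cauchy-Green (Finger) stretch
    e_e = 0.5 * (np.eye(3) - np.linalg.inv(b_e))
    b_p = F_p @ F_p.T                         # plastic state variable
    Fei = np.linalg.inv(F_e)
    sigma = dev(Fei @ T @ Fei.T)              # driving stress, intermediate configuration
    return e_e, b_p, sigma
```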
A few basic properties are built into traditional empirical models that need to be learned in typical NN models. First, zero strain, e_e = 0, implies zero stress:
T(e_e = 0) = 0    (12)
and, likewise, zero driving stress should result in zero plastic flow:
D_p(σ = 0) = 0    (13)
Also there is a dissipation requirement for the plastic flow. Generally speaking, the Coleman et al. [Coleman and Noll (1963)] argument, together with the first and second laws of thermodynamics, applied to a free energy in terms of the elastic deformation and a plastic history variable results in: (a) the stress being conjugate to the elastic strain rate, and (b) the internal, plastic state variable, when it evolves, reducing the change in free energy so that M · L_p ≥ 0, where M is the Mandel stress
M = det(F_e) F_e^T T F_e^{-T}    (14)
or equivalently
σ · D_p ≥ 0    (15)
refer to Lubliner [Lubliner (2008)]. Also, given the physics of dislocation motion, it is commonly assumed that the plastic deformation is incompressible, det F_p = 1, which implies the flow is deviatoric:
tr D_p = 0    (16)
For more details see the texts [Lubliner (2008); Simo and Hughes (1998); Gurtin, Fried and Anand (2010)].
As mentioned in the Introduction, we generalize the Tensor Basis Neural Network (TBNN) formulation [Ling, Jones and Templeton (2016)] to build NN representations for the stress relation, Eq. (10), and the plastic flow rule, Eq. (11), that embed a number of symmetries and constraints. Both T and D_p are required to be isotropic functions of their arguments by invariance. As discussed in Section 2.1, classical representation theorems give the general form
f(A) = Σ_i f_i(I) B_i    (17)
where A ≡ {A_1, A_2, ...} are the pre-supposed dependencies/arguments of the function f, I ≡ {I_j} is an (irreducible) set of scalar invariants of A, and B = {B_i} is the corresponding tensor basis. In Eq. (17), only the scalar coefficient functions {f_i} are unknown once the inputs have been selected; hence, they are represented with a dense NN using the selected scalar invariants I as inputs embedded in the overall TBNN structure. In the TBNN framework, the sum of the products of the NN functions {f_i(I)} and the corresponding tensor basis elements {B_i} in Eq. (17) is accomplished by a so-called merge layer, and the functions {f_i} are trained simultaneously (refer to Fig. 1; more details will be given in Section 3.2). This formulation is in contrast to the standard, component-wise NN formulation:
[f]_{ij} = NN_{ij}([A_1]_{kl}, [A_2]_{kl}, ...)    (18)
which is based on components of both the inputs {A_1, A_2, ...} and the output f.
For the stress, we assume a single symmetric tensor input selected from the Seth-Hill/Doyle-Ericksen elastic strain family, in particular e_e, is sufficient, so that the representation Eq. (7):
T = σ_0(I) I + σ_1(I) e_e + σ_2(I) e_e^2,   I = {tr e_e, tr e_e^2, tr e_e^3}    (19)
is appropriate. Despite this formulation being based on strain, versus stretch, it does not embed the zero stress property, Eq. (12), and, hence, σ_0(I) will need to learn that zero strain implies zero stress. We prefer to impose, rather than learn, physical constraints such as Eq. (12) since this reduces the necessary training data [Ling, Jones and Templeton (2016)], and exact satisfaction leads to conservation and other properties necessary for stability, etc. Exact satisfaction of Eq. (12) can be accomplished in a few different ways, for instance by shifting the basis with the Cayley-Hamilton theorem (4):
T = σ̂_1(I) e_e + σ̂_2(I) e_e^2 + σ̂_3(I) e_e^3    (20)

σ̂_1 = σ_1 + σ_0 J_2/J_3,   σ̂_2 = σ_2 - σ_0 J_1/J_3,   σ̂_3 = σ_0/J_3    (21)
so there is a (weak) equivalence between the coefficient functions of the various representations. Here {J_i} are the principal invariants of e_e.
As mentioned in Section 2.2, we assume that the inputs to the flow rule are (a) a history variable b_p, and (b) the driving stress σ. A general function representation from classical theory for an isotropic function of two (symmetric) tensor arguments requires ten invariants [Rivlin (1955)] (see also Boehler [Boehler (1987)]):
I = {tr b_p, tr b_p^2, tr b_p^3, tr σ, tr σ^2, tr σ^3, tr(b_p σ), tr(b_p σ^2), tr(b_p^2 σ), tr(b_p^2 σ^2)}    (22)
and eight tensor generators/basis elements
B = {I, b_p, b_p^2, σ, σ^2, sym(b_p σ), sym(b_p^2 σ), sym(b_p σ^2)}    (23)
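A direct transcription of Eqs. (22) and (23) may clarify the bookkeeping. The sketch below (ours) computes the ten invariants and eight generators for a pair of symmetric tensors; the ordering of the elements is our convention, not necessarily that of Tab. 3.

```python
import numpy as np

def sym(A):
    return 0.5 * (A + A.T)

def flow_inputs(b_p, s):
    """The ten scalar invariants, Eq. (22), and eight tensor generators,
    Eq. (23), for an isotropic function of two symmetric tensors (b_p, sigma)."""
    b2, s2 = b_p @ b_p, s @ s
    tr = np.trace
    invariants = np.array([
        tr(b_p), tr(b2), tr(b2 @ b_p),     # invariants of b_p
        tr(s), tr(s2), tr(s2 @ s),         # invariants of sigma (tr sigma == 0 here)
        tr(b_p @ s), tr(b2 @ s),           # joint invariants
        tr(b_p @ s2), tr(b2 @ s2),
    ])
    basis = [np.eye(3), b_p, b2, s, s2,
             sym(b_p @ s), sym(b2 @ s), sym(b_p @ s2)]
    return invariants, basis
```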
Plastic incompressibility, in the form of deviatoric plastic flow, Eq. (16), can be imposed by applying the linear operator dev:
D_p = dev Σ_i f_i(I) B_i = Σ_i f_i(I) dev B_i    (24)
Dissipation of the plastic flow can be strictly imposed by requiring that the flow be directly opposed to the stress in Eq. (15), which implies:
D_p = f_1(I) σ + f_3(I) dev(σ^3)    (25)
with f_1(I) > 0 and f_3(I) > 0. In this study we will rely on the learning process to ensure the positivity of the coefficient functions f_1 and f_3, but this could be accomplished exactly with the Macaulay bracket (ramp function) applied to f_1 and f_3, for example.
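For example, a strict version of this positivity constraint is a one-line wrapper on the coefficient outputs (our sketch):

```python
import numpy as np

def macaulay(x):
    """Macaulay bracket <x> = max(x, 0), applied to the coefficient
    outputs f1 and f3 to enforce their positivity exactly, rather than
    relying on training to learn it."""
    return np.maximum(x, 0.0)

# e.g. a dissipative flow per Eq. (25), with f1, f3 from the coefficient NN:
# D_p = macaulay(f1) * sigma + macaulay(f3) * dev(sigma @ sigma @ sigma)
```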
We train the NN models of plasticity with data from two traditional plasticity models. In this section we give details of (a) the traditional models, (b) the training of the NNs, and (c) the numerical integration of the TBNN plasticity model.
In an exploration of the fundamental properties of NNs applied to plasticity, we seek to represent the responses of two models: (a) a polycrystalline representative volume element (RVE) with grain-wise crystal plasticity (CP) response (an unknown closed-form model, since the polycrystalline aspect of the CP model obscures its closed form), and (b) a simple visco-plasticity (VP) material point (a known closed-form model). Both are finite deformation models, so that invariance and finite rotation are important; and both are visco-plastic in the sense of lacking a well-defined yield surface and having a strictly dissipative character. Briefly, crystal plasticity (CP) is a well-known meso-scale model of single crystal deformation. Here we use crystal plasticity to prescribe the response of individual crystals in a perfectly bonded polycrystalline aggregate. The theoretical development of CP is described in Taylor et al. [Taylor (1934); Kroner (1961); Bishop and Hill (1951b,a); Mandel (1965)] and the computational aspects in reviews [Dawson (2000); Roters, Eisenlohr, Hantcherli et al. (2010)].
Specifically, for the crystal elasticity, we employ a St. Venant stress rule formulated with the second Piola-Kirchhoff stress mapped to the current configuration
T = (1/det F_e) F_e (C E_e) F_e^T    (26)
where the elastic modulus tensor C = C_11 K + C_12 (I - K) + C_44 (J - 2K) has cubic crystal symmetries with C_11, C_12, C_44 = 204.6, 137.7, 126.2 GPa, and E_e = ½(F_e^T F_e - I) is the elastic Lagrange strain. Here [I]_{ijkl} = δ_{ij}δ_{kl}, [J]_{ijkl} = δ_{ik}δ_{jl} + δ_{il}δ_{jk}, [K]_{ijkl} = δ_{ij}δ_{kl}δ_{ik} (no sum implied), δ_{ij} is the Kronecker delta, and [C E_e]_{ij} = Σ_{k,l} [C]_{ijkl} [E_e]_{kl}. Plastic flow can occur on any of the 12 face-centered cubic (FCC) slip systems. Each crystallographic slip system, indexed by α, is characterized by the Schmid dyad P_α = s_α ⊗ n_α composed of the allowed slip direction, s_α, and the normal to the slip plane, n_α. Given the set {P_α}, the plastic velocity gradient is constructed via:
L_p = Σ_α γ̇_α P_α    (27)
which is inherently volume preserving in the (incompatible) intermediate/lattice configuration. Finally, the slip rate γ̇_α is related to the applied stress through the resolved shear (Mandel) stress, τ_α = M · P_α, for that slip system. We employ a common power-law form for the slip rate relation
γ̇_α = γ̇_0 sgn(τ_α) |τ_α / g|^m    (28)
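The slip-system bookkeeping in Eq. (27) is compact in code. The sketch below (ours) enumerates the 12 FCC Schmid dyads and assembles L_p from given slip rates; the sign convention for the slip directions is arbitrary, since a sign flip is absorbed into the sign of the slip rate.

```python
import numpy as np

def fcc_schmid_dyads():
    """The 12 FCC slip systems: {111} slip planes with <110> slip directions.
    Returns unit Schmid dyads P_alpha = s_alpha (x) n_alpha."""
    normals = [np.array(n, float) for n in
               ((1, 1, 1), (-1, 1, 1), (1, -1, 1), (1, 1, -1))]
    directions = [np.array(s, float) for s in
                  ((1, 1, 0), (1, -1, 0), (1, 0, 1), (1, 0, -1), (0, 1, 1), (0, 1, -1))]
    dyads = []
    for n in normals:
        for s in directions:
            if abs(s @ n) < 1e-12:  # slip direction must lie in the slip plane
                dyads.append(np.outer(s / np.linalg.norm(s), n / np.linalg.norm(n)))
    return dyads  # 3 directions per plane x 4 planes = 12 systems

def plastic_velocity_gradient(gamma_dot, dyads):
    """Eq. (27): L_p = sum_alpha gamma_dot_alpha P_alpha. Each P_alpha is
    traceless (s . n = 0), so L_p is volume preserving by construction."""
    return sum(g * P for g, P in zip(gamma_dot, dyads))
```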
With this model, we simulate the polycrystalline response using a uniform 20 × 20 × 20 mesh with the texture assigned element-wise and strict compatibility enforced at the voxelated grain boundaries. Realizations with 15, 15, 17, 18, 18, 19, 19, 20, 20, 21, 22 grains were sampled from an average grain size ensemble and each grain was assigned a random orientation. Minimal boundary conditions to apply the various loading modes (tension, shear, etc.) were employed on the faces and edges of the cubical representative volumes. Also, we limit samples to a single, constant strain rate of 1.0/s.
The simple visco-plastic (VP) model consists of a St. Venant stress rule in the current configuration with Almansi strain:
T = C e_e    (29)
where C = λI + μJ with isotropic parameters given by Young's modulus E = 200 GPa and Poisson's ratio ν = 0.3, together with a simple (associative) power law for the flow rule:
D_p = c ||σ||^p σ    (30)
where c = 0.001 MPa^{-(1+p)}/s and p = 0.1 are material constants.
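Since the VP model is used below to generate training data, a material-point sketch of Eqs. (29) and (30) is short (ours; the flow form D_p = c ||σ||^p σ is our reading of the constants' units, and the function names are illustrative):

```python
import numpy as np

# Isotropic constants from E = 200 GPa and nu = 0.3 (stress units: MPa)
E, nu = 200.0e3, 0.3
lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
mu = E / (2.0 * (1.0 + nu))
c, p = 1.0e-3, 0.1  # flow constants, c in MPa^-(1+p)/s

def vp_stress(e_e):
    """Eq. (29): St. Venant rule T = C e_e with C = lambda*I + mu*J."""
    return lam * np.trace(e_e) * np.eye(3) + 2.0 * mu * e_e

def vp_flow(b_p, sigma):
    """Eq. (30): associative power-law flow D_p = c ||sigma||^p sigma
    (b_p is unused; the VP model has no history dependence)."""
    return c * np.linalg.norm(sigma) ** p * sigma
```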
A typical NN, such as the representation of each component of Eq. (18), is a two-dimensional feed-forward, directed network consisting of an input layer, an output layer and L intervening hidden layers, where neighboring layers are densely connected. Each layer ℓ_i consists of N nodes (i,j). The vector of outputs, y_i, of the nodes (i,j), j ∈ (1, N), of layer ℓ_i is the weighted sum of the outputs of the previous layer ℓ_{i-1} offset by a threshold and passed through a ramp-like or step-like activation function a(x):
y_i = a(W_i x_{i-1} + b_i)    (31)
where W_i is the weight matrix for the inputs of (hidden) layer ℓ_i applied to the state/output x_{i-1} of the nodes of the previous layer, and b_i is the corresponding threshold vector. In our application the input layer consists of the N_I invariants I and the N_B elements of the tensor basis B. The elements of I form the arguments of the coefficient functions, each having an L×N neural network representation, while the elements of B pass through the overall network until they are combined with the coefficient functions according to Eq. (17) to form the output via a merge layer that performs the summation of the products of the coefficient functions and the elements of the tensor basis. After exploring the C0 step- and ramp-like rectifying activation functions commonly used, we employ the ramp-like (C1 continuous) Exponential Linear Unit (ELU) [Clevert, Unterthiner and Hochreiter (2015)] activation function:
a(x) = x for x ≥ 0;   a(x) = exp(x) - 1 for x < 0    (32)
to promote smoothness of the response and limit the depth of the network necessary to represent the response, relative to that necessary with saturating step-like functions.
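A minimal forward pass of the generalized TBNN just described can be sketched in a few lines of NumPy (ours; the layer sizes, initialization, and class name are illustrative, and the Section 3.2 scalings and training are omitted):

```python
import numpy as np

def elu(x):
    """Eq. (32): the C1-continuous Exponential Linear Unit."""
    return np.where(x > 0.0, x, np.exp(np.minimum(x, 0.0)) - 1.0)

class TBNN:
    """Forward pass of a generalized TBNN: a dense network maps the scalar
    invariants I to the coefficient functions {f_i}, and a merge layer
    contracts them with the tensor basis {B_i} per Eq. (17)."""

    def __init__(self, n_inv, n_basis, depth=3, width=4, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [n_inv] + [width] * depth + [n_basis]
        self.W = [rng.standard_normal((m, n)) / np.sqrt(n)
                  for n, m in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(m) for m in sizes[1:]]

    def __call__(self, invariants, basis):
        x = np.asarray(invariants, float)
        for W, b in zip(self.W[:-1], self.b[:-1]):
            x = elu(W @ x + b)                      # Eq. (31)
        coeffs = self.W[-1] @ x + self.b[-1]        # coefficient functions f_i(I)
        return sum(c * B for c, B in zip(coeffs, basis))  # merge layer, Eq. (17)
```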
Training the network weights W_i and thresholds b_i is accomplished via the standard back-propagation of errors [Werbos (1974); Rumelhart, Hinton and Williams (1986)] which, in turn, drives a (stochastic) gradient-based descent (SGD) optimization scheme to minimize the so-called loss/error, E. We employ the usual root mean square error (RMSE)
E = sqrt( (1/|D|) Σ_{k∈D} ||y(x_k) - d_k||^2 )    (33)
where D is the set of training data composed of inputs x_k = {I_k, B_k} and corresponding outputs d_k. The gradient algorithm relies on the change in E with respect to each weight W_i and threshold b_i:
∂E/∂W_i = δ_i ⊗ x_{i-1},   ∂E/∂b_i = δ_i    (34, 35)
where
δ_i = (W_{i+1}^T δ_{i+1}) ⊙ a′(y_i)    (36)
Here a′ is the derivative of the activation function, [a ⊗ b]_{ij} = a_i b_j is the tensor product, and [a ⊙ b]_i = a_i b_i the element-wise Hadamard-Schur product. The recursion seen in Eq. (36) gives back-propagation its name. The gradient defined by these expressions is evaluated with random sampling of subsets of the training data D called minibatches. Also, the search for a minimum along this direction is governed by a step size called the learning rate in the ML community. These standard constructions are trivially generalized to the TBNN structure since the inputs B are not directly related to W_i or b_i, and are merely scaled by the coefficient functions to form the output y; refer to Fig. 1. For more details of the SGD algorithm see Nielsen [Nielsen (2015)].
To begin the training, the unknown weights, {W_i}, and thresholds, {b_i}, are initialized with normally distributed random values to break the degeneracy of the network and enable local optimization. Since multiple local minima for training are known to exist, choosing an ensemble of initial weights which are then optimized improves the chances of finding a global minimum, and the distribution of the solutions indicates the robustness of the training. Also, the full set of input data is divided into a training set D, used to generate the errors for the back-propagation algorithm; a test set T, for assessing convergence of the descent algorithm; and a third set V for cross-validation, to estimate the predictive capability of the trained network. Ensuring that the errors based on T are comparable to those on V reduces the likelihood of over-fitting the data with a larger than necessary NN. We chose to divide the available data in a T : D : V = 20:72:8 ratio. In addition, we sample individual stress-strain curves produced by the CP and VP simulators so as to maintain an approximately uniform density of data along the arc-length of the stress-strain curve (vs. a uniform sampling over the strain range) to resolve the high-gradient (elastic) and transition (yield) regimes. Also, it should be noted that we allow ourselves to train on inputs derived from the plastic deformation gradient, F_p, despite the fact that this quantity is difficult to observe directly in experiments. A critical part of the training algorithm is normalizing the data so that the NN maps O(1) inputs to O(1) outputs, since having W_i, b_i ~ O(1) will achieve better SGD convergence. We shift and scale the scalar invariants I so that they have a mean zero, variance one distribution. We normalize the other set of inputs, the tensor basis B, using the maximum Frobenius norm of the basis generators, e.g. b_p and σ, over the training set D. During training, the output tensors are normalized similarly based on their maximum norms over D, so that
f = s_f Σ_i f̃_i(Ĩ) B_i / s_{B_i}    (37)
Figure 1: TBNN structure for M(A) = Σ_i c_i(I) B_i with 3 invariants I = {I_0, I_1, I_2}, a 3×4 NN, 2 coefficient functions {c_0(I), c_1(I)}, and 2 tensor basis elements B = {B_0, B_1}. The linear transformation y_i = W_i x_{i-1} + b_i of the outputs x_{i-1} of layer i-1 to the inputs y_i of layer i is denoted by the arrows connecting the nodes of layer i-1 to those of layer i. The nonlinearity of the activation functions is represented by a(y_ij), where y_ij are the components of y_i. The scaling operations described in Section 3.2 are omitted for clarity

where s_f is the scaling of the output f; s_{B_i} is the scaling of basis element B_i based on the powers of its principal generators (e.g. if B_i = b^a s^b then s_{B_i} = s_b^a s_s^b, where s_b is the scaling of b); and Ĩ = {s_{I_i} I_i} is the set of scaled and shifted invariants. These scales have the added benefit of coarsely encoding the range of the training data so that extrapolation during prediction can be detected.
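A sketch of these normalizations (ours; for brevity it takes precomputed per-sample Frobenius norms and returns a single basis scale, whereas the text forms per-element scales from powers of the generator scales):

```python
import numpy as np

def fit_scalings(inv_data, basis_norms, out_norms):
    """Normalization per Section 3.2 (our sketch): invariants are shifted
    and scaled to zero mean and unit variance over the training set D;
    the tensor basis and outputs are scaled by maximum Frobenius norms."""
    inv_shift = inv_data.mean(axis=0)
    inv_scale = inv_data.std(axis=0)
    inv_scale[inv_scale < 1e-12] = 1.0  # guard identically zero invariants, e.g. tr sigma
    s_B = np.max(basis_norms)           # max Frobenius norm of basis generators over D
    s_f = np.max(out_norms)             # max Frobenius norm of the output over D
    return inv_shift, inv_scale, s_B, s_f

def scale_invariants(I, inv_shift, inv_scale):
    return (I - inv_shift) / inv_scale
```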
Convergence is assessed by averaging the error with respect to T over previous iterations of the SGD (in this work we average over the last 4-10 iterations) and terminating when this average converges, but not before performing a minimum number of iterations (1000 in this work). More discussion of the training approach can be found in Ling et al. [Ling, Jones and Templeton (2016)], although in that work the learning rate was held fixed rather than decaying as the training proceeds, as in this study.
We need a time-integration scheme to solve the differential-algebraic system Eq. (10) and Eq. (11). We assume it is deformation driven so that F = F(t) is data. To form a numerical integrator, we rely on the well-known exponential map for the plastic deformation [F_p]_{n+α} ≡ F_p(t_{n+α}):
[F_p]_{n+α} = exp(α Δt D_p) [F_p]_n    (38)
which is an explicit/approximate solution to Eq. (11). In Tab. 1, we outline an adaptive scheme based on this update and the interpolation of the (full) deformation gradient F_{n+α} = exp(α log ΔF) F_n by similar means via log F_{n+α} = (1 - α) log F_n + α log F_{n+1} = log F_n + α log ΔF, with ΔF = F_{n+1} F_n^{-1}. Since we do not rely on the NN models of the stress, Eq. (10), and the flow, Eq. (11), being directly differentiable, we use a simple relaxation scheme to enforce consistency of the rate D_p(b_p, σ) using a residual based on the unknown [F_p]_{n+1}, updated at each iteration given [F_p]_n. (This relaxation could be improved by using the derivatives already computed by the back-propagation algorithm in a Newton solver with a trust region based on the bounds of the training data.) If any step has an increase in the residual, the step size is cut; conversely, when a sub-step converges, the remainder of the interval is attempted.
In this section we cover our investigations of: (a) optimal network size, inputs, and representation basis; (b) the influence of training data on error and stability; and (c) the robustness and accuracy of the model predictions. As mentioned, we employ data from an unknown-form CP model (Eq. (26) and Eq. (27)) and a known-form VP model (Eq. (29) and Eq. (30)). (Recall that we designate the CP model an unknown-form model since the response of the polycrystalline aggregates has no closed form.) Training with the data from the CP model illustrates the NN model's ability to represent and homogenize the response of a complex system, and the VP model is particularly useful for exploring NN representations since we know the true response and generating samples is computationally inexpensive.
Table 1: Time integration algorithm with adaptive time-stepping. Note, ε is a convergence tolerance

For step n+1:
• Initialize F_p = [F_p]_n and ΔF = F_{n+1} F_n^{-1}
• Sub-step: while α < 1, try α = 1 with F ≡ F_{n+α} = exp(α log ΔF) F_n
  Relaxation: at fixed α, initialize [F_p*]_{k=0} = [F_p]_{n+α}, with the iteration indexed by k:
  1. b_p* = [F_p F_p^T]_k* and b_e* = F [(F_p^T F_p)^{-1}]_k* F^T
  2. T* = T(b_e*) and σ* = dev([F_p]_k F^{-1} T* F^{-T} [F_p]_k^T)
  3. [D_p*]_k = f(b_p*, σ*)
  4. Update [F_p*]_{k+1} = exp(α Δt [D_p*]_k) F_p
  5. If R_k ≡ ||[D_p*]_k - [D_p*]_{k-1}|| < ε ||[D_p*]_0|| then α += Δα (converged at t_{n+α}); else if R_k > R_{k-1} then cut the step, α = α/2 (diverging); else continue.
• Update T_{n+1} = T* and [F_p]_{n+1} = [F_p*]
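A compact transcription of Tab. 1 (our sketch; iteration caps and other error guards are omitted for brevity, and stress_fn/flow_fn stand in for the trained TBNN models):

```python
import numpy as np
from scipy.linalg import expm, logm

def integrate_step(F_n, F_np1, Fp_n, dt, stress_fn, flow_fn, tol=1e-6):
    """One load step of the Tab. 1 scheme: sub-stepping on alpha with a
    fixed-point relaxation to make the rate D_p(b_p, sigma) consistent
    with the exponential update, Eq. (38)."""
    I = np.eye(3)
    log_dF = logm(F_np1 @ np.linalg.inv(F_n))
    alpha, da, Fp, T = 0.0, 1.0, Fp_n.copy(), None
    while alpha < 1.0 - 1e-12:
        da = min(da, 1.0 - alpha)
        F = expm((alpha + da) * log_dF) @ F_n          # interpolated F_{n+alpha}
        Fp_k, Dp_old, R_old, converged = Fp.copy(), None, np.inf, False
        while not converged:
            Fe = F @ np.linalg.inv(Fp_k)
            T = stress_fn(0.5 * (I - np.linalg.inv(Fe @ Fe.T)))   # steps 1-2
            S = np.linalg.inv(Fe) @ T @ np.linalg.inv(Fe).T
            sigma = S - np.trace(S) / 3.0 * I
            Dp = flow_fn(Fp_k @ Fp_k.T, sigma)                    # step 3
            Fp_k = expm(da * dt * Dp) @ Fp                        # step 4
            if Dp_old is not None:                                # step 5
                R = np.linalg.norm(Dp - Dp_old)
                if R < tol * max(np.linalg.norm(Dp), 1e-12):
                    converged = True
                elif R > R_old:     # diverging: halve the sub-step and retry
                    da *= 0.5
                    break
                R_old = R
            Dp_old = Dp
        if converged:
            alpha, Fp = alpha + da, Fp_k
    return T, Fp
```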
We begin our numerical investigations with: (a) a survey of the possible representations for the models of stress and flow, and (b) an optimization of the structure and meta-parameters of the NN representations. To assess improvements in performance we used the traditional metric for evaluating NN performance, cross-validation error, where the training dataset D is replaced by the validation dataset V in evaluating the RMSE formula, Eq. (33).
For this study we use data from the CP model to train the stress and flow TBNNs. In particular, we collect data using 3 tension and 6 simple shear loading modes averaged over 570 random textures for each of the 10 polycrystalline realizations. As mentioned in the Methods section, we give ourselves access to the (average) plastic state variables of the CP simulations and so we train the stress and flow TBNNs independently (and not simultaneously). The trajectory for each realization is computed over a sequence of 100 uniformly spaced strains and then averaged. As stated in Section 3.2, interpolation between points in the elastic and elastic-plastic transition regimes is used to increase the density of training data in these regions. This results in approximately 2000 data points with 6 strain and 6 stress components per point, which are then split between training, test, and cross-validation datasets. This dataset is typical of the experimental datasets used to calibrate traditional models, but compared to typical big-data applications of neural networks it is relatively small. It is likely that other surrogate representations, such as Gaussian Processes (GPs), would do as well at emulating the data as measured by cross-validation error. Nevertheless, this work will demonstrate that appropriately sized NNs can represent the abrupt transition between the elastic and plastic regimes, which can lead to errors with other methods, e.g. global polynomials [Rizzi, Khalil, Jones et al. (2018)]. The proposed TBNN architecture is designed to enforce the underlying physical invariants and constraints in the system exactly, which few other surrogate models can achieve, e.g. GPs [Salloum and Templeton (2014)]. While less relevant for reducing overall representation error, respecting these invariants is crucial in forward integration of the model [Arnold (2013)] and also has benefits in understanding how the model is making its predictions in the context of classical constitutive theory. In order to optimize the dimensions of the NN to the amount of available data, the influence of the breadth and depth of the network architecture on prediction error was explored and will be discussed shortly. Since these meta-parameters can be tailored in a straightforward manner to optimize representation accuracy and avoid over-fitting, NNs appropriate to the physical problem and available data size can be constructed and evaluated for additional properties necessary for the application at hand.
Fig. 2 and Fig. 3 show typical training data for the I3 stress and IF flow representations (refer to Tab. 2 and Tab. 3) with 3×4 and 5×8 NNs, respectively. The left columns of Fig. 2 and Fig. 3 show the tension response and the right columns show the shear response. The upper panels show the (input) invariants and the (output) coefficient functions. In general, the inputs and outputs are smooth and correlated, and all coefficient functions contribute. The notable exception is the stress model in shear, in which only the coefficient function of the linear basis element e_e appears to contribute. Note that all invariants are arguments to each coefficient function. Also shown in Fig. 3 is the zero invariant, tr σ ≡ 0, which becomes noise upon the input scaling described in Section 3.2. Apparently, the NN training learns to ignore this input since the outputs are smooth and regular. The lower panels show: (a) the correspondence of the model (lines) and the data (points), and (b) the error as a function of strain. The errors for each of the components are of comparable magnitude and tend to have an irregular pattern in the elastic region of the loading. Note that with a C0 activation function (e.g. the Rectifying Unit a(x) = max(0, x)) we observed distinct scallops and cusps in the error curves (not shown for brevity). Also it is remarkable that the errors of the flow model in shear are distinctly linear, which, perhaps, is related to the fact that only the linear basis element is active.
These results are typical for a wide range of NN structures and (meta) training parameters. Fig. 4 shows the cross-validation errors of the stress and flow (scaled by s_T and s_{D_p}, respectively) using the full representations I3 and IF (refer to Tab. 2 and Tab. 3, respectively). As Shwartz-Ziv and Tishby [Shwartz-Ziv and Tishby (2017)] remark, trying to interpret the behavior of a network from a single training tends to be meaningless; hence, we evaluate parametric and structural changes with an ensemble of at least 30 replica models M_k ∈ M in this and the following studies. (The replicas are obtained by using different random seeds to produce the initial weights and thresholds.) The insets show that the (initial) learning rate can have a strong effect on the errors, but once a small enough (< 10^{-3}) rate is selected, the final errors are relatively insensitive to this parameter. The main panels show the typical trends in errors ranging from under-representation (too small a network) to over-fitting (too large a network). As mentioned, we require in the training procedure that the errors on the training data D and the testing data T be comparable, as failure to achieve parity in the errors is indicative of bad predictions and over-fitting. For the stress TBNN, N = 4 nodes appears to be an optimum even for relatively shallow networks (L < 4), but the optimal number of nodes is relatively insensitive for L > 4. The flow TBNN shows analogous behavior but with a trade-off between nodes and layers, e.g. for L > 6, N = 4 appears to be best, while N > 4 is better for shallower networks. These findings are somewhat obscured by the noise in the trend lines, which persists despite using the average of 150 replica networks. Also, the convergence window (described in Section 3.2) is an important meta-parameter. We obtained these results with a 4-iteration convergence window; a longer convergence window (e.g. 10 iterations) shifts the best cross-validation to smaller networks, but also induces larger variance in error between replicas. Since we want reliable error from each replica, we use a 4-iteration convergence window throughout the remainder of this work. Lastly, we do not believe cross-validation is sufficient for determining completeness of the network; however, these results indicate that the optimal number of nodes is less than the number of input invariants for the flow but greater than this matrix rank-based criterion for the stress representation. Apparently, the NN is compressing the input for the flow network; and, hence, we conjecture that the NN is forming a lower-dimensional set of (alternate) invariants internally with respect to the training data.
Fig. 5 shows the cross-validation error for the CP training data for various basis representations of the stress and flow functions (refer to Tab. 2 and Tab. 3 for the definitions of the labels). For the stress TBNNs, all (overall) errors are comparable with the exception of the component-based representation and the one-term E1 representation (with tensor basis B = {e_e}). Clearly, the E1 basis is not sufficient since it is akin to a one-parameter Navier model of stress. From the results of the two truncated bases, I2 and ID, it appears that correlated inputs, B = {I, e_e} (I2), train comparably to linearly independent inputs, {I, dev e_e} (ID, which uses a volumetric/deviatoric split). Also, the upper panel of Fig. 5a shows that the representations without embedded satisfaction of the zero-stress-at-zero-strain constraint, Eq. (12), generally violate this constraint by about 1% of the maximum stress. For the flow TBNNs, all (overall) errors are comparable with the exception of the reduced scalar and tensor basis representation S1. Also the other reduced representations (R3, R1, T3, T1) have slightly higher average errors than the full representations (UF, IF, IR, SF, DS, DR, DZ), albeit with reduced variance. The consistency of the representations with the zero-flow-at-zero-stress condition, Eq. (13), generally follows whether powers of e_e are included or not. Clearly, cross-validation based on this limited dataset is not sufficient for decisive model selection, but it does eliminate some representations. By comparison, the component-based representations, E_ij and CM, display higher errors, larger variance in the performance and poor zero-input-zero-output results. Beyond the fundamentally different functional representation, these models are likely suffering from an insufficiency of data to learn the necessary properties (the ones embedded in the generalized TBNN framework) accurately, as demonstrated in Ling et al. [Ling, Jones and Templeton (2016)]. Lastly, as discussed in the Theory section, we have embedded a number of properties in the representations, e.g. symmetry, deviatoric flow, dissipation, and, generally, the violation of the learned properties is on par with what we illustrate with the zero-stress and zero-flow conditions. In preliminary studies we also trained networks with different inputs and outputs. In general, the cross-validation errors were comparable over a variety of choices, for example:

Table 2: Stress representations: those with I in their tensor basis do not satisfy the zero stress condition intrinsically, and those with less than three basis elements are not complete
Fig. 6 and Fig. 7 show the response of the NN coefficient functions to tension and shear, for stress and flow, respectively. In these plots, each coefficient is scaled according to Eq. (37) so that the coefficient functions of higher order terms can be plotted on par with those of lower order terms. First, we notice that the E3 basis achieves zero-stress-at-zero-strain satisfaction exactly at the expense of a more complex, larger magnitude per-component response than I3, as the higher order term e_e^3 apparently needs compensation by the component functions, refer to Eq. (21). Also, we see more evidence that the truncated representations, I2 and ID, have almost indistinguishable responses despite ID having a linearly independent tensor basis. For the flow representation we only compare the DZ and T1 representations for clarity. Note that T1 is much simpler in form (one tensor basis element versus ten) than DZ, which has a complete basis, and its response is simpler while achieving comparable cross-validation error to DZ. Also evident from both the tension and shear responses, DZ builds a similar response to T1 by letting all/most components contribute. Also significant, the coefficient of σ^2, c_2, is essentially zero throughout the shear trajectory but not the tension, which implies that the NN may not be learning that dissipation is an important property. This is in contrast with the DR representation (not shown), where the corresponding coefficient is essentially zero for both tension and shear. For both the stress and flow models, the coefficient responses generally resemble the trends in the stress and flow data, with large changes up to the elastic-plastic transition at strain ≈ 0.002 followed by relatively constant values in the plastic regime. This is consistent with the expectation that in fully developed plastic flow (in a constant direction with negligible hardening) the elastic state and the plastic flow are constant.
Our validation studies include tests of: (a) completeness of the representation and training data, and (b) continuity/robustness to perturbation.

Table 3: Flow representations: all but UF have a symmetric tensor basis; those with I or b_p in the tensor basis do not satisfy the zero flow constraint intrinsically; those with tr σ include this noise invariant; those with dev applied to the tensor basis generate incompressible flows; and those with only odd powers of σ in the tensor basis are designed to be dissipative

Figure 2: Stress training data: (a) tension and (b) shear stress evolution with strain for the CP model. Top panels: scaled input invariants. Second panels: scaled trained tensor basis coefficient functions. Third panels: stress T response (lines: model, points: data). Bottom panels: error as a function of strain scaled by s_T
As we already have indications that a training set composed of only tension and shear may be insufficient, we computed the (E3) TBNN and (E_ij) component-based NN stress models' responses to the bimodal stretch e_e(ε_1, ε_2) = ε_1 M_1 + ε_2 M_2, where the modes are the CP (tension) training modes M_i(ε) = ε(e_i ⊗ e_i - ν(I - e_i ⊗ e_i)). Hence, the (tension) training data aligns with the axes, and there the models represent the data well. Fig. 8 shows that the model responses are significantly different away from the training data, the limits of which are denoted by the white box outline on the contour plots. In particular, the component model does not give a symmetric response, compare Fig. 8a and Fig. 8b, which is to be expected since each component is independently (and imperfectly) trained. Both models have regions of negative stress and large stresses outside the training limits. Fig. 8c shows the expectation for the stress from the crystal elasticity underlying the response of the polycrystalline aggregate. The TBNN response has more complex trends, likely due to its formulation on invariants, and the 11-stress response of the component model arguably represents the expected stress best, albeit with a distinct error in the gradient. This result illustrates that acceptable cross-validation errors along limited training data do not necessarily lead to comparably acceptable interpolation, nor extrapolation, with NNs. To further investigate how much data and what variety of data is needed to sufficiently train the constitutive models, we employed the simple VP underlying model to generate data for loading modes that are symmetric and monotonic: F(t) = t Σ_{i=1}^3 λ_i e_i ⊗ e_i. In particular, the training datasets D_n are comprised of n trajectories with 100 state points each. The λ_i for each trajectory were uniformly sampled on the 2-sphere (using minimum energy quadrature points [Sloan and Womersley (2004)]). Note the uniform sampling points nest, in the sense that a larger set D_m contains all the points of a smaller set D_n, m > n. The testing dataset T consisted

Figure 3: Flow training data: (a) tension and (b) shear flow evolution with strain for the CP model. Top panels: scaled input invariants. Second panels: scaled trained tensor basis coefficient functions. Third panels: flow D_p response (lines: model, points: data). Bottom panels: error as a function of strain scaled by s_{D_p}

Figure 4: Network optimization for 150 realizations of the full representations for both stress and flow on CP data: training meta-parameters for the full (I3) stress and (SF) flow representations. Error bars denote the min and max of the error, and errors are scaled by s_T and s_{D_p}, respectively

Figure 5: Cross-validation error as a function of representation for (a) stress and (b) flow for CP data. Errors are scaled by s_T and s_{D_p}, respectively. Refer to Tab. 2 for the stress representations and Tab. 3 for the flow representations

Figure 6: Stress tensor basis coefficients in shear and tension using different bases

Figure 7: Flow tensor basis coefficients in shear and tension using the DZ and T1 bases
of 10 trajectories given by random samples on the 2-sphere. Fig. 9 shows the change in accuracy of the model relative to the random-sample test set T and the information gain with increasing size of the training dataset D_n. The decreasing errors in Figs. 9a and 9b with more training suggest completeness of the representation, and the slightly higher rate of convergence for the larger network indicates the complexity of the underlying function. Also the variability of the models decreases with more data, which gives context for the variability of the models trained only with the CP data. The decrease rate is modest, ~ n^{-a} with a ∈ [0.2, 0.5], where n is the number of trajectories that each contain 100 points, but the variability in response is also decreasing with more data.
To measure how much information has been gained by training the NN relative to its untrained state, we use the Kullback-Leibler (KL) divergence:
D_KL( p(T|D_n) || p(T) ) = ∫ p(y|D_n) log[ p(y|D_n) / p(y) ] dy    (39)
evaluated with the assistance of standard kernel density estimators. Here D_n is a training set, T is the independent test data, and p(T|D_n) is the probability density function (PDF) of the predictions y_j using an ensemble of models M_k ∈ M and the (fixed) data inputs x_j, where j indexes the state and prescribed strain, and p(T) ≡ p(T|D_0). In Fig. 9c,d we see that: (a) both the stress and flow models steadily differentiate themselves from their untrained state with increased training data, (b) the largest changes appear to occur with the initial increases in training data, and yet convergence of the KL divergence is not reached even with D_64, and (c) the stress is gaining more information from the low strain data and the flow model is gaining the most from the post-yield data, which is physically intuitive.
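The KL estimate in Eq. (39) is straightforward with kernel density estimators; a one-dimensional sketch (ours) using SciPy:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kl_divergence(pred_trained, pred_untrained):
    """Eq. (39): KL divergence between kernel density estimates of the
    ensemble predictions after (p) and before (q) training, estimated by
    a Monte Carlo average over the samples of p."""
    p = gaussian_kde(pred_trained)
    q = gaussian_kde(pred_untrained)
    x = pred_trained  # samples drawn from (the estimate of) p
    return float(np.mean(np.log(p(x) + 1e-300) - np.log(q(x) + 1e-300)))
```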
As a prelude to studying the dynamic stability of our plasticity TBNN model, we test the TBNN formulations' sensitivity to noise by randomly perturbing the inputs by 1% using the CP training data. The response to the perturbations is fairly uniform over the range of strains (not shown). As Fig. 10 shows, the output variance for most of the models is on par with the input variance, the exceptions being tied to the presence of the noisy invariant tr σ. Clearly, pruning ill-conditioned invariants is crucial for stability.

Figure 8: Stress response of (a), (b) the component E_ij, (c) the underlying crystal, and (d) the TBNN E3 models to bimodal stretch. The TBNN 11-stress (d) and 22-stress (not shown) responses are symmetric across the diagonal. Note the white box outlines the limit of the training data, which lies along the axes

Figure 9: Error as a function of training data span and network size (a, b), and model information content relative to an uninformed/untrained model (c, d). Note each training set/trajectory has 100 state samples from the VP (known underlying) model and the convergence rates are reported in terms of the number of training sets (not the number of state samples). Also, the error bars reflect the variance in the training errors across the ensemble of NNs

Figure 10: Variance of models in response to 1% Gaussian input noise: (a) stress, (b) flow; on-diagonal components on the left, off-diagonal components on the right
Generally speaking, errors in the predictions of the proposed TBNN plasticity models come from: errors in the elastic model, those in the flow rule, and those engendered by the integration scheme. In preliminary work, we integrated the rate given by the training data to tune the tolerances of the integration scheme and ensure the integrator error is negligible. In Fig. 11 we show Lyapunov-like stability tests using a 3×4 E3 / 5×8 T1 TBNN model trained on a D_64 dataset from the known closed-form VP model. First, we perturb the initial conditions of the state F_p(0) for a random (monotonic) loading mode F(t) and compare the response of the underlying model (gray lines) to that of an analytic stress (Eq. (29)) / TBNN flow model hybrid (colored) for this ensemble of initial conditions. The TBNN response is on par with that of the true model, albeit with a distinct bias toward higher stress. Second, we compare the same models with an F_p(0) = I initial condition but with an imperfect stress model created by randomly perturbing the Young's modulus E of the analytic stress model. Here again, the TBNN response is on par with the true model and yet artifacts in the trajectories are clearly present. Third, we repeated the first investigation with a TBNN with both an ML flow and an ML stress model. The results are largely similar to the response of the TBNN with the true stress model, albeit with additional artifacts in the trajectories. Lastly, we explore the sensitivity of the trajectory errors to the flow model network size. Fig. 11d shows that the trajectory errors for the flow model trained on VP data are relatively insensitive to the NN dimensions. Also, since the variance of the results does not increase with strain, the errors are apparently primarily due to the stress representation in this loading mode. The inset of Fig. 11d demonstrates the necessity of sufficient variety of training data. Here we plot the fraction of the models that stably reach double the training strain. Apparently, in this application, training on at least 25 trajectories is necessary to achieve robust predictions outside the training data.
Lastly, we return to models trained on the tension and shear CP data. Fig. 12 shows the predictions of a TBNN 3×4 E3 stress model with NN flow models of various sizes. Fig. 12a and 12b demonstrate that the predictions are essentially self-consistent with the training data. Also the fanning out of the trajectories is generally consistent with the accumulation of errors from integrating an imperfect model. It appears that, as the plastic flow develops, non-smooth transitions occur which make some trajectories jump to paths neighboring the true/training path. Fig. 12c and 12d show the results for bona fide predictions: (c) illustrates a combined simple shear and tension loading mode, where [F(t)]_11 = (1 + t), [F(t)]_21 = t/2, and the other directions have traction-free boundary conditions; and (d) illustrates a non-monotonic tension-then-compression mode at a different rate, [F(t)]_11 = (1 + 3t) for t ∈ [0, 0.02] and [F(t)]_11 = (1.06 - 3t) for t ∈ [0.02, 0.06]. For these modes the results are considerably less stable, especially in the mixed tension-shear mode, which points to the stress model being the main issue (as discussed in the previous section). Even the tension phase of the non-monotonic loading leads to decreased stability and accuracy compared to the tension-only case, apparently due to the change in strain rate. Lastly, in these modes none of the larger 5×12 network flow models tested were stable, which gives more evidence that the main issue is a lack of sufficient variety in the training data.
In this work we generalized the TBNN framework to take full advantage of classical representation theory. By embedding constraints and properties directly in the structure and formulation of the NNs for stress and plastic flow, we were able to reduce the amount of training required for valid models compared to current component-based NN models. The constraints of plasticity phenomenology and the trade-offs between learning and embedding the properties lead to a variety of models and hence to a model selection process. We showed that traditional cross-validation errors are not sufficient for the down-selection process and that, for example, stability with respect to perturbation needs to be considered for a viable model. We also illustrated the facts that: given limited data, the formulations are insensitive to a number of meta-parameters and variables, in particular the selected functional inputs; and there are trade-offs between model complexity and property preservation, such as preserving zero-stress-at-zero-strain. Using a known underlying data model, we demonstrated that the enhanced TBNN framework can provide robust and accurate predictions given sufficient data. This also supports the speculation that richer training sets may enhance the ability to discriminate between model representations.

Figure 11: Lyapunov bundle of trajectories for models on VP (known model) data: (a) a perfect stress model and a NN flow model with perturbed initial conditions, (b) imperfect stress models (modulus E random) and a NN flow model, (c) a NN stress model and a NN flow model with perturbed initial conditions, (d) an ensemble of (unperturbed) NN stress and NN flow models. Deviation is with respect to an unperturbed trajectory; gray lines: exact model, colored lines: TBNN. The inset of (d) shows the fraction of the models that reach double the duration of the training data as a function of the amount of training data, to illustrate the models' stability in extrapolation
In addition, we demonstrated that the tension (and shear) experiments traditionally used in model calibration are likely insufficient to fully train a NN model, whether formulated in the TBNN framework or via a component formulation (which generally displayed worse performance). Some other practical findings of this work:
• formulations of NNs based on invariants, such as the TBNN, reduce the required amount of training data but also impose non-linear relations between the components of the inputs and output,
• an ensemble of trained models gives a truer indication of the performance of a given NN formulation and architecture than a single calibrated model,
• cross-validation errors generally do not discriminate between functionally equivalent constitutive formulations given limited data,

Figure 12: Prediction of the TBNN with ML stress and ML flow models trained on CP tension and shear data: (a) tension, (b) shear, (c) combined tension and shear, (d) tension and then compression. Note the loading in (d) is at a different rate from the training data
• good cross-validation error based on a given dataset does not ensure stability since predictions typically visit neighboring trajectories,
• cross-validation indications of over-fitting by overly large NNs are corroborated by instabilities in predictions,
• pruning badly conditioned invariants is crucial for stability when neighboring inputs are sampled, but inclusion of these inputs does not always affect overall cross-validation errors,
• enforcing zero-output at zero-input with a higher order basis results in more complex coefficient functions,
• apparently, training a general invariant formulation on limited data does not result in simplicity in the component functions,
• rank considerations, such as using the same number of network nodes as input invariants, should only be a starting point since limited data can lead the NN to compress the required number of invariants internally,
• a reduced basis can perform well if the data complexity and coverage is low and is similar to the prediction regime,
• metrics, such as the KL divergence of the NN weights, can be used to indicate what data is helping inform the particular constitutive functions, and what is not, thus leading to adaptive sampling schemes and design of experiments, and
• common experimental data, e.g. tension tests, are likely insufficient to train robust NN models, even accounting for over-fitting, since these tests explore a small fraction of the input space.
Given these results and the lack of a distinctly optimal formulation and network architecture, our best recommendation, given tension and shear data, is to employ a 3×4 E3 stress model, which has the same number of nodes as input invariants and satisfies the zero-output condition intrinsically, together with a 3×8 T1 flow model, which is apparently compressing the space of input invariants but requires depth to fully represent the phenomenology.
We speculate a profitable next step would be to use dropout layers and other pruning methods, such as compressive sensing/L1 regularization, to reduce the model complexity and hopefully increase stability for a given network's dimensions. Such an approach may also help identify better-performing representations by eliminating weak connections and their associated basis coefficients. Along the same lines, the present work indicates that assessing which coefficient functions are informed by the data, and how the addition of more complex trajectories that fill the input space affects training, may lead to more optimal representations. The source of this additional data could be multi-axial experiments or full-field data, such as from digital image correlation. Lastly, developing an implicit time integrator based on derivatives of the neural network is another logical step in the development of viable NN-based constitutive models.
Acknowledgement: We relied on Albany (https://github.com/gahansen/Albany), Dream3D (http://dream3d.bluequartz.net), Lasagne (https://lasagne.readthedocs.io/en/latest/), and Theano (http://deeplearning.net/software/theano/) to accomplish this work. We also wish to thank J. Ling for helping to upgrade the TBNN package to support the modeling approach described in this paper. Clay Sanders acknowledges support from the United States Department of Energy through the Computational Sciences Graduate Fellowship (DOE CSGF) under grant number DE-FG02-97ER25308. This work was supported by the LDRD program at Sandia National Laboratories, and its support is gratefully acknowledged. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government.