Inference of Genetic Regulatory Networks by Evolutionary Algorithm and H ∞ Filtering

2023-06-14 来源：步旅网

InferenceofGeneticRegulatoryNetworksbyEvolutionaryAlgorithmandH∞Filtering

LijunQianandHaixinWang

DepartmentofElectricalandComputerEngineering

PrairieViewA&MUniversityPrairieView,Texas77446

Email:LiQian,HWang@pvamu.edu

Abstract—Thecorrectinferenceofgeneticregulatorynetworksplaysacriticalroleinunderstandingbiologicalregulationinphenotypicdeterminationanditcanaffectadvancedgenome-basedtherapeutics.Inthisstudy,weproposeajointevolutionaryalgorithmandH∞ﬁlteringapproachtoinfergeneticregulatorynetworksusingnoisytimeseriesdatafrommicroarraymea-surements.Speciﬁcally,aniterativealgorithmisproposedwheregeneticprogrammingisappliedtoidentifythestructureofthemodelandH∞ﬁlteringisusedtoestimatetheparametersineachiteration.Theproposedmethodcanobtainaccuratedynamicnonlinearordinarydifferentialequation(ODE)modelofgeneticregulatorynetworksevenwhenthenoisestatisticsisunknown.Bothsyntheticdataandexperimentaldatafrommicroarraymeasurementsareusedtodemonstratetheeffectivenessoftheproposedmethod.Withtheincreasingavailabilityoftimeseriesmicroarraydata,thealgorithmdevelopedinthispapercouldbeappliedtoconstructmodelstocharacterizecancerevolutionandserveasthebasisfordevelopingnewregulatorytherapies.

I.INTRODUCTION

Ageneticregulatorynetwork(GRN)isacollectionofDNAsegmentsinacellwhichinteractwitheachotherandwithothersubstancesinthecell,therebygoverningthegenetranscriptions.Thecorrectinferenceofgeneticregulatorynetworksplaysacriticalroleinunderstandingbiologicalreg-ulationinphenotypicdeterminationanditcanaffectadvancedgenome-basedtherapeutics.Inlightoftherecentdevelopmentofhigh-throughputDNAmicroarraytechnology,itbecomespossibletodiscoverGRNs,whicharecomplexandnonlinearinnature.Speciﬁcally,theincreasingexistenceofmicroar-raytime-seriesdatamakespossiblethecharacterizationofdynamicnonlinearregulatoryinteractionsamonggenes.Themodeling,analysisandcontrolofGRNsarecriticalforstudy-ingcancerevolutionandmayserveasthebasisfordevelopingnewregulatorytherapies.

BecauseGRNmodelsaredifﬁculttodeducesolelybymeansofexperimentaltechniques,computationalandmath-ematicalmethodsareindispensable.MuchresearchhasbeendoneonGRNmodelingbylineardifferential/differenceequa-tionsusingtime-seriesdata,forexample,[1-8],justtonameafew.Thebasicideaistoapproximatethecombinedeffectsofdifferentgenesbymeansofaweightedsumoftheirexpressionlevels.In[5],aconnectionistmodelisusedtomodelsmallgenenetworksoperatingintheblastodermofDrosophila.In[1],theconcentrationsofmRNAandproteinaremodeledbylineardifferentialequations.Asimpleformof

linear󰀁nadditivefunctionsissuggestedby[2],wheredxi/dt=j=1wijxj.Thedegradationrateofgenei’smRNAandenvironmentaleffectsareassumedtobeincorporatedintheparameterswijandtheirinﬂuenceongenei’sexpressionlevelxiisassumedtobelinear.Amethodtoobtainacontinuouslineardifferentialequationmodelfromsampledtime-seriesdataisproposedin[7].Foraddedbiologicalrealism(allconcentrationsgetsaturatedatsomepointintime),asigmoid(squashing)functionmaybeincludedintotheequation.Ithasbeenshownthatthissortofquasi-linearmodelcanbesolvedbyﬁrstapplyingtheinverseofthesquashingfunction[3].Inourstudy,aGRNismodeledbycontinuousnonlinearOrdinaryDifferentialEquations(ODEs).Comparedtolinearmodels,identiﬁcationofthenonlineardifferentialequationmodeliscomputationallymoreintensiveandcanrequiremoredata;however,therangeofnonlinearbehaviorsex-hibitedbyGRNscanbemorethoroughlyunderstoodwithnonlineardifferentialequations.Inaddition,wellestablisheddynamicalsystemstheoryisavailabletocharacterizethedynamicsproducedbythesemodels.Whenmoretime-seriesdatabecomeavailableowingtoadvancesinmicroarrayorothertechnologies,andassumingcontinuedimprovementincomputationalcapability,itcanbeexpectedthatcontinuousnonlineardynamicmodelswillplayacriticalroleinrevealingcomplicatedgenebehavior.

Ingeneral,modelinggeneregulatorynetworksisanonlinearidentiﬁcationproblem.AssumingthereareNgenesofinterestandxidenotesthestate(suchasthemicroarrayreading)oftheithgene,thenthedynamicsoftheGRNmaybemodeledas

dxi

=dt

asuniversalapproximators.Inordertomitigatetheeffectof“thecurseofdimensionality”,onlysecond-degreepoly-nomialsareselected.Notethatanadvantageofusinglow-degreepolynomialmodelsisthatevenwhenthereexistssomemodelmismatch,thesemodelsmaybesufﬁcientlyaccuratetorepresentmanyrealsystems,andthusarewidelyutilizedinpractice[9].WenotethatasimilarGRNmodelhasbeenadoptedby[10],butwithoutnoisebeingincludedinthemodel.μijandνiareparameternoiseandexternalnoise,respectively,anditisassumedthatthenoisestatisticssuchasthecovariancematricesareunknown.

ThenoisynatureofGRNsismodeledexplicitlyinthisstudy.Thedeterministicmodel(withoutnoise)correspondstothenominalcase,whilethevariousstochasticeffectsareincludedasnoisedisturbances.Forexample,thereisconsiderableexperimentalevidencethatindicatesthepresenceofsigniﬁcantstochasticityintranscriptionalregulationinbotheukaryotesandprokaryotes[11].Theinherentstochasticityofbiochemicalprocesses(transcriptionandtranslation)ismod-eledasnoiseintheparameters(μij),whichcorrespondstothe“intrinsicnoise”mentionedintheliterature[12].Othereffects,suchasthosefromgenesnotbeenincludedinthemicroarray,theamountofRNApolymerase,levelsofregulatoryproteins,andtheeffectsofmRNAandproteindegradation,aremodeledbytheexternalnoise(νi)[12].PreviousworkhasmodeledthesenoisetypesbyGaussianwhitenoiseprocesses[13].Ifthenoisestatisticsareknown,thenKalmanﬁltercanbeappliedtogettheoptimalestimatesoftheparameters[21].Onthecontrary,arobustﬁltersuchasanH∞ﬁlter,hastobeusedtoobtaintheoptimalparameterestimateswhenthenoisestatisticssuchasthecovariancematricesareunknown.Inthispaper,atwo-stepprocedureisproposedtoidentifyfi.Firstly,geneticprogramming(GP)isappliedtodeterminethenonlinearterms;thenthecorrespondingparametersassociatedwitheachtermareestimatedbyH∞ﬁltering.

Theremainderofthepaperisorganizedasfollows:TheproposedframeworkandtheiterativealgorithmareillustratedinSectionII.SimulationresultsaregiveninSectionIII.SectionIVcontainssomeconcludingremarks.

II.ALGORITHMDESCRIPTION

Thetaskofidentifyinggeneregulatorynetworksmaybeconsideredasanoptimizationproblem.Thegoalistomini-mizetheidentiﬁcationerrorandkeepthemodelassimpleaspossible,whichmaybeachievedbyminimizingthefollowingﬁtnessfunction

fitness=

N󰀂i=1

NoCreate Initial Random Population(M) and Apply H󰂒EndYesRun=N?Run:=Run+1StartGen:=0YesTermination Criterion Fitness Satisfied for RunDesignate Result for RunNoi:=0Apply fitness Measure to individual in the populationNoi:=M?Yesi:=0Gen=:Gen+1i:=i+1Yesi:=M?NoApply H󰂒 to individual in the populationi:=i+1Select genetic OperationPrSelect 5% individual Based on FitnessPerformReproductionInsert into New PopulationPcSelect 80% individual Based on FitnessPerformCrossoverInsert Offspring into New PopulationPmSelect 10% individual Based on FitnessPaPerform MutationInsert Mutant into New PopulationSelect an Architecture Altering Operation Based on its Specified Probability (5%)Perform the Architecture Altering OperationInsert Offspring into New PopulationSelect one Individual Based on Fitness Fig.1.TheiterativeprocessofjointGPandH∞ﬁltering.(Thegeneticprogrammingprocesshasfouroperations:reproduction,crossover,mutationandselection.H∞ﬁlteringisemployedtoestimatetheparametersforeverygeneration.)

[η1

M󰀂k=1

(xi(k)−xtari(k))+η2Γi]

(3)

Sinceitisaglobalnonlinearoptimizationproblem,anested

optimizationstructureisadopted,wheregeneticprogrammingisappliedtodeterminethenonlinearterms(globalopti-mization)whileH∞ﬁlteringisemployedtoestimatethecorrespondingparametersforeachterm(localoptimization)ineachiteration.SuchadecompositionoftheproblemintoastructuralpartsolvedbyGPandaparameteroptimizationpartsolvedbyH∞ﬁlteringreducethecomplexitysigniﬁcantlyandspeedupconvergence.ThedetailedproceduresoftheproposediterativealgorithmisillustratedinFig.1.

Inordertodealwithlargenumberofgenes,theoptimizationproblemisdecoupledintoNsub-problemswiththeithsub-problemfocusingontheithgene.Becausethetime-seriesdataofothergenesareﬁxed(frommeasurements)whenwearefocusingonanindividualgene,wecansolvetheoptimizationproblemonegeneatatime.ThisapproachmakestheinferenceoflargeGRNsfeasible.

whereMisthenumberofdatapoints,bethetargettimeseriesandxibetheobtainedtimeseriesgivenbytheobtaineddifferentialequation.Γiisapenaltytermanditischosenasthenumberoftermsinfi,i.e.,Γi=Li.η1>0andη2>0aretheweightsontheestimationerrorandthemodelcomplexity,respectively.

xtari

+Layer 1Node1 (* )Node 2 ( X2 )Layer 2Node1 (* )Node2(X1 )Layer 3Node1(X2 )Node2(X3 )Fig.2.Anexampleofthetreesturctureofadiferentialequation.

Δt

A.GeneticProgramming

Geneticprogramming[19]isatypeofevolutionaryalgo-rithms.Allevolutionaryalgorithmsworkwithapopulationofindividuals,whereeachindividualmaybeasolutionoftheoptimizationproblem.GPoperatesonatreestructure,whichisﬂexibleenoughtorepresentrelationshipsefﬁciently.Theleavesofatreerepresentvariablesorconstants,whiletheothernodesimplementoperators.AnexampleofatreestructureisshowninFig.2,wheretwooperations,multiplication(*)andaddition(+),areused.Thecorrespondingequationisdx1

dt=

󰀁M

θ,

noisestatisticsareunknown.Wealsoapplyourmethodtotherealmeasurementdatafrommicroarrayexperiment.A.SyntheticData

Inthispartofthesimulation,weusedataofametabolicnetwork,calledtheE-cellsystem(apartofthebiologicalphospholipidpathway),thatconsistsofthreesubstances.Thisnetworkcanbeapproximatedas:

x˙1=−10.32x1x3

x˙2=9.72x1x3−17.5x2x˙3=−9.7x1x3+17.5x2

1.4wwww−9−−1

Gwc−8−−1Guc

−7.128.264−16.14−19.78Guv−9−−1

(10)

Here,weapplyRunge-Kuttamethodtocalculatethesyntheticdataandaddtheintrinsicnoiseandtheexternalnoise(bothareassumedtobeGaussianwhitenoise).

SincetherearethreesubstancesintheE-cellsystem,thetreestructureshouldincludeasubsetofthefollowingtermsontheright-handsideofthedifferentialequation:x1,x2,

x3,x1x2,x1x3,x2x3,x21,x2,x3.1000individualsareﬁrstproducedandrankedaccordingtotheﬁtnessvalue.5%oftheindividualswiththeminimumﬁtnessvaluearekeptforthenextgeneration.80%individualsareperformedcrossoverand10%individualsareperformedmutationandtheremaining5%areforotheroperations.

Wecompareouralgorithmwiththeapproachin[21]fortwodifferentcases.Incase1,thenoisecovarianceisassumedtobeknown.matrices󰀃󰀄areQ1=10,Q2=󰀃󰀄Thecovariance

0.0100.20

,R1=20,R2=0.2,,Q3=

00.0100.2

R3=0.01.Itisassumedthatμijandνiareuncorrelatedforalliandj.Whileincase2,insteadofﬁxedcovariancematrices,itisassumedthatthecovariancematricesarenotknownexactly,thatis,thecovariancematricesofμijandνi

󰀅󰀅areQi=(1+qi)QiandRi=(1+ri)Ri,whereqiandriare

22]=σq,randomvariableswithE[qi]=E[ri]=0andE[qii

2222

E[ri]=σri.Variancesaregivenbyσq1=10,σq2=0.2,2222

=10,σr=20,σr=0.3,andσr=15.Therandomσq3123

variablesqiandriareuncorrelatedforalli.

TheresultsaresummarizedinTableI.ItisobservedthatGPplusKalmanﬁlterperformsverywellwhenthenoisecovarianceisknown.However,GPplusH∞ﬁlteroutperformsGPplusKalmanﬁlterwhenthenoisecovarianceisnotknownexactly.ThisisalsoconﬁrmedbythetimeseriesshowninFig.3.

ThecoefﬁcientsintheE-CellmodelaredeterminedbyH∞ﬁltering.TheconvergenceoftheH∞ﬁlteringalgorithmisanimportantissuewhenapplyingH∞ﬁltertothenoisyinputs.TheconvergenceoftheH∞ﬁlterincludestheconvergenceoftheestimatewˆ(n)andtheconvergenceoftheestimationerroreˆ(n).Fig.4showstheconvergenceoftheH∞ﬁlter.B.MicroarrayData

Weconsidertime-seriesgene-expressiondatacorrespondingtoyeastproteinsynthesis.Here,thedatafor3genes(HAP1,

1.21Concentration0.80.60.4Data(w/o Noise)Data(w/o Noise)Data (w/o Noise)GP+HinfGP+HinfGP+HinfGP+KLGP+KLGP+KL0.20010203040Time5060708090100Fig.3.TimeSeriesforE-CellsimulationbyKalmanandH∞ﬁltering.“data”istheoriginaldatawithoutnoise.“GP+KL”and“GP+Hinf”arethesimulationdatafromGPplusKalmanﬁlterandGPplusH∞ﬁlter,respectively.

CYC7andCYB2)arepickedbecausetherelationsamongthemhavebeenrevealedbybiologicalexperiments.Thetraceofthetime-seriesmicroarraymeasurementdatafrom[22]isusedinthispartofthesimulation,where17samplingdatapointsareprovidedforeachgenebytheexperiments.Thesamplingdatapointsareevenlyspacedandtheobservationintervalis10minutes.Inthesimulation,1000individualsareproducedineachgeneration.100generationsarecalculatedtoreachtheminimumﬁtnessvalues.

Thefollowingmodelisobtainedbytheproposedalgorithm:

x˙1=(−0.006+μ11)x1+(0.00835+μ12)x2+ν1x˙2=(−0.3661+μ21)x1+(−0.476+μ22)x2+ν2(11)x˙3=(−1.6124+μ31)x1+(0.45+μ32)x2+ν3

ThetrajectoriesforCYB2,HAP1andCYC7areshowninFig.5.

Theobtainedrelationshipsamonggenesareinagreementwithbiologicalexperimentalﬁndings.Forexample,weob-servethatHAP1repressesgeneCYC7andCYB2activatesCYC7.HAP1behavesasarepressor[23].

10.9REFERENCES

[1]T.Chen,H.L.He,andG.M.Church,“Modelinggeneexpressionwith

differentialequations”,Pac.Symp.Biocomputing,4:29-40,1999.[2]M.K.S.Yeung,J.Tegna˜r,andJ.J.Collins,“Reverseengineeringgene

networksusingsingularvaluedecompositionandrobustregression”,Proc.Natl.Acad.Sci.USA,99:6163-6168,2002.

[3]D.C.Weaver,C.T.Workman,G.D.Stormo,“Modelingregulatory

networkswithweightmatrices”,Pac.Symp.Biocomputing,4:112-123,1999.

[4]P.D’haeseleer,X.Wen,S.Fuhrman,andR.Somogyi,“LinearModeling

ofmRNAexpressionlevelsduringCNSdevelopmentandinjury”,Pac.Symp.Biocomputing,4:41-52,1999.[5]E.Mjolsness,D.H.Sharp,andJ.Reinitz,“Aconnectionistmodelof

development”,JTheorBiol.,152(4):429-53,Oct1991.

[6]H.deJong,“Modelingandsimulationofgeneticregulatorysystems:a

literaturereview”,JournalofComputationalBiology,9(1):67-103,2002.[7]I.Tabus¸,C.D.Giurcaneanu,andJ.Astola,“Geneticnetworksinferred

fromtimeseriesofgeneexpressiondata”,FirstInternationalSymposiumonControl,CommunicationsandSignalProcessing,pp.755-758,Hammamet,Tunisia,2004.

[8]M.J.L.deHoon,S.Imoto,K.Kobayashi,N.Ogasawara,andS.Miyano,

“Inferringgeneregulatorynetworksfromtime-orderedgeneexpressiondataofBacillussubtilisusingdifferentialequations”,Pac.Symp.Biocomputing,8:17-28,2003.

[9]O.Nelles,NonlinearSystemIdentiﬁcation,Springer,2001.

[10]S.Ando,E.SakamotoandH.Iba,“EvolutionaryModelingandInference

ofGeneNetwork”,InformationScience,Vol.145,pp.237-259,2002.[11]T.KeplerandT.Elston,“StochasticityinTranscriptionalRegulation:

Origins,Consequences,andMathematicalRepresentations”,BiophysJ,Vol.81,No.6,pp.3116-3136,Dec2001.

[12]P.Swain,M.Elowitz,andE.Siggia,“Intrinsicandextrinsiccontribu-tionstostochasticityingeneexpression”,Proc.Natl.Acad.Sci.USA,99:12795-12800,2002.

[13]J.Hasty,J.Pradines,M.Dolnik,andJ.J.Collins,“Noise-basedswitches

andampliﬁersforgeneexpression”,Proc.Natl.Acad.Sci.USA,97:2075-2080,2000.

[14]U.ShakedandY.Theodor,“H∞optimalestimation:Atutorial”,in

Proc.31stIEEECDC,pp.2278-2286,1992.

[15]B.HassibiandT.Kailath,“H∞adaptiveﬁltering”,inProc.IEEE

ICASSP95,Detroit,MI,pp.949-952,1995.

[16]X.ShenandL.Deng,“GametheoryapproachtodiscreteH∞ﬁlter

design”,IEEETransactionsonSignalProcessing,Vol.45,No.4,pp.1092-1095,April1997.

[17]X.ShenandL.Deng,“Adynamicsystemapproachtospeech

enhancementusingH-inﬁnityﬁlteringalgorithm”,IEEETransactionsonSpeechandAudioProcessing,Vol.7,pp.391-399,1998.

[18]D.Simon,OptimalStateEstimation:Kalman,HInﬁnity,andNonlinear

Approaches,Wiley,2006.

[19]J.R.Koza,GeneticProgramming:OntheProgrammingofComputers

byMeansofNaturalSelection,MITpress,1992.

[20]M.GrewalandA.Andrews,KalmanFiltering:TheoryandPractice,

PrenticeHall,1993.

[21]H.Wang,L.Qian,andE.Dougherty,“InferenceofGeneRegulatory

NetworksusingGeneticProgrammingandKalmanFilter”,Gensips,2006.

[22]http://sgdlite.princeton.edu/download/yeastda

Absolute Value of the Error0.80.70.60.50.40.30.20.100102030Number of Iteration405060708090100Fig.4.

ExperimentallearningcurvesoftheE-cellmodelbyH∞ﬁltering

700650HAP1CYC7CYB2600550Fluorescence level50045040035030025020002040Time6080100120Fig.5.ThetrajectoriesforCYB2,HAP1andCYC7.

IV.CONCLUSIONS

Themicroarraymeasurementsusuallycontainratherlargenoiseandthestatisticsofthenoisearenotknownexactly.Inthisstudy,noiseismodeledexplicitlyintheproposednonlinearODEmodeloftheGRN.AjointGPandH∞ﬁlteringapproachisappliedtoinfertheGRN,whereH∞ﬁlteringprovidesoptimalparameterestimationsunderuncer-tainties.Simulationresultsdemonstratetheeffectivenessoftheproposedscheme.

因篇幅问题不能全部显示，请点此查看更多更全内容

查看全文

全部栏目

Inference of Genetic Regulatory Networks by Evolutionary Algorithm and H ∞ Filtering