LijunQianandHaixinWang
DepartmentofElectricalandComputerEngineering
PrairieViewA&MUniversityPrairieView,Texas77446
Email:LiQian,HWang@pvamu.edu
Abstract—Thecorrectinferenceofgeneticregulatorynetworksplaysacriticalroleinunderstandingbiologicalregulationinphenotypicdeterminationanditcanaffectadvancedgenome-basedtherapeutics.Inthisstudy,weproposeajointevolutionaryalgorithmandH∞filteringapproachtoinfergeneticregulatorynetworksusingnoisytimeseriesdatafrommicroarraymea-surements.Specifically,aniterativealgorithmisproposedwheregeneticprogrammingisappliedtoidentifythestructureofthemodelandH∞filteringisusedtoestimatetheparametersineachiteration.Theproposedmethodcanobtainaccuratedynamicnonlinearordinarydifferentialequation(ODE)modelofgeneticregulatorynetworksevenwhenthenoisestatisticsisunknown.Bothsyntheticdataandexperimentaldatafrommicroarraymeasurementsareusedtodemonstratetheeffectivenessoftheproposedmethod.Withtheincreasingavailabilityoftimeseriesmicroarraydata,thealgorithmdevelopedinthispapercouldbeappliedtoconstructmodelstocharacterizecancerevolutionandserveasthebasisfordevelopingnewregulatorytherapies.
I.INTRODUCTION
Ageneticregulatorynetwork(GRN)isacollectionofDNAsegmentsinacellwhichinteractwitheachotherandwithothersubstancesinthecell,therebygoverningthegenetranscriptions.Thecorrectinferenceofgeneticregulatorynetworksplaysacriticalroleinunderstandingbiologicalreg-ulationinphenotypicdeterminationanditcanaffectadvancedgenome-basedtherapeutics.Inlightoftherecentdevelopmentofhigh-throughputDNAmicroarraytechnology,itbecomespossibletodiscoverGRNs,whicharecomplexandnonlinearinnature.Specifically,theincreasingexistenceofmicroar-raytime-seriesdatamakespossiblethecharacterizationofdynamicnonlinearregulatoryinteractionsamonggenes.Themodeling,analysisandcontrolofGRNsarecriticalforstudy-ingcancerevolutionandmayserveasthebasisfordevelopingnewregulatorytherapies.
BecauseGRNmodelsaredifficulttodeducesolelybymeansofexperimentaltechniques,computationalandmath-ematicalmethodsareindispensable.MuchresearchhasbeendoneonGRNmodelingbylineardifferential/differenceequa-tionsusingtime-seriesdata,forexample,[1-8],justtonameafew.Thebasicideaistoapproximatethecombinedeffectsofdifferentgenesbymeansofaweightedsumoftheirexpressionlevels.In[5],aconnectionistmodelisusedtomodelsmallgenenetworksoperatingintheblastodermofDrosophila.In[1],theconcentrationsofmRNAandproteinaremodeledbylineardifferentialequations.Asimpleformof
linearnadditivefunctionsissuggestedby[2],wheredxi/dt=j=1wijxj.Thedegradationrateofgenei’smRNAandenvironmentaleffectsareassumedtobeincorporatedintheparameterswijandtheirinfluenceongenei’sexpressionlevelxiisassumedtobelinear.Amethodtoobtainacontinuouslineardifferentialequationmodelfromsampledtime-seriesdataisproposedin[7].Foraddedbiologicalrealism(allconcentrationsgetsaturatedatsomepointintime),asigmoid(squashing)functionmaybeincludedintotheequation.Ithasbeenshownthatthissortofquasi-linearmodelcanbesolvedbyfirstapplyingtheinverseofthesquashingfunction[3].Inourstudy,aGRNismodeledbycontinuousnonlinearOrdinaryDifferentialEquations(ODEs).Comparedtolinearmodels,identificationofthenonlineardifferentialequationmodeliscomputationallymoreintensiveandcanrequiremoredata;however,therangeofnonlinearbehaviorsex-hibitedbyGRNscanbemorethoroughlyunderstoodwithnonlineardifferentialequations.Inaddition,wellestablisheddynamicalsystemstheoryisavailabletocharacterizethedynamicsproducedbythesemodels.Whenmoretime-seriesdatabecomeavailableowingtoadvancesinmicroarrayorothertechnologies,andassumingcontinuedimprovementincomputationalcapability,itcanbeexpectedthatcontinuousnonlineardynamicmodelswillplayacriticalroleinrevealingcomplicatedgenebehavior.
Ingeneral,modelinggeneregulatorynetworksisanonlinearidentificationproblem.AssumingthereareNgenesofinterestandxidenotesthestate(suchasthemicroarrayreading)oftheithgene,thenthedynamicsoftheGRNmaybemodeledas
dxi
=dt
1-4244-1198-X/07/$25.00 ©2007 IEEE21SSP 2007
asuniversalapproximators.Inordertomitigatetheeffectof“thecurseofdimensionality”,onlysecond-degreepoly-nomialsareselected.Notethatanadvantageofusinglow-degreepolynomialmodelsisthatevenwhenthereexistssomemodelmismatch,thesemodelsmaybesufficientlyaccuratetorepresentmanyrealsystems,andthusarewidelyutilizedinpractice[9].WenotethatasimilarGRNmodelhasbeenadoptedby[10],butwithoutnoisebeingincludedinthemodel.μijandνiareparameternoiseandexternalnoise,respectively,anditisassumedthatthenoisestatisticssuchasthecovariancematricesareunknown.
ThenoisynatureofGRNsismodeledexplicitlyinthisstudy.Thedeterministicmodel(withoutnoise)correspondstothenominalcase,whilethevariousstochasticeffectsareincludedasnoisedisturbances.Forexample,thereisconsiderableexperimentalevidencethatindicatesthepresenceofsignificantstochasticityintranscriptionalregulationinbotheukaryotesandprokaryotes[11].Theinherentstochasticityofbiochemicalprocesses(transcriptionandtranslation)ismod-eledasnoiseintheparameters(μij),whichcorrespondstothe“intrinsicnoise”mentionedintheliterature[12].Othereffects,suchasthosefromgenesnotbeenincludedinthemicroarray,theamountofRNApolymerase,levelsofregulatoryproteins,andtheeffectsofmRNAandproteindegradation,aremodeledbytheexternalnoise(νi)[12].PreviousworkhasmodeledthesenoisetypesbyGaussianwhitenoiseprocesses[13].Ifthenoisestatisticsareknown,thenKalmanfiltercanbeappliedtogettheoptimalestimatesoftheparameters[21].Onthecontrary,arobustfiltersuchasanH∞filter,hastobeusedtoobtaintheoptimalparameterestimateswhenthenoisestatisticssuchasthecovariancematricesareunknown.Inthispaper,atwo-stepprocedureisproposedtoidentifyfi.Firstly,geneticprogramming(GP)isappliedtodeterminethenonlinearterms;thenthecorrespondingparametersassociatedwitheachtermareestimatedbyH∞filtering.
Theremainderofthepaperisorganizedasfollows:TheproposedframeworkandtheiterativealgorithmareillustratedinSectionII.SimulationresultsaregiveninSectionIII.SectionIVcontainssomeconcludingremarks.
II.ALGORITHMDESCRIPTION
Thetaskofidentifyinggeneregulatorynetworksmaybeconsideredasanoptimizationproblem.Thegoalistomini-mizetheidentificationerrorandkeepthemodelassimpleaspossible,whichmaybeachievedbyminimizingthefollowingfitnessfunction
fitness=
Ni=1
NoCreate Initial Random Population(M) and Apply HEndYesRun=N?Run:=Run+1StartGen:=0YesTermination Criterion Fitness Satisfied for RunDesignate Result for RunNoi:=0Apply fitness Measure to individual in the populationNoi:=M?Yesi:=0Gen=:Gen+1i:=i+1Yesi:=M?NoApply H to individual in the populationi:=i+1Select genetic OperationPrSelect 5% individual Based on FitnessPerformReproductionInsert into New PopulationPcSelect 80% individual Based on FitnessPerformCrossoverInsert Offspring into New PopulationPmSelect 10% individual Based on FitnessPaPerform MutationInsert Mutant into New PopulationSelect an Architecture Altering Operation Based on its Specified Probability (5%)Perform the Architecture Altering OperationInsert Offspring into New PopulationSelect one Individual Based on Fitness Fig.1.TheiterativeprocessofjointGPandH∞filtering.(Thegeneticprogrammingprocesshasfouroperations:reproduction,crossover,mutationandselection.H∞filteringisemployedtoestimatetheparametersforeverygeneration.)
[η1
Mk=1
2
(xi(k)−xtari(k))+η2Γi]
(3)
Sinceitisaglobalnonlinearoptimizationproblem,anested
optimizationstructureisadopted,wheregeneticprogrammingisappliedtodeterminethenonlinearterms(globalopti-mization)whileH∞filteringisemployedtoestimatethecorrespondingparametersforeachterm(localoptimization)ineachiteration.SuchadecompositionoftheproblemintoastructuralpartsolvedbyGPandaparameteroptimizationpartsolvedbyH∞filteringreducethecomplexitysignificantlyandspeedupconvergence.ThedetailedproceduresoftheproposediterativealgorithmisillustratedinFig.1.
Inordertodealwithlargenumberofgenes,theoptimizationproblemisdecoupledintoNsub-problemswiththeithsub-problemfocusingontheithgene.Becausethetime-seriesdataofothergenesarefixed(frommeasurements)whenwearefocusingonanindividualgene,wecansolvetheoptimizationproblemonegeneatatime.ThisapproachmakestheinferenceoflargeGRNsfeasible.
whereMisthenumberofdatapoints,bethetargettimeseriesandxibetheobtainedtimeseriesgivenbytheobtaineddifferentialequation.Γiisapenaltytermanditischosenasthenumberoftermsinfi,i.e.,Γi=Li.η1>0andη2>0aretheweightsontheestimationerrorandthemodelcomplexity,respectively.
xtari
22
+Layer 1Node1 (* )Node 2 ( X2 )Layer 2Node1 (* )Node2(X1 )Layer 3Node1(X2 )Node2(X3 )Fig.2.Anexampleofthetreesturctureofadiferentialequation.
Δt
.C
A.GeneticProgramming
Geneticprogramming[19]isatypeofevolutionaryalgo-rithms.Allevolutionaryalgorithmsworkwithapopulationofindividuals,whereeachindividualmaybeasolutionoftheoptimizationproblem.GPoperatesonatreestructure,whichisflexibleenoughtorepresentrelationshipsefficiently.Theleavesofatreerepresentvariablesorconstants,whiletheothernodesimplementoperators.AnexampleofatreestructureisshowninFig.2,wheretwooperations,multiplication(*)andaddition(+),areused.Thecorrespondingequationisdx1
dt=
M
θ,
θ
23
noisestatisticsareunknown.Wealsoapplyourmethodtotherealmeasurementdatafrommicroarrayexperiment.A.SyntheticData
Inthispartofthesimulation,weusedataofametabolicnetwork,calledtheE-cellsystem(apartofthebiologicalphospholipidpathway),thatconsistsofthreesubstances.Thisnetworkcanbeapproximatedas:
x˙1=−10.32x1x3
x˙2=9.72x1x3−17.5x2x˙3=−9.7x1x3+17.5x2
1.4wwww−9−−1
Gwc−8−−1Guc
−7.128.264−16.14−19.78Guv−9−−1
(10)
Here,weapplyRunge-Kuttamethodtocalculatethesyntheticdataandaddtheintrinsicnoiseandtheexternalnoise(bothareassumedtobeGaussianwhitenoise).
SincetherearethreesubstancesintheE-cellsystem,thetreestructureshouldincludeasubsetofthefollowingtermsontheright-handsideofthedifferentialequation:x1,x2,
22
x3,x1x2,x1x3,x2x3,x21,x2,x3.1000individualsarefirstproducedandrankedaccordingtothefitnessvalue.5%oftheindividualswiththeminimumfitnessvaluearekeptforthenextgeneration.80%individualsareperformedcrossoverand10%individualsareperformedmutationandtheremaining5%areforotheroperations.
Wecompareouralgorithmwiththeapproachin[21]fortwodifferentcases.Incase1,thenoisecovarianceisassumedtobeknown.matricesareQ1=10,Q2=Thecovariance
0.0100.20
,R1=20,R2=0.2,,Q3=
00.0100.2
R3=0.01.Itisassumedthatμijandνiareuncorrelatedforalliandj.Whileincase2,insteadoffixedcovariancematrices,itisassumedthatthecovariancematricesarenotknownexactly,thatis,thecovariancematricesofμijandνi
areQi=(1+qi)QiandRi=(1+ri)Ri,whereqiandriare
22]=σq,randomvariableswithE[qi]=E[ri]=0andE[qii
2222
E[ri]=σri.Variancesaregivenbyσq1=10,σq2=0.2,2222
=10,σr=20,σr=0.3,andσr=15.Therandomσq3123
variablesqiandriareuncorrelatedforalli.
TheresultsaresummarizedinTableI.ItisobservedthatGPplusKalmanfilterperformsverywellwhenthenoisecovarianceisknown.However,GPplusH∞filteroutperformsGPplusKalmanfilterwhenthenoisecovarianceisnotknownexactly.ThisisalsoconfirmedbythetimeseriesshowninFig.3.
ThecoefficientsintheE-CellmodelaredeterminedbyH∞filtering.TheconvergenceoftheH∞filteringalgorithmisanimportantissuewhenapplyingH∞filtertothenoisyinputs.TheconvergenceoftheH∞filterincludestheconvergenceoftheestimatewˆ(n)andtheconvergenceoftheestimationerroreˆ(n).Fig.4showstheconvergenceoftheH∞filter.B.MicroarrayData
Weconsidertime-seriesgene-expressiondatacorrespondingtoyeastproteinsynthesis.Here,thedatafor3genes(HAP1,
1.21Concentration0.80.60.4Data(w/o Noise)Data(w/o Noise)Data (w/o Noise)GP+HinfGP+HinfGP+HinfGP+KLGP+KLGP+KL0.20010203040Time5060708090100Fig.3.TimeSeriesforE-CellsimulationbyKalmanandH∞filtering.“data”istheoriginaldatawithoutnoise.“GP+KL”and“GP+Hinf”arethesimulationdatafromGPplusKalmanfilterandGPplusH∞filter,respectively.
CYC7andCYB2)arepickedbecausetherelationsamongthemhavebeenrevealedbybiologicalexperiments.Thetraceofthetime-seriesmicroarraymeasurementdatafrom[22]isusedinthispartofthesimulation,where17samplingdatapointsareprovidedforeachgenebytheexperiments.Thesamplingdatapointsareevenlyspacedandtheobservationintervalis10minutes.Inthesimulation,1000individualsareproducedineachgeneration.100generationsarecalculatedtoreachtheminimumfitnessvalues.
Thefollowingmodelisobtainedbytheproposedalgorithm:
x˙1=(−0.006+μ11)x1+(0.00835+μ12)x2+ν1x˙2=(−0.3661+μ21)x1+(−0.476+μ22)x2+ν2(11)x˙3=(−1.6124+μ31)x1+(0.45+μ32)x2+ν3
ThetrajectoriesforCYB2,HAP1andCYC7areshowninFig.5.
Theobtainedrelationshipsamonggenesareinagreementwithbiologicalexperimentalfindings.Forexample,weob-servethatHAP1repressesgeneCYC7andCYB2activatesCYC7.HAP1behavesasarepressor[23].
24
10.9REFERENCES
[1]T.Chen,H.L.He,andG.M.Church,“Modelinggeneexpressionwith
differentialequations”,Pac.Symp.Biocomputing,4:29-40,1999.[2]M.K.S.Yeung,J.Tegna˜r,andJ.J.Collins,“Reverseengineeringgene
networksusingsingularvaluedecompositionandrobustregression”,Proc.Natl.Acad.Sci.USA,99:6163-6168,2002.
[3]D.C.Weaver,C.T.Workman,G.D.Stormo,“Modelingregulatory
networkswithweightmatrices”,Pac.Symp.Biocomputing,4:112-123,1999.
[4]P.D’haeseleer,X.Wen,S.Fuhrman,andR.Somogyi,“LinearModeling
ofmRNAexpressionlevelsduringCNSdevelopmentandinjury”,Pac.Symp.Biocomputing,4:41-52,1999.[5]E.Mjolsness,D.H.Sharp,andJ.Reinitz,“Aconnectionistmodelof
development”,JTheorBiol.,152(4):429-53,Oct1991.
[6]H.deJong,“Modelingandsimulationofgeneticregulatorysystems:a
literaturereview”,JournalofComputationalBiology,9(1):67-103,2002.[7]I.Tabus¸,C.D.Giurcaneanu,andJ.Astola,“Geneticnetworksinferred
fromtimeseriesofgeneexpressiondata”,FirstInternationalSymposiumonControl,CommunicationsandSignalProcessing,pp.755-758,Hammamet,Tunisia,2004.
[8]M.J.L.deHoon,S.Imoto,K.Kobayashi,N.Ogasawara,andS.Miyano,
“Inferringgeneregulatorynetworksfromtime-orderedgeneexpressiondataofBacillussubtilisusingdifferentialequations”,Pac.Symp.Biocomputing,8:17-28,2003.
[9]O.Nelles,NonlinearSystemIdentification,Springer,2001.
[10]S.Ando,E.SakamotoandH.Iba,“EvolutionaryModelingandInference
ofGeneNetwork”,InformationScience,Vol.145,pp.237-259,2002.[11]T.KeplerandT.Elston,“StochasticityinTranscriptionalRegulation:
Origins,Consequences,andMathematicalRepresentations”,BiophysJ,Vol.81,No.6,pp.3116-3136,Dec2001.
[12]P.Swain,M.Elowitz,andE.Siggia,“Intrinsicandextrinsiccontribu-tionstostochasticityingeneexpression”,Proc.Natl.Acad.Sci.USA,99:12795-12800,2002.
[13]J.Hasty,J.Pradines,M.Dolnik,andJ.J.Collins,“Noise-basedswitches
andamplifiersforgeneexpression”,Proc.Natl.Acad.Sci.USA,97:2075-2080,2000.
[14]U.ShakedandY.Theodor,“H∞optimalestimation:Atutorial”,in
Proc.31stIEEECDC,pp.2278-2286,1992.
[15]B.HassibiandT.Kailath,“H∞adaptivefiltering”,inProc.IEEE
ICASSP95,Detroit,MI,pp.949-952,1995.
[16]X.ShenandL.Deng,“GametheoryapproachtodiscreteH∞filter
design”,IEEETransactionsonSignalProcessing,Vol.45,No.4,pp.1092-1095,April1997.
[17]X.ShenandL.Deng,“Adynamicsystemapproachtospeech
enhancementusingH-infinityfilteringalgorithm”,IEEETransactionsonSpeechandAudioProcessing,Vol.7,pp.391-399,1998.
[18]D.Simon,OptimalStateEstimation:Kalman,HInfinity,andNonlinear
Approaches,Wiley,2006.
[19]J.R.Koza,GeneticProgramming:OntheProgrammingofComputers
byMeansofNaturalSelection,MITpress,1992.
[20]M.GrewalandA.Andrews,KalmanFiltering:TheoryandPractice,
PrenticeHall,1993.
[21]H.Wang,L.Qian,andE.Dougherty,“InferenceofGeneRegulatory
NetworksusingGeneticProgrammingandKalmanFilter”,Gensips,2006.
[22]http://sgdlite.princeton.edu/download/yeastda
Absolute Value of the Error0.80.70.60.50.40.30.20.100102030Number of Iteration405060708090100Fig.4.
ExperimentallearningcurvesoftheE-cellmodelbyH∞filtering
700650HAP1CYC7CYB2600550Fluorescence level50045040035030025020002040Time6080100120Fig.5.ThetrajectoriesforCYB2,HAP1andCYC7.
IV.CONCLUSIONS
Themicroarraymeasurementsusuallycontainratherlargenoiseandthestatisticsofthenoisearenotknownexactly.Inthisstudy,noiseismodeledexplicitlyintheproposednonlinearODEmodeloftheGRN.AjointGPandH∞filteringapproachisappliedtoinfertheGRN,whereH∞filteringprovidesoptimalparameterestimationsunderuncer-tainties.Simulationresultsdemonstratetheeffectivenessoftheproposedscheme.
25
因篇幅问题不能全部显示,请点此查看更多更全内容