Date of Graduation


Document Type


Degree Name

Doctor of Philosophy in Cell & Molecular Biology (PhD)

Degree Level



Biological Sciences


Andy Pereira

Committee Member

Vibha Srivastava

Second Committee Member

Ravi Barabote

Third Committee Member

Andrew Alverson


Coexpression, Database, Gene Networks, Transcriptomes


Present-day genomic technologies are evolving at an unprecedented rate, allowing interrogation of cellular activities with increasing breadth and depth. However, we know very little about how the genome functions and what the identified genes do. The lack of functional annotations of genes greatly limits the post-analytical interpretation of new high-throughput genomic datasets. For plant biologists, the problem is much more severe. Fewer than 50% of all the identified genes in the model plant Arabidopsis thaliana, and only about 20% of all genes in the crop model Oryza sativa, have some aspects of their functions assigned. Therefore, there is an urgent need to develop innovative methods to predict and expand on the currently available functional annotations of plant genes. With open access catching the ‘pulse’ of modern-day molecular research, integrating the copious transcriptome datasets now available allows rapid prediction of gene functions in specific biological contexts, which provides added evidence over traditional homology-based functional inference. The main goal of this dissertation was to develop data analysis strategies and tools broadly applicable in systems biology research.

Two user-friendly interactive web applications are presented. The Rice Regulatory Network (RRN) captures an abiotic-stress-conditioned gene regulatory network designed to facilitate the identification of transcription factor targets during induction of various environmental stresses. The Arabidopsis Seed Active Network (SANe) is a transcriptional regulatory network that encapsulates various aspects of seed formation, including embryogenesis, endosperm development and seed-coat formation. Further, an edge-set enrichment analysis algorithm is proposed that uses network density as a parameter to estimate the gain or loss in correlation of pathways between two conditionally independent coexpression networks.
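
The idea of using network density to compare a pathway across two coexpression networks can be illustrated with a minimal sketch. This is not the algorithm developed in the dissertation, whose details are not given here. It assumes that density means the fraction of possible gene pairs within a pathway that are connected by an edge, and that gain or loss is the simple difference between the two networks. The function names, gene identifiers, and toy edge lists are all hypothetical, and no significance test is included.

```python
from itertools import combinations


def edge_set_density(edges, genes):
    """Fraction of possible gene pairs within `genes` that are edges in the network.

    Assumption: the network is undirected and given as an iterable of gene pairs.
    """
    genes = set(genes)
    n = len(genes)
    if n < 2:
        return 0.0  # density is undefined for fewer than two genes; report 0
    edge_set = {frozenset(edge) for edge in edges}
    present = sum(
        1 for pair in combinations(sorted(genes), 2) if frozenset(pair) in edge_set
    )
    return present / (n * (n - 1) / 2)


def density_change(edges_a, edges_b, genes):
    """Positive values suggest a gain in coexpression from network A to network B."""
    return edge_set_density(edges_b, genes) - edge_set_density(edges_a, genes)


# Hypothetical toy networks: one built from control samples, one from stress samples.
control = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
stress = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g4", "g5")]
pathway = ["g1", "g2", "g3"]  # hypothetical pathway gene set

print(edge_set_density(control, pathway))
print(edge_set_density(stress, pathway))
print(density_change(control, stress, pathway))
```

In this toy case the pathway has two of its three possible edges in the control network and all three in the stress network, so the change is positive. A practical analysis would also need a null model, for example random gene sets of the same size, to judge whether such a change is larger than expected by chance.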