Quantitative analysis of synthesized nucleic acid pools

Experimental evolution of RNA (or DNA) is a powerful method to isolate sequences with useful functions (e.g., catalytic RNA), discover fundamental features of the sequence-activity relationship (i.e., the fitness landscape), and map evolutionary pathways or functional optimization strategies. However, the limitations of current sequencing technology create a significant undersampling problem that impedes our ability to measure the true distribution of unique sequences. In addition, synthetic sequence pools contain a non-random distribution of nucleotides. Here, we present and analyze simple models to approximate the true sequence distribution. We also provide tools that compensate for sequencing errors and other biases that occur during sample processing. We describe our implementation of these algorithms in the Galaxy bioinformatics platform.
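To make the undersampling problem and the effect of non-random nucleotide composition concrete, the following minimal sketch computes the expected number of distinct sequences seen when reads are drawn from a pool. It is a generic illustration, not the models presented in this chapter. It assumes reads are sampled with replacement (multinomial sampling), that every position of a short random region is synthesized independently with the same nucleotide probabilities, and that the region is only 6 nt long so all sequences can be enumerated; the helper names `pool_frequencies` and `expected_unique` and the biased composition are hypothetical.

```python
import itertools
import math

BASES = "ACGU"

def pool_frequencies(length, nucleotide_probs):
    """Frequency of every possible sequence of the given length, assuming
    each position is drawn independently from nucleotide_probs (an
    illustrative assumption; real synthesis bias can vary by position)."""
    return [
        math.prod(nucleotide_probs[b] for b in seq)
        for seq in itertools.product(BASES, repeat=length)
    ]

def expected_unique(freqs, n_reads):
    """Expected number of distinct sequences observed in n_reads draws
    with replacement: sequence i is seen at least once with
    probability 1 - (1 - p_i)**n_reads."""
    return sum(1.0 - (1.0 - p) ** n_reads for p in freqs)

uniform = {b: 0.25 for b in BASES}
biased = {"A": 0.4, "C": 0.2, "G": 0.2, "U": 0.2}  # hypothetical synthesis bias

length = 6               # 4**6 = 4096 possible sequences
n_reads = 4 ** length    # as many reads as there are possible sequences

uniform_freqs = pool_frequencies(length, uniform)
biased_freqs = pool_frequencies(length, biased)

print(f"uniform pool: {expected_unique(uniform_freqs, n_reads):.0f} of {4 ** length} seen")
print(f"biased pool:  {expected_unique(biased_freqs, n_reads):.0f} of {4 ** length} seen")
```

Even with as many reads as possible sequences, roughly a third of a uniform pool goes unobserved. Because 1 - (1 - p)^N is concave in p, any departure from uniform composition can only lower the expected number of distinct sequences for a fixed pool size and read count, so a biased pool looks even less diverse in the sequencing data.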

Xulvi-Brunet, R., Campbell, G. W., Rajamani, S., Jimenez, J. I. and Chen, I. A.
Nonlinear Dynamics in Biological Systems
Pages: 19–41
Date: July 2016
ICB Affiliated Authors: Irene A Chen