Next-Generation Sequencing and Large Genome Assemblies
Conclusion
Whole-genome shotgun (WGS) assembly remains an active area of innovation. The introduction of next-generation sequencing (NGS) has greatly affected the field, even though its fundamental problems remain largely the same. Assemblies built from NGS reads alone are far from perfect and exhibit, in particular, many errors involving the counts of segmental duplications. Early on, it was suggested that short-read technology might not be viable for de novo assembly of large genomes without help from more expensive sequencing methods. However, with the rapid development of assembly techniques, the quality of NGS assemblies is beginning to approach what is possible by other means. Some assemblers achieve much better results on local errors than others, and for some types of errors they do so without apparent costs elsewhere, which shows that improvements are possible. New assembly analysis studies are set to show how large a gap still exists between the quality of NGS assemblies and finished sequence.
We have also seen a number of apparent tradeoffs. When choosing how to generate reads, longer read length often implies more errors, especially with the new PacBio technology. Most interestingly for study design, assemblers that excel at long-range continuity in contigs (such as SOAPdenovo) perform badly at suppressing local errors such as indels, and vice versa (such as SGA). The choice made in a given genomic study will depend on the intended use of the assembly.