Channel: haplotypecaller — GATK-Forum

Effect of performing RNA-Seq with a highly fragmented reference genome: MAQ-values and genotyping Ha

Hi, I am performing RNA-Seq to identify new polymorphisms in a species of sea star. Our short-term goal is to generate novel DNA sequences of coding genes for phylogenetic analysis. It is therefore important that polymorphisms be called accurately and that they can be phased.

Our reference genome is poorly assembled, comprising over 60,000 scaffolds and contigs. Consequently, when paired-end RNA-Seq reads are aligned to it (using TopHat), the two mates of a pair are often mapped to different scaffolds or contigs. This seems to greatly lower the mapping quality (MAPQ), which in turn leads HaplotypeCaller to miss well-supported polymorphisms, because the reads that support them have MAPQ values between 1 and 3.
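
To put numbers on this, one could tally how many aligned reads have a mate on a different scaffold and how many of them fall below the caller's mapping-quality cutoff. The following is only a minimal sketch: it reads the standard tab-separated SAM columns (RNAME, MAPQ, RNEXT), the function name `summarize_mapq` and the example records are hypothetical, and it assumes that reads with MAPQ below the threshold are the ones discarded.

```python
from collections import Counter


def summarize_mapq(sam_lines, min_mapq=20):
    """Count reads by (mate on another scaffold?, MAPQ >= min_mapq?)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        rname, mapq, rnext = fields[2], int(fields[4]), fields[6]
        # RNEXT is "=" when the mate is on the same reference, "*" when unknown
        split_pair = rnext not in ("=", "*") and rnext != rname
        passes = mapq >= min_mapq  # assumption: reads below the cutoff are dropped
        counts[(split_pair, passes)] += 1
    return counts


# Hypothetical records, for illustration only (not real data)
EXAMPLE_SAM = [
    "@SQ\tSN:scaffold_1\tLN:5000",
    "r1\t99\tscaffold_1\t100\t50\t50M\t=\t300\t250\t*\t*",
    "r2\t65\tscaffold_1\t400\t3\t50M\tscaffold_9\t20\t0\t*\t*",
    "r3\t65\tscaffold_2\t40\t1\t50M\tscaffold_7\t10\t0\t*\t*",
]

at_default = summarize_mapq(EXAMPLE_SAM, min_mapq=20)
at_one = summarize_mapq(EXAMPLE_SAM, min_mapq=1)
for (split_pair, passes), n in sorted(at_default.items()):
    print(f"mate on other scaffold={split_pair} passes MAPQ>=20={passes}: {n}")
```

Running the same tally on the real alignments (for example on the output of `samtools view`) would show how much of the lost evidence is actually tied to pairs split across scaffolds.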

The obvious workaround is to lower --min_mapping_quality_score from its default of 20 to 1 or 2, and to compensate by raising --min_base_quality_score from its default of 10 to perhaps 25 or 30. Lowering the mapping-quality threshold does, however, increase the risk of calling false positives in poorly aligned regions.
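
For concreteness, the adjusted call might be assembled as in the sketch below. This assumes GATK 3-style command-line syntax with the short option names `-mmq` and `-mbq`; the jar path, file names and the helper `build_hc_command` are hypothetical, and the threshold values are just the ones mentioned above, not a recommendation.

```python
def build_hc_command(reference, bam, out_vcf, min_mapq=1, min_baseq=25):
    """Return a HaplotypeCaller argument list (assumed GATK 3-style syntax)."""
    return [
        "java", "-jar", "GenomeAnalysisTK.jar",  # hypothetical jar location
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-o", out_vcf,
        "-mmq", str(min_mapq),   # minimum mapping quality, lowered from 20
        "-mbq", str(min_baseq),  # minimum base quality, raised from 10
    ]


# Hypothetical file names, for illustration only
cmd = build_hc_command("reference.fasta", "aligned.bam", "calls.vcf")
print(" ".join(cmd))
```

Building the argument list without executing it makes it easy to review the exact thresholds before launching a run.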

Has this situation been considered by the GATK development team, and is there a recommended way to account for it?
