Channel: haplotypecaller — GATK-Forum

Why do variants differ between the after-BQSR BAM and the after-HaplotypeCaller BAM?


Dear GATK team,

Hi, I have followed the Best Practices to find germline variants (GATK 3.7) in my samples, which come from a case-control study of ~500 samples in total.
I ran BQSR, PrintReads, and then HaplotypeCaller as described below:

BQSR
java -jar $GATK/GenomeAnalysisTK.jar -T BaseRecalibrator -R $Reference -knownSites $dbSNP138 -knownSites $Mills -knownSites $oneKGindels -nct 8 -I $Output/$1.sort.dup.ir.bam -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate -o $Output/$1.recal.data.grp -L $Interval -ip 100

PrintReads
java -jar $GATK/GenomeAnalysisTK.jar -T PrintReads -nct 8 -R $Reference -I $Output/$1.sort.dup.ir.bam -BQSR $Output/$1.recal.data.grp -o $Output/$1.sort.dup.ir.BQSR.bam

HaplotypeCaller (HC)
java -jar $GATK/GenomeAnalysisTK.jar -T HaplotypeCaller -R $Reference -I $Input/$1.sort.dup.ir.BQSR.bam -o $Output/$1.hc.vcf.gz -L chr14:92537200-92537700 -bamout $Output/$1.bamout.bam
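For reference, the three steps above can be chained per sample. This is a minimal sketch, assuming bash and assuming the recalibrated BAM stays in $Output (the original HaplotypeCaller command reads it from $Input instead). The flags are copied unchanged from the commands above; run_pipeline and DRY_RUN are hypothetical names added for illustration. With DRY_RUN=1 the commands are only printed, which makes it easy to check that each step reads the previous step's output.

```shell
#!/usr/bin/env bash
# Sketch: the three GATK 3.7 steps above chained for one sample.
# Assumes GATK, Reference, dbSNP138, Mills, oneKGindels, Interval and Output
# are already set. run_pipeline and DRY_RUN are hypothetical names.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_pipeline() {
  local s=$1
  # Step 1: build the recalibration table (BaseRecalibrator).
  run java -jar "$GATK/GenomeAnalysisTK.jar" -T BaseRecalibrator -R "$Reference" \
    -knownSites "$dbSNP138" -knownSites "$Mills" -knownSites "$oneKGindels" -nct 8 \
    -I "$Output/$s.sort.dup.ir.bam" \
    -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate \
    -o "$Output/$s.recal.data.grp" -L "$Interval" -ip 100
  # Step 2: write the recalibrated BAM (PrintReads with -BQSR).
  run java -jar "$GATK/GenomeAnalysisTK.jar" -T PrintReads -nct 8 -R "$Reference" \
    -I "$Output/$s.sort.dup.ir.bam" -BQSR "$Output/$s.recal.data.grp" \
    -o "$Output/$s.sort.dup.ir.BQSR.bam"
  # Step 3: call variants on the recalibrated BAM and write the reassembled reads (-bamout).
  run java -jar "$GATK/GenomeAnalysisTK.jar" -T HaplotypeCaller -R "$Reference" \
    -I "$Output/$s.sort.dup.ir.BQSR.bam" -o "$Output/$s.hc.vcf.gz" \
    -L chr14:92537200-92537700 -bamout "$Output/$s.bamout.bam"
}
```

For example, `DRY_RUN=1 run_pipeline sample01` prints the three commands for one sample without running GATK.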

When I compared the variants in the after-BQSR BAM with those in the after-HC BAM (the -bamout output) in the region chr14:92537200-92537700 using IGV, I noticed that the two BAMs looked different, especially for indels, like this:

So I have several questions:
1) Why do the indels differ between the after-BQSR BAM and the after-HC BAM? The indels at chr14:92,537,354 are not in the after-BQSR BAM, but they are in the after-HC BAM. Among the samples I have processed, some show the same indels in both BAMs, but others show different indels.
2) I noticed that some regions seem to be cut off ("snapped") in the after-HC BAM but not in the after-BQSR BAM. I have no idea why this happens.
3) In some samples, no variants were called anywhere in chr14:92537200-92537700 in the after-HC BAM, even though reads map to the same region in the after-BQSR BAM. How should I interpret this?
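To compare what IGV shows with what HaplotypeCaller actually called (questions 1 and 3), one option is to list the indel records in the HC VCF for the same region. This is a minimal sketch, assuming uncompressed VCF text on stdin (a .vcf.gz can be piped through `gzip -dc`); list_indels is a hypothetical helper, and it treats any ALT allele whose length differs from REF as an indel.

```shell
#!/usr/bin/env bash
# Sketch: list indel records from a VCF inside one region.
# list_indels is a hypothetical helper; it reads uncompressed VCF on stdin.
# Usage: gzip -dc sample.hc.vcf.gz | list_indels chr14 92537200 92537700
set -euo pipefail

list_indels() {
  local chrom=$1 start=$2 end=$3
  awk -v c="$chrom" -v s="$start" -v e="$end" '
    BEGIN { FS = OFS = "\t" }
    /^#/ { next }                                   # skip header lines
    $1 != c || $2 < s || $2 > e { next }            # keep only records in the region
    {
      n = split($5, alts, ",")                      # ALT may hold several alleles
      for (i = 1; i <= n; i++) {
        if (alts[i] ~ /^</ || alts[i] == "*") continue  # skip symbolic and spanning-deletion alleles
        if (length(alts[i]) != length($4)) {        # REF/ALT lengths differ: indel
          print $1, $2, $4, $5
          break
        }
      }
    }'
}
```

Running it on each sample's VCF shows which samples have an indel call at a given position, independently of how the reads are drawn in the -bamout BAM.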

I'm not sure, but I suspect there is a considerable chance of calling inaccurate variants, since the region I'm interested in contains several repeat sequences and the variants are repeated indels. Is this right? I don't know what to do, so I would appreciate any help with these issues.
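One way to gauge how repetitive the context of an indel is: inserting or deleting one copy of a repeat unit inside a run of N tandem copies gives the same haplotype wherever in the run it is placed, so the aligner and HaplotypeCaller's local reassembly can legitimately draw it at different positions. This is a minimal sketch that counts tandem copies of a unit starting at a given 1-based offset in a reference string. count_tandem_copies is a hypothetical helper, and the sequence is assumed to be obtained separately, e.g. with `samtools faidx` on the region, header removed and lines joined.

```shell
#!/usr/bin/env bash
# Sketch: count tandem copies of a repeat unit starting at a 1-based offset.
# count_tandem_copies is a hypothetical helper; seq is a plain reference string.
set -euo pipefail

count_tandem_copies() {
  local seq=$1 unit=$2 pos=$3
  local k=${#unit} n=0 i=$((pos - 1))   # i is the 0-based offset into seq
  if [ "$k" -eq 0 ]; then echo 0; return 0; fi   # an empty unit would loop forever
  while [ "${seq:i:k}" = "$unit" ]; do  # compare the next k bases with the unit
    n=$((n + 1))
    i=$((i + k))
  done
  echo "$n"
}
```

For example, `count_tandem_copies GGCACACACATT CA 3` finds four copies of CA, so a one-copy CA deletion there could be placed at any of four equivalent positions.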

Thanks in advance!

Best regards,
Soojin

