Hi,
My use case is quite straightforward, but has been surprisingly hard to achieve:
For each sample, I have both Omni 2.5M SNP genotype data and RNA-seq variant call data (done with GATK3).
Now I want to see how well the RNA-seq variant calling is performing, using the SNP genotypes as reference.
To do this, I need not only the variant calls from the RNA-seq data (which is what HaplotypeCaller (HC) normally outputs), but genotypes for every position in a given set.
Ideally, I would like to keep all the normal info fields from the RNA VCF, to allow calculation of some concordance metrics based on depth of coverage and other quality parameters later.
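For the concordance step I have in mind something like the following. It is only a minimal sketch: the function name concordance_by_depth and the depth bins are hypothetical, both VCFs are assumed to be uncompressed and single-sample, DP is assumed to be present in the FORMAT column of the RNA VCF, and genotypes are compared as plain strings (so it assumes both files use the same ALT allele and allele order at each site).

```shell
# Hypothetical helper: genotype concordance per depth bin.
# $1 = truth VCF (SNP chip), $2 = RNA-seq VCF; both uncompressed, single-sample.
concordance_by_depth() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        { key = $1 ":" $2 }                    # CHROM:POS identifies a site
        # First file (truth): remember the genotype at each site.
        FILENAME == ARGV[1] { split($10, t, ":"); truth[key] = t[1]; next }
        # Second file (RNA-seq): only sites that are also in the truth set.
        key in truth {
            n = split($9, names, ":"); split($10, vals, ":")
            dp = 0
            for (i = 1; i <= n; i++) if (names[i] == "DP") dp = vals[i] + 0
            gt = vals[1]; gsub(/[|]/, "/", gt) # treat phased calls like unphased
            if (gt == "./." || gt == ".") next # skip no-calls
            if (dp < 10) bin = "DP<10"         # assumed, arbitrary bin edges
            else if (dp < 30) bin = "DP10-29"
            else bin = "DP>=30"
            total[bin]++
            if (gt == truth[key]) match_n[bin]++
        }
        END {
            # Columns: bin, concordant calls, compared calls, concordance.
            for (b in total)
                printf "%s\t%d\t%d\t%.3f\n", b, match_n[b], total[b], match_n[b] / total[b]
        }
    ' "$1" "$2" | LC_ALL=C sort
}

# Hypothetical usage:
# concordance_by_depth "${SNPVCF}" "${SAMPLEID}.rnasnp.vcf"
```

This only becomes meaningful once the RNA VCF contains a record for every SNP position, which is exactly what I am struggling to get.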
I've tried the following:
1. GenotypeAndValidate, with the SNP VCF as "truth" and the BAM as the data to evaluate. The command:
java -Xmx32g -jar ${GATK} \
-T GenotypeAndValidate \
-R ${REF} \
-I ${BAM} \
-alleles ${SNPVCF} \
-L ${SNPVCF} \
-o $SAMPLEID.rnasnp.vcf \
-nt 4
The results (running only chr 1, with ~185k SNPs):
| | ALT | REF | No Status |
|---|---|---|---|
| called alt | 0 | 0 | 4096 |
| called ref | 0 | 0 | 12995 |
| not called | 0 | 0 | 153034 |
sensitivity: NaN%
specificity: 100.000000%
not confident: 3678
not covered: 149356
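As a quick sanity check on these numbers, the sketch below simply re-adds the counts from the output above. The variable names are my own and are not GATK output fields.

```shell
# Counts copied from the GenotypeAndValidate output above (chr 1 only).
called_alt=4096
called_ref=12995
not_called=153034
not_confident=3678
not_covered=149356

# Total number of sites accounted for in the table.
total=$((called_alt + called_ref + not_called))
echo "total sites: ${total}"

# Check whether "not called" is the sum of "not confident" and "not covered".
echo "not confident + not covered: $((not_confident + not_covered))"

# Share of sites without coverage, in percent (awk handles the floating point).
awk -v n="${not_covered}" -v t="${total}" \
    'BEGIN { printf "not covered: %.1f%% of sites\n", 100 * n / t }'
```

The table accounts for 170125 sites in total, and "not confident" plus "not covered" adds up exactly to the 153034 sites in the "not called" row.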
This runs surprisingly fast, and every site ends up in the "No Status" column, which makes me think I'm not supplying the input files the way the tool expects.
2. HaplotypeCaller in GENOTYPE_GIVEN_ALLELES (GGA) mode, giving it the SNP VCF as the --alleles file. The command, adjusted for RNA-seq data:
java -Xmx32g -jar ${GATK} \
-T HaplotypeCaller \
-R ${REF} \
--dbsnp ${DBSNP} \
-I ${BAM} \
-L ${SNPVCF} \
-alleles ${SNPVCF} \
--interval_padding 150 \
-gt_mode GENOTYPE_GIVEN_ALLELES \
-recoverDanglingHeads \
-dontUseSoftClippedBases \
-stand_call_conf 0.0 \
-stand_emit_conf 0.0 \
-o $SAMPLEID.rnasnp.vcf \
-nct 16
This almost gives me what I want: HC now also outputs 0/0 and ./. calls for reference and non-covered positions.
However, it does so only for SNP positions that have a non-reference allele in the SNP VCF. I want all positions called, including those that are homozygous reference in the SNP VCF.
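To check which SNP positions are missing from the output, something like the sketch below can be used. The function name missing_positions is hypothetical, both VCFs are assumed to be uncompressed, and the genotype is read from the first sample column (GT is the first FORMAT field whenever it is present).

```shell
# Hypothetical helper: list truth sites that have no record in the called VCF.
# $1 = truth VCF (SNP chip), $2 = VCF written by the caller; both uncompressed.
missing_positions() {
    awk -F'\t' '
        /^#/ { next }                                  # skip header lines
        { key = $1 ":" $2 }                            # CHROM:POS identifies a site
        # First file read (the called VCF): remember every site it contains.
        FILENAME == ARGV[1] { seen[key] = 1; next }
        # Second file read (the truth VCF): report sites that were never seen,
        # together with the truth genotype of the first sample.
        !(key in seen) { split($10, fmt, ":"); print $1 "\t" $2 "\t" fmt[1] }
    ' "$2" "$1"
}

# Hypothetical usage:
# missing_positions "${SNPVCF}" "${SAMPLEID}.rnasnp.vcf" > missing_sites.txt
```

The third output column is the SNP-chip genotype, which makes it easy to see whether the missing sites are the homozygous reference ones.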
Am I using these tools wrong? Or should I be approaching this differently?
Thanks in advance, Vasilios