Hi,
I have generated a gVCF (with non-variant block records) for an exome from a BAM file that is part of the 1000 Genomes data.
I am using GATK version 3.5-0-g36282e4 and ran HaplotypeCaller as follows:
time java -jar $gatk_dir/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R $reference \
-I $bamfile \
-ploidy 2 \
-stand_call_conf 20 \
-stand_emit_conf 10 \
-ERC GVCF \
-o output.g.vcf.gz
For my analysis, I need to determine from this gVCF whether each position is a no-call, homozygous reference, or a variant site, or whether it was not targeted by the exome sequencing.
However, I cannot do this with the gVCF I obtained, because it contains only variant site records and non-variant block records, and in the latter the GT field is always "0/0".
So I have a few questions regarding the non-variant block records:
Why does the output file not contain any no-call ("./.") records?
Shouldn't regions with no reads have GT equal to "./." instead of "0/0"?
How can regions without reads (not targeted) be distinguished from regions with reads that were not called?
When I look at the BAM file in IGV, the non-variant blocks shown in the gVCF contain regions with reads. What explains this behaviour?
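To make the goal concrete, here is a rough sketch of the per-record classification I have in mind. It is only an illustration under my own assumptions: that the sample column carries DP and GQ, that non-variant blocks store their end position in the INFO key END, that DP = 0 can be read as "no reads", and that a GQ below an arbitrary cutoff (min_gq, my choice) can be treated as a no-call. The function names classify_record and classify_gvcf are hypothetical, and none of this is meant to describe what GATK itself does.

```python
import gzip


def _to_int(value, default=0):
    """Parse a VCF integer field, treating '.' or non-numeric text as default."""
    return int(value) if value.isdigit() else default


def classify_record(line, min_gq=20):
    """Classify one single-sample gVCF data line.

    Returns (chrom, start, end, label). The labels and the min_gq cutoff are
    assumptions made for illustration, not GATK conventions.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    # INFO is a ';'-separated list; only key=value entries are kept here.
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    # Pair FORMAT keys with the sample values (GT, DP, GQ, ...).
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    # Blocks give END in INFO; otherwise the record spans the REF allele.
    end = int(info.get("END", pos + len(ref) - 1))
    alleles = sample.get("GT", "./.").replace("|", "/").split("/")
    dp = _to_int(sample.get("DP", "."))
    gq = _to_int(sample.get("GQ", "."))

    if dp == 0:
        label = "no_reads"   # assumption: zero depth means nothing was sequenced
    elif "." in alleles or gq < min_gq:
        label = "no_call"    # assumption: low GQ is treated as a no-call
    elif all(a == "0" for a in alleles):
        label = "hom_ref"
    else:
        label = "variant"
    return chrom, pos, end, label


def classify_gvcf(path, min_gq=20):
    """Yield a classification for every data line of a gzipped gVCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                yield classify_record(line, min_gq)
```

For example, classify_gvcf("output.g.vcf.gz") would yield one tuple per record. Even so, a DP of 0 on its own would not tell me whether a region was outside the capture targets or was targeted but simply received no reads; for that I assume I would also need the target interval list of the exome kit.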
Thank you for your attention,
Sofia