Hi there, I'm now running the new GATK 2.2-2 release, and an issue with HaplotypeCaller that I had in the previous version I was using is still there. Although I pass the dbSNP ROD to the walker, the emitted VCF doesn't contain rs IDs in the ID column. UnifiedGenotyper, by contrast, annotates the variants with the appropriate rs IDs.
In my .scala code I wrote:
class HaplotypeCallerArguments (t: Target) extends HaplotypeCaller with UNIVERSAL_GATK_ARGS {
  this.reference_sequence = qscript.referenceFile
  this.intervals = if (qscript.intervals == null) Nil else List(qscript.intervals)
  // Set the memory limit to 6 gigabytes on each command.
  this.memoryLimit = 6
  this.input_file :+= qscript.bamFile
  // dbSNP ROD, passed to the walker as -D
  this.D = qscript.dbSNP_b37
}
and this is correctly reflected in the command Queue launches:
INFO 16:07:30,655 FunctionEdge - Starting: 'java' '-Xmx6144m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/SAN/biomed/analysis/tmp' '-cp' '/share/apps/genomics/Queue-2.2-2-gf44cc4e/Queue.jar'
'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'HaplotypeCaller' '-I' '/SAN/biomed/analysis/recal.list' '-L' '/SAN/biomed/analysis/.queue/scatterGather/HaplotypeCaller-sg/temp_016_of_300/scatter.intervals' '-R' '/share/apps/genomics/reference/human_g1k_v37.fasta'
'-l' 'INFO' '-o' '/SAN/biomed/analysis/.queue/scatterGather/HaplotypeCaller-sg/temp_016_of_300/comparisonHC.raw.vcf' '-D' '/share/apps/genomics/reference/gatkresources_hg19_1.5/ftp.broadinstitute.org/bundle/1.5/b37/dbsnp_135.b37.vcf'
However, the ID column of my VCF is still empty:
grep -v \# HC.raw.vcf | cut -f 1,2,3,4,5 | more
1 762273 . G A
1 865738 . A G
1 866319 . G A
1 866511 . C CCCCT
1 871042 . C CA
1 874734 . C T
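To check the ID column over the whole file rather than just the first few records, a small Scala sketch like the following could count records whose ID is "." (the object name VcfIdCheck is hypothetical, and it assumes an uncompressed, tab-delimited VCF):

```scala
// Hypothetical helper: count VCF data records whose ID column (3rd field) is "."
object VcfIdCheck {
  /** Returns (total data records, records with an empty "." ID). */
  def countMissingIds(lines: Iterator[String]): (Int, Int) = {
    val records = lines.filterNot(_.startsWith("#")).filter(_.nonEmpty).map(_.split("\t"))
    records.foldLeft((0, 0)) { case ((total, missing), fields) =>
      // Column 3 (index 2) is the ID column; "." means no ID was assigned
      val idMissing = fields.length > 2 && fields(2) == "."
      (total + 1, if (idMissing) missing + 1 else missing)
    }
  }

  def main(args: Array[String]): Unit = {
    val source = scala.io.Source.fromFile(args(0))
    try {
      val (total, missing) = countMissingIds(source.getLines())
      println(s"$missing of $total records have no ID")
    } finally source.close()
  }
}
```

Run against HC.raw.vcf, a count equal to the total would confirm that no record picked up an rs ID.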
Am I doing something wrong? Running VariantAnnotator would be quite time-consuming if it isn't necessary, especially since, as I now understand, the annotations used by VQSR are already emitted by the caller.
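As a rough illustration of what the missing annotation amounts to, filling the ID column means looking each call up in dbSNP and copying its rs ID into column 3. The toy Scala sketch below does this with an in-memory lookup keyed on (CHROM, POS, REF, ALT). The names FillIds, buildIndex and annotate are hypothetical; this is not how GATK implements it, and it ignores allele normalization and differing indel representations.

```scala
// Toy illustration only: fill "." IDs from an in-memory dbSNP lookup.
// Not GATK's implementation; no allele normalization is attempted.
object FillIds {
  type Key = (String, Int, String, String) // (CHROM, POS, REF, ALT)

  /** Build a lookup from dbSNP-style VCF lines (rs ID in column 3). */
  def buildIndex(dbsnpLines: Iterator[String]): Map[Key, String] =
    dbsnpLines
      .filterNot(_.startsWith("#"))
      .map(_.split("\t"))
      .filter(f => f.length >= 5 && f(2) != ".")
      .flatMap { f =>
        // One entry per ALT allele, so a bi-allelic call can match a multi-allelic record
        f(4).split(",").toList.map(alt => (f(0), f(1).toInt, f(3), alt) -> f(2))
      }
      .toMap

  /** Replace "." in the ID column of a call when an exact allele match exists. */
  def annotate(line: String, index: Map[Key, String]): String = {
    val f = line.split("\t", -1) // -1 keeps trailing empty fields
    if (line.startsWith("#") || f.length < 5 || f(2) != ".") line
    else {
      val id = f(4).split(",").toList
        .flatMap(alt => index.get((f(0), f(1).toInt, f(3), alt)))
        .headOption
      id.fold(line)(rs => f.updated(2, rs).mkString("\t"))
    }
  }
}
```

Calls with no exact match in the lookup are returned unchanged, which is also why a real tool has to be careful about how indels are represented on both sides.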
Thanks, Francesco