Hi,
I am running gVCF calling on whole-genome samples and have noticed that the jobs for some samples fail at random genomic locations. If I resubmit the failed jobs, they either finish successfully or fail again at a different genomic location (the location is taken from the "ProgressMeter" lines in the logs).
- I run one gVCF job per WGS sample. Right now more than 70% of the jobs are failing. Is there anything I should change in the parameters?
- Do you have something like an SOP for best practices on running HaplotypeCaller on WGS samples? I understand the process is very similar to gVCF calling for exome sequencing, but I see many more job failures with gVCF calling on WGS samples.
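For anyone who wants to compare where each run stopped, here is a minimal sketch of how the last reported location could be pulled from the logs. It assumes each job writes its own log file and that progress lines contain the literal string "ProgressMeter"; the function name `last_progress` and the `logs/*.log` path are only illustrative, not part of my actual pipeline.

```shell
#!/bin/sh
# Print the last ProgressMeter line of each GATK log passed as an argument.
# Assumption: one log file per job, with progress lines that contain the
# literal string "ProgressMeter". No particular column layout is assumed.
last_progress() {
    for log in "$@"; do
        # grep keeps only the progress lines; tail keeps the most recent one
        last=$(grep 'ProgressMeter' "$log" | tail -n 1)
        # Fall back to a short notice if the log has no progress lines
        printf '%s\t%s\n' "$log" "${last:-no ProgressMeter line found}"
    done
}

# Hypothetical usage:
#   last_progress logs/*.log
```

Comparing this output across resubmissions of the same sample should show whether the failure position really moves from run to run.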
I am using the following command for the gVCF call:
java -Xmx128g -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -jar GenomeAnalysisTK.jar \
  -T HaplotypeCaller \
  -I file.bam \
  -nct 8 \
  -R human_g1k_v37.fasta \
  -o /ttemp/file.g.vcf \
  -L b37_wgs.intervals \
  --emitRefConfidence GVCF \
  --variant_index_type LINEAR --variant_index_parameter 128000 \
  -dcov 250 \
  -minPruning 3 \
  -stand_call_conf 30 \
  -stand_emit_conf 30 \
  -G Standard -A AlleleBalance -A Coverage \
  -A HomopolymerRun -A QualByDepth
Compute: one full node (256 GB RAM, 20 cores per node) per single-sample WGS gVCF job.
The GATK version being used is 3.1.
P.S. I am also testing the latest version of GATK (3.4) without the "-dcov" option to see if that resolves the issue.
Thanks,
Shalabh