Channel: haplotypecaller — GATK-Forum
Viewing all articles
Browse latest Browse all 1335

HaplotypeCaller gVCF calling for WGS


Hi,

I am running gVCF calling on whole-genome samples and have noticed that the jobs for some samples fail at random genomic locations (taken from the "ProgressMeter" lines in the logs). If I resubmit a failed job, it either finishes successfully or fails again at a different genomic location.
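To check whether the failures really land at unrelated positions, the last location reported in each log can be pulled out with a small helper. This is only a sketch: `last_progress_location` is a hypothetical name, and it assumes GATK 3 progress lines look roughly like `INFO  10:02:00,000 ProgressMeter -     20:10012345 ...`, i.e. a `contig:position` token directly after `ProgressMeter -`. Header lines without such a token are skipped.

```shell
#!/usr/bin/env bash
# Print the last "contig:position" reported by ProgressMeter in one log file.
# Assumption: progress lines contain "ProgressMeter -" followed by a token such
# as "20:10012345"; header/status lines after "ProgressMeter -" are ignored
# because their first token does not match contig:position.
last_progress_location() {
  local log="$1"
  awk '
    /ProgressMeter -/ {
      for (i = 2; i < NF; i++) {
        if ($(i-1) == "ProgressMeter" && $i == "-") {
          # Keep only tokens shaped like contig:position (e.g. 1:1000000).
          if ($(i+1) ~ /^[^:]+:[0-9]+$/) loc = $(i+1)
          break
        }
      }
    }
    END { if (loc != "") print loc }
  ' "$log"
}
```

Running it over the logs of a failed job and its resubmission shows directly whether the stopping points differ.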

  • I run one gVCF job per WGS sample, and right now more than 70% of the jobs are failing. Should any of the parameters be changed?
  • Is there something like an SOP or best-practices guide for running HaplotypeCaller on WGS samples? I understand the process is very similar to gVCF calling on exome data, but I see many more job failures with WGS samples.
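To put a number like the 70% above on a batch of runs, the logs can be tallied. In this sketch `failure_rate` is a hypothetical helper, logs are assumed to sit in one directory with a `.log` suffix, and a run counts as failed if its log contains a line starting with `##### ERROR`, which is assumed to be how GATK 3 reports errors. Jobs killed by the scheduler or the JVM may never write that banner, so the result is a lower bound; checking each job's exit status would be more reliable.

```shell
#!/usr/bin/env bash
# Count logs that contain a GATK error banner and report the failure rate.
# Assumption: failed GATK 3 runs write lines beginning with "##### ERROR".
failure_rate() {
  local dir="$1" total=0 failed=0 log
  for log in "$dir"/*.log; do
    # Skip the literal pattern when no .log files exist.
    [ -e "$log" ] || continue
    total=$((total + 1))
    if grep -q '^##### ERROR' "$log"; then
      failed=$((failed + 1))
    fi
  done
  if [ "$total" -eq 0 ]; then
    echo "no logs found in $dir" >&2
    return 1
  fi
  # Integer percentage, rounded down.
  echo "$failed/$total failed ($((100 * failed / total))%)"
}
```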

I am using the following parameters for the gVCF calls:

java -Xmx128g -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -jar GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -I file.bam \
    -nct 8 \
    -R human_g1k_v37.fasta \
    -o /ttemp/file.g.vcf \
    -L b37_wgs.intervals \
    --emitRefConfidence GVCF \
    --variant_index_type LINEAR --variant_index_parameter 128000 \
    -dcov 250 \
    -minPruning 3 \
    -stand_call_conf 30 \
    -stand_emit_conf 30 \
    -G Standard -A AlleleBalance -A Coverage \
    -A HomopolymerRun -A QualByDepth
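The same invocation can be wrapped so that only the per-sample paths change between jobs. This sketch assumes bash and uses hypothetical names (`run_gvcf`, `GATK_JAR`, `DCOV`, `DRY_RUN`). The flags are copied from the command above and passed as a bash array, so each one reaches Java as a separate plain-ASCII argument. `-dcov` is appended last so it can be dropped (as in the GATK 3.4 test without `-dcov`); this assumes GATK's argument parser does not care about argument order.

```shell
#!/usr/bin/env bash
# Sketch: assemble the HaplotypeCaller command from the post for one sample.
# run_gvcf, GATK_JAR, DCOV and DRY_RUN are hypothetical names for this example.
run_gvcf() {
  local bam="$1" out="$2" intervals="$3"
  local jar="${GATK_JAR:-GenomeAnalysisTK.jar}"
  # DCOV unset -> keep the post's "-dcov 250"; DCOV set but empty -> omit -dcov.
  local dcov="${DCOV-250}"
  local cmd=(
    java -Xmx128g -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit
    -jar "$jar"
    -T HaplotypeCaller
    -I "$bam"
    -nct 8
    -R human_g1k_v37.fasta
    -o "$out"
    -L "$intervals"
    --emitRefConfidence GVCF
    --variant_index_type LINEAR --variant_index_parameter 128000
    -minPruning 3
    -stand_call_conf 30
    -stand_emit_conf 30
    -G Standard -A AlleleBalance -A Coverage
    -A HomopolymerRun -A QualByDepth
  )
  if [ -n "$dcov" ]; then
    cmd+=(-dcov "$dcov")
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    # Print the assembled command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 DCOV= run_gvcf file.bam /ttemp/file.g.vcf b37_wgs.intervals` prints the command without `-dcov`, which makes it easy to diff the two configurations before submitting them.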

Compute: one full node (256 GB RAM, 20 cores) per single-sample WGS gVCF job.
GATK version: 3.1

P.S. I am also testing the latest GATK version (3.4) without the -dcov option to see whether that resolves the issue.

Thanks,

Shalabh

