Hi,
I am new to HaplotypeCaller and am having serious problems getting it to run. I have WGS re-sequencing BAM files at ~30-60x coverage (each file is >3 GB). I am running them in ERC mode as suggested, but within minutes 3/4 of the jobs are killed by the cluster for exceeding memory. I am using the following command:
java -Xmx32g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -I $bamfile -minPruning 4 --min_base_quality_score $min_base_qual --min_mapping_quality_score $min_map_qual -rf DuplicateRead -rf BadMate -rf BadCigar -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 -R $ref -o $HCdir"."HC.$bamfile".""."g.vcf -ploidy $cohort1_ploidy -stand_emit_conf $stand_emit -stand_call_conf $stand_call --pcr_indel_model NONE
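In case the one-liner is hard to read, here is roughly the same invocation laid out as a small shell sketch, one option per line with every variable quoted. The variable values are placeholders made up for illustration (not my real settings), and the output name is only an assumption about what the concatenation in the -o argument is meant to produce. The script prints the command as a dry run rather than executing it.

```shell
#!/bin/sh
# Placeholder values: assumptions for illustration only.
bamfile="sample1.bam"
ref="reference.fasta"
HCdir="hc_out/"
min_base_qual=20
min_map_qual=20
cohort1_ploidy=2
stand_emit=10
stand_call=30

# Assumed output name; the pattern intended by the original -o argument is a guess.
out="${HCdir}HC.${bamfile}.g.vcf"

# Build the argument list; all flags are taken from the command above.
set -- java -Xmx32g -jar GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R "$ref" \
    -I "$bamfile" \
    -o "$out" \
    -ERC GVCF \
    -variant_index_type LINEAR \
    -variant_index_parameter 128000 \
    -minPruning 4 \
    --min_base_quality_score "$min_base_qual" \
    --min_mapping_quality_score "$min_map_qual" \
    -rf DuplicateRead -rf BadMate -rf BadCigar \
    -ploidy "$cohort1_ploidy" \
    -stand_emit_conf "$stand_emit" \
    -stand_call_conf "$stand_call" \
    --pcr_indel_model NONE

# Dry run: print the assembled command instead of running it.
printf '%s\n' "$*"

# To actually run it, replace the printf line with: "$@"
```

Quoting each variable means a path containing spaces would not split the command into extra arguments.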
I have varied the amount of memory I allocate up to -Xmx256g with no improvement, which seems odd to me. Adding -minPruning did not seem to help either. I have looked at previous posts and know that HC can be quite memory-hungry, but is it normal to this extent?
Many thanks in advance for any pointers.