Hi, I would like to call SNPs for a set of 150 bacterial genomes (genome size ~1 Mb). I'm attempting to use HaplotypeCaller and am running into a Java memory error: "There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java."
The estimated run time is ~11 days. I have increased the memory to 20g and am also limiting max_alternate_alleles, as shown below:
java -d64 -Xmx20g -jar $EXECGATK \
-T HaplotypeCaller \
-R $REF \
-I $DATAPATH${BAMLIST} \
-stand_call_conf 20 \
-stand_emit_conf 20 \
--sample_ploidy 1 \
--maxNumHaplotypesInPopulation 198 \
--max_alternate_alleles 3 \
-L "gi|15594346|ref|NC_001318.1|" \
-o ${OUTPATH}${BASE}.chr.snps.indels.vcf
Is there a way to call only SNPs? My understanding is that indel calling is memory intensive, and I am going to focus on SNPs for this part of my analysis. Or is there another way to make this analysis more efficient?
Thank you!