I'm running HaplotypeCaller on a series of samples using a while loop in a bash script, and for some samples HaplotypeCaller stops partway through the file. My command was:
java -Xmx18g -jar $Gpath/GenomeAnalysisTK.jar \
-nct 8 \
-l INFO \
-R $ref \
-log $log/$plate.$prefix.HaplotypeCaller.log \
-T HaplotypeCaller \
-I $bam/$prefix.realign.bam \
--emitRefConfidence GVCF \
-variant_index_type LINEAR \
-variant_index_parameter 128000 \
-o $gvcf/$prefix.GATK.gvcf.vcf
Most of the samples completed and their output looks good, but for some I only get a truncated gVCF file with no index. The log for one of the failed samples looks like this:
INFO 17:25:15,289 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:25:15,291 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:21
INFO 17:25:15,291 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 17:25:15,291 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 17:25:15,294 HelpFormatter - Program Args: -nct 8 -l INFO -R /home/owens/ref/Gasterosteus_aculeatus.BROADS1.73.dna.toplevel.fa -log /home/owens/SB/C31KCACXX05.log/C31KCACXX05.sb1Pax102L-S2013.Hap
INFO 17:25:15,296 HelpFormatter - Executing as owens@GObox on Linux 3.2.0-63-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_17-b02.
INFO 17:25:15,296 HelpFormatter - Date/Time: 2014/06/10 17:25:15
INFO 17:25:15,296 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:25:15,296 HelpFormatter - --------------------------------------------------------------------------------
INFO 17:25:15,722 GenomeAnalysisEngine - Strictness is SILENT
INFO 17:25:15,892 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
INFO 17:25:15,898 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 17:25:15,942 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
INFO 17:25:15,948 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 17:25:15,993 MicroScheduler - Running the GATK in parallel mode with 8 total threads, 8 CPU thread(s) for each of 1 data thread(s), of 12 processors available on this machine
INFO 17:25:16,097 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 17:25:16,114 GenomeAnalysisEngine - Done preparing for traversal
INFO 17:25:16,114 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 17:25:16,114 ProgressMeter - Location processed.active regions runtime per.1M.active regions completed total.runtime remaining
INFO 17:25:16,114 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
INFO 17:25:16,116 HaplotypeCaller - All sites annotated with PLs force to true for reference-model confidence output
INFO 17:25:16,278 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 17:25:46,116 ProgressMeter - scaffold_1722:1180 1.49e+05 30.0 s 3.3 m 0.0% 25.6 h 25.6 h
INFO 17:26:46,117 ProgressMeter - scaffold_279:39930 1.37e+07 90.0 s 6.0 s 3.0% 50.5 m 49.0 m
INFO 17:27:16,118 ProgressMeter - scaffold_139:222911 2.89e+07 120.0 s 4.0 s 6.3% 31.7 m 29.7 m
INFO 17:27:46,119 ProgressMeter - scaffold_94:517387 3.89e+07 2.5 m 3.0 s 8.5% 29.2 m 26.7 m
INFO 17:28:16,121 ProgressMeter - scaffold_80:591236 4.06e+07 3.0 m 4.0 s 8.9% 33.6 m 30.6 m
INFO 17:28:46,123 ProgressMeter - groupXXI:447665 6.07e+07 3.5 m 3.0 s 13.3% 26.4 m 22.9 m
INFO 17:29:16,395 ProgressMeter - groupV:8824013 7.25e+07 4.0 m 3.0 s 17.6% 22.7 m 18.7 m
INFO 17:29:46,396 ProgressMeter - groupXIV:11551262 9.93e+07 4.5 m 2.0 s 24.0% 18.7 m 14.2 m
WARN 17:29:52,732 ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at groupX:1516679 has 8 alternate alleles so only the top alleles
INFO 17:30:19,324 ProgressMeter - groupX:14278234 1.15e+08 5.1 m 2.0 s 27.9% 18.1 m 13.0 m
INFO 17:30:49,414 ProgressMeter - groupXVIII:5967453 1.46e+08 5.6 m 2.0 s 33.0% 16.8 m 11.3 m
INFO 17:31:19,821 ProgressMeter - groupXI:15030145 1.63e+08 6.1 m 2.0 s 38.5% 15.7 m 9.7 m
INFO 17:31:50,192 ProgressMeter - groupVI:5779653 1.96e+08 6.6 m 2.0 s 43.8% 15.0 m 8.4 m
INFO 17:32:20,334 ProgressMeter - groupXVI:18115788 2.13e+08 7.1 m 1.0 s 50.1% 14.1 m 7.0 m
INFO 17:32:50,335 ProgressMeter - groupVIII:4300439 2.50e+08 7.6 m 1.0 s 55.1% 13.7 m 6.2 m
INFO 17:33:30,336 ProgressMeter - groupXIII:2378126 2.89e+08 8.2 m 1.0 s 63.1% 13.0 m 4.8 m
INFO 17:34:02,099 GATKRunReport - Uploaded run statistics report to AWS S3
It seems to have gotten a bit over halfway through and then stopped. I suspect a memory issue, because the problem happens less often when I increase the RAM available to Java, but I can't figure out why some samples work and others don't (nothing else on the machine is using RAM, and the largest BAM files aren't the ones failing). It's also strange that there is no error message in the log. Any insight into why this is happening and how to avoid it would be appreciated.
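Since the log shows no error, one way to at least flag the affected samples automatically might be to record the exit status of each java call and check that the expected output exists. The sketch below is only a rough idea, not a fix: `check_run` is a hypothetical helper name, and it assumes that a completed run leaves a non-empty gVCF with an index next to it at `<gvcf>.idx`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether one HaplotypeCaller run finished cleanly.
# Arguments: exit status of the java command, path to the expected gVCF.
# Assumption: a completed run leaves a non-empty gVCF plus an index at <gvcf>.idx.
check_run() {
    local status=$1 gvcf=$2
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $gvcf"
        return 1
    fi
    if [ ! -s "$gvcf" ] || [ ! -s "$gvcf.idx" ]; then
        echo "INCOMPLETE (missing output or index): $gvcf"
        return 1
    fi
    echo "OK: $gvcf"
}

# Intended use inside the sample loop, directly after the java command above:
#   check_run $? "$gvcf/$prefix.GATK.gvcf.vcf"
```

Called right after the java command in the loop, this would print one line per sample, which makes it easier to see whether the truncated runs exit with a non-zero status or exit "successfully" without an index.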