I am running HaplotypeCaller with the --alleles option and it crashes. I've traced the crash to a single very long record in the --alleles VCF file, which has a very long REF allele and three ALT alleles. The record is:
chr4 1394536 . AATGTGGAGTGCCCGCCTGCTCACACGTGCCCATGTGGAGTGCCCGCCTGCTCATGTGCCCATGTGGAGTGCCCGCCTGCTCACACATGTCGATGCGGAGTGCCCGCCTGCTCACACATGCCC A,AATGTGGAGTGCCCGCCTGCTCATGTGCCCATGTGGAGTGCCCGCCTGCTCACACATGTCGATGCGGAGTGCCCGCCTGCTCACACATGCCC,CATGTGGAGTGCCCGCCTGCTCACACGTGCCCATGTGGAGTGCCCGCCTGCTCATGTGCCCATGTGGAGTGCCCGCCTGCTCACACATGTCGATGCGGAGTGCCCGCCTGCTCACACATGCCC . . .
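For anyone who wants to check their own --alleles file for similar records, here is a minimal Python sketch of how such records could be located. It is not part of GATK. The function names and the 50 bp threshold are my own assumptions, and it only reads the first five tab-separated VCF columns (CHROM, POS, ID, REF, ALT).

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def long_allele_records(lines, max_len=50):
    """Yield (chrom, pos, longest_allele_length, n_alt) for each record whose
    REF or any ALT allele is longer than max_len bases.

    max_len=50 is an arbitrary illustrative threshold, not a GATK limit.
    """
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a well-formed VCF record
        chrom, pos, _id, ref, alt = fields[:5]
        # ALT is a comma-separated list; "." means no alternate allele.
        alts = [a for a in alt.split(",") if a != "."]
        longest = max(len(a) for a in [ref] + alts)
        if longest > max_len:
            yield chrom, int(pos), longest, len(alts)


# Example use (the path is the one from this post; adjust as needed):
# with open_vcf("GLOBAL/PON6.vcf.gz") as handle:
#     for record in long_allele_records(handle):
#         print(*record, sep="\t")
```

Records reported this way are only candidates. Whether a given record actually triggers the exception would still need to be confirmed by rerunning HaplotypeCaller on an interval around it.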
This VCF was generated by Mutect2 for creation of a panel-of-normals.
The command and output were:
twtoal@carcinos> env -i LD_LIBRARY_PATH=/share/carvajal-archive/PACKAGES/local/miniconda/miniconda3_perl/envs/R_pipeline/lib/R/library/rJava/libs:/share/carvajal-archive/PACKAGES/local/miniconda/miniconda3_perl/envs/R_pipeline/jre/lib/amd64/server PATH=/bin:/share/carvajal-archive/PACKAGES/local/miniconda/miniconda3_perl/envs/R_pipeline/bin /share/carvajal-archive/PACKAGES/src/java/jdk8_latest/bin/java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Djava.io.tmpdir=/share/carvajal-archive/tmp -jar /share/carvajal-archive/PACKAGES/src/GATK/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar HaplotypeCaller --QUIET -R /share/carvajal-archive/REFERENCE_DATA/genomes/GRCh38_decoy_LCCpanel/Homo_sapiens_assembly38_LCCpanel.fasta -I DATA/VO-56/N2/VO-56N2.recal.bam -O DATA/VO-56/N2/VO-56N2.PONdepths.vcf.gz -L GLOBAL/PON4.bed --alleles GLOBAL/PON6.vcf.gz --genotyping-mode GENOTYPE_GIVEN_ALLELES --annotation-group StandardAnnotation --seconds-between-progress-updates 1 --verbosity INFO
14:23:07.181 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/carvajal-archive/PACKAGES/src/GATK/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
14:23:07.438 INFO HaplotypeCaller - Initializing engine
14:23:08.919 INFO FeatureManager - Using codec VCFCodec to read file file:///share/carvajal-archive/SEQ_DATA/PANELS/MSEQ_GC_Panel_03_28_2017/GLOBAL/PON6.vcf.gz
14:23:09.003 INFO FeatureManager - Using codec BEDCodec to read file file:///share/carvajal-archive/SEQ_DATA/PANELS/MSEQ_GC_Panel_03_28_2017/GLOBAL/PON4.bed
14:23:09.014 INFO IntervalArgumentCollection - Processing 10 bp from intervals
14:23:09.046 INFO HaplotypeCaller - Done initializing engine
14:23:09.163 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
14:23:11.216 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/share/carvajal-archive/PACKAGES/src/GATK/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
14:23:11.230 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/share/carvajal-archive/PACKAGES/src/GATK/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
14:23:11.547 WARN NativeLibraryLoader - Unable to load libgkl_pairhmm_omp.so from native/libgkl_pairhmm_omp.so (/share/carvajal-archive/tmp/twtoal/libgkl_pairhmm_omp3184435069396731524.so: /usr/lib/x86_64-linux-gnu/libgomp.so.1: version `GOMP_4.0' not found (required by /share/carvajal-archive/tmp/twtoal/libgkl_pairhmm_omp3184435069396731524.so))
14:23:11.547 INFO PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
14:23:11.547 INFO NativeLibraryLoader - Loading libgkl_pairhmm.so from jar:file:/share/carvajal-archive/PACKAGES/src/GATK/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm.so
14:23:11.786 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
14:23:11.786 WARN IntelPairHmm - Ignoring request for 4 threads; not using OpenMP implementation
14:23:11.787 INFO PairHMM - Using the AVX-accelerated native PairHMM implementation
14:23:11.941 INFO ProgressMeter - Starting traversal
14:23:11.941 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
14:23:13.127 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 9.267520000000001E-4
14:23:13.127 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.008127312000000001
14:23:13.128 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.10 sec
14:23:13.128 INFO HaplotypeCaller - Shutting down engine
java.lang.IllegalArgumentException: expected haplotypes.size() >= eventsAtThisLoc.size() + 1
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:681)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerGenotypingEngine.createAlleleMapper(AssemblyBasedCallerGenotypingEngine.java:152)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:131)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:565)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:218)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:295)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:271)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)