Happy New Year!
I'm attempting to joint genotype ~1000 exomes using GATK4. I ran HaplotypeCaller (HC) on each sample with the following command:
java -Xmx7g -jar gatk-package-4.beta.6-local.jar HaplotypeCaller -ERC GVCF -G StandardAnnotation -G AS_StandardAnnotation --maxReadsPerAlignmentStart 0 -GQB 5 -GQB 10 -GQB 15 -GQB 20 -GQB 25 -GQB 30 -GQB 35 -GQB 40 -GQB 45 -GQB 50 -GQB 55 -GQB 60 -GQB 65 -GQB 70 -GQB 75 -GQB 80 -GQB 85 -GQB 90 -GQB 95 -GQB 99 -I example.bam -O example.g.vcf.gz -R /path/to/GRCh38.d1.vd1.fa
I then attempted to create a GenomicsDB workspace per chromosome with the following command:
java -Xmx70g -jar gatk-package-4.beta.6-local.jar GenomicsDBImport -genomicsDBWorkspace chrX_db --overwriteExistingGenomicsDBWorkspace true --intervals chrX -V gvcfs.list
I get the following error:
Exception:
[January 2, 2018 9:36:26 AM EST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=2238185472
htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: Discordant field size detected for field AS_RAW_ReadPosRankSum at chrX:251751. Field had 4 values but the header says this should have 1 values based on header record INFO=<ID=AS_RAW_ReadPosRankSum,Number=1,Type=String,Description="allele specific raw data for rank sum test of read position bias">
    at htsjdk.variant.variantcontext.VariantContext.fullyDecodeAttributes(VariantContext.java:1571)
    at htsjdk.variant.variantcontext.VariantContext.fullyDecodeInfo(VariantContext.java:1546)
    at htsjdk.variant.variantcontext.VariantContext.fullyDecode(VariantContext.java:1530)
    at htsjdk.variant.variantcontext.writer.BCF2Writer.add(BCF2Writer.java:176)
    at com.intel.genomicsdb.GenomicsDBImporter.add(GenomicsDBImporter.java:1232)
    at com.intel.genomicsdb.GenomicsDBImporter.importBatch(GenomicsDBImporter.java:1282)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.traverse(GenomicsDBImport.java:443)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:838)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:119)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:137)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:158)
    at org.broadinstitute.hellbender.Main.main(Main.java:239)
The error refers to the following line in one of the GVCFs:
chrX 251751 . G A,<NON_REF> 46.56 . AS_RAW_BaseQRankSum=|30,1,33,1|;AS_RAW_MQ=0.00|7200.00|0.00;AS_RAW_MQRankSum=|60,2|;AS_RAW_ReadPosRankSum=|5,1,20,1|;AS_SB_TABLE=0,0|0,0|0,0;DP=2;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQ=7200.00 GT:AD:GQ:PL:SB 1/1:0,2,0:6:73,6,0,73,6,73:0,0,1,1
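To make the mismatch concrete, here is a minimal Python sketch of the kind of check the error message describes: it splits each INFO value on commas and compares the count with the Number declared in the header. It is only an illustration of the comparison, not how htsjdk implements it. The function names (parse_info_numbers, discordant_fields) are hypothetical, the record is a trimmed, tab-joined copy of the line above, and the DP header description is a placeholder I made up.

```python
import re

def parse_info_numbers(header_lines):
    """Map each INFO ID to the Number declared in its ##INFO header line."""
    pattern = re.compile(r"##INFO=<ID=([^,]+),Number=([^,]+),")
    numbers = {}
    for line in header_lines:
        match = pattern.match(line)
        if match:
            numbers[match.group(1)] = match.group(2)
    return numbers

def discordant_fields(record_line, numbers):
    """Return (field, declared, observed) for INFO fields with a fixed
    integer Number whose comma-separated value count disagrees with it."""
    info = record_line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
    problems = []
    for entry in info.split(";"):
        if "=" not in entry:
            continue  # flag-style entries carry no value to count
        key, value = entry.split("=", 1)
        declared = numbers.get(key)
        if declared is None or not declared.isdigit():
            continue  # skip undeclared fields and Number=A/R/G/.
        observed = len(value.split(","))
        if observed != int(declared):
            problems.append((key, int(declared), observed))
    return problems

# Header record copied from the error message, plus a placeholder DP line.
header = [
    '##INFO=<ID=AS_RAW_ReadPosRankSum,Number=1,Type=String,'
    'Description="allele specific raw data for rank sum test of read position bias">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="placeholder description">',
]

# Trimmed version of the offending record (only two INFO fields kept).
record = "\t".join([
    "chrX", "251751", ".", "G", "A,<NON_REF>", "46.56", ".",
    "AS_RAW_ReadPosRankSum=|5,1,20,1|;DP=2",
])

print(discordant_fields(record, parse_info_numbers(header)))
```

On this record the value |5,1,20,1| splits into four comma-separated pieces, which matches the "Field had 4 values" part of the message, while the header declares Number=1.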
I haven't found a way to get past this error. I found this post from a while back with a very similar error:
https://gatkforums.broadinstitute.org/gatk/discussion/comment/43382#Comment_43382
However, the posters there seemed to indicate that it was fixed for them in GATK4 beta 6, which is the version I'm running.
Any help or insight into how to resolve this, or, if it's an unimportant annotation, how to ignore it, would be greatly appreciated. Thanks!
Ben