Hello,
I am using HaplotypeCaller (GATK v3.5) with an input BAM file whose header contains a line like this (just a made-up example):
@SQ SN:chr1 LN:100000 SP:Arabis thal AS:2 M5:8668a646eada2f4 UR:file:refgenome_Atha_v2.fa
But the output VCF only has a subset of this information:
##contig=<ID=chr1,length=100000>
##reference=file:///home/me/tmp/refgenome_Atha_v2.fa
Is there a way to obtain something like this instead, i.e. a contig line that also records the species, assembly and MD5 sum?
##contig=<ID=chr1,length=100000,assembly=2,md5=8668a646eada2f4,species="Arabis thal">
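To make the mapping I am after concrete, here is a minimal Python sketch (my own illustration, not a GATK or Picard feature) that turns one @SQ line into the ##contig line above: SN becomes ID, LN becomes length, AS becomes assembly, M5 becomes md5 and SP becomes a quoted species. It assumes the @SQ fields are tab-separated, as in SAM headers and .dict files, so a species name with a space survives; the function name sq_to_contig is hypothetical.

```python
def sq_to_contig(sq_line: str) -> str:
    """Convert one tab-separated SAM @SQ line into a VCF ##contig line (sketch)."""
    fields = sq_line.rstrip("\n").split("\t")
    if fields[0] != "@SQ":
        raise ValueError("not an @SQ line: " + sq_line)
    # Each field is TAG:VALUE; split on the first colon only, because
    # values such as UR:file:... contain colons themselves.
    tags = dict(f.split(":", 1) for f in fields[1:])
    parts = ["ID=" + tags["SN"], "length=" + tags["LN"]]
    if "AS" in tags:
        parts.append("assembly=" + tags["AS"])
    if "M5" in tags:
        parts.append("md5=" + tags["M5"])
    if "SP" in tags:
        # Quote the species, since it may contain spaces.
        parts.append('species="' + tags["SP"] + '"')
    return "##contig=<" + ",".join(parts) + ">"


sq = "@SQ\tSN:chr1\tLN:100000\tSP:Arabis thal\tAS:2\tM5:8668a646eada2f4\tUR:file:refgenome_Atha_v2.fa"
print(sq_to_contig(sq))
# ##contig=<ID=chr1,length=100000,assembly=2,md5=8668a646eada2f4,species="Arabis thal">
```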
The information in the BAM header originally comes from a "dict" file generated by Picard CreateSequenceDictionary. So I tried passing this "dict" file, together with the VCF file, to Picard UpdateVcfSequenceDictionary, but the result only gained the assembly, not the species or the MD5 sum:
##contig=<ID=chr1,length=100000,assembly=2>
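As a fallback, I could post-process the VCF header myself. Below is a rough Python sketch of that idea, assuming the "dict" file and the VCF are read as lists of lines and that no ##contig value contains a comma. The helper names load_sq_tags and patch_header and the SQ_TO_VCF table are hypothetical; they are not part of GATK or Picard.

```python
import re

# Hypothetical mapping from SAM @SQ tags to VCF ##contig keys.
SQ_TO_VCF = {"AS": "assembly", "M5": "md5", "SP": "species"}
CONTIG_RE = re.compile(r"^##contig=<(.*)>$")


def load_sq_tags(dict_lines):
    """Return {contig name: {TAG: value}} from the @SQ lines of a .dict file."""
    result = {}
    for line in dict_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        tags = dict(f.split(":", 1) for f in fields[1:])
        result[tags["SN"]] = tags
    return result


def patch_header(vcf_lines, dict_lines):
    """Yield VCF lines, adding missing assembly/md5/species keys to ##contig lines."""
    sq = load_sq_tags(dict_lines)
    for line in vcf_lines:
        m = CONTIG_RE.match(line.rstrip("\n"))
        if not m:
            yield line
            continue
        # Naive split: assumes no commas inside quoted values.
        parts = m.group(1).split(",")
        keys = {p.split("=", 1)[0] for p in parts}
        contig_id = next((p.split("=", 1)[1] for p in parts if p.startswith("ID=")), None)
        tags = sq.get(contig_id, {})
        for sam_tag, vcf_key in SQ_TO_VCF.items():
            if sam_tag in tags and vcf_key not in keys:
                value = tags[sam_tag]
                if vcf_key == "species":
                    value = '"' + value + '"'  # quote values that may contain spaces
                parts.append(vcf_key + "=" + value)
        yield "##contig=<" + ",".join(parts) + ">\n"


dict_lines = [
    "@HD\tVN:1.5\n",
    "@SQ\tSN:chr1\tLN:100000\tSP:Arabis thal\tAS:2\tM5:8668a646eada2f4\tUR:file:refgenome_Atha_v2.fa\n",
]
vcf_lines = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=chr1,length=100000,assembly=2>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
for out_line in patch_header(vcf_lines, dict_lines):
    print(out_line, end="")
```

It only appends keys that are missing, so the assembly=2 already written by UpdateVcfSequenceDictionary is not duplicated.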
Thank you in advance,
Tim