Channel: haplotypecaller — GATK-Forum

How to select specific chromosomes and convert chromosome notation in BAM for HaplotypeCaller


I am preparing BAM files from the 1000 Genomes Project for use in my GATK pipeline (alongside other, already processed BAMs), and I have the following issues:

  • the chromosome notation in my BAMs follows GRCh37, but my pipeline uses hg19, so I would like to convert the chromosome names (1 -> chr1)
  • the mitochondrial chromosome differs slightly between hg19 and GRCh37 (see here), so I want to leave it out
  • I also want to leave out all alternate contigs
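
To make the intended mapping concrete, here is a minimal sketch that works on bare contig names only. It assumes GRCh37-style names (1-22, X, Y, MT, GL...); the function name `to_hg19_name` is hypothetical and only illustrates the rule, not a full BAM conversion.

```shell
# Hypothetical helper: read contig names on stdin, one per line.
# Autosomes 1-22, X and Y get a "chr" prefix; every other name
# (MT, GL* and any other extra contig) is dropped.
to_hg19_name() {
  awk '/^([1-9]|1[0-9]|2[0-2]|X|Y)$/ { print "chr" $0 }'
}

# Prints chr1, chr22, chrX and chrY, one per line.
printf '%s\n' 1 22 X Y MT GL000207.1 | to_hg19_name
```

Anchoring the pattern with `^` and `$` means a name is matched as a whole, so a contig is kept only if it is exactly one of the wanted chromosomes.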

This sounds quite trivial, but I haven't found a clean way to do this yet. I have tried the following:

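i=INPUT.bam
j=OUTPUT.bam
# Keep non-@SQ header lines as they are; keep @SQ lines and reads only for 1-22, X, Y, adding "chr" to SN, RNAME ($3) and RNEXT ($7)
samtools view -h "$i" | awk 'BEGIN{FS=OFS="\t"; keep="^([1-9]|1[0-9]|2[0-2]|X|Y)$"} /^@SQ/ && substr($2,4)!~keep{next} /^@SQ/{$2="SN:chr"substr($2,4)} /^@/{print; next} $3~keep{$3="chr"$3; if($7~keep) $7="chr"$7; print}' | samtools view -bS - > "$j"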
However, when I try running the HaplotypeCaller, I get the following error:
ERROR MESSAGE: BAM file(s) do not have the contig: chrM. You are probably using a different reference than the one this file was aligned with
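
The error says that the reference has a chrM contig that the BAM header lacks. Below is a minimal sketch of how such a mismatch could be listed. It assumes the SAM header text is piped in (for example from `samtools view -H`) and that the reference contig names are available one per line, as in the first column of a `.fai` index. The function names `header_contigs` and `missing_from_bam` and the file names are hypothetical.

```shell
# Hypothetical helper: print the SN: names found on the @SQ lines
# of a SAM header read from stdin.
header_contigs() {
  awk -F'\t' '$1 == "@SQ" { for (k = 2; k <= NF; k++) if ($k ~ /^SN:/) print substr($k, 4) }'
}

# Hypothetical helper: print the reference contig names (file $1, one per
# line) that do not appear among the BAM contig names (file $2, one per line).
missing_from_bam() {
  awk -v bam="$2" 'BEGIN { while ((getline name < bam) > 0) seen[name] = 1 }
                   !($0 in seen)' "$1"
}

# Usage sketch (file names are placeholders):
#   samtools view -H OUTPUT.bam | header_contigs > bam_contigs.txt
#   cut -f1 reference.fa.fai > ref_contigs.txt
#   missing_from_bam ref_contigs.txt bam_contigs.txt
```

With the BAM produced above and an hg19 reference, I would expect this to list chrM and any other reference contigs that were filtered out of the BAM header.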

Could you help me prepare these BAM files for processing? Thanks a lot in advance.

