Quantcast
Channel: haplotypecaller — GATK-Forum
Viewing all articles
Browse latest Browse all 1335

HaplotypeCaller reassembles reads so that no variants are called in a specific region

$
0
0

Hi.

gatk-4.0.7.0
java openjdk version "1.8.0_181"

I've encountered a problem while running HaplotypeCaller in -ERC BP_RESOLUTION --genotyping-mode DISCOVERY --output-mode EMIT_ALL_CONFIDENT_SITES mode on a single-sample paired-end WGS data. The problem is that HC reassembles reads so that a particular region of interest contains no supporting reads in the corresponding bamout file while the original BAM file does contain them. The region of interest is the CYP2D6 gene on chr22.

Commands

After trimming the reads I align them to GRCh38 with Bowtie2:

bowtie2 --threads 16 --trim5 0 --trim3 0 --phred33 --no-mixed --no-discordant --no-unal --local -x /hg38/bowtie2/hg38 -q -1 /trimmomatic_output/sample_R1.paired.fastq.gz -2 /trimmomatic_output/sample_R2.paired.fastq.gz -t -S /bowtie2_output/sample_raw.sam --met-file /path_to_met_file

This is followed by sorting the resulting SAM file and its conversion to BAM, running Picard's MarkDuplicates on the resulting BAM and further filtering using samtools with flags -q 10 -f 0x2 -F 0x4 -F 0x100 -F 0x400 -F 0x800. BQSR is applied next followed by calling on separate chromosomes/ROIs:

/home/ubuntu/Tools/gatk-4.0.7.0/gatk  HaplotypeCaller --reference /hg38.fa --input /bqsr_output/sample.bqsr.bam --genotyping-mode DISCOVERY --output-mode EMIT_ALL_CONFIDENT_SITES --read-filter NotSecondaryAlignmentReadFilter --read-filter NotDuplicateReadFilter --read-filter MappingQualityAvailableReadFilter  --read-filter NotSupplementaryAlignmentReadFilter --smith-waterman FASTEST_AVAILABLE --min-base-quality-score 10 --annotation-group StandardAnnotation --annotation-group StandardHCAnnotation --output /gatk_gvcf_output/sample.diploid.raw.g.vcf.gz --emit-ref-confidence BP_RESOLUTION --sample-ploidy 2 --native-pair-hmm-threads 16 -L chr22

The resulting files are combined with CombineGVCFs followed by genotype calling using GenotypeGVCFs (gatk 3.8-1-0-gf15c1c3ef). This old version is used due to the fact that --includeNonVariantSites option is not implemented in up-to-date package (as far as I know), while I need to get the reference calls for further analysis.

Problems

The sample output from final VCF file from GenotypeGVCFs for all positions in a region of interest looks like this (zero AD, DP, which seems reasonable as bamout contains no supporting reads):

chr22   42130000    .   G   .   .   .   .   GT:AD:DP:RGQ    ./.:0,0:0:0

The IGV screenshot of the original BAM file after samtools filtering and bamout BAM file from HC is attached below.

I have digged into some of the old threads for similar problems (link, link, link, link) and found no appropriate solution. I tried to run HC as above but with adding --dont-trim-active-regions True --min-dangling-branch-length 1 --min-pruning 1 --disable-optimizations True --allow-non-unique-kmers-in-ref parameters. It resulted in complete absence of supporting reads in the bamout (the upper empty panel is for bamout output, the lower is for original BAM) as in the previous case:

Finally I tried running HC with --dont-trim-active-regions True --min-dangling-branch-length 1 --min-pruning 1 --disable-optimizations True --allow-non-unique-kmers-in-ref and --read-filter AllowAllReadsReadFilter --disable-tool-default-read-filters True. The IGV screenshot is below:


Viewing all articles
Browse latest Browse all 1335

Trending Articles