Quantcast
Channel: haplotypecaller — GATK-Forum
Viewing all articles
Browse latest Browse all 1335

Variant calls being missed - how to improve variant detection using Sanger sequencing data?

$
0
0

@Geraldine_VdAuwera and @Sheila - Please help!

I've read through many of your posts/responses regarding HaplotypeCaller not calling variants, and tried many of the suggestions you've made to others, but I'm still missing variants. My situation is a little different (I'm trying to identify variants from Sanger sequence reads) but I'm hoping you might have additional ideas or can see something I've overlooked. I hope I haven't given you too much information below, but I've seen it mentioned that too much info is better than not enough.

A while back, I generated a variant call set from Illumina Next Gen Sequencing data using UnifiedGenotyper (circa v2.7.4), identifying ~46,000 discordant variants between the genomes of two haploid strains of S. cerevisiae. Our subsequent experiments included Sanger sequencing ~95 kb of DNA across 17 different loci in these two strains. I don't think any of the SNP calls were false positives, and there were very, very few were false negatives.

Since then, we've constructed many strains by swapping variants at these loci between these two strains of yeast. To check if a strain was constructed successfully, we PCR the loci of interest, and Sanger sequencing the PCR product. I'm trying to use GATK (version 3.4-46) HaplotypeCaller (preferably, or alternatively UnifiedGenotyper) in a variant detection pipeline to confirm a properly constructed strain. I convert the .ab1 files to fastqs using EMBOSS seqret, map the Sanger reads using bwa mem ($ bwa mem -R $RG $refFasta $i > ${outDir}/samFiles/${fileBaseName}.sam), merge the sam files for each individual, and then perform the variant calling separately for each individual. I do not dedup (I actually intentionally leave out the -M flag in bwa), nor do I realign around indels (I plan to in the future, but there aren't any indels in any of the regions we are currently looking at), or do any BQSR in this pipeline. Also, when I do the genotyping after HaplotypCaller, I don't do joint genotyping, each sample (individual) gets genotyped individually.

In general, this pipeline does identify many variants from the Sanger reads, but I'm still missing many variant calls that I can clearly see in the Sanger reads. Using a test set of 36 individuals, I examined the variant calls made from 364 Sanger reads that cover a total of 63 known variant sites across three ~5kb loci (40 SNPs in locus 08a-s02, 9 SNPs in locus 10a-s01, 14 SNPs in locus 12c-s02). Below are some example calls to HaplotypeCaller and UnifiedGenotyper, as well as a brief summary statement of general performance using the given command. I've also included some screenshots from IGV showing the alignments (original bam files and bamOut files) and SNP calls from the different commands.

Ideally, I'd like to use the HaplotypeCaller since not only can it give me a variant call with a confidence value, but it can also give me a reference call with a confidence value. And furthermore, I'd like to stay in DISCOVERY mode as opposed to Genotype Given Alleles, that way I can also assess whether any experimental manipulations we've performed might have possibly introduced new mutations.

Again, I'm hoping someone can advice me on how to make adjustments to reduce the number of missed calls.

Call 1:
The first call to HaplotypeCaller I'm showing produced the least amount of variant calls at sites where I've checked the Sanger reads.

java -Xmx4g -jar $gatkJar \
    -R $refFasta \
    -T HaplotypeCaller \
    -I $inBam \
    -ploidy 1 \
    -nct 1 \
    -bamout ${inBam%.bam}_hapcallRealigned.bam \
    -forceActive \
    -disableOptimizations \
    -dontTrimActiveRegions \
    --genotyping_mode DISCOVERY \
    --emitRefConfidence BP_RESOLUTION \
    --interval_padding 500 \
    --intervals $outDir/tmp.intervals.bed \
    --min_base_quality_score 5 \
    --standard_min_confidence_threshold_for_calling 0 \
    --standard_min_confidence_threshold_for_emitting 0 \
    -A VariantType \
    -A SampleList \
    -A AlleleBalance \
    -A BaseCounts \
    -A AlleleBalanceBySample \
    -o $outDir/vcfFiles/${fileBaseName}_hc_bp_raw.g.vcf

Call 2:
I tried a number of different -kmerSize values [(-kmerSize 10 -kmerSize 25), (-kmerSize 9), (-kmerSize 10), (-kmerSize 12), (-kmerSize 19), (-kmerSize 12 -kmerSize 19), (maybe some others). I seemed to have the best luck when using -kmerSize 12 only; I picked up a few more SNPs (where I expected them), and only lost one SNP call as compared Call 1.

java -Xmx4g -jar $gatkJar \
    -R $refFasta \
    -T HaplotypeCaller \
    -I $inBam \
    -ploidy 1 \
    -nct 1 \
    -bamout ${inBam%.bam}_kmer_hapcallRealigned.bam \
    -forceActive \
    -disableOptimizations \
    -dontTrimActiveRegions \
    --genotyping_mode DISCOVERY \
    --emitRefConfidence BP_RESOLUTION \
    --interval_padding 500 \
    --intervals $outDir/tmp.intervals.bed \
    --min_base_quality_score 5 \
    --standard_min_confidence_threshold_for_calling 0 \
    --standard_min_confidence_threshold_for_emitting 0 \
    -kmerSize 12 \
    -A VariantType \
    -A SampleList \
    -A AlleleBalance \
    -A BaseCounts \
    -A AlleleBalanceBySample \
    -o $outDir/vcfFiles/${fileBaseName}_hc_bp_kmer_raw.g.vcf

Call 3:
I tried adjusting --minPruning 1 and --minDanglingBranchLength 1, which helped more than playing with kmerSize. I picked up many more SNPs compared to both Call 1 and Call 2 (but not necessarily the same SNPs I gained in Call 2).

java -Xmx4g -jar $gatkJar \
    -R $refFasta \
    -T HaplotypeCaller \
    -I $inBam \
    -ploidy 1 \
    -nct 1 \
    -bamout ${inBam%.bam}_adv_hapcallRealigned.bam \
    -forceActive \
    -disableOptimizations \
    -dontTrimActiveRegions \
    --genotyping_mode DISCOVERY \
    --emitRefConfidence BP_RESOLUTION \
    --interval_padding 500 \
    --intervals $outDir/tmp.intervals.bed \
    --min_base_quality_score 5 \
    --standard_min_confidence_threshold_for_calling 0 \
    --standard_min_confidence_threshold_for_emitting 0 \
    --minPruning 1 \
    --minDanglingBranchLength 1 \
    -A VariantType \
    -A SampleList \
    -A AlleleBalance \
    -A BaseCounts \
    -A AlleleBalanceBySample \
    -o $outDir/vcfFiles/${fileBaseName}_hc_bp_adv_raw.g.vcf

Call 4:
I then tried adding both --minPruning 1 --minDanglingBranchLength 1 and -kmerSize 12 all at once, and I threw in a --min_mapping_quality_score 5. I maybe did slightly better... than in Calls 1-4. I did actually lose 1 SNP compared to Calls 1-4, but I got most of the additional SNPs I got from using Call 3, as well as some of the SNPs I got from using Call 2.

java -Xmx4g -jar $gatkJar \
    -R $refFasta \
    -T HaplotypeCaller \
    -I $inBam \
    -ploidy 1 \
    -nct 1 \
    -bamout ${inBam%.bam}_hailMary_raw.bam \
    -forceActive \
    -disableOptimizations \
    -dontTrimActiveRegions \
    --genotyping_mode DISCOVERY \
    --emitRefConfidence BP_RESOLUTION \
    --interval_padding 500 \
    --intervals $outDir/tmp.intervals.bed \
    --min_base_quality_score 5 \
    --min_mapping_quality_score 10 \
    --standard_min_confidence_threshold_for_calling 0 \
    --standard_min_confidence_threshold_for_emitting 0 \
    --minPruning 1 \
    --minDanglingBranchLength 1 \
    -kmerSize 12 \
    -A VariantType \
    -A SampleList \
    -A AlleleBalance \
    -A BaseCounts \
    -A AlleleBalanceBySample \
    -o $outDir/vcfFiles/${fileBaseName}_hailMary_raw.g.vcf

Call 5:
As I mentioned above, I've experience better performance (or at least I've done a better job executing) with UnifiedGenotyper. I actually get the most SNPs called at the known SNP sites, in individuals where manual examination confirms a SNP.

java -Xmx4g -jar $gatkJar \
    -R $refFasta \
    -T UnifiedGenotyper \
    -I $inBam \
    -ploidy 1 \
    --output_mode EMIT_ALL_SITES \
    -glm BOTH \
    -dt NONE -dcov 0 \
    -nt 4 \
    -nct 1 \
    --intervals $outDir/tmp.intervals.bed \
    --interval_padding 500 \
    --min_base_quality_score 5 \
    --standard_min_confidence_threshold_for_calling 0 \
    --standard_min_confidence_threshold_for_emitting 0 \
    -minIndelCnt 1 \
    -A VariantType \
    -A SampleList \
    -A AlleleBalance \
    -A BaseCounts \
    -A AlleleBalanceBySample \
    -o $outDir/vcfFiles/${fileBaseName}_ug_emitAll_raw.vcf

I hope you're still with me :)

None of the above commands are calling all of the SNPs that I (maybe naively) would expect them to. "Examples 1-3" in the first attached screenshot are three individuals with reads (two reads each) showing the alternate allele. The map quality scores for each read are 60, and the base quality scores at this position for individual #11 are 36 and 38, and for the other individuals, the base quality scores are between 48-61. The reads are very clean upstream of this position, the next upstream SNP is about ~80bp away, and the downstream SNP at the position marked for "Examples 4-6" is ~160 bp away. Commands 1 and 2 do not elicit a SNP call for Examples 1-6, Command 3 get the calls at both positions for individual 10, Command 4 for gets the both calls for individuals 10 and the upstream SNP for individual 11. Command 5 (UnifiedGenotyper) gets the alt allele called in all 3 individuals at the upstream position, and the alt allele called for individuals 10 and 12 at the downstream position. Note that in individual 11, there is only one read covering the downstream variant position, where UnifiedGenotyper missed the call.

Here is the vcf output for those two positions from each command. Note that there are more samples in the per-sample breakdown for the FORMAT tags. The last three groups of FORMAT tags correspond to the three individuals I've shown in the screenshots.

Command 1 output

Examples 1-3    649036  .   G   .   .   .   AN=11;DP=22;VariantType=NO_VARIATION;set=ReferenceInAll GT:AD:DP:RGQ    .   .   .   .   .   .   .   .   .   .   .:0:0:0 0:0:2:0 0:2:2:89    0:0:2:0 0:2:2:84    0:0:2:0 0:2:2:89    0:0:2:0 0:2:2:89    0:0:2:0 0:0:2:0 0:0:2:0
Examples 4-6    649160  .   C   .   .   .   AN=11;DP=21;VariantType=NO_VARIATION;set=ReferenceInAll GT:AD:DP:RGQ    .   .   .   .   .   .   .   .   .   .   .:0:0:0 0:0:2:0 0:2:2:89    0:0:2:0 0:2:2:0 0:0:2:0 0:2:2:71    0:0:2:0 0:2:2:44    0:0:2:0 0:0:1:0 0:0:2:0

Command 2 output

Examples 1-3    649036  .   G   A   26.02   .   ABHom=1.00;AC=6;AF=0.545;AN=11;DP=18;MLEAC=1;MLEAF=1.00;MQ=60.00;Samples=qHZT-12c-s02_r2657_p4096_dJ-002,qHZT-12c-s02_r2657_p4096_dJ-004,qHZT-12c-s02_r2657_p4096_dJ-006,qHZT-12c-s02_r2657_p4096_dJ-008,qHZT-12c-s02_r2657_p4096_dJ-010,qHZT-12c-s02_r2657_p4096_dJ-011;VariantType=SNP;set=qHZT-12c-s02_r2657_p4096_dJ-002_merged_sorted_hc_bp_kmer_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-004_merged_sorted_hc_bp_kmer_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-006_merged_sorted_hc_bp_kmer_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-008_merged_sorted_hc_bp_kmer_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-010_merged_sorted_hc_bp_kmer_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-011_merged_sorted_hc_bp_kmer_raw.vcf  GT:AD:DP:GQ:PL:RGQ  .   .   .   .   .   .   .   .   .:0:0:.:.:0 1:0,2:.:56:56,0 0:2:2:.:.:89    1:0,1:.:45:45,0 0:2:2:.:.:84    1:0,1:.:45:45,0 0:2:2:.:.:89    1:0,1:.:45:45,0 0:2:2:.:.:89    1:0,1:.:45:45,0 1:0,2:.:88:88,0 0:0:2:.:.:0
Examples 4-6    649160  .   C   A   13.22   .   AC=3;AF=0.273;AN=11;DP=18;MLEAC=1;MLEAF=1.00;MQ=60.00;OND=1.00;Samples=qHZT-12c-s02_r2657_p4096_dJ-004,qHZT-12c-s02_r2657_p4096_dJ-008,qHZT-12c-s02_r2657_p4096_dJ-010;VariantType=SNP;set=qHZT-12c-s02_r2657_p4096_dJ-004_merged_sorted_hc_bp_kmer_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-008_merged_sorted_hc_bp_kmer_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-010_merged_sorted_hc_bp_kmer_raw.vcf   GT:AD:DP:GQ:PL:RGQ  .   .   .   .   .   .   .   .   .   .   .   .       .   .   .   .   .   .   .   .:0:0:.:.:0 0:0:2:.:.:0 0:2:2:.:.:89    1:0,1:.:43:43,0 0:2:2:.:.:0 0:0:2:.:.:0 0:2:2:.:.:71    1:0,0,1:.:37:37,0   0:2:2:.:.:44    1:0,1:.:34:34,0 0:0:1:.:.:0 0:0:2:.:.:0

Command 3 output

Examples 1-3    649036  .   G   A   36.01   .   ABHom=1.00;AC=3;AF=0.273;AN=11;DP=20;MLEAC=1;MLEAF=1.00;MQ=60.00;Samples=qHZT-12c-s02_r2657_p4096_dJ-002,qHZT-12c-s02_r2657_p4096_dJ-004,qHZT-12c-s02_r2657_p4096_dJ-006;VariantType=SNP;set=qHZT-12c-s02_r2657_p4096_dJ-002_merged_sorted_hc_bp_adv_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-004_merged_sorted_hc_bp_adv_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-006_merged_sorted_hc_bp_adv_raw.vcf    GT:AD:DP:GQ:PL:RGQ  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .:0:0:.:.:0 1:0,2:.:66:66,0 0:2:2:.:.:89    1:0,1:.:45:45,0 0:2:2:.:.:84    1:0,1:.:45:45,0 0:2:2:.:.:89    0:0:2:.:.:0 0:2:2:.:.:89    0:0:2:.:.:0 0:0:2:.:.:0 0:0:2:.:.:0
Examples 4-6    649160  .   C   A   13.22   .   ABHom=1.00;AC=1;AF=0.091;AN=11;DP=20;MLEAC=1;MLEAF=1.00;MQ=60.00;Samples=qHZT-12c-s02_r2657_p4096_dJ-004;VariantType=SNP;set=qHZT-12c-s02_r2657_p4096_dJ-004_merged_sorted_hc_bp_adv_raw.vcf    GT:AD:DP:GQ:PL:RGQ  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .:0:0:.:.:0 0:0:2:.:.:0 0:2:2:.:.:89    1:0,1:.:43:43,0 0:2:2:.:.:0 0:0:2:.:.:0 0:2:2:.:.:71    0:0:2:.:.:0 0:2:2:.:.:44    0:0:2:.:.:0 0:0:1:.:.:0 0:0:2:.:.:0

Command 4 output

Examples 1-3    649036  .   G   A   26.02   .   ABHom=1.00;AC=6;AF=0.545;AN=11;DP=18;MLEAC=1;MLEAF=1.00;MQ=60.00;Samples=qHZT-12c-s02_r2657_p4096_dJ-002,qHZT-12c-s02_r2657_p4096_dJ-004,qHZT-12c-s02_r2657_p4096_dJ-006,qHZT-12c-s02_r2657_p4096_dJ-008,qHZT-12c-s02_r2657_p4096_dJ-010,qHZT-12c-s02_r2657_p4096_dJ-011;VariantType=SNP;set=qHZT-12c-s02_r2657_p4096_dJ-002_merged_sorted_hailMary_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-004_merged_sorted_hailMary_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-006_merged_sorted_hailMary_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-008_merged_sorted_hailMary_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-010_merged_sorted_hailMary_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-011_merged_sorted_hailMary_raw.vcf  GT:AD:DP:GQ:PL:RGQ  .   .   .   .   .   .   .   .   .   .:0:0:.:.:0 1:0,2:.:56:56,0 0:2:2:.:.:89    1:0,1:.:45:45,0 0:2:2:.:.:84    1:0,1:.:45:45,0 0:2:2:.:.:89    1:0,1:.:45:45,0 0:2:2:.:.:89    1:0,1:.:45:45,0 1:0,2:.:88:88,0 0:0:2:.:.:0
Examples 4-6    649160  .   C   A   13.22   .   AC=3;AF=0.273;AN=11;DP=18;MLEAC=1;MLEAF=1.00;MQ=60.00;OND=1.00;Samples=qHZT-12c-s02_r2657_p4096_dJ-004,qHZT-12c-s02_r2657_p4096_dJ-008,qHZT-12c-s02_r2657_p4096_dJ-010;VariantType=SNP;set=qHZT-12c-s02_r2657_p4096_dJ-004_merged_sorted_hailMary_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-008_merged_sorted_hailMary_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-010_merged_sorted_hailMary_raw.vcf GT:AD:DP:GQ:PL:RGQ  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .:0:0:.:.:0 0:0:2:.:.:0 0:2:2:.:.:89    1:0,1:.:43:43,0 0:2:2:.:.:0 0:0:2:.:.:0 0:2:2:.:.:71    1:0,0,1:.:37:37,0   0:2:2:.:.:44    1:0,1:.:34:34,0 0:0:1:.:.:0 0:0:2:.:.:0

Command 5 output

Examples 1-3    649036  .   G   A   26.02   .   ABHom=1.00;AC=7;AF=0.636;AN=11;DP=22;Dels=0.00;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=60.00;MQ0=0;SOR=2.303;Samples=qHZT-12c-s02_r2657_p4096_dJ-002,qHZT-12c-s02_r2657_p4096_dJ-004,qHZT-12c-s02_r2657_p4096_dJ-006,qHZT-12c-s02_r2657_p4096_dJ-008,qHZT-12c-s02_r2657_p4096_dJ-010,qHZT-12c-s02_r2657_p4096_dJ-011,qHZT-12c-s02_r2657_p4096_dJ-012;VariantType=SNP;set=qHZT-12c-s02_r2657_p4096_dJ-002_merged_sorted_ug_emitAll_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-004_merged_sorted_ug_emitAll_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-006_merged_sorted_ug_emitAll_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-008_merged_sorted_ug_emitAll_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-010_merged_sorted_ug_emitAll_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-011_merged_sorted_ug_emitAll_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-012_merged_sorted_ug_emitAll_raw.vcf  GT:AD:DP:GQ:PL  ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./../.  ./. ./. ./. ./. ./. ./. 1:0,2:2:56:56,0 0:.:2   1:0,2:2:99:117,0    0:.:2   1:0,2:2:99:122,0    0:.:2   1:0,2:2:67:67,0 0:.:2   1:0,2:2:99:110,0    1:0,2:2:84:84,0 1:0,2:2:99:127,0
Examples 4-6    649160  .   C   A   46  .   ABHom=1.00;AC=5;AF=0.455;AN=11;DP=21;Dels=0.00;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=60.00;MQ0=0;Samples=qHZT-12c-s02_r2657_p4096_dJ-004,qHZT-12c-s02_r2657_p4096_dJ-006,qHZT-12c-s02_r2657_p4096_dJ-008,qHZT-12c-s02_r2657_p4096_dJ-010,qHZT-12c-s02_r2657_p4096_dJ-012;VariantType=SNP;set=qHZT-12c-s02_r2657_p4096_dJ-004_merged_sorted_ug_emitAll_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-006_merged_sorted_ug_emitAll_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-008_merged_sorted_ug_emitAll_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-010_merged_sorted_ug_emitAll_raw.vcf-qHZT-12c-s02_r2657_p4096_dJ-012_merged_sorted_ug_emitAll_raw.vcf  GT:AD:DP:GQ:PL  ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./../.  ./. ./. ./. ./. ./. ./. 0:.:2   0:.:2   1:0,2:2:76:76,0 0:.:2   1:0,2:2:70:70,0 0:.:2   1:0,1:2:37:37,0 0:.:2   1:0,2:2:60:60,0 0:.:1   1:0,2:2:75:75,0

There are many more examples of missed SNP calls. When using the HaplotypeCaller, I'm missing ~23% of the SNP calls. So...what can I do to tweak my variant detection pipeline so that I don't miss so many SNP calls?

As I mentioned, I'm currently getting better results with the UnifiedGenotyper walker. I'm only missing about 2% of all Alt SNP calls. Also, about half of that 2% are improperly being genotyped as Ref by Command #5. It appears to me that most of the variant calls I'm missing using the UnifiedGenotyper are at positions where I only have a single Sanger read covering the base, and the base quality score starts to fall below 25 (such as in individual #11 in the first attached screen shot, base quality score was 20). Attached is a second IGV screenshot of a different locus where I've also missed SNP calls using Command 5 (Examples 7-9). I've also included the read details for those positions, as well as the VCF file output from Command 5. I have seen at least one instance where I had two Sanger reads reporting an alternate allele, however, UG did not call the variant. In that case though, the base quality scores in both reads were very low (8); mapping quality was 60 for both reads.

Does anyone have any suggestions as to how I might alter any of the parameters to reduce (hopefully eliminate) the missed SNP calls. I think I would accept false positives over false negatives in this case. Or does anyone have any other idea as to what my problem might be?

Thanks so much!
Matt Maurer

Command 5 output for second screen shot file:

VCF output
The samples shown in the second attached screen shot correspond to the 11th and 12th groupings in the per-sample breakdown of the FORMAT tags.

Examples 7-8    163422  .   G   C   173 .   ABHom=1.00;AC=2;AF=0.182;AN=11;DP=14;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=1;MLEAF=1.00;MQ=60.00;MQ0=0;SOR=1.609;Samples=qHZT-08a-s02_r2657_p4094_dJ-002,qHZT-08a-s02_r2657_p4094_dJ-008;VariantType=SNP;set=qHZT-08a-s02_r2657_p4094_dJ-002_merged_sorted_ug_emitAll_raw.vcf-qHZT-08a-s02_r2657_p4094_dJ-008_merged_sorted_ug_emitAll_raw.vcf GT:AD:DP:GQ:PL  0:.:1   1:0,4:4:99:203,0    0:.:1   0:.:1   0:.:1   0:.:1   0:.:1   1:0,1:1:54:54,0 0:.:1   ./. 0:.:1   0:.:1   ./. ./. ./. ./. ./. ./. ./. ./. ./../.  ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
Example 9   163476  .   A   G   173 .   ABHom=1.00;AC=2;AF=0.167;AN=12;DP=15;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=1;MLEAF=1.00;MQ0=0;SOR=1.609;Samples=qHZT-08a-s02_r2657_p4094_dJ-002,qHZT-08a-s02_r2657_p4094_dJ-008;VariantType=SNP;set=qHZT-08a-s02_r2657_p4094_dJ-002_merged_sorted_ug_emitAll_raw.vcf-qHZT-08a-s02_r2657_p4094_dJ-008_merged_sorted_ug_emitAll_raw.vcf  GT:AD:DP:GQ:PL  0:.:1   1:0,4:4:99:203,0    0:.:1   0:.:1   0:.:1   0:.:1   0:.:1   1:0,1:1:57:57,0 0:.:1   0:.:1   0:.:1   0:.:1   .   .   .   .   .   .   .   .   .   .   .

Also, why are the GT's sometimes "./." as they are for site163422, and sometimes "." as they are for site 163476?

Read Details:

Example#7

Read name = qHZT-08a-s02_L_r2657_p4094_dJ-011_pcrP1_oMM575_2014-11-27_A11
Sample = qHZT-08a-s02_r2657_p4094_dJ-011
Read group = qHZT-08a-s02_L_r2657_p4094_dJ-011_pcrP1_oMM575_2014-11-27_A11
----------------------
Location = 163,422
Alignment start = 163,293 (+)
Cigar = 34S833M1D72M1I50M1I9M
Mapped = yes
Mapping quality = 60
Secondary = no
Supplementary = no
Duplicate = no
Failed QC = no
----------------------
Base = C
Base phred quality = 23
----------------------
RG = qHZT-08a-s02_L_r2657_p4094_dJ-011_pcrP1_oMM575_2014-11-27_A11
NM = 20
AS = 858
XS = 0
-------------------

Example 8

Read name = qHZT-08a-s02_L_r2657_p4094_dJ-011_pcrP1_oMM575_2014-11-27_A11
Sample = qHZT-08a-s02_r2657_p4094_dJ-011
Read group = qHZT-08a-s02_L_r2657_p4094_dJ-011_pcrP1_oMM575_2014-11-27_A11
----------------------
Location = 163,476
Alignment start = 163,293 (+)
Cigar = 34S833M1D72M1I50M1I9M
Mapped = yes
Mapping quality = 60
Secondary = no
Supplementary = no
Duplicate = no
Failed QC = no
----------------------
Base = G
Base phred quality = 15
----------------------
RG = qHZT-08a-s02_L_r2657_p4094_dJ-011_pcrP1_oMM575_2014-11-27_A11
NM = 20
AS = 858
XS = 0
-------------------

Example #9

Read name = qHZT-08a-s02_L_r2657_p4094_dJ-012_pcrP1_oMM575_2014-11-27_A12
Sample = qHZT-08a-s02_r2657_p4094_dJ-012
Read group = qHZT-08a-s02_L_r2657_p4094_dJ-012_pcrP1_oMM575_2014-11-27_A12
----------------------
Location = 163,422
Alignment start = 163,329 (+)
Cigar = 67S16M1D181M1D634M1D9M1I8M1I62M1D17M4S
Mapped = yes
Mapping quality = 60
Secondary = no
Supplementary = no
Duplicate = no
Failed QC = no
----------------------
Base = C
Base phred quality = 18
----------------------
RG = qHZT-08a-s02_L_r2657_p4094_dJ-012_pcrP1_oMM575_2014-11-27_A12
NM = 87
AS = 480
XS = 0
-------------------

Viewing all articles
Browse latest Browse all 1335

Trending Articles