Channel: haplotypecaller — GATK-Forum

HaplotypeCaller Error: SAM/BAM/CRAM Invalid GZIP header

These are my GATK (3.5-0-g36282e4) program arguments:

-T HaplotypeCaller 
-R human_g1k_v37.22.fasta
-nct 16
-I ref.22.500x.bwamem.sorted.bqsr.bam
-I somatic_sim_af20_500x.bwamem.bqsr.bam
-I somatic_sim_het_500x.bwamem.sorted.bqsr.bam
-D All_20170403.vcf
-L 22
-o somatic_sim.hpcaller.22.vcf

This is the error message:

##### ERROR MESSAGE: SAM/BAM/CRAM file somatic_sim_het_500x.bwamem.sorted.bqsr.bam is malformed. .... Error details: Invalid GZIP header

The command worked fine the first time I ran it. However, I goofed and used the same read group ID for the het and af20 BAM files.

I ran

samtools addreplacerg ...
samtools view -H new_rg.bam >header.txt

Then I manually removed the old read group from header.txt, since I've noticed in the past that GATK will emit null genotypes for that sample.

samtools reheader -P header.txt new_rg.bam >het.bam

Reindexed it from scratch and now I get this error.

I've used addreplacerg and HaplotypeCaller in the past successfully. However I never removed the older read groups.

samtools quickcheck -v het.bam 

Returns no error, so I'm at a loss here. Do BAM files typically have GZIP headers?

Thanks for any help or insight.
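
On the GZIP question: BAM files are BGZF-compressed, and BGZF is a series of gzip-compatible blocks, so a valid BAM does start with a gzip header. Below is a minimal Python sketch of what that header looks like at the byte level. The function names are made up for illustration, and the check only looks at the first block, so it cannot tell whether a file is damaged further in. It is not a diagnosis of the file in the post.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data: bytes) -> bool:
    """True if the bytes start with the two-byte gzip magic number."""
    return data[:2] == GZIP_MAGIC


def looks_like_bgzf(data: bytes) -> bool:
    """True if the bytes start with a BGZF block header.

    Simplification: assumes the 'BC' subfield is the first entry in the
    gzip extra field, which is how BGZF writers normally lay it out.
    """
    return (
        len(data) >= 14
        and data[:4] == b"\x1f\x8b\x08\x04"  # gzip magic, deflate, FEXTRA flag
        and data[12:14] == b"BC"             # BGZF subfield identifier
    )


# The 28-byte empty BGZF block that marks end-of-file in a BAM.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"
    "4243" "0200" "1b00"
    "0300" "00000000" "00000000"
)

# Ordinary gzip output has the magic number but not the BGZF extra field.
plain_gzip = gzip.compress(b"@HD\tVN:1.6\n")

print("plain gzip:", looks_like_gzip(plain_gzip), looks_like_bgzf(plain_gzip))
print("BGZF block:", looks_like_gzip(BGZF_EOF), looks_like_bgzf(BGZF_EOF))
```

To try it on a real file, read the first 16 bytes or so in binary mode and pass them to `looks_like_bgzf`.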


GATK 3.8: Allele-specific Annotations

I am using GATK 3.8.1 for HaplotypeCaller (in gVCF mode, followed by GenotypeGVCFs), and I noticed that the final VCF output from GenotypeGVCFs has missing DP values.

I found a workaround: if I skip the -G StandardAnnotation -G AS_StandardAnnotation parameters during gVCF calling but keep them during the GenotypeGVCFs step, the DP values are no longer missing.

My question: will the values of the allele-specific annotations in the final VCF differ if I include those -G parameters only during the GenotypeGVCFs step, compared to including them during both steps (gVCF calling and GenotypeGVCFs)?

*I am not able to upgrade to GATK 4 yet, since we still want to use UnifiedGenotyper.

(How to) generate a complete realigned bam file using -bamout argument in HaplotypeCaller?

Hello, I want to get a realigned BAM file for other tools to call variants from, so I used the -bamout argument in HaplotypeCaller. I found that the BAM file is incomplete when I use only the -bamout argument. When I set the --disable-optimizations and -bamout arguments and added the -forceActive and -dontTrimActiveRegions flags, the error message said "A USER ERROR has occurred: f is not a recognized option". Maybe the program didn't recognize these flags. My command line is shown below:
'''
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /software/bin/gatk-package-4.0.3.0-local.jar HaplotypeCaller -R /root/data/reference/gatk_bundle/Homo_sapiens_assembly38.fasta -I /root/data/output/6_BQSR/SRR2188163.bqsr.bam --dbsnp /root/data/reference/gatk_bundle/dbsnp_146.hg38.vcf.gz -O SRR2188163.raw.2.vcf -bamout SRR2188163.bamout.2.bam --disable-optimizations true -forceActive -dontTrimActiveRegions
'''
Could you tell me how I can use HaplotypeCaller to get a complete realigned BAM file? Thanks a lot.
The first picture is a screenshot of the output BAM file that I generated with -bamout alone. The second picture shows the result of using -bamout and --disable-optimizations without any additional flags. Adding the flags failed.

Should I provide the exome target list (-L argument) even while calling a gVCF file using HaplotypeCaller?

Hi,

Recently we performed exome sequencing using Nextera Illumina platform for three samples (Father, Mother and Son). I downloaded the exome interval list from Illumina's website.

1) Trimmed the raw reads
2) Aligned the trimmed reads against the human reference hg19 as recommended for exome-sequencing
3) Then sorted, deduped, recalibrated the bam file.
4) Then performed variant calling in two steps process for all three samples individually
4.1) Used the GATK Haplotype Caller tool in GVCF mode
Command: java -Xmx16g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R /GATK_bundle/hg19.fa -I sample1.sorted.dedup.recal.bam --emitRefConfidence GVCF --dbsnp /GATK_bundle/dbsnp.138.hg19.vcf -o sample1.raw.g.vcf
4.2) Used GenotypeGVCFs (Joint SNP calling) for all three samples together
Command: java -Xmx16g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R /GATK_bundle/hg19.fa --variant sample1.raw.g.vcf --variant sample2.raw.g.vcf --variant sample3.raw.g.vcf --dbsnp /GATK_bundle/dbsnp.138.hg19.vcf -o sample1.2.3.trio.raw.vcf

In the above commands, I didn't use Illumina's exome interval list, which was used to target the exome in the sequencing process.

According to "https://software.broadinstitute.org/gatk/documentation/article.php?id=4669", in the example section of GATK command lines, the article suggests providing the exome targets with the -L argument for exome sequencing.

Based on that article, I have the following questions:
1) Should I provide the exome target list (-L argument) only while calling regular VCF file using Haplotype caller?
or
2) Should I provide the exome target list (-L argument) even while calling gVCF file using Haplotype caller?

HaplotypeCaller: Alternate allele get called or not depending on -ip option

Hi, I'm currently analyzing some data (exome-seq) using HaplotypeCaller and get what seems to me an odd behaviour:
The problem is that I've got a position which is clearly bi-allelic in IGV and that is said to have only the reference allele in the gVCF I'm generating with HaplotypeCaller.

Here is the command line I used:

nohup java -jar PATH/GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R PATH/ucsc.hg19_noHaps.fasta \
-I PATH/JLCL254.realigned.recalibrated.bam \
-L PATH/merged.bed \
-ip 50 \
--emitRefConfidence GVCF \
--variant_index_type LINEAR \
--variant_index_parameter 128000 \
-o JLCL254.vcf 

Here is the gVCF line of the variant of interest:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT JLCL254

chr1 912049 . T <NON_REF> . . END=912049 GT:DP:GQ:MIN_DP:PL 0/0:68:0:68:0,0,0

The variant of interest is located 19bp away from the captured region but with "-ip 50" it should be detected.

To check what is really analyzed, I output the bamout for all analyzed regions (-L PATH/merged.bed, -ip 50) and saw that the location of the variant is not analyzed (If I'm correct: as there is no coverage, this is not an active region).

Then I forced the bamout at the location +/-20nt around my variant to check whether some reads with the alternate allele are still kept. I used:
-L chr1:912029-912069 \
-forceActive \
-disableOptimizations

Doing so, I've been able to see that many reads with the alternate allele are indeed still kept. The gVCF file generated along with the bamout file now contains my variant of interest with the alternate allele:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT JLCL254

chr1 912049 rs9803103 T C,<NON_REF> 1235.77 . BaseQRankSum=-0.677;ClippingRankSum=-0.942;DB;DP=61;MLEAC=1,0;MLEAF=0.500,0.00;MQ=54.88;MQRankSum=-0.600;ReadPosRankSum=-0.195 GT:AD:DP:GQ:PL:SB 0/1:19,42,0:61:99:1264,0,448,1321,574,1895:3,16,6,36

Please see an IGV screenshot:

Tracks are (from top to bottom):
* the original bam file
* the bamout for all captured regions (known from the file -L PATH/merged.bed)
* the forced bamout (at the location of the variant i.e -L chr1:912029-912069)
* merged.bed is the file used with the -L option.

Finally, I tried to call variants changing the -ip option to 100 and got the alternate allele called.

Please note that:
If I manually add/subtract 50bp to the closest target region boundaries, I've got the same result as with -ip 50.
If I manually add/subtract 100bp to the closest target region boundaries, I've got the same result as with -ip 100.

I tried several versions of GATK (3.4-46, 3.7, 4.0.4.0) and always got the same results.

I may have missed something, but so far I can't explain what's happening. Do you see any explanation for what I observe? Do you see any options I should use to overcome this?
Many thanks in advance for your help.

NB: java -version
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-b10)
OpenJDK 64-Bit Server VM (build 25.171-b10, mixed mode)
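
To make the padding arithmetic in this post concrete, here is a small Python sketch of what "-ip 50" is expected to do to a target list. The target coordinates are hypothetical, chosen only so that the target ends 19 bp before chr1:912049 as described above, and the helper names are made up. It checks interval membership only. It does not model how HaplotypeCaller decides which regions are active, so it cannot explain the missing call by itself.

```python
from typing import List, Tuple

# (contig, start, end), 1-based and inclusive
Interval = Tuple[str, int, int]


def pad_intervals(intervals: List[Interval], padding: int) -> List[Interval]:
    """Extend every interval by `padding` bases on both sides."""
    return [(c, max(1, s - padding), e + padding) for c, s, e in intervals]


def covers(intervals: List[Interval], contig: str, pos: int) -> bool:
    """True if any interval on `contig` contains `pos`."""
    return any(c == contig and s <= pos <= e for c, s, e in intervals)


# Hypothetical capture target ending 19 bp upstream of the site of interest.
targets = [("chr1", 911900, 912030)]
site = ("chr1", 912049)

print("unpadded:", covers(targets, *site))
print("padded by 50:", covers(pad_intervals(targets, 50), *site))
print("padded by 10:", covers(pad_intervals(targets, 10), *site))
```

With these assumed coordinates the site is outside the raw target and inside the 50 bp padded interval, which matches the expectation stated in the post that the position is within the requested analysis range.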

Phased Heterozygous SNP

Dear all,

I have difficulty understanding the genotypes of phased SNPs. Here I have a SNP where only one read has the reference allele and 11 reads have an alternate allele, and it is called as a heterozygous SNP.

 chr15  8485088 .   G   T   4936.33 PASS     
 BaseQRankSum=1.82;ClippingRankSum=0;ExcessHet=0;FS=2.399;InbreedingCoeff=0.721;
 MQ=60;MQRankSum=0;QD=32.86;ReadPosRankSum=0.267;SOR=1.167;
 DP=10789;AF=0.013;MLEAC=13;MLEAF=0.012;AN=1300;AC=28    
GT:AD:DP:GQ:PGT:PID:PL  0/1:1,12:13:3:0|1:8485088_G_T:485,0,3

The genotype for a single sample from a multi-sample VCF is shown here. Could someone shed light on how to interpret the genotype as heterozygous when only one read has the reference allele? I would have expected it to be called as a homozygous SNP. Is this a bug, or am I missing something? Also, IGV does not show the reference read. (GATK version 3.7-0-gcfedb67)
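
One way to read the record above is through its PL field (485,0,3) rather than the read counts. The sketch below assumes the usual VCF convention that PL values are Phred-scaled and normalized so the most likely genotype has PL 0, and that GQ is the gap to the next most likely genotype, capped at 99. The function name is made up, and this is a reading aid, not GATK's actual genotyping code.

```python
from typing import List, Tuple


def call_from_pl(pl: List[int]) -> Tuple[int, int]:
    """Return (index of the most likely genotype, GQ) for a PL list."""
    order = sorted(range(len(pl)), key=lambda i: pl[i])
    best, second = order[0], order[1]
    # GQ is the distance between the two smallest PLs, capped at 99.
    gq = min(99, pl[second] - pl[best])
    return best, gq


# PL order for a diploid, biallelic site.
DIPLOID_BIALLELIC = ["0/0", "0/1", "1/1"]

best, gq = call_from_pl([485, 0, 3])
print(DIPLOID_BIALLELIC[best], gq)  # prints: 0/1 3
```

Under those assumptions, the heterozygous and homozygous-alternate genotypes are separated by only 3 Phred units, which agrees with the GQ of 3 in the record: the call is 0/1 but with very low confidence.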

Is there a paper describing the Haplotype Caller algorithm?

Hi,

I'd like to ask whether there is a paper describing the Haplotype Caller algorithm; if so, could you please send me the reference? I have tried to find it, but I only found the paper on GATK, which is great but doesn't describe the Haplotype Caller algorithm in detail.

Thank you,

HaplotypeCaller sensitivity in large(ish) cohorts

One of my projects currently has ~150 patients (exomes) that I've been processing through the standard pipeline (2.8-1, including ReduceReads). In my most recent run through HC, I split the cohort in half for the sake of time. A subset of these patients have undergone targeted genotyping in the clinic, and I have a list of 36 validated variants in 28 samples. When I checked these variants in the final VCF, 5 of 36 were not called by HaplotypeCaller and have moderate to excellent support in the BAM. Several of these (possibly all of them? Not sure) were present in previous HC and UG runs with fewer samples, and I verified that the one I'm focusing on is called correctly when I only use five samples.

Debugging runs on a small region have revealed the following:

  1. ReduceReads does not seem to be the culprit, my variant is still uncalled when using the un-reduced bams
  2. My variant is not inside an Active Region
  3. When I force it to be with -forceActive, it's not in the trimmed ActiveRegion
  4. I've tried increasing -maxNumHaplotypesInPopulation as high as 1024, and the trimmed region still doesn't include my variant
  5. I've also tried running with -dontTrimActiveRegions, but haven't successfully finished yet (runtime increases from 30 seconds to over an hour, I keep trying to run it in short queues while I'm doing other stuff and getting killed by the scheduler)

A couple of other random notes that may or may not be applicable: These are rare variants that I only expect to see in 1 or 2 samples. My testing region is ~400bp around the variant in question. There is a variant in another sample at an immediately adjacent nucleotide that is also not called (and, perhaps obviously, is also outside the active regions).

Do you have any suggestions for approaching this? I haven't messed with -minPruning yet, as increasing that value should result in a loss of sensitivity and reducing it seems like a bad idea. I suppose I could split my cohort into subsets of 30 or 40 samples, but that doesn't seem like the best approach.


Allele Depth (AD) / Allele Balance (AB) Filtering in GATK 4

Hi,

I am trying to filter my GATK 4.0.3 - HaplotypeCaller generated multi-sample VCF for allele depth (AD) annotation at sample genotype-level (so available in "FORMAT" fields of each sample).

I think prior to GATK 4, this annotation was available as "Allele Balance" (AB) ratios (generated by AlleleBalanceBySample), but it is not available anymore in GATK 4. So I tried to filter genotypes based on AD field, that is exactly the same thing but indicated in "X,Y" format, so in an array format of integers. This array format makes it difficult to filter based on depth of alternative allele divided by depth of all alleles at a specific site.

Can you please recommend any solution to this problem? If I could turn this array into a ratio, I could easily filter genotypes using VariantFiltration or other tools such as vcflib/vcffilter. I also tried the below code (following https://gatkforums.broadinstitute.org/gatk/discussion/1255/what-are-jexl-expressions-and-how-can-i-use-them-with-the-gatk):

gatk VariantFiltration -R $ref -V $vcf -O $output --genotype-filter-expression 'vc.getGenotype("Sample1").getAD().1 / vc.getGenotype("Sample1").getAD().0 > 0.33' --set-filtered-genotype-to-no-call --genotype-filter-name 'ABfilter'

This worked, but strangely it filters the variant for all samples if only one of the samples has allele depths that are not in balance (as defined by the filter). If it had worked only for Sample1, I was planning to write a quick loop over all the samples. I tried the same with GATK 3.8, but it still filters the whole variant for all samples if it is filtered in just one sample.
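
As an illustration of turning the AD array into a ratio, here is a small Python sketch that works on the text of the FORMAT and sample columns directly. It is plain post-processing and not a GATK feature. The function names and the example values are hypothetical, and the ratio is defined here as non-reference depth divided by total depth, which is the "alternative divided by all alleles" definition from the post rather than the alt/ref ratio in the JEXL expression.

```python
from typing import List, Optional


def allele_balance(ad_field: str) -> Optional[float]:
    """Fraction of reads supporting non-reference alleles, from an AD string.

    Returns None when AD is missing or the total depth is zero.
    """
    if ad_field in (".", ""):
        return None
    depths = [0 if x == "." else int(x) for x in ad_field.split(",")]
    total = sum(depths)
    if total == 0:
        return None
    return (total - depths[0]) / total


def per_sample_balance(format_keys: str, sample_columns: List[str]) -> List[Optional[float]]:
    """Compute the allele balance independently for each sample column."""
    keys = format_keys.split(":")
    if "AD" not in keys:
        return [None] * len(sample_columns)
    ad_index = keys.index("AD")
    balances = []
    for column in sample_columns:
        values = column.split(":")
        # Trailing FORMAT fields may be dropped, e.g. a bare "./." genotype.
        ad = values[ad_index] if ad_index < len(values) else "."
        balances.append(allele_balance(ad))
    return balances


# Hypothetical FORMAT and sample columns from one VCF line.
print(per_sample_balance("GT:AD:DP", ["0/1:10,10:20", "0/1:18,2:20", "./."]))
```

Each sample gets its own value, so a threshold can be applied per genotype without affecting the other samples at the same site.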

GenotypeGVCFs and VariantFiltration tools

We are following "Calling variants on cohorts of samples using the HaplotypeCaller in GVCF mode" best practices using GATK 3.8.1 and Java 1.8. Thus we merged the raw.g.vcfs from HaplotypeCaller into one cohort.g.vcf and then carried out joint genotyping using the GenotypeGVCFs tool. We are working in a haploid model organism so we then tried to use the VariantFiltration tool on the output (which is a vcf file containing the information from all of the sequences with which we are working). However this failed and we got the error
"Line 2176: there aren't enough columns for line 102"
Others have encountered the same problem, and I see that you responded that the GATK and Java versions were incompatible, but this was several versions ago. Is this true for us? Could you please tell me where to go next?

GATK HaplotypeCaller missing SNPs at the terminals of the segment when calling SNPs for Influenza A

We are trying to call variants for Influenza A virus sequenced by MiSeq using HaplotypeCaller following GATK best practices (GATK version 3.7). However, when checking the called variants against the BAM file in IGV, we frequently identify SNPs that are missed by HaplotypeCaller at the beginning or the end of a segment. The missing ones are well supported by the reads, and are called by samtools and UnifiedGenotyper with high confidence.

As one example (shown below), there are three rows of called variants at the top, called by (from top to bottom) UnifiedGenotyper, samtools, and HaplotypeCaller. The rightmost SNP is called by the first two tools but missed by HaplotypeCaller, although the supporting reads show a consistent SNP.

Just to show that this SNP is well supported by the reads, here is the record reporting it in the VCF generated by UnifiedGenotyper:

A-New_Jersey-NHRC_93408-2016-H3N2(KY078630)-HA 15 . A T 166598 . AC=1;AF=1.00;AN=1;DP=3970;Dels=0.00;FS=0.000;HaplotypeScore=26.7856;MLEAC=1;MLEAF=1.00;MQ=59.99;MQ0=0;QD=34.24;SOR=4.823 GT:AD:DP:GQ:PL 1:0,3969:3970:99:166628,0

On closer inspection of the BAM file that HaplotypeCaller generates for debugging, we noticed that the variant is consistently missing from the de novo generated haplotypes.

There are also other cases of missing SNPs. What they have in common is that they are always at the end of a segment, are well supported by reads, and are missed only by HaplotypeCaller. However, for some samples, similar variants at the ends of segments are called by HaplotypeCaller.

My questions are as follows:

  • Is this a bug in HaplotypeCaller? If so, has it been fixed?
  • If it is not a bug, is there a HaplotypeCaller parameter that can be set to guarantee that it will not miss good-quality variants at the ends of segments?

Many thanks.

Using NIO with GATK4 HaplotypeCaller

Is GATK4 HaplotypeCaller NIO compatible? If not, is there another version that is?

Thanks!

I am running HaplotypeCaller on one BAM file

java -jar GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference/GRCh37/hs37d5.fa -I output.bam --dbsnp reference/gatkbundle/dbsnp_138.b37.vcf -o output.g.vcf -ERC GVCF

I am trying to add one more BAM file to my cohort. I ran the BAM file separately, but it does not run in GVCF mode. It throws the error below.

MESSAGE: Invalid command line: Argument emitRefConfidence has a bad value: Can only be used in single sample mode currently. Use the sample_name argument to run on a single sample out of a multi-sample BAM file
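
The error message says the input contains more than one sample. Sample names come from the SM tags of the @RG lines in the BAM header, so a quick way to check is to list the distinct SM values. Below is a minimal Python sketch that does this on header text, for example the output of "samtools view -H output.bam". The function name and the header content are hypothetical.

```python
from typing import List


def samples_in_header(header_text: str) -> List[str]:
    """Return the sorted, distinct SM values found on @RG header lines."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return sorted(samples)


# Hypothetical header with two read groups that name different samples.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tSM:sampleA\tPL:ILLUMINA\n"
    "@RG\tID:rg2\tSM:sampleB\tPL:ILLUMINA\n"
)

found = samples_in_header(header)
print(found)
if len(found) > 1:
    print("More than one sample name in this header.")
```

If more than one name is listed, the message above suggests the sample_name argument as the way to run on a single sample from that file.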

SNP calling using pooled RNA-seq data

Hello,

First of all, thank you for your detailed best practice pipeline for SNP calling from RNA-seq data.

I have pooled RNA-seq data from which I need to call SNPs. Each library consists of a pooled sample of 2-3 individuals of the same sex-tissue combination.

I was wondering if HaplotypeCaller can handle SNP calling from pooled sequences, or whether it is better to use FreeBayes?

I understand that these results come from experimenting with the data but it would be great if you could share your experiences with me on this.

Cheers,
Homa

"UNKNOWN" zygosity in CSV file

Hi there,
I am using GATK 3. Recently I checked the CSV and BAM files for a couple who are both carriers of one pathogenic variant, but in the CSV file the zygosity of this variant is labeled "Unknown" rather than heterozygous for both of them.
I have two questions:
1- What is the main criterion for determining the "zygosity" of a variant in GATK?
2- How can I eliminate false negative (or false positive) variants in the final VCF (by GATK)?

Thank you
Mojtaba


Is UnifiedGenotyper actually better than HaplotypeCaller for this pooled sample project?

Hi, I am interested in calling variants from pooled samples. Specifically, I wish to determine SNP allele frequencies from samples that were made by pooling many individuals (1000+) together. I know that HaplotypeCaller is now recommended over UnifiedGenotyper in all cases. However, is this project an exception? I have:

  • 1000s of individuals in each pooled sample
  • only two possible alleles at every site
  • I only need to call SNPs
  • I can generate a set of known SNPs to call (does GENOTYPE_GIVEN_ALLELES work in HaplotypeCaller?)
  • I have high read coverage
  • I want to detect rare alleles as best as possible

If you still advise using HaplotypeCaller in this case, do you have any special suggestions? I'd like to maximize the -ploidy number to detect the rare alleles, but otherwise streamline the job. Thanks for any advice you can provide!

HaplotypeCaller gives an error and generates a VCF file with no variant calls

Dear GATK Team,
I'm using GATK and Picard to call short variants from Plasmodium genome paired-read FASTQ files. I ran HaplotypeCaller after marking duplicates with Picard MarkDuplicates.
HaplotypeCaller outputs an error during variant calling.
Kindly help resolve this issue.

Below is the log of the GATK HaplotypeCaller step:

Using GATK jar /home/ubuntu/gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /home/ubuntu/gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar HaplotypeCaller -R Pf_ref/pf_3D7_38_Genome.fasta -I ./bam_output/Day0_IJD_252_dedup.bam -O ./variant_output/Day0_IJD_252_raw_variants.g.vcf
22:44:31.686 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/ubuntu/gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
22:44:32.283 INFO HaplotypeCaller - ------------------------------------------------------------
22:44:32.283 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.5.1
22:44:32.284 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
22:44:32.285 INFO HaplotypeCaller - Executing as ubuntu@mrcclimbserver.vms.swansea.climb.ac.uk on Linux v4.4.0-127-generic amd64
22:44:32.286 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v9.0.1+11
22:44:32.286 INFO HaplotypeCaller - Start Date/Time: June 27, 2018 at 10:44:31 PM UTC
22:44:32.286 INFO HaplotypeCaller - ------------------------------------------------------------
22:44:32.287 INFO HaplotypeCaller - ------------------------------------------------------------
22:44:32.289 INFO HaplotypeCaller - HTSJDK Version: 2.15.1
22:44:32.289 INFO HaplotypeCaller - Picard Version: 2.18.2
22:44:32.290 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
22:44:32.290 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
22:44:32.290 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
22:44:32.290 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
22:44:32.291 INFO HaplotypeCaller - Deflater: IntelDeflater
22:44:32.291 INFO HaplotypeCaller - Inflater: IntelInflater
22:44:32.291 INFO HaplotypeCaller - GCS max retries/reopens: 20
22:44:32.291 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
22:44:32.291 INFO HaplotypeCaller - Initializing engine
22:44:32.813 INFO HaplotypeCaller - Done initializing engine
22:44:32.828 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
22:44:32.849 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/ubuntu/gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
22:44:32.856 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/ubuntu/gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
22:44:32.968 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
22:44:32.969 INFO IntelPairHmm - Available threads: 32
22:44:32.969 INFO IntelPairHmm - Requested threads: 4
22:44:32.969 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
22:44:33.047 INFO ProgressMeter - Starting traversal
22:44:33.048 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
22:44:33.065 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
22:44:33.066 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
22:44:33.070 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
22:44:33.070 INFO HaplotypeCaller - Shutting down engine
[June 27, 2018 at 10:44:33 PM UTC] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2147483648
Exception in thread "main" java.lang.IncompatibleClassChangeError: Inconsistent constant pool data in classfile for class org/broadinstitute/hellbender/transformers/ReadTransformer. Method lambda$identity$d67512bf$1(Lorg/broadinstitute/hellbender/utils/read/GATKRead;)Lorg/broadinstitute/hellbender/utils/read/GATKRead; at index 65 is CONSTANT_MethodRef and should be CONSTANT_InterfaceMethodRef
at org.broadinstitute.hellbender.transformers.ReadTransformer.identity(ReadTransformer.java:30)
at org.broadinstitute.hellbender.engine.GATKTool.makePreReadFilterTransformer(GATKTool.java:288)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:266)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:994)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

regards
Archie

Calling invariant sites with the new pipeline of HaplotypeCaller

Hello,

I am using the new pipeline of haplotype caller in order to obtain a vcf file containing both variant and invariant sites.

For each individual, I called variant and invariant sites :

java -Xmx300g -jar GenomeAnalysisTK.jar \
     -T HaplotypeCaller \
     -R ref.fasta \
     -I ${INPUT}.bam \
     --genotyping_mode DISCOVERY \
     -stand_emit_conf 0 \
     -stand_call_conf 0 \
     -o ${INPUT}_VC.vcf \
     --emitRefConfidence BP_RESOLUTION  \
     --variant_index_type LINEAR \
     --variant_index_parameter 128000 \
     -nct 16

In the VCF that I obtain, I do indeed have every position.
The problem is that the INFO and QUAL fields are empty (.) if the site is non-variant.

KE332545.1      44      .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,503
KE332545.1      45      .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,518
KE332545.1      46      .       C       T,<NON_REF>     0       .       BaseQRankSum=-2.270;ClippingRankSum=-0.691;DP=17;MLEAC=0,0;MLEAF=0.00,0.00;MQ=38.98;MQ0=0;MQRankSum=0.099;ReadPosRankSum=0.493  GT:AD:DP:GQ:PL:SB      0/0:11,2,0:13:3:0,3,379,33,385,414:0,0,0,0
KE332545.1      47      .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,515
KE332545.1      48      .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,540
KE332545.1      49      .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,563

But I also want this information so that I can use my filtering pipeline on those invariant sites as well!
Is there any solution?

Thanks !

Muriel

All annotations in BP_RESOLUTION mode

Hello,

I was wondering if there is a way to output all annotations for all sites when running HaplotypeCaller with BP_RESOLUTION. Currently it outputs all annotations for only called variants. Thanks in advance.

HaplotypeCaller outputs only the header and one position record, without error

I'm trying to run gatk4 HaplotypeCaller using the following command:

./gatk HaplotypeCaller -R ./reference.fasta --emit-ref-confidence GVCF --dbsnp ./samtools_gatk_common.vcf -I ./sample.bqsr.bam -O ./sample.gvcf --TMP_DIR ./tmp

The log output gives no error, but the resulting *.gvcf file contains only the header and one base record. The dbsnp file was the intersection of samtools and GATK.

Here is the log file:

Using GATK jar /path/to/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /path/to/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar HaplotypeCaller -R /path/to/index/chrom23.fasta --emit-ref-confidence GVCF --dbsnp /path/to/dbsnp/sample.dbsnp.vcf -I /path/to/BQSR/sample.bqsr.bam -O /path/to/result/sample.g.vcf --TMP_DIR /path/to/tmp
18:38:47.051 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/path/to/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
18:38:47.439 INFO  HaplotypeCaller - ------------------------------------------------------------
18:38:47.440 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.4.0
18:38:47.440 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
18:38:47.442 INFO  HaplotypeCaller - Executing as hankai@cngb-compute-e05-6.cngb.sz.hpc on Linux v2.6.32-696.el6.x86_64 amd64
18:38:47.442 INFO  HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_172-b11
18:38:47.442 INFO  HaplotypeCaller - Start Date/Time: July 4, 2018 6:38:46 PM CST
18:38:47.442 INFO  HaplotypeCaller - ------------------------------------------------------------
18:38:47.442 INFO  HaplotypeCaller - ------------------------------------------------------------
18:38:47.443 INFO  HaplotypeCaller - HTSJDK Version: 2.14.3
18:38:47.443 INFO  HaplotypeCaller - Picard Version: 2.18.2
18:38:47.444 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
18:38:47.444 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
18:38:47.444 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
18:38:47.444 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
18:38:47.444 INFO  HaplotypeCaller - Deflater: IntelDeflater
18:38:47.444 INFO  HaplotypeCaller - Inflater: IntelInflater
18:38:47.444 INFO  HaplotypeCaller - GCS max retries/reopens: 20
18:38:47.444 INFO  HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
18:38:47.444 INFO  HaplotypeCaller - Initializing engine
18:38:50.210 INFO  FeatureManager - Using codec VCFCodec to read file file:///path/to/dbsnp/sample.dbsnp.vcf
18:38:50.292 INFO  HaplotypeCaller - Done initializing engine
18:38:50.303 INFO  HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
18:38:50.303 INFO  HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
18:38:51.794 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/path/to/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
18:38:51.817 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/path/to/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
18:38:51.915 WARN  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
18:38:51.916 INFO  IntelPairHmm - Available threads: 112
18:38:51.916 INFO  IntelPairHmm - Requested threads: 4
18:38:51.916 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
18:38:51.996 INFO  ProgressMeter - Starting traversal
18:38:51.997 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
18:39:02.152 INFO  ProgressMeter - pseudochrom_23:39888              0.2                   240           1418.4
18:39:12.324 INFO  ProgressMeter - pseudochrom_23:112351              0.3                   650           1918.6
18:39:22.383 INFO  ProgressMeter - pseudochrom_23:166271              0.5                   980           1935.1
18:39:32.471 INFO  ProgressMeter - pseudochrom_23:208604              0.7                  1240           1838.3
18:39:42.498 INFO  ProgressMeter - pseudochrom_23:270983              0.8                  1610           1912.8
18:39:52.827 INFO  ProgressMeter - pseudochrom_23:315473              1.0                  1890           1864.2
18:40:03.130 INFO  ProgressMeter - pseudochrom_23:368748              1.2                  2220           1872.5
18:40:13.602 INFO  ProgressMeter - pseudochrom_23:430805              1.4                  2590           1905.3
18:40:23.620 INFO  ProgressMeter - pseudochrom_23:512763              1.5                  3060           2003.9
18:40:33.781 INFO  ProgressMeter - pseudochrom_23:592148              1.7                  3540           2086.8
18:40:46.199 INFO  ProgressMeter - pseudochrom_23:661025              1.9                  3950           2075.3
18:40:56.336 INFO  ProgressMeter - pseudochrom_23:731629              2.1                  4380           2113.6
18:41:09.819 INFO  ProgressMeter - pseudochrom_23:835707              2.3                  5000           2176.7
18:41:19.874 INFO  ProgressMeter - pseudochrom_23:941548              2.5                  5630           2284.3
18:41:30.479 INFO  ProgressMeter - pseudochrom_23:1044902              2.6                  6230           2358.6
18:41:40.552 INFO  ProgressMeter - pseudochrom_23:1157010              2.8                  6910           2459.7
18:41:50.606 INFO  ProgressMeter - pseudochrom_23:1222918              3.0                  7310           2455.6
18:42:00.695 INFO  ProgressMeter - pseudochrom_23:1305523              3.1                  7790           2477.0
18:42:10.765 INFO  ProgressMeter - pseudochrom_23:1457789              3.3                  8680           2620.1
18:42:20.899 INFO  ProgressMeter - pseudochrom_23:1636208              3.5                  9750           2800.4
18:42:30.922 INFO  ProgressMeter - pseudochrom_23:1780023              3.6                 10640           2916.1
18:42:40.981 INFO  ProgressMeter - pseudochrom_23:1955789              3.8                 11720           3071.0
18:42:51.075 INFO  ProgressMeter - pseudochrom_23:2108472              4.0                 12660           3177.2
18:43:01.113 INFO  ProgressMeter - pseudochrom_23:2286350              4.2                 13710           3302.1
18:43:11.157 INFO  ProgressMeter - pseudochrom_23:2484540              4.3                 14930           3456.6
18:43:21.167 INFO  ProgressMeter - pseudochrom_23:2607582              4.5                 15660           3490.7
18:43:31.253 INFO  ProgressMeter - pseudochrom_23:2779264              4.7                 16750           3598.9
18:43:41.256 INFO  ProgressMeter - pseudochrom_23:2958401              4.8                 17840           3700.5
18:43:51.431 INFO  ProgressMeter - pseudochrom_23:3091735              5.0                 18670           3741.1
18:44:01.489 INFO  ProgressMeter - pseudochrom_23:3256919              5.2                 19650           3809.5
18:44:11.888 INFO  ProgressMeter - pseudochrom_23:3395538              5.3                 20500           3845.1
18:44:22.047 INFO  ProgressMeter - pseudochrom_23:3496925              5.5                 21130           3841.2
18:44:32.048 INFO  ProgressMeter - pseudochrom_23:3647997              5.7                 22050           3890.6
18:44:42.058 INFO  ProgressMeter - pseudochrom_23:3770277              5.8                 22830           3913.0
18:44:52.224 INFO  ProgressMeter - pseudochrom_23:3855394              6.0                 23350           3889.2
18:45:02.305 INFO  ProgressMeter - pseudochrom_23:3961378              6.2                 24000           3888.7
18:45:12.396 INFO  ProgressMeter - pseudochrom_23:4077288              6.3                 24700           3895.9
18:45:22.481 INFO  ProgressMeter - pseudochrom_23:4209807              6.5                 25510           3919.8
18:45:32.603 INFO  ProgressMeter - pseudochrom_23:4301812              6.7                 26100           3909.1
18:45:42.779 INFO  ProgressMeter - pseudochrom_23:4400034              6.8                 26720           3902.8
18:45:53.263 INFO  ProgressMeter - pseudochrom_23:4475456              7.0                 27180           3871.2
18:46:04.692 INFO  ProgressMeter - pseudochrom_23:4607856              7.2                 28000           3882.6
18:46:14.837 INFO  ProgressMeter - pseudochrom_23:4739532              7.4                 28790           3900.7
18:46:26.963 INFO  ProgressMeter - pseudochrom_23:4805956              7.6                 29230           3854.8
18:46:37.150 INFO  ProgressMeter - pseudochrom_23:4932551              7.8                 30010           3871.0
18:46:47.557 INFO  ProgressMeter - pseudochrom_23:5051360              7.9                 30750           3879.6
18:46:57.575 INFO  ProgressMeter - pseudochrom_23:5156893              8.1                 31410           3881.1
18:47:07.589 INFO  ProgressMeter - pseudochrom_23:5256960              8.3                 32020           3876.6
18:47:17.844 INFO  ProgressMeter - pseudochrom_23:5339306              8.4                 32520           3857.3
18:47:28.069 INFO  ProgressMeter - pseudochrom_23:5447309              8.6                 33170           3856.4
18:47:38.135 INFO  ProgressMeter - pseudochrom_23:5562641              8.8                 33870           3862.5
18:47:48.259 INFO  ProgressMeter - pseudochrom_23:5648642              8.9                 34390           3847.8
18:47:58.434 INFO  ProgressMeter - pseudochrom_23:5750249              9.1                 35010           3844.2
18:48:09.065 INFO  ProgressMeter - pseudochrom_23:5853949              9.3                 35650           3839.7
18:48:19.112 INFO  ProgressMeter - pseudochrom_23:5955110              9.5                 36280           3838.4
18:48:29.206 INFO  ProgressMeter - pseudochrom_23:6051364              9.6                 36860           3831.5
18:48:39.584 INFO  ProgressMeter - pseudochrom_23:6140606              9.8                 37400           3819.0
18:48:49.694 INFO  ProgressMeter - pseudochrom_23:6228203             10.0                 37930           3807.6
18:48:59.742 INFO  ProgressMeter - pseudochrom_23:6327447             10.1                 38550           3805.9
18:49:10.118 INFO  ProgressMeter - pseudochrom_23:6412023             10.3                 39070           3792.5
18:49:20.131 INFO  ProgressMeter - pseudochrom_23:6528580             10.5                 39780           3799.8
18:49:30.488 INFO  ProgressMeter - pseudochrom_23:6664489             10.6                 40640           3819.0
18:49:41.323 INFO  ProgressMeter - pseudochrom_23:6776006             10.8                 41330           3819.0
18:49:51.947 INFO  ProgressMeter - pseudochrom_23:6871397             11.0                 41910           3810.3
18:50:02.348 INFO  ProgressMeter - pseudochrom_23:6965003             11.2                 42470           3801.3
18:50:12.656 INFO  ProgressMeter - pseudochrom_23:7064647             11.3                 43070           3796.6
18:50:22.681 INFO  ProgressMeter - pseudochrom_23:7129699             11.5                 43450           3774.5
18:50:32.723 INFO  ProgressMeter - pseudochrom_23:7217180             11.7                 43990           3766.7
18:50:42.805 INFO  ProgressMeter - pseudochrom_23:7334195             11.8                 44720           3774.9
18:50:52.874 INFO  ProgressMeter - pseudochrom_23:7470037             12.0                 45560           3792.0
18:51:03.070 INFO  ProgressMeter - pseudochrom_23:7580430             12.2                 46240           3795.0
18:51:13.109 INFO  ProgressMeter - pseudochrom_23:7703064             12.4                 46990           3804.3
18:51:23.274 INFO  ProgressMeter - pseudochrom_23:7839176             12.5                 47810           3818.3
18:51:33.338 INFO  ProgressMeter - pseudochrom_23:7960865             12.7                 48540           3825.4
18:51:43.392 INFO  ProgressMeter - pseudochrom_23:8028264             12.9                 48960           3808.2
18:51:53.463 INFO  ProgressMeter - pseudochrom_23:8151834             13.0                 49710           3816.7
18:52:03.665 INFO  ProgressMeter - pseudochrom_23:8270942             13.2                 50430           3822.1
18:52:13.727 INFO  ProgressMeter - pseudochrom_23:8359715             13.4                 50970           3814.5
18:52:23.905 INFO  ProgressMeter - pseudochrom_23:8477290             13.5                 51650           3816.9
18:52:33.954 INFO  ProgressMeter - pseudochrom_23:8594099             13.7                 52380           3823.6
18:52:44.110 INFO  ProgressMeter - pseudochrom_23:8710379             13.9                 53100           3828.8
18:52:54.114 INFO  ProgressMeter - pseudochrom_23:8848199             14.0                 53970           3845.3
18:53:04.680 INFO  ProgressMeter - pseudochrom_23:8983340             14.2                 54800           3856.1
18:53:15.384 INFO  ProgressMeter - pseudochrom_23:9068836             14.4                 55310           3843.7
18:53:25.473 INFO  ProgressMeter - pseudochrom_23:9222012             14.6                 56240           3863.2
18:53:35.477 INFO  ProgressMeter - pseudochrom_23:9305881             14.7                 56750           3854.1
18:53:45.512 INFO  ProgressMeter - pseudochrom_23:9431585             14.9                 57500           3861.2
18:53:55.687 INFO  ProgressMeter - pseudochrom_23:9550933             15.1                 58210           3864.8
18:54:05.702 INFO  ProgressMeter - pseudochrom_23:9694239             15.2                 59090           3880.3
18:54:15.903 INFO  ProgressMeter - pseudochrom_23:9779200             15.4                 59620           3871.8
18:54:25.917 INFO  ProgressMeter - pseudochrom_23:9884556             15.6                 60260           3871.4
18:54:36.002 INFO  ProgressMeter - pseudochrom_23:9991326             15.7                 60900           3870.7
18:54:46.010 INFO  ProgressMeter - pseudochrom_23:10127422             15.9                 61710           3881.1
18:54:56.072 INFO  ProgressMeter - pseudochrom_23:10247506             16.1                 62430           3885.4
18:55:06.287 INFO  ProgressMeter - pseudochrom_23:10372627             16.2                 63210           3892.7
18:55:16.338 INFO  ProgressMeter - pseudochrom_23:10508632             16.4                 64040           3903.5
18:55:26.423 INFO  ProgressMeter - pseudochrom_23:10605673             16.6                 64630           3899.5
18:55:36.484 INFO  ProgressMeter - pseudochrom_23:10680890             16.7                 65090           3888.0
18:55:46.555 INFO  ProgressMeter - pseudochrom_23:10755549             16.9                 65530           3875.4
18:55:56.618 INFO  ProgressMeter - pseudochrom_23:10860581             17.1                 66160           3874.2
18:56:06.724 INFO  ProgressMeter - pseudochrom_23:10958345             17.2                 66750           3870.6
18:56:16.801 INFO  ProgressMeter - pseudochrom_23:11078670             17.4                 67480           3875.2
18:56:26.824 INFO  ProgressMeter - pseudochrom_23:11172750             17.6                 68070           3871.9
18:56:36.886 INFO  ProgressMeter - pseudochrom_23:11297520             17.7                 68800           3876.5
18:56:46.910 INFO  ProgressMeter - pseudochrom_23:11394420             17.9                 69390           3873.2
18:56:56.924 INFO  ProgressMeter - pseudochrom_23:11466077             18.1                 69840           3862.4
18:57:06.975 INFO  ProgressMeter - pseudochrom_23:11575994             18.2                 70500           3863.1
18:57:17.094 INFO  ProgressMeter - pseudochrom_23:11713112             18.4                 71340           3873.3
18:57:27.171 INFO  ProgressMeter - pseudochrom_23:11835109             18.6                 72080           3878.1
18:57:37.329 INFO  ProgressMeter - pseudochrom_23:11907584             18.8                 72540           3867.7
18:57:47.364 INFO  ProgressMeter - pseudochrom_23:12031631             18.9                 73340           3875.8
18:57:57.451 INFO  ProgressMeter - pseudochrom_23:12122040             19.1                 73890           3870.4
18:58:07.495 INFO  ProgressMeter - pseudochrom_23:12238860             19.3                 74590           3873.1
18:58:17.565 INFO  ProgressMeter - pseudochrom_23:12364885             19.4                 75350           3878.8
18:58:27.731 INFO  ProgressMeter - pseudochrom_23:12451270             19.6                 75890           3872.8
18:58:38.320 INFO  ProgressMeter - pseudochrom_23:12537057             19.8                 76410           3864.5
18:58:48.414 INFO  ProgressMeter - pseudochrom_23:12580452             19.9                 76650           3844.0
18:58:59.346 INFO  ProgressMeter - pseudochrom_23:12630247             20.1                 76930           3823.1
18:59:10.085 INFO  ProgressMeter - pseudochrom_23:12746384             20.3                 77510           3818.0
18:59:20.474 INFO  ProgressMeter - pseudochrom_23:12814970             20.5                 77930           3806.2
18:59:30.683 INFO  ProgressMeter - pseudochrom_23:12833522             20.6                 78040           3780.1
18:59:41.531 INFO  ProgressMeter - pseudochrom_23:12867911             20.8                 78220           3756.0
18:59:51.979 INFO  ProgressMeter - pseudochrom_23:12898083             21.0                 78380           3732.4
19:00:02.811 INFO  ProgressMeter - pseudochrom_23:12912010             21.2                 78460           3704.4
19:00:12.854 INFO  ProgressMeter - pseudochrom_23:12954239             21.3                 78720           3687.5
19:00:23.618 INFO  ProgressMeter - pseudochrom_23:13045215             21.5                 79170           3677.7
19:00:33.765 INFO  ProgressMeter - pseudochrom_23:13113654             21.7                 79520           3665.2
19:00:46.176 INFO  ProgressMeter - pseudochrom_23:13230637             21.9                 80100           3657.0
19:00:57.561 INFO  ProgressMeter - pseudochrom_23:13254119             22.1                 80230           3631.5
19:01:11.951 INFO  ProgressMeter - pseudochrom_23:13277140             22.3                 80370           3598.8
19:01:23.954 INFO  ProgressMeter - pseudochrom_23:13291793             22.5                 80450           3570.4
19:01:34.143 INFO  ProgressMeter - pseudochrom_23:13313750             22.7                 80580           3549.4
19:01:44.470 INFO  ProgressMeter - pseudochrom_23:13410560             22.9                 81090           3545.0
19:01:54.793 INFO  ProgressMeter - pseudochrom_23:13469784             23.0                 81440           3533.7
19:02:05.477 INFO  ProgressMeter - pseudochrom_23:13499022             23.2                 81590           3513.1
19:02:15.584 INFO  ProgressMeter - pseudochrom_23:13574066             23.4                 81950           3503.2
19:02:27.238 INFO  ProgressMeter - pseudochrom_23:13603519             23.6                 82110           3481.1
19:02:37.410 INFO  ProgressMeter - pseudochrom_23:13625698             23.8                 82240           3461.7
19:02:48.228 INFO  ProgressMeter - pseudochrom_23:13691826             23.9                 82570           3449.4
19:02:59.032 INFO  ProgressMeter - pseudochrom_23:13757035             24.1                 82950           3439.4
19:03:09.114 INFO  ProgressMeter - pseudochrom_23:13779661             24.3                 83100           3421.8
19:03:19.416 INFO  ProgressMeter - pseudochrom_23:13820635             24.5                 83330           3407.2
19:03:29.183 INFO  HaplotypeCaller - 55869059 read(s) filtered by: ((((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)
  55869059 read(s) filtered by: (((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND GoodCigarReadFilter)
      55869059 read(s) filtered by: ((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter)
          55869059 read(s) filtered by: (((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter)
              55869059 read(s) filtered by: ((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter)
                  47376329 read(s) filtered by: (((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter)
                      46853127 read(s) filtered by: ((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter)
                          46853127 read(s) filtered by: (MappingQualityReadFilter AND MappingQualityAvailableReadFilter)
                              46853127 read(s) filtered by: MappingQualityReadFilter 
                      523202 read(s) filtered by: NotSecondaryAlignmentReadFilter 
                  8492730 read(s) filtered by: NotDuplicateReadFilter 

19:03:29.184 INFO  ProgressMeter - pseudochrom_23:13859898             24.6                 83586           3395.1
19:03:29.184 INFO  ProgressMeter - Traversal complete. Processed 83586 total regions in 24.6 minutes.
19:03:30.381 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
19:03:30.381 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
19:03:30.381 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
19:03:30.381 INFO  HaplotypeCaller - Shutting down engine
[July 4, 2018 7:03:30 PM CST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 24.73 minutes.
Runtime.totalMemory()=372873625

And the resulting *.gvcf:

##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller  --dbsnp /path/to/dbsnp/sample.dbsnp.vcf --emit-ref-confidence GVCF --output /path/to/result/sample.g.vcf --input /path/to/BQSR/sample.bqsr.bam --reference /path/to/index/chrom23.fasta --TMP_DIR /path/to/tmp  --annotation-group StandardAnnotation --annotation-group StandardHCAnnotation --disable-tool-default-annotations false --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --indel-size-to-eliminate-in-ref-model 10 --use-alleles-trigger false --disable-optimizations false --just-determine-active-regions false --dont-genotype false --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false 
--allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --recover-dangling-heads false --do-not-recover-dangling-branches false --min-dangling-branch-length 4 --consensus false --max-num-haplotypes-in-population 128 --error-correct-kmers false --min-pruning 2 --debug-graph-transformations false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --debug false --use-filtered-reads-for-annotations false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --capture-assembly-failure-bam false --error-correct-reads false --do-not-run-physical-phasing false --min-base-quality-score 10 --smith-waterman JAVA --use-new-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 10.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --genotyping-mode DISCOVERY --genotype-filtered-alleles false --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false 
--add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false --minimum-mapping-quality 20",Version=4.0.4.0,Date="July 4, 2018 6:38:51 PM CST">
##GVCFBlock0-1=minGQ=0(inclusive),maxGQ=1(exclusive)
##GVCFBlock1-2=minGQ=1(inclusive),maxGQ=2(exclusive)
##GVCFBlock10-11=minGQ=10(inclusive),maxGQ=11(exclusive)
##GVCFBlock11-12=minGQ=11(inclusive),maxGQ=12(exclusive)
##GVCFBlock12-13=minGQ=12(inclusive),maxGQ=13(exclusive)
##GVCFBlock13-14=minGQ=13(inclusive),maxGQ=14(exclusive)
##GVCFBlock14-15=minGQ=14(inclusive),maxGQ=15(exclusive)
##GVCFBlock15-16=minGQ=15(inclusive),maxGQ=16(exclusive)
##GVCFBlock16-17=minGQ=16(inclusive),maxGQ=17(exclusive)
##GVCFBlock17-18=minGQ=17(inclusive),maxGQ=18(exclusive)
##GVCFBlock18-19=minGQ=18(inclusive),maxGQ=19(exclusive)
##GVCFBlock19-20=minGQ=19(inclusive),maxGQ=20(exclusive)
##GVCFBlock2-3=minGQ=2(inclusive),maxGQ=3(exclusive)
##GVCFBlock20-21=minGQ=20(inclusive),maxGQ=21(exclusive)
##GVCFBlock21-22=minGQ=21(inclusive),maxGQ=22(exclusive)
##GVCFBlock22-23=minGQ=22(inclusive),maxGQ=23(exclusive)
##GVCFBlock23-24=minGQ=23(inclusive),maxGQ=24(exclusive)
##GVCFBlock24-25=minGQ=24(inclusive),maxGQ=25(exclusive)
##GVCFBlock25-26=minGQ=25(inclusive),maxGQ=26(exclusive)
##GVCFBlock26-27=minGQ=26(inclusive),maxGQ=27(exclusive)
##GVCFBlock27-28=minGQ=27(inclusive),maxGQ=28(exclusive)
##GVCFBlock28-29=minGQ=28(inclusive),maxGQ=29(exclusive)
##GVCFBlock29-30=minGQ=29(inclusive),maxGQ=30(exclusive)
##GVCFBlock3-4=minGQ=3(inclusive),maxGQ=4(exclusive)
##GVCFBlock30-31=minGQ=30(inclusive),maxGQ=31(exclusive)
##GVCFBlock31-32=minGQ=31(inclusive),maxGQ=32(exclusive)
##GVCFBlock32-33=minGQ=32(inclusive),maxGQ=33(exclusive)
##GVCFBlock33-34=minGQ=33(inclusive),maxGQ=34(exclusive)
##GVCFBlock34-35=minGQ=34(inclusive),maxGQ=35(exclusive)
##GVCFBlock35-36=minGQ=35(inclusive),maxGQ=36(exclusive)
##GVCFBlock36-37=minGQ=36(inclusive),maxGQ=37(exclusive)
##GVCFBlock37-38=minGQ=37(inclusive),maxGQ=38(exclusive)
##GVCFBlock38-39=minGQ=38(inclusive),maxGQ=39(exclusive)
##GVCFBlock39-40=minGQ=39(inclusive),maxGQ=40(exclusive)
##GVCFBlock4-5=minGQ=4(inclusive),maxGQ=5(exclusive)
##GVCFBlock40-41=minGQ=40(inclusive),maxGQ=41(exclusive)
##GVCFBlock41-42=minGQ=41(inclusive),maxGQ=42(exclusive)
##GVCFBlock42-43=minGQ=42(inclusive),maxGQ=43(exclusive)
##GVCFBlock43-44=minGQ=43(inclusive),maxGQ=44(exclusive)
##GVCFBlock44-45=minGQ=44(inclusive),maxGQ=45(exclusive)
##GVCFBlock45-46=minGQ=45(inclusive),maxGQ=46(exclusive)
##GVCFBlock46-47=minGQ=46(inclusive),maxGQ=47(exclusive)
##GVCFBlock47-48=minGQ=47(inclusive),maxGQ=48(exclusive)
##GVCFBlock48-49=minGQ=48(inclusive),maxGQ=49(exclusive)
##GVCFBlock49-50=minGQ=49(inclusive),maxGQ=50(exclusive)
##GVCFBlock5-6=minGQ=5(inclusive),maxGQ=6(exclusive)
##GVCFBlock50-51=minGQ=50(inclusive),maxGQ=51(exclusive)
##GVCFBlock51-52=minGQ=51(inclusive),maxGQ=52(exclusive)
##GVCFBlock52-53=minGQ=52(inclusive),maxGQ=53(exclusive)
##GVCFBlock53-54=minGQ=53(inclusive),maxGQ=54(exclusive)
##GVCFBlock54-55=minGQ=54(inclusive),maxGQ=55(exclusive)
##GVCFBlock55-56=minGQ=55(inclusive),maxGQ=56(exclusive)
##GVCFBlock56-57=minGQ=56(inclusive),maxGQ=57(exclusive)
##GVCFBlock57-58=minGQ=57(inclusive),maxGQ=58(exclusive)
##GVCFBlock58-59=minGQ=58(inclusive),maxGQ=59(exclusive)
##GVCFBlock59-60=minGQ=59(inclusive),maxGQ=60(exclusive)
##GVCFBlock6-7=minGQ=6(inclusive),maxGQ=7(exclusive)
##GVCFBlock60-70=minGQ=60(inclusive),maxGQ=70(exclusive)
##GVCFBlock7-8=minGQ=7(inclusive),maxGQ=8(exclusive)
##GVCFBlock70-80=minGQ=70(inclusive),maxGQ=80(exclusive)
##GVCFBlock8-9=minGQ=8(inclusive),maxGQ=9(exclusive)
##GVCFBlock80-90=minGQ=80(inclusive),maxGQ=90(exclusive)
##GVCFBlock9-10=minGQ=9(inclusive),maxGQ=10(exclusive)
##GVCFBlock90-99=minGQ=90(inclusive),maxGQ=99(exclusive)
##GVCFBlock99-100=minGQ=99(inclusive),maxGQ=100(exclusive)
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##contig=<ID=pseudochrom_23,length=13860564>
##source=HaplotypeCaller
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  CL100020307_L01_17
pseudochrom_23  1   .   A   <NON_REF>   .   .   END=13860564    GT:DP:GQ:MIN_DP:PL  0/0:0:0:0:0,0,0

I don't know whether it is reasonable to assume that there must be some variation, but the dbSNP VCF file contains 11733 variants. Even if there is no variation, I would expect HaplotypeCaller to output records for all positions, like the one at position 1. But there is nothing.
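One thing the log above does report is the read-filter summary, and the command line in the gVCF header sets --minimum-mapping-quality 20. As a rough sketch (Python; the helper name is hypothetical and the counts are copied by hand from the log), the individual filter lines can be tallied to see which filter removed most reads. The log does not state how many reads the BAM holds in total, so this cannot by itself show that nothing was left for calling, but it may help explain the single DP=0 block.

```python
import re

# Single-filter lines copied from the read-filter summary in the log above.
FILTER_SUMMARY = """\
46853127 read(s) filtered by: MappingQualityReadFilter
523202 read(s) filtered by: NotSecondaryAlignmentReadFilter
8492730 read(s) filtered by: NotDuplicateReadFilter
"""

# Total from the outermost "read(s) filtered by" line of the same summary.
TOTAL_FILTERED = 55869059


def leaf_filter_counts(summary):
    """Map each single-filter name to its read count (hypothetical helper)."""
    pattern = re.compile(r"^\s*(\d+) read\(s\) filtered by: (\w+)\s*$")
    counts = {}
    for line in summary.splitlines():
        match = pattern.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


counts = leaf_filter_counts(FILTER_SUMMARY)

# Print the filters from largest to smallest share of the filtered reads.
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {n} ({100 * n / TOTAL_FILTERED:.1f}% of filtered reads)")

# Check that the three individual filters account for the reported total.
print("leaf counts sum to total:", sum(counts.values()) == TOTAL_FILTERED)
```

If the mapping-quality filter dominates, as it appears to here, one possible follow-up is to look at the MAPQ distribution of the input BAM before rerunning.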
