Channel: haplotypecaller — GATK-Forum

GenotypeGVCFs and VariantFiltration tools

We are following the "Calling variants on cohorts of samples using the HaplotypeCaller in GVCF mode" best practices with GATK 3.8.1 and Java 1.8. We merged the raw.g.vcf files from HaplotypeCaller into one cohort.g.vcf and then carried out joint genotyping with the GenotypeGVCFs tool. We are working with a haploid model organism, so we then tried to run the VariantFiltration tool on the output (a VCF file containing the information from all of the sequences we are working with). However, this failed with the error
"Line 2176: there aren't enough columns for line 102"
Others have encountered the same problem, and I see that you responded that the GATK and Java versions were incompatible, but that was several versions ago. Is this also the case for us? Please can you tell us where to go next?
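
A "not enough columns" message often points to a truncated or malformed record in the VCF itself, so one low-cost check is to count the tab-separated columns on each data line. The sketch below is only an illustration under that assumption; the function name `find_short_lines` and the file name `cohort.vcf` mentioned in the comment are hypothetical, and it does not reproduce how GATK parses the file.

```python
import io

def find_short_lines(handle, min_columns=8):
    """Yield (line_number, column_count) for data lines with too few columns.

    The expected count is taken from the #CHROM header line when present;
    otherwise the 8 mandatory VCF columns are used as the minimum.
    """
    expected = None
    for number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            expected = len(line.split("\t"))
            continue
        count = len(line.split("\t"))
        if count < (expected or min_columns):
            yield number, count

# Tiny in-memory example; for a real file use open("cohort.vcf") instead
# (the file name is a placeholder).
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\n"
    "chr1\t10\t.\tA\tG\t50\t.\tDP=9\tGT\t1\n"
    "chr1\t11\t.\tC\n"
)
for number, count in find_short_lines(example):
    print(f"line {number}: only {count} columns")
```

If this reports short lines near the line number in the error message, the file was probably cut off or corrupted while being written, and regenerating it would be the next thing to try.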


Using NIO with GATK4 HaplotypeCaller

Is GATK4 HaplotypeCaller NIO compatible? If not, is there another version that is?

Thanks!

"UNKNOWN" zygosity in CSV file

Hi there,
I am using GATK 3. Recently I checked the CSV and BAM files for a couple who are both carriers of one pathogenic variant, but in the CSV file the zygosity of this variant is labeled "Unknown" for both of them rather than heterozygous.
I have two questions:
1- What is the main criterion for determining the "zygosity" of a variant in GATK?
2- How can I eliminate false negative (or false positive) variants in the final VCF (from GATK)?

Thank you
Mojtaba

Is UnifiedGenotyper actually better than HaplotypeCaller for this pooled sample project?

Hi, I am interested in calling variants from pooled samples. Specifically, I wish to determine SNP allele frequencies from samples that were made by pooling many individuals (1000+) together. I know that HaplotypeCaller is now recommended over UnifiedGenotyper in all cases. However, is this project an exception? I have:

  • 1000s of individuals in each pooled sample
  • only two possible alleles at every site
  • I only need to call SNPs
  • I can generate a set of known SNPs to call (does GENOTYPE_GIVEN_ALLELES work in HaplotypeCaller?)
  • I have high read coverage
  • I want to detect rare alleles as best as possible

If you still advise using HaplotypeCaller in this case, do you have any special suggestions? I'd like to maximize the -ploidy number to detect the rare alleles, but otherwise streamline the job. Thanks for any advice you can provide!
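
As a rough way to reason about read coverage and rare alleles in a pool, the sketch below estimates an allele frequency from read counts and the chance of seeing an allele at all at a given depth. It is a simplification and not part of either caller: it assumes a plain binomial sampling model with no sequencing error and equal contribution from every individual, and both function names are hypothetical.

```python
from math import comb

def pooled_allele_frequency(ref_reads, alt_reads):
    """Naive allele-frequency estimate: fraction of reads carrying the alt allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this site")
    return alt_reads / total

def prob_allele_seen(frequency, depth, min_reads=1):
    """Binomial probability of observing at least min_reads alt reads.

    Assumes each read is an independent draw from the pool and ignores
    sequencing error, so it is an optimistic upper bound in practice.
    """
    p_fewer = sum(
        comb(depth, k) * frequency**k * (1 - frequency) ** (depth - k)
        for k in range(min_reads)
    )
    return 1 - p_fewer

# Example: an allele at 0.1% frequency, 500x coverage, at least 3 supporting reads.
print(pooled_allele_frequency(990, 10))
print(prob_allele_seen(0.001, 500, min_reads=3))
```

Requiring several supporting reads (`min_reads`) is one simple way to separate rare alleles from sequencing errors, at the cost of needing more depth.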

HaplotypeCaller warnings DepthPerSampleHC

Hi, I'm trying to do a multisample variant call using several BAM files with the following command:

/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk HaplotypeCaller -R /mnt/fastdata/md1jale/reference/hs37d5.fa -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24150_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24144_2#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24712_6#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_2#1.bam -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/raw_variants.vcf

Using GATK jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar HaplotypeCaller -R /mnt/fastdata/md1jale/reference/hs37d5.fa -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24150_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24144_2#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24712_6#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_2#1.bam -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/mshef7_wt_vs_ips_raw_variants.vcf
10:26:29.719 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:26:29.935 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.935 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.1.0
10:26:29.935 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
10:26:29.935 INFO HaplotypeCaller - Executing as md1jale@sharc-node122.shef.ac.uk on Linux v3.10.0-693.11.6.el7.x86_64 amd64
10:26:29.936 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_102-b14
10:26:29.936 INFO HaplotypeCaller - Start Date/Time: 14 February 2018 10:26:29 GMT
10:26:29.936 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.936 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.936 INFO HaplotypeCaller - HTSJDK Version: 2.14.1
10:26:29.936 INFO HaplotypeCaller - Picard Version: 2.17.2
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:26:29.937 INFO HaplotypeCaller - Deflater: IntelDeflater
10:26:29.937 INFO HaplotypeCaller - Inflater: IntelInflater
10:26:29.937 INFO HaplotypeCaller - GCS max retries/reopens: 20
10:26:29.937 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
10:26:29.937 INFO HaplotypeCaller - Initializing engine
10:26:30.520 INFO HaplotypeCaller - Done initializing engine
10:26:30.528 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
10:26:31.119 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
10:26:31.154 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
10:26:31.259 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
10:26:31.259 INFO IntelPairHmm - Available threads: 16
10:26:31.259 INFO IntelPairHmm - Requested threads: 4
10:26:31.259 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
10:26:31.298 INFO ProgressMeter - Starting traversal
10:26:31.298 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
10:26:33.832 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.865 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.880 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.911 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:34.733 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:41.497 INFO ProgressMeter - 1:15485 0.2 80 470.6

Despite some memory issues when running the above, the command now runs when given a large amount of memory, although I do get lots of WARN DepthPerSampleHC messages. Is this normal?
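
To see how many warnings each component emits over a whole run, the log can be tallied with a short script. This is a minimal sketch that assumes the log lines keep the `HH:MM:SS.mmm LEVEL Source - message` layout shown above; the function name `count_warnings` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "10:26:33.832 WARN DepthPerSampleHC - Annotation ..."
LOG_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3}\s+(INFO|WARN|ERROR)\s+(\S+)\s+-\s")

def count_warnings(lines):
    """Return a Counter of WARN messages keyed by the emitting component."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(1) == "WARN":
            counts[match.group(2)] += 1
    return counts

# Three lines copied from the log above.
sample_log = [
    "10:26:31.259 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM",
    "10:26:31.259 INFO IntelPairHmm - Available threads: 16",
    "10:26:33.832 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null",
]
for source, n in count_warnings(sample_log).items():
    print(source, n)
```

For a saved log, passing an open file handle instead of the list works the same way.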

HaplotypeCaller gives error and generates VCF file with no variant calls

Dear GATK Team,
I'm using GATK and Picard to call short variants from paired-end FASTQ reads of a Plasmodium genome. I ran HaplotypeCaller after marking duplicates with Picard MarkDuplicates.
HaplotypeCaller reports an error during variant calling.
Kindly help resolve this issue.

Below is the log of the GATK HaplotypeCaller step:

Using GATK jar /home/ubuntu/gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /home/ubuntu/gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar HaplotypeCaller -R Pf_ref/pf_3D7_38_Genome.fasta -I ./bam_output/Day0_IJD_252_dedup.bam -O ./variant_output/Day0_IJD_252_raw_variants.g.vcf
22:44:31.686 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/ubuntu/gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
22:44:32.283 INFO HaplotypeCaller - ------------------------------------------------------------
22:44:32.283 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.5.1
22:44:32.284 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
22:44:32.285 INFO HaplotypeCaller - Executing as ubuntu@mrcclimbserver.vms.swansea.climb.ac.uk on Linux v4.4.0-127-generic amd64
22:44:32.286 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v9.0.1+11
22:44:32.286 INFO HaplotypeCaller - Start Date/Time: June 27, 2018 at 10:44:31 PM UTC
22:44:32.286 INFO HaplotypeCaller - ------------------------------------------------------------
22:44:32.287 INFO HaplotypeCaller - ------------------------------------------------------------
22:44:32.289 INFO HaplotypeCaller - HTSJDK Version: 2.15.1
22:44:32.289 INFO HaplotypeCaller - Picard Version: 2.18.2
22:44:32.290 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
22:44:32.290 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
22:44:32.290 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
22:44:32.290 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
22:44:32.291 INFO HaplotypeCaller - Deflater: IntelDeflater
22:44:32.291 INFO HaplotypeCaller - Inflater: IntelInflater
22:44:32.291 INFO HaplotypeCaller - GCS max retries/reopens: 20
22:44:32.291 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
22:44:32.291 INFO HaplotypeCaller - Initializing engine
22:44:32.813 INFO HaplotypeCaller - Done initializing engine
22:44:32.828 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
22:44:32.849 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/ubuntu/gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
22:44:32.856 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/ubuntu/gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
22:44:32.968 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
22:44:32.969 INFO IntelPairHmm - Available threads: 32
22:44:32.969 INFO IntelPairHmm - Requested threads: 4
22:44:32.969 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
22:44:33.047 INFO ProgressMeter - Starting traversal
22:44:33.048 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
22:44:33.065 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
22:44:33.066 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
22:44:33.070 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
22:44:33.070 INFO HaplotypeCaller - Shutting down engine
[June 27, 2018 at 10:44:33 PM UTC] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2147483648
Exception in thread "main" java.lang.IncompatibleClassChangeError: Inconsistent constant pool data in classfile for class org/broadinstitute/hellbender/transformers/ReadTransformer. Method lambda$identity$d67512bf$1(Lorg/broadinstitute/hellbender/utils/read/GATKRead;)Lorg/broadinstitute/hellbender/utils/read/GATKRead; at index 65 is CONSTANT_MethodRef and should be CONSTANT_InterfaceMethodRef
at org.broadinstitute.hellbender.transformers.ReadTransformer.identity(ReadTransformer.java:30)
at org.broadinstitute.hellbender.engine.GATKTool.makePreReadFilterTransformer(GATKTool.java:288)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:266)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:994)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

regards
Archie
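
The log above reports a Java 9.0.1 runtime, whereas the GATK 4.0.x releases were documented as requiring Java 8, so a runtime mismatch is a plausible (not confirmed) cause of the `IncompatibleClassChangeError`. The sketch below pulls the major Java version out of a GATK log line so it can be checked before a long run; the function name `java_major_version` is hypothetical, and the parsing assumes the "Java runtime: ... vX" layout shown in the logs here.

```python
import re

JAVA_RUNTIME = re.compile(r"Java runtime: .* v(\S+)")

def java_major_version(log_line):
    """Return the major Java version from a GATK 'Java runtime' log line, or None."""
    match = JAVA_RUNTIME.search(log_line)
    if match is None:
        return None
    parts = re.split(r"[._+-]", match.group(1))
    # Legacy scheme: "1.8.0_102-b14" means Java 8; newer scheme: "9.0.1+11" means Java 9.
    if parts[0] == "1":
        return int(parts[1])
    return int(parts[0])

line = "22:44:32.286 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v9.0.1+11"
major = java_major_version(line)
if major is not None and major != 8:
    print(f"Java {major} detected; GATK 4.0.x was documented for Java 8")
```

If the check flags a mismatch, rerunning the same command under a Java 8 runtime is a reasonable first test.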

All annotations in BP_RESOLUTION mode

Hello,

I was wondering if there is a way to output all annotations for all sites when running HaplotypeCaller with BP_RESOLUTION. Currently it outputs the full set of annotations only for called variants. Thanks in advance.

Calling invariant sites with the new pipeline of HaplotypeCaller

Hello,

I am using the new HaplotypeCaller pipeline in order to obtain a VCF file containing both variant and invariant sites.

For each individual, I called variant and invariant sites:

java -Xmx300g -jar GenomeAnalysisTK.jar \
     -T HaplotypeCaller \
     -R ref.fasta \
     -I ${INPUT}.bam \
     --genotyping_mode DISCOVERY \
     -stand_emit_conf 0 \
     -stand_call_conf 0 \
     -o ${INPUT}_VC.vcf \
     --emitRefConfidence BP_RESOLUTION  \
     --variant_index_type LINEAR \
     --variant_index_parameter 128000 \
     -nct 16

In the VCF that I obtain, I do indeed have every position.
The problem is that the INFO and QUAL fields are empty (.) if the site is non-variant.

KE332545.1      44      .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,503
KE332545.1      45      .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,518
KE332545.1      46      .       C       T,<NON_REF>     0       .       BaseQRankSum=-2.270;ClippingRankSum=-0.691;DP=17;MLEAC=0,0;MLEAF=0.00,0.00;MQ=38.98;MQ0=0;MQRankSum=0.099;ReadPosRankSum=0.493  GT:AD:DP:GQ:PL:SB      0/0:11,2,0:13:3:0,3,379,33,385,414:0,0,0,0
KE332545.1      47      .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,515
KE332545.1      48      .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,540
KE332545.1      49      .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,563

But I also want this information so that I can use my filtering pipeline on those invariant sites as well!
Is there any solution?

Thanks!

Muriel
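
Although QUAL and INFO are empty on the non-variant records above, the per-sample FORMAT fields (DP, GQ, PL) are still populated, so one possible workaround is to filter invariant sites on those instead. The sketch below only illustrates that idea: the function names and the thresholds (`min_dp=10`, `min_gq=20`) are hypothetical, and it handles single-sample records like the ones shown.

```python
def sample_fields(record):
    """Map FORMAT keys to the first sample's values for one VCF data line."""
    cols = record.split()
    keys = cols[8].split(":")    # FORMAT column, e.g. GT:AD:DP:GQ:PL
    values = cols[9].split(":")  # first sample column
    return dict(zip(keys, values))

def passes_depth_gq(record, min_dp=10, min_gq=20):
    """Keep a site only if the sample's DP and GQ reach the given thresholds."""
    fields = sample_fields(record)
    return int(fields["DP"]) >= min_dp and int(fields["GQ"]) >= min_gq

# Two records copied from the output above.
invariant = "KE332545.1      44      .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,503"
low_gq = ("KE332545.1      46      .       C       T,<NON_REF>     0       .       "
          "BaseQRankSum=-2.270;ClippingRankSum=-0.691;DP=17;MLEAC=0,0;MLEAF=0.00,0.00;"
          "MQ=38.98;MQ0=0;MQRankSum=0.099;ReadPosRankSum=0.493  "
          "GT:AD:DP:GQ:PL:SB      0/0:11,2,0:13:3:0,3,379,33,385,414:0,0,0,0")

print(passes_depth_gq(invariant))
print(passes_depth_gq(low_gq))
```

With the illustrative thresholds, position 44 (DP 13, GQ 39) is kept and position 46 (DP 13, GQ 3) is dropped.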


i_variant_quality_by_depth/i_genotype_quality interpretation

When interpreting the output of HaplotypeCaller, what do the i_variant_quality_by_depth and i_genotype_quality columns represent, and which of these would be a good value on which to base an assessment of confidence in the variant call and its quality? What scale are they on? Or is there a different column that would be better?
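
Assuming these two columns correspond to the standard VCF annotations QD (variant quality normalized by depth) and GQ (genotype quality), which is a guess from the column names, both derive from Phred-scaled values, where a score Q corresponds to an error probability of 10^(-Q/10). The sketch below shows that conversion and a simplified QD calculation; the function names are hypothetical, and GATK's actual QD uses the depth of the variant-carrying samples rather than a single raw depth.

```python
def phred_to_error_probability(q):
    """Convert a Phred-scaled quality (such as GQ or QUAL) to an error probability."""
    return 10 ** (-q / 10)

def quality_by_depth(qual, depth):
    """Simplified QD: variant QUAL divided by read depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return qual / depth

# A GQ of 20 corresponds to a 1-in-100 chance that the genotype is wrong;
# a GQ of 30 corresponds to 1 in 1000.
print(phred_to_error_probability(20))
print(phred_to_error_probability(30))
print(quality_by_depth(600.0, 30))
```

Normalizing by depth keeps deep-coverage sites from looking more convincing than they are, which is why QD is often preferred over raw QUAL for filtering.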

HaplotypeCaller outputs only the header and one position record, without error

I'm trying to run GATK4 HaplotypeCaller using the following command:

./gatk HaplotypeCaller -R ./reference.fasta --emit-ref-confidence GVCF --dbsnp ./samtools_gatk_common.vcf -I ./sample.bqsr.bam -O ./sample.gvcf --TMP_DIR ./tmp

The log output shows no error, but the resulting *.gvcf file contains only the header and a record for a single base. The dbsnp file is the intersection of the samtools and GATK calls.

Here is the log file:

Using GATK jar /path/to/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /path/to/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar HaplotypeCaller -R /path/to/index/chrom23.fasta --emit-ref-confidence GVCF --dbsnp /path/to/dbsnp/sample.dbsnp.vcf -I /path/to/BQSR/sample.bqsr.bam -O /path/to/result/sample.g.vcf --TMP_DIR /path/to/tmp
18:38:47.051 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/path/to/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
18:38:47.439 INFO  HaplotypeCaller - ------------------------------------------------------------
18:38:47.440 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.4.0
18:38:47.440 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
18:38:47.442 INFO  HaplotypeCaller - Executing as hankai@cngb-compute-e05-6.cngb.sz.hpc on Linux v2.6.32-696.el6.x86_64 amd64
18:38:47.442 INFO  HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_172-b11
18:38:47.442 INFO  HaplotypeCaller - Start Date/Time: July 4, 2018 6:38:46 PM CST
18:38:47.442 INFO  HaplotypeCaller - ------------------------------------------------------------
18:38:47.442 INFO  HaplotypeCaller - ------------------------------------------------------------
18:38:47.443 INFO  HaplotypeCaller - HTSJDK Version: 2.14.3
18:38:47.443 INFO  HaplotypeCaller - Picard Version: 2.18.2
18:38:47.444 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
18:38:47.444 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
18:38:47.444 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
18:38:47.444 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
18:38:47.444 INFO  HaplotypeCaller - Deflater: IntelDeflater
18:38:47.444 INFO  HaplotypeCaller - Inflater: IntelInflater
18:38:47.444 INFO  HaplotypeCaller - GCS max retries/reopens: 20
18:38:47.444 INFO  HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
18:38:47.444 INFO  HaplotypeCaller - Initializing engine
18:38:50.210 INFO  FeatureManager - Using codec VCFCodec to read file file:///path/to/dbsnp/sample.dbsnp.vcf
18:38:50.292 INFO  HaplotypeCaller - Done initializing engine
18:38:50.303 INFO  HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
18:38:50.303 INFO  HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
18:38:51.794 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/path/to/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
18:38:51.817 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/path/to/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
18:38:51.915 WARN  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
18:38:51.916 INFO  IntelPairHmm - Available threads: 112
18:38:51.916 INFO  IntelPairHmm - Requested threads: 4
18:38:51.916 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
18:38:51.996 INFO  ProgressMeter - Starting traversal
18:38:51.997 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
18:39:02.152 INFO  ProgressMeter - pseudochrom_23:39888              0.2                   240           1418.4
18:39:12.324 INFO  ProgressMeter - pseudochrom_23:112351              0.3                   650           1918.6
18:39:22.383 INFO  ProgressMeter - pseudochrom_23:166271              0.5                   980           1935.1
18:39:32.471 INFO  ProgressMeter - pseudochrom_23:208604              0.7                  1240           1838.3
18:39:42.498 INFO  ProgressMeter - pseudochrom_23:270983              0.8                  1610           1912.8
18:39:52.827 INFO  ProgressMeter - pseudochrom_23:315473              1.0                  1890           1864.2
18:40:03.130 INFO  ProgressMeter - pseudochrom_23:368748              1.2                  2220           1872.5
18:40:13.602 INFO  ProgressMeter - pseudochrom_23:430805              1.4                  2590           1905.3
18:40:23.620 INFO  ProgressMeter - pseudochrom_23:512763              1.5                  3060           2003.9
18:40:33.781 INFO  ProgressMeter - pseudochrom_23:592148              1.7                  3540           2086.8
18:40:46.199 INFO  ProgressMeter - pseudochrom_23:661025              1.9                  3950           2075.3
18:40:56.336 INFO  ProgressMeter - pseudochrom_23:731629              2.1                  4380           2113.6
18:41:09.819 INFO  ProgressMeter - pseudochrom_23:835707              2.3                  5000           2176.7
18:41:19.874 INFO  ProgressMeter - pseudochrom_23:941548              2.5                  5630           2284.3
18:41:30.479 INFO  ProgressMeter - pseudochrom_23:1044902              2.6                  6230           2358.6
18:41:40.552 INFO  ProgressMeter - pseudochrom_23:1157010              2.8                  6910           2459.7
18:41:50.606 INFO  ProgressMeter - pseudochrom_23:1222918              3.0                  7310           2455.6
18:42:00.695 INFO  ProgressMeter - pseudochrom_23:1305523              3.1                  7790           2477.0
18:42:10.765 INFO  ProgressMeter - pseudochrom_23:1457789              3.3                  8680           2620.1
18:42:20.899 INFO  ProgressMeter - pseudochrom_23:1636208              3.5                  9750           2800.4
18:42:30.922 INFO  ProgressMeter - pseudochrom_23:1780023              3.6                 10640           2916.1
18:42:40.981 INFO  ProgressMeter - pseudochrom_23:1955789              3.8                 11720           3071.0
18:42:51.075 INFO  ProgressMeter - pseudochrom_23:2108472              4.0                 12660           3177.2
18:43:01.113 INFO  ProgressMeter - pseudochrom_23:2286350              4.2                 13710           3302.1
18:43:11.157 INFO  ProgressMeter - pseudochrom_23:2484540              4.3                 14930           3456.6
18:43:21.167 INFO  ProgressMeter - pseudochrom_23:2607582              4.5                 15660           3490.7
18:43:31.253 INFO  ProgressMeter - pseudochrom_23:2779264              4.7                 16750           3598.9
18:43:41.256 INFO  ProgressMeter - pseudochrom_23:2958401              4.8                 17840           3700.5
18:43:51.431 INFO  ProgressMeter - pseudochrom_23:3091735              5.0                 18670           3741.1
18:44:01.489 INFO  ProgressMeter - pseudochrom_23:3256919              5.2                 19650           3809.5
18:44:11.888 INFO  ProgressMeter - pseudochrom_23:3395538              5.3                 20500           3845.1
18:44:22.047 INFO  ProgressMeter - pseudochrom_23:3496925              5.5                 21130           3841.2
18:44:32.048 INFO  ProgressMeter - pseudochrom_23:3647997              5.7                 22050           3890.6
18:44:42.058 INFO  ProgressMeter - pseudochrom_23:3770277              5.8                 22830           3913.0
18:44:52.224 INFO  ProgressMeter - pseudochrom_23:3855394              6.0                 23350           3889.2
18:45:02.305 INFO  ProgressMeter - pseudochrom_23:3961378              6.2                 24000           3888.7
18:45:12.396 INFO  ProgressMeter - pseudochrom_23:4077288              6.3                 24700           3895.9
18:45:22.481 INFO  ProgressMeter - pseudochrom_23:4209807              6.5                 25510           3919.8
18:45:32.603 INFO  ProgressMeter - pseudochrom_23:4301812              6.7                 26100           3909.1
18:45:42.779 INFO  ProgressMeter - pseudochrom_23:4400034              6.8                 26720           3902.8
18:45:53.263 INFO  ProgressMeter - pseudochrom_23:4475456              7.0                 27180           3871.2
18:46:04.692 INFO  ProgressMeter - pseudochrom_23:4607856              7.2                 28000           3882.6
18:46:14.837 INFO  ProgressMeter - pseudochrom_23:4739532              7.4                 28790           3900.7
18:46:26.963 INFO  ProgressMeter - pseudochrom_23:4805956              7.6                 29230           3854.8
18:46:37.150 INFO  ProgressMeter - pseudochrom_23:4932551              7.8                 30010           3871.0
18:46:47.557 INFO  ProgressMeter - pseudochrom_23:5051360              7.9                 30750           3879.6
18:46:57.575 INFO  ProgressMeter - pseudochrom_23:5156893              8.1                 31410           3881.1
18:47:07.589 INFO  ProgressMeter - pseudochrom_23:5256960              8.3                 32020           3876.6
18:47:17.844 INFO  ProgressMeter - pseudochrom_23:5339306              8.4                 32520           3857.3
18:47:28.069 INFO  ProgressMeter - pseudochrom_23:5447309              8.6                 33170           3856.4
18:47:38.135 INFO  ProgressMeter - pseudochrom_23:5562641              8.8                 33870           3862.5
18:47:48.259 INFO  ProgressMeter - pseudochrom_23:5648642              8.9                 34390           3847.8
18:47:58.434 INFO  ProgressMeter - pseudochrom_23:5750249              9.1                 35010           3844.2
18:48:09.065 INFO  ProgressMeter - pseudochrom_23:5853949              9.3                 35650           3839.7
18:48:19.112 INFO  ProgressMeter - pseudochrom_23:5955110              9.5                 36280           3838.4
18:48:29.206 INFO  ProgressMeter - pseudochrom_23:6051364              9.6                 36860           3831.5
18:48:39.584 INFO  ProgressMeter - pseudochrom_23:6140606              9.8                 37400           3819.0
18:48:49.694 INFO  ProgressMeter - pseudochrom_23:6228203             10.0                 37930           3807.6
18:48:59.742 INFO  ProgressMeter - pseudochrom_23:6327447             10.1                 38550           3805.9
18:49:10.118 INFO  ProgressMeter - pseudochrom_23:6412023             10.3                 39070           3792.5
18:49:20.131 INFO  ProgressMeter - pseudochrom_23:6528580             10.5                 39780           3799.8
18:49:30.488 INFO  ProgressMeter - pseudochrom_23:6664489             10.6                 40640           3819.0
18:49:41.323 INFO  ProgressMeter - pseudochrom_23:6776006             10.8                 41330           3819.0
18:49:51.947 INFO  ProgressMeter - pseudochrom_23:6871397             11.0                 41910           3810.3
18:50:02.348 INFO  ProgressMeter - pseudochrom_23:6965003             11.2                 42470           3801.3
18:50:12.656 INFO  ProgressMeter - pseudochrom_23:7064647             11.3                 43070           3796.6
18:50:22.681 INFO  ProgressMeter - pseudochrom_23:7129699             11.5                 43450           3774.5
18:50:32.723 INFO  ProgressMeter - pseudochrom_23:7217180             11.7                 43990           3766.7
18:50:42.805 INFO  ProgressMeter - pseudochrom_23:7334195             11.8                 44720           3774.9
18:50:52.874 INFO  ProgressMeter - pseudochrom_23:7470037             12.0                 45560           3792.0
18:51:03.070 INFO  ProgressMeter - pseudochrom_23:7580430             12.2                 46240           3795.0
18:51:13.109 INFO  ProgressMeter - pseudochrom_23:7703064             12.4                 46990           3804.3
18:51:23.274 INFO  ProgressMeter - pseudochrom_23:7839176             12.5                 47810           3818.3
18:51:33.338 INFO  ProgressMeter - pseudochrom_23:7960865             12.7                 48540           3825.4
18:51:43.392 INFO  ProgressMeter - pseudochrom_23:8028264             12.9                 48960           3808.2
18:51:53.463 INFO  ProgressMeter - pseudochrom_23:8151834             13.0                 49710           3816.7
18:52:03.665 INFO  ProgressMeter - pseudochrom_23:8270942             13.2                 50430           3822.1
18:52:13.727 INFO  ProgressMeter - pseudochrom_23:8359715             13.4                 50970           3814.5
18:52:23.905 INFO  ProgressMeter - pseudochrom_23:8477290             13.5                 51650           3816.9
18:52:33.954 INFO  ProgressMeter - pseudochrom_23:8594099             13.7                 52380           3823.6
18:52:44.110 INFO  ProgressMeter - pseudochrom_23:8710379             13.9                 53100           3828.8
18:52:54.114 INFO  ProgressMeter - pseudochrom_23:8848199             14.0                 53970           3845.3
18:53:04.680 INFO  ProgressMeter - pseudochrom_23:8983340             14.2                 54800           3856.1
18:53:15.384 INFO  ProgressMeter - pseudochrom_23:9068836             14.4                 55310           3843.7
18:53:25.473 INFO  ProgressMeter - pseudochrom_23:9222012             14.6                 56240           3863.2
18:53:35.477 INFO  ProgressMeter - pseudochrom_23:9305881             14.7                 56750           3854.1
18:53:45.512 INFO  ProgressMeter - pseudochrom_23:9431585             14.9                 57500           3861.2
18:53:55.687 INFO  ProgressMeter - pseudochrom_23:9550933             15.1                 58210           3864.8
18:54:05.702 INFO  ProgressMeter - pseudochrom_23:9694239             15.2                 59090           3880.3
18:54:15.903 INFO  ProgressMeter - pseudochrom_23:9779200             15.4                 59620           3871.8
18:54:25.917 INFO  ProgressMeter - pseudochrom_23:9884556             15.6                 60260           3871.4
18:54:36.002 INFO  ProgressMeter - pseudochrom_23:9991326             15.7                 60900           3870.7
18:54:46.010 INFO  ProgressMeter - pseudochrom_23:10127422             15.9                 61710           3881.1
18:54:56.072 INFO  ProgressMeter - pseudochrom_23:10247506             16.1                 62430           3885.4
18:55:06.287 INFO  ProgressMeter - pseudochrom_23:10372627             16.2                 63210           3892.7
18:55:16.338 INFO  ProgressMeter - pseudochrom_23:10508632             16.4                 64040           3903.5
18:55:26.423 INFO  ProgressMeter - pseudochrom_23:10605673             16.6                 64630           3899.5
18:55:36.484 INFO  ProgressMeter - pseudochrom_23:10680890             16.7                 65090           3888.0
18:55:46.555 INFO  ProgressMeter - pseudochrom_23:10755549             16.9                 65530           3875.4
18:55:56.618 INFO  ProgressMeter - pseudochrom_23:10860581             17.1                 66160           3874.2
18:56:06.724 INFO  ProgressMeter - pseudochrom_23:10958345             17.2                 66750           3870.6
18:56:16.801 INFO  ProgressMeter - pseudochrom_23:11078670             17.4                 67480           3875.2
18:56:26.824 INFO  ProgressMeter - pseudochrom_23:11172750             17.6                 68070           3871.9
18:56:36.886 INFO  ProgressMeter - pseudochrom_23:11297520             17.7                 68800           3876.5
18:56:46.910 INFO  ProgressMeter - pseudochrom_23:11394420             17.9                 69390           3873.2
18:56:56.924 INFO  ProgressMeter - pseudochrom_23:11466077             18.1                 69840           3862.4
18:57:06.975 INFO  ProgressMeter - pseudochrom_23:11575994             18.2                 70500           3863.1
18:57:17.094 INFO  ProgressMeter - pseudochrom_23:11713112             18.4                 71340           3873.3
18:57:27.171 INFO  ProgressMeter - pseudochrom_23:11835109             18.6                 72080           3878.1
18:57:37.329 INFO  ProgressMeter - pseudochrom_23:11907584             18.8                 72540           3867.7
18:57:47.364 INFO  ProgressMeter - pseudochrom_23:12031631             18.9                 73340           3875.8
18:57:57.451 INFO  ProgressMeter - pseudochrom_23:12122040             19.1                 73890           3870.4
18:58:07.495 INFO  ProgressMeter - pseudochrom_23:12238860             19.3                 74590           3873.1
18:58:17.565 INFO  ProgressMeter - pseudochrom_23:12364885             19.4                 75350           3878.8
18:58:27.731 INFO  ProgressMeter - pseudochrom_23:12451270             19.6                 75890           3872.8
18:58:38.320 INFO  ProgressMeter - pseudochrom_23:12537057             19.8                 76410           3864.5
18:58:48.414 INFO  ProgressMeter - pseudochrom_23:12580452             19.9                 76650           3844.0
18:58:59.346 INFO  ProgressMeter - pseudochrom_23:12630247             20.1                 76930           3823.1
18:59:10.085 INFO  ProgressMeter - pseudochrom_23:12746384             20.3                 77510           3818.0
18:59:20.474 INFO  ProgressMeter - pseudochrom_23:12814970             20.5                 77930           3806.2
18:59:30.683 INFO  ProgressMeter - pseudochrom_23:12833522             20.6                 78040           3780.1
18:59:41.531 INFO  ProgressMeter - pseudochrom_23:12867911             20.8                 78220           3756.0
18:59:51.979 INFO  ProgressMeter - pseudochrom_23:12898083             21.0                 78380           3732.4
19:00:02.811 INFO  ProgressMeter - pseudochrom_23:12912010             21.2                 78460           3704.4
19:00:12.854 INFO  ProgressMeter - pseudochrom_23:12954239             21.3                 78720           3687.5
19:00:23.618 INFO  ProgressMeter - pseudochrom_23:13045215             21.5                 79170           3677.7
19:00:33.765 INFO  ProgressMeter - pseudochrom_23:13113654             21.7                 79520           3665.2
19:00:46.176 INFO  ProgressMeter - pseudochrom_23:13230637             21.9                 80100           3657.0
19:00:57.561 INFO  ProgressMeter - pseudochrom_23:13254119             22.1                 80230           3631.5
19:01:11.951 INFO  ProgressMeter - pseudochrom_23:13277140             22.3                 80370           3598.8
19:01:23.954 INFO  ProgressMeter - pseudochrom_23:13291793             22.5                 80450           3570.4
19:01:34.143 INFO  ProgressMeter - pseudochrom_23:13313750             22.7                 80580           3549.4
19:01:44.470 INFO  ProgressMeter - pseudochrom_23:13410560             22.9                 81090           3545.0
19:01:54.793 INFO  ProgressMeter - pseudochrom_23:13469784             23.0                 81440           3533.7
19:02:05.477 INFO  ProgressMeter - pseudochrom_23:13499022             23.2                 81590           3513.1
19:02:15.584 INFO  ProgressMeter - pseudochrom_23:13574066             23.4                 81950           3503.2
19:02:27.238 INFO  ProgressMeter - pseudochrom_23:13603519             23.6                 82110           3481.1
19:02:37.410 INFO  ProgressMeter - pseudochrom_23:13625698             23.8                 82240           3461.7
19:02:48.228 INFO  ProgressMeter - pseudochrom_23:13691826             23.9                 82570           3449.4
19:02:59.032 INFO  ProgressMeter - pseudochrom_23:13757035             24.1                 82950           3439.4
19:03:09.114 INFO  ProgressMeter - pseudochrom_23:13779661             24.3                 83100           3421.8
19:03:19.416 INFO  ProgressMeter - pseudochrom_23:13820635             24.5                 83330           3407.2
19:03:29.183 INFO  HaplotypeCaller - 55869059 read(s) filtered by: ((((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)
  55869059 read(s) filtered by: (((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND GoodCigarReadFilter)
      55869059 read(s) filtered by: ((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter)
          55869059 read(s) filtered by: (((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter)
              55869059 read(s) filtered by: ((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter)
                  47376329 read(s) filtered by: (((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter)
                      46853127 read(s) filtered by: ((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter)
                          46853127 read(s) filtered by: (MappingQualityReadFilter AND MappingQualityAvailableReadFilter)
                              46853127 read(s) filtered by: MappingQualityReadFilter 
                      523202 read(s) filtered by: NotSecondaryAlignmentReadFilter 
                  8492730 read(s) filtered by: NotDuplicateReadFilter 

19:03:29.184 INFO  ProgressMeter - pseudochrom_23:13859898             24.6                 83586           3395.1
19:03:29.184 INFO  ProgressMeter - Traversal complete. Processed 83586 total regions in 24.6 minutes.
19:03:30.381 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
19:03:30.381 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
19:03:30.381 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
19:03:30.381 INFO  HaplotypeCaller - Shutting down engine
[July 4, 2018 7:03:30 PM CST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 24.73 minutes.
Runtime.totalMemory()=372873625

and the result *.gvcf:

##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller  --dbsnp /path/to/dbsnp/sample.dbsnp.vcf --emit-ref-confidence GVCF --output /path/to/result/sample.g.vcf --input /path/to/BQSR/sample.bqsr.bam --reference /path/to/index/chrom23.fasta --TMP_DIR /path/to/tmp  --annotation-group StandardAnnotation --annotation-group StandardHCAnnotation --disable-tool-default-annotations false --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --indel-size-to-eliminate-in-ref-model 10 --use-alleles-trigger false --disable-optimizations false --just-determine-active-regions false --dont-genotype false --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false 
--allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --recover-dangling-heads false --do-not-recover-dangling-branches false --min-dangling-branch-length 4 --consensus false --max-num-haplotypes-in-population 128 --error-correct-kmers false --min-pruning 2 --debug-graph-transformations false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --debug false --use-filtered-reads-for-annotations false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --capture-assembly-failure-bam false --error-correct-reads false --do-not-run-physical-phasing false --min-base-quality-score 10 --smith-waterman JAVA --use-new-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 10.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --genotyping-mode DISCOVERY --genotype-filtered-alleles false --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false 
--add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false --minimum-mapping-quality 20",Version=4.0.4.0,Date="July 4, 2018 6:38:51 PM CST">
##GVCFBlock0-1=minGQ=0(inclusive),maxGQ=1(exclusive)
##GVCFBlock1-2=minGQ=1(inclusive),maxGQ=2(exclusive)
##GVCFBlock10-11=minGQ=10(inclusive),maxGQ=11(exclusive)
##GVCFBlock11-12=minGQ=11(inclusive),maxGQ=12(exclusive)
##GVCFBlock12-13=minGQ=12(inclusive),maxGQ=13(exclusive)
##GVCFBlock13-14=minGQ=13(inclusive),maxGQ=14(exclusive)
##GVCFBlock14-15=minGQ=14(inclusive),maxGQ=15(exclusive)
##GVCFBlock15-16=minGQ=15(inclusive),maxGQ=16(exclusive)
##GVCFBlock16-17=minGQ=16(inclusive),maxGQ=17(exclusive)
##GVCFBlock17-18=minGQ=17(inclusive),maxGQ=18(exclusive)
##GVCFBlock18-19=minGQ=18(inclusive),maxGQ=19(exclusive)
##GVCFBlock19-20=minGQ=19(inclusive),maxGQ=20(exclusive)
##GVCFBlock2-3=minGQ=2(inclusive),maxGQ=3(exclusive)
##GVCFBlock20-21=minGQ=20(inclusive),maxGQ=21(exclusive)
##GVCFBlock21-22=minGQ=21(inclusive),maxGQ=22(exclusive)
##GVCFBlock22-23=minGQ=22(inclusive),maxGQ=23(exclusive)
##GVCFBlock23-24=minGQ=23(inclusive),maxGQ=24(exclusive)
##GVCFBlock24-25=minGQ=24(inclusive),maxGQ=25(exclusive)
##GVCFBlock25-26=minGQ=25(inclusive),maxGQ=26(exclusive)
##GVCFBlock26-27=minGQ=26(inclusive),maxGQ=27(exclusive)
##GVCFBlock27-28=minGQ=27(inclusive),maxGQ=28(exclusive)
##GVCFBlock28-29=minGQ=28(inclusive),maxGQ=29(exclusive)
##GVCFBlock29-30=minGQ=29(inclusive),maxGQ=30(exclusive)
##GVCFBlock3-4=minGQ=3(inclusive),maxGQ=4(exclusive)
##GVCFBlock30-31=minGQ=30(inclusive),maxGQ=31(exclusive)
##GVCFBlock31-32=minGQ=31(inclusive),maxGQ=32(exclusive)
##GVCFBlock32-33=minGQ=32(inclusive),maxGQ=33(exclusive)
##GVCFBlock33-34=minGQ=33(inclusive),maxGQ=34(exclusive)
##GVCFBlock34-35=minGQ=34(inclusive),maxGQ=35(exclusive)
##GVCFBlock35-36=minGQ=35(inclusive),maxGQ=36(exclusive)
##GVCFBlock36-37=minGQ=36(inclusive),maxGQ=37(exclusive)
##GVCFBlock37-38=minGQ=37(inclusive),maxGQ=38(exclusive)
##GVCFBlock38-39=minGQ=38(inclusive),maxGQ=39(exclusive)
##GVCFBlock39-40=minGQ=39(inclusive),maxGQ=40(exclusive)
##GVCFBlock4-5=minGQ=4(inclusive),maxGQ=5(exclusive)
##GVCFBlock40-41=minGQ=40(inclusive),maxGQ=41(exclusive)
##GVCFBlock41-42=minGQ=41(inclusive),maxGQ=42(exclusive)
##GVCFBlock42-43=minGQ=42(inclusive),maxGQ=43(exclusive)
##GVCFBlock43-44=minGQ=43(inclusive),maxGQ=44(exclusive)
##GVCFBlock44-45=minGQ=44(inclusive),maxGQ=45(exclusive)
##GVCFBlock45-46=minGQ=45(inclusive),maxGQ=46(exclusive)
##GVCFBlock46-47=minGQ=46(inclusive),maxGQ=47(exclusive)
##GVCFBlock47-48=minGQ=47(inclusive),maxGQ=48(exclusive)
##GVCFBlock48-49=minGQ=48(inclusive),maxGQ=49(exclusive)
##GVCFBlock49-50=minGQ=49(inclusive),maxGQ=50(exclusive)
##GVCFBlock5-6=minGQ=5(inclusive),maxGQ=6(exclusive)
##GVCFBlock50-51=minGQ=50(inclusive),maxGQ=51(exclusive)
##GVCFBlock51-52=minGQ=51(inclusive),maxGQ=52(exclusive)
##GVCFBlock52-53=minGQ=52(inclusive),maxGQ=53(exclusive)
##GVCFBlock53-54=minGQ=53(inclusive),maxGQ=54(exclusive)
##GVCFBlock54-55=minGQ=54(inclusive),maxGQ=55(exclusive)
##GVCFBlock55-56=minGQ=55(inclusive),maxGQ=56(exclusive)
##GVCFBlock56-57=minGQ=56(inclusive),maxGQ=57(exclusive)
##GVCFBlock57-58=minGQ=57(inclusive),maxGQ=58(exclusive)
##GVCFBlock58-59=minGQ=58(inclusive),maxGQ=59(exclusive)
##GVCFBlock59-60=minGQ=59(inclusive),maxGQ=60(exclusive)
##GVCFBlock6-7=minGQ=6(inclusive),maxGQ=7(exclusive)
##GVCFBlock60-70=minGQ=60(inclusive),maxGQ=70(exclusive)
##GVCFBlock7-8=minGQ=7(inclusive),maxGQ=8(exclusive)
##GVCFBlock70-80=minGQ=70(inclusive),maxGQ=80(exclusive)
##GVCFBlock8-9=minGQ=8(inclusive),maxGQ=9(exclusive)
##GVCFBlock80-90=minGQ=80(inclusive),maxGQ=90(exclusive)
##GVCFBlock9-10=minGQ=9(inclusive),maxGQ=10(exclusive)
##GVCFBlock90-99=minGQ=90(inclusive),maxGQ=99(exclusive)
##GVCFBlock99-100=minGQ=99(inclusive),maxGQ=100(exclusive)
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##contig=<ID=pseudochrom_23,length=13860564>
##source=HaplotypeCaller
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  CL100020307_L01_17
pseudochrom_23  1   .   A   <NON_REF>   .   .   END=13860564    GT:DP:GQ:MIN_DP:PL  0/0:0:0:0:0,0,0

I don't know whether it's reasonable to assume that there must be some variation, given that the dbSNP VCF file contains 11733 variants. Even if there is no variation, HaplotypeCaller should output all records like the one at position 1. But there is nothing.

a question about running HaplotypeCaller with intervals

Hi,

I have a question about running HaplotypeCaller with intervals on exome-seq data.
Here is the command I used:
java -jar gatk-package-4.0.6.0-local.jar HaplotypeCaller -R /espresso/share/genomes/hg38/genome.fa -I recal_reads.bam -O variants.g.vcf -ERC GVCF -L capture.bed

However, when I ran the command, I got the following message:
17:13:14.439 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk-4.0.6.0/gatk-package-4.0.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
17:13:14.591 INFO HaplotypeCaller - ------------------------------------------------------------
17:13:14.591 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.6.0
17:13:14.591 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
17:13:14.591 INFO HaplotypeCaller - Executing as ... on Linux v2.6.32-431.29.2.el6.x86_64 amd64
17:13:14.592 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_121-b13
17:13:14.592 INFO HaplotypeCaller - Start Date/Time: July 16, 2018 5:13:14 PM EDT
17:13:14.592 INFO HaplotypeCaller - ------------------------------------------------------------
17:13:14.592 INFO HaplotypeCaller - ------------------------------------------------------------
17:13:14.592 INFO HaplotypeCaller - HTSJDK Version: 2.16.0
17:13:14.592 INFO HaplotypeCaller - Picard Version: 2.18.7
17:13:14.592 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:13:14.592 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:13:14.592 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:13:14.592 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:13:14.593 INFO HaplotypeCaller - Deflater: IntelDeflater
17:13:14.593 INFO HaplotypeCaller - Inflater: IntelInflater
17:13:14.593 INFO HaplotypeCaller - GCS max retries/reopens: 20
17:13:14.593 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
17:13:14.593 INFO HaplotypeCaller - Initializing engine
17:13:15.037 INFO FeatureManager - Using codec BEDCodec to read file file:///capture.bed
17:13:16.883 INFO IntervalArgumentCollection - Processing 64190747 bp from intervals
17:13:17.009 INFO HaplotypeCaller - Shutting down engine
[July 16, 2018 5:13:17 PM EDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=2041053184
java.lang.NullPointerException
at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:325)
at java.util.ComparableTimSort.sort(ComparableTimSort.java:202)
at java.util.Arrays.sort(Arrays.java:1312)
at java.util.Arrays.sort(Arrays.java:1506)
at java.util.ArrayList.sort(ArrayList.java:1454)
at java.util.Collections.sort(Collections.java:141)
at org.broadinstitute.hellbender.utils.IntervalUtils.sortAndMergeIntervals(IntervalUtils.java:459)
at org.broadinstitute.hellbender.utils.IntervalUtils.getIntervalsWithFlanks(IntervalUtils.java:956)
at org.broadinstitute.hellbender.utils.IntervalUtils.getIntervalsWithFlanks(IntervalUtils.java:971)
at org.broadinstitute.hellbender.engine.MultiIntervalLocalReadShard.<init>(MultiIntervalLocalReadShard.java:59)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.makeReadShards(AssemblyRegionWalker.java:195)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:175)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:133)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

I did not see any error message, but HaplotypeCaller does not seem to have run and there is no output.
I would really appreciate any help you can give.

Thank you!

Best,
Siyu

can VariantsToTable output the raw genotype call (i.e., 0/1) rather than the actual basecall (A/T)?

I'm interested in getting simple "heterozygous" or "homozygous" designations for all of the samples/SNPs in my multisample VCF file. In the past, I have been using the -GF GT option in VariantsToTable, and then annotating my basecalls in Excel as either heterozygous or homozygous. This takes forever since Excel isn't really built for big data like this. Is there a simple way to output all of the SNPs as 0/1, 0/0, 0/1, or 1/1 instead of C/A, A/A, G/T, C/C?
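As a rough illustration of the labelling described above, the sketch below reads the GT field straight from VCF data lines and turns each genotype into a "het"/"hom" style label, so no spreadsheet step is needed. It is only a sketch in plain Python: the function names `zygosity` and `zygosity_table` are made up for this example and are not part of GATK, and it assumes tab-separated VCF lines with a GT key in the FORMAT column.

```python
def zygosity(gt):
    """Label a VCF GT string such as '0/1', '1|1' or './.'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(set(alleles)) == 1:
        # All allele indices are identical: homozygous (ref if index is 0).
        return "hom_ref" if alleles[0] == "0" else "hom_alt"
    return "het"


def zygosity_table(vcf_lines):
    """Yield (chrom, pos, [label per sample]) for each VCF data line."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        gt_index = fields[8].split(":").index("GT")
        labels = [zygosity(sample.split(":")[gt_index]) for sample in fields[9:]]
        yield fields[0], int(fields[1]), labels


if __name__ == "__main__":
    # Hypothetical two-sample record, for illustration only.
    example = ["chr1\t100\t.\tC\tA\t50\t.\t.\tGT:DP\t0/1:12\t1/1:9\n"]
    for chrom, pos, labels in zygosity_table(example):
        print(chrom, pos, *labels, sep="\t")
```

Because the labels come from allele indices rather than bases, a C/A call and a G/T call both end up as "het" without any per-site lookup.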

Short read data in highly repetitive genomic region for heterozygous individuals

Hello GATK team,

This might be a very general and frequently asked question, but I appreciate your input. I am working with natural populations of plants (expected to be highly heterozygous individuals) and an enriched genomic region which contains some promoters of interest together with transposons, duplications and a lot of expected indels and SVs, including a potential paralog for one of our BACs. Unfortunately the long-read sequencing is not yet ready, so I am using the 2×75 bp data and our BAC sequences as references to test how close we can get with HaplotypeCaller to SNP and short indel calls for an association analysis. Our coverage distribution seems to be heavily biased towards areas with duplications and potential TEs, and most of the assemblers based on local assembly are thrown off by our data. I have used very strict mapping parameters to avoid this problem with misaligned reads, given that we can't discard the possibility of having hyper-variable regions.

I understand that aiming for genotype calls is dangerous given our kind of data and the lack of a genome reference, so I am aiming to include the genotype likelihoods in the association analysis. With HaplotypeCaller I get a VCF file for my population and an associated PL value. My question is basically: given our type of data, do you think the local assembly inherent to HaplotypeCaller will give us false-positive variants in the final output? Do you have any suggestions or alternative tools to get genotype likelihoods (without local assembly?) and input those into an association analysis tool?
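For readers unfamiliar with the PL value mentioned above: PL is a Phred-scaled, normalized genotype likelihood, so each entry can be mapped back to a relative likelihood with 10^(-PL/10). The sketch below does that conversion and then rescales the values to sum to 1. The rescaling amounts to assuming a flat prior over genotypes, which is an assumption of this example rather than anything a particular association tool requires, and the function name `pl_to_probabilities` is made up for illustration.

```python
def pl_to_probabilities(pl_field):
    """Convert a comma-separated PL string into values that sum to 1.

    Each PL is Phred-scaled, so the relative likelihood is 10 ** (-PL / 10).
    Dividing by the total is a flat-prior normalization (an assumption of
    this sketch, not a GATK behaviour).
    """
    pls = [int(value) for value in pl_field.split(",")]
    likelihoods = [10 ** (-pl / 10) for pl in pls]
    total = sum(likelihoods)
    return [likelihood / total for likelihood in likelihoods]


if __name__ == "__main__":
    # Hypothetical diploid, biallelic PL: order is 0/0, 0/1, 1/1.
    print(pl_to_probabilities("10,0,20"))
```

The most likely genotype always has PL = 0, so the total is at least 1 and the division is safe even when the other PL values are very large.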

I really appreciate your insight.

Best,

Distribution of RGQ scores

I work with non-human genomes and commonly need the confidence of the reference sites, so I was happy to see the inclusion of the RGQ score in the format field of GenotypeGVCFs. However, I am a little confused as to what this score means (how it is calculated). Out of curiosity I plotted the distribution of RGQ and GQ scores over ~1Mbp. A few things jumped out that I was hoping you could explain:

(1) There are two peaks in the GQ and RGQ scores: one at 99, which is obviously just the highest confidence score, and another at exactly GQ/RGQ = 45. You can see this in the GQ/RGQ distribution below, from which I've excluded the sites where RGQ/GQ = 0 or 99 (RGQ = blue, GQ = red). Is there some reason why so many genotype calls score exactly 45?

(2) There are very few GQ = 0 calls and ~96% are GQ = 99, but for RGQ ~42% are 0 and 54% are 99. Is there any explanation why so many RGQ scores are 0? I fear that filtering on RGQ will bias the data against reference calls and include a disproportionate number of variant calls.
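The kind of tally behind the percentages above can be sketched as follows. This is a minimal plain-Python example, not the script used for the plot: it counts how often each GQ and RGQ value occurs across the sample columns of VCF data lines, and the helper names `tally_quality` and `fraction_at` are hypothetical.

```python
from collections import Counter


def tally_quality(vcf_lines, keys=("GQ", "RGQ")):
    """Count occurrences of each score for the given FORMAT keys."""
    counts = {key: Counter() for key in keys}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":")
        for sample in fields[9:]:
            values = sample.split(":")
            for key in keys:
                if key not in fmt_keys:
                    continue
                idx = fmt_keys.index(key)
                # Trailing FORMAT values may be dropped or set to '.'.
                if idx < len(values) and values[idx] not in (".", ""):
                    counts[key][int(values[idx])] += 1
    return counts


def fraction_at(counter, score):
    """Fraction of all counted values that equal `score`."""
    total = sum(counter.values())
    return counter[score] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical records: one variant site (GQ) and two reference sites (RGQ).
    lines = [
        "chr1\t10\t.\tA\tT\t99\t.\t.\tGT:AD:DP:GQ:PL\t0/1:5,5:10:99:120,0,120\n",
        "chr1\t11\t.\tC\t.\t.\t.\t.\tGT:AD:DP:RGQ\t0/0:10:10:45\n",
        "chr1\t12\t.\tG\t.\t.\t.\t.\tGT:AD:DP:RGQ\t0/0:0:0:0\n",
    ]
    counts = tally_quality(lines)
    print(fraction_at(counts["RGQ"], 0), fraction_at(counts["GQ"], 99))
```

Keeping GQ and RGQ in separate counters makes it easy to compare the share of zero scores at variant sites against reference sites before deciding on a filter.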

Issue with HaplotypeCaller on a large chromosome (>536 Mb)

Hi
I tried to run HaplotypeCaller in GVCF mode. My reference genome is over 5 Gb in size. My command and the error are below:

Using GATK jar /source/gatk-4.0.6.0/gatk-package-4.0.6.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -XX:+UseSerialGC -Xmx100g -jar /source/gatk-4.0.6.0/gatk-package-4.0.6.0-local.jar HaplotypeCaller -R /data/Pseudomolecule_v3.fasta -L /IntervalFiles/0003-scattered.intervals -I WGS_FTNO.cram -O result/0003-scattered.vcf.gz -mbq 20 --native-pair-hmm-threads 4 -ERC GVCF --verbosity ERROR
[August 1, 2018 11:32:11 AM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2076049408
htsjdk.samtools.SAMException: Exception creating BAM index for slice slice: seqID 1, start 536834320, span 457789, records 259850.
at htsjdk.samtools.CRAMBAIIndexer.processSingleReferenceSlice(CRAMBAIIndexer.java:194)
at htsjdk.samtools.cram.CRAIIndex.openCraiFileAsBaiStream(CRAIIndex.java:180)
at htsjdk.samtools.SamIndexes.asBaiSeekableStreamOrNull(SamIndexes.java:78)
at htsjdk.samtools.CRAMFileReader.initWithStreams(CRAMFileReader.java:228)
at htsjdk.samtools.CRAMFileReader.<init>(CRAMFileReader.java:219)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:422)
at htsjdk.samtools.SamReaderFactory.open(SamReaderFactory.java:105)
at org.broadinstitute.hellbender.engine.ReadsDataSource.<init>(ReadsDataSource.java:227)
at org.broadinstitute.hellbender.engine.ReadsDataSource.<init>(ReadsDataSource.java:162)
at org.broadinstitute.hellbender.engine.GATKTool.initializeReads(GATKTool.java:387)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:636)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:156)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:133)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32770
at htsjdk.samtools.CRAMBAIIndexer$BAMIndexBuilder.processSingleReferenceSlice(CRAMBAIIndexer.java:354)
at htsjdk.samtools.CRAMBAIIndexer$BAMIndexBuilder.access$100(CRAMBAIIndexer.java:227)
at htsjdk.samtools.CRAMBAIIndexer.processSingleReferenceSlice(CRAMBAIIndexer.java:192)
... 17 more

Does GATK4 handle a large single chromosome? Is there any solution?
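One way to read the numbers in the trace, offered as an assumption rather than a confirmed diagnosis: the BAI index format defined in the SAM specification can only address positions up to 2^29 (536,870,912 bp) and uses 16,384 bp windows for its linear index. The sketch below checks the slice reported in the exception against that limit; the constant and function names are made up for this example.

```python
BAI_MAX_COORD = 2 ** 29      # 536,870,912: upper bound of BAI-addressable positions
LINEAR_WINDOW_BITS = 14      # BAI linear index uses 2**14 = 16,384 bp windows


def slice_end(start, span):
    """Last reference position covered by a slice of `span` bases."""
    return start + span - 1


def exceeds_bai_limit(start, span):
    """True if any part of the slice lies at or beyond the BAI limit."""
    return slice_end(start, span) >= BAI_MAX_COORD


def window_start(window_index):
    """First position covered by a given 16,384 bp linear-index window."""
    return window_index << LINEAR_WINDOW_BITS


if __name__ == "__main__":
    # Values copied from the exception message above.
    start, span = 536834320, 457789
    print("slice end:", slice_end(start, span))
    print("exceeds BAI limit:", exceeds_bai_limit(start, span))
    # Index reported by ArrayIndexOutOfBoundsException.
    print("window 32770 starts at:", window_start(32770))
```

The slice starts just below 2^29 and ends above it, and window 32770 begins past 2^29, which is consistent with the title's ">536 Mb" observation. Whether this is what the CRAM-to-BAI conversion step trips over is not something the trace alone proves.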


Mutect2 missed variant called by HaplotypeCaller

Hi,

I am running GATK 3.5.0 with Java version 1.8.0. I have two cell line samples that I paired with a Promega baseline reference (it's essentially a mixed germline sample) to run Mutect2 (which I am aware is not part of the Best Practices). I also ran the tumour sample alone using HaplotypeCaller and noticed a very clear ALK variant that was missed by Mutect2 but called by HaplotypeCaller in both samples. Due to the nature of the cell line we also expected to see an ALK variant, which is why it was detected.

What I find odd is that the local reassembly of Mutect2 seems to have discarded the variant, as the bamout does not contain the variant (C > T) at locus chr2:29443695, whereas the HaplotypeCaller call does for both samples. I have read through the documentation and the specifics of the local reassembly, and would be very interested in knowing at what stage this occurs and in your suggestions on what can be done.

I will be trying GATK v.4.0 as well as some of the things mentioned here: https://software.broadinstitute.org/gatk/documentation/article?id=1235. In the meantime I would be very grateful if someone could look into this. I will be posting updates on my new tests as well. See details below on various metrics and IGV screenshots.

The chemistry is a DNA capture Kapa hyperplus kit, 75 paired end reads.

Sample 945

  • Entire ALK covered up to 80X
  • Mean/min coverage 1013/378
  • BWA bam shows 50% allele frequency

HaplotypeCaller line Sample 945

  • chr2 29443695 . G T 8496.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=5.863;ClippingRankSum=-0.368;DP=601;ExcessHet=3.0103;FS=0.536;MLEAC=1;MLEAF=0.500;MQ=62.46;MQRankSum=1.113;QD=14.21;ReadPosRankSum=0.502;SOR=0.76 GT:AD:DP:GQ:PL 0/1:300,298:598:99:8525,0,8240

Sample 946

  • Entire ALK covered up to 80x
  • Mean/min coverage 523/204
  • BWA bam shows 49% allele frequency

HaplotypeCaller line Sample 946

  • chr2 29443695 . G T 5056.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.569;ClippingRankSum=-0.212;DP=397;ExcessHet=3.0103;FS=2.133;MLEAC=1;MLEAF=0.500;MQ=63.61;MQRankSum=-1.274;QD=13.00;ReadPosRankSum=0.063;SOR=0.595 GT:AD:DP:GQ:PL 0/1:199,190:389:99:5085,0,5319

Promega control sample

  • Same control sample used as pair for both 945 and 946 using Mutect
  • Coverage around ALK region ~200+

Please see IGV images of the various cases below. The --bamout command (run together with disabling optimizations and forcing output) was run with 500 bp of padding downstream and upstream of the target location that contains the variant (i.e. the actual padding upstream and downstream of the variant itself at locus 29443695 will be slightly more than 500 bp). I also ran Mutect with the adjusted 500 bp padding but included all the targets in chr2, without adding any padding to any targets other than the one that contains the variant.

Sample945_bwaBAM - Bam output from BWA

Sample946_bwaBAM - Bam output from BWA

Sample945_GATKForcedBamOut

Sample946_GATKForcedBamOut

Sample945_MutectForcedBamOutChr2

Sample946_MutectForcedBamOutChr2

Sample945_MutectForcedBamOutALKOnly

Sample946_MutectForcedBamOutALKOnly

Thank you very much, and I look forward to hearing your thoughts on this.
Sabri

HaplotypeCaller calls incorrect genotypes at several sites

Hi,
I found that HaplotypeCaller made heterozygous calls where there is no support for them in the BAM. We used IGV to compare the input BAM with the HaplotypeCaller output BAM, and the region shown in the figure confused us. The top track of the figure is the input BAM and the other track is the HaplotypeCaller output BAM. The HaplotypeCaller output GVCF also suggests that this site is heterozygous.
This seems to be the same issue as https://gatkforums.broadinstitute.org/gatk/discussion/2319/haplotypecaller-incorrectly-making-heterozygous-calls-again. In that thread, your suggested solution was to update GATK. However, we used GATK 3.8 and GATK 4.0.6 and got the same results.
The command line we used is:
~/software/gatk-4.0.6.0/gatk --java-options "-Xmx30G" HaplotypeCaller -L chr01:9550000-9850000 -ERC GVCF -R -I -O <output_g.vcf> -bamout


Quality of mutation by constructing haplotypes

Hi there,
I have two questions:
1) Can I construct haplotypes using GATK HaplotypeCaller?
2) How can I check the quality of a mutation using haplotypes?

Is there any way to take vcf data and output 2 fastas - one of each of the sample's alleles?

I looked at using the ReadBackedPhasing tool or HaplotypeCaller, but I already have a calling pipeline set up that works well with my data, and I'm really just looking for a way to leverage the VCFs I generate to make consensus FASTAs of each allele. The sample data is diploid. Currently I export to a FASTA with ambiguity codes and use DnaSP to generate the allele FASTAs, but I know there's got to be a good way to leverage the VCF information directly.
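To make the idea concrete, here is a minimal Python sketch of turning a reference sequence plus diploid genotype calls into one sequence per allele. It is not a GATK feature: the function name `haplotype_sequences`, the toy reference, and the toy variants are all made up for illustration. It also assumes non-overlapping variants, and the split between the two sequences is only meaningful when the genotypes are phased (`|`); for unphased calls (`/`) the assignment of alleles to sequence 1 or 2 is arbitrary.

```python
def haplotype_sequences(reference, variants):
    """Build two allele sequences from a reference string and diploid calls.

    variants: iterable of (pos, ref, alts, gt) with 1-based pos, alts as a
    list of ALT alleles, and gt as a VCF GT string such as "0|1" or "1/1".
    """
    haplotypes = [list(reference), list(reference)]
    # Apply right to left so indels do not shift positions still to be edited.
    for pos, ref, alts, gt in sorted(variants, key=lambda v: v[0], reverse=True):
        alleles = gt.replace("/", "|").split("|")
        for hap, allele in zip(haplotypes, alleles):
            if allele in (".", "0"):
                continue  # missing or reference allele: leave the sequence as is
            start = pos - 1
            if "".join(hap[start:start + len(ref)]) != ref:
                raise ValueError(f"REF mismatch at position {pos}")
            hap[start:start + len(ref)] = list(alts[int(allele) - 1])
    return ["".join(hap) for hap in haplotypes]


# Toy data (hypothetical): one phased het SNP and one homozygous insertion.
toy_reference = "ACGTACGT"
toy_variants = [
    (2, "C", ["T"], "0|1"),
    (5, "A", ["AGG"], "1|1"),
]
hap_a, hap_b = haplotype_sequences(toy_reference, toy_variants)
for index, sequence in enumerate((hap_a, hap_b), start=1):
    print(f">sample_allele{index}\n{sequence}")
```

For real data the reference would be read from the FASTA and the variants from the VCF, one contig at a time.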

adapter removal and variant calling in samples with different library prep/pre-processing

Hi,

This question is a mix of good practice and conceptual doubts. I have a cohort of approximately 100 animals from a non-model organism. 40 were sequenced at 10x depth on an Illumina 2500 machine and the remaining 60 were sequenced at 30x depth on an Illumina 4000. The samples sequenced at 30x had their adapters removed during the bcl2fastq conversion stage. Unfortunately, the samples sequenced at 10x did not have their adapters removed. FastQC analysis found adapters in those samples, but except for one or two samples, the lines did not reach the red zone.

I used BWA-MEM for alignment. Theoretically, the adapters present in the 10x samples get soft-clipped by default, as they won't match the reference genome. Hence, I did not remove the adapters from those samples. My aim is to understand the genetic variation among those samples, so I followed the germline variant discovery pipeline (SNPs + indels). The questions are:

1) HaplotypeCaller does local reassembly, throws away MAPQ information, and also uses soft-clipped bases for realignment unless '--dontUseSoftClippedBases' is used. During realignment, the adapter sequences technically won't align again, so will HaplotypeCaller call SNPs or indels from those regions?

2) Since joint genotype calling is done at a later stage, when genotype calling is done at a region where adapter is present in a 10x sample, adapter won't be found at that region in the 30x samples. Will a lower genotype quality score then be given to that particular locus with an SNP or possibly an indel? I will be filtering positions (setting them to missing) when GQ is less than 40, which may reduce wrongly assigned variants/genotypes.

3) Should I have removed the adapters before performing variant calling? I wanted to keep the pipeline the same for all samples, and because of my understanding above, I did not remove adapters from the 10x depth samples.
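For reference, the GQ filter mentioned in question 2 could look roughly like the Python sketch below, which sets a genotype to missing when its GQ is below 40. This is an illustration under stated assumptions, not a GATK command: the function name `mask_low_gq` and the example sample strings are hypothetical, and treating a missing GQ as failing the filter is a choice made here, not something the pipeline prescribes.

```python
def mask_low_gq(format_keys, sample_values, min_gq=40):
    """Return the sample column with GT set to missing when GQ < min_gq.

    A missing GQ (".") is treated as failing the filter (an assumption).
    All other FORMAT fields are left unchanged.
    """
    keys = format_keys.split(":")
    values = sample_values.split(":")
    if "GT" not in keys:
        return sample_values  # nothing to mask
    gq = dict(zip(keys, values)).get("GQ", ".")
    if gq != "." and int(gq) >= min_gq:
        return sample_values  # genotype passes, keep as is
    gt_index = keys.index("GT")
    genotype = values[gt_index]
    separator = "|" if "|" in genotype else "/"
    ploidy = len(genotype.replace("|", "/").split("/"))
    values[gt_index] = separator.join(["."] * ploidy)
    return ":".join(values)


# Hypothetical sample columns: one low-GQ call and one confident call.
low_gq_call = mask_low_gq("GT:AD:DP:GQ:PL", "0/1:5,3:8:35:90,0,120")
high_gq_call = mask_low_gq("GT:AD:DP:GQ:PL", "0/1:15,14:29:99:400,0,420")
print(low_gq_call)
print(high_gq_call)
```

Note that this only hides low-confidence genotypes per sample. It does not remove the site itself, so an adapter-driven artifact supported by confident genotypes in other samples would still remain in the callset.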
