Hi team!
I am testing HaplotypeCaller with VectorLoglessPairHMM on a single BAM. There are two weird things.
- There is no speedup going from -nct 1 to -nct 10.
- There is no speedup from enabling VectorLoglessPairHMM.
Apologies for the long paste, but here are the first lines of the log file. I hope you have a suggestion for what I can do to actually speed up HaplotypeCaller.
```sh
INFO 21:37:58,043 HelpFormatter - --------------------------------------------------------------------------------
INFO 21:37:58,045 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.2-2-gec30cee, Compiled 2014/07/17 15:22:03
INFO 21:37:58,045 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 21:37:58,045 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 21:37:58,048 HelpFormatter - Program Args: -T HaplotypeCaller -R /mnt/users/torfn/Projects/BosTau/Reference/Bos_taurus.UMD3.1.74.dna_rm.chromosome.ALL.fa -I /mnt/users/tikn/old_Backup2/cigene-pipeline-snp-detection/align_all/2052/2052_aln.posiSrt.withRG.dedup.bam --genotyping_mode DISCOVERY --dbsnp /mnt/users/torfn/Projects/BosTau/Reference/vcf_chr_ALL-dbSNP138.vcf -stand_emit_conf 10 -stand_call_conf 30 -minPruning 3 -o test.gatk.31.vcf -nct 10 --pair_hmm_implementation VECTOR_LOGLESS_CACHING
INFO 21:37:58,052 HelpFormatter - Executing as tikn@m620-7 on Linux 2.6.32-504.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_71-mockbuild_2014_10_17_22_23-b00.
INFO 21:37:58,052 HelpFormatter - Date/Time: 2014/11/27 21:37:58
INFO 21:37:58,052 HelpFormatter - --------------------------------------------------------------------------------
INFO 21:37:58,053 HelpFormatter - --------------------------------------------------------------------------------
INFO 21:37:58,331 GenomeAnalysisEngine - Strictness is SILENT
INFO 21:37:58,521 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
INFO 21:37:58,538 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 21:37:58,866 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.33
INFO 21:37:58,892 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 21:37:59,211 MicroScheduler - Running the GATK in parallel mode with 10 total threads, 10 CPU thread(s) for each of 1 data thread(s), of 32 processors available on this machine
INFO 21:37:59,338 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 21:38:00,229 GenomeAnalysisEngine - Done preparing for traversal
INFO 21:38:00,230 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 21:38:00,231 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 21:38:00,232 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 21:38:00,446 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 21:38:00,448 PairHMM - Performance profiling for PairHMM is disabled because HaplotypeCaller is being run with multiple threads (-nct>1) option Profiling is enabled only when running in single thread mode
Using AVX accelerated implementation of PairHMM
INFO 21:38:04,922 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
INFO 21:38:04,923 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
INFO 21:38:30,237 ProgressMeter - 1:656214 0.0 30.0 s 49.6 w 0.0% 33.8 h 33.8 h
INFO 21:39:30,239 ProgressMeter - 1:2160900 0.0 90.0 s 148.8 w 0.1% 30.8 h 30.8 h
INFO 21:40:30,241 ProgressMeter - 1:3789347 0.0 2.5 m 248.0 w 0.1% 29.3 h 29.2 h
INFO 21:41:30,242 ProgressMeter - 1:5347891 0.0 3.5 m 348.2 w 0.2% 29.0 h 29.0 h
```
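In case it helps when comparing the -nct 1 and -nct 10 runs, the throughput can be estimated from the ProgressMeter lines of each log. This is only a rough sketch: the function name `bases_per_second` and the regular expression are my own and not part of GATK, and it assumes the log has one entry per line, that all ProgressMeter entries are on the same contig, and that the run does not cross midnight.

```python
import re

# Matches e.g. "INFO 21:38:30,237 ProgressMeter - 1:656214 0.0 30.0 s ..."
# Header lines ("Location | ...", "[INITIALIZATION ...") have no contig:position
# token and are therefore skipped.
PROGRESS = re.compile(
    r"INFO\s+(\d{2}):(\d{2}):(\d{2}),(\d{3})\s+ProgressMeter\s+-\s+(\S+):(\d+)\s"
)

def bases_per_second(log_text):
    """Estimate reference positions traversed per wall-clock second.

    Returns None if there are fewer than two usable ProgressMeter entries,
    if the first and last entries are on different contigs, or if the
    timestamps do not increase (for example, a run that crosses midnight).
    """
    points = []
    for m in PROGRESS.finditer(log_text):
        hh, mm, ss, ms, contig, pos = m.groups()
        seconds = int(hh) * 3600 + int(mm) * 60 + int(ss) + int(ms) / 1000.0
        points.append((contig, seconds, int(pos)))
    if len(points) < 2:
        return None
    first, last = points[0], points[-1]
    if first[0] != last[0]:
        return None
    elapsed = last[1] - first[1]
    if elapsed <= 0:
        return None
    return (last[2] - first[2]) / elapsed

if __name__ == "__main__":
    # Two entries copied from the log above.
    sample = (
        "INFO 21:38:30,237 ProgressMeter - 1:656214 0.0 30.0 s 49.6 w 0.0% 33.8 h 33.8 h\n"
        "INFO 21:41:30,242 ProgressMeter - 1:5347891 0.0 3.5 m 347.2 w 0.2% 29.0 h 29.0 h\n"
    )
    print(bases_per_second(sample))
```

Running this on the log of each configuration gives one number per run, which makes it easier to see whether the thread count or the PairHMM implementation changes anything.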
Kind regards,
Tim Knutsen