GATK 2.2 was released on October 31, 2012. Highlights are listed below. The detailed version history is available here: http://www.broadinstitute.org/gatk/guide/version-history
Base Quality Score Recalibration
- Improved the algorithm around homopolymer runs to use a "delocalized context".
- Massive performance improvements that allow these tools to run efficiently (and correctly) in multi-threaded mode.
- Fixed bug where the tool failed for reads that begin with insertions.
- Fixed bug in the scatter-gather functionality.
- Added new argument to enable emission of the .pdf output file (see --plot_pdf_file).
Unified Genotyper
- Massive runtime performance improvement for multi-allelic sites; -maxAltAlleles now defaults to 6.
- The genotyper no longer emits the Strand Bias (SB) annotation by default. Use the --computeSLOD argument to enable it.
- Added the ability to automatically down-sample reads to remove low-grade contamination from the input bam files using the --contamination_fraction_to_filter argument; the default value is 0.05 (5%).
- Fixed annotations (AD, FS, DP) that were miscalculated when run on a bam processed by Reduce Reads.
- Fixed bug for the general ploidy model that occasionally caused it to choose the wrong allele when there are multiple possible alleles to choose from.
- Fixed bug where the inbreeding coefficient was computed at monomorphic sites.
- Fixed edge case bug where we could abort prematurely in the special case of multiple polymorphic alleles and samples with drastically different coverage.
- Fixed bug in the general ploidy model where it wasn't counting errors in insertions correctly.
- The FisherStrand annotation is now computed both with and without filtering low-qual bases (we compute both p-values and take the maximum one - i.e. least significant).
- Fixed annotations (particularly AD) for indel calls; previous versions didn't correctly bin reads into the reference or alternate sets.
- Generalized ploidy model now handles reference calls correctly.
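
The FisherStrand change above (compute two p-values and keep the maximum) can be illustrated with a minimal sketch. This is not GATK's implementation: the function names are hypothetical, the 2x2 table layout (ref-forward, ref-reverse, alt-forward, alt-reverse) is an assumption, and the test is a plain two-sided Fisher's exact test written with the standard library only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fisher_strand_p_value(table_all_bases, table_filtered_bases):
    """Keep the larger (least significant) of the two p-values.

    Each table is (ref_fwd, ref_rev, alt_fwd, alt_rev); this layout is assumed
    for illustration only.
    """
    return max(
        fisher_exact_two_sided(*table_all_bases),
        fisher_exact_two_sided(*table_filtered_bases),
    )
```

Taking the maximum makes the annotation conservative: a site is flagged for strand bias only if the bias is significant both with and without the low-quality bases.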
Haplotype Caller
- Massive runtime performance improvement for multi-allelic sites; -maxAltAlleles now defaults to 6.
- Massive runtime performance improvement to the HMM code which underlies the likelihood model of the HaplotypeCaller.
- Added the ability to automatically down-sample reads to remove low-grade contamination from the input bam files using the --contamination_fraction_to_filter argument; the default value is 0.05 (5%).
- Now requires at least 10 samples to merge variants into complex events.
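
To give a feel for what a contamination fraction of 0.05 means, here is a minimal sketch that drops each read independently with probability equal to the fraction. This is only an illustration under that assumption: the function name and the uniform random strategy are hypothetical, and the strategy GATK actually uses to choose which reads to discard may differ.

```python
import random


def downsample_reads(reads, contamination_fraction=0.05, seed=0):
    """Drop each read independently with probability contamination_fraction.

    Hypothetical helper for illustration; not GATK's down-sampling logic.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [read for read in reads if rng.random() >= contamination_fraction]


reads = [f"read_{i}" for i in range(10000)]
kept = downsample_reads(reads)  # roughly 95% of the reads survive
```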
Variant Annotator
- Fixed annotations for indel calls; previous versions either didn't compute the annotations at all or did so incorrectly for many of them.
Reduce Reads
- Fixed several bugs where certain reads were either dropped (fully or partially) or registered as occurring at the wrong genomic location.
- Fixed bugs where, in rare cases, N bases were chosen as consensus over legitimate A, C, G, or T bases.
- Significant runtime performance optimizations; the average runtime for a single exome file is now just over 2 hours.
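
The consensus fix above can be pictured with a small sketch in which N is only ever chosen when no real base is present. The function name and the simple majority rule are assumptions made for illustration; this is not the Reduce Reads consensus algorithm itself.

```python
from collections import Counter

REAL_BASES = {"A", "C", "G", "T"}


def consensus_base(bases):
    """Pick the most frequent real base; fall back to N only if none is present."""
    counts = Counter(base.upper() for base in bases)
    real = {base: n for base, n in counts.items() if base in REAL_BASES}
    if not real:
        return "N"
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(real), key=real.get)
```

For example, a pileup of three N bases and one A yields A rather than N under this rule.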
Variant Filtration
- Fixed a bug where DP couldn't be filtered from the FORMAT field, only from the INFO field.
Variant Eval
- AlleleCount stratification now supports records with ploidy other than 2.
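
As an illustration of counting alleles without assuming diploid samples, the sketch below sums alternate alleles over genotypes of any ploidy. The data representation (tuples of allele indices, with None for a missing allele) and the function name are hypothetical, not Variant Eval's internal model.

```python
def alt_allele_count(genotypes):
    """Count non-reference alleles across genotypes of arbitrary ploidy.

    Each genotype is a tuple of allele indices: 0 = reference, >0 = alternate,
    None = missing. This representation is assumed for illustration.
    """
    return sum(
        1
        for genotype in genotypes
        for allele in genotype
        if allele is not None and allele > 0
    )
```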
Combine Variants
- Fixed bug where the AD field was not handled properly. We now strip the AD field out whenever the alleles change in the combined file.
- Now outputs the first non-missing QUAL, not the maximum.
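
The two Combine Variants behaviors above can be sketched as follows. The function names and the dictionary-based record representation are assumptions for illustration only and do not reflect GATK's internal classes.

```python
def combined_qual(quals):
    """Return the first QUAL that is not missing (None), or None if all are missing."""
    for qual in quals:
        if qual is not None:
            return qual
    return None


def strip_ad_if_alleles_changed(genotype_fields, original_alleles, combined_alleles):
    """Return a copy of the genotype fields, without AD if the allele list changed.

    AD holds one depth per allele, so it no longer lines up once the alleles differ.
    """
    if list(original_alleles) != list(combined_alleles):
        return {key: value for key, value in genotype_fields.items() if key != "AD"}
    return dict(genotype_fields)
```

Note that with inputs whose QUAL values are missing, 30.0, and 50.0, the first function returns 30.0, where taking the maximum would have returned 50.0.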
Select Variants
- Fixed bug where the AD field was not handled properly. We now strip the AD field out whenever the alleles change in the output file.
- Removed the -number argument because it gave biased results.
Validate Variants
- Added option to selectively choose particular strict validation options.
- Fixed bug where mixed genotypes (e.g. ./1) would incorrectly fail.
- Improved the error message around unused ALT alleles.
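
A validator that accepts mixed genotypes such as ./1 needs to treat each allele independently, skipping the missing ones. The sketch below shows one way to do that; the function name and the choice of checks are assumptions for illustration, not the Validate Variants code.

```python
import re


def genotype_is_valid(gt, n_alt_alleles):
    """Check a GT string such as '0/1', '1|2' or './1'.

    A '.' allele is treated as missing and skipped; every other allele must be
    an integer index no larger than the number of ALT alleles.
    """
    for allele in re.split(r"[/|]", gt):
        if allele == ".":
            continue  # missing allele: allowed, even next to a called one
        if not allele.isdigit() or int(allele) > n_alt_alleles:
            return False
    return True
```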
Somatic Indel Detector
- Fixed several bugs, including missing AD/DP header lines and annotations that were output in the wrong order (the order is now Ref/Alt).
Miscellaneous
- New CPU "nano" parallelization option (-nct) added GATK-wide (see docs for more details about this cool new feature that allows parallelization even for Read Walkers).
- Fixed raw HapMap file conversion bug in VariantsToVCF.
- Added GATK-wide command line argument (-maxRuntime) to control the maximum runtime allowed for the GATK.
- Fixed bug in GenotypeAndValidate where it couldn't handle both SNPs and indels.
- Fixed bug where VariantsToTable did not handle lists and nested arrays correctly.
- Fixed bug in BCF2 writer for case where all genotypes are missing.
- Fixed bug in DiagnoseTargets when intervals with zero coverage were present.
- Fixed bug in Phase By Transmission when there are no likelihoods present.
- Fixed bug in fasta .fai generation.
- Updated and improved version of the BadCigar read filter.
- Picard jar remains at version 1.67.1197.
- Tribble jar remains at version 110.