I am using HaplotypeCaller on a sample that I expect to be entirely homozygous. I therefore expect only one haplotype per site; any others are probably built from reads derived from paralogs or repeats that are not present in the reference genome.
When I run HaplotypeCaller, it would be useful if I could: (1) retrieve the haplotypes it creates (it is likely that only the one most similar to the reference is the appropriate haplotype for the region); (2) retrieve the reads supporting the alternate haplotypes; (3) exclude reads that are likely derived from those alternate haplotypes.
What would happen if I ran a homozygous sample with "--maxNumHaplotypesInPopulation" set to 1? How does HaplotypeCaller choose that one haplotype, and what happens to reads that don't match it?