Hi, I am interested in calling variants from pooled samples. Specifically, I want to estimate SNP allele frequencies from samples made by pooling many individuals (1000+) together. I understand that HaplotypeCaller is now recommended over UnifiedGenotyper in all cases, but is this project an exception? Some details:
- 1000s of individuals in each pooled sample
- only two possible alleles at every site
- I only need to call SNPs
- I can generate a set of known SNPs to call (does GENOTYPE_GIVEN_ALLELES work in HaplotypeCaller?)
- I have high read coverage
- I want to detect rare alleles as best as possible
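
To make the rare-allele concern concrete, here is a rough back-of-the-envelope sketch in Python. It is my own simplified model, not anything GATK does internally: it assumes diploid individuals contributing equally to the pool, no sequencing error, and reads sampled independently (binomial). The function names are hypothetical.

```python
from math import comb


def pool_chromosomes(n_individuals: int, organism_ploidy: int = 2) -> int:
    """Number of chromosome copies in a pool (assumes equal contribution)."""
    return n_individuals * organism_ploidy


def min_representable_af(ploidy: int) -> float:
    """Smallest non-zero allele frequency a genotype with this ploidy can
    express: one alternate copy out of `ploidy` copies."""
    return 1.0 / ploidy


def prob_at_least_k_alt_reads(depth: int, allele_freq: float, k: int) -> float:
    """Binomial probability of seeing at least k alternate-allele reads at a
    site with the given depth. Ignores sequencing error (an assumption)."""
    p_fewer = sum(
        comb(depth, i) * allele_freq**i * (1.0 - allele_freq) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    copies = pool_chromosomes(1000)          # 1000 diploid individuals
    singleton_af = 1.0 / copies              # one alternate copy in the pool
    print("chromosome copies in pool:", copies)
    print("singleton allele frequency:", singleton_af)
    print("lowest AF expressible at ploidy 20:", min_representable_af(20))
    print("P(>=3 alt reads | depth 1000):",
          prob_at_least_k_alt_reads(1000, singleton_af, 3))
```

The point of the sketch is that two things limit rare-allele detection: the ploidy setting caps the lowest frequency a call can express, and the read depth caps how often a rare allele is sampled at all.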
If you still advise using HaplotypeCaller in this case, do you have any special suggestions? I'd like to set -ploidy as high as is practical so that rare alleles can be detected, but otherwise keep the job as streamlined as possible. Thanks for any advice you can provide!