Hello everyone,
Can anyone please explain whether the 3rd step of the HaplotypeCaller ("Determine likelihoods of the haplotypes given the read data") uses all the reads as evidence, or only the reads that overlap the active regions?
In other words, is the data set for determining the haplotype likelihoods the same data set that we feed into the HaplotypeCaller to begin with?
If the data set is the same as the input BAM file, does that mean the PairHMM algorithm in the 3rd step needs more data than the Smith-Waterman algorithm in the 2nd step?
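To make the question concrete, here is a toy sketch of how I picture step 3. It is a simplified forward-algorithm pair HMM written only for illustration, not GATK's actual PairHMM code; the function name `pair_hmm_likelihood` and the error/gap parameters are my own assumptions, not GATK defaults. It computes P(read | haplotype) for one read-haplotype pair, so the total work is (number of reads passed to this step) × (number of candidate haplotypes) × (read length × haplotype length).

```python
def pair_hmm_likelihood(read, hap, base_error=0.01, gap_open=0.001, gap_ext=0.1):
    """Toy pair-HMM forward algorithm returning P(read | haplotype).

    Hypothetical sketch: constant transition and error probabilities are
    illustrative assumptions, not GATK's per-base quality model.
    """
    n, m = len(read), len(hap)
    match_to_match = 1.0 - 2.0 * gap_open  # stay in the match state
    gap_to_match = 1.0 - gap_ext           # close an open gap
    # Forward matrices: [i][j] = i read bases and j haplotype bases consumed.
    M = [[0.0] * (m + 1) for _ in range(n + 1)]  # match/mismatch
    I = [[0.0] * (m + 1) for _ in range(n + 1)]  # insertion in the read
    D = [[0.0] * (m + 1) for _ in range(n + 1)]  # deletion from the read
    # The read may start before any haplotype base with equal probability.
    for j in range(m):
        D[0][j] = 1.0 / m
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = read[i - 1] == hap[j - 1]
            emit = 1.0 - base_error if same else base_error / 3.0
            M[i][j] = emit * (match_to_match * M[i - 1][j - 1]
                              + gap_to_match * (I[i - 1][j - 1] + D[i - 1][j - 1]))
            I[i][j] = gap_open * M[i - 1][j] + gap_ext * I[i - 1][j]
            D[i][j] = gap_open * M[i][j - 1] + gap_ext * D[i][j - 1]
    # The whole read must be consumed; it may end anywhere on the haplotype.
    return sum(M[n][j] + I[n][j] for j in range(1, m + 1))


# Usage: one likelihood per (read, haplotype) pair for the reads in a region.
haplotypes = ["ACGTACGTAC", "ACGTTCGTAC"]
reads = ["GTACG", "GTTCG"]
likelihoods = [[pair_hmm_likelihood(r, h) for h in haplotypes] for r in reads]
```

If this picture is right, the cost of step 3 depends directly on which reads are handed to it, which is why I am asking whether it sees the whole BAM or only the reads around the active regions.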
Hope my question makes sense and thank you in advance!