Channel: haplotypecaller — GATK-Forum

GenotypeGVCFs hangs on some positions


Hi all,

I am attempting to use the HaplotypeCaller / CombineGVCFs / GenotypeGVCFs pipeline to call variants on chromosomes X and Y for 769 samples (356 males, 413 females) sequenced at 12x coverage (whole-genome sequencing, but right now I am only calling X and Y).

I have called the samples following the Best Practices with HaplotypeCaller, using ploidy = 1 for males on X and Y and ploidy = 2 for females on X, e.g.:

INFO 16:28:45,750 HelpFormatter - Program Args: -R /gcc/resources/b37/indices/human_g1k_v37.fa -T HaplotypeCaller -L X -ploidy 1 -minPruning 3 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -I /target/gpfs2/gcc/groups/gonl/projects/trio-analysis/rawdata_release2/A102.human_g1k_v37.trio_realigned.bam --sample_name A102a -o /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/A102a.chrX.hc.g.vcf
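To make the per-sample ploidy scheme explicit, here is a minimal Python sketch that builds the HaplotypeCaller argument list for one sample and one chromosome. It only reuses the flags visible in the Program Args line above; the function names `ploidy_for` and `haplotypecaller_args` are hypothetical helpers, and the paths passed in are placeholders. It assumes females are simply not called on Y, as described in the post.

```python
from typing import List, Optional


def ploidy_for(sex: str, chrom: str) -> Optional[int]:
    """Ploidy scheme from the post: males are haploid on X and Y,
    females are diploid on X and are not called on Y (returns None)."""
    if chrom not in ("X", "Y"):
        raise ValueError(f"only X and Y are handled here, got {chrom!r}")
    if sex == "male":
        return 1
    if sex == "female":
        return 2 if chrom == "X" else None
    raise ValueError(f"unknown sex {sex!r}")


def haplotypecaller_args(ref: str, bam: str, sample: str, sex: str,
                         chrom: str, out_gvcf: str) -> Optional[List[str]]:
    """Hypothetical helper: GATK 3 HaplotypeCaller arguments for one sample,
    mirroring the flags shown in the Program Args log line."""
    ploidy = ploidy_for(sex, chrom)
    if ploidy is None:
        return None  # e.g. a female sample on Y: nothing to call
    return [
        "-R", ref,
        "-T", "HaplotypeCaller",
        "-L", chrom,
        "-ploidy", str(ploidy),
        "-minPruning", "3",
        "--emitRefConfidence", "GVCF",
        "--variant_index_type", "LINEAR",
        "--variant_index_parameter", "128000",
        "-I", bam,
        "--sample_name", sample,
        "-o", out_gvcf,
    ]


if __name__ == "__main__":
    # Placeholder paths; prints the argument string for a male sample on X.
    args = haplotypecaller_args("ref.fa", "sample.bam", "S1", "male", "X", "S1.chrX.g.vcf")
    print(" ".join(args))
```

Keeping the ploidy decision in its own function makes it easy to check that every sample/chromosome pair gets the intended value before submitting hundreds of jobs.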

Then I used CombineGVCFs to combine my samples in batches of 100. Now I am attempting to genotype them, and I face the same issue on both X (males + females) and Y (males only): GenotypeGVCFs starts running fine and then just hangs at a certain position. At first it crashed asking for more memory, but now, with 24 GB of memory, it simply stays at a single position until my job hits its 24-hour walltime limit. Here is the chrom X output:

INFO  15:00:39,501 HelpFormatter - Program Args: -R /gcc/resources/b37/indices/human_g1k_v37.fa -T GenotypeGVCFs -ploidy 1 --dbsnp /gcc/resources/b37/snp/dbSNP/dbsnp_138.b37.vcf -stand_call_conf 10 -stand_emit_conf 10 --max_alternate_alleles 4 -o /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.vcf -L X -V /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.1.g.vcf -V /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.2.g.vcf -V /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.3.g.vcf -V /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.4.g.vcf -V /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.5.g.vcf -V /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.6.g.vcf -V /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.7.g.vcf -V /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.8.g.vcf
INFO  15:00:39,507 HelpFormatter - Executing as lfrancioli@targetgcc15-mgmt on Linux 3.0.80-0.5-default amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13.
INFO  15:00:39,507 HelpFormatter - Date/Time: 2014/11/12 15:00:39
INFO  15:00:39,508 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:00:39,508 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:00:40,951 GenomeAnalysisEngine - Strictness is SILENT
INFO  15:00:41,153 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO  15:57:53,416 RMDTrackBuilder - Writing Tribble index to disk for file /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.4.g.vcf.idx
INFO  17:09:39,597 RMDTrackBuilder - Writing Tribble index to disk for file /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.5.g.vcf.idx
INFO  18:21:00,656 RMDTrackBuilder - Writing Tribble index to disk for file /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.6.g.vcf.idx
INFO  19:30:46,624 RMDTrackBuilder - Writing Tribble index to disk for file /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.7.g.vcf.idx
INFO  20:22:38,368 RMDTrackBuilder - Writing Tribble index to disk for file /gcc/groups/gonl/tmp01/lfrancioli/chromX/hc/results/gonl.chrX.hc.8.g.vcf.idx
WARN  20:26:45,716 FSLockWithShared$LockAcquisitionTask - WARNING: Unable to lock file /gcc/resources/b37/snp/dbSNP/dbsnp_138.b37.vcf.idx because an IOException occurred with message: No locks available.
INFO  20:26:45,718 RMDTrackBuilder - Could not acquire a shared lock on index file /gcc/resources/b37/snp/dbSNP/dbsnp_138.b37.vcf.idx, falling back to using an in-memory index for this GATK run.
INFO  20:33:29,491 IntervalUtils - Processing 155270560 bp from intervals
INFO  20:33:29,628 GenomeAnalysisEngine - Preparing for traversal
INFO  20:33:29,635 GenomeAnalysisEngine - Done preparing for traversal
INFO  20:33:29,636 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  20:33:29,637 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
INFO  20:33:29,638 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
INFO  20:33:29,948 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
INFO  20:33:59,642 ProgressMeter -         X:65301         0.0    30.0 s      49.6 w        0.0%    19.8 h      19.8 h
INFO  20:34:59,820 ProgressMeter -         X:65301         0.0    90.0 s     149.1 w        0.0%    59.4 h      59.4 h
...
INFO  20:52:01,064 ProgressMeter -        X:177301         0.0    18.5 m    1837.7 w        0.1%    11.3 d      11.2 d
INFO  20:53:01,066 ProgressMeter -        X:177301         0.0    19.5 m    1936.9 w        0.1%    11.9 d      11.9 d
...
INFO  14:58:25,243 ProgressMeter -        X:177301         0.0    18.4 h   15250.3 w        0.1%    96.0 w      95.9 w
INFO  14:59:38,112 ProgressMeter -        X:177301         0.0    18.4 h   15250.3 w        0.1%    96.1 w      96.0 w
INFO  15:00:47,482 ProgressMeter -        X:177301         0.0    18.5 h   15250.3 w        0.1%    96.2 w      96.1 w
=>> PBS: job killed: walltime 86440 exceeded limit 86400
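To spot this kind of stall in long logs without reading them by eye, the following Python sketch scans GATK ProgressMeter lines and reports the last locus reached and how long the meter stayed there. The line layout (locus, processed sites, then elapsed time with a unit of s/m/h/d/w) is assumed from the log above rather than taken from a documented format, and `stalled_locus` is a hypothetical helper name. Because the log is excerpted with "...", the reported duration is only a lower bound on how long the run was stuck.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed layout of a ProgressMeter data line, based on the log above:
#   "<LEVEL> <time> ProgressMeter - <contig:pos> <sites> <elapsed> <unit> ..."
PROGRESS_RE = re.compile(
    r"ProgressMeter -\s+(?P<loc>\S+:\d+)\s+[\d.]+\s+(?P<elapsed>[\d.]+)\s+(?P<unit>[smhdw])\b"
)
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def stalled_locus(lines: Iterable[str]) -> Optional[Tuple[str, float]]:
    """Return (locus, seconds) for the last locus reported, where seconds is
    the elapsed time between the first and last consecutive reports at it."""
    locus = None
    first = last = 0.0
    for line in lines:
        m = PROGRESS_RE.search(line)
        if not m:
            continue  # header rows, other INFO/WARN messages, "..." elisions
        elapsed = float(m["elapsed"]) * UNIT_SECONDS[m["unit"]]
        if m["loc"] != locus:
            # Locus advanced: restart the clock at this report.
            locus, first = m["loc"], elapsed
        last = elapsed
    if locus is None:
        return None
    return locus, last - first
```

Run on the excerpt above, it points at X:177301 and at least 65,490 s (about 18 hours) without progress, which gives a concrete interval to inspect in the combined gVCFs.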

I would really appreciate it if you could give me some pointers on how to handle this situation.

Thanks! Laurent

