Channel: haplotypecaller — GATK-Forum

Multithreading Queue with SGE


Hello,
I'm trying to run GATK-Queue in my SGE environment to parallelize my pipeline, but I'm having trouble making use of multiple CPU threads (-nct). I can run the example Scala QScripts (ExampleCountReads.scala, Hello_World.scala), but those don't use multithreading.

I am able to process using the scatter approach, but I am unable to run the scattered jobs with more than one CPU thread each.

Instead of trying to write my own Scala script from scratch, I modified one that I found here:
https://github.com/CuppenResearch/GATK-QScripts/blob/master/HaplotypeCaller.scala

When I run with the scatter, I'm using this command:
java -Djava.io.tmpdir=tmp -jar /opt/tools/Queue/Queue.jar -S HaplotypeCaller.scala -R hg19.fa -I BWAmem_dupremoved_realigned.sorted.bam -nct 1 -nsc 5 -mem 4 -stand_emit_conf 10 -stand_call_conf 30 -O BWAmem_queue_haplo.vcf -jobRunner GridEngine -run -l DEBUG

Looking into the script, the -nct is set as:

@Argument(doc="Number of cpu threads per data thread", shortName="nct", required=true)
var numCPUThreads: Int = _

haplotypeCaller.num_cpu_threads_per_data_thread = numCPUThreads

The -nsc is set as:
@Argument(doc="Number of scatters", shortName="nsc", required=true)
var numScatters: Int = _
haplotypeCaller.scatterCount = numScatters

This command runs in my SGE environment just fine.

However, when I run with -nct greater than 1, the job submission is rejected:

java -Djava.io.tmpdir=tmp -jar /opt/tools/Queue/Queue.jar -S HaplotypeCaller.scala -R hg19.fa -I BWAmem_dupremoved_realigned.sorted.bam -nct 6 -nsc 1 -mem 24 -stand_emit_conf 10 -stand_call_conf 30 -O BWAmem_queue_haplo.vcf -jobRunner GridEngine -run -l DEBUG

INFO 14:46:00,276 QScriptManager - Compiling 1 QScript
DEBUG 14:46:00,277 QScriptManager - Compilation directory: /opt/tools/Queue-3.5-0-g36282e4/resources/tmp/Q-Classes-304214474060816321
INFO 14:46:02,609 QScriptManager - Compilation complete
INFO 14:46:02,739 HelpFormatter - ----------------------------------------------------------------------
INFO 14:46:02,739 HelpFormatter - Queue v3.5-0-g36282e4, Compiled 2015/11/25 04:03:40
INFO 14:46:02,739 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 14:46:02,739 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
DEBUG 14:46:02,739 HelpFormatter - Current directory: /opt/tools/Queue-3.5-0-g36282e4/resources
INFO 14:46:02,740 HelpFormatter - Program Args: -S HaplotypeCaller.scala -R /mnt/data/GENOMES/hg19/FASTA/hg19.fa -I /mnt/data/WholeGenomes/Family14/TX14-2_BWAmem_dupremoved_realigned.sorted.bam -nct 6 -nsc 1 -mem 24 -stand_emit_conf 10 -stand_call_conf 30 -O /mnt/data/WholeGenomes/Family14/TX14-2_BWAmem_queue_haplo.vcf -jobRunner GridEngine -run -l DEBUG
INFO 14:46:02,740 HelpFormatter - Executing as gridtech@CFRICAUSESHPC05 on Linux 2.6.32-573.12.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_79-b15.
INFO 14:46:02,740 HelpFormatter - Date/Time: 2016/06/07 14:46:02
INFO 14:46:02,740 HelpFormatter - ----------------------------------------------------------------------
INFO 14:46:02,740 HelpFormatter - ----------------------------------------------------------------------
INFO 14:46:02,746 QCommandLine - Scripting VariantCaller
DEBUG 14:46:02,829 QGraph - adding QNode: 0
INFO 14:46:02,844 QCommandLine - Added 1 functions
INFO 14:46:02,845 QGraph - Generating graph.
INFO 14:46:02,875 QGraph - Running jobs.
DEBUG 14:46:03,042 FunctionEdge - Starting: /opt/tools/Queue-3.5-0-g36282e4/resources > 'java' '-Xmx24576m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/opt/tools/Queue-3.5-0-g36282e4/resources/tmp' '-cp' '/opt/tools/Queue/Queue.jar' 'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'HaplotypeCaller' '-I' '/mnt/data/WholeGenomes/Family14/TX14-2_BWAmem_dupremoved_realigned.sorted.bam' '-R' '/mnt/data/GENOMES/hg19/FASTA/hg19.fa' '-nct' '6' '-o' '/mnt/data/WholeGenomes/Family14/TX14-2_BWAmem_queue_haplo.vcf.raw_variants.vcf' '-stand_call_conf' '30.0' '-stand_emit_conf' '10.0'
INFO 14:46:03,043 FunctionEdge - Output written to /mnt/data/WholeGenomes/Family14/TX14-2_BWAmem_queue_haplo.vcf.raw_variants.vcf.out
INFO 14:46:03,075 GridEngineJobRunner - Native spec is: -V -l h_rss=29492M -pe smp_pe 6
ERROR 14:46:03,085 Retry - Caught error during attempt 1 of 4.
org.broadinstitute.gatk.queue.QException: Unable to submit job: job rejected: the requested parallel environment "smp_pe" does not exist
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply$mcV$sp(DrmaaJobRunner.scala:89)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply(DrmaaJobRunner.scala:85)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply(DrmaaJobRunner.scala:85)
at org.broadinstitute.gatk.queue.util.Retry$.attempt(Retry.scala:49)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.start(DrmaaJobRunner.scala:85)
at org.broadinstitute.gatk.queue.engine.FunctionEdge.start(FunctionEdge.scala:84)
at org.broadinstitute.gatk.queue.engine.QGraph.runJobs(QGraph.scala:453)
at org.broadinstitute.gatk.queue.engine.QGraph.run(QGraph.scala:156)
at org.broadinstitute.gatk.queue.QCommandLine.execute(QCommandLine.scala:170)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:61)
at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)

I believe the error lies somewhere in how the "smp_pe" parallel environment is linked to numCPUThreads.
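
One way to narrow this down is to compare the parallel environment named in the "Native spec is:" log line with the ones the scheduler actually defines. The following is a minimal sketch, not part of Queue: the helper names requested_pe and pe_exists are hypothetical, and the log-line format is assumed to match the DEBUG output above. On SGE, `qconf -spl` lists the configured parallel environments.

```shell
#!/bin/sh
# Hypothetical helper: print the parallel environment name that follows
# "-pe" in a "Native spec is:" log line, or nothing if -pe is absent.
requested_pe() {
    printf '%s\n' "$1" | sed -n 's/.* -pe \([^ ]*\) .*/\1/p'
}

# Hypothetical helper: succeed if the name in $1 appears as a whole line
# in the newline-separated list in $2 (for example, `qconf -spl` output).
pe_exists() {
    printf '%s\n' "$2" | grep -qxF -- "$1"
}

# Native spec copied from the log above.
spec='Native spec is: -V -l h_rss=29492M -pe smp_pe 6'
pe=$(requested_pe "$spec")
echo "Queue requested parallel environment: $pe"

# On a real submit host, compare against the scheduler's own list:
#   if pe_exists "$pe" "$(qconf -spl)"; then
#       echo "parallel environment exists"
#   else
#       echo "parallel environment is not defined on this cluster"
#   fi
```

If the name is missing from the `qconf -spl` list, either a parallel environment with that name has to be created by the cluster administrator, or Queue has to be told to request one that exists. Queue may accept a -jobParaEnv argument for the latter; treat that flag name as an assumption and confirm it against Queue's own help output before relying on it.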

I have attached my scala script (had to rename to a .txt in order to attach).

I'm happy to try out a different HaplotypeCaller scala script as well.

