Can I get per-task profiling stats in PySpark?
I'm trying to investigate straggling tasks in my PySpark job (tasks that take much longer than the p50/p75 tasks) to understand why they run so much longer than other tasks in the same stage.
The default PySpark profiler provides stats aggregated per RDD, but is there a way to get profiling stats at the task level?
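To make concrete what "task level" could look like, here is a minimal sketch of one possible approach, not a built-in PySpark feature: wrap the per-partition function so that each partition emits its own `cProfile` report. It assumes that one partition in the stage corresponds to one task, that the built-in profiler (`spark.python.profile`) is switched off so two profilers do not compete, and that the work can be expressed as a function over an iterator of records. The names `profile_partition`, `rdd`, and `parse_lines` are hypothetical.

```python
import cProfile
import io
import pstats
import time


def profile_partition(func, top=10):
    """Wrap a per-partition function so it yields one profiling report per partition.

    `func` takes an iterator of records and returns an iterable of results.
    The returned wrapper has the (index, iterator) signature that
    RDD.mapPartitionsWithIndex expects.
    """
    def wrapper(index, iterator):
        profiler = cProfile.Profile()
        start = time.perf_counter()
        profiler.enable()
        try:
            # Consume the results inside the profiled region so lazy
            # generators are actually executed; only a count is kept.
            count = sum(1 for _ in func(iterator))
        finally:
            profiler.disable()
        elapsed = time.perf_counter() - start

        # Render the hottest `top` entries, sorted by cumulative time, as text.
        buf = io.StringIO()
        pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)

        # One small record per partition, cheap to collect on the driver.
        yield {
            "partition": index,
            "records": count,
            "seconds": elapsed,
            "profile": buf.getvalue(),
        }
    return wrapper


# Hypothetical usage on a cluster (rdd and parse_lines are placeholders):
# reports = rdd.mapPartitionsWithIndex(profile_partition(parse_lines)).collect()
# slowest = sorted(reports, key=lambda r: r["seconds"], reverse=True)[:5]
```

Note that this sketch discards the real results and returns only the reports, so it would be run as a separate diagnostic pass over the same input. Comparing `records` and `seconds` across partitions can also hint at whether a straggler is caused by data skew rather than slow code.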
apache-spark pyspark
asked Nov 18 at 1:33
krishonadish