Pass Every Column in a Row into a Hash Function in Spark SQL
I have a table with N columns. I want to concatenate them all into a string column and then compute a hash of that column. I have found a similar question in Scala.
Ideally I want to do this entirely inside Spark SQL. I have tried HASH(*) as myhashcolumn, but because several of the columns are sometimes null, I can't make this work as I would expect.
If I have to create and register a UDF to make this happen, it needs to be in Python rather than Scala, as all my other code is in Python.
Any ideas?
python apache-spark pyspark apache-spark-sql
edited Nov 26 '18 at 11:49 by dataLeo
asked Nov 26 '18 at 11:31 by Scott Bell
2 Answers
Try the code below (hash here is pyspark.sql.functions.hash, so it needs to be imported):
from pyspark.sql.functions import hash
df.select([hash(c) for c in df.columns]).show()
Regards,
Neeraj
answered Nov 26 '18 at 14:57 by neeraj bhadani
You can do it in pyspark like the following, using pyspark.sql.functions.hash (just pass the input columns to the function):
from pyspark.sql.functions import col, hash
new_df = df.withColumn("concatenated", hash(col("col1"), col("col2"), col("col3")))
edited Dec 1 '18 at 15:25
answered Nov 26 '18 at 11:56 by OmG
Thanks, is there a way to map all columns dynamically? The reason I'm not listing them inside my SQL is that this doesn't seem possible.
– Scott Bell, Nov 26 '18 at 11:58