Spiky Kubernetes HPA when scaling on the number of unacked Pub/Sub messages

We currently have a data streaming pipeline: API call -> Google Pub/Sub -> BigQuery. The number of API calls depends on the traffic to the website.

We created a Kubernetes Deployment (in GKE) that ingests data from Pub/Sub into BigQuery. This Deployment has a HorizontalPodAutoscaler (HPA) with metricName: pubsub.googleapis.com|subscription|num_undelivered_messages and targetValue: "5000". This setup does scale out when traffic increases suddenly. However, the scaling is spiky.
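To see why a plain `targetValue` can keep adding pods, here is a minimal Python sketch of the replica rule the HPA applies to an External metric with a Value target, as I understand it from the Kubernetes HPA documentation: desired = ceil(currentReplicas × metricValue / targetValue), skipped when the ratio is within a tolerance (0.1 by default), then clamped to minReplicas/maxReplicas. The function name and the min/max values are hypothetical, and the sketch ignores metric delay and other controller details.

```python
import math


def desired_replicas_value(current_replicas: int, metric_value: float,
                           target_value: float, min_replicas: int,
                           max_replicas: int, tolerance: float = 0.1) -> int:
    """Hypothetical sketch of the HPA rule for an External metric with a Value target."""
    ratio = metric_value / target_value
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # The ratio multiplies the *current* replica count, so it compounds.
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Backlog stuck at 8000 unacked messages with a target of 5000.
    replicas = 2
    for _ in range(4):
        replicas = desired_replicas_value(replicas, 8000, 5000, 1, 10)
        print(replicas)  # 4, 7, 10, 10
```

Because the ratio multiplies the current pod count, every sync in which the backlog stays above roughly 5500 multiplies the pod count again, even if the pods already added would be enough to drain the backlog. Once the backlog is near zero the ratio is near zero, and the same rule drops straight to minReplicas.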

What I mean by spiky is the following:

1. The number of unacked messages rises above the target value.
2. The autoscaler increases the number of pods.
3. The number of unacked messages starts to decrease slowly, but because it is still above the target value, the autoscaler keeps adding pods. This continues until we hit the autoscaler's maximum number of pods.
4. The number of unacked messages keeps decreasing until it falls below the target, and then it stays very low.
5. The autoscaler reduces the number of pods to the minimum.
6. The number of unacked messages rises again, we are back in the situation of step 1, and the whole thing repeats as a cycle of spikes.
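The drop to the minimum in this cycle is what a scale-down stabilization window is meant to damp: the HPA only scales down to the highest recommendation it has computed within a recent window, while scale-ups apply immediately. As far as I know this behavior arrived in Kubernetes 1.12 (kube-controller-manager flag `--horizontal-pod-autoscaler-downscale-stabilization`, default 5 minutes). The sketch below is a hypothetical, simplified version of that idea, with the window counted in HPA sync periods rather than wall-clock time; `DownscaleStabilizer` and `stabilize_all` are illustrative names, not a Kubernetes API.

```python
from collections import deque
from typing import Iterable, List


class DownscaleStabilizer:
    """Hypothetical sketch: scale down only to the max recommendation in a sliding window."""

    def __init__(self, window_size: int) -> None:
        # Keep the last `window_size` raw recommendations (one per HPA sync).
        self.history = deque(maxlen=window_size)

    def stabilize(self, recommendation: int) -> int:
        self.history.append(recommendation)
        # A scale-up becomes the new maximum, so it takes effect at once;
        # a scale-down waits until every recent recommendation agrees.
        return max(self.history)


def stabilize_all(recommendations: Iterable[int], window_size: int) -> List[int]:
    stabilizer = DownscaleStabilizer(window_size)
    return [stabilizer.stabilize(r) for r in recommendations]


if __name__ == "__main__":
    raw = [2, 6, 10, 10, 3, 1, 1, 1, 6]
    print(stabilize_all(raw, window_size=3))
    # [2, 6, 10, 10, 10, 10, 3, 1, 6]
```

If the window is long compared with the time the backlog needs to refill, the pod count would tend to stay near its peak instead of collapsing to the minimum; picking that length is a judgment call for this workload.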

Here is the chart of the spiky behavior (the traffic itself is rising, but steadily and without spikes):
[Chart: the spiky number of unacknowledged messages in Pub/Sub]

We set a Stackdriver alert that fires when the number of unacknowledged messages exceeds 20k, and in this situation it triggers frequently.

Is there a way to make the HPA more stable (non-spiky) in this case?

Any comments, suggestions, or answers are appreciated.

Thanks!

asked Nov 22 '18 at 10:12

Yosua Michael

223

Have you checked the documentation on 'Autoscaling on metrics not related to Kubernetes objects'? See if it suits your scenario.

– Digil
Nov 23 '18 at 1:59

Yes, I have read the documentation. I use the External metric type and have tried both Value and AverageValue. Unfortunately, the autoscaling is still very spiky...

– Yosua Michael
Nov 26 '18 at 4:04
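For comparison, here is a minimal sketch of the AverageValue rule, assuming the HPA algorithm as documented: the per-pod share of the metric is compared with the target for the tolerance check, but the new replica count is ceil(metricValue / targetAverageValue), which does not depend on the current replica count. The function name and the numbers are hypothetical.

```python
import math


def desired_replicas_average_value(current_replicas: int, metric_value: float,
                                   target_average_value: float, min_replicas: int,
                                   max_replicas: int, tolerance: float = 0.1) -> int:
    """Hypothetical sketch of the HPA rule for an External metric with an AverageValue target.

    Assumes current_replicas >= 1 (minReplicas is at least 1 here).
    """
    # Tolerance check uses the per-pod share of the metric.
    ratio = metric_value / (target_average_value * current_replicas)
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Outside the band, the answer depends only on the total backlog.
    desired = math.ceil(metric_value / target_average_value)
    return max(min_replicas, min(max_replicas, desired))
```

Under this rule a backlog of 8000 with a 5000 target recommends 2 pods whether 2 or 4 are running, so the compounding ramp should disappear. But once the backlog drains to near zero it still recommends minReplicas, which may be why switching to AverageValue alone did not remove the spikes.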

This looks like a defect in that GKE version. Which version are you using? According to the documentation, this issue has been addressed in Kubernetes 1.12. Hopefully the same fix will be applied to the latest GKE version, maybe GKE 1.12 or later.

– Digil
Nov 30 '18 at 1:43

I am currently still on version 1.10.6-gke.11. The latest Kubernetes version available in GKE is 1.11.3-gke.18. I will try upgrading then. Thanks!

– Yosua Michael
Nov 30 '18 at 10:17

kubernetes google-cloud-platform autoscaling google-cloud-pubsub google-kubernetes-engine
