Perf Monitoring for rdtsc dynamically
Is there a way to monitor assembly instructions dynamically, in "real time", using perf?
I have seen that if I use `perf record` / `perf top` and then click on the recorded functions, I see the assembly instructions. But can I directly monitor specific assembly instructions such as `rdtsc` or `clflush` with perf, for example how often a process executes them within a certain period?
I am using Debian 9 on Skylake and also on Haswell.
`sudo uname -a` prints: Linux bla 4.9.0-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux
`sudo /proc/config.gz` returns "command not found".
Any help/ideas are appreciated.
assembly profiling monitoring perf rdtsc
If you want to limit yourself to a process, use a DBI (dynamic binary instrumentation) tool such as Frida, DynamoRIO or even rr. System-wide, I don't think it's possible with reasonable performance.
– Margaret Bloom
Nov 20 at 10:04
sounds very much like this question
– Zulan
Nov 20 at 11:25
Thanks for the answers. However, I do not have a specific process and do not want to do binary instrumentation. If I monitor performance counters, e.g. with perf, I can see the rdtsc during runtime, but only if I manually click on the function that calls it. That is why I was thinking that there might be a way, such as setting some registers or whatever.
– assembly_question
Nov 20 at 11:47
There aren't perf events for most specific instructions. For division, there's `arith.divider_active`. You might find microcoded instructions with `idq.ms_cycles`, but that isn't specific to `rdtsc` or `clflush`. (I'm not sure `clflush` is more than 4 uops, though.)
– Peter Cordes
Nov 20 at 17:26
No, you can't do that using `perf`.
– Hadi Brais
Nov 20 at 19:34
asked Nov 20 at 9:56
assembly_question
1 Answer
Yes, you could certainly build something that dynamically samples instructions running on the host.
The basic idea is to periodically sample the processes you are interested in (which could be "all of them"), and examine the area around the sampled instruction pointer to determine the instructions that must have been executed for such a sample to have existed: for example, by disassembling until the next conditional branch, or perhaps just until the end of the basic block.
Doing this repeatedly, you'll get a histogram of executed instructions, and you can then estimate how often `rdtsc` or any other instruction of interest is executed.
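The sampling idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DISASSEMBLY` table, the `BLOCK_ENDERS` set and the sample addresses are hypothetical toy data, where a real tool would obtain them from a sampler and a disassembler.

```python
from collections import Counter

# Hypothetical toy disassembly: address -> (mnemonic, length in bytes).
# A real tool would get this from a disassembler run over the sampled binary.
DISASSEMBLY = {
    0x1000: ("mov", 3),
    0x1003: ("rdtsc", 2),
    0x1005: ("shl", 4),
    0x1009: ("or", 3),
    0x100C: ("jne", 2),  # conditional branch: ends the basic block
    0x100E: ("ret", 1),
}

# Simplified, non-exhaustive set of instructions that end a basic block.
BLOCK_ENDERS = {"jne", "je", "jmp", "call", "ret"}


def instructions_from_sample(ip, disassembly):
    """Return mnemonics from the sampled IP up to and including the block end."""
    executed = []
    addr = ip
    while addr in disassembly:
        mnemonic, length = disassembly[addr]
        executed.append(mnemonic)
        if mnemonic in BLOCK_ENDERS:
            break
        addr += length
    return executed


def histogram(sample_ips, disassembly):
    """Accumulate a mnemonic histogram over all sampled instruction pointers."""
    counts = Counter()
    for ip in sample_ips:
        counts.update(instructions_from_sample(ip, disassembly))
    return counts


samples = [0x1000, 0x1003, 0x1005, 0x1000]  # hypothetical sampled IPs
hist = histogram(samples, DISASSEMBLY)
# Fraction of samples whose window contained rdtsc (an estimate, not a count).
rdtsc_share = hist["rdtsc"] / len(samples)
print(hist.most_common(), rdtsc_share)
```

The result is a statistical estimate: it says how often the instruction shows up in sampled windows, not how many times it actually ran.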
This isn't even actually all that difficult: most of the logic already exists in `perf top`, `perf record` and `perf report`: just combine the sampling code from `perf top` with the annotation code from `perf top` and/or `perf report` as described above. Perhaps you can even do it after the fact: use `perf record --all-cpus` to gather the samples and then run `perf script` or otherwise parse the file to monitor the instructions.
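As a rough sketch of the after-the-fact route, the Python below parses text that is assumed to come from something like `perf record --all-cpus -- sleep 10` followed by `perf script -F ip,sym,dso`. The line layout in the regular expression is an assumption and may differ between perf versions; the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape for `perf script -F ip,sym,dso` (may vary by perf version):
#     "      40052d main (/home/user/a.out)"
SAMPLE_LINE = re.compile(r"^\s*([0-9a-fA-F]+)\s+(.*?)\s+\((.*)\)\s*$")


def parse_samples(lines):
    """Yield (ip, symbol, dso) tuples from perf-script-like text lines."""
    for line in lines:
        match = SAMPLE_LINE.match(line)
        if match:
            ip, symbol, dso = match.groups()
            yield int(ip, 16), symbol, dso


def samples_per_dso(lines):
    """Count samples per binary, so each binary needs disassembling only once."""
    return Counter(dso for _, _, dso in parse_samples(lines))


# Made-up example lines, including one that should be skipped.
example_output = [
    "      40052d main (/home/user/a.out)",
    "      400531 main (/home/user/a.out)",
    "ffffffff8106a3c6 native_write_msr ([kernel.kallsyms])",
    "garbage line that does not match",
]
parsed = list(parse_samples(example_output))
per_dso = samples_per_dso(example_output)
print(parsed[0], per_dso)
```

The parsed addresses would then be looked up in a disassembly of each binary to see which instructions sit at or after the sampled instruction pointer.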
Each sample will only give you a small window of executed instructions, so if you need to catch the occasional `rdtsc` accurately, this won't work at all.
You could extend the "window" for each sample by exploiting the "last branch record" feature to essentially go back in time based on the most recent branches, and disassemble all those basic blocks, which would extend your coverage per sample by a lot.
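To illustrate how branch records widen the window, here is a minimal Python sketch. It assumes the branch records are already available as `(from, to)` address pairs ordered oldest first (perf can record them with `perf record -b`; hardware typically reports the newest entry first, so the list may need reversing). The addresses are hypothetical.

```python
def executed_ranges(branches, sample_ip):
    """Turn taken-branch records into straight-line executed address ranges.

    branches: chronological list of (from_addr, to_addr) taken branches.
    Returns (start, end) pairs: from each branch target up to the next
    branch source, and finally from the last target up to the sampled IP.
    """
    ranges = []
    for (_, target), (next_source, _) in zip(branches, branches[1:]):
        ranges.append((target, next_source))
    if branches:
        ranges.append((branches[-1][1], sample_ip))
    return ranges


# Hypothetical branch records, oldest first.
lbr = [(0x2010, 0x1000), (0x100C, 0x3000), (0x3008, 0x1000)]
ranges = executed_ranges(lbr, sample_ip=0x1005)
for start, end in ranges:
    print(hex(start), hex(end))
```

Every range can then be disassembled and added to the histogram, so a single sample covers several basic blocks instead of one.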
Hi, thanks for the answer! The thing is that I do not know how to get to this instruction in order to count it. If I use perf top and click on the process that called it, I can see rdtsc in the corresponding assembly code. But if I want to derive statistics for it with the --all-cpus option, I do not know what I should parse. perf script was not showing it directly. Could you give me a tip about that?
– assembly_question
Nov 30 at 13:41
answered Nov 22 at 3:33
BeeOnRope