How to penalize for empty fields in a DataFrame?

I have to calculate the consistency of racing car drivers over a whole season. My DataFrame has 10 columns (one per circuit), and each column holds the standard deviation of the lap times the driver posted at that circuit, i.e. how consistent the driver is from lap to lap. For races the driver did not finish, the field is blank.

So far I have calculated their average season consistency by averaging all 10 columns. However, not finishing a race should affect a driver's consistency negatively and I do not know how to implement that.

asked 3 hours ago

jatrp5

111

New contributor


pandas data


1 Answer
1


This depends heavily on domain knowledge. A general approach is to fill each missing value with one of the following penalties:

A multiple of the worst or average consistency at each circuit $c$, i.e. $(1 + m)\,\text{max}(\sigma_c)$ or $(1 + m)\,\text{avg}(\sigma_c)$ respectively, for the null values at that circuit, or

A multiple of the worst or average consistency of each driver $d$, i.e. $(1 + m)\,\text{max}(\sigma_d)$ or $(1 + m)\,\text{avg}(\sigma_d)$ respectively, for their unfinished races, or

A multiple of the average of the driver's and the circuit's worst consistencies, i.e. $(1 + m)\,[\text{max}(\sigma_d) + \text{max}(\sigma_c)]/2$, for an unfinished race of driver $d$ at circuit $c$, or some other combination.
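The three fill rules above could be sketched in pandas roughly as follows. This is only an illustration under assumptions: the driver names, circuit names, and values are made up, the DataFrame is assumed to have one row per driver and one column per circuit with NaN for unfinished races, and `penalized_fill` with its `basis` and `stat` parameters is a hypothetical helper, not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows are drivers, columns are circuits, values are
# lap-time standard deviations; NaN marks a race the driver did not finish.
sigma = pd.DataFrame(
    {
        "Monza": [0.40, 0.55, np.nan],
        "Spa": [0.50, np.nan, 0.70],
        "Suzuka": [0.45, 0.60, 0.65],
    },
    index=["driver_a", "driver_b", "driver_c"],
)


def penalized_fill(sigma, m=0.1, basis="circuit", stat="max"):
    """Fill NaNs with (1 + m) times a reference consistency value.

    basis: "circuit", "driver", or "both" (average of the two references).
    stat:  "max" (worst observed) or "mean" (average observed).
    """
    if stat not in ("max", "mean"):
        raise ValueError("stat must be 'max' or 'mean'")
    # NaNs are skipped when aggregating, so only finished races count.
    per_circuit = sigma.agg(stat, axis=0).to_numpy()[None, :]  # shape (1, circuits)
    per_driver = sigma.agg(stat, axis=1).to_numpy()[:, None]   # shape (drivers, 1)
    if basis == "circuit":
        ref = np.broadcast_to(per_circuit, sigma.shape)
    elif basis == "driver":
        ref = np.broadcast_to(per_driver, sigma.shape)
    elif basis == "both":
        ref = (per_driver + per_circuit) / 2
    else:
        raise ValueError("basis must be 'circuit', 'driver' or 'both'")
    fill = pd.DataFrame((1 + m) * ref, index=sigma.index, columns=sigma.columns)
    # Only the missing cells are taken from `fill`; observed values are kept.
    return sigma.fillna(fill)


# Season consistency per driver after penalizing unfinished races.
season_consistency = penalized_fill(sigma, m=0.1, basis="circuit").mean(axis=1)
print(season_consistency)
```

Note that a driver who finished no race at all, or a circuit nobody finished, has no reference value under the matching basis, so those cells would stay NaN and need separate handling.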

Whichever approach is chosen, the coefficient $m$ affects the final ranking. It can be determined either

Subjectively, by looking at the resulting rankings from an expert point of view and selecting the value that makes the most sense, or

By trying a range of values such as $m \in \{-0.2, -0.1, 0, 0.1, 0.2, \dots, 0.5\}$ and averaging the consistencies $\sigma_d$ or the rankings $R_d$ for each driver $d$. An advantage of this approach is that it shows how sensitive the result is: when a driver's rank has low variance across the values of $m$, the rank is insensitive to the choice of $m$, i.e. it is less controversial; when the rank changes a lot with $m$, the average rank is more controversial.
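The sweep over $m$ could look something like the sketch below. It assumes the circuit-worst fill rule, a step of 0.1 for the values the range leaves implicit, and the same made-up driver and circuit data; `season_rank` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows are drivers, columns are circuits, NaN = not finished.
sigma = pd.DataFrame(
    {
        "Monza": [0.40, 0.55, np.nan],
        "Spa": [0.50, np.nan, 0.70],
        "Suzuka": [0.45, 0.60, 0.65],
    },
    index=["driver_a", "driver_b", "driver_c"],
)


def season_rank(sigma, m):
    """Rank drivers (1 = most consistent) after a circuit-worst penalty."""
    circuit_worst = sigma.max(axis=0)                # worst sigma per circuit
    filled = sigma.fillna((1 + m) * circuit_worst)   # Series is matched on columns
    return filled.mean(axis=1).rank(method="min")


# Assumed grid: a step of 0.1 between the endpoints named in the text.
m_values = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

# One column of ranks per value of m.
ranks = pd.DataFrame({m: season_rank(sigma, m) for m in m_values})

summary = pd.DataFrame(
    {
        "mean_rank": ranks.mean(axis=1),
        "rank_variance": ranks.var(axis=1),  # 0 means the rank never changes
    }
)
print(summary.sort_values("mean_rank"))
```

In this toy data the driver with no missing races keeps the same rank for every $m$ (variance 0), while the two drivers with an unfinished race swap places once the penalty gets large enough, which is exactly the sensitivity the variance column is meant to expose.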

edited 38 mins ago

answered 2 hours ago

Esmailian

2,222218
