Metric for evaluating predicted bounding boxes from semantic segmentation on an object level outside of...
Context
For simplicity, let us pretend we are performing semantic segmentation on a series of one-pixel-high images of width w, with three channels (r, g, b) and n label classes.
In other words, a single image might look like:
img = [
[r1, r2, ..., rw], # channel r
[g1, g2, ..., gw], # channel g
[b1, b2, ..., bw], # channel b
]
and have dimensions [3, w].
Then, for a given image with w=10 and n=3, its ground-truth labels might be:
# ground "truth"
target = np.array([
    #0  1  2  3  4  5  6  7  8  9   # position
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1], # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0], # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], # class 3
])
and our model might predict as output:
# prediction
output = np.array([
    #0     1     2     3     4     5     6     7     8     9      # position
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86], # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21], # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13], # class 3
])
For further simplicity, let us transform our model's output by binarizing it with a cutoff of 0.9:
# binary mask with cutoff 0.9
b_mask = np.array([
    #0  1  2  3  4  5  6  7  8  9   # position
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0], # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0], # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], # class 3
])
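As a minimal sketch, the binarization step can be reproduced with a single NumPy comparison. The name cutoff is only illustrative, and a strict "greater than" comparison is an assumption (none of the example values equals 0.9 exactly, so it makes no difference here).

```python
import numpy as np

# Model output copied from the example above: one row per class.
output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

cutoff = 0.9  # illustrative name for the threshold used in the text

# Element-wise comparison gives a boolean array; cast it to 0/1 integers.
b_mask = (output > cutoff).astype(int)
print(b_mask)
```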
Then, if we look at the "objects" of each class as bounding boxes (or in this case just boundaries, i.e. [start, stop] pixels), the predicted objects from the binary mask "introduce" an extra object:
# "detected" objects
p_obj = [
[[2, 3], [8, 8]], # class 1
[[4, 4], [6, 7]], # class 2
[[0, 0]] # class 3
]
compared to the objects of the ground truth:
# true objects
t_obj = [
[[1, 3], [6, 9]], # class 1
[[4, 7]], # class 2
[[0, 0]] # class 3
]
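For reference, one way to derive such [start, stop] lists from a binary mask is to look for 0-to-1 and 1-to-0 transitions in each row. This is only a sketch; mask_to_runs is a hypothetical helper name, and stop is taken to be inclusive, as in the lists above.

```python
import numpy as np

def mask_to_runs(row):
    """Return inclusive [start, stop] pairs for each run of 1s in a 1D binary row."""
    # Pad with zeros so runs touching either edge still produce a transition.
    padded = np.concatenate(([0], np.asarray(row, dtype=int), [0]))
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)      # 0 -> 1 transitions
    stops = np.flatnonzero(changes == -1) - 1  # 1 -> 0 transitions, made inclusive
    return [[int(a), int(b)] for a, b in zip(starts, stops)]

b_mask = np.array([
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])
target = np.array([
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])

p_obj = [mask_to_runs(row) for row in b_mask]
t_obj = [mask_to_runs(row) for row in target]
print(p_obj)
print(t_obj)
```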
Question
If I wanted a metric to describe the accuracy of the boundaries, on average, per object, what would be the appropriate metric?
I understand IOU in the context of training a model which predicts bounding boxes, i.e. it is an object-to-object comparison, but what should one do when one object might be fragmented into several?
Goal
I would like a metric that, per class, gives me something like this:
class 1: [-1,  2] # bounding boxes for class one, on average, start one
                  # pixel before they should and end two pixels after
                  # they should
class 2: [ 0,  3] # bounding boxes for class two, on average, start
                  # exactly where they should and end three pixels
                  # after they should
class 3: [ 3, -1] # bounding boxes for class three, on average, start
                  # three pixels after where they should begin and end
                  # one pixel too soon
but I am not sure how to best approach this when a single object is fragmented into several...
python tensorflow machine-learning computer-vision semantic-segmentation
asked Nov 20 at 9:40
SumNeuron
1 Answer
Assumption
You ask specifically about the 1D case, so we will solve the 1D case here, but the method is essentially the same for 2D.
Let us assume you have two ground truth bounding boxes: box 1 and box 2.
Further, let us assume that our model is not so great and predicts more than two boxes (maybe it found something new, maybe it broke one box into two).
For this demonstration let us consider that this is what we are working with:
# labels
# box 1: x----y
# box 2: x++++y
#  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
#              x--------y        x+++++++++++++++++++++++++++++y  TRUTH
#              a-----------b                                      PRED 1, BOX 1
#                    a+++++++++++++++++b                          PRED 2, BOX 2
#                 a++++++++++++++++++++++++++++++++b              PRED 3, BOX 2
Core Problem
What you want is, in effect, a score on the alignment of your predictions to the targets... but oh no! Which targets belong to which predictions?
Pick your distance function of choice and pair each prediction with a target based on that function.
In this case I will use a modified intersection over union (IOU) for the 1D case.
I chose this function as I wanted both PRED 2 and 3 from the above diagram to align to box 2.
With a score for each prediction, pair it with the target that produced the best score.
Now that each prediction is paired with exactly one target (several predictions may share the same target), calculate whatever it is that you want.
Demo with above assumption
from the above assumptions:
pred_boxes = [
[4, 8],
[6, 12],
[5, 16]
]
true_boxes = [
[4, 7],
[10, 20]
]
a 1d version of intersection over union:
def iou_1d(predicted_boundary, target_boundary):
    '''Calculates the intersection over union (IOU) based on a span.

    Notes:
        boundaries are provided in the form of [start, stop].
        boundaries where start = stop are accepted
        boundaries are assumed to be only in range [0, int < inf)

    Args:
        predicted_boundary (list): the [start, stop] of the predicted boundary
        target_boundary (list): the ground truth [start, stop] for which to compare

    Returns:
        iou (float): the IOU bounded in [0, 1]
    '''
    p_lower, p_upper = predicted_boundary
    t_lower, t_upper = target_boundary

    # boundaries are in form [start, stop] and 0 <= start <= stop
    assert 0 <= p_lower <= p_upper
    assert 0 <= t_lower <= t_upper

    # no overlap, pred is too far left or pred is too far right
    if p_upper < t_lower or p_lower > t_upper:
        return 0

    if predicted_boundary == target_boundary:
        return 1

    intersection_lower_bound = max(p_lower, t_lower)
    intersection_upper_bound = min(p_upper, t_upper)

    # lengths are measured as stop - start, so a one-pixel span has length 0
    intersection = intersection_upper_bound - intersection_lower_bound
    union = max(t_upper, p_upper) - min(t_lower, p_lower)
    union = union if union != 0 else 1  # guard against division by zero
    return min(intersection / union, 1)
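Note that the function above measures lengths as stop - start, so a one-pixel span such as [4, 4] has length 0 and scores 0 against any target other than itself. If the boundaries are meant as inclusive pixel indices, a pixel-counting variant may be closer to the usual IOU. The sketch below is an assumption on my part, not the scoring function used in the rest of this answer, and iou_1d_pixels is a hypothetical name.

```python
def iou_1d_pixels(predicted_boundary, target_boundary):
    """IOU of two inclusive [start, stop] pixel spans, counting pixels."""
    p_lower, p_upper = predicted_boundary
    t_lower, t_upper = target_boundary

    # Number of shared pixels; zero or negative means the spans are disjoint.
    overlap = min(p_upper, t_upper) - max(p_lower, t_lower) + 1
    if overlap <= 0:
        return 0.0

    p_size = p_upper - p_lower + 1
    t_size = t_upper - t_lower + 1
    # |A union B| = |A| + |B| - |A intersect B|
    return overlap / (p_size + t_size - overlap)

print(iou_1d_pixels([4, 4], [4, 7]))  # 0.25: one shared pixel out of four
print(iou_1d_pixels([4, 8], [4, 7]))  # 0.8: four shared pixels out of five
```

With this variant a single-pixel prediction inside a target gets a non-zero score, so it can be paired without needing a fallback distance.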
some simple helpers:
from math import sqrt

def euclidean(u, v):
    return sqrt((u[0]-v[0])**2 + (u[1]-v[1])**2)

def mean(arr):
    return sum(arr) / len(arr)
how we align our boundaries:
def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
    '''Aligns predicted_boundary to the closest target_boundary based on the
    alignment_scoring_fn

    Args:
        predicted_boundary (list): the predicted boundary in form of [start, stop]
        target_boundaries (list): a list of all valid target boundaries each having
            form [start, stop]
        alignment_scoring_fn (function): a function taking two arguments each of
            which is a list of two elements, the first assumed to be the predicted
            boundary and the latter the target boundary. Should return a single number.
        take (function): should either be min or max. Selects either the highest or
            lowest score according to the alignment_scoring_fn

    Returns:
        aligned_boundary (list): the aligned boundary in form [start, stop]
    '''
    scores = [
        alignment_scoring_fn(predicted_boundary, target_boundary)
        for target_boundary in target_boundaries
    ]

    # boundary did not align to any boxes (all scores are 0), so use a fallback
    # scoring mechanism (inverse euclidean distance) to break the tie
    if not any(scores):
        scores = [
            1 / euclidean(predicted_boundary, target_boundary)
            for target_boundary in target_boundaries
        ]

    aligned_index = scores.index(take(scores))
    aligned = target_boundaries[aligned_index]
    return aligned
how we calculate difference:
def diff(u, v):
    return [u[0] - v[0], u[1] - v[1]]
combine it all into one:
def aligned_distance_1d(predicted_boundaries, target_boundaries,
                        alignment_scoring_fn=iou_1d, take=max,
                        distance_fn=diff, aggregate_fn=mean):
    '''Returns the aggregated distance of predicted bounding boxes to their
    aligned bounding box based on alignment_scoring_fn and distance_fn

    Args:
        predicted_boundaries (list): a list of all predicted boundaries each
            having form [start, stop]
        target_boundaries (list): a list of all valid target boundaries each having
            form [start, stop]
        alignment_scoring_fn (function): a function taking two arguments each of
            which is a list of two elements, the first assumed to be the predicted
            boundary and the latter the target boundary. Should return a single number.
        take (function): should either be min or max. Selects either the highest or
            lowest score according to the alignment_scoring_fn
        distance_fn (function): a function taking two boundaries (predicted,
            aligned target) and returning one signed error per coordinate,
            e.g. [start_error, stop_error]
        aggregate_fn (function): a function taking a list of numbers (the errors
            of one coordinate over all predictions) and returning a single value

    Returns:
        aggregated_distance (list): the aggregated error per coordinate,
            e.g. [mean_start_error, mean_stop_error]
    '''
    # pair every prediction with its best-scoring target
    paired = [
        (predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn, take))
        for predicted_boundary in predicted_boundaries
    ]
    distances = [distance_fn(*pair) for pair in paired]
    # zip(*distances) groups all start errors together and all stop errors together
    aggregated = [aggregate_fn(error) for error in zip(*distances)]
    return aggregated
run:
aligned_distance_1d(pred_boxes, true_boxes)
# [-3.0, -3.6666666666666665]
Note, for many predictions and many targets there are many ways to optimize the code. Here, I broke up the main functional chunks so it is clear what is going on.
Now does this make sense? Well, since I wanted PRED 2 and 3 to align to box 2, yes: both start before the truth (6 < 10, 5 < 10) and both end prematurely (12 < 20, 16 < 20).
Solution to question asked
copy pasted your examples:
# "detected" objects
p_obj = [
[[2, 3], [8, 8]], # class 1
[[4, 4], [6, 7]], # class 2
[[0, 0]] # class 3
]
# true objects
t_obj = [
[[1, 3], [6, 9]], # class 1
[[4, 7]], # class 2
[[0, 0]] # class 3
]
since you know the boxes per class this is easy:
[
aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
for cls_no in range(len(t_obj))
]
# [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]
Does this output make sense?
Starting with a sanity check, let us look at class 3. The average start and stop offsets are both 0. Makes sense.
How about class 1? Both predictions start too late (2 > 1, 8 > 6) but only one ends too soon (8 < 9). So that makes sense.
Now let us look at class 2, which is why it seems you asked the question (more predictions than targets).
If we were to draw what the score suggests it would be:
#  0  1  2  3  4  5  6  7  8  9
#              ----------  # truth [4, 7]
#                 ++       # pred [4 + 1, 7 - 1.5]
It doesn't look so great, but this is just an example...
Does this make sense? Yes and no. Yes in terms of how we calculated the metric: one prediction stopped 3 values too soon, the other started 2 too late.
No in the sense that neither of your predictions actually covers the value 5, and yet this metric leads you to believe that is the case...
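One possible complement, sketched below under my own assumptions, is to group all fragments assigned to the same target, compare the envelope of the group (earliest start, latest stop) with the target, and report separately what fraction of the target's pixels is actually covered. The names signed_overlap and merged_offsets are hypothetical, and the assignment rule (largest inclusive pixel overlap, which goes negative with the size of the gap) is a simplification, not the alignment used above.

```python
def signed_overlap(pred, target):
    """Shared pixels of two inclusive spans; zero or negative when they are disjoint."""
    return min(pred[1], target[1]) - max(pred[0], target[0]) + 1

def merged_offsets(predicted_boundaries, target_boundaries):
    """Per target: [start, stop] offset of the merged fragments and covered fraction."""
    groups = {i: [] for i in range(len(target_boundaries))}
    for pred in predicted_boundaries:
        # Assign each fragment to the target it overlaps most (or is nearest to).
        best = max(range(len(target_boundaries)),
                   key=lambda i: signed_overlap(pred, target_boundaries[i]))
        groups[best].append(pred)

    report = []
    for i, target in enumerate(target_boundaries):
        fragments = groups[i]
        if not fragments:
            # A target with no fragments is a miss; there is no offset to report.
            report.append({"target": target, "offset": None, "coverage": 0.0})
            continue
        start = min(f[0] for f in fragments)
        stop = max(f[1] for f in fragments)
        covered = set()
        for f in fragments:
            covered.update(range(max(f[0], target[0]), min(f[1], target[1]) + 1))
        size = target[1] - target[0] + 1
        report.append({
            "target": target,
            "offset": [start - target[0], stop - target[1]],
            "coverage": len(covered) / size,
        })
    return report

# class 2 from the question: one target, two fragments
print(merged_offsets([[4, 4], [6, 7]], [[4, 7]]))
# [{'target': [4, 7], 'offset': [0, 0], 'coverage': 0.75}]
```

Here the offset says the merged fragments start and stop exactly where the target does, while the coverage of 0.75 records that pixel 5 is missing, which the averaged offsets alone cannot show.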
Conclusion
Is this a faulty metric?
Depends on what you are using it for / trying to show.
However, since you use a binary mask to generate your predicted boundaries, that is a non-negligible root of this problem. Perhaps there is a better strategy to get boundaries from your label probabilities.
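As one example of such a strategy (an assumption on my part, not something tested beyond the toy data in the question), hysteresis thresholding keeps every run of pixels above a lower cutoff as long as the run contains at least one pixel above the high cutoff. The function name hysteresis_runs and the low cutoff of 0.5 are arbitrary illustrative choices.

```python
import numpy as np

def hysteresis_runs(probs, high=0.9, low=0.5):
    """Inclusive [start, stop] runs where probs > low, kept only if some value > high."""
    probs = np.asarray(probs, dtype=float)
    # Candidate regions: everything above the low cutoff, zero-padded at both ends.
    candidate = np.concatenate(([0], (probs > low).astype(int), [0]))
    changes = np.diff(candidate)
    starts = np.flatnonzero(changes == 1)
    stops = np.flatnonzero(changes == -1) - 1
    return [
        [int(a), int(b)]
        for a, b in zip(starts, stops)
        if (probs[a:b + 1] > high).any()  # keep runs seeded by a confident pixel
    ]

output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

detected = [hysteresis_runs(row) for row in output]
print(detected)
# [[[1, 3], [6, 9]], [[4, 8]], [[0, 0]]]
```

On this toy example the fragmentation disappears, although class 2 now ends one pixel late ([4, 8] against a truth of [4, 7]), so the cutoffs would still need tuning on real data.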
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53390093%2fmetric-for-evaluating-predicted-bounding-boxes-from-semantic-segmentation-on-an%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Assumption
You ask specifically about the 1D case, so we will solve the 1D case here, but the method is essentially the same for 2D.
Let us assume you have two ground truth bounding boxes: box 1 and box 2.
Further, let us assume that our model is not so great and predicts more than 2 boxes
(maybe it found something new, maybe it broke one box into two).
For this demonstration let us consider that this is what we are working with:
# labels
# box 1: x----y
# box 2: x++++y
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# x--------y x+++++++++++++++++++++++++++++y TRUTH
# a-----------b PRED 1, BOX 1
# a+++++++++++++++++b PRED 2, BOX 2
# a++++++++++++++++++++++++++++++++b PRED 3, BOX 2
Core Problem
What you want is in effect a score on the alignment of your predictions to the targets.... but oh no! which targets
belong to which predictions?
Pick your distance function of choice and pair each prediction with a target based on that function.
In this case I will use a modified intersection over union (IOU) for the 1D case.
I chose this function as I wanted both PRED 2 and 3 from the above diagram to align to box 2.
With a score for each prediction, pair it with the target that produced the best score.
Now with a one-to-one prediction-target pair, calculate whatever it is that you want.
Demo with above assumption
from the above assumptions:
pred_boxes = [
[4, 8],
[6, 12],
[5, 16]
]
true_boxes = [
[4, 7],
[10, 20]
]
a 1d version of intersection over union:
def iou_1d(predicted_boundary, target_boundary):
'''Calculates the intersection over union (IOU) based on a span.
Notes:
boundaries are provided in the the form of [start, stop].
boundaries where start = stop are accepted
boundaries are assumed to be only in range [0, int < inf)
Args:
predicted_boundary (list): the [start, stop] of the predicted boundary
target_boundary (list): the ground truth [start, stop] for which to compare
Returns:
iou (float): the IOU bounded in [0, 1]
'''
p_lower, p_upper = predicted_boundary
t_lower, t_upper = target_boundary
# boundaries are in form [start, stop] and 0<= start <= stop
assert 0<= p_lower <= p_upper
assert 0<= t_lower <= t_upper
# no overlap, pred is too far left or pred is too far right
if p_upper < t_lower or p_lower > t_upper:
return 0
if predicted_boundary == target_boundary:
return 1
intersection_lower_bound = max(p_lower, t_lower)
intersection_upper_bound = min(p_upper, t_upper)
intersection = intersection_upper_bound - intersection_lower_bound
union = max(t_upper, p_upper) - min(t_lower, p_lower)
union = union if union != 0 else 1
return min(intersection / union, 1)
some simple helpers:
from math import sqrt
def euclidean(u, v):
return sqrt((u[0]-v[0])**2 + (u[1]-v[1])**2)
def mean(arr):
return sum(arr) / len(arr)
how we align our boundaries:
def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
'''Aligns predicted_bondary to the closest target_boundary based on the
alignment_scoring_fn
Args:
predicted_boundary (list): the predicted boundary in form of [start, stop]
target_boundaries (list): a list of all valid target boundaries each having
form [start, stop]
alignment_scoring_fn (function): a function taking two arguments each of
which is a list of two elements, the first assumed to be the predicted
boundary and the latter the target boundary. Should return a single number.
take (function): should either be min or max. Selects either the highest or
lower score according to the alignment_scoring_fn
Returns:
aligned_boundary (list): the aligned boundary in form [start, stop]
'''
scores = [
alignment_scoring_fn(predicted_boundary, target_boundary)
for target_boundary in target_boundaries
]
# boundary did not align to any boxes, use fallback scoring mechanism to break
# tie
if not any(scores):
scores = [
1 / euclidean(predicted_boundary, target_boundary)
for target_boundary in target_boundaries
]
aligned_index = scores.index(take(scores))
aligned = target_boundaries[aligned_index]
return aligned
how we calculate difference:
def diff(u, v):
return [u[0] - v[0], u[1] - v[1]]
combine it all into one:
def aligned_distance_1d(predicted_boundaries, target_boundaries, alignment_scoring_fn=iou_1d, take=max, distance_fn=diff, aggregate_fn=mean):
'''Returns the aggregated distance of predicted boundings boxes to their aligned bounding box based on alignment_scoring_fn and distance_fn
Args:
predicted_boundaries (list): a list of all valid target boundaries each
having form [start, stop]
target_boundaries (list): a list of all valid target boundaries each having
form [start, stop]
alignment_scoring_fn (function): a function taking two arguments each of
which is a list of two elements, the first assumed to be the predicted
boundary and the latter the target boundary. Should return a single number.
take (function): should either be min or max. Selects either the highest or
lower score according to the alignment_scoring_fn
distance_fn (function): a function taking two lists and should return a
single value.
aggregate_fn (function): a function taking a list of numbers (distances
calculated by distance_fn) and returns a single value (the aggregated
distance)
Returns:
aggregated_distnace (float): return the aggregated distance of the
aligned predicted_boundaries
aggregated_fn([distance_fn(pair) for pair in paired_boundaries(predicted_boundaries, target_boundaries)])
'''
paired = [
(predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn))
for predicted_boundary in predicted_boundaries
]
distances = [distance_fn(*pair) for pair in paired]
aggregated = [aggregate_fn(error) for error in zip(*distances)]
return aggregated
run:
aligned_distance_1d(pred_boxes, true_boxes)
# [-3.0, -3.6666666666666665]
Note, for many predictions and many targets there are many ways to optimize the code. Here, I broke up the main functional chunks so it is clear what is going on.
Now does this make sense? Well since I wanted pred 2 and 3 to align to box 2, yes, both starts are prior the truth and both end prematurely.
Solution to question asked
copy pasted your examples:
# "detected" objects
p_obj = [
[[2, 3], [8, 8]], # class 1
[[4, 4], [6, 7]], # class 2
[[0, 0]] # class 3
]
# true objects
t_obj = [
[[1, 3], [6, 9]], # class 1
[[4, 7]], # class 2
[[0, 0]] # class 3
]
since you know the boxes per class this is easy:
[
aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
for cls_no in range(len(t_obj))
]
# [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]
Does this output make sense?
Starting with a sanity check, let us look at class 3. The average distance of [start, stop] are both 0. Makes sense.
How about class 1? both predictions start too late (2 > 1, 8 > 6) but only one ends too soon (8 < 9). So makes sense.
Now let us look at class 2, which is why it seems you asked the question (more predictions than targets).
If we were to draw what the score suggests it would be:
# 0 1 2 3 4 5 6 7 8 9
# ---------- # truth [4, 7]
# ++ # pred [4 + 1, 7 - 1.5]
It doesn't look so great, but this is just an example...
Does this make sense? Yes / no. Yes in terms of how we calculated the metric. One stoped 3 values too soon the other started 2 too late.
No in the sense that neither of your predictions actually cover the value 5, and yet this metric leads you to believe that is the case...
Conclusion
Is this a faulty metric?
Depends on what you are using it for / trying to show.
However since you use a binary mask to generate you predicted boundaries, that is a non negligible root of this problem. Perhaps there is a better strategy to get boundaries from your label probabilities.
add a comment |
Assumption
You ask specifically about the 1D case, so we will solve the 1D case here, but the method is essentially the same for 2D.
Let us assume you have two ground truth bounding boxes: box 1 and box 2.
Further, let us assume that our model is not so great and predicts more than 2 boxes
(maybe it found something new, maybe it broke one box into two).
For this demonstration let us consider that this is what we are working with:
# labels
# box 1: x----y
# box 2: x++++y
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# x--------y x+++++++++++++++++++++++++++++y TRUTH
# a-----------b PRED 1, BOX 1
# a+++++++++++++++++b PRED 2, BOX 2
# a++++++++++++++++++++++++++++++++b PRED 3, BOX 2
Core Problem
What you want is in effect a score on the alignment of your predictions to the targets.... but oh no! which targets
belong to which predictions?
Pick your distance function of choice and pair each prediction with a target based on that function.
In this case I will use a modified intersection over union (IOU) for the 1D case.
I chose this function as I wanted both PRED 2 and 3 from the above diagram to align to box 2.
With a score for each prediction, pair it with the target that produced the best score.
Now with a one-to-one prediction-target pair, calculate whatever it is that you want.
Demo with above assumption
from the above assumptions:
pred_boxes = [
[4, 8],
[6, 12],
[5, 16]
]
true_boxes = [
[4, 7],
[10, 20]
]
a 1d version of intersection over union:
def iou_1d(predicted_boundary, target_boundary):
'''Calculates the intersection over union (IOU) based on a span.
Notes:
boundaries are provided in the the form of [start, stop].
boundaries where start = stop are accepted
boundaries are assumed to be only in range [0, int < inf)
Args:
predicted_boundary (list): the [start, stop] of the predicted boundary
target_boundary (list): the ground truth [start, stop] for which to compare
Returns:
iou (float): the IOU bounded in [0, 1]
'''
p_lower, p_upper = predicted_boundary
t_lower, t_upper = target_boundary
# boundaries are in form [start, stop] and 0<= start <= stop
assert 0<= p_lower <= p_upper
assert 0<= t_lower <= t_upper
# no overlap, pred is too far left or pred is too far right
if p_upper < t_lower or p_lower > t_upper:
return 0
if predicted_boundary == target_boundary:
return 1
intersection_lower_bound = max(p_lower, t_lower)
intersection_upper_bound = min(p_upper, t_upper)
intersection = intersection_upper_bound - intersection_lower_bound
union = max(t_upper, p_upper) - min(t_lower, p_lower)
union = union if union != 0 else 1
return min(intersection / union, 1)
some simple helpers:
from math import sqrt
def euclidean(u, v):
return sqrt((u[0]-v[0])**2 + (u[1]-v[1])**2)
def mean(arr):
return sum(arr) / len(arr)
how we align our boundaries:
def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
    '''Aligns predicted_boundary to the closest target_boundary based on the
    alignment_scoring_fn
    Args:
        predicted_boundary (list): the predicted boundary in form of [start, stop]
        target_boundaries (list): a list of all valid target boundaries each having
            form [start, stop]
        alignment_scoring_fn (function): a function taking two arguments each of
            which is a list of two elements, the first assumed to be the predicted
            boundary and the latter the target boundary. Should return a single number.
        take (function): should either be min or max. Selects either the highest or
            lowest score according to the alignment_scoring_fn
    Returns:
        aligned_boundary (list): the aligned boundary in form [start, stop]
    '''
    scores = [
        alignment_scoring_fn(predicted_boundary, target_boundary)
        for target_boundary in target_boundaries
    ]
    # boundary did not align to any boxes (every score is 0), use a fallback
    # scoring mechanism (inverse euclidean distance) to break the tie
    if not any(scores):
        scores = [
            1 / euclidean(predicted_boundary, target_boundary)
            for target_boundary in target_boundaries
        ]
    aligned_index = scores.index(take(scores))
    aligned = target_boundaries[aligned_index]
    return aligned
how we calculate difference:
def diff(u, v):
    return [u[0] - v[0], u[1] - v[1]]
combine it all into one:
def aligned_distance_1d(predicted_boundaries, target_boundaries, alignment_scoring_fn=iou_1d, take=max, distance_fn=diff, aggregate_fn=mean):
    '''Returns the aggregated distance of predicted bounding boxes to their
    aligned bounding box based on alignment_scoring_fn and distance_fn
    Args:
        predicted_boundaries (list): a list of all predicted boundaries each
            having form [start, stop]
        target_boundaries (list): a list of all valid target boundaries each having
            form [start, stop]
        alignment_scoring_fn (function): a function taking two arguments each of
            which is a list of two elements, the first assumed to be the predicted
            boundary and the latter the target boundary. Should return a single number.
        take (function): should either be min or max. Selects either the highest or
            lowest score according to the alignment_scoring_fn
        distance_fn (function): a function taking two boundaries (predicted,
            aligned target) and returning one distance per coordinate, e.g.
            [start_distance, stop_distance].
        aggregate_fn (function): a function taking a list of numbers (distances
            calculated by distance_fn) and returns a single value (the aggregated
            distance)
    Returns:
        aggregated_distance (list): the aggregated distance of the aligned
            predicted_boundaries, one value per coordinate, i.e. aggregate_fn
            applied separately to all start distances and all stop distances
    '''
    # pair every prediction with the target it aligns to best
    paired = [
        (predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn, take))
        for predicted_boundary in predicted_boundaries
    ]
    distances = [distance_fn(*pair) for pair in paired]
    # zip(*distances) groups all start distances together and all stop distances together
    aggregated = [aggregate_fn(error) for error in zip(*distances)]
    return aggregated
run:
aligned_distance_1d(pred_boxes, true_boxes)
# [-3.0, -3.6666666666666665]
Note, for many predictions and many targets there are many ways to optimize the code. Here, I broke up the main functional chunks so it is clear what is going on.
Now does this make sense? Since I wanted PRED 2 and PRED 3 to align to box 2, yes: both start before the true start of box 2 and both end prematurely, which is why both averages are negative.
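To see where the two averages come from, the pairings from the demo can be written out by hand. This is only a worked check of the arithmetic; the `pairs` list below hard-codes the alignment (PRED 1 to box 1, PRED 2 and PRED 3 to box 2) instead of calling `align_1d`.

```python
# Pairings produced by the IOU alignment in the demo, written out by hand.
pairs = [
    ([4, 8], [4, 7]),     # PRED 1 -> box 1
    ([6, 12], [10, 20]),  # PRED 2 -> box 2
    ([5, 16], [10, 20]),  # PRED 3 -> box 2
]

# Signed [start, stop] error of each prediction relative to its target.
errors = [[p[0] - t[0], p[1] - t[1]] for p, t in pairs]
print(errors)  # [[0, 1], [-4, -8], [-5, -4]]

# Average the start errors and the stop errors separately.
mean_start = sum(e[0] for e in errors) / len(errors)
mean_stop = sum(e[1] for e in errors) / len(errors)
print([mean_start, mean_stop])  # [-3.0, -3.6666666666666665]
```

PRED 1 contributes a small positive stop error (it ends one position late), but the two predictions aligned to box 2 dominate both averages.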
Solution to question asked
copy pasted your examples:
# "detected" objects
p_obj = [
[[2, 3], [8, 8]], # class 1
[[4, 4], [6, 7]], # class 2
[[0, 0]] # class 3
]
# true objects
t_obj = [
[[1, 3], [6, 9]], # class 1
[[4, 7]], # class 2
[[0, 0]] # class 3
]
since you know the boxes per class this is easy:
[
aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
for cls_no in range(len(t_obj))
]
# [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]
Does this output make sense?
Starting with a sanity check, let us look at class 3. The average start and stop distances are both 0, which makes sense.
How about class 1? Both predictions start too late (2 > 1, 8 > 6), but only one ends too soon (8 < 9), so this makes sense as well.
Now let us look at class 2, which is why it seems you asked the question (more predictions than targets).
If we were to draw what the score suggests it would be:
#  0  1  2  3  4  5  6  7  8  9
#              ----------        # truth [4, 7]
#                 ++             # pred [4 + 1, 7 - 1.5]
It doesn't look so great, but this is just an example...
Does this make sense? Yes and no. Yes in terms of how we calculated the metric: one prediction stopped 3 values too soon and the other started 2 too late.
No in the sense that neither of your predictions actually covers the value 5, and yet this metric leads you to believe that it is covered.
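One way to make that gap visible is to score all fragments assigned to a target as a single set of pixels and to report the fragment count next to it. The following is a rough sketch, not part of the metric above: `span_pixels` and `fragment_coverage` are hypothetical helpers, spans are assumed to be inclusive pixel indices, and the fragments are assumed to have already been assigned to the target.

```python
def span_pixels(boundary):
    """Set of pixel indices covered by an inclusive [start, stop] span."""
    start, stop = boundary
    return set(range(start, stop + 1))


def fragment_coverage(fragments, target):
    """Score all fragments assigned to one target as a single pixel set.

    Returns (iou, n_fragments): the pixel IoU of the merged fragments against
    the target, and how many fragments the target was split into.
    """
    target_px = span_pixels(target)
    pred_px = set()
    for fragment in fragments:
        pred_px |= span_pixels(fragment)
    union = target_px | pred_px
    iou = len(target_px & pred_px) / len(union)
    return iou, len(fragments)


print(fragment_coverage([[4, 4], [6, 7]], [4, 7]))  # (0.75, 2)
print(fragment_coverage([[0, 0]], [0, 0]))          # (1.0, 1)
```

For class 2 this reports that three of the four true pixels are covered and that the object was split into two fragments, which the averaged [start, stop] distances cannot express.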
Conclusion
Is this a faulty metric?
Depends on what you are using it for / trying to show.
However, since you use a binary mask to generate your predicted boundaries, the binarization is a non-negligible root of this problem. Perhaps there is a better strategy to get boundaries from your label probabilities.
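As one illustration of such a strategy (an assumption on my part, not something the question prescribes), runs of 1s that are separated by only a small gap could be merged when converting the mask to boundaries. `mask_to_spans` and its `max_gap` parameter are hypothetical names for this sketch.

```python
def mask_to_spans(mask, max_gap=0):
    """Convert a binary row into inclusive [start, stop] spans.

    Runs of 1s separated by at most `max_gap` zeros are merged into one span.
    """
    spans = []
    for position, value in enumerate(mask):
        if not value:
            continue
        if spans and position - spans[-1][1] - 1 <= max_gap:
            # Close enough to the previous run: extend it.
            spans[-1][1] = position
        else:
            # Start a new run at this position.
            spans.append([position, position])
    return spans


class_2 = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]  # class 2 row of b_mask
print(mask_to_spans(class_2))             # [[4, 4], [6, 7]]
print(mask_to_spans(class_2, max_gap=1))  # [[4, 7]]
```

With `max_gap=1` the two class 2 fragments collapse into the true object [4, 7]. The trade-off is that a large `max_gap` can also merge objects that really are distinct, so the value would need to be chosen with the data in mind.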
answered Nov 21 at 20:38
SumNeuron