Metric for evaluating predicted bounding boxes from semantic segmentation on an object level outside of...


























Context



For simplicity, let us pretend we are performing semantic segmentation on a series of one-pixel-high images of width w with three channels (r, g, b) and n label classes.



In other words, a single image might look like:



img = [
[r1, r2, ..., rw], # channel r
[g1, g2, ..., gw], # channel g
[b1, b2, ..., bw], # channel b
]


and has dimensions [3, w].



Then, for a given image with w=10 and n=3, its ground-truth labels might be:



import numpy as np

# ground "truth"
target = np.array([
    #0  1  2  3  4  5  6  7  8  9    # position
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])


and our model might predict as output:



# prediction
output = np.array([
    #0     1     2     3     4     5     6     7     8     9       # position
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])


For further simplicity, let us binarize our model's output with a cutoff of 0.9:



# binary mask with cutoff 0.9
b_mask = np.array([
    #0  1  2  3  4  5  6  7  8  9    # position
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])
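The binarization can be reproduced with an element-wise NumPy comparison. This is a minimal sketch assuming a strict `>` comparison; none of the example probabilities equals 0.9 exactly, so `>=` would give the same mask here.

```python
import numpy as np

output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

# The comparison yields a boolean array; cast it to 0/1 integers.
b_mask = (output > 0.9).astype(int)
print(b_mask)
```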


If we then look at the "objects" of each class, i.e. their bounding boxes (or in this 1D case just their boundaries, the inclusive [start, stop] pixels), the binary mask yields the following predicted objects:



# "detected" objects
p_obj = [
[[2, 3], [8, 8]], # class 1
[[4, 4], [6, 7]], # class 2
[[0, 0]] # class 3
]


These can be compared to the objects of the ground truth:



# true objects
t_obj = [
[[1, 3], [6, 9]], # class 1
[[4, 7]], # class 2
[[0, 0]] # class 3
]
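Both object lists can be derived mechanically from the masks by finding runs of consecutive 1s in each class row. A minimal sketch, assuming inclusive [start, stop] indices as above; `mask_to_spans` is a hypothetical helper name.

```python
import numpy as np


def mask_to_spans(row):
    """Return inclusive [start, stop] indices of each run of 1s in a 1D 0/1 row."""
    row = np.asarray(row, dtype=int)
    # Pad with zeros so runs touching either edge still produce a rising
    # and a falling edge in the difference.
    edges = np.diff(np.concatenate(([0], row, [0])))
    starts = np.flatnonzero(edges == 1)       # 0 -> 1 transitions
    stops = np.flatnonzero(edges == -1) - 1   # last index before a 1 -> 0 transition
    return [[int(s), int(e)] for s, e in zip(starts, stops)]


target = np.array([
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])
b_mask = np.array([
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])

# Each class row is processed independently, so overlapping classes are fine.
t_obj = [mask_to_spans(row) for row in target]
p_obj = [mask_to_spans(row) for row in b_mask]
```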


Question



If I wanted a metric to describe the accuracy of the boundaries, on average, per object, what would be the appropriate metric?



I understand IOU in the context of training a model which predicts bounding boxes, i.e. it is an object-to-object comparison, but what should one do when one object might be fragmented into several?



Goal



I would like a metric that, per class, gives me something like this:



class 1: [-1,  2]  # bounding boxes for class one, on average, start one
                   # pixel before they should and end two pixels after
                   # they should

class 2: [ 0,  3]  # bounding boxes for class two, on average, start
                   # exactly where they should and end three pixels
                   # after they should

class 3: [ 3, -1]  # bounding boxes for class three, on average, start
                   # three pixels after they should and end one
                   # pixel too soon


But I am not sure how best to approach this when a single object is fragmented into several.











      python tensorflow machine-learning computer-vision semantic-segmentation
















      asked Nov 20 at 9:40









      SumNeuron

























          1 Answer
































          Assumption



          You ask specifically about the 1D case, so we will solve the 1D case here, but the method is essentially the same for 2D.



          Let us assume you have two ground truth bounding boxes: box 1 and box 2.



          Further, let us assume that our model is not so great and predicts more than 2 boxes
          (maybe it found something new, maybe it broke one box into two).



          For this demonstration let us consider that this is what we are working with:



          # labels
          # box 1: x----y
          # box 2: x++++y
          #   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
          #               x--------y        x+++++++++++++++++++++++++++++y  TRUTH
          #               a-----------b                                      PRED 1, BOX 1
          #                     a+++++++++++++++++b                          PRED 2, BOX 2
          #                  a++++++++++++++++++++++++++++++++b              PRED 3, BOX 2


          Core Problem



          What you want is, in effect, a score for how well your predictions align with the targets. But oh no! Which targets
          belong to which predictions?



          Pick a scoring function of your choice and pair each prediction with a target based on that function.
          In this case I will use a modified intersection over union (IOU) for the 1D case.
          I chose this function because I wanted both PRED 2 and PRED 3 from the above diagram to align to box 2.



          With a score for each prediction, pair it with the target that produced the best score.



          Now, with each prediction paired to exactly one target (several predictions may share the same target), calculate whatever it is that you want.



          Demo with above assumption



          from the above assumptions:



          pred_boxes = [
          [4, 8],
          [6, 12],
          [5, 16]
          ]

          true_boxes = [
          [4, 7],
          [10, 20]
          ]


          a 1d version of intersection over union:



          def iou_1d(predicted_boundary, target_boundary):
              '''Calculates the intersection over union (IOU) based on a span.

              Notes:
                  boundaries are provided in the form of [start, stop].
                  boundaries where start = stop are accepted.
                  boundaries are assumed to be only in range [0, int < inf).

              Args:
                  predicted_boundary (list): the [start, stop] of the predicted boundary
                  target_boundary (list): the ground truth [start, stop] for which to compare

              Returns:
                  iou (float): the IOU bounded in [0, 1]
              '''

              p_lower, p_upper = predicted_boundary
              t_lower, t_upper = target_boundary

              # boundaries are in form [start, stop] and 0 <= start <= stop
              assert 0 <= p_lower <= p_upper
              assert 0 <= t_lower <= t_upper

              # no overlap, pred is too far left or pred is too far right
              if p_upper < t_lower or p_lower > t_upper:
                  return 0

              if predicted_boundary == target_boundary:
                  return 1

              intersection_lower_bound = max(p_lower, t_lower)
              intersection_upper_bound = min(p_upper, t_upper)

              # lengths are measured as stop - start, so a single-pixel span has length 0
              intersection = intersection_upper_bound - intersection_lower_bound
              union = max(t_upper, p_upper) - min(t_lower, p_lower)
              union = union if union != 0 else 1
              return min(intersection / union, 1)
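A note on conventions: iou_1d measures a span's length as stop - start, so a single-pixel span such as [8, 8] or [4, 4] scores 0 against any target other than an identical span, which is why such predictions end up being aligned by the Euclidean fallback. If boundaries are inclusive pixel indices, as in the question, a pixel-counting variant may be closer to what you want. The sketch below is a hypothetical alternative (`iou_1d_inclusive` is not part of the approach in this answer) that could be passed as alignment_scoring_fn. It does change the demo: PRED 2 ([6, 12]) would score 2/9 against box 1 and 3/15 against box 2, so it would align to box 1 instead.

```python
def iou_1d_inclusive(predicted_boundary, target_boundary):
    """IOU of two inclusive pixel spans [start, stop], counting pixels.

    Hypothetical variant of iou_1d: a span [s, e] covers e - s + 1 pixels,
    so single-pixel spans such as [8, 8] have length 1 rather than 0.
    """
    p_lower, p_upper = predicted_boundary
    t_lower, t_upper = target_boundary
    assert 0 <= p_lower <= p_upper
    assert 0 <= t_lower <= t_upper

    # Number of shared pixels; a negative overlap means the spans are disjoint.
    intersection = max(0, min(p_upper, t_upper) - max(p_lower, t_lower) + 1)
    # Pixels covered by either span: |A| + |B| - |A intersect B| (always >= 1).
    union = (p_upper - p_lower + 1) + (t_upper - t_lower + 1) - intersection
    return intersection / union
```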


          some simple helpers:



          from math import sqrt

          def euclidean(u, v):
              return sqrt((u[0] - v[0])**2 + (u[1] - v[1])**2)

          def mean(arr):
              return sum(arr) / len(arr)


          how we align our boundaries:



          def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
              '''Aligns predicted_boundary to the closest target_boundary based on the
              alignment_scoring_fn

              Args:
                  predicted_boundary (list): the predicted boundary in form of [start, stop]

                  target_boundaries (list): a list of all valid target boundaries each having
                      form [start, stop]

                  alignment_scoring_fn (function): a function taking two arguments each of
                      which is a list of two elements, the first assumed to be the predicted
                      boundary and the latter the target boundary. Should return a single number.

                  take (function): should either be min or max. Selects either the highest or
                      lowest score according to the alignment_scoring_fn

              Returns:
                  aligned_boundary (list): the aligned boundary in form [start, stop]
              '''
              scores = [
                  alignment_scoring_fn(predicted_boundary, target_boundary)
                  for target_boundary in target_boundaries
              ]

              # boundary did not align to any boxes, use fallback scoring mechanism to break
              # the tie (inverse Euclidean distance, so higher is closer; assumes take=max)
              if not any(scores):
                  scores = [
                      1 / euclidean(predicted_boundary, target_boundary)
                      for target_boundary in target_boundaries
                  ]

              aligned_index = scores.index(take(scores))
              aligned = target_boundaries[aligned_index]
              return aligned


          how we calculate difference:



          def diff(u, v):
              return [u[0] - v[0], u[1] - v[1]]


          combine it all into one:



          def aligned_distance_1d(predicted_boundaries, target_boundaries, alignment_scoring_fn=iou_1d, take=max, distance_fn=diff, aggregate_fn=mean):
              '''Returns the aggregated distance of predicted bounding boxes to their aligned bounding box based on alignment_scoring_fn and distance_fn

              Args:
                  predicted_boundaries (list): a list of all predicted boundaries each
                      having form [start, stop]

                  target_boundaries (list): a list of all valid target boundaries each having
                      form [start, stop]

                  alignment_scoring_fn (function): a function taking two arguments each of
                      which is a list of two elements, the first assumed to be the predicted
                      boundary and the latter the target boundary. Should return a single number.

                  take (function): should either be min or max. Selects either the highest or
                      lowest score according to the alignment_scoring_fn

                  distance_fn (function): a function taking a predicted and a target boundary
                      and returning one distance per coordinate, i.e. [start, stop].

                  aggregate_fn (function): a function taking a list of numbers (one
                      coordinate of the distances calculated by distance_fn) and returning
                      a single value (the aggregated distance)

              Returns:
                  aggregated (list): the aggregated [start, stop] distance of the
                      aligned predicted_boundaries
              '''
              paired = [
                  (predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn, take))
                  for predicted_boundary in predicted_boundaries
              ]
              distances = [distance_fn(*pair) for pair in paired]
              # aggregate the start offsets and the stop offsets separately
              aggregated = [aggregate_fn(error) for error in zip(*distances)]
              return aggregated


          run:



          aligned_distance_1d(pred_boxes, true_boxes)

          # [-3.0, -3.6666666666666665]


          Note that with many predictions and many targets, there are many ways to optimize this code. Here, I broke it up into its main functional chunks so it is clear what is going on.
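As one example of such an optimization, the pairing can be vectorized with NumPy broadcasting, computing the whole prediction-by-target IOU matrix at once. This is a sketch assuming the default arguments (iou_1d scoring with take=max, diff as the distance, mean as the aggregate); `aligned_distance_1d_np` is a hypothetical name. Ties are broken by the first index, as `scores.index` does in align_1d. For the demo boxes it gives [-3.0, -3.666...], and for the per-class examples it gives the same values as the loop version.

```python
import numpy as np


def aligned_distance_1d_np(pred, true):
    """Vectorized sketch of aligned_distance_1d with its default arguments.

    pred: non-empty (P, 2) array-like of [start, stop] predictions.
    true: non-empty (T, 2) array-like of [start, stop] targets.
    Returns the mean [start, stop] offset of each prediction from its
    aligned target (prediction minus target).
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    p_lo, p_hi = pred[:, 0:1], pred[:, 1:2]          # shape (P, 1)
    t_lo, t_hi = true[None, :, 0], true[None, :, 1]  # shape (1, T)

    # Pairwise IOU matrix (P, T) with iou_1d's convention: length = stop - start.
    inter = np.clip(np.minimum(p_hi, t_hi) - np.maximum(p_lo, t_lo), 0, None)
    union = np.maximum(p_hi, t_hi) - np.minimum(p_lo, t_lo)
    iou = np.where(union > 0, inter / np.where(union > 0, union, 1), 0.0)
    # Identical spans score 1, as in iou_1d.
    iou = np.where((p_lo == t_lo) & (p_hi == t_hi), 1.0, iou)

    # Fallback for predictions that overlap no target: nearest target by
    # Euclidean distance on (start, stop), i.e. the max of 1 / distance.
    dist = np.linalg.norm(pred[:, None, :] - true[None, :, :], axis=-1)
    no_overlap = ~iou.any(axis=1)
    idx = np.where(no_overlap, dist.argmin(axis=1), iou.argmax(axis=1))

    return (pred - true[idx]).mean(axis=0)
```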



          Now, does this make sense? Since I wanted PRED 2 and PRED 3 to align to box 2, yes: both start before the truth and both end prematurely.



          Solution to question asked



          Copying your examples:



          # "detected" objects
          p_obj = [
          [[2, 3], [8, 8]], # class 1
          [[4, 4], [6, 7]], # class 2
          [[0, 0]] # class 3
          ]

          # true objects
          t_obj = [
          [[1, 3], [6, 9]], # class 1
          [[4, 7]], # class 2
          [[0, 0]] # class 3
          ]


          Since you know the boxes per class, this is easy:



          [
          aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
          for cls_no in range(len(t_obj))
          ]


          # [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]


          Does this output make sense?



          Starting with a sanity check, let us look at class 3. The average start and stop offsets are both 0. Makes sense.



          How about class 1? Both predictions start too late (2 > 1, 8 > 6), but only one ends too soon (8 < 9). So this makes sense.



          Now let us look at class 2, which seems to be why you asked the question (more predictions than targets).



          If we were to draw what the score suggests, it would look like this:



          #  0  1  2  3  4  5  6  7  8  9
          #              ----------   # truth [4, 7]
          #                 ++        # pred [4 + 1, 7 - 1.5]


          It doesn't look so great, but this is just an example...



          Does this make sense? Yes and no. Yes, in terms of how we calculated the metric: one prediction stopped 3 pixels too soon and the other started 2 too late.
          No, in the sense that neither of your predictions actually covers pixel 5, and yet this metric leads you to believe that it does.
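One way to expose that gap, as a complement to the offset metric rather than a replacement, is to also report how much of each target the predictions of its class actually cover. A minimal sketch, assuming inclusive pixel indices; `coverage` is a hypothetical helper, not part of the approach above. For class 2 it gives 0.75, because pixel 5 of [4, 7] is missed.

```python
def coverage(target_boundary, predicted_boundaries):
    """Fraction of the target's pixels covered by at least one prediction.

    All boundaries are inclusive pixel indices [start, stop].
    """
    t_lower, t_upper = target_boundary
    target_pixels = set(range(t_lower, t_upper + 1))
    covered = set()
    for p_lower, p_upper in predicted_boundaries:
        covered.update(range(p_lower, p_upper + 1))
    # Only pixels inside the target count; extra predicted pixels are ignored.
    return len(target_pixels & covered) / len(target_pixels)


print(coverage([4, 7], [[4, 4], [6, 7]]))  # class 2
```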



          Conclusion



          Is this a faulty metric?



          It depends on what you are using it for and what you are trying to show.
          However, since you use a binary mask to generate your predicted boundaries, that is a non-negligible root of this problem. Perhaps there is a better strategy to get boundaries from your label probabilities.
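As an illustration of one such strategy (an assumption, not something established above), hysteresis thresholding, the two-threshold scheme used for example in Canny edge detection, keeps any run of pixels above a low threshold as long as the run contains at least one pixel above a high threshold. The thresholds 0.9 and 0.5 and the name `hysteresis_spans` are chosen only for illustration. On the question's probabilities this recovers class 1 as [[1, 3], [6, 9]], matching the ground truth, and class 2 as a single object [[4, 8]] instead of two fragments.

```python
import numpy as np


def hysteresis_spans(probs, high=0.9, low=0.5):
    """Runs of pixels above `low` that contain at least one pixel above `high`.

    Returns inclusive [start, stop] pairs for a single 1D row of probabilities.
    """
    probs = np.asarray(probs, dtype=float)
    weak = probs > low
    strong = probs > high
    spans = []
    start = None
    # A trailing False sentinel closes a run that reaches the last pixel.
    for i, is_weak in enumerate(np.append(weak, False)):
        if is_weak and start is None:
            start = i
        elif not is_weak and start is not None:
            # Keep the run only if it contains a strong "seed" pixel.
            if strong[start:i].any():
                spans.append([start, i - 1])
            start = None
    return spans


print(hysteresis_spans([0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21]))
```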






          share|improve this answer





















            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53390093%2fmetric-for-evaluating-predicted-bounding-boxes-from-semantic-segmentation-on-an%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            Assumption



            You ask specifically about the 1D case, so we will solve the 1D case here, but the method is essentially the same for 2D.



            Let us assume you have two ground truth bounding boxes: box 1 and box 2.



            Further, let us assume that our model is not so great and predicts more than 2 boxes
            (maybe it found something new, maybe it broke one box into two).



            For this demonstration let us consider that this is what we are working with:



            # labels
            # box 1: x----y
            # box 2: x++++y
            # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
            # x--------y x+++++++++++++++++++++++++++++y TRUTH
            # a-----------b PRED 1, BOX 1
            # a+++++++++++++++++b PRED 2, BOX 2
            # a++++++++++++++++++++++++++++++++b PRED 3, BOX 2


            Core Problem



            What you want is in effect a score on the alignment of your predictions to the targets.... but oh no! which targets
            belong to which predictions?



            Pick your distance function of choice and pair each prediction with a target based on that function.
            In this case I will use a modified intersection over union (IOU) for the 1D case.
            I chose this function as I wanted both PRED 2 and 3 from the above diagram to align to box 2.



            With a score for each prediction, pair it with the target that produced the best score.



            Now with a one-to-one prediction-target pair, calculate whatever it is that you want.



            Demo with above assumption



            from the above assumptions:



            pred_boxes = [
            [4, 8],
            [6, 12],
            [5, 16]
            ]

            true_boxes = [
            [4, 7],
            [10, 20]
            ]


            a 1d version of intersection over union:



            def iou_1d(predicted_boundary, target_boundary):
            '''Calculates the intersection over union (IOU) based on a span.

            Notes:
            boundaries are provided in the the form of [start, stop].
            boundaries where start = stop are accepted
            boundaries are assumed to be only in range [0, int < inf)

            Args:
            predicted_boundary (list): the [start, stop] of the predicted boundary
            target_boundary (list): the ground truth [start, stop] for which to compare

            Returns:
            iou (float): the IOU bounded in [0, 1]
            '''

            p_lower, p_upper = predicted_boundary
            t_lower, t_upper = target_boundary

            # boundaries are in form [start, stop] and 0<= start <= stop
            assert 0<= p_lower <= p_upper
            assert 0<= t_lower <= t_upper

            # no overlap, pred is too far left or pred is too far right
            if p_upper < t_lower or p_lower > t_upper:
            return 0

            if predicted_boundary == target_boundary:
            return 1

            intersection_lower_bound = max(p_lower, t_lower)
            intersection_upper_bound = min(p_upper, t_upper)


            intersection = intersection_upper_bound - intersection_lower_bound
            union = max(t_upper, p_upper) - min(t_lower, p_lower)
            union = union if union != 0 else 1
            return min(intersection / union, 1)


            some simple helpers:



            from math import sqrt
            def euclidean(u, v):
            return sqrt((u[0]-v[0])**2 + (u[1]-v[1])**2)

            def mean(arr):
            return sum(arr) / len(arr)


            how we align our boundaries:



            def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
            '''Aligns predicted_bondary to the closest target_boundary based on the
            alignment_scoring_fn

            Args:
            predicted_boundary (list): the predicted boundary in form of [start, stop]

            target_boundaries (list): a list of all valid target boundaries each having
            form [start, stop]

            alignment_scoring_fn (function): a function taking two arguments each of
            which is a list of two elements, the first assumed to be the predicted
            boundary and the latter the target boundary. Should return a single number.

            take (function): should either be min or max. Selects either the highest or
            lower score according to the alignment_scoring_fn

            Returns:
            aligned_boundary (list): the aligned boundary in form [start, stop]
            '''
            scores = [
            alignment_scoring_fn(predicted_boundary, target_boundary)
            for target_boundary in target_boundaries
            ]



            # boundary did not align to any boxes, use fallback scoring mechanism to break
            # tie
            if not any(scores):
            scores = [
            1 / euclidean(predicted_boundary, target_boundary)
            for target_boundary in target_boundaries
            ]

            aligned_index = scores.index(take(scores))
            aligned = target_boundaries[aligned_index]
            return aligned


            how we calculate difference:



            def diff(u, v):
            return [u[0] - v[0], u[1] - v[1]]


            combine it all into one:



def aligned_distance_1d(predicted_boundaries, target_boundaries, alignment_scoring_fn=iou_1d, take=max, distance_fn=diff, aggregate_fn=mean):
    '''Returns the aggregated distance of predicted bounding boxes to their aligned
    target bounding boxes, based on alignment_scoring_fn and distance_fn.

    Args:
        predicted_boundaries (list): a list of all predicted boundaries, each
            having form [start, stop]

        target_boundaries (list): a list of all valid target boundaries, each
            having form [start, stop]

        alignment_scoring_fn (function): a function taking two arguments, each of
            which is a list of two elements, the first assumed to be the predicted
            boundary and the latter the target boundary. Should return a single number.

        take (function): should be either min or max. Selects either the highest
            or the lowest score according to alignment_scoring_fn

        distance_fn (function): a function taking a predicted and a target
            boundary and returning a [start, stop] pair of distances

        aggregate_fn (function): a function taking a list of numbers (all start
            distances, or all stop distances, calculated by distance_fn) and
            returning a single value

    Returns:
        aggregated_distance (list): the aggregated [start, stop] distance of the
            aligned predicted_boundaries
    '''
    # pair every prediction with its best-scoring target (take is now passed through)
    paired = [
        (predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn, take=take))
        for predicted_boundary in predicted_boundaries
    ]
    distances = [distance_fn(*pair) for pair in paired]
    # zip(*distances) groups all start distances together, then all stop distances
    aggregated = [aggregate_fn(error) for error in zip(*distances)]
    return aggregated


Run:



            aligned_distance_1d(pred_boxes, true_boxes)

            # [-3.0, -3.6666666666666665]


Note: with many predictions and many targets, there are many ways to optimize this code. Here, I broke it into its main functional chunks so it is clear what is going on.
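As one example of such an optimization, the sketch below scores every prediction against every target at once using NumPy broadcasting. It aims to reproduce the span-length IOU of `iou_1d` and the inverse-Euclidean fallback of `align_1d`. It assumes the inputs can be turned into `(n, 2)` arrays of `[start, stop]`. Ties go to the first target, matching `scores.index(max(scores))`. The function names are hypothetical, and only the default `take=max` with the mean of `diff` is covered.

```python
import numpy as np


def iou_matrix(pred, true):
    """Pairwise span-length IoU (length = stop - start), shape (n_pred, n_true)."""
    p_lo, p_hi = pred[:, None, 0], pred[:, None, 1]
    t_lo, t_hi = true[None, :, 0], true[None, :, 1]
    intersection = np.minimum(p_hi, t_hi) - np.maximum(p_lo, t_lo)
    union = np.maximum(p_hi, t_hi) - np.minimum(p_lo, t_lo)
    union = np.where(union == 0, 1, union)
    # Negative intersections (disjoint spans) are clipped to 0.
    iou = np.clip(intersection / union, 0, 1)
    # Identical boxes score 1, as in iou_1d (this covers zero-length boxes).
    identical = (p_lo == t_lo) & (p_hi == t_hi)
    return np.where(identical, 1.0, iou)


def align_all(pred, true):
    """Index of the best-scoring target for every prediction."""
    scores = iou_matrix(pred, true)
    # Rows with no IoU overlap fall back to inverse Euclidean distance.
    no_overlap = ~scores.any(axis=1)
    if no_overlap.any():
        dist = np.linalg.norm(pred[:, None, :] - true[None, :, :], axis=2)
        scores[no_overlap] = 1.0 / dist[no_overlap]
    return scores.argmax(axis=1)


def aligned_distance(pred, true):
    """Mean signed [start, stop] error of predictions vs their aligned targets."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    matched = true[align_all(pred, true)]
    return (pred - matched).mean(axis=0)


print(aligned_distance([[4, 8], [6, 12], [5, 16]], [[4, 7], [10, 20]]))
```

On the demo boxes this should give the same -3.0 and roughly -3.667 as the list version, up to NumPy's print formatting.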



Does this make sense? Since I wanted PRED 2 and PRED 3 to align to box 2, yes: both start before the true start and both end prematurely.



Solution to the question asked



Copying your examples:



# "detected" objects
p_obj = [
    [[2, 3], [8, 8]],  # class 1
    [[4, 4], [6, 7]],  # class 2
    [[0, 0]]           # class 3
]

# true objects
t_obj = [
    [[1, 3], [6, 9]],  # class 1
    [[4, 7]],          # class 2
    [[0, 0]]           # class 3
]
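If you need to produce these lists from the masks rather than by hand, one option is to look for rising and falling edges in each zero-padded row. The hypothetical `mask_to_spans` below uses NumPy and returns inclusive `[start, stop]` pixel spans. It is a sketch, not something from the question or the answer. It reproduces `p_obj` and `t_obj` from the question's `b_mask` and `target`.

```python
import numpy as np


def mask_to_spans(row):
    """Inclusive [start, stop] spans of consecutive 1s in a binary row."""
    # Pad with 0 on both sides so runs touching the edges still have two edges.
    padded = np.concatenate(([0], np.asarray(row, dtype=int), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)      # 0 -> 1 transitions
    stops = np.flatnonzero(edges == -1) - 1  # last 1 before a 1 -> 0 transition
    return [[int(s), int(e)] for s, e in zip(starts, stops)]


def mask_to_objects(mask):
    """Apply mask_to_spans to every class row of a [n_classes, w] mask."""
    return [mask_to_spans(row) for row in mask]


b_mask = np.array([
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])
target = np.array([
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])

print(mask_to_objects(b_mask))  # [[[2, 3], [8, 8]], [[4, 4], [6, 7]], [[0, 0]]]
print(mask_to_objects(target))  # [[[1, 3], [6, 9]], [[4, 7]], [[0, 0]]]
```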


Since you know the boxes per class, this is easy:



[
    aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
    for cls_no in range(len(t_obj))
]


            # [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]


            Does this output make sense?



As a sanity check, start with class 3: the average [start, stop] distances are both 0. That makes sense, because the prediction matches the truth exactly.



How about class 1? Both predictions start too late (2 > 1, 8 > 6), but only one ends too soon (8 < 9). So this makes sense too.
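Spelled out, the class 1 numbers are the mean of prediction minus target over the two aligned pairs: `[2, 3]` with `[1, 3]`, and `[8, 8]` with `[6, 9]`. The second pairing comes from the Euclidean fallback, because `iou_1d([8, 8], [6, 9])` is 0.

```latex
\begin{aligned}
\bar{e}_{\text{start}} &= \tfrac{1}{2}\big[(2 - 1) + (8 - 6)\big] = \tfrac{1}{2}(1 + 2) = 1.5 \\
\bar{e}_{\text{stop}}  &= \tfrac{1}{2}\big[(3 - 3) + (8 - 9)\big] = \tfrac{1}{2}(0 - 1) = -0.5
\end{aligned}
```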



Now let us look at class 2, which seems to be why you asked the question (more predictions than targets).



If we were to draw what the score suggests, it would look like this:



#  0  1  2  3  4  5  6  7  8  9
#              ----------          truth [4, 7]
#                 ++               pred  [4 + 1, 7 - 1.5] = [5, 5.5]


            It doesn't look so great, but this is just an example...



Does this make sense? Yes and no. Yes, in terms of how we calculated the metric: one prediction stopped 3 pixels too soon and the other started 2 too late.
No, in the sense that neither of your predictions actually covers pixel 5, and yet this metric leads you to believe that it does.
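One way to make that objection concrete is to count, for each true object, which of its pixels any aligned prediction actually covers. The hypothetical helper below assumes inclusive pixel indices. For class 2, it shows that the predictions `[4, 4]` and `[6, 7]` cover pixels 4, 6 and 7 of the truth `[4, 7]` and miss pixel 5. The averaged box `[5, 5.5]` sits right on that missed pixel.

```python
def covered_pixels(target, predictions):
    """Pixels of the inclusive span `target` covered by any prediction span."""
    t_start, t_stop = target
    covered = set()
    for p_start, p_stop in predictions:
        lo, hi = max(t_start, p_start), min(t_stop, p_stop)
        covered.update(range(lo, hi + 1))  # empty range when spans are disjoint
    return sorted(covered)


target = [4, 7]
predictions = [[4, 4], [6, 7]]
hits = covered_pixels(target, predictions)
missed = sorted(set(range(target[0], target[1] + 1)) - set(hits))
coverage = len(hits) / (target[1] - target[0] + 1)

print(hits)      # [4, 6, 7]
print(missed)    # [5]
print(coverage)  # 0.75
```

Reporting a number like this alongside the averaged boundary error would flag fragmented predictions that the boundary average alone hides.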



            Conclusion



            Is this a faulty metric?



It depends on what you are using it for, or what you are trying to show.
However, since you use a binary mask to generate your predicted boundaries, that is a non-negligible root of this problem. Perhaps there is a better strategy to get boundaries from your label probabilities.
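Hysteresis thresholding is one such strategy. The question does not use it and this answer does not prescribe it; it is offered only as a hedged example. It keeps every run of pixels above a lower threshold, as long as the run contains at least one pixel above a higher threshold. The thresholds `low=0.5` and `high=0.9` below are arbitrary choices for illustration, and `hysteresis_spans` is a hypothetical name. On the question's `output` array, this recovers the true objects for classes 1 and 3. For class 2 it gives a single `[4, 8]` instead of two fragments.

```python
import numpy as np


def hysteresis_spans(probs, low=0.5, high=0.9):
    """Inclusive [start, stop] runs where probs >= low that contain a value >= high."""
    probs = np.asarray(probs, dtype=float)
    above_low = np.concatenate(([0], (probs >= low).astype(int), [0]))
    edges = np.diff(above_low)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1) - 1
    return [
        [int(s), int(e)]
        for s, e in zip(starts, stops)
        if (probs[s:e + 1] >= high).any()  # keep runs seeded by a confident pixel
    ]


output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

print([hysteresis_spans(row) for row in output])
# [[[1, 3], [6, 9]], [[4, 8]], [[0, 0]]]
```

Scored with the metric above, class 2 would then have one object whose stop is off by one pixel, rather than two fragments averaged into a misleading box. Tuning `low` and `high` on held-out data would be the natural next step.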






              Assumption



              You ask specifically about the 1D case, so we will solve the 1D case here, but the method is essentially the same for 2D.



              Let us assume you have two ground truth bounding boxes: box 1 and box 2.



              Further, let us assume that our model is not so great and predicts more than 2 boxes
              (maybe it found something new, maybe it broke one box into two).



              For this demonstration let us consider that this is what we are working with:



              # labels
              # box 1: x----y
              # box 2: x++++y
              # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
              # x--------y x+++++++++++++++++++++++++++++y TRUTH
              # a-----------b PRED 1, BOX 1
              # a+++++++++++++++++b PRED 2, BOX 2
              # a++++++++++++++++++++++++++++++++b PRED 3, BOX 2


              Core Problem



              What you want is in effect a score on the alignment of your predictions to the targets.... but oh no! which targets
              belong to which predictions?



              Pick your distance function of choice and pair each prediction with a target based on that function.
              In this case I will use a modified intersection over union (IOU) for the 1D case.
              I chose this function as I wanted both PRED 2 and 3 from the above diagram to align to box 2.



              With a score for each prediction, pair it with the target that produced the best score.



              Now with a one-to-one prediction-target pair, calculate whatever it is that you want.



              Demo with above assumption



              from the above assumptions:



              pred_boxes = [
              [4, 8],
              [6, 12],
              [5, 16]
              ]

              true_boxes = [
              [4, 7],
              [10, 20]
              ]


              a 1d version of intersection over union:



              def iou_1d(predicted_boundary, target_boundary):
              '''Calculates the intersection over union (IOU) based on a span.

              Notes:
              boundaries are provided in the the form of [start, stop].
              boundaries where start = stop are accepted
              boundaries are assumed to be only in range [0, int < inf)

              Args:
              predicted_boundary (list): the [start, stop] of the predicted boundary
              target_boundary (list): the ground truth [start, stop] for which to compare

              Returns:
              iou (float): the IOU bounded in [0, 1]
              '''

              p_lower, p_upper = predicted_boundary
              t_lower, t_upper = target_boundary

              # boundaries are in form [start, stop] and 0<= start <= stop
              assert 0<= p_lower <= p_upper
              assert 0<= t_lower <= t_upper

              # no overlap, pred is too far left or pred is too far right
              if p_upper < t_lower or p_lower > t_upper:
              return 0

              if predicted_boundary == target_boundary:
              return 1

              intersection_lower_bound = max(p_lower, t_lower)
              intersection_upper_bound = min(p_upper, t_upper)


              intersection = intersection_upper_bound - intersection_lower_bound
              union = max(t_upper, p_upper) - min(t_lower, p_lower)
              union = union if union != 0 else 1
              return min(intersection / union, 1)


              some simple helpers:



              from math import sqrt
              def euclidean(u, v):
              return sqrt((u[0]-v[0])**2 + (u[1]-v[1])**2)

              def mean(arr):
              return sum(arr) / len(arr)


              how we align our boundaries:



              def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
              '''Aligns predicted_bondary to the closest target_boundary based on the
              alignment_scoring_fn

              Args:
              predicted_boundary (list): the predicted boundary in form of [start, stop]

              target_boundaries (list): a list of all valid target boundaries each having
              form [start, stop]

              alignment_scoring_fn (function): a function taking two arguments each of
              which is a list of two elements, the first assumed to be the predicted
              boundary and the latter the target boundary. Should return a single number.

              take (function): should either be min or max. Selects either the highest or
              lower score according to the alignment_scoring_fn

              Returns:
              aligned_boundary (list): the aligned boundary in form [start, stop]
              '''
              scores = [
              alignment_scoring_fn(predicted_boundary, target_boundary)
              for target_boundary in target_boundaries
              ]



              # boundary did not align to any boxes, use fallback scoring mechanism to break
              # tie
              if not any(scores):
              scores = [
              1 / euclidean(predicted_boundary, target_boundary)
              for target_boundary in target_boundaries
              ]

              aligned_index = scores.index(take(scores))
              aligned = target_boundaries[aligned_index]
              return aligned


              how we calculate difference:



              def diff(u, v):
              return [u[0] - v[0], u[1] - v[1]]


              combine it all into one:



              def aligned_distance_1d(predicted_boundaries, target_boundaries, alignment_scoring_fn=iou_1d, take=max, distance_fn=diff, aggregate_fn=mean):
              '''Returns the aggregated distance of predicted boundings boxes to their aligned bounding box based on alignment_scoring_fn and distance_fn

              Args:
              predicted_boundaries (list): a list of all valid target boundaries each
              having form [start, stop]

              target_boundaries (list): a list of all valid target boundaries each having
              form [start, stop]

              alignment_scoring_fn (function): a function taking two arguments each of
              which is a list of two elements, the first assumed to be the predicted
              boundary and the latter the target boundary. Should return a single number.

              take (function): should either be min or max. Selects either the highest or
              lower score according to the alignment_scoring_fn

              distance_fn (function): a function taking two lists and should return a
              single value.

              aggregate_fn (function): a function taking a list of numbers (distances
              calculated by distance_fn) and returns a single value (the aggregated
              distance)

              Returns:
              aggregated_distnace (float): return the aggregated distance of the
              aligned predicted_boundaries

              aggregated_fn([distance_fn(pair) for pair in paired_boundaries(predicted_boundaries, target_boundaries)])
              '''


              paired = [
              (predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn))
              for predicted_boundary in predicted_boundaries
              ]
              distances = [distance_fn(*pair) for pair in paired]
              aggregated = [aggregate_fn(error) for error in zip(*distances)]
              return aggregated


              run:



              aligned_distance_1d(pred_boxes, true_boxes)

              # [-3.0, -3.6666666666666665]


              Note, for many predictions and many targets there are many ways to optimize the code. Here, I broke up the main functional chunks so it is clear what is going on.



              Now does this make sense? Well since I wanted pred 2 and 3 to align to box 2, yes, both starts are prior the truth and both end prematurely.



              Solution to question asked



              copy pasted your examples:



              # "detected" objects
              p_obj = [
              [[2, 3], [8, 8]], # class 1
              [[4, 4], [6, 7]], # class 2
              [[0, 0]] # class 3
              ]

              # true objects
              t_obj = [
              [[1, 3], [6, 9]], # class 1
              [[4, 7]], # class 2
              [[0, 0]] # class 3
              ]


              since you know the boxes per class this is easy:



              [
              aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
              for cls_no in range(len(t_obj))
              ]


              # [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]


              Does this output make sense?



              Starting with a sanity check, let us look at class 3. The average distance of [start, stop] are both 0. Makes sense.



              How about class 1? both predictions start too late (2 > 1, 8 > 6) but only one ends too soon (8 < 9). So makes sense.



              Now let us look at class 2, which is why it seems you asked the question (more predictions than targets).



              If we were to draw what the score suggests it would be:



              #  0  1  2  3  4  5  6  7  8  9
              # ---------- # truth [4, 7]
              # ++ # pred [4 + 1, 7 - 1.5]


              It doesn't look so great, but this is just an example...



              Does this make sense? Yes / no. Yes in terms of how we calculated the metric. One stoped 3 values too soon the other started 2 too late.
              No in the sense that neither of your predictions actually cover the value 5, and yet this metric leads you to believe that is the case...



              Conclusion



              Is this a faulty metric?



              Depends on what you are using it for / trying to show.
              However since you use a binary mask to generate you predicted boundaries, that is a non negligible root of this problem. Perhaps there is a better strategy to get boundaries from your label probabilities.






              share|improve this answer
























                0












                0








                0






                Assumption



                You ask specifically about the 1D case, so we will solve the 1D case here, but the method is essentially the same for 2D.



                Let us assume you have two ground truth bounding boxes: box 1 and box 2.



                Further, let us assume that our model is not so great and predicts more than 2 boxes
                (maybe it found something new, maybe it broke one box into two).



                For this demonstration let us consider that this is what we are working with:



                # labels
                # box 1: x----y
                # box 2: x++++y
                # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                # x--------y x+++++++++++++++++++++++++++++y TRUTH
                # a-----------b PRED 1, BOX 1
                # a+++++++++++++++++b PRED 2, BOX 2
                # a++++++++++++++++++++++++++++++++b PRED 3, BOX 2


                Core Problem



                What you want is in effect a score on the alignment of your predictions to the targets.... but oh no! which targets
                belong to which predictions?



                Pick your distance function of choice and pair each prediction with a target based on that function.
                In this case I will use a modified intersection over union (IOU) for the 1D case.
                I chose this function as I wanted both PRED 2 and 3 from the above diagram to align to box 2.



                With a score for each prediction, pair it with the target that produced the best score.



                Now with a one-to-one prediction-target pair, calculate whatever it is that you want.



                Demo with above assumption



                from the above assumptions:



                pred_boxes = [
                [4, 8],
                [6, 12],
                [5, 16]
                ]

                true_boxes = [
                [4, 7],
                [10, 20]
                ]


                a 1d version of intersection over union:



                def iou_1d(predicted_boundary, target_boundary):
                '''Calculates the intersection over union (IOU) based on a span.

                Notes:
                boundaries are provided in the the form of [start, stop].
                boundaries where start = stop are accepted
                boundaries are assumed to be only in range [0, int < inf)

                Args:
                predicted_boundary (list): the [start, stop] of the predicted boundary
                target_boundary (list): the ground truth [start, stop] for which to compare

                Returns:
                iou (float): the IOU bounded in [0, 1]
                '''

                p_lower, p_upper = predicted_boundary
                t_lower, t_upper = target_boundary

                # boundaries are in form [start, stop] and 0<= start <= stop
                assert 0<= p_lower <= p_upper
                assert 0<= t_lower <= t_upper

                # no overlap, pred is too far left or pred is too far right
                if p_upper < t_lower or p_lower > t_upper:
                return 0

                if predicted_boundary == target_boundary:
                return 1

                intersection_lower_bound = max(p_lower, t_lower)
                intersection_upper_bound = min(p_upper, t_upper)


                intersection = intersection_upper_bound - intersection_lower_bound
                union = max(t_upper, p_upper) - min(t_lower, p_lower)
                union = union if union != 0 else 1
                return min(intersection / union, 1)


                some simple helpers:



                from math import sqrt
                def euclidean(u, v):
                return sqrt((u[0]-v[0])**2 + (u[1]-v[1])**2)

                def mean(arr):
                return sum(arr) / len(arr)


                how we align our boundaries:



                def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
                '''Aligns predicted_bondary to the closest target_boundary based on the
                alignment_scoring_fn

                Args:
                predicted_boundary (list): the predicted boundary in form of [start, stop]

                target_boundaries (list): a list of all valid target boundaries each having
                form [start, stop]

                alignment_scoring_fn (function): a function taking two arguments each of
                which is a list of two elements, the first assumed to be the predicted
                boundary and the latter the target boundary. Should return a single number.

                take (function): should either be min or max. Selects either the highest or
                lower score according to the alignment_scoring_fn

                Returns:
                aligned_boundary (list): the aligned boundary in form [start, stop]
                '''
                scores = [
                alignment_scoring_fn(predicted_boundary, target_boundary)
                for target_boundary in target_boundaries
                ]



                # boundary did not align to any boxes, use fallback scoring mechanism to break
                # tie
                if not any(scores):
                scores = [
                1 / euclidean(predicted_boundary, target_boundary)
                for target_boundary in target_boundaries
                ]

                aligned_index = scores.index(take(scores))
                aligned = target_boundaries[aligned_index]
                return aligned


                how we calculate difference:



                def diff(u, v):
                return [u[0] - v[0], u[1] - v[1]]


                combine it all into one:



                def aligned_distance_1d(predicted_boundaries, target_boundaries, alignment_scoring_fn=iou_1d, take=max, distance_fn=diff, aggregate_fn=mean):
                '''Returns the aggregated distance of predicted boundings boxes to their aligned bounding box based on alignment_scoring_fn and distance_fn

                Args:
                predicted_boundaries (list): a list of all valid target boundaries each
                having form [start, stop]

                target_boundaries (list): a list of all valid target boundaries each having
                form [start, stop]

                alignment_scoring_fn (function): a function taking two arguments each of
                which is a list of two elements, the first assumed to be the predicted
                boundary and the latter the target boundary. Should return a single number.

                take (function): should either be min or max. Selects either the highest or
                lower score according to the alignment_scoring_fn

                distance_fn (function): a function taking two lists and should return a
                single value.

                aggregate_fn (function): a function taking a list of numbers (distances
                calculated by distance_fn) and returns a single value (the aggregated
                distance)

                Returns:
                aggregated_distnace (float): return the aggregated distance of the
                aligned predicted_boundaries

                aggregated_fn([distance_fn(pair) for pair in paired_boundaries(predicted_boundaries, target_boundaries)])
                '''


                paired = [
                (predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn))
                for predicted_boundary in predicted_boundaries
                ]
                distances = [distance_fn(*pair) for pair in paired]
                aggregated = [aggregate_fn(error) for error in zip(*distances)]
                return aggregated


                run:



                aligned_distance_1d(pred_boxes, true_boxes)

                # [-3.0, -3.6666666666666665]


                Note, for many predictions and many targets there are many ways to optimize the code. Here, I broke up the main functional chunks so it is clear what is going on.



                Now does this make sense? Well since I wanted pred 2 and 3 to align to box 2, yes, both starts are prior the truth and both end prematurely.



                Solution to question asked



                copy pasted your examples:



                # "detected" objects
                p_obj = [
                [[2, 3], [8, 8]], # class 1
                [[4, 4], [6, 7]], # class 2
                [[0, 0]] # class 3
                ]

                # true objects
                t_obj = [
                [[1, 3], [6, 9]], # class 1
                [[4, 7]], # class 2
                [[0, 0]] # class 3
                ]


                since you know the boxes per class this is easy:



                [
                aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
                for cls_no in range(len(t_obj))
                ]


                # [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]


                Does this output make sense?



                Starting with a sanity check, let us look at class 3. The average distance of [start, stop] are both 0. Makes sense.



                How about class 1? both predictions start too late (2 > 1, 8 > 6) but only one ends too soon (8 < 9). So makes sense.



                Now let us look at class 2, which is why it seems you asked the question (more predictions than targets).



                If we were to draw what the score suggests it would be:



                #  0  1  2  3  4  5  6  7  8  9
                # ---------- # truth [4, 7]
                # ++ # pred [4 + 1, 7 - 1.5]


                It doesn't look so great, but this is just an example...



                Does this make sense? Yes / no. Yes in terms of how we calculated the metric. One stoped 3 values too soon the other started 2 too late.
                No in the sense that neither of your predictions actually cover the value 5, and yet this metric leads you to believe that is the case...



                Conclusion



                Is this a faulty metric?



                Depends on what you are using it for / trying to show.
                However since you use a binary mask to generate you predicted boundaries, that is a non negligible root of this problem. Perhaps there is a better strategy to get boundaries from your label probabilities.






                share|improve this answer












                Assumption



                You ask specifically about the 1D case, so we will solve the 1D case here, but the method is essentially the same for 2D.



                Let us assume you have two ground truth bounding boxes: box 1 and box 2.



                Further, let us assume that our model is not so great and predicts more than 2 boxes
                (maybe it found something new, maybe it broke one box into two).



                For this demonstration let us consider that this is what we are working with:



                # labels
                # box 1: x----y
                # box 2: x++++y
                # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                # x--------y x+++++++++++++++++++++++++++++y TRUTH
                # a-----------b PRED 1, BOX 1
                # a+++++++++++++++++b PRED 2, BOX 2
                # a++++++++++++++++++++++++++++++++b PRED 3, BOX 2


                Core Problem



                What you want is in effect a score on the alignment of your predictions to the targets.... but oh no! which targets
                belong to which predictions?



                Pick your distance function of choice and pair each prediction with a target based on that function.
                In this case I will use a modified intersection over union (IOU) for the 1D case.
                I chose this function as I wanted both PRED 2 and 3 from the above diagram to align to box 2.



                With a score for each prediction, pair it with the target that produced the best score.



                Now with a one-to-one prediction-target pair, calculate whatever it is that you want.



                Demo with above assumption



                from the above assumptions:



                pred_boxes = [
                [4, 8],
                [6, 12],
                [5, 16]
                ]

                true_boxes = [
                [4, 7],
                [10, 20]
                ]


                a 1d version of intersection over union:



                def iou_1d(predicted_boundary, target_boundary):
                '''Calculates the intersection over union (IOU) based on a span.

                Notes:
                boundaries are provided in the the form of [start, stop].
                boundaries where start = stop are accepted
                boundaries are assumed to be only in range [0, int < inf)

                Args:
                predicted_boundary (list): the [start, stop] of the predicted boundary
                target_boundary (list): the ground truth [start, stop] for which to compare

                Returns:
                iou (float): the IOU bounded in [0, 1]
                '''

                p_lower, p_upper = predicted_boundary
                t_lower, t_upper = target_boundary

                # boundaries are in form [start, stop] and 0<= start <= stop
                assert 0<= p_lower <= p_upper
                assert 0<= t_lower <= t_upper

                # no overlap, pred is too far left or pred is too far right
                if p_upper < t_lower or p_lower > t_upper:
                return 0

                if predicted_boundary == target_boundary:
                return 1

                intersection_lower_bound = max(p_lower, t_lower)
                intersection_upper_bound = min(p_upper, t_upper)


                intersection = intersection_upper_bound - intersection_lower_bound
                union = max(t_upper, p_upper) - min(t_lower, p_lower)
                union = union if union != 0 else 1
                return min(intersection / union, 1)


                some simple helpers:



                from math import sqrt
                def euclidean(u, v):
                return sqrt((u[0]-v[0])**2 + (u[1]-v[1])**2)

                def mean(arr):
                return sum(arr) / len(arr)


                how we align our boundaries:



                def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
                '''Aligns predicted_bondary to the closest target_boundary based on the
                alignment_scoring_fn

                Args:
                predicted_boundary (list): the predicted boundary in form of [start, stop]

                target_boundaries (list): a list of all valid target boundaries each having
                form [start, stop]

                alignment_scoring_fn (function): a function taking two arguments each of
                which is a list of two elements, the first assumed to be the predicted
                boundary and the latter the target boundary. Should return a single number.

                take (function): should either be min or max. Selects either the highest or
                lower score according to the alignment_scoring_fn

                Returns:
                aligned_boundary (list): the aligned boundary in form [start, stop]
                '''
                scores = [
                alignment_scoring_fn(predicted_boundary, target_boundary)
                for target_boundary in target_boundaries
                ]



                # boundary did not align to any boxes, use fallback scoring mechanism to break
                # tie
                if not any(scores):
                scores = [
                1 / euclidean(predicted_boundary, target_boundary)
                for target_boundary in target_boundaries
                ]

                aligned_index = scores.index(take(scores))
                aligned = target_boundaries[aligned_index]
                return aligned


                how we calculate difference:



                def diff(u, v):
                return [u[0] - v[0], u[1] - v[1]]


                combine it all into one:



                def aligned_distance_1d(predicted_boundaries, target_boundaries, alignment_scoring_fn=iou_1d, take=max, distance_fn=diff, aggregate_fn=mean):
                '''Returns the aggregated distance of predicted boundings boxes to their aligned bounding box based on alignment_scoring_fn and distance_fn

                Args:
                predicted_boundaries (list): a list of all valid target boundaries each
                having form [start, stop]

                target_boundaries (list): a list of all valid target boundaries each having
                form [start, stop]

                alignment_scoring_fn (function): a function taking two arguments each of
                which is a list of two elements, the first assumed to be the predicted
                boundary and the latter the target boundary. Should return a single number.

                take (function): should either be min or max. Selects either the highest or
                lower score according to the alignment_scoring_fn

                distance_fn (function): a function taking two lists and should return a
                single value.

                aggregate_fn (function): a function taking a list of numbers (distances
                calculated by distance_fn) and returns a single value (the aggregated
                distance)

                Returns:
                aggregated_distnace (float): return the aggregated distance of the
                aligned predicted_boundaries

                aggregated_fn([distance_fn(pair) for pair in paired_boundaries(predicted_boundaries, target_boundaries)])
                '''


                paired = [
                (predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn))
                for predicted_boundary in predicted_boundaries
                ]
                distances = [distance_fn(*pair) for pair in paired]
                aggregated = [aggregate_fn(error) for error in zip(*distances)]
                return aggregated


                run:



                aligned_distance_1d(pred_boxes, true_boxes)

                # [-3.0, -3.6666666666666665]


                Note, for many predictions and many targets there are many ways to optimize the code. Here, I broke up the main functional chunks so it is clear what is going on.



                Now, does this make sense? Since I wanted predictions 2 and 3 to align to box 2, yes: both start before the truth and both end prematurely.



                Solution to the question asked



                Copy-pasting your examples:



                # "detected" objects
                p_obj = [
                [[2, 3], [8, 8]], # class 1
                [[4, 4], [6, 7]], # class 2
                [[0, 0]] # class 3
                ]

                # true objects
                t_obj = [
                [[1, 3], [6, 9]], # class 1
                [[4, 7]], # class 2
                [[0, 0]] # class 3
                ]
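If you would rather derive these lists than type them out, a minimal sketch (the helper name `mask_to_boundaries` is hypothetical) finds the inclusive [start, stop] of every run of 1s in each row of a binary mask. Applied to the `b_mask` and `target` arrays from the question, it yields exactly `p_obj` and `t_obj`.

```python
import numpy as np


def mask_to_boundaries(row):
    """Return inclusive [start, stop] indices of each run of 1s in a 1D mask."""
    padded = np.concatenate(([0], np.asarray(row, dtype=int), [0]))
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)       # 0 -> 1 transitions
    stops = np.flatnonzero(changes == -1) - 1   # 1 -> 0 transitions, made inclusive
    return [[int(s), int(e)] for s, e in zip(starts, stops)]


b_mask = np.array([
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])
target = np.array([
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])

p_obj = [mask_to_boundaries(row) for row in b_mask]
t_obj = [mask_to_boundaries(row) for row in target]
print(p_obj)  # [[[2, 3], [8, 8]], [[4, 4], [6, 7]], [[0, 0]]]
print(t_obj)  # [[[1, 3], [6, 9]], [[4, 7]], [[0, 0]]]
```

Padding with a 0 on both sides guarantees that runs touching the first or last pixel still produce a start and a stop.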


                Since you know the boxes per class, this is easy:



                [
                aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
                for cls_no in range(len(t_obj))
                ]


                # [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]


                Does this output make sense?



                Starting with a sanity check, let us look at class 3. The average distances for [start, stop] are both 0, which makes sense.



                How about class 1? Both predictions start too late (2 > 1, 8 > 6), but only one ends too soon (8 < 9). So this makes sense.



                Now let us look at class 2, which seems to be why you asked the question (more predictions than targets).



                If we were to draw what the score suggests, it would be:



                #  0  1  2  3  4  5  6  7  8  9
                #              ----------         # truth [4, 7]
                #                 ++              # pred  [4 + 1, 7 - 1.5]


                It doesn't look so great, but this is just an example...



                Does this make sense? Yes and no. Yes in terms of how we calculated the metric: one prediction stopped 3 values too soon and the other started 2 too late.
                No in the sense that neither of your predictions actually covers the value 5, and yet this metric leads you to believe that it does...
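One way to catch this, sketched under the assumption that boundaries are inclusive pixel indices, is to report, alongside the boundary errors, the fraction of each target's pixels that its aligned predictions actually cover. The helper `coverage` is hypothetical and not part of the metric above; for class 2 it exposes the uncovered pixel 5 that the averaged boundaries hide.

```python
def coverage(predicted_boundaries, target):
    """Fraction of the target's pixels covered by any predicted [start, stop],
    plus the list of target pixels that no prediction covers."""
    covered = set()
    for start, stop in predicted_boundaries:
        covered.update(range(start, stop + 1))
    target_pixels = range(target[0], target[1] + 1)
    missed = [px for px in target_pixels if px not in covered]
    return 1 - len(missed) / len(target_pixels), missed


# class 2: both predictions were aligned to the single target [4, 7]
print(coverage([[4, 4], [6, 7]], [4, 7]))  # (0.75, [5])
```

With several targets per class you would pass only the predictions aligned to each target, not all of them.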



                Conclusion



                Is this a faulty metric?



                It depends on what you are using it for and what you are trying to show.
                However, since you use a binary mask to generate your predicted boundaries, that is a non-negligible root of this problem. Perhaps there is a better strategy to get boundaries from your label probabilities.
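As one possible strategy (an assumption on my part, not something tested on real data), hysteresis thresholding keeps every run of pixels above a lower cutoff as long as that run contains at least one pixel above the original 0.9 cutoff. With a hypothetical low cutoff of 0.5 on the `output` probabilities from the question, this recovers the class 1 and class 3 objects exactly and turns class 2 into a single object [4, 8], one pixel longer than the truth [4, 7]. The function name and both thresholds are illustrative.

```python
import numpy as np


def hysteresis_boundaries(probs, high=0.9, low=0.5):
    """Inclusive [start, stop] of each run with probs >= low that contains
    at least one pixel with probs >= high (1D hysteresis thresholding)."""
    probs = np.asarray(probs, dtype=float)
    weak = np.concatenate(([False], probs >= low, [False])).astype(int)
    changes = np.diff(weak)
    starts = np.flatnonzero(changes == 1)
    stops = np.flatnonzero(changes == -1) - 1
    # keep only weak runs that are "seeded" by a strong pixel
    return [[int(s), int(e)] for s, e in zip(starts, stops)
            if (probs[s:e + 1] >= high).any()]


output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

print([hysteresis_boundaries(row) for row in output])
# [[[1, 3], [6, 9]], [[4, 8]], [[0, 0]]]
```

The high cutoff controls which objects are detected at all, while the low cutoff controls how far their boundaries extend, so the two can be tuned separately.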







                answered Nov 21 at 20:38









                SumNeuron
