Custom losses and metrics

Collection of various loss and metric functions

Classification metrics


source

adjusted_R2Score

 adjusted_R2Score (r2_score, n, k)

Calculates the adjusted R² score based on the R² score (r2_score), the number of observations (n), and the number of predictor variables (k)
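
For reference, a minimal sketch of the standard adjusted R² formula, following the argument order of the signature above; this is illustrative and not necessarily the library's exact implementation:

def adjusted_r2_sketch(r2_score, n, k):
    "Adjusted R² penalizes R² for the number of predictors k given n observations"
    return 1 - (1 - r2_score) * (n - 1) / (n - k - 1)

# e.g. R² = 0.90 with n = 100 observations and k = 5 predictors -> ~0.8947
adjusted_r2_sketch(0.90, 100, 5)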

Regression metrics


rrmse

 rrmse (preds, targs)

Relative RMSE, normalized with the mean of the target


bias

 bias (preds, targs)

Average bias of predictions


bias_pct

 bias_pct (preds, targs)

Mean weighted bias, normalized with the mean of the target
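
A minimal sketch of how these regression metrics are commonly defined; the exact implementations (for example how bias_pct applies weighting) may differ, so treat the functions below as illustrative assumptions:

import torch

def rrmse_sketch(preds, targs):
    "RMSE divided by the mean of the target"
    return torch.sqrt(((preds - targs)**2).mean()) / targs.mean()

def bias_sketch(preds, targs):
    "Average difference between predictions and targets"
    return (preds - targs).mean()

def bias_pct_sketch(preds, targs):
    "Average bias normalized with the mean of the target"
    return (preds - targs).mean() / targs.mean()

preds, targs = torch.tensor([2.0, 3.0, 5.0]), torch.tensor([2.5, 3.0, 4.0])
rrmse_sketch(preds, targs), bias_sketch(preds, targs), bias_pct_sketch(preds, targs)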

BigEarthNet metrics

Metrics used in BigEarthNet paper to evaluate multi-label classification


source

label_ranking_average_precision_score

 label_ranking_average_precision_score (sigmoid=True, sample_weight=None)

Label ranking average precision (LRAP) is the average over each ground truth label assigned to each sample, of the ratio of true vs. total labels with lower score.

from fastai.learner import Learner
# Minimal stand-in Learner that only carries the attributes a metric needs
class TstLearner(Learner):
    def __init__(self,dls=None,model=None,**kwargs): self.pred,self.xb,self.yb = None,None,None

# Feed predictions and targets to the metric in slices and return its final value
def compute_val(met, x1, x2):
    met.reset()
    vals = [0,6,15,20]
    learn = TstLearner()
    for i in range(3):
        learn.pred,learn.yb = x1[vals[i]:vals[i+1]],(x2[vals[i]:vals[i+1]],)
        met.accumulate(learn)
    return met.value
import torch

# random scores for 10 samples with 3 labels, and random binary multi-label targets
x_1 = torch.randn(10,3)
x_2 = torch.randint(2,(10,3))
x_1, torch.sigmoid(x_1), x_2
(tensor([[-0.0175, -1.1692,  0.4717],
         [-0.1019,  1.7057,  0.8773],
         [-0.6633,  0.0593,  1.3823],
         [-0.6155, -0.4767, -1.4756],
         [-1.0687,  0.0246, -0.3407],
         [ 0.8646, -0.5164,  0.7274],
         [-0.7253,  0.6507,  0.2994],
         [-0.8993,  0.8574,  0.3334],
         [ 0.5395,  0.5471,  0.7207],
         [ 0.5087, -0.7967,  0.5395]]),
 tensor([[0.4956, 0.2370, 0.6158],
         [0.4745, 0.8463, 0.7063],
         [0.3400, 0.5148, 0.7994],
         [0.3508, 0.3830, 0.1861],
         [0.2557, 0.5061, 0.4156],
         [0.7036, 0.3737, 0.6742],
         [0.3262, 0.6572, 0.5743],
         [0.2892, 0.7021, 0.5826],
         [0.6317, 0.6335, 0.6728],
         [0.6245, 0.3107, 0.6317]]),
 tensor([[1, 0, 0],
         [1, 1, 0],
         [1, 1, 1],
         [0, 0, 1],
         [1, 1, 1],
         [1, 0, 0],
         [0, 0, 0],
         [0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]))
lrap = label_ranking_average_precision_score()
compute_val(lrap, x_1, x_2)
0.7833333333333332

source

label_ranking_loss

 label_ranking_loss (sigmoid=True, sample_weight=None)

Compute the average number of label pairs that are incorrectly ordered given y_score weighted by the size of the label set and the number of labels not in the label set.

lrl = label_ranking_loss()
compute_val(lrl, x_1, x_2)
0.35

one_error

 one_error (preds, targs)

Rate at which the top-ranked label is not among the ground truth labels

one_error(x_1, x_2)
tensor(0.4000)
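
A minimal sketch of what one_error presumably computes, reusing the x_1 and x_2 tensors defined above; the library implementation may differ in details such as tie handling:

def one_error_sketch(preds, targs):
    "Fraction of samples whose highest-scoring label is not a ground truth label"
    top = preds.argmax(dim=1)                           # index of the top-ranked label per sample
    hit = targs.gather(1, top.unsqueeze(1)).squeeze(1)  # 1 if the top label is a true label
    return (hit == 0).float().mean()

one_error_sketch(x_1, x_2)  # reproduces tensor(0.4000) for the tensors above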

source

coverage_error

 coverage_error (sigmoid=True, sample_weight=None)

Compute how far we need to go through the ranked scores to cover all true labels. The best value is equal to the average number of labels in y_true per sample.

cov = coverage_error()
compute_val(cov, x_1, x_2)
2.2

Segmentation metrics


source

JaccardCoeffMulti

 JaccardCoeffMulti (axis=1)

Averaged Jaccard coefficient for multiclass target in segmentation. Excludes background class

# 20 random 3x3 "images" with 6-class logits and matching integer targets
x1 = torch.randn(20,6,3,3)
x2 = torch.randint(0, 6, (20, 3, 3))
pred = x1.argmax(1)  # per-pixel class predictions; the metric applies argmax along `axis` itself
compute_val(JaccardCoeffMulti(), x1, x2)
0.08437537158034053
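
For reference, a minimal sketch of a multiclass Jaccard coefficient that averages over the foreground classes only, with class 0 assumed to be the background; the library metric accumulates batches through compute_val, so results may differ slightly from this one-shot version:

def jaccard_multi_sketch(logits, targs, axis=1, n_classes=6):
    "Mean per-class intersection over union, skipping class 0 (background)"
    preds = logits.argmax(axis)
    ious = []
    for c in range(1, n_classes):  # skip the background class 0
        inter = ((preds == c) & (targs == c)).sum().float()
        union = ((preds == c) | (targs == c)).sum().float()
        if union > 0: ious.append(inter / union)
    return torch.stack(ious).mean()

jaccard_multi_sketch(x1, x2)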

Object detection metrics and evaluation for shapefiles (mostly obsolete, use GisCOCOeval instead)

To evaluate our collection of predicted masks, we’ll compare each of our predicted masks with each of the available target masks for a given input.

  • A true positive is a prediction-target mask pair whose IoU score exceeds some predefined threshold
  • A false positive is a predicted object mask with no associated ground truth object mask
  • A false negative is a ground truth object mask with no associated predicted object mask

In the case of multiple detections of the same ground truth object, the one with the highest confidence is considered correct and the others are counted as false positives.

From these, we can get Precision and Recall

\(Precision = \frac{TP}{TP + FP} = \frac{TP}{all \: detections}, Recall = \frac{TP}{TP+FN} = \frac{TP}{all \: ground \: truths}\)

And use these to derive other metrics.

Typical metrics include Average Precision (AP) and mean Average Precision (mAP), from which several variants are derived (a minimal sketch of the 11-point AP computation follows this list):

  • AP50, AP75, AP[.50:.05:.95] are the most common, with AP[.50:.05:.95] being the primary challenge metric in COCO
  • AP Across scales: APsmall, APmedium, APlarge, where small, medium and large have specified areas
    • Scales for COCO are less than 32² pixels for small, between 32² and 96² pixels for medium, and more than 96² pixels for large
    • Our data has variable resolutions, but on average the resolution is around 0.05 m, so small is less than 2.56 m², medium is between 2.56 m² and 23.04 m², and large is more than 23.04 m²
  • Average Recall (AR) is also sometimes used similarly, but with restrictions on the number of detections per image
    • It is computed as the area under the Recall-IoU curve for IoU thresholds in the range [0.5, 1]
  • All of these can be applied to bounding boxes and masks
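
To make the AP definition concrete, here is a minimal sketch of 11-point interpolated AP for a single class, assuming detections have already been matched to ground truths at a fixed IoU threshold and sorted by confidence; this is illustrative only, not the library's implementation:

import numpy as np

def ap_11_point_sketch(is_tp, n_ground_truths):
    "11-point interpolated AP from a confidence-sorted list of TP/FP flags"
    is_tp = np.asarray(is_tp, dtype=float)
    tp_cum, fp_cum = np.cumsum(is_tp), np.cumsum(1 - is_tp)
    precision = tp_cum / (tp_cum + fp_cum)
    recall = tp_cum / n_ground_truths
    # at each recall level r in {0, 0.1, ..., 1.0}, take the maximum precision at recall >= r
    return np.mean([precision[recall >= r].max() if np.any(recall >= r) else 0.0
                    for r in np.arange(0, 1.01, 0.1)])

# 5 detections sorted by confidence, 4 ground truth objects in total
ap_11_point_sketch([1, 1, 0, 1, 0], n_ground_truths=4)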

All the following functions assume that you have two GeoDataFrames with the same CRS and a matching label column. Example usage:


import geopandas as gpd
import numpy as np
from shapely.geometry import box

ground_truth = gpd.read_file(<path_to_ground_truth>)
results = gpd.read_file(<path_to_results>)

# clip the geodataframes to the same extent
results = gpd.clip(results, box(*ground_truth.total_bounds), keep_geom_type=True)
ground_truth = gpd.clip(ground_truth, box(*results.total_bounds), keep_geom_type=True)

# create spatial indexes for faster queries
res_sindex = results.sindex
gt_sindex = ground_truth.sindex

# TP/FN check with 11 IoU thresholds (0.5 to 1.0), applied to ground truth
tp_cols = [f'TP_{np.round(i, 2)}' for i in np.arange(0.5, 1.01, 0.05)]
ground_truth[tp_cols] = ground_truth.apply(lambda row: is_true_positive(row, results, res_sindex),
                                           axis=1, result_type='expand')

# TP/FP check with 11 IoU thresholds (0.5 to 1.0), applied to predictions
fp_cols = [f'FP_{np.round(i, 2)}' for i in np.arange(0.5, 1.01, 0.05)]
results[fp_cols] = results.apply(lambda row: is_false_positive(row, ground_truth, gt_sindex, results, res_sindex),
                                 axis=1, result_type='expand')
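
From the TP/FP columns created above, precision and recall at a given IoU threshold could then be derived roughly as follows; the column semantics (1/True marking a TP or FP) are an assumption about what is_true_positive and is_false_positive return, not part of the documented API:

# precision and recall at IoU threshold 0.5
tp = ground_truth['TP_0.5'].sum()    # ground truths that were detected
fn = len(ground_truth) - tp          # ground truths that were missed
fp = results['FP_0.5'].sum()         # predictions with no matching ground truth

precision = tp / (tp + fp)
recall = tp / (tp + fn)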

source

poly_IoU

 poly_IoU (poly_1:shapely.geometry.polygon.Polygon,
           poly_2:shapely.geometry.polygon.Polygon)

IoU for polygons


source

poly_dice

 poly_dice (poly_1:shapely.geometry.polygon.Polygon,
            poly_2:shapely.geometry.polygon.Polygon)

Dice for polygons
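
A minimal sketch of how polygon IoU and Dice can be computed with shapely, shown for reference; the actual poly_IoU and poly_dice implementations may handle degenerate or invalid geometries differently:

from shapely.geometry import Polygon

def poly_iou_sketch(poly_1: Polygon, poly_2: Polygon) -> float:
    "Intersection over union of two polygons"
    inter = poly_1.intersection(poly_2).area
    union = poly_1.union(poly_2).area
    return inter / union if union > 0 else 0.0

def poly_dice_sketch(poly_1: Polygon, poly_2: Polygon) -> float:
    "Dice coefficient: 2 * intersection / (area_1 + area_2)"
    inter = poly_1.intersection(poly_2).area
    return 2 * inter / (poly_1.area + poly_2.area)

a = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
b = Polygon([(1, 0), (3, 0), (3, 2), (1, 2)])
poly_iou_sketch(a, b), poly_dice_sketch(a, b)  # (0.333..., 0.5)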


source

is_true_positive

 is_true_positive (row, results:geopandas.geodataframe.GeoDataFrame,
                   res_sindex:geopandas.sindex)

Check if a single ground truth mask is TP or FN with 11 different IoU thresholds


source

is_false_positive

 is_false_positive (row, ground_truth:geopandas.geodataframe.GeoDataFrame,
                    gt_sindex:geopandas.sindex,
                    results:geopandas.geodataframe.GeoDataFrame,
                    res_sindex:geopandas.sindex)

Check if a prediction is FP or TP for 11 different IoU thresholds

average_precision and average_recall both return a dict of the results, with each label and each IoU threshold separately. Each entry is an 11-item list, where each item corresponds to a different recall threshold in the range [0:.1:1] for average_precision, or to a different IoU threshold in the range [.50:.05:1] for average_recall.


source

average_precision

 average_precision (ground_truth:geopandas.geodataframe.GeoDataFrame,
                    preds:geopandas.geodataframe.GeoDataFrame)

Get 11-point AP score for each label separately and with all iou_thresholds


source

average_recall

 average_recall (ground_truth:geopandas.geodataframe.GeoDataFrame,
                 preds:geopandas.geodataframe.GeoDataFrame,
                 max_detections:int=None)

Get 11-point AR score for each label separately and with all iou_thresholds. If max_detections is not None, evaluate using only that many of the most confident predictions. Note: this still seems to be bugged and needs fixing.

Object detection metrics with pycocotools and gis-data

Run predict_instance_masks or predict_bboxes for each scene separately, and save the resulting files in data_path, which contains

  • a folder raster_tiles containing the corresponding raster data, required for transforming shapefiles to pixel coordinates
  • a folder vector_tiles containing the ground truth masks
  • a folder predicted_vectors containing the predictions

All files corresponding to the same scene should have the same name, e.g. raster_tiles/1053_Hiidenportti_Chunk9_orto.tif for raster image, vector_tiles/1053_Hiidenportti_Chunk9_orto.geojson for ground truth and predicted_vectors/1053_Hiidenportti_Chunk9_orto.geojson for predictions.


source

GisCOCOeval

 GisCOCOeval (data_path:str, outpath:str, coco_info:dict,
              coco_licenses:list, coco_categories:list)

Initialize evaluator with data path and coco information


source

GisCOCOeval.prepare_data

 GisCOCOeval.prepare_data (gt_label_col:str='label',
                           res_label_col:str='label',
                           rotated_bbox:bool=False, min_bbox_area:int=0)

Convert GIS-data predictions to COCO-format for evaluation, and save resulting files to self.outpath


source

GisCOCOeval.prepare_eval

 GisCOCOeval.prepare_eval (eval_type:str='segm')

Prepare COCOeval to evaluate predictions with 100 and 1000 detections. AP metrics are evaluated with 1000 detections and AR with 100


source

GisCOCOeval.evaluate

 GisCOCOeval.evaluate (classes_separately:bool=True)

Run evaluation and print metrics


source

GisCOCOeval.save_results

 GisCOCOeval.save_results (outpath, iou_thresh:float=0.5)

Saves correctly detected ground truths, correct detections, missed ground truths, and misclassifications with the specified iou_thresh in separate files for each scene
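
A hypothetical end-to-end usage sketch combining the methods above. The data_path layout follows the description earlier; the coco_info, coco_licenses, and coco_categories values and all paths are placeholder assumptions about the COCO metadata format, not values required by the library:

# placeholder COCO-style metadata; adjust to your own dataset
coco_info = {'description': 'Example evaluation', 'version': '1.0'}
coco_licenses = [{'id': 1, 'name': 'CC-BY-4.0', 'url': ''}]
coco_categories = [{'id': 1, 'name': 'class_a', 'supercategory': 'objects'},
                   {'id': 2, 'name': 'class_b', 'supercategory': 'objects'}]

evaluator = GisCOCOeval(data_path='path/to/data', outpath='path/to/coco_eval',
                        coco_info=coco_info, coco_licenses=coco_licenses,
                        coco_categories=coco_categories)
evaluator.prepare_data(gt_label_col='label', res_label_col='label')
evaluator.prepare_eval(eval_type='segm')   # default evaluation type
evaluator.evaluate(classes_separately=True)
evaluator.save_results('path/to/results', iou_thresh=0.5)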