Custom losses and metrics
Classification metrics
adjusted_R2Score
adjusted_R2Score (r2_score, n, k)
Calculates the adjusted R² score based on r2_score, the number of observations (n), and the number of predictor variables (k)
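As a reference, a minimal sketch of the standard adjusted R² formula this metric is named after (illustrative only, not necessarily the library's exact implementation):

def adjusted_r2(r2_score, n, k):
    "Adjusted R²: penalizes R² for the number of predictors k given n observations"
    return 1 - (1 - r2_score) * (n - 1) / (n - k - 1)

adjusted_r2(0.85, n=100, k=5)  # ≈ 0.842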
Regression metrics
rrmse
rrmse (preds, targs)
Relative RMSE. Normalized with mean of the target
bias
bias (preds, targs)
Average bias of predictions
bias_pct
bias_pct (preds, targs)
Mean weighted bias, normalized with mean of the target
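For reference, naive re-implementations of the three regression metrics above. The sign convention (preds - targs) and the exact scaling (e.g. whether bias_pct is multiplied by 100) are assumptions here, so check against the library before relying on them:

import torch

def rrmse_naive(preds, targs):
    "RMSE normalized with the mean of the target"
    return torch.sqrt(((preds - targs)**2).mean()) / targs.mean()

def bias_naive(preds, targs):
    "Average bias of predictions, here taken as preds - targs"
    return (preds - targs).mean()

def bias_pct_naive(preds, targs):
    "Mean bias normalized with the mean of the target"
    return (preds - targs).mean() / targs.mean()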
BigEarthNet metrics
Metrics used in the BigEarthNet paper to evaluate multi-label classification
label_ranking_average_precision_score
label_ranking_average_precision_score (sigmoid=True, sample_weight=None)
Label ranking average precision (LRAP) is the average over each ground truth label assigned to each sample, of the ratio of true vs. total labels with lower score.
import torch
from fastai.learner import Learner

class TstLearner(Learner):
    def __init__(self,dls=None,model=None,**kwargs): self.pred,self.xb,self.yb = None,None,None

def compute_val(met, x1, x2):
    met.reset()
    vals = [0,6,15,20]
    learn = TstLearner()
    for i in range(3):
        learn.pred,learn.yb = x1[vals[i]:vals[i+1]],(x2[vals[i]:vals[i+1]],)
        met.accumulate(learn)
    return met.value

x_1 = torch.randn(10,3)
x_2 = torch.randint(2,(10,3))
x_1, torch.sigmoid(x_1), x_2
(tensor([[-0.0175, -1.1692, 0.4717],
[-0.1019, 1.7057, 0.8773],
[-0.6633, 0.0593, 1.3823],
[-0.6155, -0.4767, -1.4756],
[-1.0687, 0.0246, -0.3407],
[ 0.8646, -0.5164, 0.7274],
[-0.7253, 0.6507, 0.2994],
[-0.8993, 0.8574, 0.3334],
[ 0.5395, 0.5471, 0.7207],
[ 0.5087, -0.7967, 0.5395]]),
tensor([[0.4956, 0.2370, 0.6158],
[0.4745, 0.8463, 0.7063],
[0.3400, 0.5148, 0.7994],
[0.3508, 0.3830, 0.1861],
[0.2557, 0.5061, 0.4156],
[0.7036, 0.3737, 0.6742],
[0.3262, 0.6572, 0.5743],
[0.2892, 0.7021, 0.5826],
[0.6317, 0.6335, 0.6728],
[0.6245, 0.3107, 0.6317]]),
tensor([[1, 0, 0],
[1, 1, 0],
[1, 1, 1],
[0, 0, 1],
[1, 1, 1],
[1, 0, 0],
[0, 0, 0],
[0, 1, 0],
[1, 0, 1],
[0, 1, 0]]))
lrap = label_ranking_average_precision_score()
compute_val(lrap, x_1, x_2)
0.7833333333333332
label_ranking_loss
label_ranking_loss (sigmoid=True, sample_weight=None)
Compute the average number of label pairs that are incorrectly ordered given y_score weighted by the size of the label set and the number of labels not in the label set.
lrl = label_ranking_loss()
compute_val(lrl, x_1, x_2)
0.35
one_error
one_error (preds, targs)
Rate for which the top ranked label is not among ground truth
one_error(x_1, x_2)
tensor(0.4000)
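A naive re-implementation of one_error as a sanity check; for each sample it checks whether the highest-scoring label is among the ground truth labels. This reproduces the tensor(0.4000) above for the example tensors, but it is not the library's code:

def one_error_naive(preds, targs):
    top = preds.argmax(dim=1)                            # highest-scoring label per sample
    hits = targs.gather(1, top.unsqueeze(1)).squeeze(1)  # 1 if that label is a ground truth label
    return (hits == 0).float().mean()                    # rate of samples where the top label misses

one_error_naive(x_1, x_2)  # tensor(0.4000) for the tensors above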
coverage_error
coverage_error (sigmoid=True, sample_weight=None)
Compute how far we need to go through the ranked scores to cover all true labels. The best value is equal to the average number of labels in y_true per sample.
cov = coverage_error()
compute_val(cov, x_1, x_2)
2.2
Segmentation metrics
JaccardCoeffMulti
JaccardCoeffMulti (axis=1)
Averaged Jaccard coefficient for multiclass target in segmentation. Excludes background class
x1 = torch.randn(20,6,3,3)
x2 = torch.randint(0, 6, (20, 3, 3))
pred = x1.argmax(1)
compute_val(JaccardCoeffMulti(), x1, x2)
0.08437537158034053
Object detection metrics and evaluation for shapefiles (MOSTLY OBSOLETE, use GisCOCOeval instead)
To evaluate our collection of predicted masks, we compare each of our predicted masks with each of the available target masks for a given input:
- A true positive is a prediction-target mask pair whose IoU score exceeds some predefined threshold
- A false positive is a predicted object mask with no associated ground truth object mask
- A false negative is a ground truth object mask with no associated predicted object mask
In the case of multiple detections of the same object, the one with the highest confidence is considered "correct" and the others are counted as FP.
From these, we can get Precision and Recall
\(Precision = \frac{TP}{TP + FP} = \frac{TP}{all \: detections}, Recall = \frac{TP}{TP+FN} = \frac{TP}{all \: ground \: truths}\)
And use these to derive other metrics.
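As a quick sanity check of the formulas above, a tiny worked example with made-up counts:

# Suppose a scene has 5 ground truth objects and the model makes 6 detections,
# of which 4 match a ground truth above the IoU threshold (TP), 2 match nothing (FP),
# and 1 ground truth is left undetected (FN).
tp, fp, fn = 4, 2, 1
precision = tp / (tp + fp)  # 4 / 6 ≈ 0.67 (TP / all detections)
recall    = tp / (tp + fn)  # 4 / 5 = 0.80 (TP / all ground truths)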
Typical metrics include Average Precision (AP) and mean Average Precision (mAP). From these, several metrics can be derived:
- AP50, AP75, AP[.50:.05:.95] are the most common, with AP[.50:.05:.95] being the primary challenge metric in COCO
- AP across scales: APsmall, APmedium, APlarge, where small, medium, and large have specified areas
  - Scales for COCO are: less than 32² for small, between 32² and 96² for medium, and more than 96² for large, with areas measured in pixels
  - Our data has variable resolutions, but on average the resolution is around 0.05m, so small is less than 2.56m², medium is between 2.56m² and 23.04m², and large is more than 23.04m² (see the conversion sketch after this list)
- Average Recall (AR) is also sometimes used similarly, but with restrictions on the number of detections per image
  - It is computed as the area under the Recall-IoU curve for IoU thresholds in [0.5, 1]
- All of these can be applied to bounding boxes and masks
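The metric-area thresholds quoted for our data follow directly from COCO's pixel-area thresholds and the assumed average resolution of 0.05 m per pixel; a minimal conversion sketch:

# Converting COCO's pixel-area scale thresholds to square metres,
# assuming an average resolution of 0.05 m per pixel
resolution = 0.05                    # metres per pixel
small_px, large_px = 32**2, 96**2    # COCO area thresholds, in pixels
small_m2 = small_px * resolution**2  # 1024 * 0.0025 = 2.56 m²
large_m2 = large_px * resolution**2  # 9216 * 0.0025 = 23.04 m²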
All the following functions assume that you have two GeoDataFrames that share the same CRS and have a matching label column. A usage example:
import geopandas as gpd
import numpy as np
from shapely.geometry import box

ground_truth = gpd.read_file(<path_to_ground_truth>)
results = gpd.read_file(<path_to_results>)

# clip the geodataframes to have the same extent
results = gpd.clip(results, box(*ground_truth.total_bounds), keep_geom_type=True)
ground_truth = gpd.clip(ground_truth, box(*results.total_bounds), keep_geom_type=True)

# create spatial indexes for faster queries
res_sindex = results.sindex
gt_sindex = ground_truth.sindex

# TP/FN check with different thresholds, applied to ground truth
tp_cols = [f'TP_{np.round(i, 2)}' for i in np.arange(0.5, 1.04, 0.05)]
ground_truth[tp_cols] = ground_truth.apply(lambda row: is_true_positive(row, results, res_sindex),
                                           axis=1, result_type='expand')

# TP/FP check with different thresholds, applied to predictions
fp_cols = [f'FP_{np.round(i, 2)}' for i in np.arange(0.5, 1.01, 0.05)]
results[fp_cols] = results.apply(lambda row: is_false_positive(row, ground_truth, gt_sindex, results, res_sindex),
                                 axis=1, result_type='expand')
poly_IoU
poly_IoU (poly_1:shapely.geometry.polygon.Polygon, poly_2:shapely.geometry.polygon.Polygon)
IoU for polygons
poly_dice
poly_dice (poly_1:shapely.geometry.polygon.Polygon, poly_2:shapely.geometry.polygon.Polygon)
Dice for polygons
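For reference, naive shapely-based equivalents of the two polygon metrics above (illustrative sketches, not the library's code):

from shapely.geometry import Polygon

def poly_iou_naive(poly_1, poly_2):
    "Intersection over union of two polygons"
    union = poly_1.union(poly_2).area
    return poly_1.intersection(poly_2).area / union if union > 0 else 0.0

def poly_dice_naive(poly_1, poly_2):
    "Dice coefficient of two polygons"
    denom = poly_1.area + poly_2.area
    return 2 * poly_1.intersection(poly_2).area / denom if denom > 0 else 0.0

# two unit squares offset by 0.5: IoU = 1/3, Dice = 1/2
a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
b = Polygon([(0.5, 0), (1.5, 0), (1.5, 1), (0.5, 1)])
poly_iou_naive(a, b), poly_dice_naive(a, b)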
is_true_positive
is_true_positive (row, results:geopandas.geodataframe.GeoDataFrame, res_sindex:geopandas.sindex)
Check if a single ground truth mask is TP or FN with 11 different IoU thresholds
is_false_positive
is_false_positive (row, ground_truth:geopandas.geodataframe.GeoDataFrame, gt_sindex:geopandas.sindex, results:geopandas.geodataframe.GeoDataFrame, res_sindex:geopandas.sindex)
Check if prediction is FP or TP for 11 different IoU thresholds
average_precision and average_recall both return a dict of the results, with each label and each IoU threshold reported separately. Each item is an 11-item list, where each item corresponds to a different recall threshold in the range [0:.1:1] in the case of average_precision, or to a different IoU threshold in the range [.50:.05:1] in the case of average_recall.
average_precision
average_precision (ground_truth:geopandas.geodataframe.GeoDataFrame, preds:geopandas.geodataframe.GeoDataFrame)
Get 11-point AP score for each label separately and with all iou_thresholds
average_recall
average_recall (ground_truth:geopandas.geodataframe.GeoDataFrame, preds:geopandas.geodataframe.GeoDataFrame, max_detections:int=None)
Get 11-point AR score for each label separately and with all iou_thresholds. If max_detections is not None, evaluate with only that many most confident predictions. Seems to still be bugged, needs fixing.
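A hedged usage sketch, continuing from the ground_truth/results example above. The exact key layout of the returned dicts is an assumption here (the docs only state that results are reported per label and per IoU threshold, with an 11-point list as each value):

ap = average_precision(ground_truth, results)
ar = average_recall(ground_truth, results, max_detections=None)

# collapse each 11-point list into a single AP/AR value by averaging its entries
ap_summary = {k: sum(v) / len(v) for k, v in ap.items()}
ar_summary = {k: sum(v) / len(v) for k, v in ar.items()}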
Object detection metrics with pycocotools and gis-data
Run predict_instance_masks or predict_bboxes for each scene separately, and save the resulting files in data_path, which contains:
- folder raster_tiles, containing the corresponding raster data. Required for transforming shapefiles to pixel coordinates
- folder vector_tiles, containing the ground truth masks
- folder predicted_vectors, containing the predictions
All files corresponding to the same scene should have the same name, e.g. raster_tiles/1053_Hiidenportti_Chunk9_orto.tif for the raster image, vector_tiles/1053_Hiidenportti_Chunk9_orto.geojson for the ground truth, and predicted_vectors/1053_Hiidenportti_Chunk9_orto.geojson for the predictions.
GisCOCOeval
GisCOCOeval (data_path:str, outpath:str, coco_info:dict, coco_licenses:list, coco_categories:list)
Initialize evaluator with data path and coco information
GisCOCOeval.prepare_data
GisCOCOeval.prepare_data (gt_label_col:str='label', res_label_col:str='label', rotated_bbox:bool=False, min_bbox_area:int=0)
Convert GIS-data predictions to COCO-format for evaluation, and save resulting files to self.outpath
GisCOCOeval.prepare_eval
GisCOCOeval.prepare_eval (eval_type:str='segm')
Prepare COCOeval to evaluate predictions with 100 and 1000 detections. AP metrics are evaluated with 1000 detections and AR with 100
GisCOCOeval.evaluate
GisCOCOeval.evaluate (classes_separately:bool=True)
Run evaluation and print metrics
GisCOCOeval.save_results
GisCOCOeval.save_results (outpath, iou_thresh:float=0.5)
Saves correctly detected ground truths, correct detections, missed ground truths, and misclassifications with the specified IoU threshold in separate files for each scene
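A hedged end-to-end sketch of the GisCOCOeval workflow using the methods documented above. The COCO info, license, and category dicts follow the standard COCO annotation format; their contents and the path placeholders here are assumptions for illustration:

coco_info = {'description': 'example evaluation', 'version': '0.1'}
coco_licenses = [{'id': 1, 'name': 'placeholder license', 'url': ''}]
coco_categories = [{'id': 1, 'name': 'example_class', 'supercategory': 'example'}]

evaluator = GisCOCOeval(data_path=<path_to_data>, outpath=<path_to_outputs>,
                        coco_info=coco_info, coco_licenses=coco_licenses,
                        coco_categories=coco_categories)

evaluator.prepare_data(gt_label_col='label', res_label_col='label')  # convert GIS data to COCO format
evaluator.prepare_eval(eval_type='segm')                             # 'segm' is the documented default
evaluator.evaluate(classes_separately=True)                          # run evaluation and print metrics
evaluator.save_results(<path_to_outputs>, iou_thresh=0.5)            # per-scene TP/FP/FN files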