import sys, os
from pathlib import Path
import torch
import geopandas as gpd
import pandas as pd
import shapely
import matplotlib.pyplot as plt
from sahi.models.ultralytics import UltralyticsDetectionModel
import rasterio as rio
import rasterio.plot as rioplot
from sahi.predict import get_sliced_prediction
1, '..')
sys.path.insert(
from src.sahi_utils import *
Inference example
This document shows how to use the models to process Sentinel-2 images. The steps presented here are mostly the same than in scripts/predict_tile.py.
Acquire the data
As usual, data can be accessed from Copernicus Data Space Ecosystem (CDSE). The following products were used as the test data.
Location | Product name | Number of annotations |
---|---|---|
Bothnian sea | S2B_MSIL1C_20210714T100029_N0301_R122_T34VEN_20210714T121056 | 450 |
Bothnian sea | S2B_MSIL1C_20220619T100029_N0400_R122_T34VEN_20220619T104419 | 66 |
Bothnian sea | S2A_MSIL1C_20220624T100041_N0400_R122_T34VEN_20220624T120211 | 424 |
Bothnian sea | S2A_MSIL1C_20220813T095601_N0400_R122_T34VEN_20220813T120233 | 399 |
Kvarken | S2B_MSIL1C_20210714T100029_N0301_R122_T34VEN_20210714T121056 | 83 |
Kvarken | S2B_MSIL1C_20220712T100559_N0400_R022_T34VER_20220712T121613 | 184 |
Kvarken | S2A_MSIL1C_20220826T100611_N0400_R022_T34VER_20220826T135136 | 88 |
We use S2A_MSIL1C_20220624T100041_N0400_R122_T34VEN_20220624T120211
as the data for this example. First download the L1C-TCI mosaic with
python scripts/cdse_odata_tci.py S2A_MSIL1C_20220624T100041_N0400_R122_T34VEN_20220624T120211.safe --creds_file <path_to_creds_file> --target_path <where_to_download>
In case the TCI isn’t found, download the full product and convert the data to TCI, as seen in Dataset conversion.
Run the model
First import the required packages. sahi is used for tiled inference.
Set the model path, data path, device and other parameters:
= 'cuda' if torch.cuda.is_available() else 'cpu'
device = True
half = '../revision_models/yolo11s_Adam/tci_pt/weights/best.pt' # Location for the checkpoint to use
model_path = 0.286 # confidence threshold with highest validation F1-score conf_th
Then initialize UltralyticsDetectionModel
and overrides for the model.
= UltralyticsDetectionModel(model_path=model_path,
det_model =device,
device=conf_th,
confidence_threshold=640)
image_size
'half': half}) det_model.model.overrides.update({
Now get_sliced_prediction
can be used to perform tiled inference for the tile:
= '../tcil1c_test/34VEN/20220624.tif'
tile
= get_sliced_prediction(tile, det_model, slice_width=320, slice_height=320,
sliced_pred_results =0.2, overlap_width_ratio=0.2, perform_standard_pred=False,
overlap_height_ratio=2) verbose
Performing prediction on 1677 slices.
Slicing performed in 2.2600011825561523 seconds.
Prediction performed in 51.64028072357178 seconds.
sliced_pred_results
contains object_prediction_list
, which can then be georeferenced with georef_sahi_preds
:
= georef_sahi_preds(preds=sliced_pred_results.object_prediction_list, path_to_ref_img=tile)
georef_preds 'type'] = 'boat'
georef_preds[ georef_preds.head()
label | geometry | score | type | |
---|---|---|---|---|
0 | 0 | POLYGON ((545705 6706471.25, 545705 6706428.75... | 0.737305 | boat |
1 | 0 | POLYGON ((515035 6774110, 515035 6773943.75, 5... | 0.725098 | boat |
2 | 0 | POLYGON ((524502.5 6779197.5, 524502.5 6779155... | 0.705566 | boat |
3 | 0 | POLYGON ((518555 6740170.625, 518555 6740105.6... | 0.697754 | boat |
4 | 0 | POLYGON ((515991.875 6735191.875, 515991.875 6... | 0.692871 | boat |
Plot the data:
='none', edgecolor='red') georef_preds.plot(facecolor
Cleaning stationary targets and other obvious false positives from the data
from src.data_paths import *
At this step, the data contains all the targets the model considered to be a marine vessel. However, there are some amount of stationary non-vessel targets that look exactly like boats, as well as possibly some targets that are not on water. These are fairly straightforward to filter from the data using external datasets. Most of the cleaning is done based on Topographic Database by the National Land Survey of Finland, while the beacons and other similar targets can be cleaned using the data fron the Finnish Transport Infrastructure Agency.
Predictions not on water
This step filters the detections whose centroid points are on land. The layers jarvi
(lakes), meri
(sea) and virtavesialue
(rivers etc) are used here.
In order to speed up the processing, these data are already extracted from the topographic database.
= gpd.read_file(f'{PATH_TO_MTK_DATA}/34VEN_waters.gpkg', layer='waters').to_crs(georef_preds.crs) waters
Filtering is done by
- changing the geometries of the predictions to their centroid points
- keep only the geometries which are located on water
'centroids'] = georef_preds.centroid
georef_preds['centroids', inplace=True)
georef_preds.set_geometry(= georef_preds.sjoin(waters, how='inner', predicate='within')[['label', 'score', 'geometry', 'centroids', 'type']]
joined ~georef_preds.index.isin(joined.index), 'type'] = 'land' georef_preds.loc[
Predictions on above water rock areas
There are two types of above water rock reference geometries: area (layer vesikivikko
) and points (few target classes in layer vesikivi
). Both are filtered from the data, but the point data is filtered at a later step.
vesikivikko
works same way than filtering based on water:
= tuple(georef_preds.to_crs('EPSG:3067').total_bounds) # Bounds in EPSG:3067 for faster reading
tot_bounds_3067 = gpd.read_file(PATH_TO_OTHER, layer='vesikivikko', bbox=tot_bounds_3067).dissolve(by='kohdeluokka')
rocks = rocks.to_crs(georef_preds.crs)
rocks
= gpd.sjoin(georef_preds, rocks, predicate='within', how='inner')[['label', 'score', 'geometry', 'centroids', 'type']]
joined 'type'] = 'area_rocks' georef_preds.loc[georef_preds.index.isin(joined.index),
Fisheries
There is no available data for the locations of fisheries, so those data are manually derived from aerial images and satellite data.
= gpd.read_file(PATH_TO_FISHERIES, bbox=tot_bounds_3067)
fisheries = rocks.to_crs(georef_preds.crs)
fisheries
= gpd.sjoin(georef_preds, fisheries, predicate='within', how='inner')[['label', 'score', 'geometry', 'centroids', 'type']]
joined 'type'] = 'fisheries' georef_preds.loc[georef_preds.index.isin(joined.index),
Individual large rocks
Now the geometry can be reverted to bounding boxes.
= georef_preds.set_geometry('geometry')[['label', 'score', 'geometry', 'type']] georef_preds
Next we drop the detections that contain rocks above waterline within them
= gpd.read_file(PATH_TO_OTHER, layer='vesikivi', bbox=tot_bounds_3067).to_crs(georef_preds.crs)
above_water_rocks = above_water_rocks[above_water_rocks.kohdeluokka.isin([38511,38512,38513])]
above_water_rocks any(g.contains(above_water_rocks.geometry)) for g in georef_preds.geometry), 'type'] = 'rock' georef_preds.loc[(
Beacons etc
Sector lights, lighthouses and such beacons are visible from satellite images, and are confused with small vessels. The data is available from The Finnish Transport Infrastructure Agency.
= gpd.read_file(PATH_TO_BEACONS, tot_bounds_3067).to_crs(georef_preds.crs)
beacons = beacons[beacons.ty_jnr.isin([1,2,3,4,5,8])]
beacons any(g.contains(beacons.geometry)) for g in georef_preds.geometry), 'type'] = 'beacon' georef_preds.loc[(
/home/mayrajeo/miniconda3/envs/ml-env/lib/python3.11/site-packages/pyogrio/raw.py:198: UserWarning: Measured (M) geometry types are not supported. Original type 'Measured 3D Point' is converted to 'Point Z'
return ogr_read(
Filtering based on size
Final sanity check is to drop the predictions that are obviously too large. Here we use 750 meters as the threshold for the longer side of the bounding box. These kind of errors are extremely rare.
def get_longer_edge(geom:shapely.geometry.Polygon) -> float:
= shapely.geometry.box(*geom.bounds).exterior.coords.xy
x, y = (shapely.geometry.Point(x[0],y[0]).distance(shapely.geometry.Point(x[1],y[1])),
edge_lengths 1],y[1]).distance(shapely.geometry.Point(x[2],y[2])))
shapely.geometry.Point(x[return max(edge_lengths)
'max_edge'] = georef_preds.geometry.apply(get_longer_edge)
georef_preds[>= 750, 'type'] = 'size' georef_preds.loc[georef_preds.max_edge
Plot the cleaned data
Finally plot how many false positives were cleaned from the predictions:
= plt.subplots(1,2, dpi=200, figsize=(10,6))
fig, axs
with rio.open(tile) as src:
=axs[0])
rioplot.show(src, ax=axs[1])
rioplot.show(src, ax
'type'] == 'boat'].plot(ax=axs[0], column='type', facecolor='none')
georef_preds[georef_preds['type'] != 'boat'].plot(ax=axs[1], column='type', linewidth=1.5, facecolor='none', legend=True, cmap='gist_rainbow')
georef_preds[georef_preds[0].set_title('Marine vessels')
axs[1].set_title(f"False positives (n: {len(georef_preds[georef_preds['type'] != 'boat'])})")
axs[ plt.show()