Dataset creation
Overview
geo2ml provides the following commands for creating datasets from geospatial raster and vector data:
geo2ml_sample_points
geo2ml_sample_polygons
geo2ml_create_raster_dataset
geo2ml_create_yolo_dataset
geo2ml_create_coco_dataset
These commands can be run either from the command line, using the geo2ml_-prefixed commands, or called from Python scripts or notebooks, for example:
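from pathlib import Path
from geo2ml.scripts.data import sample_points

sampling_locations = Path(<path_to_locations>)  # points used for sampling
input_raster = Path(<path_to_raster>)  # raster to sample from
target_column = 'column'  # column of sampling_locations used as the target
outpath = Path(<path_to_save_files>)  # output directory
sample_points(sampling_locations, input_raster, target_column, outpath)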
Tabular datasets
Both sample_points and sample_polygons create a tabular dataset by sampling values from input_raster at the point or polygon locations given in sampling_locations. The resulting table is saved to outpath as a csv and as either a geojson or a shapefile.
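To make the idea of point sampling concrete, here is a minimal pure-Python sketch. It is not geo2ml's implementation: the helper `sample_at_points`, the `origin_x`, `origin_y` and `pixel_size` parameters, and the tiny example bands are all hypothetical. It only illustrates how a point's coordinates map to a raster cell, and how the band values of that cell become one row of the output table.

```python
import csv
import io


def sample_at_points(bands, origin_x, origin_y, pixel_size, points):
    """Return one row per point: the target label plus one value per band.

    bands: list of 2D lists (one per band), indexed as band[row][col]
    origin_x, origin_y: coordinates of the raster's top-left corner
    points: iterable of (x, y, target) tuples in the raster's coordinates
    """
    rows = []
    n_rows, n_cols = len(bands[0]), len(bands[0][0])
    for x, y, target in points:
        col = int((x - origin_x) // pixel_size)
        row = int((origin_y - y) // pixel_size)  # rows grow as y decreases
        # Points outside the raster extent are skipped in this sketch
        if 0 <= row < n_rows and 0 <= col < n_cols:
            rows.append([target] + [band[row][col] for band in bands])
    return rows


# Two hypothetical 2x2 bands with 10 m pixels, top-left corner at (0, 20)
band_1 = [[1, 2], [3, 4]]
band_2 = [[10, 20], [30, 40]]
points = [(5.0, 15.0, "pine"), (15.0, 5.0, "spruce"), (99.0, 99.0, "outside")]

table = sample_at_points([band_1, band_2], 0.0, 20.0, 10.0, points)

# Serialize the table as CSV text, similar in spirit to the csv output
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["target", "band_1", "band_2"])
writer.writerows(table)
csv_text = buffer.getvalue()
```

In practice a raster library handles the coordinate transform and the file reading; the sketch only shows the shape of the resulting table, with one row per sampling location and one column per band.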
sample_points
sample_points (sampling_locations:pathlib.Path, input_raster:pathlib.Path, target_column:str, outpath:pathlib.Path, gpkg_layer:str=None, save_as_shp:bool=False, rename_target:str=None, band_names:pathlib.Path=None, dropna_value:int=None, out_prefix:str='')
Sample pixel values from input_raster using sampling_locations.
Parameter | Type | Default | Details
---|---|---|---
sampling_locations | Path | | Path to the geojson/shapefile containing the sampling locations as points
input_raster | Path | | Path to the raster used for sampling
target_column | str | | Column of sampling_locations used as the target
outpath | Path | | Path to save the output files. Created if it does not exist
gpkg_layer | str | None | If sampling_locations is .gpkg, specify the layer used. Ignored otherwise.
save_as_shp | bool | False | Save results as shapefiles? If False, saves as geojson
rename_target | str | None | If provided, target column is renamed to this
band_names | Path | None | Path to a file providing bands to use as rows
dropna_value | int | None | Drop rows in which all values are equal to this value
out_prefix | str | '' | Prefix for outputs
sample_polygons
sample_polygons (sampling_locations:pathlib.Path, input_raster:pathlib.Path, target_column:str, outpath:pathlib.Path, min:bool, max:bool, mean:bool, count:bool, sum:bool, std:bool, median:bool, categorical:bool=False, gpkg_layer:str=None, save_as_shp:bool=False, rename_target:str=None, band_names:pathlib.Path=None, dropna_value:int=None, out_prefix:str='')
Sample pixel values from input_raster using sampling_locations.
Parameter | Type | Default | Details
---|---|---|---
sampling_locations | Path | | Path to the geojson/shapefile containing the sampling locations as polygons
input_raster | Path | | Path to the raster used for sampling
target_column | str | | Column of sampling_locations used for sampling
outpath | Path | | Path to save the output files. Created if it does not exist
min | bool | | Compute minimum
max | bool | | Compute maximum
mean | bool | | Compute mean
count | bool | | Compute count
sum | bool | | Compute sum
std | bool | | Compute standard deviation
median | bool | | Compute median
categorical | bool | False | Are bands categorical data?
gpkg_layer | str | None | If sampling_locations is .gpkg, specify the layer used. Ignored otherwise.
save_as_shp | bool | False | Save results as shapefiles? If False, saves as geojson
rename_target | str | None | If provided, target column is renamed to this
band_names | Path | None | Path to a file providing bands to use as rows
dropna_value | int | None | Drop rows in which all values are equal to this value
out_prefix | str | '' | Prefix for outputs
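The statistics flags above (min, max, mean, count, sum, std, median) correspond to ordinary zonal statistics over the pixels that fall inside each polygon. The following is a minimal sketch of that idea using only the standard library. The helper `zonal_stats` and the pixel values are hypothetical, and the choice of population standard deviation is an assumption; geo2ml may compute std differently.

```python
import statistics


def zonal_stats(values, stats):
    """Compute the requested statistics for the pixel values of one polygon."""
    available = {
        "min": min,
        "max": max,
        "mean": statistics.fmean,
        "count": len,
        "sum": sum,
        "std": statistics.pstdev,  # assumption: population standard deviation
        "median": statistics.median,
    }
    return {name: available[name](values) for name in stats}


# Hypothetical values of one band for the pixels inside a single polygon
pixels = [2, 4, 4, 4, 5, 5, 7, 9]
result = zonal_stats(
    pixels, ["min", "max", "mean", "count", "sum", "std", "median"]
)
```

Each requested statistic becomes one column per band in the output table, so enabling only the flags you need keeps the table narrow.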
Computer vision dataset creation
create_raster_dataset
create_raster_dataset (raster_path:pathlib.Path, mask_path:pathlib.Path, outpath:pathlib.Path, save_grid:bool=False, allow_partial_data:bool=False, keep_bg_only:bool=False, target_column:str=None, gpkg_layer:str=None, gridsize_x:int=256, gridsize_y:int=256, overlap_x:int=0, overlap_y:int=0)
Create a semantic segmentation dataset from the raster in raster_path and the corresponding mask in mask_path. Raster image patches are saved to outpath/raster_tiles and mask patches to outpath/mask_tiles.
Parameter | Type | Default | Details
---|---|---|---
raster_path | Path | | Path to a raster that is the base for the dataset
mask_path | Path | | Path to corresponding mask raster or polygon layer. Must have the same extent and resolution as the raster in raster_path
outpath | Path | | Where to save the results
save_grid | bool | False | Whether to save the tiling grid
allow_partial_data | bool | False | Whether to create tiles that have only partial data
keep_bg_only | bool | False | Keep the mask chips that contain only the background class
target_column | str | None | If mask_path contains vector data, identifier of the column containing the class information
gpkg_layer | str | None | If mask_path is a geopackage, specify the layer used. Ignored otherwise.
gridsize_x | int | 256 | Size of tiles in x-axis in pixels
gridsize_y | int | 256 | Size of tiles in y-axis in pixels
overlap_x | int | 0 | Overlap of tiles in x-axis in pixels
overlap_y | int | 0 | Overlap of tiles in y-axis in pixels
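To see how gridsize_x, gridsize_y, overlap_x and overlap_y interact, the sketch below computes tile windows for a raster of a given size. It is an illustration under stated assumptions, not geo2ml's tiling code: the helper `tile_windows` is hypothetical, the stride is assumed to be the tile size minus the overlap, and tiles that would extend past the raster edge are simply skipped.

```python
def tile_windows(width, height, gridsize_x=256, gridsize_y=256,
                 overlap_x=0, overlap_y=0):
    """Return (col_off, row_off, tile_width, tile_height) for each full tile.

    Assumes the stride is the tile size minus the overlap, and that tiles
    extending past the raster edge are skipped.
    """
    if overlap_x >= gridsize_x or overlap_y >= gridsize_y:
        raise ValueError("overlap must be smaller than the tile size")
    stride_x = gridsize_x - overlap_x
    stride_y = gridsize_y - overlap_y
    windows = []
    for row_off in range(0, height - gridsize_y + 1, stride_y):
        for col_off in range(0, width - gridsize_x + 1, stride_x):
            windows.append((col_off, row_off, gridsize_x, gridsize_y))
    return windows


# A hypothetical 1024 x 512 pixel raster with the default 256 px tiles
no_overlap = tile_windows(1024, 512)
# The same raster with 128 px overlap in both directions
with_overlap = tile_windows(1024, 512, overlap_x=128, overlap_y=128)
```

Under these assumptions the overlap increases the number of tiles considerably (8 tiles without overlap versus 21 with a 128 px overlap here), which is worth keeping in mind when estimating dataset size and disk usage.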
create_coco_dataset
create_coco_dataset (raster_path:pathlib.Path, polygon_path:pathlib.Path, target_column:str, outpath:pathlib.Path, dataset_name:str, gpkg_layer:str=None, min_area_pct:float=0.0, output_format:str='geojson', save_grid:bool=False, allow_partial_data:bool=False, gridsize_x:int=320, gridsize_y:int=320, overlap_x:int=0, overlap_y:int=0, ann_format:str='box', min_bbox_area:int=0)
Create a COCO-format dataset from a raster and a polygon shapefile.
Parameter | Type | Default | Details
---|---|---|---
raster_path | Path | | Path to a raster that is the base for the dataset
polygon_path | Path | | Path to annotated polygons
target_column | str | | Which column contains class information
outpath | Path | | Where to save the resulting files
dataset_name | str | | Name of the dataset
gpkg_layer | str | None | If polygon_path is a geopackage, specify the layer used. Ignored otherwise.
min_area_pct | float | 0.0 | Smallest fraction of a polygon's area that must remain after tiling for it to be kept
output_format | str | geojson | Which format to use for saving, either 'geojson' or 'gpkg'
save_grid | bool | False | Whether to save the tiling grid
allow_partial_data | bool | False | Whether to create tiles that have only partial image data
gridsize_x | int | 320 | Size of tiles in x-axis in pixels
gridsize_y | int | 320 | Size of tiles in y-axis in pixels
overlap_x | int | 0 | Overlap of tiles in x-axis in pixels
overlap_y | int | 0 | Overlap of tiles in y-axis in pixels
ann_format | str | box | Annotation format, either box, polygon or rotated box
min_bbox_area | int | 0 | Minimum bounding box area in pixels. Objects smaller than this are discarded
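For readers unfamiliar with the target format, the sketch below builds a minimal COCO-style dictionary for a single tile with box annotations. The overall layout (images, annotations, categories, and bbox given as [x, y, width, height]) follows the COCO convention. The helper `to_coco`, the file name and the example boxes are hypothetical, and the files geo2ml actually writes may contain additional fields.

```python
import json


def to_coco(image_name, width, height, boxes, categories, min_bbox_area=0):
    """Build a minimal COCO dictionary for one image tile.

    boxes: list of (category_name, xmin, ymin, xmax, ymax) in pixels
    """
    cat_ids = {name: i + 1 for i, name in enumerate(categories)}
    annotations = []
    for category, xmin, ymin, xmax, ymax in boxes:
        w, h = xmax - xmin, ymax - ymin
        if w * h < min_bbox_area:
            continue  # drop objects smaller than the threshold
        annotations.append({
            "id": len(annotations) + 1,
            "image_id": 1,
            "category_id": cat_ids[category],
            "bbox": [xmin, ymin, w, h],  # COCO boxes are [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })
    return {
        "images": [{"id": 1, "file_name": image_name,
                    "width": width, "height": height}],
        "annotations": annotations,
        "categories": [{"id": i, "name": name} for name, i in cat_ids.items()],
    }


# Hypothetical boxes on a 320 x 320 tile; the 3 x 3 px box is filtered out
boxes = [("tree", 10, 20, 110, 70), ("tree", 0, 0, 3, 3),
         ("stump", 200, 200, 260, 240)]
coco = to_coco("tile_0.tif", 320, 320, boxes, ["tree", "stump"],
               min_bbox_area=16)
coco_json = json.dumps(coco, indent=2)
```

The min_bbox_area filter in the sketch mirrors the parameter of the same name in the table: a box whose pixel area is below the threshold is left out of the annotations.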
create_yolo_dataset
create_yolo_dataset (raster_path:pathlib.Path, polygon_path:pathlib.Path, target_column:str, outpath:pathlib.Path, dataset_name:str=None, gpkg_layer:str=None, min_area_pct:float=0.0, output_format:str='geojson', save_grid:bool=False, allow_partial_data:bool=False, gridsize_x:int=320, gridsize_y:int=320, overlap_x:int=0, overlap_y:int=0, ann_format:str='box', min_bbox_area:int=0)
Create a YOLO-format dataset from a raster and a polygon shapefile.
Parameter | Type | Default | Details
---|---|---|---
raster_path | Path | | Path to a raster that is the base for the dataset
polygon_path | Path | | Path to annotated polygons
target_column | str | | Which column contains class information
outpath | Path | | Where to save the resulting files
dataset_name | str | None | Optional name of the dataset
gpkg_layer | str | None | If polygon_path is a geopackage, specify the layer used. Ignored otherwise.
min_area_pct | float | 0.0 | Smallest fraction of a polygon's area that must remain after tiling for it to be kept
output_format | str | geojson | Which format to use for saving, either 'geojson' or 'gpkg'
save_grid | bool | False | Whether to save the tiling grid
allow_partial_data | bool | False | Whether to create tiles that have only partial image data
gridsize_x | int | 320 | Size of tiles in x-axis in pixels
gridsize_y | int | 320 | Size of tiles in y-axis in pixels
overlap_x | int | 0 | Overlap of tiles in x-axis in pixels
overlap_y | int | 0 | Overlap of tiles in y-axis in pixels
ann_format | str | box | Annotation format, either box, polygon or rotated box
min_bbox_area | int | 0 | Minimum bounding box area in pixels. Objects smaller than this are discarded
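For the box annotation format, a YOLO label file holds one line per object: a class index followed by the box center and size, all normalized by the image dimensions. The sketch below shows that conversion for a single pixel-coordinate box. The helper `to_yolo_line` and the example values are hypothetical and only illustrate the label layout; they do not reproduce geo2ml's internals.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_width, img_height):
    """Format one pixel-coordinate box as a YOLO label line.

    The line contains: class x_center y_center width height, where the
    four box values are fractions of the image width and height.
    """
    x_center = (xmin + xmax) / 2 / img_width
    y_center = (ymin + ymax) / 2 / img_height
    width = (xmax - xmin) / img_width
    height = (ymax - ymin) / img_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A hypothetical box on a 320 x 320 tile, matching the default tile size
line = to_yolo_line(0, 80, 160, 240, 320, 320, 320)
```

Because the coordinates are normalized, the same label line stays valid if the tile image is later resized, which is one reason the format stores fractions rather than pixels.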