Dataset creation
Overview
geo2ml provides the following commands for creating datasets from geospatial raster and vector data.
- geo2ml_sample_points
- geo2ml_sample_polygons
- geo2ml_create_raster_dataset
- geo2ml_create_yolo_dataset
- geo2ml_create_coco_dataset
These commands can be run from the command line using their geo2ml_-prefixed names, or called from Python scripts or notebooks, for example:
from pathlib import Path

from geo2ml.scripts.data import sample_points

# Replace the placeholder strings with real paths
sampling_locations = Path("<path_to_locations>")
input_raster = Path("<path_to_raster>")
target_column = 'column'
outpath = Path("<path_to_save_files>")
sample_points(sampling_locations, input_raster, target_column, outpath)

Tabular datasets
sample_points and sample_polygons both create a dataset by sampling input_raster at the point or polygon locations given in sampling_locations. The resulting table is saved to outpath as a CSV file and as either a geojson or a shapefile.
sample_points
sample_points (sampling_locations:pathlib.Path, input_raster:pathlib.Path, target_column:str, outpath:pathlib.Path, gpkg_layer:str=None, save_as_shp:bool=False, rename_target:str=None, band_names:pathlib.Path=None, dropna_value:int=None, out_prefix:str='')
Sample pixel values from input_raster using sampling_locations
| Parameter | Type | Default | Details |
|---|---|---|---|
| sampling_locations | Path | | Path to the geojson/shapefile containing the sampling locations as points |
| input_raster | Path | | Path to the raster used for sampling |
| target_column | str | | Column of sampling_locations used as the target |
| outpath | Path | | Path to save the output files. Created if it does not exist |
| gpkg_layer | str | None | If sampling_locations is .gpkg, specify the layer used. Ignored otherwise. |
| save_as_shp | bool | False | Save results as shapefiles? If False, saves as geojson |
| rename_target | str | None | If provided, target column is renamed to this |
| band_names | Path | None | Path to a file providing bands to use as rows |
| dropna_value | int | None | Drop all rows with all values equal to this value |
| out_prefix | str | '' | Prefix for outputs |
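To make the sampling step concrete, here is a minimal pure-Python sketch of what sampling a raster at point locations amounts to conceptually. It is not geo2ml's implementation: the helpers `world_to_pixel` and `sample_at_points`, the toy two-band raster, and the north-up geotransform values are all assumptions made for illustration, whereas the real command reads geospatial files from disk.

```python
from math import floor


def world_to_pixel(x, y, origin_x, origin_y, pixel_size):
    """Map world coordinates to (row, col) for a north-up raster (hypothetical helper)."""
    col = floor((x - origin_x) / pixel_size)
    row = floor((origin_y - y) / pixel_size)  # y decreases as rows increase
    return row, col


def sample_at_points(points, bands, origin_x, origin_y, pixel_size, band_names):
    """Return one record per point holding the target and one value per band."""
    rows = []
    for x, y, target in points:
        r, c = world_to_pixel(x, y, origin_x, origin_y, pixel_size)
        record = {"target": target}
        for name, band in zip(band_names, bands):
            record[name] = band[r][c]
        rows.append(record)
    return rows


# Toy raster: 2 bands, 2 rows x 3 columns, 10 m pixels, upper-left corner at (100, 200)
bands = [
    [[1, 2, 3], [4, 5, 6]],
    [[10, 20, 30], [40, 50, 60]],
]
# Each point is (x, y, value of the target column)
points = [(105.0, 195.0, "pine"), (125.0, 185.0, "spruce")]
table = sample_at_points(points, bands, 100.0, 200.0, 10.0, ["b1", "b2"])
```

Each record in `table` corresponds to one row of the output table: the target value followed by one column per band.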
sample_polygons
sample_polygons (sampling_locations:pathlib.Path, input_raster:pathlib.Path, target_column:str, outpath:pathlib.Path, min:bool, max:bool, mean:bool, count:bool, sum:bool, std:bool, median:bool, categorical:bool=False, gpkg_layer:str=None, save_as_shp:bool=False, rename_target:str=None, band_names:pathlib.Path=None, dropna_value:int=None, out_prefix:str='')
Sample pixel values from input_raster using sampling_locations.
| Parameter | Type | Default | Details |
|---|---|---|---|
| sampling_locations | Path | | Path to the geojson/shapefile containing the sampling locations as polygons |
| input_raster | Path | | Path to the raster used for sampling |
| target_column | str | | Column of sampling_locations used for sampling |
| outpath | Path | | Path to save the output files. Created if it does not exist |
| min | bool | | Compute minimum |
| max | bool | | Compute maximum |
| mean | bool | | Compute mean |
| count | bool | | Compute count |
| sum | bool | | Compute sum |
| std | bool | | Compute standard deviation |
| median | bool | | Compute median |
| categorical | bool | False | Are bands categorical data? |
| gpkg_layer | str | None | If sampling_locations is .gpkg, specify the layer used. Ignored otherwise. |
| save_as_shp | bool | False | Save results as shapefiles? If False, saves as geojson |
| rename_target | str | None | If provided, target column is renamed to this |
| band_names | Path | None | Path to a file providing bands to use as rows |
| dropna_value | int | None | Drop all rows with all values equal to this value |
| out_prefix | str | '' | Prefix for outputs |
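The boolean statistic flags above select which per-polygon summaries end up in the output table. The following is a rough sketch of that idea using only the standard library. The `zonal_stats` helper and the `STAT_FUNCS` mapping are hypothetical names, the pixel values are made up, and the use of the population standard deviation for `std` is an assumption rather than something the documentation states.

```python
from statistics import mean, median, pstdev

# Map each flag name to a function over the pixel values inside one polygon
STAT_FUNCS = {
    "min": min,
    "max": max,
    "mean": mean,
    "count": len,
    "sum": sum,
    "std": pstdev,  # assumption: population standard deviation
    "median": median,
}


def zonal_stats(values, **flags):
    """Compute only the statistics whose flag is True (hypothetical helper)."""
    return {name: STAT_FUNCS[name](values) for name, on in flags.items() if on}


# Made-up values of one band for the pixels falling inside one polygon
pixels_in_polygon = [2, 4, 4, 4, 5, 5, 7, 9]
stats = zonal_stats(
    pixels_in_polygon, min=True, max=True, mean=True, std=True, median=False
)
```

Statistics whose flag is False, such as `median` here, are simply left out of the result.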
Computer vision dataset creation
create_raster_dataset
create_raster_dataset (raster_path:pathlib.Path, mask_path:pathlib.Path, outpath:pathlib.Path, save_grid:bool=False, allow_partial_data:bool=False, keep_bg_only:bool=False, target_column:str=None, gpkg_layer:str=None, gridsize_x:int=256, gridsize_y:int=256, overlap_x:int=0, overlap_y:int=0)
Create a semantic segmentation dataset from the raster at raster_path and the corresponding mask at mask_path. Raster image patches are saved to outpath/raster_tiles and mask patches to outpath/mask_tiles.
| Parameter | Type | Default | Details |
|---|---|---|---|
| raster_path | Path | | Path to a raster that is the base for the dataset |
| mask_path | Path | | Path to corresponding mask raster or polygon layer. Must have the same extent and resolution as the raster in raster_path |
| outpath | Path | | Where to save the results |
| save_grid | bool | False | Whether to save the tiling grid |
| allow_partial_data | bool | False | Whether to create tiles that have only partial data |
| keep_bg_only | bool | False | Keep the mask chips that contain only the background class |
| target_column | str | None | If mask_path contains vector data, identifier of the column containing the class information |
| gpkg_layer | str | None | If mask_path is a geopackage, specify the layer used. Ignored otherwise. |
| gridsize_x | int | 256 | Size of tiles in x-axis in pixels |
| gridsize_y | int | 256 | Size of tiles in y-axis in pixels |
| overlap_x | int | 0 | Overlap of tiles in x-axis in pixels |
| overlap_y | int | 0 | Overlap of tiles in y-axis in pixels |
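The interaction between gridsize, overlap and allow_partial_data can be illustrated with a small sketch. This is not the tiling code geo2ml uses. It assumes that consecutive tiles are offset by gridsize minus overlap, and that a tile reaching past the raster edge counts as partial; both are assumptions, and `tile_origins` and `tile_windows` are hypothetical helper names.

```python
def tile_origins(size, gridsize, overlap):
    """Start offsets of tiles along one axis, assuming a stride of gridsize - overlap."""
    if not 0 <= overlap < gridsize:
        raise ValueError("overlap must be non-negative and smaller than gridsize")
    stride = gridsize - overlap
    return list(range(0, size, stride))


def tile_windows(width, height, gridsize_x=256, gridsize_y=256,
                 overlap_x=0, overlap_y=0, allow_partial_data=False):
    """Return (col_off, row_off, w, h) pixel windows covering a width x height raster."""
    windows = []
    for row_off in tile_origins(height, gridsize_y, overlap_y):
        for col_off in tile_origins(width, gridsize_x, overlap_x):
            # Clip the window to the raster extent
            w = min(gridsize_x, width - col_off)
            h = min(gridsize_y, height - row_off)
            is_partial = w < gridsize_x or h < gridsize_y
            if is_partial and not allow_partial_data:
                continue  # skip tiles that are not fully covered by data
            windows.append((col_off, row_off, w, h))
    return windows


# A 600 x 300 pixel raster with the default 256 x 256 grid and no overlap
full_only = tile_windows(600, 300)
with_partial = tile_windows(600, 300, allow_partial_data=True)
```

Under these assumptions the 600 x 300 raster yields two complete tiles, and four more once partial tiles are allowed.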
create_coco_dataset
create_coco_dataset (raster_path:pathlib.Path, polygon_path:pathlib.Path, target_column:str, outpath:pathlib.Path, dataset_name:str, gpkg_layer:str=None, min_area_pct:float=0.0, output_format:str='geojson', save_grid:bool=False, allow_partial_data:bool=False, gridsize_x:int=320, gridsize_y:int=320, overlap_x:int=0, overlap_y:int=0, ann_format:str='box', min_bbox_area:int=0)
Create a COCO-format dataset from a raster and annotated polygons.
| Parameter | Type | Default | Details |
|---|---|---|---|
| raster_path | Path | | Path to a raster that is the base for the dataset |
| polygon_path | Path | | Path to annotated polygons |
| target_column | str | | Which column contains class information |
| outpath | Path | | Where to save the resulting files |
| dataset_name | str | | Name of the dataset |
| gpkg_layer | str | None | If polygon_path is a geopackage, specify the layer used. Ignored otherwise. |
| min_area_pct | float | 0.0 | Minimum area fraction for keeping polygons clipped by tiling |
| output_format | str | geojson | Which format to use for saving, either ‘geojson’ or ‘gpkg’ |
| save_grid | bool | False | Whether to save the tiling grid |
| allow_partial_data | bool | False | Whether to create tiles that have only partial image data |
| gridsize_x | int | 320 | Size of tiles in x-axis in pixels |
| gridsize_y | int | 320 | Size of tiles in y-axis in pixels |
| overlap_x | int | 0 | Overlap of tiles in x-axis in pixels |
| overlap_y | int | 0 | Overlap of tiles in y-axis in pixels |
| ann_format | str | box | Annotation format, either box, polygon or rotated box |
| min_bbox_area | int | 0 | Minimum bounding box area in pixels. Objects smaller than this are discarded |
create_yolo_dataset
create_yolo_dataset (raster_path:pathlib.Path, polygon_path:pathlib.Path, target_column:str, outpath:pathlib.Path, dataset_name:str=None, gpkg_layer:str=None, min_area_pct:float=0.0, output_format:str='geojson', save_grid:bool=False, allow_partial_data:bool=False, gridsize_x:int=320, gridsize_y:int=320, overlap_x:int=0, overlap_y:int=0, ann_format:str='box', min_bbox_area:int=0)
Create a YOLO-format dataset from a raster and annotated polygons.
| Parameter | Type | Default | Details |
|---|---|---|---|
| raster_path | Path | | Path to a raster that is the base for the dataset |
| polygon_path | Path | | Path to annotated polygons |
| target_column | str | | Which column contains class information |
| outpath | Path | | Where to save the resulting files |
| dataset_name | str | None | Optional name of the dataset |
| gpkg_layer | str | None | If polygon_path is a geopackage, specify the layer used. Ignored otherwise. |
| min_area_pct | float | 0.0 | Minimum area fraction for keeping polygons clipped by tiling |
| output_format | str | geojson | Which format to use for saving, either ‘geojson’ or ‘gpkg’ |
| save_grid | bool | False | Whether to save the tiling grid |
| allow_partial_data | bool | False | Whether to create tiles that have only partial image data |
| gridsize_x | int | 320 | Size of tiles in x-axis in pixels |
| gridsize_y | int | 320 | Size of tiles in y-axis in pixels |
| overlap_x | int | 0 | Overlap of tiles in x-axis in pixels |
| overlap_y | int | 0 | Overlap of tiles in y-axis in pixels |
| ann_format | str | box | Annotation format, either box, polygon or rotated box |
| min_bbox_area | int | 0 | Minimum bounding box area in pixels. Objects smaller than this are discarded |
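As an illustration of the output format for the box annotation type, the sketch below converts one polygon in tile pixel coordinates into a YOLO label line: a class index followed by the box centre, width and height, each normalised by the tile size. The helper name `to_yolo_box_line`, the six-decimal formatting and the example coordinates are assumptions for illustration. The polygon and rotated box formats are not covered here.

```python
def to_yolo_box_line(coords, class_id, tile_w=320, tile_h=320):
    """Format one polygon as 'class x_center y_center width height' (normalised)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Normalise centre and size by the tile dimensions so values fall in [0, 1]
    x_c = (x_min + x_max) / 2 / tile_w
    y_c = (y_min + y_max) / 2 / tile_h
    w = (x_max - x_min) / tile_w
    h = (y_max - y_min) / tile_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Made-up rectangle inside a 320 x 320 pixel tile
rectangle = [(80, 160), (240, 160), (240, 240), (80, 240)]
line = to_yolo_box_line(rectangle, class_id=0)
```

In the usual YOLO layout, each tile has a text file containing one such line per object.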