Dataset creation

CLI commands for creating different types of datasets from remote sensing data

Overview

geo2ml provides the following commands for creating datasets from geospatial raster and vector data.

  • geo2ml_sample_points
  • geo2ml_sample_polygons
  • geo2ml_create_raster_dataset
  • geo2ml_create_yolo_dataset
  • geo2ml_create_coco_dataset

These commands can be used either from the command line, as geo2ml_-prefixed commands, or from Python scripts and notebooks, for example:

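from pathlib import Path

from geo2ml.scripts.data import sample_points

# Replace the placeholder strings with real paths
sampling_locations = Path("<path_to_locations>")  # point layer (geojson/shapefile)
input_raster = Path("<path_to_raster>")  # raster to sample from
target_column = "column"  # column of sampling_locations used as the target
outpath = Path("<path_to_save_files>")  # created if it doesn't exist

sample_points(sampling_locations, input_raster, target_column, outpath)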
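
The same call can also be written with keyword arguments, which keeps the optional parameters from the signature further down readable. The following is a minimal sketch: every file name, layer name and column name in it is hypothetical, and the call is only attempted when geo2ml is installed and the example raster exists.

```python
from pathlib import Path


def sample_points_kwargs(data_dir: Path) -> dict:
    """Collect arguments for sample_points; all names below are hypothetical."""
    return {
        "sampling_locations": data_dir / "plots.gpkg",  # hypothetical point layer
        "input_raster": data_dir / "mosaic.tif",        # hypothetical raster
        "target_column": "species",                     # hypothetical column
        "outpath": data_dir / "sampled",
        "gpkg_layer": "plots",     # only read when sampling_locations is .gpkg
        "rename_target": "label",  # target column is renamed to this
        "out_prefix": "train_",    # prefix for the output files
    }


kwargs = sample_points_kwargs(Path("data"))

try:
    # Import path taken from the example above; needs geo2ml installed.
    from geo2ml.scripts.data import sample_points
except ImportError:
    sample_points = None

# Only run the sampling when both the library and the input data are present.
if sample_points is not None and kwargs["input_raster"].exists():
    sample_points(**kwargs)
```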

Tabular datasets

Both of these commands create a dataset by sampling values from input_raster at the point or polygon locations given in sampling_locations, and save the resulting table to outpath as a csv file and as either a geojson or a shapefile.


sample_points

 sample_points (sampling_locations:pathlib.Path,
                input_raster:pathlib.Path, target_column:str,
                outpath:pathlib.Path, gpkg_layer:str=None,
                save_as_shp:bool=False, rename_target:str=None,
                band_names:pathlib.Path=None, dropna_value:int=None,
                out_prefix:str='')

Sample pixel values from input_raster using sampling_locations

Parameter           Type  Default  Details
sampling_locations  Path           Path to the geojson/shapefile containing the sampling locations as points
input_raster        Path           Path to the raster used for sampling
target_column       str            Column of sampling_locations used as the target
outpath             Path           Path to save the output files. Created if it doesn’t exist
gpkg_layer          str   None     If sampling_locations is .gpkg, specify the layer used. Ignored otherwise.
save_as_shp         bool  False    Whether to save results as shapefiles. If False, saves as geojson
rename_target       str   None     If provided, target column is renamed to this
band_names          Path  None     Path to a file providing bands to use as rows
dropna_value        int   None     Drop all rows with all values equal to this value
out_prefix          str   ''       Prefix for outputs
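
The dropna_value rule ("drop all rows with all values equal to this value") can be illustrated with plain Python. This is a sketch of the documented behaviour, not geo2ml's implementation. The helper name drop_all_equal, the column names b1 and b2, and the assumption that only the sampled band columns are compared are all hypothetical.

```python
def drop_all_equal(rows, band_columns, dropna_value):
    """Keep a row unless every band column equals dropna_value."""
    return [
        row
        for row in rows
        if not all(row[col] == dropna_value for col in band_columns)
    ]


rows = [
    {"label": "pine", "b1": 12, "b2": 40},
    {"label": "spruce", "b1": 0, "b2": 0},  # every band is 0: dropped
    {"label": "birch", "b1": 0, "b2": 17},  # only one band is 0: kept
]

kept = drop_all_equal(rows, band_columns=["b1", "b2"], dropna_value=0)
print([row["label"] for row in kept])
```

A row is removed only when all of its sampled values match, so a single band that happens to equal dropna_value does not discard the sample.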

sample_polygons

 sample_polygons (sampling_locations:pathlib.Path,
                  input_raster:pathlib.Path, target_column:str,
                  outpath:pathlib.Path, min:bool, max:bool, mean:bool,
                  count:bool, sum:bool, std:bool, median:bool,
                  categorical:bool=False, gpkg_layer:str=None,
                  save_as_shp:bool=False, rename_target:str=None,
                  band_names:pathlib.Path=None, dropna_value:int=None,
                  out_prefix:str='')

Sample pixel values from input_raster using sampling_locations.

Parameter           Type  Default  Details
sampling_locations  Path           Path to the geojson/shapefile containing the sampling locations as polygons
input_raster        Path           Path to the raster used for sampling
target_column       str            Column of sampling_locations used for sampling
outpath             Path           Path to save the output files. Created if it doesn’t exist
min                 bool           Compute minimum
max                 bool           Compute maximum
mean                bool           Compute mean
count               bool           Compute count
sum                 bool           Compute sum
std                 bool           Compute standard deviation
median              bool           Compute median
categorical         bool  False    Whether the bands contain categorical data
gpkg_layer          str   None     If sampling_locations is .gpkg, specify the layer used. Ignored otherwise.
save_as_shp         bool  False    Whether to save results as shapefiles. If False, saves as geojson
rename_target       str   None     If provided, target column is renamed to this
band_names          Path  None     Path to a file providing bands to use as rows
dropna_value        int   None     Drop all rows with all values equal to this value
out_prefix          str   ''       Prefix for outputs
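
In the signature above, the seven statistic flags (min, max, mean, count, sum, std, median) have no defaults, so a Python call has to pass each of them. What the flags select can be sketched with the standard library alone. The helper summarize and the pixel values are hypothetical, and the use of the population standard deviation for std is an assumption, since the page does not say which variant is computed.

```python
import statistics

# One function per documented flag name.
STAT_FUNCS = {
    "min": min,
    "max": max,
    "mean": statistics.fmean,
    "count": len,
    "sum": sum,
    "std": statistics.pstdev,  # assumption: population standard deviation
    "median": statistics.median,
}


def summarize(pixels, flags):
    """Compute only the statistics whose flag is True."""
    return {
        name: STAT_FUNCS[name](pixels)
        for name, wanted in flags.items()
        if wanted
    }


# Hypothetical pixel values of one band inside one polygon.
pixels = [2, 4, 4, 6]
flags = dict(min=True, max=True, mean=True, count=True,
             sum=False, std=False, median=True)

stats = summarize(pixels, flags)
print(stats)
```

Each enabled flag contributes one value per polygon and band, so switching off unused statistics keeps the output table narrower.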

Computer vision dataset creation


create_raster_dataset

 create_raster_dataset (raster_path:pathlib.Path, mask_path:pathlib.Path,
                        outpath:pathlib.Path, save_grid:bool=False,
                        allow_partial_data:bool=False,
                        keep_bg_only:bool=False, target_column:str=None,
                        gpkg_layer:str=None, gridsize_x:int=256,
                        gridsize_y:int=256, overlap_x:int=0,
                        overlap_y:int=0)

Create a semantic segmentation dataset from the raster in raster_path and the corresponding mask in mask_path. Raster image patches are saved to outpath/raster_tiles and mask patches to outpath/mask_tiles.

Parameter           Type  Default  Details
raster_path         Path           Path to a raster that is the base for the dataset
mask_path           Path           Path to corresponding mask raster or polygon layer. Must have the same extent and resolution as the raster in raster_path
outpath             Path           Where to save the results
save_grid           bool  False    Whether to save the tiling grid
allow_partial_data  bool  False    Whether to create tiles that have only partial data
keep_bg_only        bool  False    Keep the mask chips that contain only the background class
target_column       str   None     If mask_path contains vector data, identifier of the column containing the class information
gpkg_layer          str   None     If mask_path is a geopackage, specify the layer used. Ignored otherwise.
gridsize_x          int   256      Size of tiles in x-axis in pixels
gridsize_y          int   256      Size of tiles in y-axis in pixels
overlap_x           int   0        Overlap of tiles in x-axis in pixels
overlap_y           int   0        Overlap of tiles in y-axis in pixels
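
How gridsize and overlap interact can be sketched as follows. This is an illustration only: it assumes that consecutive tiles are offset by gridsize minus overlap, and it lists only tiles that fit fully inside the raster. It makes no claim about how geo2ml places tiles at the raster edge or applies allow_partial_data. The function name tile_origins and the raster size are hypothetical.

```python
def tile_origins(size, gridsize=256, overlap=0):
    """Pixel offsets of full tiles along one axis (assumed stride: gridsize - overlap)."""
    step = gridsize - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than gridsize")
    # Last valid origin is size - gridsize, so the tile still fits completely.
    return list(range(0, size - gridsize + 1, step))


# Hypothetical raster of 600 x 512 pixels, default 256 x 256 tiles.
xs = tile_origins(600, gridsize=256, overlap=0)
ys = tile_origins(512, gridsize=256, overlap=0)
grid = [(x, y) for y in ys for x in xs]
print(grid)

# With an overlap of 128 pixels, neighbouring tiles share half of their width.
xs_overlap = tile_origins(600, gridsize=256, overlap=128)
print(xs_overlap)
```

Under these assumptions a larger overlap produces more tiles from the same raster, which increases the dataset size at the cost of repeated pixels.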

create_coco_dataset

 create_coco_dataset (raster_path:pathlib.Path, polygon_path:pathlib.Path,
                      target_column:str, outpath:pathlib.Path,
                      dataset_name:str, gpkg_layer:str=None,
                      min_area_pct:float=0.0, output_format:str='geojson',
                      save_grid:bool=False, allow_partial_data:bool=False,
                      gridsize_x:int=320, gridsize_y:int=320,
                      overlap_x:int=0, overlap_y:int=0,
                      ann_format:str='box', min_bbox_area:int=0)

Create a COCO-format dataset from a raster and a layer of annotated polygons

Parameter           Type   Default  Details
raster_path         Path            Path to a raster that is the base for the dataset
polygon_path        Path            Path to annotated polygons
target_column       str             Which column contains class information
outpath             Path            Where to save the resulting files
dataset_name        str             Name of the dataset
gpkg_layer          str    None     If polygon_path is a geopackage, specify the layer used. Ignored otherwise.
min_area_pct        float  0.0      Minimum area, relative to the original polygon, that a polygon must retain after tiling to be kept
output_format       str    geojson  Which format to use for saving, either ‘geojson’ or ‘gpkg’
save_grid           bool   False    Whether to save the tiling grid
allow_partial_data  bool   False    Whether to create tiles that have only partial image data
gridsize_x          int    320      Size of tiles in x-axis in pixels
gridsize_y          int    320      Size of tiles in y-axis in pixels
overlap_x           int    0        Overlap of tiles in x-axis in pixels
overlap_y           int    0        Overlap of tiles in y-axis in pixels
ann_format          str    box      Annotation format, either box, polygon or rotated box
min_bbox_area       int    0        Minimum bounding box area in pixels. Objects smaller than this are discarded
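
For orientation, a COCO annotation with ann_format 'box' stores the bounding box as [x, y, width, height] in pixel coordinates of the tile. The sketch below turns one polygon into such an annotation and applies the min_bbox_area threshold. It is not geo2ml's code: the helper name, the ids and the coordinates are hypothetical, and using the box area as the annotation area is a simplification.

```python
def polygon_to_coco_box(pixel_coords, image_id, category_id, ann_id,
                        min_bbox_area=0):
    """Build a COCO box annotation, or return None if the box is too small."""
    xs = [x for x, _ in pixel_coords]
    ys = [y for _, y in pixel_coords]
    x_min, y_min = min(xs), min(ys)
    width, height = max(xs) - x_min, max(ys) - y_min
    if width * height < min_bbox_area:
        return None  # objects smaller than the threshold are discarded
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x_min, y_min, width, height],  # COCO order: x, y, w, h
        "area": width * height,  # simplification: box area, not polygon area
        "iscrowd": 0,
    }


# Hypothetical polygon in tile pixel coordinates.
polygon = [(10, 20), (50, 20), (50, 40), (10, 40)]

annotation = polygon_to_coco_box(polygon, image_id=1, category_id=3, ann_id=7)
print(annotation)

too_small = polygon_to_coco_box(polygon, image_id=1, category_id=3, ann_id=8,
                                min_bbox_area=1000)
print(too_small)
```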

create_yolo_dataset

 create_yolo_dataset (raster_path:pathlib.Path, polygon_path:pathlib.Path,
                      target_column:str, outpath:pathlib.Path,
                      dataset_name:str=None, gpkg_layer:str=None,
                      min_area_pct:float=0.0, output_format:str='geojson',
                      save_grid:bool=False, allow_partial_data:bool=False,
                      gridsize_x:int=320, gridsize_y:int=320,
                      overlap_x:int=0, overlap_y:int=0,
                      ann_format:str='box', min_bbox_area:int=0)

Create a YOLO-format dataset from a raster and a layer of annotated polygons

Parameter           Type   Default  Details
raster_path         Path            Path to a raster that is the base for the dataset
polygon_path        Path            Path to annotated polygons
target_column       str             Which column contains class information
outpath             Path            Where to save the resulting files
dataset_name        str    None     Optional name of the dataset
gpkg_layer          str    None     If polygon_path is a geopackage, specify the layer used. Ignored otherwise.
min_area_pct        float  0.0      Minimum area, relative to the original polygon, that a polygon must retain after tiling to be kept
output_format       str    geojson  Which format to use for saving, either ‘geojson’ or ‘gpkg’
save_grid           bool   False    Whether to save the tiling grid
allow_partial_data  bool   False    Whether to create tiles that have only partial image data
gridsize_x          int    320      Size of tiles in x-axis in pixels
gridsize_y          int    320      Size of tiles in y-axis in pixels
overlap_x           int    0        Overlap of tiles in x-axis in pixels
overlap_y           int    0        Overlap of tiles in y-axis in pixels
ann_format          str    box      Annotation format, either box, polygon or rotated box
min_bbox_area       int    0        Minimum bounding box area in pixels. Objects smaller than this are discarded
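
A YOLO detection label is a text line of the form "class x_center y_center width height", with the four box values normalised by the tile size. The sketch below converts a pixel bounding box into such a line for a tile of the default 320 x 320 pixels. The helper name and the example box are hypothetical, and the exact files written by create_yolo_dataset may differ.

```python
def bbox_to_yolo_line(class_id, x_min, y_min, x_max, y_max,
                      tile_w=320, tile_h=320):
    """Format a pixel bounding box as a normalised YOLO label line."""
    x_center = (x_min + x_max) / 2 / tile_w
    y_center = (y_min + y_max) / 2 / tile_h
    width = (x_max - x_min) / tile_w
    height = (y_max - y_min) / tile_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical box covering x 80..240 and y 160..320 in a 320 x 320 tile.
line = bbox_to_yolo_line(0, 80, 160, 240, 320)
print(line)
```

Because the values are normalised, the same label stays valid if the tile image is later resized for training.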