Dataset creation

CLI commands for creating different types of datasets from remote sensing data

Overview

geo2ml provides the following commands for creating datasets from geospatial raster and vector data.

  • geo2ml_sample_points
  • geo2ml_sample_polygons
  • geo2ml_create_raster_dataset
  • geo2ml_create_yolo_dataset
  • geo2ml_create_coco_dataset

These commands can be run from the command line using the geo2ml_-prefixed names above, or imported and called from Python scripts or notebooks, for example:

from pathlib import Path

from geo2ml.scripts.data import sample_points

# Replace the placeholder strings with your own paths
sampling_locations = Path('<path_to_locations>')
input_raster = Path('<path_to_raster>')
target_column = 'column'
outpath = Path('<path_to_save_files>')

sample_points(sampling_locations, input_raster, target_column, outpath)

Tabular datasets

Both of these commands create a dataset by sampling the values of input_raster at the points or polygons provided in sampling_locations, and save the resulting table to outpath as a CSV file and as either a GeoJSON file or a shapefile.



sample_points


def sample_points(
    sampling_locations:Path, # Path to the geojson/shapefile containing the sampling locations as points
    input_raster:Path, # Path to the raster used for sampling
    target_column:str, # Column of `sampling_locations` used as the target
    outpath:Path, # Path to save the output files. Created if it doesn't exist
    gpkg_layer:str=None, # If `sampling_locations` is .gpkg, specify the layer used. Ignored otherwise.
    save_as_shp:bool=False, # Save results as shapefiles? If False, saves as geojson
    rename_target:str=None, # If provided, target column is renamed to this
    band_names:Path=None, # Path to a file providing bands to use as rows
    dropna_value:int=None, # Drop all rows with all values equal to this value
    out_prefix:str='', # Prefix for outputs
):

Sample pixel values from input_raster at the point locations in sampling_locations.
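To illustrate what sampling a raster at point locations involves, here is a minimal, stdlib-only sketch. It is not geo2ml's implementation: the helper names (`world_to_pixel`, `sample_point`), the nested-list raster, and the assumption of a north-up raster with square pixels are all hypothetical choices made for the example.

```python
from typing import List, Tuple


def world_to_pixel(
    x: float, y: float, origin_x: float, origin_y: float, pixel_size: float
) -> Tuple[int, int]:
    """Map a world coordinate to (row, col), assuming a north-up raster
    with square pixels whose upper-left corner is (origin_x, origin_y)."""
    col = int((x - origin_x) // pixel_size)
    row = int((origin_y - y) // pixel_size)
    return row, col


def sample_point(
    bands: List[List[List[float]]],
    x: float,
    y: float,
    origin_x: float,
    origin_y: float,
    pixel_size: float,
) -> List[float]:
    """Return one value per band for the pixel that contains (x, y)."""
    row, col = world_to_pixel(x, y, origin_x, origin_y, pixel_size)
    n_rows, n_cols = len(bands[0]), len(bands[0][0])
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise IndexError("Point falls outside the raster extent")
    return [band[row][col] for band in bands]


# Toy 2-band, 2x2 raster: upper-left corner at (100, 200), 10-unit pixels
bands = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
]
values = sample_point(bands, 115.0, 192.0, 100.0, 200.0, 10.0)
print(values)  # one value per band for the sampled pixel
```

Each sampled point yields one row of band values, which is the shape of the table that the command writes out alongside the target column.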



sample_polygons


def sample_polygons(
    sampling_locations:Path, # Path to the geojson/shapefile containing the sampling locations as polygons
    input_raster:Path, # Path to the raster used for sampling
    target_column:str, # Column of `sampling_locations` used for sampling
    outpath:Path, # Path to save the output files. Created if it doesn't exist
    min:bool, # Compute minimum
    max:bool, # Compute maximum
    mean:bool, # Compute mean
    count:bool, # Compute count
    sum:bool, # Compute sum
    std:bool, # Compute standard deviation
    median:bool, # Compute median
    categorical:bool=False, # Are bands categorical data?
    gpkg_layer:str=None, # If `sampling_locations` is .gpkg, specify the layer used. Ignored otherwise.
    save_as_shp:bool=False, # Save results as shapefiles? If False, saves as geojson
    rename_target:str=None, # If provided, target column is renamed to this
    band_names:Path=None, # Path to a file providing bands to use as rows
    dropna_value:int=None, # Drop all rows with all values equal to this value
    out_prefix:str='', # Prefix for outputs
):

Sample pixel values from input_raster within the polygons in sampling_locations and compute the selected summary statistics.
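The statistics flags in the signature (min, max, mean, count, sum, std, median) correspond to per-polygon summaries of the pixels a polygon covers. The sketch below shows the idea for one band and one polygon using only the standard library. It is an illustration under assumptions, not the library's code: `zonal_stats` is a hypothetical helper, the polygon is given as a pre-rasterized boolean mask, and the population standard deviation is used here, which may differ from what geo2ml computes.

```python
import statistics
from typing import Dict, List


def zonal_stats(band: List[List[float]], mask: List[List[bool]]) -> Dict[str, float]:
    """Summarize the pixels of `band` where `mask` is True."""
    values = [
        value
        for band_row, mask_row in zip(band, mask)
        for value, inside in zip(band_row, mask_row)
        if inside
    ]
    if not values:
        raise ValueError("Mask does not cover any pixels")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "count": len(values),
        "sum": sum(values),
        "std": statistics.pstdev(values),  # population std: an assumption
        "median": statistics.median(values),
    }


band = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
# True marks pixels inside the (already rasterized) polygon
mask = [
    [False, True, True],
    [False, True, True],
    [False, False, False],
]
stats = zonal_stats(band, mask)
print(stats)
```

With several bands, the same summary would be repeated per band, giving one column per band and statistic for each polygon.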

Computer vision dataset creation



create_raster_dataset


def create_raster_dataset(
    raster_path:Path, # Path to a raster that is the base for the dataset
    mask_path:Path, # Path to corresponding mask raster or polygon layer. Must have the same extent and resolution as the raster in `raster_path`
    outpath:Path, # Where to save the results
    save_grid:bool=False, # Whether to save the tiling grid
    allow_partial_data:bool=False, # Whether to create tiles that have only partial data
    keep_bg_only:bool=False, # Keep the mask chips that contain only the background class
    target_column:str=None, # If mask_path contains vector data, identifier of the column containing the class information
    gpkg_layer:str=None, # If `mask_path` is a geopackage, specify the layer used. Ignored otherwise.
    gridsize_x:int=256, # Size of tiles in x-axis in pixels
    gridsize_y:int=256, # Size of tiles in y-axis in pixels
    overlap_x:int=0, # Overlap of tiles in x-axis in pixels
    overlap_y:int=0, # Overlap of tiles in y-axis in pixels
):

Create a semantic segmentation dataset from the raster at raster_path and the corresponding mask at mask_path. Raster image patches are saved to outpath/raster_tiles and mask patches to outpath/mask_tiles.
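The gridsize and overlap parameters determine how the raster is cut into tiles. As a rough illustration of how those values interact, the following sketch computes pixel windows for a raster of a given size. It is not geo2ml's tiling code: `tile_windows` is a hypothetical helper, and "partial" is interpreted here as a tile that would extend past the raster edge, which may not match the library's exact definition of partial data.

```python
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (col_off, row_off, width, height)


def tile_windows(
    width: int,
    height: int,
    gridsize_x: int = 256,
    gridsize_y: int = 256,
    overlap_x: int = 0,
    overlap_y: int = 0,
    allow_partial_data: bool = False,
) -> List[Window]:
    """Return tile windows in pixel coordinates, row by row."""
    step_x = gridsize_x - overlap_x
    step_y = gridsize_y - overlap_y
    if step_x <= 0 or step_y <= 0:
        raise ValueError("Overlap must be smaller than the tile size")
    windows = []
    for row_off in range(0, height, step_y):
        for col_off in range(0, width, step_x):
            tile_w = min(gridsize_x, width - col_off)
            tile_h = min(gridsize_y, height - row_off)
            is_partial = tile_w < gridsize_x or tile_h < gridsize_y
            if is_partial and not allow_partial_data:
                continue  # skip tiles clipped by the raster edge
            windows.append((col_off, row_off, tile_w, tile_h))
    return windows


full_only = tile_windows(600, 300)
with_partial = tile_windows(600, 300, allow_partial_data=True)
overlapping = tile_windows(512, 256, overlap_x=128)
print(len(full_only), len(with_partial), len(overlapping))
```

Note that a non-zero overlap shortens the step between tile origins, so the same raster produces more tiles.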



create_coco_dataset


def create_coco_dataset(
    raster_path:Path, # Path to a raster that is the base for the dataset
    polygon_path:Path, # Path to annotated polygons
    target_column:str, # Which column contains class information
    outpath:Path, # Where to save the resulting files
    dataset_name:str, # Name of the dataset
    gpkg_layer:str=None, # If `polygon_path` is a geopackage, specify the layer used. Ignored otherwise.
    min_area_pct:float=0.0, # Minimum area percentage for polygons to be kept after tiling
    output_format:str='geojson', # Which format to use for saving, either 'geojson' or 'gpkg'
    save_grid:bool=False, # Should tiling grid be saved
    allow_partial_data:bool=False, # Whether to create tiles that have only partial image data
    gridsize_x:int=320, # Size of tiles in x-axis in pixels
    gridsize_y:int=320, # Size of tiles in y-axis in pixels
    overlap_x:int=0, # Overlap of tiles in x-axis in pixels
    overlap_y:int=0, # Overlap of tiles in y-axis in pixels
    ann_format:str='box', # Annotation format, either box, polygon or rotated box
    min_bbox_area:int=0, # Minimum bounding box area in pixels. Smaller objects than this are discarded
):

Create a COCO-format dataset from a raster and a polygon layer.
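For readers unfamiliar with the COCO annotation layout, the sketch below builds one annotation record from a polygon given in tile pixel coordinates, and applies a minimum bounding box area filter in the spirit of min_bbox_area. The helpers `polygon_area` and `to_coco_annotation` are hypothetical and only illustrate the format; how geo2ml builds its records is not shown in this page.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def to_coco_annotation(
    points: List[Point],
    category_id: int,
    image_id: int,
    ann_id: int,
    min_bbox_area: float = 0,
) -> Optional[Dict]:
    """Build a COCO-style annotation, or None if the bbox is too small."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    width, height = max(xs) - x_min, max(ys) - y_min
    if width * height < min_bbox_area:
        return None
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x_min, y_min, width, height],  # COCO uses [x, y, w, h]
        "area": polygon_area(points),
        "segmentation": [[coord for p in points for coord in p]],
        "iscrowd": 0,
    }


triangle = [(10.0, 20.0), (50.0, 20.0), (10.0, 60.0)]
annotation = to_coco_annotation(triangle, category_id=1, image_id=1, ann_id=1)
dropped = to_coco_annotation(
    triangle, category_id=1, image_id=1, ann_id=2, min_bbox_area=2000
)
print(annotation, dropped)
```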



create_yolo_dataset


def create_yolo_dataset(
    raster_path:Path, # Path to a raster that is the base for the dataset
    polygon_path:Path, # Path to annotated polygons
    target_column:str, # Which column contains class information
    outpath:Path, # Where to save the resulting files?
    dataset_name:str=None, # Optional name of the dataset
    gpkg_layer:str=None, # If `polygon_path` is a geopackage, specify the layer used. Ignored otherwise.
    min_area_pct:float=0.0, # Minimum area percentage for polygons to be kept after tiling
    output_format:str='geojson', # Which format to use for saving, either 'geojson' or 'gpkg'
    save_grid:bool=False, # Should tiling grid be saved
    allow_partial_data:bool=False, # Whether to create tiles that have only partial image data
    gridsize_x:int=320, # Size of tiles in x-axis, pixels
    gridsize_y:int=320, # Size of tiles in y-axis, pixels
    overlap_x:int=0, # Overlap of tiles in x-axis
    overlap_y:int=0, # Overlap of tiles in y-axis
    ann_format:str='box', # Annotation format, either box, polygon or rotated box
    min_bbox_area:int=0, # Minimum bounding box area in pixels. Smaller objects than this are discarded
):

Create a YOLO-format dataset from a raster and a polygon layer.
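In the YOLO box format, each object is a text line holding a class index followed by the box center and size, all normalized by the image dimensions. The sketch below converts a pixel-space bounding box on a 320 x 320 tile (the default tile size above) into such a line. `to_yolo_box_line` is a hypothetical helper for illustration only and says nothing about how geo2ml writes its label files.

```python
def to_yolo_box_line(
    class_id: int,
    x_min: float,
    y_min: float,
    x_max: float,
    y_max: float,
    img_w: int,
    img_h: int,
) -> str:
    """Format a pixel bbox as 'class cx cy w h' with normalized values."""
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Box from (80, 40) to (240, 200) on a 320 x 320 pixel tile
line = to_yolo_box_line(0, 80, 40, 240, 200, 320, 320)
print(line)
```

Polygon annotations follow the same normalization, with the vertex coordinates listed after the class index instead of a center and size.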
