Dataset creation
Overview
geo2ml provides the following commands for creating datasets from geospatial raster and vector data.
geo2ml_sample_points
geo2ml_sample_polygons
geo2ml_create_raster_dataset
geo2ml_create_yolo_dataset
geo2ml_create_coco_dataset
These commands can be run from the command line under their geo2ml_-prefixed names, or called from Python scripts and notebooks, for example:
from pathlib import Path
from geo2ml.scripts.data import sample_points
# Replace the placeholder strings with real paths before running
sampling_locations = Path('<path_to_locations>')
input_raster = Path('<path_to_raster>')
target_column = 'column'
outpath = Path('<path_to_save_files>')
sample_points(sampling_locations, input_raster, target_column, outpath)
Tabular datasets
sample_points and sample_polygons each create a dataset by sampling values from input_raster at the point or polygon locations given in sampling_locations. The resulting table is saved to outpath as a CSV file and as either a GeoJSON file or a shapefile.
sample_points
def sample_points(
sampling_locations:Path, # Path to the geojson/shapefile containing the sampling locations as points
input_raster:Path, # Path to the raster used for sampling
target_column:str, # Column of `sampling_locations` used as the target
outpath:Path, # Path to save the output files. Is created if doesn't exist
gpkg_layer:str=None, # If `sampling_locations` is .gpkg, specify the layer used. Ignored otherwise.
save_as_shp:bool=False, # Save results as shapefiles? If False, saves as geojson
rename_target:str=None, # If provided, target column is renamed to this
band_names:Path=None, # Path to a file providing bands to use as rows
dropna_value:int=None, # Drop all rows with all values equal to this value
out_prefix:str='', # Prefix for outputs
):
Sample pixel values from input_raster at the point locations given in sampling_locations.
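To make the idea of point sampling concrete, the following is a minimal pure-Python sketch and not geo2ml's implementation. It assumes a north-up raster described by a top-left origin and a pixel size, and the helper names world_to_pixel and sample_point are hypothetical. Each point is mapped to the pixel that contains it, and one value per band is collected into a table row.

```python
import math

def world_to_pixel(x, y, origin_x, origin_y, pixel_width, pixel_height):
    """Map a world coordinate to (row, col) in a north-up raster.

    (origin_x, origin_y) is the top-left corner; pixel sizes are positive.
    """
    col = math.floor((x - origin_x) / pixel_width)
    row = math.floor((origin_y - y) / pixel_height)
    return row, col

def sample_point(bands, x, y, origin_x, origin_y, pixel_width, pixel_height):
    """Return one value per band for the pixel containing (x, y), or None if outside."""
    row, col = world_to_pixel(x, y, origin_x, origin_y, pixel_width, pixel_height)
    n_rows, n_cols = len(bands[0]), len(bands[0][0])
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        return None
    return [band[row][col] for band in bands]

# Toy raster (assumption): 2 bands, 2 rows x 3 columns,
# top-left corner at (100, 200), 10 x 10 unit pixels.
bands = [
    [[1, 2, 3], [4, 5, 6]],
    [[10, 20, 30], [40, 50, 60]],
]
points = [
    {"x": 105.0, "y": 195.0, "label": "a"},
    {"x": 125.0, "y": 185.0, "label": "b"},
    {"x": 500.0, "y": 195.0, "label": "c"},  # falls outside the raster
]

rows = []
for p in points:
    values = sample_point(bands, p["x"], p["y"], 100.0, 200.0, 10.0, 10.0)
    if values is not None:
        rows.append({"label": p["label"], "band_1": values[0], "band_2": values[1]})
```

Here rows plays the role of the output table: one row per sampling location, the target column carried over, and one column per raster band.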
sample_polygons
def sample_polygons(
sampling_locations:Path, # Path to the geojson/shapefile containing the sampling locations as polygons
input_raster:Path, # Path to the raster used for sampling
target_column:str, # Column of `sampling_locations` used for sampling
outpath:Path, # Path to save the output files. Is created if doesn't exist
min:bool, # Compute minimum
max:bool, # Compute maximum
mean:bool, # Compute mean
count:bool, # Compute count
sum:bool, # Compute sum
std:bool, # Compute standard deviation
median:bool, # Compute median
categorical:bool=False, # Are bands categorical data?
gpkg_layer:str=None, # If `sampling_locations` is .gpkg, specify the layer used. Ignored otherwise.
save_as_shp:bool=False, # Save results as shapefiles? If False, saves as geojson
rename_target:str=None, # If provided, target column is renamed to this
band_names:Path=None, # Path to a file providing bands to use as rows
dropna_value:int=None, # Drop all rows with all values equal to this value
out_prefix:str='', # Prefix for outputs
):
Sample pixel values from input_raster within each polygon in sampling_locations and compute the selected summary statistics.
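The statistics flags above can be illustrated with a small pure-Python sketch. This is not geo2ml's code: it assumes the pixel values inside each polygon have already been extracted, the names STAT_FUNCS and summarize are hypothetical, and the standard deviation shown is the population form, which may differ from what the library computes.

```python
import statistics

# Hypothetical mapping from statistic name to the function that computes it
STAT_FUNCS = {
    "min": min,
    "max": max,
    "mean": statistics.fmean,
    "count": len,
    "sum": sum,
    "std": statistics.pstdev,  # population standard deviation (assumption)
    "median": statistics.median,
}

def summarize(values, stats):
    """Compute only the requested statistics for one polygon's pixel values."""
    return {name: STAT_FUNCS[name](values) for name in stats}

# Toy input (assumption): pixel values of one band that fall inside each polygon
polygon_pixels = {
    "stand_1": [2, 4, 4, 6],
    "stand_2": [10, 10, 13],
}

requested = ["min", "max", "mean", "count", "median"]
table = {pid: summarize(vals, requested) for pid, vals in polygon_pixels.items()}
```

Each entry of table corresponds to one polygon, with one column per requested statistic; for multi-band rasters the same summary would be repeated per band.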
Computer vision dataset creation
create_raster_dataset
def create_raster_dataset(
raster_path:Path, # Path to a raster that is the base for the dataset
mask_path:Path, # Path to corresponding mask raster or polygon layer. Must have the same extent and resolution as the raster in `raster_path`
outpath:Path, # Where to save the results
save_grid:bool=False, # Whether to save the tiling grid
allow_partial_data:bool=False, # Whether to create tiles that have only partial data
keep_bg_only:bool=False, # Keep the mask chips that contain only the background class
target_column:str=None, # If mask_path contains vector data, identifier of the column containing the class information
gpkg_layer:str=None, # If `mask_path` is a geopackage, specify the layer used. Ignored otherwise.
gridsize_x:int=256, # Size of tiles in x-axis in pixels
gridsize_y:int=256, # Size of tiles in y-axis in pixels
overlap_x:int=0, # Overlap of tiles in x-axis in pixels
overlap_y:int=0, # Overlap of tiles in y-axis in pixels
):
Create a semantic segmentation dataset from the raster at raster_path and the corresponding mask at mask_path. Raster image patches are saved to outpath/raster_tiles and mask patches to outpath/mask_tiles.
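How gridsize and overlap interact can be sketched as follows. This is an illustration under stated assumptions, not the library's tiling code: the function tile_windows is hypothetical, the step between tiles is taken to be the tile size minus the overlap, and "partial" is interpreted here as edge tiles that would extend past the raster, which may differ from how geo2ml defines partial data.

```python
def tile_windows(width, height, gridsize_x=256, gridsize_y=256,
                 overlap_x=0, overlap_y=0, allow_partial_data=False):
    """Return (col_off, row_off, tile_width, tile_height) windows in pixels."""
    if not (0 <= overlap_x < gridsize_x and 0 <= overlap_y < gridsize_y):
        raise ValueError("overlap must be non-negative and smaller than the tile size")
    # Assumption: neighbouring tiles start (gridsize - overlap) pixels apart
    step_x = gridsize_x - overlap_x
    step_y = gridsize_y - overlap_y
    windows = []
    for row_off in range(0, height, step_y):
        for col_off in range(0, width, step_x):
            # Clip the tile to the raster extent
            tile_w = min(gridsize_x, width - col_off)
            tile_h = min(gridsize_y, height - row_off)
            is_full = tile_w == gridsize_x and tile_h == gridsize_y
            if is_full or allow_partial_data:
                windows.append((col_off, row_off, tile_w, tile_h))
    return windows

# A 600 x 256 pixel raster with the default 256-pixel tiles
full_only = tile_windows(600, 256)
with_partial = tile_windows(600, 256, allow_partial_data=True)
overlapping = tile_windows(600, 256, overlap_x=56)
```

In this toy case the raster yields two full tiles, a third 88-pixel-wide tile when partial tiles are allowed, and tiles starting 200 pixels apart when overlap_x is 56.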
create_coco_dataset
def create_coco_dataset(
raster_path:Path, # Path to a raster that is the base for the dataset
polygon_path:Path, # Path to annotated polygons
target_column:str, # Which column contains class information
outpath:Path, # Where to save the resulting files
dataset_name:str, # Name of the dataset
gpkg_layer:str=None, # If `polygon_path` is a geopackage, specify the layer used. Ignored otherwise.
min_area_pct:float=0.0, # Minimum share of a polygon's area that must remain after tiling for the polygon to be kept
output_format:str='geojson', # Which format to use for saving, either 'geojson' or 'gpkg'
save_grid:bool=False, # Should tiling grid be saved
allow_partial_data:bool=False, # Whether to create tiles that have only partial image data
gridsize_x:int=320, # Size of tiles in x-axis in pixels
gridsize_y:int=320, # Size of tiles in y-axis in pixels
overlap_x:int=0, # Overlap of tiles in x-axis in pixels
overlap_y:int=0, # Overlap of tiles in y-axis in pixels
ann_format:str='box', # Annotation format, either box, polygon or rotated box
min_bbox_area:int=0, # Minimum bounding box area in pixels. Objects smaller than this are discarded
):
Create a COCO-format dataset from a raster and a layer of annotated polygons.
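For readers unfamiliar with the COCO layout, the sketch below builds a single annotation record from a polygon given in tile pixel coordinates. It is only an illustration: the helper names, the file name tile_0.tif and the category "tree" are made up, and the files geo2ml writes may contain additional fields.

```python
import json

def polygon_area(points):
    """Area of a simple polygon [(x, y), ...] using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def to_coco_annotation(ann_id, image_id, category_id, points):
    """Build one COCO annotation; bbox is [x_min, y_min, width, height]."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x_min, y_min, max(xs) - x_min, max(ys) - y_min],
        "area": polygon_area(points),
        # COCO polygons are flat lists: [x1, y1, x2, y2, ...]
        "segmentation": [[coord for point in points for coord in point]],
        "iscrowd": 0,
    }

# Hypothetical polygon on a 320 x 320 pixel tile
polygon = [(10, 20), (50, 20), (50, 60), (10, 60)]
annotation = to_coco_annotation(1, 1, 1, polygon)

dataset = {
    "images": [{"id": 1, "file_name": "tile_0.tif", "width": 320, "height": 320}],
    "annotations": [annotation],
    "categories": [{"id": 1, "name": "tree"}],
}
coco_json = json.dumps(dataset, indent=2)
```

The bbox field corresponds to the box annotation format, while segmentation carries the polygon outline.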
create_yolo_dataset
def create_yolo_dataset(
raster_path:Path, # Path to a raster that is the base for the dataset
polygon_path:Path, # Path to annotated polygons
target_column:str, # Which column contains class information
outpath:Path, # Where to save the resulting files?
dataset_name:str=None, # Optional name of the dataset
gpkg_layer:str=None, # If `polygon_path` is a geopackage, specify the layer used. Ignored otherwise.
min_area_pct:float=0.0, # Minimum share of a polygon's area that must remain after tiling for the polygon to be kept
output_format:str='geojson', # Which format to use for saving, either 'geojson' or 'gpkg'
save_grid:bool=False, # Should tiling grid be saved
allow_partial_data:bool=False, # Whether to create tiles that have only partial image data
gridsize_x:int=320, # Size of tiles in x-axis, pixels
gridsize_y:int=320, # Size of tiles in y-axis, pixels
overlap_x:int=0, # Overlap of tiles in x-axis
overlap_y:int=0, # Overlap of tiles in y-axis
ann_format:str='box', # Annotation format, either box, polygon or rotated box
min_bbox_area:int=0, # Minimum bounding box area in pixels. Smaller objects than this are discarded
):
Create a YOLO-format dataset from a raster and a layer of annotated polygons.
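As a rough illustration of the box annotation format, the sketch below converts pixel-coordinate bounding boxes into YOLO label lines, where each line holds a class index followed by the box centre, width and height normalized by the tile size. The function to_yolo_box and the sample boxes are hypothetical, and the min_bbox_area filter follows the parameter description above rather than the library's actual code.

```python
def to_yolo_box(class_id, x_min, y_min, x_max, y_max, tile_width=320, tile_height=320):
    """Format one box as 'class x_center y_center width height', normalized to [0, 1]."""
    x_center = (x_min + x_max) / 2 / tile_width
    y_center = (y_min + y_max) / 2 / tile_height
    width = (x_max - x_min) / tile_width
    height = (y_max - y_min) / tile_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Hypothetical boxes on a 320 x 320 tile: (class_id, x_min, y_min, x_max, y_max)
boxes = [
    (0, 80, 160, 240, 240),  # 160 x 80 pixels
    (1, 10, 10, 12, 13),     # 2 x 3 pixels, below the area threshold
]
min_bbox_area = 10

# Discard boxes whose pixel area is smaller than min_bbox_area
labels = [
    to_yolo_box(*box)
    for box in boxes
    if (box[3] - box[1]) * (box[4] - box[2]) >= min_bbox_area
]
```

Each kept box becomes one line of the label file for its tile.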