example_data_path = Path('example_data')
example_points = example_data_path/'points/points.geojson'
example_polys = example_data_path/'polygons/polygons.geojson'
example_raster = example_data_path/'s2_lataseno_ex.tif'
Tabular data
Data conversion and processing
Utility functions to process dataframes.
array_to_longform
array_to_longform (a:pandas.core.frame.DataFrame, columns:list)
Convert pd.DataFrame a to longform array
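As an illustration of what a wide-to-longform conversion might look like, the sketch below uses pandas melt. The helper name to_longform_sketch and the three-column output layout are assumptions made for illustration; the actual implementation of array_to_longform may differ.

```python
import pandas as pd

def to_longform_sketch(a: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical helper: melt a wide DataFrame into (index, variable, value) rows."""
    # Move the index into a regular column, then unpivot all remaining columns
    long_df = a.reset_index().melt(id_vars="index")
    # Rename the three resulting columns with the names given by the caller
    long_df.columns = columns
    return long_df

wide = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
long_df = to_longform_sketch(wide, ["row", "col", "value"])
print(long_df)
```

Each cell of the wide frame becomes one row in the result, so a 2 x 2 frame yields four rows.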
drop_small_classes
drop_small_classes (df:pandas.core.frame.DataFrame, min_class_size:int, target_column:str|int=0)
Drop rows from the dataframe if their target_column value has fewer instances than min_class_size
Generate random data for testing.
ex_df = pd.DataFrame({'label': np.random.randint(1, 10, 200)})
ex_df.label.value_counts()
label
4 32
3 23
2 22
9 21
1 21
6 21
5 21
8 21
7 18
Name: count, dtype: int64
The target column can be specified either by name (string) or by position (int). If it is not provided, it defaults to the first column.
filtered = drop_small_classes(ex_df, 20, 'label')
assert filtered.label.value_counts().min() >= 20
filtered.label.value_counts()
label
4 32
3 23
2 22
9 21
1 21
6 21
5 21
8 21
Name: count, dtype: int64
If target_column is not specified, the first column is used.
filtered = drop_small_classes(ex_df, 20)
filtered.label.value_counts()
label
4 32
3 23
2 22
9 21
1 21
6 21
5 21
8 21
Name: count, dtype: int64
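The filtering behaviour shown above could be sketched roughly as follows. This is a minimal re-implementation under stated assumptions, not the library's code: the name drop_small_classes_sketch is hypothetical, and an int target_column is assumed to mean a column position.

```python
from typing import Union

import pandas as pd

def drop_small_classes_sketch(df: pd.DataFrame, min_class_size: int,
                              target_column: Union[str, int] = 0) -> pd.DataFrame:
    """Hypothetical sketch: keep rows whose class has at least min_class_size members."""
    # Assumption: an int is a column position, a str is a column name
    if isinstance(target_column, int):
        target_column = df.columns[target_column]
    # Number of rows per class value
    counts = df[target_column].value_counts()
    # Look up the class size for every row and keep the large enough ones
    keep = df[target_column].map(counts) >= min_class_size
    return df[keep]

toy = pd.DataFrame({"label": [1, 1, 1, 2, 2, 3]})
kept = drop_small_classes_sketch(toy, 2)
print(kept.label.value_counts())
```

In the toy data, class 3 has a single row, so it is dropped when min_class_size is 2.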
Sampling utilities
These functions enable sampling of raster values using either point or polygon features.
sample_raster_with_points
sample_raster_with_points (sampling_locations:pathlib.Path, input_raster:pathlib.Path, target_column:str, gpkg_layer:str=None, band_names:list[str]=None, rename_target:str=None)
Extract values from input_raster using points from sampling_locations. Returns a gpd.GeoDataFrame with columns target_column, geometry and bands
sample_raster_with_points is a utility to sample point values from a raster and get the results into a gpd.GeoDataFrame.
out_gdf = sample_raster_with_points(example_points, example_raster, 'id')
out_gdf.head()
| | id | geometry | band_0 | band_1 | band_2 | band_3 | band_4 | band_5 | band_6 | band_7 | band_8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.0 | POINT (311760.599 7604880.391) | 334 | 591 | 439 | 1204 | 2651 | 3072 | 3177 | 2070 | 1046 |
| 1 | 14.0 | POINT (312667.464 7605426.442) | 183 | 359 | 282 | 759 | 1742 | 2002 | 2037 | 1392 | 669 |
| 2 | 143.0 | POINT (313619.160 7604550.762) | 281 | 478 | 427 | 900 | 1976 | 2315 | 2423 | 2139 | 1069 |
| 3 | 172.0 | POINT (311989.967 7605411.190) | 287 | 530 | 393 | 1078 | 2446 | 2761 | 2978 | 1949 | 950 |
| 4 | 224.0 | POINT (313386.009 7604304.917) | 204 | 379 | 327 | 753 | 1524 | 1747 | 1771 | 1322 | 663 |
It is also possible to provide band_names to rename the columns.
band_names = ['blue', 'green', 'red', 'red_edge1', 'red_edge2', 'nir', 'narrow_nir', 'swir1', 'swir2']
out_gdf = sample_raster_with_points(example_points, example_raster, 'id', band_names=band_names)
out_gdf.head()
| | id | geometry | blue | green | red | red_edge1 | red_edge2 | nir | narrow_nir | swir1 | swir2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.0 | POINT (311760.599 7604880.391) | 334 | 591 | 439 | 1204 | 2651 | 3072 | 3177 | 2070 | 1046 |
| 1 | 14.0 | POINT (312667.464 7605426.442) | 183 | 359 | 282 | 759 | 1742 | 2002 | 2037 | 1392 | 669 |
| 2 | 143.0 | POINT (313619.160 7604550.762) | 281 | 478 | 427 | 900 | 1976 | 2315 | 2423 | 2139 | 1069 |
| 3 | 172.0 | POINT (311989.967 7605411.190) | 287 | 530 | 393 | 1078 | 2446 | 2761 | 2978 | 1949 | 950 |
| 4 | 224.0 | POINT (313386.009 7604304.917) | 204 | 379 | 327 | 753 | 1524 | 1747 | 1771 | 1322 | 663 |
The target column can also be renamed with rename_target.
out_gdf = sample_raster_with_points(example_points, example_raster, 'id', rename_target='target')
out_gdf.head()
| | target | geometry | band_0 | band_1 | band_2 | band_3 | band_4 | band_5 | band_6 | band_7 | band_8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.0 | POINT (311760.599 7604880.391) | 334 | 591 | 439 | 1204 | 2651 | 3072 | 3177 | 2070 | 1046 |
| 1 | 14.0 | POINT (312667.464 7605426.442) | 183 | 359 | 282 | 759 | 1742 | 2002 | 2037 | 1392 | 669 |
| 2 | 143.0 | POINT (313619.160 7604550.762) | 281 | 478 | 427 | 900 | 1976 | 2315 | 2423 | 2139 | 1069 |
| 3 | 172.0 | POINT (311989.967 7605411.190) | 287 | 530 | 393 | 1078 | 2446 | 2761 | 2978 | 1949 | 950 |
| 4 | 224.0 | POINT (313386.009 7604304.917) | 204 | 379 | 327 | 753 | 1524 | 1747 | 1771 | 1322 | 663 |
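To make the idea of point sampling concrete, the following sketch looks up raster values at point coordinates using only NumPy. It assumes a north-up raster described by a hypothetical origin (x of the left edge, y of the top edge) and a square pixel size. The function name and the toy array are illustrative; sample_raster_with_points itself reads real raster files and may work differently.

```python
import numpy as np

def sample_points_sketch(raster, x_min, y_max, pixel_size, points):
    """Hypothetical helper: return the per-band values under each (x, y) point.

    raster has shape (bands, rows, cols); the grid is assumed to be north-up.
    """
    samples = []
    for x, y in points:
        # Column index grows eastwards, row index grows southwards
        col = int((x - x_min) // pixel_size)
        row = int((y_max - y) // pixel_size)
        samples.append(raster[:, row, col])
    return np.array(samples)

# Toy raster: 2 bands, 3 rows, 4 columns, 10 m pixels
raster = np.arange(24).reshape(2, 3, 4)
points = [(5.0, 25.0), (35.0, 5.0)]
values = sample_points_sketch(raster, x_min=0.0, y_max=30.0,
                              pixel_size=10.0, points=points)
# One column per band, mirroring the band_0, band_1, ... naming used above
columns = [f"band_{i}" for i in range(raster.shape[0])]
print(columns, values.tolist())
```

Each point yields one row with one value per band, which is the same shape as the band columns in the tables above.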
sample_raster_with_polygons
sample_raster_with_polygons (sampling_locations:pathlib.Path, input_raster:pathlib.Path, target_column:str=None, gpkg_layer:str=None, band_names:list[str]=None, rename_target:str=None, stats:list[str]=['min', 'max', 'mean', 'count'], categorical:bool=False)
Extract values from input_raster using polygons from sampling_locations with rasterstats.zonal_stats for all bands
The example polygons here are the previous points buffered by 40 meters.
out_gdf = sample_raster_with_polygons(example_polys, example_raster, 'id')
out_gdf.iloc[0]
id 10.0
geometry MULTIPOLYGON (((311800.59915342694 7604880.390...
band_0_min 266.0
band_0_max 415.0
band_0_mean 335.708333
band_0_count 48
band_1_min 351.0
band_1_max 696.0
band_1_mean 582.1875
band_1_count 48
band_2_min 412.0
band_2_max 699.0
band_2_mean 524.520833
band_2_count 48
band_3_min 885.0
band_3_max 1462.0
band_3_mean 1237.125
band_3_count 48
band_4_min 1310.0
band_4_max 2888.0
band_4_mean 2479.291667
band_4_count 48
band_5_min 1565.0
band_5_max 3317.0
band_5_mean 2880.25
band_5_count 48
band_6_min 1579.0
band_6_max 3665.0
band_6_mean 3127.166667
band_6_count 48
band_7_min 1860.0
band_7_max 2214.0
band_7_mean 2076.895833
band_7_count 48
band_8_min 1024.0
band_8_max 1144.0
band_8_mean 1075.375
band_8_count 48
Name: 0, dtype: object
As sample_raster_with_polygons utilizes rasterstats.zonal_stats, all stats supported by it can be provided with the parameter stats. More information is available in the rasterstats documentation.
out_gdf = sample_raster_with_polygons(example_polys, example_raster, 'id', stats=['min', 'max', 'sum', 'median', 'range'])
out_gdf.iloc[0]
id 10.0
geometry MULTIPOLYGON (((311800.59915342694 7604880.390...
band_0_min 266.0
band_0_max 415.0
band_0_sum 16114.0
band_0_median 338.0
band_0_range 149.0
band_1_min 351.0
band_1_max 696.0
band_1_sum 27945.0
band_1_median 590.0
band_1_range 345.0
band_2_min 412.0
band_2_max 699.0
band_2_sum 25177.0
band_2_median 524.0
band_2_range 287.0
band_3_min 885.0
band_3_max 1462.0
band_3_sum 59382.0
band_3_median 1255.5
band_3_range 577.0
band_4_min 1310.0
band_4_max 2888.0
band_4_sum 119006.0
band_4_median 2625.5
band_4_range 1578.0
band_5_min 1565.0
band_5_max 3317.0
band_5_sum 138252.0
band_5_median 3020.5
band_5_range 1752.0
band_6_min 1579.0
band_6_max 3665.0
band_6_sum 150104.0
band_6_median 3270.5
band_6_range 2086.0
band_7_min 1860.0
band_7_max 2214.0
band_7_sum 99691.0
band_7_median 2087.0
band_7_range 354.0
band_8_min 1024.0
band_8_max 1144.0
band_8_sum 51618.0
band_8_median 1070.0
band_8_range 120.0
Name: 0, dtype: object
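The per-band statistics and their column names can be illustrated with a small NumPy sketch. It assumes the pixels inside a polygon have already been selected with a boolean mask. The function name, the mask, and the toy values are hypothetical, and in sample_raster_with_polygons the actual computation is delegated to rasterstats.

```python
import numpy as np

def zonal_stats_sketch(raster, mask, stats=("min", "max", "mean", "count")):
    """Hypothetical helper: summarize the masked pixels of every band.

    raster has shape (bands, rows, cols); mask has shape (rows, cols).
    """
    # Each statistic maps the 1-D array of pixels inside the zone to a scalar
    funcs = {
        "min": np.min, "max": np.max, "mean": np.mean, "sum": np.sum,
        "median": np.median, "count": np.size,
        "range": lambda v: np.max(v) - np.min(v),
    }
    result = {}
    for i, band in enumerate(raster):
        inside = band[mask]  # pixels of this band that fall inside the zone
        for stat in stats:
            result[f"band_{i}_{stat}"] = funcs[stat](inside)
    return result

raster = np.array([[[1, 2], [3, 4]],
                   [[10, 20], [30, 40]]])
mask = np.array([[True, True], [False, True]])
zs = zonal_stats_sketch(raster, mask, stats=("min", "max", "range", "count"))
print(zs)
```

The range statistic is simply max minus min, which matches the output above (for band_0, 415 - 266 = 149).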