Tabular data

Utilities to process remote sensing image data into tabular format

from pathlib import Path

# Paths to the example data used throughout this page
example_data_path = Path('example_data')
example_points = example_data_path/'points/points.geojson'
example_polys = example_data_path/'polygons/polygons.geojson'
example_raster = example_data_path/'s2_lataseno_ex.tif'

Data conversion and processing

Utility functions to process dataframes.


array_to_longform

 array_to_longform (a:pandas.core.frame.DataFrame, columns:list)

Convert pd.DataFrame `a` to a longform array.
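The docstring does not spell out what "longform" means here. One common reading in raster-to-table workflows is reshaping a raster stack of shape (bands, rows, cols) into a table with one row per pixel and one column per band. The sketch below assumes that reading; `stack_to_longform` is a hypothetical helper for illustration, not the library's implementation.

```python
import numpy as np
import pandas as pd


def stack_to_longform(a: np.ndarray, columns: list) -> pd.DataFrame:
    """Hypothetical helper (assumed semantics): reshape a (bands, rows, cols)
    array into a (rows*cols, bands) DataFrame, one row per pixel."""
    if a.ndim != 3:
        raise ValueError("expected an array of shape (bands, rows, cols)")
    if len(columns) != a.shape[0]:
        raise ValueError("need exactly one column name per band")
    # Flatten each band's 2D grid in row-major order, then transpose so bands become columns
    flat = a.reshape(a.shape[0], -1).T
    return pd.DataFrame(flat, columns=columns)


stack = np.arange(12).reshape(3, 2, 2)  # 3 bands, 2x2 pixels
df = stack_to_longform(stack, ["blue", "green", "red"])
print(df)
```

Each row then holds all band values of a single pixel, which is the shape most tabular classifiers expect.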


drop_small_classes

 drop_small_classes (df:pandas.core.frame.DataFrame, min_class_size:int,
                     target_column:str|int=0)

Drop rows from the dataframe if their `target_column` value has fewer instances than `min_class_size`.
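A minimal pandas sketch of this filtering, assuming a class is kept when it has at least `min_class_size` rows (consistent with the `>= 20` assertion in the usage example on this page) and that an integer `target_column` is a column position. `drop_small_classes_sketch` is a hypothetical name, not the library function.

```python
from __future__ import annotations

import pandas as pd


def drop_small_classes_sketch(df: pd.DataFrame, min_class_size: int,
                              target_column: str | int = 0) -> pd.DataFrame:
    """Hypothetical sketch: keep only rows whose class occurs at least min_class_size times."""
    # A Python int is treated as a column position, a str as a column name
    col = df.columns[target_column] if isinstance(target_column, int) else target_column
    # Count occurrences of each class value and keep the large-enough classes
    counts = df[col].value_counts()
    keep = counts[counts >= min_class_size].index
    return df[df[col].isin(keep)]


df = pd.DataFrame({"label": [1, 1, 1, 2, 2, 3]})
print(drop_small_classes_sketch(df, 2, "label").label.tolist())  # [1, 1, 1, 2, 2]
```

Filtering with `isin` keeps the original row order and index, so the result can still be aligned with other columns of the input.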

Generate random data for testing.

import numpy as np
import pandas as pd
ex_df = pd.DataFrame({'label': np.random.randint(1, 10, 200)})
ex_df.label.value_counts()

label
4    32
3    23
2    22
9    21
1    21
6    21
5    21
8    21
7    18
Name: count, dtype: int64

The target column can be specified either by name (str) or by position (int).

filtered = drop_small_classes(ex_df, 20, 'label')
assert filtered.label.value_counts().min() >= 20
filtered.label.value_counts()

label
4    32
3    23
2    22
9    21
1    21
6    21
5    21
8    21
Name: count, dtype: int64

If `target_column` is not specified, it defaults to the first column.

filtered = drop_small_classes(ex_df, 20)
filtered.label.value_counts()

label
4    32
3    23
2    22
9    21
1    21
6    21
5    21
8    21
Name: count, dtype: int64

Sampling utilities

These functions enable sampling of raster values using either point or polygon features.


sample_raster_with_points

 sample_raster_with_points (sampling_locations:pathlib.Path,
                            input_raster:pathlib.Path, target_column:str,
                            gpkg_layer:str=None,
                            band_names:list[str]=None,
                            rename_target:str=None)

Extract values from input_raster using points from sampling_locations. Returns a gpd.GeoDataFrame with the columns target_column and geometry, plus one column per band.

sample_raster_with_points is a utility for sampling point values from a raster and collecting the results into a gpd.GeoDataFrame.

out_gdf = sample_raster_with_points(example_points, example_raster, 'id')
out_gdf.head()

	id	geometry	band_0	band_1	band_2	band_3	band_4	band_5	band_6	band_7	band_8
0	10.0	POINT (311760.599 7604880.391)	334	591	439	1204	2651	3072	3177	2070	1046
1	14.0	POINT (312667.464 7605426.442)	183	359	282	759	1742	2002	2037	1392	669
2	143.0	POINT (313619.160 7604550.762)	281	478	427	900	1976	2315	2423	2139	1069
3	172.0	POINT (311989.967 7605411.190)	287	530	393	1078	2446	2761	2978	1949	950
4	224.0	POINT (313386.009 7604304.917)	204	379	327	753	1524	1747	1771	1322	663
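Conceptually, point sampling maps each point's map coordinates to a pixel (row, column) through the raster's geotransform and reads every band at that pixel. The numpy-only sketch below illustrates that step for a north-up raster with square pixels; it assumes the upper-left corner and pixel size are known, and `sample_points` is a hypothetical name. The library presumably delegates this to a raster I/O library, so treat this as an illustration rather than its implementation.

```python
import numpy as np


def sample_points(band_stack: np.ndarray, x0: float, y0: float, res: float,
                  xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: pixel values at map coordinates (xs, ys).

    Assumes a north-up raster with square pixels of size `res` whose
    upper-left corner is (x0, y0); band_stack has shape (bands, rows, cols).
    Returns an array of shape (n_points, bands), one row per point.
    """
    cols = np.floor((np.asarray(xs) - x0) / res).astype(int)
    rows = np.floor((y0 - np.asarray(ys)) / res).astype(int)  # y decreases downwards
    _, n_rows, n_cols = band_stack.shape
    if (rows < 0).any() or (rows >= n_rows).any() or (cols < 0).any() or (cols >= n_cols).any():
        raise ValueError("point outside raster extent")
    # Fancy indexing picks the same (row, col) in every band -> (bands, n_points)
    return band_stack[:, rows, cols].T


stack = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # 2 bands, 3 rows, 4 cols
# Upper-left corner at (300000, 7605000), 10 m pixels
vals = sample_points(stack, 300000.0, 7605000.0, 10.0,
                     xs=np.array([300005.0, 300035.0]),
                     ys=np.array([7604995.0, 7604975.0]))
print(vals)  # [[ 0 12]
             #  [11 23]]
```

Each output row corresponds to one point and each column to one band, matching the band_0 to band_8 layout of the table above.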

It is also possible to provide band_names to rename the band columns.

band_names = ['blue', 'green', 'red', 'red_edge1', 'red_edge2', 'nir', 'narrow_nir', 'swir1', 'swir2']
out_gdf = sample_raster_with_points(example_points, example_raster, 'id', band_names=band_names)
out_gdf.head()

	id	geometry	blue	green	red	red_edge1	red_edge2	nir	narrow_nir	swir1	swir2
0	10.0	POINT (311760.599 7604880.391)	334	591	439	1204	2651	3072	3177	2070	1046
1	14.0	POINT (312667.464 7605426.442)	183	359	282	759	1742	2002	2037	1392	669
2	143.0	POINT (313619.160 7604550.762)	281	478	427	900	1976	2315	2423	2139	1069
3	172.0	POINT (311989.967 7605411.190)	287	530	393	1078	2446	2761	2978	1949	950
4	224.0	POINT (313386.009 7604304.917)	204	379	327	753	1524	1747	1771	1322	663

Or use rename_target to rename the target column.

out_gdf = sample_raster_with_points(example_points, example_raster, 'id', rename_target='target')
out_gdf.head()

	target	geometry	band_0	band_1	band_2	band_3	band_4	band_5	band_6	band_7	band_8
0	10.0	POINT (311760.599 7604880.391)	334	591	439	1204	2651	3072	3177	2070	1046
1	14.0	POINT (312667.464 7605426.442)	183	359	282	759	1742	2002	2037	1392	669
2	143.0	POINT (313619.160 7604550.762)	281	478	427	900	1976	2315	2423	2139	1069
3	172.0	POINT (311989.967 7605411.190)	287	530	393	1078	2446	2761	2978	1949	950
4	224.0	POINT (313386.009 7604304.917)	204	379	327	753	1524	1747	1771	1322	663
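Both renaming options amount to a column rename on the output. A minimal pandas sketch, assuming the default band columns are named `band_0`, `band_1` and so on, as in the outputs above; `rename_output` is a hypothetical helper, not part of the library.

```python
from __future__ import annotations

import pandas as pd


def rename_output(df: pd.DataFrame, target_column: str,
                  band_names: list[str] | None = None,
                  rename_target: str | None = None) -> pd.DataFrame:
    """Hypothetical sketch: rename default band_<i> columns and, optionally, the target column."""
    mapping = {}
    if band_names is not None:
        # Assumes band columns keep their default "band_<i>" names and order
        band_cols = [c for c in df.columns if str(c).startswith("band_")]
        if len(band_cols) != len(band_names):
            raise ValueError("band_names must have one entry per band")
        mapping.update(dict(zip(band_cols, band_names)))
    if rename_target is not None:
        mapping[target_column] = rename_target
    return df.rename(columns=mapping)


df = pd.DataFrame({"id": [10.0], "band_0": [334], "band_1": [591]})
out = rename_output(df, "id", band_names=["blue", "green"], rename_target="target")
print(list(out.columns))  # ['target', 'blue', 'green']
```

Checking the number of names against the number of band columns catches a mismatched band_names list early instead of silently leaving some columns unrenamed.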


sample_raster_with_polygons

 sample_raster_with_polygons (sampling_locations:pathlib.Path,
                              input_raster:pathlib.Path,
                              target_column:str=None, gpkg_layer:str=None,
                              band_names:list[str]=None,
                              rename_target:str=None,
                              stats:list[str]=['min', 'max', 'mean',
                              'count'], categorical:bool=False)

Extract statistics for all bands of input_raster within the polygons from sampling_locations, using rasterstats.zonal_stats.

The example polygons are the example points buffered by 40 meters.

out_gdf = sample_raster_with_polygons(example_polys, example_raster, 'id')
out_gdf.iloc[0]

id                                                           10.0
geometry        MULTIPOLYGON (((311800.59915342694 7604880.390...
band_0_min                                                  266.0
band_0_max                                                  415.0
band_0_mean                                            335.708333
band_0_count                                                   48
band_1_min                                                  351.0
band_1_max                                                  696.0
band_1_mean                                              582.1875
band_1_count                                                   48
band_2_min                                                  412.0
band_2_max                                                  699.0
band_2_mean                                            524.520833
band_2_count                                                   48
band_3_min                                                  885.0
band_3_max                                                 1462.0
band_3_mean                                              1237.125
band_3_count                                                   48
band_4_min                                                 1310.0
band_4_max                                                 2888.0
band_4_mean                                           2479.291667
band_4_count                                                   48
band_5_min                                                 1565.0
band_5_max                                                 3317.0
band_5_mean                                               2880.25
band_5_count                                                   48
band_6_min                                                 1579.0
band_6_max                                                 3665.0
band_6_mean                                           3127.166667
band_6_count                                                   48
band_7_min                                                 1860.0
band_7_max                                                 2214.0
band_7_mean                                           2076.895833
band_7_count                                                   48
band_8_min                                                 1024.0
band_8_max                                                 1144.0
band_8_mean                                              1075.375
band_8_count                                                   48
Name: 0, dtype: object
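As a rough check on band_0_count: a 40 m buffer covers about π·40² ≈ 5,027 m², i.e. roughly 50 pixels at 10 m resolution, which is in line with the count of 48 if the example raster has 10 m pixels (an assumption; the resolution is not stated on this page). By default, rasterstats includes the pixels whose centres fall inside the polygon (`all_touched=False`). The numpy-only sketch below mimics that for a buffered point on a synthetic band; `buffer_zonal_stats` is a hypothetical name, and a true circle is used instead of the polygonal approximation a buffer produces.

```python
import numpy as np


def buffer_zonal_stats(band: np.ndarray, x0: float, y0: float, res: float,
                       px: float, py: float, radius: float) -> dict:
    """Hypothetical sketch: min/max/mean/count of the pixels whose centres lie
    within `radius` of the point (px, py).

    Assumes a north-up raster whose upper-left corner is (x0, y0) and whose
    square pixels have size `res`.
    """
    n_rows, n_cols = band.shape
    # Map coordinates of every pixel centre
    xc = x0 + (np.arange(n_cols) + 0.5) * res
    yc = y0 - (np.arange(n_rows) + 0.5) * res
    xx, yy = np.meshgrid(xc, yc)  # both of shape (n_rows, n_cols)
    inside = np.hypot(xx - px, yy - py) <= radius
    values = band[inside]
    return {"min": values.min(), "max": values.max(),
            "mean": values.mean(), "count": int(inside.sum())}


band = np.arange(121).reshape(11, 11)
# 10 m pixels, upper-left corner at (0, 110); the point sits on the centre of pixel (5, 5)
stats = buffer_zonal_stats(band, 0.0, 110.0, 10.0, px=55.0, py=55.0, radius=40.0)
print(stats)  # count is 49 for this centred point
```

The count depends on where the point falls relative to the pixel grid; for a point on a pixel corner the same buffer would include 52 pixel centres instead of 49.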

Because sample_raster_with_polygons uses rasterstats.zonal_stats, any statistic supported by zonal_stats can be passed with the stats parameter; see the rasterstats documentation for the full list.

out_gdf = sample_raster_with_polygons(example_polys, example_raster, 'id', stats=['min', 'max', 'sum', 'median', 'range'])
out_gdf.iloc[0]

id                                                            10.0
geometry         MULTIPOLYGON (((311800.59915342694 7604880.390...
band_0_min                                                   266.0
band_0_max                                                   415.0
band_0_sum                                                 16114.0
band_0_median                                                338.0
band_0_range                                                 149.0
band_1_min                                                   351.0
band_1_max                                                   696.0
band_1_sum                                                 27945.0
band_1_median                                                590.0
band_1_range                                                 345.0
band_2_min                                                   412.0
band_2_max                                                   699.0
band_2_sum                                                 25177.0
band_2_median                                                524.0
band_2_range                                                 287.0
band_3_min                                                   885.0
band_3_max                                                  1462.0
band_3_sum                                                 59382.0
band_3_median                                               1255.5
band_3_range                                                 577.0
band_4_min                                                  1310.0
band_4_max                                                  2888.0
band_4_sum                                                119006.0
band_4_median                                               2625.5
band_4_range                                                1578.0
band_5_min                                                  1565.0
band_5_max                                                  3317.0
band_5_sum                                                138252.0
band_5_median                                               3020.5
band_5_range                                                1752.0
band_6_min                                                  1579.0
band_6_max                                                  3665.0
band_6_sum                                                150104.0
band_6_median                                               3270.5
band_6_range                                                2086.0
band_7_min                                                  1860.0
band_7_max                                                  2214.0
band_7_sum                                                 99691.0
band_7_median                                               2087.0
band_7_range                                                 354.0
band_8_min                                                  1024.0
band_8_max                                                  1144.0
band_8_sum                                                 51618.0
band_8_median                                               1070.0
band_8_range                                                 120.0
Name: 0, dtype: object
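The flat `band_<i>_<stat>` layout of these outputs can be reproduced in plain numpy once the pixel values inside a polygon are known. The sketch assumes those values have already been extracted per band; `STAT_FUNCS` and `flat_zonal_row` are hypothetical names covering only the statistics used on this page, whereas rasterstats supports more. Here `range` is computed as `max - min`, matching the values above (for example 415 - 266 = 149 for band_0).

```python
import numpy as np

# Hypothetical mapping from stat names to numpy reductions (subset used on this page)
STAT_FUNCS = {
    "min": np.min,
    "max": np.max,
    "mean": np.mean,
    "sum": np.sum,
    "median": np.median,
    "range": lambda v: np.max(v) - np.min(v),
    "count": np.size,
}


def flat_zonal_row(pixels: np.ndarray, stats: list) -> dict:
    """Hypothetical sketch: one flat record with band_<i>_<stat> keys.

    `pixels` holds the values inside one polygon, shape (bands, n_pixels).
    Keys are ordered band by band, then stat by stat.
    """
    row = {}
    for i, band_values in enumerate(pixels):
        for stat in stats:
            row[f"band_{i}_{stat}"] = STAT_FUNCS[stat](band_values)
    return row


pixels = np.array([[1, 2, 3, 10], [5, 5, 6, 8]])
row = flat_zonal_row(pixels, ["min", "max", "sum", "median", "range"])
print(row["band_0_range"], row["band_1_median"])  # 9 5.5
```

Collecting one such record per polygon and passing the list to `pd.DataFrame` would give one row per polygon and one column per band and statistic.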