
The data set contains all available Sentinel-1 SAR and Sentinel-2 MSI images acquired from 2020 to 2024 over an area north of the city of Brasov, Romania, together with 32 x 32 pixel radar and multispectral patches for a crop identification task using a learning model. We propose using the data set for two concrete problems: (1) crop identification with temporal generalization of the learning model (i.e. training with 2020-2023 data and testing with 2024 data) and (2) early crop identification (i.e. crop identification during the vegetation season, with the data split into training and testing at an arbitrarily chosen date, 20th May, in the middle of the season). The data set also includes masks, shapefiles and the metadata necessary for the ground truth of the afore-mentioned problems.
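The two splitting rules can be sketched as follows. This is a minimal illustration only: the function names and the example dates are hypothetical, and the text above does not state which side of the 20th May cutoff is used for training and which for testing, so the second function simply returns both sides.

```python
from datetime import date


def temporal_split(acquisition_dates, test_year=2024):
    """Problem 1: acquisitions before test_year go to training, test_year to testing."""
    train = [d for d in acquisition_dates if d.year < test_year]
    test = [d for d in acquisition_dates if d.year == test_year]
    return train, test


def season_split(acquisition_dates, month=5, day=20):
    """Problem 2: partition acquisitions at the 20th May cutoff of their own year.

    Returns (up_to_cutoff, after_cutoff). Which part is used for training is
    NOT stated in the description, so no train/test naming is assumed here.
    """
    up_to_cutoff = [d for d in acquisition_dates if (d.month, d.day) <= (month, day)]
    after_cutoff = [d for d in acquisition_dates if (d.month, d.day) > (month, day)]
    return up_to_cutoff, after_cutoff


# Made-up acquisition dates, for illustration only (not taken from the data set).
example_dates = [
    date(2020, 4, 2),
    date(2022, 7, 15),
    date(2023, 5, 20),
    date(2024, 3, 9),
    date(2024, 5, 21),
]

train_dates, test_dates = temporal_split(example_dates)
early_dates, late_dates = season_split(example_dates)
```

In practice the patches are already divided into training and test sets in the data set, so a sketch like this is mainly useful for re-deriving or checking a split from the full images.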
The directory Images_Sentinel1_2bands_GeoTIFF contains all the Sentinel-1 images that correspond to the Sentinel-2 images over the area of interest. All the images respect a naming convention and are saved in GeoTIFF format. The dimensions of the images are 800 x 450 x 2: each image has a height of 800 pixels, a width of 450 pixels and 2 bands, with a spatial resolution of 10 x 10 meters. Each image is saved in the folder corresponding to the year in which it was acquired by the Sentinel-1 SAR. These folders are named Sentinel1_yyyy, where yyyy is the four-digit year.
The directory Images_Sentinel2_12bands_GeoTIFF contains all the Sentinel-2 images over the area of interest. All the images respect a naming convention and are saved in GeoTIFF format. The dimensions of the images are 800 x 450 x 12: each image has a height of 800 pixels, a width of 450 pixels and 12 spectral bands, with a spatial resolution of 10 x 10 meters. Each image is saved in the folder corresponding to the year in which it was acquired by the Sentinel-2 MSI. These folders are named Sentinel2_yyyy, where yyyy is the four-digit year. The directory Images_Sentinel2_GeoTIFF contains the same Sentinel-2 images as the previous directory, but these images keep all the information provided by the Copernicus browser.
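The folder layout of the two image directories can be expressed as a small path helper. This is a sketch under stated assumptions: the function names and the `dataset_root` argument are hypothetical, the yearly folders are assumed to sit directly inside the image directories, and the `.tif`/`.tiff` extension is assumed because the file naming convention is not spelled out here.

```python
from pathlib import Path

# Directory names and image dimensions as given in the description.
IMAGE_DIR = {
    1: "Images_Sentinel1_2bands_GeoTIFF",
    2: "Images_Sentinel2_12bands_GeoTIFF",
}
EXPECTED_SHAPE = {1: (800, 450, 2), 2: (800, 450, 12)}  # height, width, bands


def year_folder(dataset_root, sensor, year):
    """Return the folder that holds the images of one sensor (1 or 2) for one year."""
    if sensor not in IMAGE_DIR:
        raise ValueError("sensor must be 1 (Sentinel-1) or 2 (Sentinel-2)")
    if not 2020 <= year <= 2024:
        raise ValueError("the data set covers the years 2020 to 2024")
    # Assumption: Sentinel<n>_yyyy is a direct child of the image directory.
    return Path(dataset_root) / IMAGE_DIR[sensor] / f"Sentinel{sensor}_{year}"


def list_images(dataset_root, sensor, year):
    """List the GeoTIFF files of one yearly folder (empty list if it is missing)."""
    folder = year_folder(dataset_root, sensor, year)
    if not folder.is_dir():
        return []
    # Assumption: files end in .tif or .tiff.
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in (".tif", ".tiff"))


example_folder = year_folder("dataset", 1, 2021)
```

`EXPECTED_SHAPE` can be used to check an image after loading it with a GeoTIFF reader of your choice; note that many readers return the band axis first rather than last.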
In addition to the radar data and multispectral images, the database contains the ground truth of agricultural crops as RGB masks in PNG format and the masks with labels corresponding to each agricultural crop in both PNG and MAT formats. These are located in the Masks_and_legend directory. This directory also contains the legend for the masks in PDF format and the 5 subdirectories where the masks for each year are stored. The subdirectories are named Masks_yyyy, where yyyy is the four-digit year.
The RoI_and_labels folder has three sub-folders. The RoI sub-folder includes the files that locate the region of interest (RoI) as a rectangular polygon, while the WGS84 and UTM sub-folders include the polygons of the crop field labels in a geographic (WGS84) and a projected (UTM) coordinate system, respectively. The polygons are further separated by year.
From the afore-mentioned Sentinel-1 and Sentinel-2 images, we automatically created radar and multispectral patches with dimensions 32 x 32 x 2 and 32 x 32 x 12, respectively, which are stored in the database under the directory 32x32_patches. The 32x32_patches directory contains two subdirectories: 32x32_SAR+MSI_patches and 32x32_RGB_patches (the latter purely for visualization). The first subdirectory contains the radar and multispectral data used for solving the two problems, with the subdirectory problem1 corresponding to the classification of agricultural crops and problem2 corresponding to the early identification of agricultural crops. The data for each problem is divided into training and test sets. The patches in these directories are saved in both GeoTIFF and MAT formats, as indicated by the names of the subdirectories where they are stored: sentinel1_patches_mat, sentinel1_patches_tiff, sentinel2_patches_mat and sentinel2_patches_tiff. This directory also contains the CSV files for the training and test sets of both problems, which give the correspondence between Sentinel-1 and Sentinel-2 patches. The directory 32x32_RGB_patches contains the patches generated from the RGB masks, as a ground truth for the labels at pixel level.
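Pairing Sentinel-1 and Sentinel-2 patches through one of the CSV files could look like the sketch below. The column names `sentinel1_patch` and `sentinel2_patch`, the function name and the sample rows are hypothetical, because the actual CSV header is not given in this description; adjust them to the header found in the files.

```python
import csv
import io

# Patch dimensions as given in the description (height, width, bands).
S1_PATCH_SHAPE = (32, 32, 2)
S2_PATCH_SHAPE = (32, 32, 12)


def read_patch_pairs(csv_file, s1_column="sentinel1_patch", s2_column="sentinel2_patch"):
    """Read (Sentinel-1 patch, Sentinel-2 patch) name pairs from an open CSV file.

    The default column names are placeholders, not the real header.
    """
    reader = csv.DictReader(csv_file)
    return [(row[s1_column], row[s2_column]) for row in reader]


# Made-up CSV content, standing in for one of the training or test CSV files.
sample_csv = io.StringIO(
    "sentinel1_patch,sentinel2_patch\n"
    "s1_patch_a.tif,s2_patch_a.tif\n"
    "s1_patch_b.tif,s2_patch_b.tif\n"
)
pairs = read_patch_pairs(sample_csv)
```

With a real file, `open(path, newline="")` replaces the `io.StringIO` object, and each pair can then be loaded from the matching sentinel1_patches_* and sentinel2_patches_* subdirectories.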