We used Mask R-CNN with a several pretrained backbones as our model, and fine-tuned the model with our remote sensing data. Because the convolutional layers of a CNN model extract interesting, useful features from the images, it is possible and advisable to use pretrained weights as a baseline and fine-tune the model with custom data. All models were trained for 3000 iterations with a batch size of 8, and validation metrics were recorded every 100 iterations. We used a base learning rate of 0.001 and linear warmup with cosine annealing as the learning rate scheduler, using 1000 iterations for the warmup phase.
cfg = get_cfg()cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))cfg.DATASETS.TRAIN = ("hiidenportti_train",)cfg.DATASETS.TEST = ("hiidenportti_val",)cfg.DATALOADER.NUM_WORKERS =4cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml") # Let training initialize from model zoocfg.SOLVER.IMS_PER_BATCH =8cfg.TEST.EVAL_PERIOD =100cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST =0.5cfg.OUTPUT_DIR ='detectron2_models/mask_rcnn_R_50_FPN_3x_256'cfg.SOLVER.LR_SCHEDULER_NAME ="WarmupCosineLR"cfg.SOLVER.BASE_LR =0.001# pick a good LRcfg.SOLVER.WARMUP_ITERS =1000cfg.SOLVER.MAX_ITER =3000cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE =512# (default: 512)cfg.MODEL.ROI_HEADS.NUM_CLASSES =2os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
Code
class Trainer(DefaultTrainer):""" Trainer class for training detectron2 models """def__init__(self, cfg):super().__init__(cfg)@classmethoddef build_evaluator(cls, cfg, dataset_name, output_folder=None):return DatasetEvaluators([COCOEvaluator(dataset_name, output_dir=output_folder)])@classmethoddef build_train_loader(cls, cfg): augs = build_aug_transforms(cfg, flip_horiz=True, flip_vert=True, max_rotate=10, brightness_limits=(.8,1.4), contrast_limits=(.8,1.4), saturation_limits=(.8,1.4), p_lighting=0.75)return build_detection_train_loader(cfg, mapper=DatasetMapper(cfg, is_train=True, augmentations=augs))
In order to effectively increase the amount of our training data, we applied a set of augmentations to our image patches and masks. First, each image was randomly flipped and randomly rotated up to 90 degrees. These geometric transformations were applied to both masks and image patches. In addition, brightness and contrast of the image were randomly adjusted and images had a chance to be slightly blurred. Each of these individual augmentations had a probability of 0.5 to be applied.
All models were trained with Python version 3.9.5 using deep learning stack containing version PyTorch 1.10.1 and Detectron2 detection and segmentation library. Weights & Biases was used to track the model metrics. We used a single NVIDIA V100 GPU with 32GB of memory, hosted on computing nodes of Puhti supercomputer hosted by CSC – IT Center for Science, Finland.
The following is for the example, all models were trained as batch jobs. First initialize wandb login.