I was trying to fine-tune an object detection model on a custom dataset. I was looking for example code, but most examples I could find were for classification. What could be the reason for this?
- Features: After feature extraction, classification is straightforward: it is just a fully connected layer on top of the extracted features. For object detection (say YOLO), the head instead predicts a tensor of shape S × S × B × (5 + C), where S × S is the grid into which the image is divided, B is the number of bounding boxes per grid cell, 5 covers the four bounding box coordinates plus an objectness score, and C is the number of classes.
- Loss computation: The training loss is also different. After the bounding boxes are generated, we need to compute separate loss terms for the bounding box coordinates, the objectness score, and the class predictions.
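To make the output shape concrete, here is a minimal sketch with NumPy. The numbers (7×7 grid, 2 boxes, 20 classes) are toy values for illustration, not from any particular model:

```python
import numpy as np

# Toy hyperparameters, chosen only for illustration:
S, B, C = 7, 2, 20  # 7x7 grid, 2 boxes per cell, 20 classes

# A YOLO-style head emits, per image, a tensor of shape S x S x B x (5 + C):
# for each grid cell and each box, 4 coordinates + 1 objectness score + C class scores.
pred = np.random.rand(S, S, B, 5 + C)

coords      = pred[..., 0:4]  # (S, S, B, 4)  box coordinates
objectness  = pred[..., 4]    # (S, S, B)     confidence that the box contains an object
class_probs = pred[..., 5:]   # (S, S, B, C)  per-class scores

print(pred.shape)  # (7, 7, 2, 25)
```

Compare this with a classifier, which outputs just a flat vector of C logits per image; the extra structure is what the detection-specific loss and post-processing have to handle.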
- Dataset formats: Labels and bounding boxes are represented in several different formats (for example, COCO JSON, Pascal VOC XML, and YOLO TXT).
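As an example of why the formats differ, here is a small sketch converting a YOLO-style box (normalized center x/y, width, height) to Pascal VOC corners (absolute xmin, ymin, xmax, ymax). The function name and numbers are my own for illustration:

```python
def yolo_to_voc(cx, cy, w, h, img_w, img_h):
    """Convert a YOLO-format box (normalized center, width, height)
    to Pascal VOC corners (absolute xmin, ymin, xmax, ymax)."""
    xmin = (cx - w / 2) * img_w
    ymin = (cy - h / 2) * img_h
    xmax = (cx + w / 2) * img_w
    ymax = (cy + h / 2) * img_h
    return xmin, ymin, xmax, ymax

# A box centered in a 640x480 image, 20% of the width and 40% of the height:
print(yolo_to_voc(0.5, 0.5, 0.2, 0.4, img_w=640, img_h=480))
# -> (256.0, 144.0, 384.0, 336.0)
```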
Tools such as Roboflow make it easy to convert between these formats.
- Inference: From all the bounding boxes generated, we keep the ones with the highest objectness scores and then apply non-max suppression (NMS) to remove overlapping boxes.
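The inference step above can be sketched as a greedy NMS in NumPy. This is a minimal, single-class version for illustration; real pipelines typically run it per class and on top of a confidence threshold:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-max suppression.
    boxes: (N, 4) array of [xmin, ymin, xmax, ymax]; scores: (N,) objectness.
    Returns indices of the boxes to keep, highest score first."""
    order = np.argsort(scores)[::-1]  # process boxes from highest to lowest score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the current top box against all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop boxes that overlap the kept box too much
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[10, 10, 50, 50],
                  [12, 12, 52, 52],     # heavy overlap with the first box
                  [100, 100, 150, 150]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]: the overlapping lower-score box is suppressed
```

None of this post-processing exists in a classification pipeline, which is another reason detection example code is longer and rarer.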