Data Scientist User Documentation
Welcome to the Data Scientist User Documentation for gIQ.
Updated: January 14, 2024
Updated By: Shaun Campbell
1. Introduction
The data scientist role within the product is vital to the ongoing success of the AI inside it. While this document is relatively short, it is packed with information on how the UX of the product supports the data scientist.
This document is broken down into the few functions that a data scientist may interact with in this product. For more details about the full product, please consult the various user guides on the documentation site.
2. Uploading and managing models
A user must have the correct permissions to see the models page and to upload a model. To manage other models, a user must have the models admin permission enabled.
gIQ provides the ability to upload new models to the product. The models page offers an intuitive method of uploading them.
The model must be in .onnx format and must be accompanied by a meta file, with a .yaml extension, in the format below.
model:
  id: 'yellow_vehicle_detector'
  version: 2
  description: 'Yolov5 object detector trained on xview vehicle dataset and in house annotations'
  labels:
    - "Bulldozer"
    - "Excavator"
    - "Grader"
    - "Cement Truck"
    - "Open Cargo Truck"
    - "Others"
  family: yolov5
  type: MULTISPECTRAL
  job_type: detection
  tile_size: 1024
  inputs:
    - dtype: float32
      shape: [3,1024,1024]
  inference_params:
    - confidence_threshold: 0.45
    - nms_threshold: 0.2
    - resolution_default: 0.3
    - resolution_minimum: 1.0
    - resolution_maximum: 0.1
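Before uploading, it can be worth sanity-checking that the meta file parses and that its declared input shape matches the ONNX model. The following is a minimal sketch, assuming PyYAML and onnxruntime are installed; the filenames and the list of required fields are illustrative, not the product's actual validation rules.

import yaml
import onnxruntime as ort

# Parse the meta file; safe_load avoids executing arbitrary YAML tags.
with open("yellow_vehicle_detector.yaml") as f:
    meta = yaml.safe_load(f)["model"]

# Illustrative required-field check (not the product's real validation).
for field in ("id", "version", "labels", "type", "job_type", "inputs"):
    assert field in meta, f"meta file is missing '{field}'"

# Compare the declared input shape with what the ONNX graph expects.
session = ort.InferenceSession("yellow_vehicle_detector.onnx")
print("onnx input shape:", session.get_inputs()[0].shape)
print("declared shape:  ", meta["inputs"][0]["shape"])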
gIQ supports model versioning, and using it is recommended to keep the models page easy to use. Multiple models can be active at one time, but when using versioning only one version of a given model can be active at a time.
From here, you can easily configure the inference parameters for the model. The product also allows a user to configure whether the model should run automatically and whether it is active. Running the model manually is recommended to save GPU resources.
It is important that the model type is set correctly in the meta file. The "type" parameter tells the product which type of data the model can run on. It allows the product to only offer a SAR model for SAR imagery, for example.
Once the model and meta files are uploaded, it is important that the mapping is configured. The labels output by the model can be mapped to something more meaningful to the users who consume the model output within the organisation.
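Conceptually, the mapping is just a lookup from the raw labels in the meta file to organisation-facing categories. A minimal sketch in Python; the category names on the right-hand side are purely illustrative, and the actual mapping is configured through the UI as described below.

# Hypothetical mapping from raw model labels to organisation categories.
label_to_category = {
    "Bulldozer": "Construction Plant",
    "Excavator": "Construction Plant",
    "Cement Truck": "Logistics Vehicle",
    "Open Cargo Truck": "Logistics Vehicle",
}

raw_label = "Excavator"
print(label_to_category.get(raw_label, "Uncategorised"))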
Managing the Models
1. Click on Models…
2. Click on Configuration…
This configuration allows a user with the correct permissions to state whether a model should run automatically when a file is ingested into the system. This is not recommended due to the potentially high GPU resource cost, but the option is available.
3. Click on the other configuration options…
4. This opens a panel.
From here you can change some of the parameters for running a model.
5. Click on Model State…
This controls whether the model is available for running in your environment.
6. To change the category mapping
This feature allows you to map the model label to a category of your choice.
7. Click on Add Category
Use this to add a new category.
8. Options
Adding a new category gives you a few options at this stage, such as the category color and whether it should be a structured diagram.
3. Running a model
The models page is where a user can manage a model, but to run that model the user must find the relevant file to run it on. This takes place in the file manager page.
It is recommended that data science users have both the models and file manager features enabled for their user role.
At the time of writing, the only way to manually run a model is through the file manager page. The user must know which file they want to run a model on. The easiest method is to search for the file name within the file manager.
Once the file has been identified, the user should select it. At this stage if there is a model which is suitable for the file type (highlighting the importance of the model "type" in the meta file), then the model name will appear in the list on the right hand side. To make the model run, all a user needs to do is press the "run" button.
The model will run. The speed at which the model runs depends on numerous factors, such as model complexity, data size and available resources.
Once the model has run it will be in either a failed state, where you may then re-run the model, or a done state. In the done state it will show the number of predictions in each class for each model that has run, together with the model run time.
If there are no predictions listed then the model did not detect anything. If you believe this is a fault please contact us.
If a model run is in a done state, it is not possible to re-run. This feature will be developed in due course.
4. Visualising the predictions
Once a model has run, it is possible to visualise the results. The best method is to simply click on the thumbnail of the relevant file. This will take you directly to the exploration view.
From here, click on the detections tab, which will display all of the model results from all of the models which have run successfully. You can select an individual model if required.
There are now a couple of features available. First, you will see all of the predictions as thumbnails in the list. Clicking on one will snap the view to that specific prediction. It will also show the metadata of the individual prediction.
You can download the detections as a GeoJSON file. Downloading in YOLO format is in development.
Finally, you can also filter the results of the detections by the confidence score. This can help to remove some of the lower confidence results which may be false positives.
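The exported GeoJSON can also be filtered outside the product. Below is a minimal sketch using only the Python standard library; the filename and the "confidence" property key are assumptions about the export schema and may differ in practice.

import json

# Load the exported detections (filename is hypothetical).
with open("detections.geojson") as f:
    collection = json.load(f)

# Keep only features at or above a chosen confidence threshold.
# NOTE: the "confidence" property key is an assumption about the export schema.
threshold = 0.5
kept = [feature for feature in collection["features"]
        if feature.get("properties", {}).get("confidence", 0) >= threshold]

print(f"kept {len(kept)} of {len(collection['features'])} detections")
collection["features"] = kept

with open("detections_filtered.geojson", "w") as f:
    json.dump(collection, f)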
5. Interacting with the annotations
Currently, annotation work is confined to the workspace. The workspace is designed so that all analytical and annotation work for a thematic set happens in one place. To use the annotation workflows, it is vital that a user has the workspace permission set in their user role.
A dedicated annotation workflow is coming soon. This section explains how to work with annotations in the product today. An annotation within gIQ specifically refers to markups made for the purpose of building new AI models.
Creating a new annotation
In order to build a new annotation from scratch, a user must be inside a workspace with the relevant raster layers. The annotation feature set must be selected.
An annotation layer must now be created. To do this, a user should select the raster file, open the burger menu and select create annotation layer. This becomes a child layer of the raster. Enable the layer to enable the editing tools.
With the annotation tools and the layers enabled, enable the edit mode. When the annotation layer is selected, you will see that the right panel has changed to give an option for annotations. This panel shows all of the categories available within your organisation. You can also create new categories here.
Using the available tools, you can now draw polygons around the objects of interest. It is important to select the category from the list while you are drawing. You can also select a polygon and change its category by clicking on a different category. If you draw a polygon without a category selected first, the polygon will turn grey.
If a polygon has been created and the user needs to remove it, then simply select the relevant polygon and delete using the bin icon.
Creating an annotation layer from the predictions
It is possible to use the predictions that a model has made to create a new annotation layer. This allows for a Human-In-The-Loop approach to annotation creation.
To use this approach follow the steps below.
Visualise the relevant raster layer and go to the detections tab.
From here, open the burger menu. Ensure the right model results are selected, then copy them to create a new annotation layer.
This now creates a new annotation layer for a user to interact with. This approach helps to speed up the process of creating annotations in the product. Now the user will have access to all of the tools for annotation.
6. Annotation Projects - aka Auto Model Development
As a world-first capability, gIQ has the ability to create new computer vision models in a no-code environment. To use this feature, a user must have the Auto Model Development permissions enabled.
This powerful feature allows a user to build validated datasets for use in model training. It is used in conjunction with the annotations from the workspaces.
Once a user has finished the annotations on a raster file, they can then send the annotations to the annotation projects environment. They will select an "annotation project" to send the data to.
Inside the annotation project page the user will see a list of all of the annotation projects. Selecting an annotation project will open a panel which shows all of the experiments and all of the annotation datasets.
The annotation dataset panel is split into several sections. It shows dataset versioning, if that has already been applied. If there is no dataset version which has been verified there will be a button available to start a verification process. More on that below.
The second section is the list of all of the annotation sources. This is a list of all of the annotation layers which make up the dataset. It also provides a link to the relevant raster layer, but this will take you to the exploration page to simply view the annotations. It will not take you to the area to edit them.
The next panel shows the categories which make up the annotation dataset, followed by a panel which gives all of the thumbnails for all of the annotations within the dataset.
The experiments panel allows you to configure and run a new model training. It offers two different architectures at the moment, YOLOv5 and YOLOv8. As this feature matures, more architectures will be added.
Once the model has been trained it will be listed on this experiments page. It will show the status of the model run together with the accuracy score. It also allows you to download the model and review the ClearML page for the model run.
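A downloaded checkpoint can be smoke-tested locally before wider use. Below is a minimal sketch, assuming the download is an ultralytics-compatible .pt checkpoint and the ultralytics package is installed; both the filenames and the checkpoint format are assumptions.

from ultralytics import YOLO

# Load the downloaded checkpoint (filename is hypothetical).
model = YOLO("downloaded_experiment.pt")

# Run a quick prediction on a sample chip to confirm the model loads and infers.
results = model.predict("sample_chip.png", conf=0.45)
for result in results:
    print(result.boxes.cls, result.boxes.conf)  # class ids and confidence scores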
7. Marketplace
With the correct permissions set for the user role, it is possible to publish a model directly to the marketplace from the models page. It is recommended to put as much detail into the marketplace item as possible, to enable users of other organisations to find your model and understand its potential benefit to them.