Welcome to SimpleETL’s documentation!

SimpleETL - QGis plugin

Introduction

This is a QGIS plugin for data integration. It is based on the original [https://github.com/Mi-Kbs-gis/Small-ETL] which is now deprecated (since QGis 3.x) and seems unmaintained. QGis 3.x introduced a lot of changes in the plugin development environment, including Python 3 and Qt5, so the code baseline had to be reworked.

How To

  1. Install the plugin as any other QGis plugin.

  2. Add vector layers (source and target) in your QGis project instance. Be sure your target layer already has the expected fields.

  3. Start the plugin through the Plugins menu or the shortcut from the toolbar

  4. Select your source and target layers

  5. Define your transformation rules using comboboxes and expressions if needed.

  6. Start the ETL pipeline

  7. _Et voilà!_

Specific ETL parameters:

  • Use geometry: indicates if the pipeline should insert geometry to target layer or only attributes;

  • Update data: specifies if existing data in the target layer should be updated (overwritten) based on a specific, unique, user defined field. This option is useful if the integration process needs to be done on a regular basis. If this option is unchecked, the program will skip duplicates integration instead of updating existing features;

  • ID field: defines which field in the target layer should be considered for duplicate detection, or during the update process depending on the Update data checkbox.

  • All features vs. Selected features: simply apply the process on the whole source layer vs. selected features only.

Note: target layer may contain a new duplicates field that contains the list of possible duplicates. Field type is String.

Tell me how does it work

For every feature in the source target, the ETL pipeline does the following:

  • evaluate QGIS expression for each fields in the target layer. If no expression is provided, field is left as blank;

  • if use geometry is checked, the source feature geometry is added to the integrated feature in the target layer;

  • if update data is checked, target layer features are updated based on the ID field. For sake of simplicity, updating here means overwriting. Otherwise, the pipeline will check that no other feature in the target layer already have the value evaluated for the ID field;

  • finally, duplicate detection is applied on both attributes (upcoming) and geometry. Duplicate detection on geometry is based on Hausdorff distance computation. If a duplicate is detected, the ID (value in ID field) of the duplicate for the corresponding feature is added to a newly created duplicates field.

Development

Here are some tips for developers:

  • Plugin boilerplate code was generated using Plugin Builder 3

  • Use QtDesigner to update the UI if needed

  • pb_tool can be installed using pip in a virtual environment

  • pb_tool compile needs to be called for generating updated resources.py, if some changes are made, e.g. on the icon image file.

License

This work is licensed under the GPLv3.

How-to

Prerequisites: make sure the source and target shapefile layers are loaded into the QGIS project.

  • Open the plugin window. The main window is divided in 3 columns:
    • Layer selection and source layer information

    • target layer information, processing options, and transformations rules

    • actions: save/load configuration file, start and cancel process, logs display

  • Select the source and target layers from the upper left combobox. This will update tables with layer fields

  • Choose processing options :
    • if you want to process geometries or not (only attributes)

    • if you want to update records based on a unique field, or just integrate new data

    • if you want to process all features or selected features only from source layer

  • Define transformation rules:
    • for each field in the target layer, you can select a field from the source layer or use the field calculator for more complex rules. Once defined, you can use the Save button to save the rules in a JSON file.

    • or use the Load button on the upper right to load predefined configuration file (JSON format)

  • Start the ETL process

Some messages may appear in the logs window, for information about loading transformation rules, results of ETL process, errors, etc.

_images/main_window.png

Transformation rules definition file

JSON files for saving/loading transformation rules are structured as follows:

{
   "parameters": {
      "use_geometry": true,
      "id_field": 0,
      "update_data": false,
      "all_features": true
   },
   "rules": {
      "0": "concat(code, '_', $id)",
      "1": "id"
   }
}

Indices and tables