Snaplogic
Important Capabilities
Capability | Status | Notes |
---|---|---|
Column-level Lineage | ✅ | Enabled by default. |
Detect Deleted Entities | ❌ | Not supported yet. |
Platform Instance | ❌ | Snaplogic does not support platform instances. |
Table-Level Lineage | ✅ | Enabled by default. |
A source plugin for ingesting lineage and metadata from Snaplogic.
Integration Details
This integration extracts data lineage information from the public SnapLogic Lineage API and ingests it into DataHub. It enables visibility into how data flows through SnapLogic pipelines by capturing metadata directly from the source API. This allows users to track data transformations and dependencies across their data ecosystem, enhancing observability, governance, and impact analysis within DataHub.
Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:
Source Concept | DataHub Concept | Notes |
---|---|---|
Snap-pack | Data Platform | Snap-packs are mapped to Data Platforms, either directly (e.g., Snowflake) or dynamically based on connection details (e.g., JDBC URL). |
Table/Dataset | Dataset | Depends on the snap type: a table for SQL databases, a topic for Kafka, and so on. |
Snap | Data Job | |
Pipeline | Data Flow | |
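To make the mapping above concrete, the following sketch assembles URN strings for a pipeline, a snap, and a dataset using DataHub's standard URN layouts. The identifiers (`sales_pipeline`, `snowflake_insert`, `db.schema.orders`) and the use of `snaplogic` as the orchestrator name are hypothetical; the source may derive its ids differently.

```python
# Illustrative only: builds URN strings by hand to show how the concepts nest.
# All ids below are made up; the real source derives them from the Lineage API.


def data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "PROD") -> str:
    """Pipeline -> Data Flow."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def data_job_urn(flow_urn: str, job_id: str) -> str:
    """Snap -> Data Job, nested under the Data Flow of its pipeline."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Table or topic -> Dataset on the Data Platform inferred from the snap-pack."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


flow = data_flow_urn("snaplogic", "sales_pipeline")
job = data_job_urn(flow, "snowflake_insert")
table = dataset_urn("snowflake", "db.schema.orders")

print(flow)
print(job)
print(table)
```

The nesting of the Data Job URN inside the Data Flow URN is what ties each snap to its pipeline in DataHub.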
Metadata Ingestion Quickstart
Prerequisites
To ingest lineage from SnapLogic, you need valid SnapLogic credentials with access to the SnapLogic Lineage API.
Install the Plugin(s)
Run the following commands to install the relevant plugin(s):
pip install 'acryl-datahub[snaplogic]'
Configure the Ingestion Recipe(s)
Use the following recipe(s) to get started with ingestion.
pipeline_name: <action-pipeline-name>
source:
  type: snaplogic
  config:
    username: <snaplogic-username>
    password: <snaplogic-password>
    base_url: https://elastic.snaplogic.com
    org_name: <snaplogic-org-name>
    stateful_ingestion:
      enabled: True
      remove_stale_metadata: False
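If you prefer to keep credentials out of the recipe file, one option is to assemble the same structure in code from environment variables. The sketch below is a minimal illustration, not part of the plugin: the environment variable names (`SNAPLOGIC_USERNAME`, `SNAPLOGIC_PASSWORD`, `SNAPLOGIC_ORG`, `SNAPLOGIC_BASE_URL`) and the pipeline name are assumptions chosen for this example.

```python
import os
from typing import Mapping


def build_recipe(env: Mapping[str, str] = os.environ) -> dict:
    """Assemble the recipe as a dict, reading credentials from the environment.

    The variable names are hypothetical; pick whatever your deployment uses.
    A missing required variable raises KeyError instead of silently
    producing an incomplete recipe.
    """
    return {
        "pipeline_name": "snaplogic_ingestion",  # hypothetical name
        "source": {
            "type": "snaplogic",
            "config": {
                "username": env["SNAPLOGIC_USERNAME"],
                "password": env["SNAPLOGIC_PASSWORD"],
                # Fall back to the documented default URL.
                "base_url": env.get(
                    "SNAPLOGIC_BASE_URL", "https://elastic.snaplogic.com"
                ),
                "org_name": env["SNAPLOGIC_ORG"],
                "stateful_ingestion": {
                    "enabled": True,
                    "remove_stale_metadata": False,
                },
            },
        },
    }


# Example with an explicit mapping so the snippet runs anywhere.
example_env = {
    "SNAPLOGIC_USERNAME": "user@example.com",
    "SNAPLOGIC_PASSWORD": "secret",
    "SNAPLOGIC_ORG": "ExampleOrg",
}
recipe = build_recipe(example_env)
print(recipe["source"]["config"]["base_url"])
```

The resulting dict mirrors the YAML recipe field for field, so it can be serialized to YAML or handed to whatever tooling you use to run ingestion.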
View All Recipe Configuration Options
Field | Required | Default | Description |
---|---|---|---|
username | ✅ | | SnapLogic account login. |
password | ✅ | | SnapLogic account password. |
base_url | ❌ | https://elastic.snaplogic.com | SnapLogic URL. |
org_name | ✅ | | Organization name in the SnapLogic platform. |
namespace_mapping | ❌ | {} | Namespace mapping. Used to map namespaces to platform instances. |
case_insensitive_namespaces | ❌ | [] | List of case-insensitive namespaces. |
CLI based Ingestion
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
pipeline_name: "snaplogic_incremental_ingestion"
source:
  type: snaplogic
  config:
    username: example@snaplogic.com
    password: password
    base_url: https://elastic.snaplogic.com
    org_name: "ExampleOrg"
    namespace_mapping:
      snowflake://snaplogic: snaplogic
    case_insensitive_namespaces:
      - snowflake://snaplogic
    stateful_ingestion:
      enabled: True
      remove_stale_metadata: False
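The effect of `namespace_mapping` and `case_insensitive_namespaces` in the recipe above can be pictured with a small sketch. It rests on two assumptions that this page does not confirm: that the mapping is an exact-match lookup on the namespace, and that "case insensitive" means dataset names in that namespace are lowercased. The function name `resolve_dataset` is hypothetical and is not part of the plugin.

```python
from typing import Dict, List, Optional, Tuple


def resolve_dataset(
    namespace: str,
    name: str,
    namespace_mapping: Dict[str, str],
    case_insensitive_namespaces: List[str],
) -> Tuple[Optional[str], str]:
    """Return (platform_instance, dataset_name) for one lineage endpoint.

    Assumptions (not confirmed by the docs): the mapping is an exact-match
    lookup, and case-insensitive namespaces get lowercased dataset names.
    """
    platform_instance = namespace_mapping.get(namespace)  # None if unmapped
    if namespace in case_insensitive_namespaces:
        name = name.lower()
    return platform_instance, name


# Values taken from the starter recipe; the dataset name is made up.
mapping = {"snowflake://snaplogic": "snaplogic"}
insensitive = ["snowflake://snaplogic"]

print(resolve_dataset("snowflake://snaplogic", "DB.SCHEMA.ORDERS", mapping, insensitive))
print(resolve_dataset("kafka://broker", "Orders", mapping, insensitive))
```

Under these assumptions, the same table referenced as `ORDERS` in one snap and `orders` in another would resolve to a single dataset rather than two.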
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
Field | Description |
---|---|
org_name ✅ string | Organization name from Snaplogic instance |
password ✅ string | Password |
username ✅ string | Username |
base_url string | URL of your SnapLogic instance, for example https://elastic.snaplogic.com. Used for making API calls to SnapLogic. Default: https://elastic.snaplogic.com |
bucket_duration Enum | Size of the time window to aggregate usage stats. Default: DAY |
enable_stateful_lineage_ingestion boolean | Enable stateful lineage ingestion. Stores lineage window timestamps after a successful lineage ingestion and skips lineage ingestion for the same timestamps in subsequent runs. Default: True |
enable_stateful_usage_ingestion boolean | Enable stateful usage ingestion. Stores usage window timestamps after a successful usage ingestion and skips usage ingestion for the same timestamps in subsequent runs. Default: True |
end_time string(date-time) | Latest date of lineage/usage to consider. Default: Current time in UTC |
namespace_mapping object | Mapping of namespaces to platform instances. Default: {} |
platform string | Default: Snaplogic |
start_time string(date-time) | Earliest date of lineage/usage to consider. Default: Last full day in UTC (or hour, depending on bucket_duration). You can also specify a time relative to end_time, such as '-7 days' or '-7d'. |
case_insensitive_namespaces array | List of namespaces that should be treated as case insensitive. Default: [] |
case_insensitive_namespaces.object object | |
stateful_ingestion StatefulStaleMetadataRemovalConfig | Base specialized config for Stateful Ingestion with stale metadata removal capability. |
stateful_ingestion.enabled boolean | Whether or not to enable stateful ingestion. Schema default: False. Effectively True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified. |
stateful_ingestion.fail_safe_threshold number | Guards against accidental changes to the source configuration: if the relative change (in percent) in entities compared to the previous state is above fail_safe_threshold, the soft deletes are skipped and the state is not committed. Default: 75.0 |
stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
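The `fail_safe_threshold` check described in the table above can be illustrated with a short sketch. This is one plausible reading of "relative change percent in entities compared to the previous state" (the share of previously seen entities that are missing from the current run); it is an assumption, not the plugin's actual code, and the function names are hypothetical.

```python
from typing import Iterable


def percent_entities_removed(previous: Iterable[str], current: Iterable[str]) -> float:
    """Share (in percent) of previously seen entities absent from this run."""
    prev, cur = set(previous), set(current)
    if not prev:
        return 0.0  # nothing to compare against on the first run
    return 100.0 * len(prev - cur) / len(prev)


def should_commit_state(
    previous: Iterable[str],
    current: Iterable[str],
    fail_safe_threshold: float = 75.0,  # documented default
) -> bool:
    """Commit (and soft-delete stale entities) only if the change is not above the threshold."""
    return percent_entities_removed(previous, current) <= fail_safe_threshold


previous_run = ["urn:a", "urn:b", "urn:c", "urn:d"]

# One entity disappeared: 25% change, well under the threshold.
print(should_commit_state(previous_run, ["urn:a", "urn:b", "urn:c"]))

# Everything disappeared, e.g. after a wrong org_name: 100% change, blocked.
print(should_commit_state(previous_run, []))
```

The second case is the scenario the threshold exists for: a misconfigured run that sees no entities should not wipe out everything the previous run ingested.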
The JSONSchema for this configuration is inlined below.
{
  "title": "SnaplogicConfig",
  "description": "Base configuration class for stateful ingestion for source configs to inherit from.",
  "type": "object",
  "properties": {
    "bucket_duration": {
      "description": "Size of the time window to aggregate usage stats.",
      "default": "DAY",
      "allOf": [
        {
          "$ref": "#/definitions/BucketDuration"
        }
      ]
    },
    "end_time": {
      "title": "End Time",
      "description": "Latest date of lineage/usage to consider. Default: Current time in UTC",
      "type": "string",
      "format": "date-time"
    },
    "start_time": {
      "title": "Start Time",
      "description": "Earliest date of lineage/usage to consider. Default: Last full day in UTC (or hour, depending on `bucket_duration`). You can also specify relative time with respect to end_time such as '-7 days' Or '-7d'.",
      "type": "string",
      "format": "date-time"
    },
    "enable_stateful_usage_ingestion": {
      "title": "Enable Stateful Usage Ingestion",
      "description": "Enable stateful lineage ingestion. This will store usage window timestamps after successful usage ingestion. and will not run usage ingestion for same timestamps in subsequent run. ",
      "default": true,
      "type": "boolean"
    },
    "enable_stateful_lineage_ingestion": {
      "title": "Enable Stateful Lineage Ingestion",
      "description": "Enable stateful lineage ingestion. This will store lineage window timestamps after successful lineage ingestion. and will not run lineage ingestion for same timestamps in subsequent run. ",
      "default": true,
      "type": "boolean"
    },
    "stateful_ingestion": {
      "$ref": "#/definitions/StatefulStaleMetadataRemovalConfig"
    },
    "platform": {
      "title": "Platform",
      "default": "Snaplogic",
      "type": "string"
    },
    "username": {
      "title": "Username",
      "description": "Username",
      "type": "string"
    },
    "password": {
      "title": "Password",
      "description": "Password",
      "type": "string"
    },
    "base_url": {
      "title": "Base Url",
      "description": "Url to your Snaplogic instance: `https://elastic.snaplogic.com`, or similar. Used for making API calls to Snaplogic.",
      "default": "https://elastic.snaplogic.com",
      "type": "string"
    },
    "org_name": {
      "title": "Org Name",
      "description": "Organization name from Snaplogic instance",
      "type": "string"
    },
    "namespace_mapping": {
      "title": "Namespace Mapping",
      "description": "Mapping of namespaces to platform instances",
      "default": {},
      "type": "object"
    },
    "case_insensitive_namespaces": {
      "title": "Case Insensitive Namespaces",
      "description": "List of namespaces that should be treated as case insensitive",
      "default": [],
      "type": "array",
      "items": {}
    }
  },
  "required": [
    "username",
    "password",
    "org_name"
  ],
  "additionalProperties": false,
  "definitions": {
    "BucketDuration": {
      "title": "BucketDuration",
      "description": "An enumeration.",
      "enum": [
        "DAY",
        "HOUR"
      ],
      "type": "string"
    },
    "DynamicTypedStateProviderConfig": {
      "title": "DynamicTypedStateProviderConfig",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "description": "The type of the state provider to use. For DataHub use `datahub`",
          "type": "string"
        },
        "config": {
          "title": "Config",
          "description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
          "default": {},
          "type": "object"
        }
      },
      "required": [
        "type"
      ],
      "additionalProperties": false
    },
    "StatefulStaleMetadataRemovalConfig": {
      "title": "StatefulStaleMetadataRemovalConfig",
      "description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
      "type": "object",
      "properties": {
        "enabled": {
          "title": "Enabled",
          "description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
          "default": false,
          "type": "boolean"
        },
        "remove_stale_metadata": {
          "title": "Remove Stale Metadata",
          "description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
          "default": true,
          "type": "boolean"
        },
        "fail_safe_threshold": {
          "title": "Fail Safe Threshold",
          "description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
          "default": 75.0,
          "minimum": 0.0,
          "maximum": 100.0,
          "type": "number"
        }
      },
      "additionalProperties": false
    }
  }
}
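As a quick pre-flight check before running ingestion, the top-level rules in the schema above (three required fields, no unknown keys, a two-value enum for `bucket_duration`, and the listed defaults) can be checked by hand. The sketch below is a simplified stand-in using only the standard library; it is not the validation DataHub performs, and `check_config` is a hypothetical helper name.

```python
# Top-level rules copied from the schema above; nested objects are not checked.
REQUIRED = ("username", "password", "org_name")
DEFAULTS = {
    "bucket_duration": "DAY",
    "enable_stateful_usage_ingestion": True,
    "enable_stateful_lineage_ingestion": True,
    "platform": "Snaplogic",
    "base_url": "https://elastic.snaplogic.com",
    "namespace_mapping": {},
    "case_insensitive_namespaces": [],
}
ALLOWED = set(REQUIRED) | set(DEFAULTS) | {"start_time", "end_time", "stateful_ingestion"}


def check_config(config: dict) -> dict:
    """Return the config with schema defaults filled in, or raise ValueError."""
    missing = [key for key in REQUIRED if key not in config]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = sorted(set(config) - ALLOWED)  # "additionalProperties": false
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    if config.get("bucket_duration", "DAY") not in ("DAY", "HOUR"):
        raise ValueError("bucket_duration must be DAY or HOUR")
    return {**DEFAULTS, **config}


resolved = check_config(
    {"username": "example@snaplogic.com", "password": "password", "org_name": "ExampleOrg"}
)
print(resolved["base_url"])
```

A typo such as `orgname` fails twice under this check: once as a missing required field and, once that is fixed, as an unknown key, which matches the strictness implied by `"additionalProperties": false`.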
Code Coordinates
- Class Name: datahub.ingestion.source.snaplogic.snaplogic.SnaplogicSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Snaplogic, feel free to ping us on our Slack.