Snaplogic

Testing

Important Capabilities

| Capability | Status | Notes |
| --- | --- | --- |
| Column-level Lineage | | Enabled by default. |
| Detect Deleted Entities | | Not supported yet. |
| Platform Instance | | SnapLogic does not support platform instances. |
| Table-Level Lineage | | Enabled by default. |

A source plugin for ingesting lineage and metadata from SnapLogic.

Integration Details

This integration extracts data lineage information from the public SnapLogic Lineage API and ingests it into DataHub. It enables visibility into how data flows through SnapLogic pipelines by capturing metadata directly from the source API. This allows users to track data transformations and dependencies across their data ecosystem, enhancing observability, governance, and impact analysis within DataHub.

Concept Mapping

This ingestion source maps the following Source System Concepts to DataHub Concepts:

| Source Concept | DataHub Concept | Notes |
| --- | --- | --- |
| Snap-pack | Data Platform | Snap-packs are mapped to Data Platforms, either directly (e.g., Snowflake) or dynamically based on connection details (e.g., JDBC URL). |
| Table/Dataset | Dataset | Depends on the Snap type: a table for SQL databases, a topic for Kafka, and so on. |
| Snap | Data Job | |
| Pipeline | Data Flow | |
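
To make the mapping above more concrete, the following Python sketch builds URN strings in DataHub's general URN layout for each concept. It is only an illustration: the pipeline, Snap, and table names are hypothetical, and using `snaplogic` as the orchestrator ID is an assumption rather than something confirmed by the plugin.

```python
# Illustrative sketch only. All entity names below are hypothetical, and the
# orchestrator ID "snaplogic" is an assumption, not taken from the plugin.

def platform_urn(platform: str) -> str:
    """Snap-pack -> Data Platform."""
    return f"urn:li:dataPlatform:{platform}"


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Table/Dataset (table, topic, ...) -> Dataset."""
    return f"urn:li:dataset:({platform_urn(platform)},{name},{env})"


def data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "PROD") -> str:
    """Pipeline -> Data Flow."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def data_job_urn(flow_urn: str, job_id: str) -> str:
    """Snap -> Data Job, nested under its pipeline's Data Flow."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"


flow = data_flow_urn("snaplogic", "orders_pipeline")
job = data_job_urn(flow, "snowflake_insert")
table = dataset_urn("snowflake", "sales.public.orders")

print(flow)
print(job)
print(table)
```

The nesting of the Data Job URN inside the Data Flow URN mirrors the Snap-inside-Pipeline relationship from the table.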

Metadata Ingestion Quickstart

Prerequisites

To ingest lineage from SnapLogic, you need valid SnapLogic credentials with access to the SnapLogic Lineage API.

Install the Plugin(s)

Run the following commands to install the relevant plugin(s):

pip install 'acryl-datahub[snaplogic]'

Configure the Ingestion Recipe(s)

Use the following recipe(s) to get started with ingestion.

pipeline_name: <action-pipeline-name>
source:
  type: snaplogic
  config:
    username: <snaplogic-username>
    password: <snaplogic-password>
    base_url: https://elastic.snaplogic.com
    org_name: <snaplogic-org-name>
    stateful_ingestion:
      enabled: True
      remove_stale_metadata: False

View All Recipe Configuration Options

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| username | | | SnapLogic account login. |
| password | | | SnapLogic account password. |
| base_url | | https://elastic.snaplogic.com | SnapLogic URL. |
| org_name | | | Organization name in the SnapLogic platform. |
| namespace_mapping | | | Namespace mapping. Used to map namespaces to platform instances. |
| case_insensitive_namespaces | | | List of case-insensitive namespaces. |

Troubleshooting

[Common Issue]

CLI based Ingestion

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

pipeline_name: "snaplogic_incremental_ingestion"
source:
  type: snaplogic
  config:
    username: example@snaplogic.com
    password: password
    base_url: https://elastic.snaplogic.com
    org_name: "ExampleOrg"
    namespace_mapping:
      snowflake://snaplogic: snaplogic
    case_insensitive_namespaces:
      - snowflake://snaplogic
    stateful_ingestion:
      enabled: True
      remove_stale_metadata: False
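
The `namespace_mapping` and `case_insensitive_namespaces` options in this recipe can be pictured with a small Python sketch. It is an assumption about the intended behavior, not the plugin's actual code: `resolve_platform_instance` and `normalize_name` are hypothetical helpers, and lowercasing dataset names is just one plausible reading of "case insensitive".

```python
from typing import Dict, List, Optional

# Values copied from the starter recipe.
namespace_mapping: Dict[str, str] = {"snowflake://snaplogic": "snaplogic"}
case_insensitive_namespaces: List[str] = ["snowflake://snaplogic"]


def resolve_platform_instance(namespace: str) -> Optional[str]:
    """Hypothetical helper: return the mapped instance, or None if unmapped."""
    return namespace_mapping.get(namespace)


def normalize_name(namespace: str, name: str) -> str:
    """Hypothetical helper: lowercase names in case-insensitive namespaces.

    Assumption: case-insensitive handling means folding names to lowercase.
    """
    if namespace in case_insensitive_namespaces:
        return name.lower()
    return name


print(resolve_platform_instance("snowflake://snaplogic"))
print(normalize_name("snowflake://snaplogic", "SALES.Orders"))
```

Under this reading, `SALES.Orders` and `sales.orders` in the `snowflake://snaplogic` namespace would resolve to the same dataset, while names in unlisted namespaces stay untouched.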

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
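
As a rough illustration of the dotted notation, the Python sketch below resolves a dotted field name against a nested dictionary shaped like the `config` block of a recipe. The `get_nested` helper is hypothetical and exists only to show how a name such as `stateful_ingestion.enabled` corresponds to nesting in YAML.

```python
from typing import Any, Dict

# A dictionary shaped like the `config` section of the starter recipe.
config: Dict[str, Any] = {
    "org_name": "ExampleOrg",
    "stateful_ingestion": {
        "enabled": True,
        "remove_stale_metadata": False,
    },
}


def get_nested(data: Dict[str, Any], dotted_path: str) -> Any:
    """Hypothetical helper: walk one dictionary level per dot-separated key."""
    current: Any = data
    for key in dotted_path.split("."):
        current = current[key]
    return current


print(get_nested(config, "stateful_ingestion.enabled"))
print(get_nested(config, "org_name"))
```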

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| org_name | string | Organization name in the SnapLogic instance. | |
| password | string | Password. | |
| username | string | Username. | |
| base_url | string | URL of your SnapLogic instance, for example https://elastic.snaplogic.com. Used for making API calls to SnapLogic. | |
| bucket_duration | Enum | Size of the time window to aggregate usage stats. | DAY |
| enable_stateful_lineage_ingestion | boolean | Enable stateful lineage ingestion. Stores lineage window timestamps after a successful lineage ingestion and skips lineage ingestion for the same timestamps in subsequent runs. | True |
| enable_stateful_usage_ingestion | boolean | Enable stateful usage ingestion. Stores usage window timestamps after a successful usage ingestion and skips usage ingestion for the same timestamps in subsequent runs. | True |
| end_time | string(date-time) | Latest date of lineage/usage to consider. | Current time in UTC |
| namespace_mapping | object | Mapping of namespaces to platform instances. | {} |
| platform | string | | Snaplogic |
| start_time | string(date-time) | Earliest date of lineage/usage to consider. You can also specify a time relative to end_time, such as '-7 days' or '-7d'. | Last full day in UTC (or hour, depending on bucket_duration) |
| case_insensitive_namespaces | array | List of namespaces that should be treated as case insensitive. | [] |
| case_insensitive_namespaces.object | object | | |
| stateful_ingestion | StatefulStaleMetadataRemovalConfig | Base specialized config for stateful ingestion with stale metadata removal capability. | |
| stateful_ingestion.enabled | boolean | Whether to enable stateful ingestion. | False (True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified) |
| stateful_ingestion.fail_safe_threshold | number | Guards against accidental changes to the source configuration: if the relative change percent in entities compared to the previous state is above fail_safe_threshold, the large number of soft deletes is prevented and the state is not committed. | 75.0 |
| stateful_ingestion.remove_stale_metadata | boolean | Soft-deletes the entities that were present in the last successful run but are missing in the current run, when stateful_ingestion is enabled. | True |
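
The interaction between `remove_stale_metadata` and `fail_safe_threshold` can be sketched in Python as follows. This is one plausible reading of "relative change percent", not DataHub's actual implementation: the function names are hypothetical, and the sketch assumes the change is measured as the share of previously seen entities that are missing from the current run.

```python
from typing import Set


def percent_entities_changed(previous: Set[str], current: Set[str]) -> float:
    """Percent of previously seen entities missing from the current run.

    Assumption: this is how the relative change is measured.
    """
    if not previous:
        return 0.0
    missing = previous - current
    return 100.0 * len(missing) / len(previous)


def should_commit_state(
    previous: Set[str], current: Set[str], fail_safe_threshold: float = 75.0
) -> bool:
    """Allow soft deletes and state commit only at or below the threshold."""
    return percent_entities_changed(previous, current) <= fail_safe_threshold


# Hypothetical entity IDs: 10 seen in the last run, only 2 seen now.
previous_run = {f"entity-{i}" for i in range(10)}
current_run = {f"entity-{i}" for i in range(2)}

print(percent_entities_changed(previous_run, current_run))
print(should_commit_state(previous_run, current_run))
```

In this example 8 of 10 entities disappear, which exceeds the default threshold of 75.0, so under this reading the run would skip the soft deletes and leave the previous state in place.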

Code Coordinates

  • Class Name: datahub.ingestion.source.snaplogic.snaplogic.SnaplogicSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Snaplogic, feel free to ping us on our Slack.