NiFi
Important Capabilities
| Capability | Status | Notes | 
|---|---|---|
| Table-Level Lineage | ✅ | Supported. See docs for limitations | 
This plugin extracts the following:
- NiFi flow as DataFlow entity
- Ingress and egress processors, remote input and output ports as DataJob entities
- Input and output ports receiving remote connections as Dataset entities
- Lineage information between external datasets and ingress/egress processors by analyzing provenance events
Current limitations:
- Only a limited set of ingress/egress processors is supported:
  - S3: ListS3, FetchS3Object, PutS3Object
  - SFTP: ListSFTP, FetchSFTP, GetSFTP, PutSFTP
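As a rough illustration of this limitation, the sketch below classifies a NiFi processor type by its short class name. The ingress/egress split is inferred from the processor names (List/Fetch/Get read data, Put writes it), and the helper `classify_processor` is hypothetical; it is not the connector's internal lookup.

```python
from typing import Dict, Optional, Tuple

# Supported processors from the list above, keyed by short class name.
# The (platform, direction) values are an assumption based on processor names.
SUPPORTED_PROCESSORS: Dict[str, Tuple[str, str]] = {
    "ListS3": ("s3", "ingress"),
    "FetchS3Object": ("s3", "ingress"),
    "PutS3Object": ("s3", "egress"),
    "ListSFTP": ("sftp", "ingress"),
    "FetchSFTP": ("sftp", "ingress"),
    "GetSFTP": ("sftp", "ingress"),
    "PutSFTP": ("sftp", "egress"),
}


def classify_processor(processor_type: str) -> Optional[Tuple[str, str]]:
    """Return (platform, direction) for a supported processor, else None.

    NiFi reports fully qualified types such as
    'org.apache.nifi.processors.aws.s3.ListS3', so only the last
    dotted segment is compared.
    """
    short_name = processor_type.rsplit(".", 1)[-1]
    return SUPPORTED_PROCESSORS.get(short_name)
```

Any processor that returns `None` here would not contribute external-dataset lineage under the limitation described above.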
CLI-based Ingestion
Install the Plugin
```shell
pip install 'acryl-datahub[nifi]'
```
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yaml
source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"
    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password
sink:
  # sink configs
```
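The same recipe can also be run from Python. This sketch assumes DataHub's programmatic `Pipeline` API (`datahub.ingestion.run.pipeline.Pipeline`) and a `datahub-rest` sink at `http://localhost:8080`; the sink settings and the helper names `build_recipe` and `run_recipe` are illustrative assumptions, not part of this connector.

```python
from typing import Any, Dict


def build_recipe(site_url: str, username: str, password: str) -> Dict[str, Any]:
    """Build the starter recipe as a dict, mirroring the YAML above."""
    return {
        "source": {
            "type": "nifi",
            "config": {
                "site_url": site_url,  # must end with /nifi/
                "auth": "SINGLE_USER",
                "username": username,
                "password": password,
            },
        },
        # Assumed sink: a DataHub instance reachable over REST.
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe; requires the acryl-datahub[nifi] package."""
    # Imported lazily so build_recipe works without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()  # raise if the run reported failures
```

Usage: `run_recipe(build_recipe("https://localhost:8443/nifi/", "admin", "password"))`. Keeping recipe construction separate from execution makes it easy to inspect or log the config before any network calls are made.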
Config Details
Note that a dot (.) is used to denote nested fields in the YAML recipe.
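To make the dot notation concrete, the hypothetical helper `nest` below expands dotted field names from the table into the nested mapping that the YAML recipe expresses. It only illustrates the naming convention; the connector does not require this conversion.

```python
from typing import Any, Dict


def nest(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {'a.b': v} into {'a': {'b': v}}, matching YAML nesting."""
    nested: Dict[str, Any] = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for key in parents:
            node = node.setdefault(key, {})  # create intermediate mappings
        node[leaf] = value
    return nested
```

For example, the table rows `process_group_pattern.deny` and `process_group_pattern.ignoreCase` both end up under a single `process_group_pattern:` block in the recipe.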
| Field | Description |
|---|---|
| site_url ✅ string | URL for NiFi, ending with /nifi/, e.g. https://mynifi.domain/nifi/ |
| auth Enum | NiFi authentication; must be one of: NO_AUTH, SINGLE_USER, CLIENT_CERT, KERBEROS, BASIC_AUTH. Default: NO_AUTH |
| ca_file One of boolean, string | Path to PEM file containing certs for the root CA(s) for NiFi |
| client_cert_file string | Path to PEM file containing the public certificates for the user/client identity; must be set for auth = "CLIENT_CERT" |
| client_key_file string | Path to PEM file containing the client’s secret key |
| client_key_password string | The password to decrypt the client_key_file |
| password string | NiFi password; must be set for auth = "SINGLE_USER" or "BASIC_AUTH" |
| provenance_days integer | Time window, in days, for analyzing provenance events for external datasets. Default: 7 |
| site_name string | Site name to identify this site with; useful when using input and output ports receiving remote connections. Default: default |
| site_url_to_site_name map(str,string) | Lookup to find site_name for a site_url ending with /nifi/; required if using remote process groups in the NiFi flow. Default: {} |
| username string | NiFi username; must be set for auth = "SINGLE_USER" or "BASIC_AUTH" |
| env string | The environment that all assets produced by this connector belong to. Default: PROD |
| process_group_pattern AllowDenyPattern | Regex patterns for filtering process groups. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| process_group_pattern.allow array(string) | List of regex patterns to include in ingestion. Default: ['.*'] |
| process_group_pattern.deny array(string) | List of regex patterns to exclude from ingestion. Default: [] |
| process_group_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
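The sketch below shows how an allow/deny pattern such as process_group_pattern could decide whether a process group name is ingested. It assumes deny patterns take precedence and that patterns are applied with `re.match` (anchored at the start of the name, not a full match); `is_allowed` is a hypothetical stand-in, so check DataHub's own AllowDenyPattern class for the exact semantics.

```python
import re
from typing import Sequence


def is_allowed(
    name: str,
    allow: Sequence[str] = (".*",),
    deny: Sequence[str] = (),
    ignore_case: bool = True,
) -> bool:
    """Approximate allow/deny filtering of a process group name."""
    flags = re.IGNORECASE if ignore_case else 0
    # Assumption: any matching deny pattern excludes the name outright.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    # Otherwise the name must match at least one allow pattern.
    return any(re.match(pattern, name, flags) for pattern in allow)
```

With the defaults (allow `.*`, no deny), every process group is ingested; adding `deny: ["test.*"]` would skip groups whose names start with "test" in any letter case.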
The JSONSchema for this configuration is inlined below.
{
  "title": "NifiSourceConfig",
  "description": "Any source that produces dataset urns in a single environment should inherit this class",
  "type": "object",
  "properties": {
    "env": {
      "title": "Env",
      "description": "The environment that all assets produced by this connector belong to",
      "default": "PROD",
      "type": "string"
    },
    "site_url": {
      "title": "Site Url",
      "description": "URL for Nifi, ending with /nifi/. e.g. https://mynifi.domain/nifi/",
      "type": "string"
    },
    "auth": {
      "description": "Nifi authentication. must be one of : NO_AUTH, SINGLE_USER, CLIENT_CERT, KERBEROS",
      "default": "NO_AUTH",
      "allOf": [
        {
          "$ref": "#/definitions/NifiAuthType"
        }
      ]
    },
    "provenance_days": {
      "title": "Provenance Days",
      "description": "time window to analyze provenance events for external datasets",
      "default": 7,
      "type": "integer"
    },
    "process_group_pattern": {
      "title": "Process Group Pattern",
      "description": "regex patterns for filtering process groups",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "allOf": [
        {
          "$ref": "#/definitions/AllowDenyPattern"
        }
      ]
    },
    "site_name": {
      "title": "Site Name",
      "description": "Site name to identify this site with, useful when using input and output ports receiving remote connections",
      "default": "default",
      "type": "string"
    },
    "site_url_to_site_name": {
      "title": "Site Url To Site Name",
      "description": "Lookup to find site_name for site_url ending with /nifi/, required if using remote process groups in nifi flow",
      "default": {},
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    },
    "username": {
      "title": "Username",
      "description": "Nifi username, must be set for auth = \"SINGLE_USER\"",
      "type": "string"
    },
    "password": {
      "title": "Password",
      "description": "Nifi password, must be set for auth = \"SINGLE_USER\"",
      "type": "string"
    },
    "client_cert_file": {
      "title": "Client Cert File",
      "description": "Path to PEM file containing the public certificates for the user/client identity, must be set for auth = \"CLIENT_CERT\"",
      "type": "string"
    },
    "client_key_file": {
      "title": "Client Key File",
      "description": "Path to PEM file containing the client\u2019s secret key",
      "type": "string"
    },
    "client_key_password": {
      "title": "Client Key Password",
      "description": "The password to decrypt the client_key_file",
      "type": "string"
    },
    "ca_file": {
      "title": "Ca File",
      "description": "Path to PEM file containing certs for the root CA(s) for the NiFi",
      "anyOf": [
        {
          "type": "boolean"
        },
        {
          "type": "string"
        }
      ]
    }
  },
  "required": [
    "site_url"
  ],
  "additionalProperties": false,
  "definitions": {
    "NifiAuthType": {
      "title": "NifiAuthType",
      "description": "An enumeration.",
      "enum": [
        "NO_AUTH",
        "SINGLE_USER",
        "CLIENT_CERT",
        "KERBEROS",
        "BASIC_AUTH"
      ]
    },
    "AllowDenyPattern": {
      "title": "AllowDenyPattern",
      "description": "A class to store allow deny regexes",
      "type": "object",
      "properties": {
        "allow": {
          "title": "Allow",
          "description": "List of regex patterns to include in ingestion",
          "default": [
            ".*"
          ],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "deny": {
          "title": "Deny",
          "description": "List of regex patterns to exclude from ingestion.",
          "default": [],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "ignoreCase": {
          "title": "Ignorecase",
          "description": "Whether to ignore case sensitivity during pattern matching.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    }
  }
}
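A few of the schema's constraints are easy to trip over in a hand-written recipe: site_url is required, unknown keys are rejected ("additionalProperties": false), and auth must be one of the NifiAuthType values. The hypothetical `check_nifi_config` below checks only those points, plus the "/nifi/" suffix that the site_url description asks for; it does not check types and is not a replacement for full JSON Schema validation (for example with the jsonschema package).

```python
from typing import Any, Dict, List

# Top-level properties listed in the schema above.
ALLOWED_KEYS = frozenset({
    "env", "site_url", "auth", "provenance_days", "process_group_pattern",
    "site_name", "site_url_to_site_name", "username", "password",
    "client_cert_file", "client_key_file", "client_key_password", "ca_file",
})
# Values of the NifiAuthType enum.
AUTH_TYPES = frozenset({"NO_AUTH", "SINGLE_USER", "CLIENT_CERT", "KERBEROS", "BASIC_AUTH"})


def check_nifi_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    if "site_url" not in config:
        problems.append("missing required field: site_url")
    elif not str(config["site_url"]).endswith("/nifi/"):
        problems.append("site_url should end with /nifi/")
    for key in sorted(set(config) - ALLOWED_KEYS):
        problems.append(f"unknown field: {key}")
    auth = config.get("auth", "NO_AUTH")  # schema default
    if auth not in AUTH_TYPES:
        problems.append(f"invalid auth: {auth}")
    return problems
```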
Authentication
This connector supports the following authentication mechanisms:
Single User Authentication (auth: SINGLE_USER)
The connector passes the username and password (the same credentials used on the NiFi login page) to the /access/token REST endpoint. This mode also works when a Kerberos login identity provider is set up for NiFi.
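For illustration, the sketch below builds the token request with the Python standard library. It assumes NiFi's REST API is served at /nifi-api/ next to the /nifi/ UI URL, that the endpoint accepts a form-encoded username and password, and that the response body is a token later sent as an `Authorization: Bearer` header; the function names are hypothetical and this is not the connector's code.

```python
import ssl
import urllib.parse
import urllib.request
from typing import Optional


def nifi_api_base(site_url: str) -> str:
    """Map the UI URL (.../nifi/) to the assumed REST API base (.../nifi-api/)."""
    if not site_url.endswith("/nifi/"):
        raise ValueError("site_url must end with /nifi/")
    return site_url[: -len("nifi/")] + "nifi-api/"


def build_token_request(site_url: str, username: str, password: str) -> urllib.request.Request:
    """Form-encoded POST of the login credentials to /access/token."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode("ascii")
    return urllib.request.Request(
        nifi_api_base(site_url) + "access/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(
    site_url: str, username: str, password: str, context: Optional[ssl.SSLContext] = None
) -> str:
    """Send the request and return the token text from the response body."""
    request = build_token_request(site_url, username, password)
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")
```

Pass an `ssl.SSLContext` built from your ca_file when NiFi uses a self-signed certificate, as in the https://localhost:8443 starter recipe.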
Client Certificates Authentication (auth: CLIENT_CERT)
The connector uses client_cert_file (required), client_key_file (optional), and client_key_password (optional) for mutual TLS authentication.
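The sketch below shows how these three settings map onto a client-side TLS context in Python's standard `ssl` module. It is an illustration, not the connector's implementation; `build_mtls_context` is hypothetical, and it only handles ca_file as a path (the schema also allows a boolean).

```python
import ssl
from typing import Optional


def build_mtls_context(
    client_cert_file: Optional[str],
    client_key_file: Optional[str] = None,
    client_key_password: Optional[str] = None,
    ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Create a TLS context that presents a client certificate to NiFi."""
    if not client_cert_file:
        raise ValueError('client_cert_file must be set for auth = "CLIENT_CERT"')
    # Verify the NiFi server against ca_file, or the system CAs if None.
    context = ssl.create_default_context(cafile=ca_file)
    # If client_key_file is None, the key is expected inside client_cert_file.
    context.load_cert_chain(
        certfile=client_cert_file,
        keyfile=client_key_file,
        password=client_key_password,
    )
    return context
```

A missing or unreadable PEM file raises an `OSError` from `load_cert_chain`, which surfaces configuration mistakes before any request is sent.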
Kerberos Authentication via SPNEGO (auth: KERBEROS)
If NiFi has been configured to use Kerberos SPNEGO, the connector passes the user's Kerberos ticket to NiFi over the /access/kerberos REST endpoint. The user's Kerberos ticket is assumed to be already present on the machine on which ingestion runs. This is usually done by installing krb5-user and then running kinit for the user:
```shell
sudo apt install krb5-user
kinit user@REALM
```
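Before starting ingestion, it can help to confirm that a ticket really is present. This sketch assumes the MIT Kerberos `klist` installed by krb5-user, whose `-s` flag prints nothing and exits with status 0 only when a valid credentials cache is found; `has_kerberos_ticket` is a hypothetical helper.

```python
import shutil
import subprocess


def has_kerberos_ticket() -> bool:
    """True if `klist -s` reports a usable credentials cache."""
    klist = shutil.which("klist")
    if klist is None:
        return False  # krb5-user (or equivalent) is not installed
    result = subprocess.run([klist, "-s"], capture_output=True, check=False)
    return result.returncode == 0
```

If this returns False, run kinit again before starting ingestion.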
Basic Authentication (auth: BASIC_AUTH)
The connector uses HTTPBasicAuth with the configured username and password.
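HTTP Basic authentication sends the credentials base64-encoded in an `Authorization` header on every request. The hypothetical helper below builds that header by hand; for ASCII credentials it matches what requests' `HTTPBasicAuth` sends.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build the header for HTTP Basic authentication (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Because base64 is an encoding, not encryption, use this mode only over HTTPS.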
No Authentication (auth: NO_AUTH)
This is useful for testing purposes.
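The per-mode requirements above can be summarized as a lookup table. The hypothetical `missing_auth_fields` helper below reports which recipe fields a given auth mode still needs, based only on the rules stated in this section and the config table; KERBEROS and NO_AUTH need no credentials in the recipe.

```python
from typing import Any, Dict, List, Tuple

# Fields each auth mode needs in the recipe, per the descriptions above.
REQUIRED_BY_AUTH: Dict[str, Tuple[str, ...]] = {
    "NO_AUTH": (),
    "SINGLE_USER": ("username", "password"),
    "BASIC_AUTH": ("username", "password"),
    "CLIENT_CERT": ("client_cert_file",),
    "KERBEROS": (),  # the ticket comes from kinit, not the recipe
}


def missing_auth_fields(config: Dict[str, Any]) -> List[str]:
    """List the required fields that are absent or empty for config['auth']."""
    auth = config.get("auth", "NO_AUTH")
    if auth not in REQUIRED_BY_AUTH:
        raise ValueError(f"unknown auth mode: {auth}")
    return [field for field in REQUIRED_BY_AUTH[auth] if not config.get(field)]
```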
Access Policies
This connector requires the following access policies to be set in NiFi for the ingestion user.
Global Access Policies
| Policy | Privilege | Resource | Action | 
|---|---|---|---|
| view the UI | Allows users to view the UI | /flow | R | 
| query provenance | Allows users to submit a Provenance Search and request Event Lineage | /provenance | R | 
Component-level Access Policies (required to be set on the root process group)
| Policy | Privilege | Resource | Action | 
|---|---|---|---|
| view the component | Allows users to view component configuration details | /<component-type>/<component-UUID> | R | 
| view the data | Allows users to view metadata and content for this component in flowfile queues in outbound connections and through provenance events | /data/<component-type>/<component-UUID> | R | 
| view provenance | Allows users to view provenance events generated by this component | /provenance-data/<component-type>/<component-UUID> | R | 
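The resource column uses placeholders. The sketch below expands them for one component so the concrete resource strings can be checked against the policies configured in NiFi. It assumes "process-groups" is the component type of the root process group, and the UUID in the test is a placeholder; the helper is hypothetical.

```python
from typing import Dict

# Global policies from the first table (read access).
GLOBAL_POLICY_RESOURCES: Dict[str, str] = {
    "view the UI": "/flow",
    "query provenance": "/provenance",
}


def component_policy_resources(component_type: str, component_uuid: str) -> Dict[str, str]:
    """Expand the component-level resource patterns for a single component."""
    base = f"{component_type}/{component_uuid}"
    return {
        "view the component": f"/{base}",
        "view the data": f"/data/{base}",
        "view provenance": f"/provenance-data/{base}",
    }
```

Setting these on the root process group lets child components inherit them, which is why the table asks for them there.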
Code Coordinates
- Class Name: datahub.ingestion.source.nifi.NifiSource
Questions
If you've got any questions on configuring ingestion for NiFi, feel free to ping us on our Slack.