Data Pipelines

A schema-agnostic, configuration-driven approach to data delivery at scale. Configuration changes take effect without any service disruption.

Braineous transforms any supported data format, such as XML or CSV, into JSON for its processor. Once in JSON, the data becomes a simple object graph, and configured JSONPath expressions select the data for each target system, which then receives it in its desired format. The engine is completely dynamic: just change the pipe configuration and roll.

A data pipeline connects a data source to a target store/system. The data source generates data and posts it to the Braineous ingestion system via the Braineous Data Ingestion Java SDK. The Braineous data pipeline engine supports multiple data sources and multiple target systems associated with a single pipe. In turn, it supports as many pipes as may be necessary for the applications in question, at scale.

Pipe Registration

          
          
    {
        "pipeId": "123",
        "configuration": [
            {
                "storeDriver": "com.appgallabs.dataplatform.receiver.core.driver.MongoDBStoreDriver",
                "name": "scenario2_store0",
                "config": {
                    "connectionString": "mongodb://localhost:27017",
                    "database": "scenario2_store0",
                    "collection": "data"
                },
                "jsonpathExpression": "jsonpath:1"
            },
            {
                "storeDriver": "com.appgallabs.dataplatform.receiver.core.driver.MongoDBStoreDriver",
                "name": "scenario2_store1",
                "config": {
                    "connectionString": "mongodb://localhost:27017",
                    "database": "scenario2_store1",
                    "collection": "data"
                },
                "jsonpathExpression": "jsonpath:1"
            }
        ]
    }
          
      

Data Ingestion

The data ingestion process starts by sending data to the Braineous Data Ingestion endpoint. The data is processed asynchronously as it moves through its pipe via an Apache Kafka topic and an Apache Flink stream processor.

Each ingestion goes through an ordered sequence of phases:

  • Validation
  • Transformation
  • Cleansing (especially important for machine learning applications)
  • Delivery to the Target system
  • Pipeline Registration Update

Braineous' pipeline engine is completely dynamic. You can add or remove target stores/systems by updating the registration via the Braineous Pipeline Management endpoint.

Data Ingestion Java SDK

You can send data to the Braineous Data Ingestion engine using the included Java Data Ingestion SDK or directly via the REST API. Using the SDK provides pre-processing optimizations such as:

  • Tenant Authentication
  • Data Throttling
  • GraphQL endpoint to query the ingested data (planned for CR2 release)

Data Ingestion REST API

Create/Update Pipe Registration

    /ingestion/register_pipe:
      post:
        tags:
        - Data Ingestion
        requestBody:
          content:
            application/json:
              schema:
                type: string
        responses:
          "200":
            description: OK
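As a minimal sketch of calling this endpoint without the SDK, the snippet below posts the pipe configuration shown earlier using the JDK's built-in HTTP client. The base URL `http://localhost:8080` is an assumption about the deployment, not part of the API definition; adjust it to your Braineous instance.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RegisterPipe {

    // Builds the registration request. The base URL is an assumption;
    // adjust it to where your Braineous instance is running.
    static HttpRequest buildRequest(String pipeConfiguration) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/ingestion/register_pipe"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(pipeConfiguration))
                .build();
    }

    public static void main(String[] args) {
        // A minimal pipe configuration, following the registration example above
        String pipeConfiguration = """
                {
                  "pipeId": "123",
                  "configuration": [
                    {
                      "storeDriver": "com.appgallabs.dataplatform.receiver.core.driver.MongoDBStoreDriver",
                      "name": "scenario2_store0",
                      "config": {
                        "connectionString": "mongodb://localhost:27017",
                        "database": "scenario2_store0",
                        "collection": "data"
                      },
                      "jsonpathExpression": "jsonpath:1"
                    }
                  ]
                }
                """;

        HttpRequest request = buildRequest(pipeConfiguration);
        System.out.println(request.method() + " " + request.uri());

        // To actually send it (requires a running Braineous instance):
        // HttpResponse<String> response = HttpClient.newHttpClient()
        //         .send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Sending the request again with a modified configuration updates the existing registration, which is how target stores/systems are added or removed at runtime.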

JSON-format ingestion

    /ingestion/json:
      post:
        tags:
        - Data Ingestion
        requestBody:
          content:
            application/json:
              schema:
                type: string
        responses:
          "200":
            description: OK
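A corresponding sketch for JSON ingestion follows. The base URL and the sample record are assumptions for illustration; the endpoint fragment above does not specify how the payload is associated with a registered pipe, so consult the SDK documentation for the exact envelope.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class IngestJson {

    // Builds an ingestion request for a JSON payload. The base URL is
    // an assumption; adjust it to your Braineous deployment.
    static HttpRequest buildRequest(String jsonPayload) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/ingestion/json"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonPayload))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical record to be routed through a registered pipe
        String payload = "{\"id\": 1, \"name\": \"hello-braineous\"}";
        HttpRequest request = buildRequest(payload);
        System.out.println(request.method() + " " + request.uri());
    }
}
```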

XML-format ingestion

    /ingestion/xml:
      post:
        tags:
        - Data Ingestion
        requestBody:
          content:
            application/json:
              schema:
                type: string
        responses:
          "200":
            description: OK

CSV-format ingestion

    /ingestion/csv:
      post:
        tags:
        - Data Ingestion
        requestBody:
          content:
            application/json:
              schema:
                type: string
        responses:
          "200":
            description: OK
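Note that the endpoint declares an `application/json` request body whose schema is a plain string, so the CSV content travels as a string payload rather than as raw `text/csv`. A minimal sketch, with the base URL assumed as before:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class IngestCsv {

    // The endpoint declares an application/json body whose schema is a
    // string, so the CSV content is carried as a string payload.
    // The base URL is an assumption; adjust it to your deployment.
    static HttpRequest buildRequest(String csvPayload) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/ingestion/csv"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(csvPayload))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical CSV rows; Braineous converts CSV to JSON internally
        String csv = "id,name\n1,hello\n2,braineous\n";
        HttpRequest request = buildRequest(csv);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The same pattern applies to the XML endpoint, with the XML document passed as the string payload to `/ingestion/xml`.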