Creating Your First Braineous Data Ingestion Application

Learn how to create a Braineous Data Ingestion application.

This guide covers:

  • Get the Source Data

  • Register a data pipe and send source data to multiple target MongoDB databases

  • Verify all target databases receive the data

1. Prerequisites

To complete this guide, you need:

  • Roughly 15 minutes

  • An IDE

  • JDK 11+ installed with JAVA_HOME configured appropriately

  • Apache Maven 3.9.5

Verify Maven is using the Java you expect

If you have multiple JDK’s installed, it is not certain Maven will pick up the expected java and you could end up with unexpected results. You can verify which JDK Maven uses by running mvn --version.

Download the Braineous-1.0.0-CR1 zip archive

This tutorial is located under: braineous-1.0.0-cr1/tutorials/get-started

2. Get the Source Data

Let’s start with a simple Json array to be used as datasource to be ingested by the Braineous Data Ingestion Engine

Source Data

[
  {
    "id" : 1,
    "name": "Joe Black1",
    "age": 50,
    "addr": {
      "email": "test@email.com",
      "phone": "123456"
    }
  },
  {
    "id": "2",
    "name": "Joe Black2",
    "age": 50,
    "addr": {
      "email": "test@email.com",
      "phone": "123456"
    }
  }
]

Java Code

//setup source data for ingestion
String sourceData = "[\n" +
        "  {\n" +
        "    \"id\" : 1,\n" +
        "    \"name\": \"Joe Black1\",\n" +
        "    \"age\": 50,\n" +
        "    \"addr\": {\n" +
        "      \"email\": \"test@email.com\",\n" +
        "      \"phone\": \"123456\"\n" +
        "    }\n" +
        "  },\n" +
        "  {\n" +
        "    \"id\": \"2\",\n" +
        "    \"name\": \"Joe Black2\",\n" +
        "    \"age\": 50,\n" +
        "    \"addr\": {\n" +
        "      \"email\": \"test@email.com\",\n" +
        "      \"phone\": \"123456\"\n" +
        "    }\n" +
        "  }\n" +
        "]";

3. Register a data pipe and send source data to multiple target MongoDB databases

Register a data pipe with the Braineous Data Ingestion Engine using the Java Braineous Data Ingestion Client SDK.

Pipe Configuration

{
  "pipeId": "123",
  "configuration": [
    {
      "storeDriver" : "com.appgallabs.dataplatform.receiver.core.driver.MongoDBStoreDriver",
      "name": "scenario1_store",
      "config": {
        "connectionString": "mongodb://localhost:27017",
        "database": "scenario1_store",
        "collection": "data"
      },
      "jsonpathExpression": "jsonpath:1"
    }
  ]
}
  • pipeId : As a data source provider, this is id identifies this data pipe uniquely with the Braineous Data Pipline Engine.

  • configuration.storeDriver: MongoDB target store driver

  • name: a user-friendly way to indentify the target store

  • config.connectionString: MongoDB database connection string

  • config.database: MongoDB database

  • config.collection: MongoDB database collection

A data pipe can be configured with multiple target stores/systems associated with the same data pipe for data delivery.

In the next Candidate Release, Braineous team will add support for more target stores and systems such as :

  • Postgresql

  • Mysql

  • Oracle

  • Snowflake

  • Microservices

  • Airbyte Catalog

Java Code - Register Pipe

//setup data pipe configuration json
String dataPipeConfiguration = "{\n" +
        "  \"pipeId\": \"123\",\n" +
        "  \"configuration\": [\n" +
        "    {\n" +
        "      \"storeDriver\" : \"com.appgallabs.dataplatform.receiver.core.driver.MongoDBStoreDriver\",\n" +
        "      \"name\": \"get_started_store\",\n" +
        "      \"config\": {\n" +
        "        \"connectionString\": \"mongodb://localhost:27017\",\n" +
        "        \"database\": \"get_started_store\",\n" +
        "        \"collection\": \"data\"\n" +
        "      },\n" +
        "      \"jsonpathExpression\": \"jsonpath:1\"\n" +
        "    }\n" +
        "  ]\n" +
        "}";
JsonObject configJson = JsonUtil.validateJson(dataPipeConfiguration).getAsJsonObject();

//register pipe
JsonObject response = DataPipeline.registerPipe(dataPipeConfiguration);
JsonUtil.printStdOut(response);

Java Code - Send Data for ingestion

//send source data through the pipeline
String pipeId = configJson.get("pipeId").getAsString();
String entity = "books";
DataPipeline.sendData(pipeId, entity, sourceData);

4. Verify all target collections receive the data

To verify the success of the ingestion and delivery to the configured target databases, use the following MongoDB commands.

Expected Result: You should see two records added to a collection called "data" in a database called "get_started_store"

mongosh

mongosh
use get_started_store
show collections
db.data.find()
db.data.count()