Braineous Documentation, version 1.0.0-CR3 (Candidate Release 3)
1. Core Concepts
1.1. Pipeline
1.1.1. What is a Pipeline?
A data pipeline connects a data source to a target store/system. The data source generates data and posts it to the Braineous ingestion system via the Braineous Data Ingestion Java SDK. The Braineous data pipeline engine supports multiple data sources and multiple target systems associated with a single pipe. In turn, it supports as many pipes as may be necessary for the applications in question, at scale.
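The calling pattern looks roughly like the sketch below. This is illustrative only: IngestionClient and send are hypothetical placeholders standing in for the Braineous Data Ingestion Java SDK's actual entry points; see the SDK tutorials for the real API.
// Illustrative sketch only. IngestionClient and send(...) are hypothetical
// placeholders, NOT the actual Braineous SDK API.
public class SendToPipe {
    // Hypothetical stand-in for the SDK's ingestion client.
    interface IngestionClient {
        void send(String pipeId, String entity, String jsonPayload);
    }
    public static void main(String[] args) {
        String pipeId = "flightpipe";  // uniquely identifies the registered pipe
        String entity = "flight";      // tags the dataset's business entity
        String payload = "[{\"id\":1,\"name\":\"name_1\"}]";
        // Stubbed client: a real data source would authenticate with its
        // tenant's API_KEY/API_SECRET and post the dataset to the pipe.
        IngestionClient client = (pid, ent, json) ->
                System.out.printf("posting %s/%s: %s%n", pid, ent, json);
        client.send(pipeId, entity, payload);
    }
}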
1.1.2. Building Blocks
- PipeId: The 'pipeId' uniquely identifies a pipeline managed by Braineous. The Braineous ingestion engine can manage and monitor thousands of pipelines on a single instance of the Braineous server.
- Entity: The 'entity' indicates the type of dataset processed by a Braineous pipeline. Multiple entities can be processed by a single pipeline, because the datasets are partitioned by the 'staging stores' and the 'datalake store' associated with the pipeline.
- Staging Store: A 'staging store' is the final delivery destination of a pipeline. A pipeline can deliver to multiple stores in real time, with network latency as the only environmental factor.
- Data Integration Agent: A 'Data Integration Agent' is the phase that makes delivered data operational for the application. The application is arbitrary: it can be a dashboard, a backend process, an analytics system, or a machine learning application.
- DataLake Store: A 'datalake store' is the internal store to which the pipeline delivers data for applications with use cases related to analytics and machine learning. Braineous is built on Apache Flink as its data processing engine and supports Apache Hive as its core data lake. Future releases of Braineous will include a Data Lake Connector framework that can support third-party data lakes such as Apache Iceberg, Snowflake, or a custom data lake developed by the customer.
3. Staging Store
3.1. What is a Staging Store?
A 'staging store' is the final delivery destination of a pipeline. A pipeline can deliver to multiple stores in real time, with network latency as the only environmental factor.
Logically, a staging store is identified by its pipeId and entity.
3.2. What is a Data Integration Agent?
A 'staging store' is the final delivery destination of a pipeline; a pipeline can deliver to multiple stores in real time, with network latency as the only environmental factor.
A 'Data Integration Agent' is the phase that makes the delivered data operational for the application. The application is arbitrary: it can be a dashboard, a backend process, an analytics system, or a machine learning application.
For a hands-on tutorial please refer to: Creating Your Custom Data Integration Agent
3.2.1. Pipe Configuration
A sample Braineous pipe configuration
{
    "pipeId": "yyya",
    "entity": "abc",
    "configuration": [
        {
            "stagingStore" : "com.appgallabs.dataplatform.targetSystem.core.driver.MongoDBStagingStore",
            "name": "yyya",
            "config": {
                "connectionString": "mongodb://localhost:27017",
                "database": "yyya",
                "collection": "data",
                "jsonpathExpressions": []
            }
        }
    ]
}
- pipeId: As a data source provider, this id uniquely identifies this data pipe with the Braineous Data Pipeline Engine.
- entity: The business/domain entity that this dataset should be associated with.
- configuration.stagingStore: The Staging Store driver.
- configuration.name: A user-friendly way to identify the target store.
- configuration.config.connectionString: MongoDB database connection string for your target store.
- configuration.config.database: MongoDB database on your target store.
- configuration.config.collection: MongoDB database collection on your target store.
A data pipe can be configured with multiple target stores/systems for data delivery.
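For example, the following sketch (combining the MongoDB and MySQL samples from the connector section of this guide) fans a single pipe out to two staging stores:
{
    "pipeId": "flightpipe",
    "entity": "flight",
    "configuration": [
        {
            "stagingStore" : "com.appgallabs.dataplatform.targetSystem.core.driver.MongoDBStagingStore",
            "name": "mongodb_flight_staging_store",
            "config": {
                "connectionString": "mongodb://localhost:27017",
                "database": "flightpipe",
                "collection": "flight",
                "jsonpathExpressions": []
            }
        },
        {
            "stagingStore" : "com.appgallabs.dataplatform.targetSystem.core.driver.MySqlStagingStore",
            "name": "mysql_flight_staging_store",
            "config": {
                "connectionString": "jdbc:mysql://localhost:3306/braineous_staging_database",
                "username": "root",
                "password": "",
                "jsonpathExpressions": []
            }
        }
    ]
}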
The current release supports the following target stores:
- Snowflake
- MySQL
- Elasticsearch
- MongoDB
- ClickHouse
In future releases, the Braineous team will add support for more target stores and systems, such as:
- PostgreSQL
- Oracle
- Amazon Redshift
Braineous also provides a Staging Store Framework for developers to develop custom connectors. For a hands-on tutorial please refer to: Create a Data Connector
4. Data Lake
4.1. What is a Data Lake?
A 'datalake store' is the internal store to which the pipeline delivers data for applications with use cases related to analytics and machine learning. Braineous is built on Apache Flink as its data processing engine and supports Apache Hive based data lakes. Future releases of Braineous will include a Data Lake Connector framework that can support third-party data lakes such as Apache Iceberg, Snowflake, or a custom data lake developed by the customer.
5. Data Transformation
5.1. What is Data Transformation?
Data transformation is the process of converting data from one format, such as a database file, XML document or Excel spreadsheet, into another.
Transformations typically involve converting a raw data source into a cleansed, validated and ready-to-use format. Data transformation is crucial to data management processes that include data integration, data migration, data warehousing and data preparation.
Braineous is fully compliant with the JSONPath specification: https://www.ietf.org/archive/id/draft-goessner-dispatch-jsonpath-00.html
You can transform your payload on the fly using JSONPath expressions in the pipeline configuration.
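For example, setting the expressions below in a pipe configuration (an illustrative sketch reusing the sample payloads shown in the monitoring section) selects the name and nested email values from each record:
"jsonpathExpressions": ["$.name", "$.addr.email"]
Given a record like {"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}}, the expression $.name selects "name_1" and $.addr.email selects "name_1@email.com". How the pipeline assembles the transformed payload is covered in the tutorial below.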
Here is a tutorial: Data Transformation
6. Pipeline Monitoring
Braineous includes a pipeline monitoring tool called pipemon, available in the bin directory of the binary distribution. The tool is evolving and is designed to support the features covered in the following sections. Launch it with:
./pipemon.sh
6.1. Create a Tenant with API_KEY and API_SECRET
babyboy@BabyBoys-MBP bin % ./pipemon.sh
********************
BRAINEOUS
*****************
> If you have a tenant press [l] to login, If you need to create a tenant press [c]
c
Create a tenant
> tenant:
ruku_tenant
> email:
ruku@ruku.com
> password:
password
***TENANT_CREATION_SUCCESS***
{principal: 38016be6-2869-4543-b7bd-9a772edd4c07, apiSecret: 1093eafc-4445-4a44-8d62-0a9d8512f23d, name: ruku_tenant, email: ruku@ruku.com, password: 5F4DCC3B5AA765D61D8327DEB882CF99, apiKey: 38016be6-2869-4543-b7bd-9a772edd4c07}
Please keep the API secret safe for Braineous Data Platform usage. This will not be displayed in the future for security reasons
*****************************
38016be6-2869-4543-b7bd-9a772edd4c07 > Press exit or CTRL+C to exit
6.2. Show registered pipelines
********************
BRAINEOUS
*****************
> If you have a tenant press [l] to login, If you need to create a tenant press [c]
l
Login
> email:
paro@paro.com
> password:
password
6ca06059-aebb-43a6-ba47-306d2469c059 > Press exit or CTRL+C to exit
show pipes
*******All registered pipes********
[{subscriptionId: 20350d12-7563-4931-82b8-0e7dcb0ef442, group: {subscribers: [{email: 6ca06059-aebb-43a6-ba47-306d2469c059}]}, pipe: {pipeId: yyya, subscriptionId: 20350d12-7563-4931-82b8-0e7dcb0ef442, pipeName: yyya, pipeStage: DEVELOPMENT, pipeType: PUSH, cleanerFunctions: []}}, {subscriptionId: ced17e1b-604d-4d6c-93f4-8933e8c3e14b, group: {subscribers: [{email: 6ca06059-aebb-43a6-ba47-306d2469c059}]}, pipe: {pipeId: yyya, subscriptionId: ced17e1b-604d-4d6c-93f4-8933e8c3e14b, pipeName: yyya, pipeStage: DEVELOPMENT, pipeType: PUSH, cleanerFunctions: []}}, {subscriptionId: eba00d05-80df-4207-8c20-286cd305564b, group: {subscribers: [{email: 6ca06059-aebb-43a6-ba47-306d2469c059}]}, pipe: {pipeId: yyya, subscriptionId: eba00d05-80df-4207-8c20-286cd305564b, pipeName: yyya, pipeStage: DEVELOPMENT, pipeType: PUSH, cleanerFunctions: []}}, {subscriptionId: 9b9c3897-8bde-4c9a-adbb-a23e3faa5d2a, group: {subscribers: [{email: 6ca06059-aebb-43a6-ba47-306d2469c059}]}, pipe: {pipeId: zzza, subscriptionId: 9b9c3897-8bde-4c9a-adbb-a23e3faa5d2a, pipeName: zzza, pipeStage: DEVELOPMENT, pipeType: PUSH, cleanerFunctions: []}}, {subscriptionId: dd24d228-0a4d-4453-a9bd-85a3aec92664, group: {subscribers: [{email: 6ca06059-aebb-43a6-ba47-306d2469c059}]}, pipe: {pipeId: yyya, subscriptionId: dd24d228-0a4d-4453-a9bd-85a3aec92664, pipeName: yyya, pipeStage: DEVELOPMENT, pipeType: PUSH, cleanerFunctions: []}}, {subscriptionId: 3ddce006-e26e-45fc-be68-d4ad4c8f4522, group: {subscribers: [{email: 6ca06059-aebb-43a6-ba47-306d2469c059}]}, pipe: {pipeId: zzza, subscriptionId: 3ddce006-e26e-45fc-be68-d4ad4c8f4522, pipeName: zzza, pipeStage: DEVELOPMENT, pipeType: PUSH, cleanerFunctions: []}}]
6ca06059-aebb-43a6-ba47-306d2469c059 > Press exit or CTRL+C to exit
6.3. Show ingestion statistics
6ca06059-aebb-43a6-ba47-306d2469c059 > Press exit or CTRL+C to exit
use yyya
Using pipe: yyya
yyya > Press exit or CTRL+C to exit
show ingestion_stats
*******Data_Ingestion_Stats********
{type: ingestion, pipeId: yyya, pipeName: yyya, sizeInBytes: 2176}
yyya > Press exit or CTRL+C to exit
6.4. Show delivery statistics
yyya > Press exit or CTRL+C to exit
show delivery_stats
*******Data_Delivery_Stats********
{type: delivery, pipeId: yyya, pipeName: yyya, sizeInBytes: 1088}
yyya > Press exit or CTRL+C to exit
6.5. Show live snapshots
yyya > Press exit or CTRL+C to exit
show live_snapshot
*******Live_Snapshot********
[{_id: {$oid: 660d132a0638fb53523e72f2}, pipelineServiceType: DATALAKE, metadata: {datalake: {name: mongodb, configuration: {connectionString: mongodb://localhost:27017, collection: datalake}}}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, incoming: true}, {_id: {$oid: 660d132a0638fb53523e72f1}, pipelineServiceType: INGESTION, metadata: {connectionString: mongodb://localhost:27017, database: yyya, collection: data, jsonpathExpressions: []}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, incoming: true}, {_id: {$oid: 660d132b0638fb53523e72f3}, pipelineServiceType: INGESTION, metadata: {connectionString: mongodb://localhost:27017, database: yyya, collection: data, jsonpathExpressions: []}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, outgoing: true}, {_id: {$oid: 660d155e0638fb53523e72f9}, pipelineServiceType: DATALAKE, metadata: {datalake: {name: mongodb, configuration: {connectionString: mongodb://localhost:27017, collection: datalake}}}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, incoming: true}, {_id: {$oid: 660d155e0638fb53523e72fc}, pipelineServiceType: INGESTION, metadata: {connectionString: mongodb://localhost:27017, database: yyya, collection: data, jsonpathExpressions: []}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, incoming: true}, {_id: {$oid: 660d155e0638fb53523e72fd}, pipelineServiceType: INGESTION, metadata: {connectionString: mongodb://localhost:27017, database: yyya, collection: data, jsonpathExpressions: []}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, outgoing: true}, {_id: {$oid: 660d17910638fb53523e72ff}, pipelineServiceType: DATALAKE, metadata: {datalake: {name: mongodb, configuration: {connectionString: mongodb://localhost:27017, collection: datalake}}}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, incoming: true}, {_id: {$oid: 660d17920638fb53523e7303}, pipelineServiceType: INGESTION, metadata: {connectionString: mongodb://localhost:27017, database: yyya, collection: data, jsonpathExpressions: []}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, incoming: true}, {_id: {$oid: 660d17920638fb53523e7304}, pipelineServiceType: INGESTION, 
metadata: {connectionString: mongodb://localhost:27017, database: yyya, collection: data, jsonpathExpressions: []}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, outgoing: true}, {_id: {$oid: 660d190f0638fb53523e7314}, pipelineServiceType: DATALAKE, metadata: {datalake: {name: mongodb, configuration: {connectionString: mongodb://localhost:27017, collection: datalake}}}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, incoming: true}, {_id: {$oid: 660d190f0638fb53523e7317}, pipelineServiceType: INGESTION, metadata: {connectionString: mongodb://localhost:27017, database: yyya, collection: data, jsonpathExpressions: []}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, incoming: true}, {_id: {$oid: 660d190f0638fb53523e7318}, pipelineServiceType: INGESTION, metadata: {connectionString: mongodb://localhost:27017, database: yyya, collection: data, jsonpathExpressions: []}, entity: abc, pipeId: yyya, message: [{"id":1,"name":"name_1","age":46,"addr":{"email":"name_1@email.com","phone":"123"}},{"id":"2","name":"name_2","age":55,"addr":{"email":"name_2@email.com","phone":"1234"}}], sizeInBytes: 272, outgoing: true}]
yyya > Press exit or CTRL+C to exit
6.6. Move Pipe to development
yyya > Press exit or CTRL+C to exit
move pipe_to_development
*******Move pipe to development********
PIPE_SUCCESSFULLY_REGISTERED
{pipeId: yyya, subscriptionId: 20350d12-7563-4931-82b8-0e7dcb0ef442, pipeName: yyya, pipeStage: DEVELOPMENT, pipeType: PUSH, cleanerFunctions: []}
yyya > Press exit or CTRL+C to exit
6.7. Move Pipe to staging
yyya > Press exit or CTRL+C to exit
move pipe_to_staging
*******Move pipe to staging********
PIPE_SUCCESSFULLY_STAGED
{pipeId: yyya, subscriptionId: 20350d12-7563-4931-82b8-0e7dcb0ef442, pipeName: yyya, pipeStage: STAGED, pipeType: PUSH, cleanerFunctions: []}
yyya > Press exit or CTRL+C to exit
6.8. Move Pipe to production
yyya > Press exit or CTRL+C to exit
move pipe_to_production
*******Move pipe to production********
PIPE_SUCCESSFULLY_DEPLOYED
{pipeId: yyya, subscriptionId: 20350d12-7563-4931-82b8-0e7dcb0ef442, pipeName: yyya, pipeStage: DEPLOYED, pipeType: PUSH, cleanerFunctions: []}
yyya > Press exit or CTRL+C to exit
7. Supported Target System Data Pipeline Connectors
7.1. MongoDB
A sample MongoDB pipe configuration
{
    "pipeId": "flightpipe",
    "entity": "flight",
    "configuration": [
        {
            "stagingStore" : "com.appgallabs.dataplatform.targetSystem.core.driver.MongoDBStagingStore",
            "name": "mongodb_flight_staging_store",
            "config": {
                "connectionString": "mongodb://localhost:27017",
                "database": "flightpipe",
                "collection": "flight",
                "jsonpathExpressions": []
            }
        }
    ]
}
- pipeId: As a data source provider, this id uniquely identifies this data pipe with the Braineous Data Pipeline Engine.
- entity: The business/domain entity that this dataset should be associated with.
- configuration.stagingStore: The Staging Store driver.
- configuration.name: A user-friendly way to identify the target store.
- configuration.config.connectionString: MongoDB database connection string for your target store.
- configuration.config.database: MongoDB database on your target store.
- configuration.config.collection: MongoDB database collection on your target store.
- configuration.config.username: MongoDB user.
- configuration.config.password: MongoDB user's password.
- configuration.config.jsonpathExpressions: Data transformation based on the JSONPath specification: https://www.ietf.org/archive/id/draft-goessner-dispatch-jsonpath-00.html
NOTE: Data is transformed before delivery to any configured target staging store.
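To verify delivery, here is a minimal sketch (not part of the Braineous SDK) that reads back the staged documents with the standard MongoDB Java driver, assuming the sample configuration above:
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class VerifyMongoStagingStore {
    public static void main(String[] args) {
        // Connect with the same connection string as the staging store config.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> staged = client
                    .getDatabase("flightpipe")   // config.database
                    .getCollection("flight");    // config.collection
            System.out.println("staged documents: " + staged.countDocuments());
            staged.find().forEach(doc -> System.out.println(doc.toJson()));
        }
    }
}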
7.2. MySQL
A sample MySQL pipe configuration
{
    "pipeId": "flightpipe",
    "entity": "flight",
    "configuration": [
        {
            "stagingStore" : "com.appgallabs.dataplatform.targetSystem.core.driver.MySqlStagingStore",
            "name": "mysql_flight_staging_store",
            "config": {
                "connectionString": "jdbc:mysql://localhost:3306/braineous_staging_database",
                "username": "root",
                "password": "",
                "jsonpathExpressions": []
            }
        }
    ]
}
- pipeId: As a data source provider, this id uniquely identifies this data pipe with the Braineous Data Pipeline Engine.
- entity: The business/domain entity that this dataset should be associated with.
- configuration.stagingStore: The Staging Store driver.
- configuration.name: A user-friendly way to identify the target store.
- configuration.config.connectionString: MySQL database connection string for your target store.
- configuration.config.username: MySQL user.
- configuration.config.password: MySQL user's password.
- configuration.config.staging_table: MySQL staging table.
- configuration.config.jsonpathExpressions: Data transformation based on the JSONPath specification: https://www.ietf.org/archive/id/draft-goessner-dispatch-jsonpath-00.html
NOTE: Data is transformed before delivery to any configured target staging store.
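Similarly, a minimal sketch (not part of the Braineous SDK) that inspects the staging table over plain JDBC, assuming the sample connection string above and MySQL Connector/J on the classpath; the table name "flight" is an assumption standing in for your configured staging_table:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class VerifyMySqlStagingStore {
    public static void main(String[] args) throws Exception {
        // Same connection string and credentials as the staging store config.
        String url = "jdbc:mysql://localhost:3306/braineous_staging_database";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement();
             // "flight" is an assumed table name; use your configured staging_table.
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM flight")) {
            if (rs.next()) {
                System.out.println("staged rows: " + rs.getLong(1));
            }
        }
    }
}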
7.3. Elasticsearch
A sample Elasticsearch pipe configuration
{
    "pipeId": "flightpipe",
    "entity": "flight",
    "configuration": [
        {
            "stagingStore" : "com.appgallabs.dataplatform.targetSystem.core.driver.ElasticSearchStagingStore",
            "name": "elastic_flight_staging_store",
            "config": {
                "elasticSearchUrl": "http://localhost:9200/",
                "index": "yyya",
                "security": {
                    "type": "built_in_users",
                    "user": "elastic",
                    "password": "password"
                },
                "jsonpathExpressions": []
            }
        }
    ]
}
- pipeId: As a data source provider, this id uniquely identifies this data pipe with the Braineous Data Pipeline Engine.
- entity: The business/domain entity that this dataset should be associated with.
- configuration.stagingStore: The Staging Store driver.
- configuration.name: A user-friendly way to identify the target store.
- configuration.config.elasticSearchUrl: URL of your Elasticsearch instance.
- configuration.config.index: Elasticsearch index for the staging store.
- configuration.config.security.type: Elasticsearch authentication type.
- configuration.config.security.user: Elasticsearch user.
- configuration.config.security.password: Elasticsearch user's password.
- configuration.config.jsonpathExpressions: Data transformation based on the JSONPath specification: https://www.ietf.org/archive/id/draft-goessner-dispatch-jsonpath-00.html
NOTE: Data is transformed before delivery to any configured target staging store.
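A minimal sketch (not part of the Braineous SDK) that reads back staged documents through Elasticsearch's REST _search endpoint using the JDK's built-in HTTP client, assuming the sample index and built_in_users credentials above:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class VerifyElasticStagingStore {
    public static void main(String[] args) throws Exception {
        // Basic auth for the built_in_users security type in the sample config.
        String credentials = Base64.getEncoder()
                .encodeToString("elastic:password".getBytes());
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/yyya/_search")) // config index
                .header("Authorization", "Basic " + credentials)
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // staged documents as search hits
    }
}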
7.4. Snowflake
A sample Snowflake pipe configuration
{
    "pipeId": "flightpipe",
    "entity": "flight",
    "configuration": [
        {
            "stagingStore" : "com.appgallabs.dataplatform.targetSystem.core.driver.SnowflakeStagingStore",
            "name": "snowflake_staging_store",
            "config": {
                "account_identifier": "<your_snowflake_account_identifier>",
                "host": "<your_snowflake_computing_host>",
                "user": "<your_snowflake_user>",
                "password": "<your_snowflake_password>",
                "port": "443",
                "database": "testdb",
                "schema": "public",
                "stage": "flightpipe",
                "table": "flight",
                "pipe": "flightpipe",
                "source": "local_fs",
                "source_location": "file:///tmp/braineous/",
                "jsonpathExpressions": []
            }
        }
    ]
}
- pipeId: As a data source provider, this id uniquely identifies this data pipe with the Braineous Data Pipeline Engine.
- entity: The business/domain entity that this dataset should be associated with.
- configuration.stagingStore: The Staging Store driver.
- configuration.name: A user-friendly way to identify the target store.
- configuration.config.account_identifier: Your Snowflake instance's account identifier.
- configuration.config.host: Your Snowflake computing host.
- configuration.config.user: Snowflake user.
- configuration.config.password: Snowflake user's password.
- configuration.config.port: Snowflake port.
- configuration.config.database: Snowflake database.
- configuration.config.schema: Snowflake database schema.
- configuration.config.stage: Snowflake database staging area.
- configuration.config.table: Snowflake database staging table.
- configuration.config.pipe: Snowflake staging pipe that stages the data.
- configuration.config.source: Currently, Braineous supports the local file system. Support for Amazon S3, Google Cloud Storage, and Microsoft Azure will be added in a future release.
- configuration.config.source_location: Location of the staged data on the source (for example, file:///tmp/braineous/ on the local file system).
- configuration.config.jsonpathExpressions: Data transformation based on the JSONPath specification: https://www.ietf.org/archive/id/draft-goessner-dispatch-jsonpath-00.html
NOTE: Data is transformed before delivery to any configured target staging store.
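A minimal sketch (not part of the Braineous SDK) for inspecting the staging table with the Snowflake JDBC driver, assuming the sample database/schema/table above and that the configured host is your account's snowflakecomputing.com hostname:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class VerifySnowflakeStagingStore {
    public static void main(String[] args) throws Exception {
        // Assumes the configured host is the account's snowflakecomputing.com hostname.
        String url = "jdbc:snowflake://<your_snowflake_computing_host>";
        Properties props = new Properties();
        props.put("user", "<your_snowflake_user>");
        props.put("password", "<your_snowflake_password>");
        props.put("db", "testdb");      // config.database
        props.put("schema", "public");  // config.schema
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM flight")) { // config.table
            if (rs.next()) {
                System.out.println("staged rows: " + rs.getLong(1));
            }
        }
    }
}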
7.5. ClickHouse
A sample ClickHouse pipe configuration
{
    "pipeId": "flightpipe",
    "entity": "flight",
    "configuration": [
        {
            "stagingStore" : "com.appgallabs.dataplatform.targetSystem.core.driver.ClickHouseStagingStore",
            "name": "click_house_flight_staging_store",
            "config": {
                "connectionString": "jdbc:ch://localhost:8123/default",
                "username": "default",
                "password": "",
                "jsonpathExpressions": []
            }
        }
    ]
}
- pipeId: As a data source provider, this id uniquely identifies this data pipe with the Braineous Data Pipeline Engine.
- entity: The business/domain entity that this dataset should be associated with.
- configuration.stagingStore: The Staging Store driver.
- configuration.name: A user-friendly way to identify the target store.
- configuration.config.connectionString: ClickHouse database connection string for your target store.
- configuration.config.username: ClickHouse user.
- configuration.config.password: ClickHouse user's password.
- configuration.config.staging_table: ClickHouse staging table.
- configuration.config.jsonpathExpressions: Data transformation based on the JSONPath specification: https://www.ietf.org/archive/id/draft-goessner-dispatch-jsonpath-00.html
NOTE: Data is transformed before delivery to any configured target staging store.
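A minimal sketch (not part of the Braineous SDK) for inspecting the staging table over JDBC with the clickhouse-jdbc driver on the classpath, assuming the sample connection string above; the table name "flight" is an assumption standing in for your configured staging_table:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class VerifyClickHouseStagingStore {
    public static void main(String[] args) throws Exception {
        // Same connection string and credentials as the staging store config.
        String url = "jdbc:ch://localhost:8123/default";
        try (Connection conn = DriverManager.getConnection(url, "default", "");
             Statement stmt = conn.createStatement();
             // "flight" is an assumed table name; use your configured staging_table.
             ResultSet rs = stmt.executeQuery("SELECT count() FROM flight")) {
            if (rs.next()) {
                System.out.println("staged rows: " + rs.getLong(1));
            }
        }
    }
}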