Data Lake
A 'datalake store' is the internal store to which the pipeline delivers data for applications with analytics and machine learning use cases. Braineous uses Apache Flink as its data processing engine and supports Apache Hive based data lakes. Future releases of Braineous will include a Data Lake Connector framework to support custom data lakes developed by the customer.
Apache Hive
Apache Hive has established itself as a focal point of the data warehousing ecosystem. It serves as not only a SQL engine for big data analytics and ETL, but also a data management platform, where data is discovered, defined, and evolved. Flink offers a two-fold integration with Hive. The first is to leverage Hive’s Metastore as a persistent catalog with Flink’s HiveCatalog for storing Flink specific metadata across sessions. For example, users can store their Kafka or Elasticsearch tables in Hive Metastore by using HiveCatalog, and reuse them later on in SQL queries. The second is to offer Flink as an alternative engine for reading and writing Hive tables. The HiveCatalog is designed to be “out of the box” compatible with existing Hive installations. You do not need to modify your existing Hive Metastore or change the data placement or partitioning of your tables.
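As a rough illustration of the first integration, the sketch below registers a Hive Metastore as a Flink catalog through a Flink SQL CREATE CATALOG statement issued from Python. The catalog name myhive, the database mydatabase, and the path /opt/hive-conf are placeholder assumptions, and both helper functions are hypothetical; they are not part of Braineous or Flink. The t_env argument is assumed to be a PyFlink TableEnvironment created elsewhere, with the Flink Hive connector dependencies available.

```python
def hive_catalog_ddl(name, default_database, hive_conf_dir):
    """Build a Flink SQL statement that registers a Hive-backed catalog.

    All three arguments are placeholders chosen by the caller.
    """
    return (
        f"CREATE CATALOG {name} WITH ("
        f"'type' = 'hive', "
        f"'default-database' = '{default_database}', "
        f"'hive-conf-dir' = '{hive_conf_dir}')"
    )


def use_hive_catalog(t_env, name, default_database, hive_conf_dir):
    """Register the catalog on a table environment and make it current.

    t_env is assumed to expose execute_sql() and use_catalog(), as a
    PyFlink TableEnvironment does.
    """
    t_env.execute_sql(hive_catalog_ddl(name, default_database, hive_conf_dir))
    t_env.use_catalog(name)
```

In a real session, t_env could come from TableEnvironment.create(EnvironmentSettings.in_batch_mode()). Tables defined while the catalog is current are then stored in the Hive Metastore and can be reused in later sessions.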
ETL/ELT
Braineous allows for seamless data integration. Customers typically spend months of time and money on just the Transform part of ETL. With Braineous, they simply send raw data, and 'Transformation via Configuration' makes systems talk to each other, saving time and money.
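To make the idea of configuration-driven transformation concrete, here is a minimal sketch of the general technique, not Braineous's actual configuration format or API. It assumes a hypothetical mapping from target field names to dotted paths in the raw record; the field names and sample payload are invented for illustration.

```python
def get_path(record, path):
    """Follow a dotted path such as 'flight.origin' through nested dicts."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value


def transform(record, mapping):
    """Build a flat target record from a raw source record.

    mapping is configuration: target field name -> dotted source path.
    """
    return {target: get_path(record, source) for target, source in mapping.items()}


# Hypothetical configuration and raw payload, for illustration only.
MAPPING = {"origin": "flight.origin", "delay_minutes": "status.delay"}
raw = {"flight": {"origin": "AUS", "dest": "DFW"}, "status": {"delay": 25}}

print(transform(raw, MAPPING))  # {'origin': 'AUS', 'delay_minutes': 25}
```

Because the mapping is plain data, a change in the target schema means editing configuration rather than rewriting integration code, which is the saving the paragraph above describes.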
Machine Learning
What challenge does Braineous solve for machine learning? From a bird's-eye view, machine learning is a way to teach a computer to make decisions and predictions: it learns patterns in data, much as a human brain does but at massive scale, and then predicts an outcome. Accuracy of prediction is the goal. In aviation, for example, an airline might ask what its flight network will look like in terms of delays and disruptions two weeks from now, so that it is better prepared with crew (pilot and crew hours) and flight plans. It might also ask how to offer an optimized solution that reduces the need to put passengers on another flight in its network that day.
To do that, at a high level, you need to train the machine using data relevant to the application. In general, the more relevant data you have, the better the accuracy. This is where the Braineous data pipeline platform focuses, as it can process data at high scale.
Braineous would provide the infrastructure that a data scientist or machine learning engineer needs in order to focus on their main goal: building the AI model.