Oryx 2 project: a realization of the lambda architecture

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, specialized for real-time, large-scale machine learning. It is a framework for building applications, but it also includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering.

The lambda architecture is a useful framework for thinking about the design of big data applications. Nathan Marz designed this generic architecture to address common requirements of big data systems, based on his experience working on distributed data processing systems at Twitter.

Some of the key requirements in building this architecture include:

Fault tolerance against hardware failures and human errors
Support for a variety of use cases, including low-latency queries as well as updates
Linear scale-out, meaning that adding more machines to the problem should help get the job done
Extensibility, so that the system stays manageable and can easily accommodate new features

All data entering the system is dispatched to both the batch layer and the speed layer for processing.
The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) pre-computing the batch views.
The serving layer indexes the batch views so that they can be queried in a low-latency, ad hoc way.
The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
Any incoming query can be answered by merging results from batch views and real-time views.
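The data flow above can be sketched with a toy event counter in Python. This is an illustration under stated assumptions, not code from any real system: the class `LambdaPipeline` and its methods are hypothetical, in-memory Python structures stand in for distributed storage and stream processing, and the batch run is synchronous.

```python
from collections import Counter


class LambdaPipeline:
    """Hypothetical toy lambda architecture that counts events per key."""

    def __init__(self):
        self.master_dataset = []         # raw data; append-only by convention
        self.batch_view = Counter()      # pre-computed by the batch layer
        self.realtime_view = Counter()   # speed layer: recent data only

    def ingest(self, event):
        # Every incoming record is dispatched to both layers.
        self.master_dataset.append(event)   # batch layer: store the raw record
        self.realtime_view[event] += 1      # speed layer: update incrementally

    def run_batch(self):
        # Batch layer: recompute the view from the complete master dataset.
        self.batch_view = Counter(self.master_dataset)
        # Everything ingested so far is now covered by the batch view. This toy
        # runs the batch synchronously, so the real-time view can be cleared;
        # a real system expires only the data the new batch view absorbed.
        self.realtime_view = Counter()

    def query(self, key):
        # Answer the query by merging the batch view with the real-time view.
        return self.batch_view[key] + self.realtime_view[key]
```

Calling `ingest` a few times and then `query` returns the same total before and after `run_batch`; only the split between `batch_view` and `realtime_view` changes.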

Developers can also use Oryx 2 as a framework for building custom applications; the architecture overview follows below. If you’re looking to deploy a ready-made, end-to-end application for collaborative filtering, clustering or classification, here are the steps to follow:

Prepare your Hadoop cluster with Cluster Setup
Get a Release
Prepare a config file from the Configuration Reference
Run the binaries with Running Oryx

Learn about the REST API endpoints in the API Endpoint Reference.

Oryx 2 consists of three tiers:

Lambda Tier – A base implementation that is not specific to machine learning. Internally, it contains side-by-side, cooperating layers of the lambda architecture:

A Batch Layer, which computes a new “result” (think “model”, but it could be anything) as a function of all historical data and the previous result. This may be a long-running operation that takes hours and runs, for example, a few times a day.
A Speed Layer, which produces and publishes incremental model updates from a stream of new data. These updates are intended to happen on the order of seconds.
A Serving Layer, which receives models and updates and implements a synchronous API exposing query operations on the result.
A data transport layer, which moves data between layers and receives input from external sources.
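The cooperation between these layers can be sketched in Python. This is an illustration under stated assumptions, not Oryx 2 code: a Python list stands in for the Kafka-based transport, the `"MODEL"`/`"UP"` message keys and the names `batch_layer`, `speed_layer` and `ServingLayer` are hypothetical, and the "model" is just a count per item. Here the serving layer keeps one in-memory state that full models replace and incremental updates adjust.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Stand-in for the data transport layer (Kafka topics in Oryx 2): an in-memory,
# append-only list of (key, payload) messages. The "MODEL"/"UP" keys are
# hypothetical labels chosen for this sketch.
Topic = List[Tuple[str, object]]


def batch_layer(historical: Iterable[str], new_data: Iterable[str],
                previous_model: Counter, update_topic: Topic) -> Counter:
    """Compute a new result from all data and publish it as a full model."""
    # This toy model is a pure function of all the data, so previous_model is
    # unused; an ML application might use it, e.g., to warm-start training.
    model = Counter(historical)
    model.update(new_data)
    update_topic.append(("MODEL", dict(model)))
    return model


def speed_layer(new_events: Iterable[str], update_topic: Topic) -> None:
    """Publish one small incremental update per newly arrived event."""
    for item in new_events:
        update_topic.append(("UP", (item, 1)))


class ServingLayer:
    """Applies models and updates from the topic and answers queries."""

    def __init__(self) -> None:
        self.model: Counter = Counter()
        self.offset = 0  # how many topic messages have been consumed

    def consume(self, update_topic: Topic) -> None:
        for key, payload in update_topic[self.offset:]:
            if key == "MODEL":
                self.model = Counter(payload)   # replace state with the full model
            elif key == "UP":
                item, delta = payload
                self.model[item] += delta       # apply an incremental update
        self.offset = len(update_topic)

    def count(self, item: str) -> int:
        return self.model[item]
```

For example, after a batch run over `["a", "b", "a"]` and speed-layer events `"a"` and `"c"`, the serving layer reports a count of 3 for `"a"`. When the next batch run folds those events into a new full model, the `"MODEL"` message replaces the state, so they are not counted twice; a real deployment has to manage that overlap between batch and speed outputs explicitly.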

ML Tier Implementation – An implementation and specialization of the generic interfaces mentioned above. It implements common ML needs and then exposes a different, ML-specific interface for applications to fill in.

End-to-end Application Implementation Tier – In addition to being a framework, Oryx 2 contains complete implementations of the batch, speed and serving layers for three machine learning use cases. These are ready to deploy out of the box, or to be used as the basis for a custom application:

Collaborative filtering / recommendation based on Alternating Least Squares
Clustering based on k-means
Classification and regression based on random decision forests
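The split between the ML tier and an application can be illustrated with a template-method sketch in Python. Assumptions: the class `MLBatchUpdate`, its `build_model`/`evaluate` hooks, the candidate-dictionary format and the random train/test split are hypothetical and loosely modeled on the description above, not Oryx 2's actual Java API. `KMeansApp` stands in for a clustering application using plain Lloyd's iterations with farthest-point initialization, which is not necessarily what Oryx's k-means application does.

```python
import random
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Params = Dict[str, int]


class MLBatchUpdate(ABC):
    """Hypothetical ML-tier template: does the generic ML chores (train/test
    split, trying hyperparameter candidates, keeping the best model) and leaves
    model building and evaluation to the application."""

    def __init__(self, candidates: Sequence[Params], test_fraction: float = 0.2,
                 seed: int = 0) -> None:
        self.candidates = list(candidates)
        self.test_fraction = test_fraction
        self.rng = random.Random(seed)

    def run(self, data: Sequence[Point]):
        shuffled = list(data)
        self.rng.shuffle(shuffled)
        n_test = int(len(shuffled) * self.test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        best_model, best_score = None, float("-inf")
        for params in self.candidates:
            model = self.build_model(train, params)
            score = self.evaluate(model, test or train)  # fall back if no test data
            if score > best_score:
                best_model, best_score = model, score
        return best_model, best_score

    @abstractmethod
    def build_model(self, train: List[Point], params: Params): ...

    @abstractmethod
    def evaluate(self, model, test: List[Point]) -> float: ...


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class KMeansApp(MLBatchUpdate):
    """Application part: Lloyd's k-means with farthest-point initialization."""

    def build_model(self, train: List[Point], params: Params) -> List[Point]:
        k, iterations = params["k"], params.get("iterations", 10)
        centers = [train[0]]
        while len(centers) < k:
            # Next initial center: the point farthest from all chosen centers.
            centers.append(max(train, key=lambda p: min(_sq_dist(p, c) for c in centers)))
        for _ in range(iterations):
            clusters: List[List[Point]] = [[] for _ in range(k)]
            for p in train:
                nearest = min(range(k), key=lambda j: _sq_dist(p, centers[j]))
                clusters[nearest].append(p)
            # Move each center to the mean of its points (keep it if empty).
            centers = [tuple(sum(axis) / len(pts) for axis in zip(*pts)) if pts else centers[j]
                       for j, pts in enumerate(clusters)]
        return centers

    def evaluate(self, model: List[Point], test: List[Point]) -> float:
        # Higher is better: negative mean squared distance to the nearest center.
        return -sum(min(_sq_dist(p, c) for c in model) for p in test) / len(test)
```

A caveat on the design choice: scoring by distance to the nearest center always rewards larger k, so this metric alone would be a poor way to choose k. It is used here only to show the generic tier selecting among candidates while the application supplies the algorithm.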

Download link: https://github.com/OryxProject/oryx/releases
