Spark on Tap
We are experts in Spark, machine learning, and data science modeling.
A single, unifying Stack
Our focus on Spark allows us to leverage its unique ability to simplify machine learning.
What makes Spark most practical is that a single technology covers all of the compute needs for machine learning:
- Spark Core – contains the basic functionality of Spark, including components for task scheduling, memory management, fault recovery, interacting with storage systems and more.
- Spark SQL – Spark SQL provides support for interacting with Spark via SQL as well as the Apache Hive variant of SQL, called the Hive Query Language (HiveQL).
- Spark Streaming – Spark Streaming enables processing of live streams of data.
- MLlib – MLlib provides multiple types of machine learning algorithms, including binary classification, regression, clustering and collaborative filtering, as well as supporting functionality such as model evaluation and data import.
- GraphX – GraphX is a library added in Spark 0.9 that provides an API for manipulating graphs (e.g., a social network’s friend graph) and performing graph-parallel computations.
Spark Core API
All of these libraries build on Spark Core's Resilient Distributed Datasets (RDDs). Most of the computation can be done in spark-shell itself; for more advanced programming, you can write Scala in your favorite editor or IDE, such as Eclipse.
Now all you need to do is find an expert on your team, or hire one. But wait!
We are Spark experts!
InfoObjects brings together a deep understanding of open source technologies like Spark and critical system-of-record technologies like SAP HANA.
Our suite of information strategy, data warehousing, data mining and information analytics services leverages all of the power of Spark and machine learning. We also fully integrate those services into your enterprise. For Spark and machine learning we provide the following services:
- Writing Spark Jobs
- Spark SQL query optimization
- Visualization and Dashboarding
- Apache Spark Core Usage monitoring
- Making Spark run on AWS/Azure clouds
- Apache Spark cluster optimization
- Troubleshooting and tuning data ingress and egress issues
- Storage-layer-specific optimization
- Spark Streaming optimization and Spark ETL tuning
Don’t try this at home!
…unless you want to get your feet wet with Spark. InfoObjects periodically publishes new Spark recipes with tips and tricks on how to do cool stuff with it. Scroll down to start browsing, or click here to let the heroes take care of it.
Use case: HBase servers are in a Kerberos-enabled cluster. The HBase servers (Masters and RegionServers) are configured to use authentication to connect to ZooKeeper. Assumption: HBase + secured ZooKeeper. This Java code snippet can be used to connect to HBase configured...
Hungry for more?
Rishi Yadav, CEO of InfoObjects, has published the Spark Cookbook, which is available for purchase.