Here’s our weekly roundup of articles related to Big Data Analytics, Machine Learning and Data Science. Hope you find it useful – please don’t forget to subscribe!
- Hadoop/HDFS multi-tenancy support
HDFS, one of the most widely used storage infrastructures in the Hadoop ecosystem, has limited multi-tenancy support. Many upstream projects, such as YARN and HBase, have added multi-tenancy features of their own. This talk explores the existing multi-tenancy features, including their use cases and limitations, and the ongoing work to provide better multi-tenancy support for the Hadoop ecosystem at the HDFS layer, such as effective NameNode throttling and DataNode and YARN QoS integration.
- Hadoop Summit San Jose 2016 Wrap-up – ODPi progress
- Installing Hadoop on a single node running CentOS 7
- Horses for Courses: Apache Spark Streaming and Apache Nifi
A comparison of Apache NiFi and Apache Spark Streaming for different streaming and IoT use cases.
- MongoDB and Apache Spark at China Eastern Airlines
The new MongoDB Connector for Apache Spark enables a new fare calculation engine that supports 180 million fares and 1.6 billion queries per day, migrated off Oracle.
- Getting started with GraphFrames in Apache Spark
GraphX is one of the four foundational components of Spark, along with Spark SQL, Spark Streaming and MLlib. It provides general-purpose graph APIs, including graph-parallel computation.
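To give a feel for what graph-parallel computation means, here is a minimal plain-Python sketch of a vertex-centric algorithm: connected components by label propagation. It does not use GraphX or GraphFrames; the function name `connected_components` and the toy graph are assumptions made for illustration only. In each round, every vertex looks at its neighbors' labels and adopts the smallest one, which is the kind of per-vertex step a graph engine would run in parallel.

```python
from collections import defaultdict


def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component.

    Hypothetical helper for illustration; not part of any Spark API.
    """
    # Build an undirected adjacency list.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # Every vertex starts out as its own component.
    labels = {v: v for v in vertices}

    changed = True
    while changed:  # one loop iteration plays the role of one round
        changed = False
        new_labels = {}
        for v in vertices:
            # Each vertex only reads labels from the previous round,
            # so the per-vertex work is independent and parallelizable.
            best = min([labels[v]] + [labels[n] for n in neighbors[v]])
            new_labels[v] = best
            if best != labels[v]:
                changed = True
        labels = new_labels
    return labels


# Toy graph (assumed data): a-b-c form one component, d-e another.
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
components = connected_components(vertices, edges)
```

Here the loop runs serially, but because each vertex update depends only on the previous round's labels, the same logic can be distributed across partitions of a large graph.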
- Structured Streaming (aka Streaming Datasets) – Mastering Apache Spark
Structured Streaming is a new computation model introduced in Spark 2.0.0. It offers a high-level streaming API built on top of Datasets (inside the Spark SQL engine) for continuous, incremental execution of structured queries.
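The idea of incremental execution can be sketched in plain Python, without Spark: a query keeps its state between batches and only processes newly arrived rows, yet always exposes the full, up-to-date result. The class `IncrementalWordCount` and the sample input below are hypothetical and are not Spark's API; they only illustrate the model.

```python
from collections import Counter


class IncrementalWordCount:
    """Running word count updated batch by batch (illustrative only)."""

    def __init__(self):
        # State carried over between batches.
        self.counts = Counter()

    def process_batch(self, lines):
        # Only the new lines are scanned; earlier batches are never re-read.
        for line in lines:
            self.counts.update(line.split())
        # Return a snapshot of the complete result after this batch.
        return dict(self.counts)


query = IncrementalWordCount()
first = query.process_batch(["spark streaming", "spark sql"])
second = query.process_batch(["structured streaming"])
```

After the second batch the result reflects both batches, even though only the second batch's lines were processed in that step.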
- Combining machine learning frameworks with Apache Spark
Machine Learning (ML) workflows involve a sequence of processing and learning stages. Realistic workflows combine specialized libraries with more general data management workflows. Apache Spark is well-known as a powerful platform to perform iterative computations required for ML. This talk presents how to combine the strengths of Spark’s ML library (MLlib) with popular packages such as CoreNLP, scikit-learn, and TensorFlow.
- Building a Machine Learning Orchestration Framework on Apache Mesos
This talk outlines how Docker, Spark, Hadoop and several other building blocks can be integrated into a machine learning framework on Mesos. The Mesos framework leverages custom executors, framework/status messages and resource attributes to schedule tasks in a multi-tenant environment. A heterogeneous workload of Spark, Python, R and Scala tasks coexists on an elastic Mesos cluster of hundreds of nodes, running thousands of computations concurrently.
- University of Zurich: Machine Learning Introduction and Data Sets. Example using Weka [Slides]
- Learn to Create D3.js Data Visualizations by Example
- MATLAB implementation of the TensorFlow Neural Networks Playground
Inspired by the TensorFlow Neural Networks Playground interface available online at http://playground.tensorflow.org/, MathWorks released a MATLAB implementation of the same interface, which applies artificial neural networks to regression and classification of highly non-linear data.
- AI, Deep Learning, and Machine Learning: A Primer
From types of machine intelligence to a tour of algorithms, a16z (Andreessen Horowitz) Deal and Research team head Frank Chen walks us through the basics (and beyond) of AI and deep learning in this slide presentation.