This Week in Big Data Analytics (July 24, 2016) – Weekly roundup

July 24, 2016 admin 0

Here’s our weekly roundup of articles related to Big Data Analytics, Machine Learning and Data Science. Hope you find it useful – please don’t forget to subscribe!

Monitoring Hadoop’s health and performance metrics: This article explores the key Hadoop metrics – HDFS metrics (NameNode and DataNode), MapReduce counters (Job, Task, […]

This Week in Big Data Analytics (July 3, 2016)

July 4, 2016 admin 0

Here’s our weekly roundup of updates on Big Data Analytics. Hope you find it useful – please don’t forget to subscribe!

2016 Hadoop Summit @ San Jose, CA (Jun 28 – 30)

Spring for Apache Hadoop 2.4.0 GA released: Supports Apache Hadoop stable 2.7.1, Pivotal HD 3.0, Cloudera CDH 5.7, Hortonworks HDP […]

Apache POI: Stream-reading large xlsx-type Excel spreadsheets

June 21, 2016 admin 0

We tried to load a large .xlsx spreadsheet (hundreds of MBs) for an analytics project using Apache POI’s XSSFWorkbook and kept getting out-of-memory exceptions. We realized that POI reads the entire spreadsheet into memory in one go, which drives up memory usage and causes the exception. We looked for POI interfaces that’d […]

Running a single node Couchbase server using docker

June 20, 2016 admin 1

Couchbase Server, formerly known as Membase, is an open-source, distributed, document-oriented NoSQL database with a shared-nothing architecture. It exposes a fast, easy-to-scale key-value store with a managed cache for sub-millisecond data operations, purpose-built indexers for fast queries, and a query engine for executing SQL-like queries. In the parlance of Eric Brewer’s […]

Open source web traffic analytics & dashboard tool using D3.js

June 20, 2016 admin 0

The US government’s web traffic analytics are available online for anyone to see at https://analytics.usa.gov/, providing a visual representation of people’s online interactions with the websites of US government agencies and departments. The US government’s Google Analytics account, known as the Digital Analytics Program, consolidates the data available across about 5000 websites and it […]
