Posted in Basics Spark Streaming Posted by admin August 13, 2021 With Spark Core, we can analyze a batch of data (say, daily). With Spark Streaming, we…
Posted in Basics Streaming Data Posted by admin August 13, 2021 So far we assumed the data is sitting on the cluster, but we didn’t say how…
Posted in Basics Query Engines Posted by admin August 13, 2021 There are a handful of solutions for querying the data in the whole cluster, or from…
Posted in Basics Cassandra Posted by admin August 13, 2021 A distributed NoSQL database with no single point of failure (no master node). It has a…
Posted in Basics External Data Storage Posted by admin August 13, 2021 We can integrate external data storage, both relational and non-relational databases, to expose the…
Posted in Basics Tez Posted by admin August 13, 2021 It runs on top of YARN. It’s an alternative to MapReduce. In fact, we can…
Posted in Basics Mesos Posted by admin August 13, 2021 It’s an alternative to YARN, yet it’s still different. It’s used by Twitter, not directly…
Posted in Basics Hadoop Posted by admin August 13, 2021 This article is not a technical deep dive. Rather, it’s a quick glimpse of what…
Posted in Basics Optimizing Our Puppeteer Script Posted by admin August 13, 2021 The general idea is to not let the headless browser do any extra work. This…
Posted in Basics Setup Headless Chrome and Puppeteer Posted by admin August 13, 2021 I’d recommend installing Puppeteer with npm, as it’ll also include the stable up-to-date Chromium version that…