Architecting and building end-to-end streaming applications

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Karthik Ramasamy on Heron, DistributedLog, and designing real-time applications. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS. In this episode […]

Continue reading


Building Apache Kafka from scratch

[A version of this post originally appeared on the O’Reilly Radar blog.] In this episode of the O’Reilly Data Show Podcast, Jay Kreps talks about data integration, event data, and the Internet of Things. At the heart of big data platforms are robust data flows that connect diverse data sources. Over the past few years, […]

Continue reading


How companies are using Spark

[A version of this post appears on the O’Reilly Strata blog.] When an interesting piece of big data technology gets introduced, early1 adopters tend to focus on technical features and capabilities. Applications get built as companies develop confidence that it’s reliable and that it really scales to large data volumes. That seems to be where […]

Continue reading


Simplifying interactive, realtime, and advanced analytics

[A version of this post appears on the O’Reilly Strata blog and Forbes.] Here are a few observations based on conversations I had during the just concluded Strata NYC conference. Interactive query analysis on Hadoop remains a hot area A recent O’Reilly survey confirmed SQL is an important skill for data scientists. A year after […]

Continue reading


Big Data and Advertising: In the trenches

[A version of this post appears on the O’Reilly Strata blog.] The $35B merger of Omnicom and Publicis put the convergence of Big Data and Advertising1 in the front pages of business publications. Adtech2 companies have long been at the forefront of many data technologies, strategies, and techniques. By now it’s well-known that many impressive […]

Continue reading


Near realtime, streaming, and perpetual analytics

[A version of this post appears on the O’Reilly Strata blog.] Simple example of a near realtime app built with Hadoop and HBase Over the past year Hadoop emerged from its batch processing roots and began to take on interactive and near realtime applications. There are numerous examples that fall under these categories, but one […]

Continue reading