Celebrating the real-time processing revival

[A version of this article appears on the O’Reilly Radar.] Register for Strata + Hadoop World NYC, which will take place September 29 to October 1, 2015. A few months ago, I noted the resurgence in interest in large-scale stream-processing tools and real-time applications. Interest remains strong, and if anything, I’ve noticed growth in the […]



The tensor renaissance in data science

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Anima Anandkumar on tensor decomposition techniques for machine learning. After sitting in on UC Irvine Professor Anima Anandkumar’s presentation at Strata + Hadoop World 2015 in San Jose, I wrote a post urging the data community to build tensor decomposition libraries for […]



More tools for managing and reproducing complex data projects

A survey of the landscape shows the types of tools remain the same, but interfaces continue to improve. [A version of this post appears on the O’Reilly Radar.] As data projects become complex and as data teams grow in size, individuals and organizations need tools to efficiently manage data projects. A while back, I wrote […]



A real-time processing revival

[A version of this post appears on the O’Reilly Radar blog.] Things are moving fast in the stream processing world. There’s renewed interest in stream processing and analytics. I write this based on some data points (attendance in webcasts and conference sessions; a recent meetup), and many conversations with technologists, startup founders, and investors. Certainly, […]



Let’s build open source tensor libraries for data science

[A version of this post appears on the O’Reilly Radar blog.] Tensor methods for machine learning are fast, accurate, and scalable, but we’ll need well-developed libraries. Data scientists frequently find themselves dealing with high-dimensional feature spaces. As an example, text mining usually involves vocabularies composed of 10,000+ different words. Many analytic problems involve linear algebra, […]



Time-turner: Strata San Jose 2015, day 2

[Our friends at Dato created an interesting content-based, Strata session recommender. Check it out here.] There are so many good talks happening at the same time that it’s impossible to not miss out on good sessions. But imagine I had a time-turner necklace and could actually “attend” 2 (maybe 3) sessions happening at the same […]



Time-turner: Strata San Jose 2015, day 1

[Our friends at Dato created an interesting content-based, Strata session recommender. Check it out here.] There are so many good talks happening at the same time that it’s impossible to not miss out on good sessions. But imagine I had a time-turner necklace and could actually “attend” 2 (maybe 3) sessions happening at the same […]



Forecasting events, from disease outbreaks to sales to cancer research

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Kira Radinsky on predicting events using machine learning, NLP, and semantic analysis. Editor’s note: One of the more popular speakers at Strata + Hadoop World, Kira Radinsky was recently profiled in the new O’Reilly Radar report, Women in Data: […]
