Turning big data into actionable insights

[A version of this article appears on the O’Reilly Radar.] The O’Reilly Data Show podcast: Evangelos Simoudis on data mining, investing in data startups, and corporate innovation. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. Can developments in data science and big data infrastructure […]



From search to distributed computing to large-scale information extraction

February 2016 marks the 10th anniversary of Hadoop, at a point in time when many IT organizations actively use Hadoop and/or one of the open source big data projects that originated after, and in some […]



Celebrating the real-time processing revival

[A version of this article appears on the O’Reilly Radar.] Register for Strata + Hadoop World NYC, which will take place September 29 to October 1, 2015. A few months ago, I noted the resurgence of interest in large-scale stream-processing tools and real-time applications. Interest remains strong, and if anything, I’ve noticed growth in the […]



The tensor renaissance in data science

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Anima Anandkumar on tensor decomposition techniques for machine learning. After sitting in on UC Irvine Professor Anima Anandkumar’s presentation at Strata + Hadoop World 2015 in San Jose, I wrote a post urging the data community to build tensor decomposition libraries for […]



More tools for managing and reproducing complex data projects

A survey of the landscape shows the types of tools remain the same, but interfaces continue to improve. [A version of this post appears on the O’Reilly Radar.] As data projects become more complex and data teams grow in size, individuals and organizations need tools to manage data projects efficiently. A while back, I wrote […]



A real-time processing revival

[A version of this post appears on the O’Reilly Radar blog.] Things are moving fast in the stream processing world. There’s renewed interest in stream processing and analytics. I write this based on some data points (attendance at webcasts and conference sessions; a recent meetup) and many conversations with technologists, startup founders, and investors. Certainly, […]



Let’s build open source tensor libraries for data science

[A version of this post appears on the O’Reilly Radar blog.] Tensor methods for machine learning are fast, accurate, and scalable, but we’ll need well-developed libraries. Data scientists frequently find themselves dealing with high-dimensional feature spaces. As an example, text mining usually involves vocabularies composed of 10,000+ different words. Many analytic problems involve linear algebra, […]



Forecasting events, from disease outbreaks to sales to cancer research

[A version of this post appears on the O’Reilly Radar blog.] The O’Reilly Data Show Podcast: Kira Radinsky on predicting events using machine learning, NLP, and semantic analysis. Editor’s note: One of the more popular speakers at Strata + Hadoop World, Kira Radinsky was recently profiled in the new O’Reilly Radar report, Women in Data: […]
