Efficient Textual Similarity Across Millions of Web Queries

Computing textual similarity (such as the Jaccard similarity coefficient) between millions of search queries can be an arduous task. The main challenge is the number of pairs that one needs to consider; a relatively small dataset containing ten thousand queries leads to more than 49 million possible query pairs (10,000 × 9,999 / 2 = 49,995,000). Based on the paper by Vernica et al., I show […]
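To make the scale of the problem concrete, here is a minimal brute-force sketch in Python. It is not the approach from the Vernica et al. paper; it only illustrates the Jaccard coefficient and the pair count mentioned above. The `jaccard` helper, the whitespace tokenization, the convention for two empty queries, and the sample queries are all assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two queries, treating each as a set of
    lowercased whitespace-separated tokens (an assumed tokenization)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # assumed convention: two empty queries are identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical sample queries, for illustration only.
queries = [
    "cheap flights to boston",
    "cheap flights boston",
    "spark streaming tutorial",
]

# Brute force: score every unordered pair of queries.
for q1, q2 in combinations(queries, 2):
    print(f"{q1!r} vs {q2!r}: {jaccard(q1, q2):.2f}")

# Number of unordered pairs among ten thousand queries: n * (n - 1) / 2.
print(comb(10_000, 2))
```

Scoring every pair this way grows quadratically with the number of queries, which is why an all-pairs loop stops being practical long before the dataset reaches millions of queries.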



SAS Takes Next Steps to Cloud Analytics

SAS Viya is now available as the cloud-friendly platform for SAS Visual apps and, soon, SAS 9. Next up should be more cloud-based services options. SAS, like many well-established tech vendors, has to keep one eye on the future and one eye on the past. At the April 2-5 SAS Global Forum in Orlando, FL, the […]



Cloudera Focuses Message, Takes Fifth On Pending Moves

Cloudera executives can’t talk about IPO or cloud-services rumors. Here’s what’s on the record from the Cloudera Analyst Conference. There were a few elephants in the room at the March 21-22 Cloudera Analyst Conference in San Francisco. But between a blanket “no comment” about IPO rumors and non-disclosure demands around cloud plans, even whether […]



Spark Gets Faster for Streaming Analytics

Spark Summit East highlights progress on machine learning, deep learning and continuous applications combining batch and streaming workloads. Despite challenges including a new location and a nasty Nor’easter that put a crimp on travel, Spark Summit East managed to draw more than 1,500 attendees to its February 7-9 run at the John B. Hynes Convention […]



Spark Summit East Report: Enterprise Appeal Grows

Spark adopters including Bloomberg, Comcast, Capital One and eBay share compelling use cases. Data processing, streaming and analytics use-case scenarios multiply. What’s the business case for Apache Spark? After the opening (general-session) day of Spark Summit East 2016 in New York, I was thinking that Spark promoter and Summit host Databricks needed to do a […]



MapR Ambition: Next-Generation Application Platform

MapR promises a more scalable, reliable, real-time-capable and converged alternative to Hadoop, NoSQL databases and Kafka combined. Are companies buying it? MapR is frequently mentioned in the same breath with Hadoop vendors Cloudera and Hortonworks, but maybe it’s time to stop thinking of them as competitors. Indeed, over the last eighteen months, MapR has added […]



Teradata Amps Up Cloud And Consulting Offerings

Teradata is thinking outside of box sales with VMware, AWS and Azure deployment options and new solutions and consulting services. At this week’s Teradata Partners Conference in Atlanta, the company hit several important cloud milestones with its “Teradata Everywhere” and “Borderless Analytics” announcements. And in another sign that it’s evolving, Teradata also announced a range […]



Hewlett Packard Enterprise Powers Machine Learning Apps, Revs Vertica Database

HPE streamlines use of machine learning services with Haven OnDemand Combinations. Vertica release improves performance, adds Hadoop and Spark support. Hewlett Packard Enterprise announced August 30 at its HPE Big Data Conference in Boston that it’s making its library of machine learning services easier for developers to build into smart, “cognitive” applications through Haven OnDemand […]
