Programming collective intelligence for financial trading

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Geoffrey Bradway on building a trading system that synthesizes many different models. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS. […]



InnoCentive

InnoCentive is a Waltham, Massachusetts-based crowdsourcing company that accepts, on commission, research and development problems in engineering, computer science, math, chemistry, life sciences, physical sciences, and business. The company frames these as “challenge problems” for anyone to solve. It gives cash awards for the best solutions to solvers who meet the challenge criteria.[1] […]



Data preparation in the age of deep learning

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Lukas Biewald on why companies are spending millions of dollars on labeled data sets. […]



SiSense: Scaling Users, Not Data

Amit Bendov, SiSense

In a new Saturday Night Live sketch, “Secretary Sebelius” explains that HealthCare.gov is so slow because it was designed to handle only six users at a time. We’ve become accustomed to everything online slowing down with additional users, but SiSense last week announced the general availability of patent-pending technology that makes its big data analytics engine […]



The OED, Big Data, and Crowdsourcing

The term “big data” was included in the most recent quarterly online update of the Oxford English Dictionary (OED). So now we have a most authoritative definition of what recently became big news: “data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges.” Beyond succinct definitions, the enchanting beauty of the […]



Revisiting Big Data and Crowdsourcing: Kaggle Today

I launched this blog a year ago in June 2011. In one of my first posts, I discussed “Crowdsourcing and Big Data,” offering a typology of crowdsourcing and connecting it to big data by mentioning a little-known (at the time) Australia-based venture called Kaggle. Today, Kaggle is a well-funded, Silicon Valley-based leading platform for predictive modeling […]



Building a business that combines human experts and data science

The O’Reilly Data Show podcast: Eric Colson on algorithms, human computation, and building data science teams. [A version of this post appears on the O’Reilly Radar.] In this episode of the O’Reilly Data Show, I spoke […]



“Humans-in-the-loop” machine learning systems

Next week I’ll be hosting a webcast featuring Adam Marcus, one of the foremost experts on the topic of “humans-in-the-loop” machine learning systems. It’s a subject many data scientists have heard about, but very few have had the experience of building production systems that leverage humans: Crowdsourcing marketplaces like Elance-oDesk or CrowdFlower give us access […]



Real-world Active Learning

Beyond building training sets for machine learning, crowdsourcing is being used to enhance the results of machine learning models: in active learning, humans take care of uncertain cases while models handle the routine ones. Active learning is one of those topics that many data scientists have heard of, few have tried, and a small handful know how to […]
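
The split between uncertain and routine cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method described in the post: the function name `route_by_confidence`, the `predict_proba` callable (assumed to return the probability of the positive class for one item), the 0.8 threshold, and the toy scores are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def route_by_confidence(
    items: Sequence[str],
    predict_proba: Callable[[str], float],  # hypothetical scorer: P(positive class)
    threshold: float = 0.8,                 # assumed cutoff, tune per application
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Split items into a model-labeled list and a human-review queue."""
    auto_labeled: List[Tuple[str, int]] = []
    needs_human: List[str] = []
    for item in items:
        p = predict_proba(item)
        # Confidence in a binary decision is the probability of the likelier class.
        confidence = max(p, 1.0 - p)
        if confidence >= threshold:
            # Routine case: accept the model's label.
            auto_labeled.append((item, int(p >= 0.5)))
        else:
            # Uncertain case: send to a human labeler.
            needs_human.append(item)
    return auto_labeled, needs_human


# Toy scores standing in for a trained model (made-up values).
toy_scores = {"a": 0.97, "b": 0.55, "c": 0.08}
auto, review = route_by_confidence(list(toy_scores), toy_scores.get)
```

In a fuller loop, the labels that come back from the human queue would be added to the training set and the model retrained, so the uncertain region shrinks over time.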



Crowdsourcing Feature discovery

More than algorithms, companies gain access to models that incorporate ideas generated by teams of data scientists. [A version of this post appears on the O’Reilly Data blog and Forbes.] Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons […]
