Big-data analysis is the search for buried patterns that carry predictive power. Deciding which patterns to look for takes a good deal of human intuition, as you may have seen in the movie "Limitless."
Researchers from the Massachusetts Institute of Technology (MIT) want to remove the need for human intuition in big-data analysis and let computers comb the data for predictive patterns.
The MIT researchers taking this big risk in big-data analysis are from the Computer Science and Artificial Intelligence Laboratory (CSAIL). The prototype of their software system is called the 'Data Science Machine'. To test the prototype, they entered it in three data science competitions, where it beat 615 of 906 human teams, or about 68 percent.
In two of the competitions, the Data Science Machine's predictions were 94 percent and 96 percent as accurate as the winning submissions. In the third competition, its prediction was 87 percent as accurate. The human teams took months to figure out the patterns. The Data Science Machine produced its predictions in two to 12 hours.
"We view the Data Science Machine as a natural complement to human intelligence. There's so much data out there to be analyzed. And right now it's just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving," said Max Kanter, whose computer science master's thesis is the foundation for the Data Science Machine.
Big data is massive and multifaceted. While much of the analysis is algorithmic and automated, humans are still needed to choose the features that eventually reveal the hidden patterns. That is where human intuition comes in handy: it allows an analyst to picture an end result and connect the data that leads to it.
Kanter's thesis advisor, CSAIL research scientist Kalyan Veeramachaneni, co-leads CSAIL's Anyscale Learning for All group, which applies machine-learning techniques to practical problems in big-data analysis. These problems range from determining a wind farm's capacity for generating power to predicting which students are most likely to drop out of online courses.
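To make the idea of a "feature" concrete, here is a minimal Python sketch of the kind of hand-built aggregation an analyst might try for a dropout-prediction problem. It is not the Data Science Machine's method. The event log, the field names and the `build_features` helper are all hypothetical and made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw event log: (student_id, day_of_course, event_type).
# Invented data, used only to illustrate the idea.
EVENTS = [
    ("s1", 1, "video"), ("s1", 2, "submission"), ("s1", 9, "submission"),
    ("s2", 1, "video"), ("s2", 2, "video"),
    ("s3", 3, "submission"),
]


def build_features(events, course_day):
    """Aggregate raw events into one feature row per student."""
    # Group the raw rows by student.
    per_student = defaultdict(list)
    for student, day, kind in events:
        per_student[student].append((day, kind))

    features = {}
    for student, rows in per_student.items():
        days = [day for day, _ in rows]
        features[student] = {
            # Total activity of any kind.
            "n_events": len(rows),
            # How many of those events were assignment submissions.
            "n_submissions": sum(1 for _, kind in rows if kind == "submission"),
            # A long gap since the last activity may hint at a dropout.
            "days_since_last_activity": course_day - max(days),
        }
    return features


if __name__ == "__main__":
    for student, row in sorted(build_features(EVENTS, course_day=10).items()):
        print(student, row)
```

Each dictionary key is one candidate feature. Choosing which of these aggregations are worth computing is the step that normally relies on human intuition.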
Kanter's paper on the Data Science Machine will be presented at the IEEE International Conference on Data Science and Advanced Analytics, held Oct. 19 to 21 in Sarlat-la-Canéda, France.