February 2011
19 posts
4 tags
IBM's Watson Uses Hadoop →
This Information Management article reports that IBM is using open source technologies, including Hadoop, in its Watson system that will compete on Jeopardy in February. Article: How IBM’s Watson Churns Analytics
Feb 1st
January 2011
21 posts
NoSQL at Netflix →
Yury Izrailevsky, the Director of Cloud and Systems Infrastructure at Netflix, wrote a great post on how NoSQL systems are in use at the company. The post discusses the mindset adjustment when moving away from traditional ACID database systems to systems that only satisfy two of the three CAP properties. Most big corporations would have a big job retraining their in-house IT developers to...
Jan 29th
2 tags
The Periodic Table of Google APIs →
Here is a diagram of Google’s application programming interfaces (APIs) in a format that would be familiar to anyone who’s taken high school chemistry: a periodic table. Who knew Blogger had an API? Link: Periodic Table of Google APIs
Jan 27th
2 notes
3 Skills a Data Scientist Needs →
LinkedIn has one of the best teams of big data experts in the world. In this O’Reilly video, Pete Skomoroch from LinkedIn explains what skills are necessary to excel in the data scientist role. Video: 3 Skills A Data Scientist Needs (O’Reilly)
Jan 27th
Impressions of Excella Consulting After 4 Months
I joined Excella Consulting back in September 2010 after two stints with very large consulting firms. My first experience at Booz Allen Hamilton in their Charlottesville office was great. I learned a lot, worked with great people who I still keep in touch with, and really respected the leadership in the Charlottesville office. It was a tremendous growth opportunity when I was only two years out...
Jan 25th
Who Needs Software Engineers with Big Data Skills?
Google, Facebook, LinkedIn, and Twitter have software engineers using big data analysis tools, but what other companies need these skill sets at the beginning of 2011? GitHub: Software Engineer, Big Data; Groupon: Software Engineer, Big Data Infrastructure; Amazon: Data Engineer (Amazon Web Services). Massive Data News also has a list of jobs that require big data skills:...
Jan 22nd
2 tags
Linked Data Will Succeed Where the Semantic Web... →
I never understood the W3C’s push behind the semantic web. Yes, if done right, with accurate markup and advanced parsers in browsers and applications, it could provide much of the “intelligence” currently lacking for searching the web for more than just keywords. But it seemed like too much developer work for little benefit. Also, how could anyone ensure the RDF semantic...
Jan 20th
3 tags
The Growth of Linked Data →
ReadWriteWeb has a summary of the growth of published connected and structured data, known as “linked data.” There are several great diagrams that show how major data sources continue to proliferate and integrate on the web. Article: The Growth of Linked Data (ReadWriteWeb)
Jan 20th
3 tags
Military Data Overload →
The military is producing and collecting very large amounts of data from unmanned aerial vehicles, spy satellites, communications channels, internal applications used by intelligence analysts, and reports from troops in the field. This article by the NY Times describes the result of attempting to handle all of that data: overload. The military needs new visualization techniques and analysis...
Jan 18th
MapReduce: From the Basics to the Useful →
This article is a great introduction to both NoSQL and MapReduce. The author’s goal is to explain the basic concepts, show code, and examine how MapReduce can be useful. Article: MapReduce from the basics to the actually useful (in under 30 minutes)
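As a rough illustration of the basic concepts the entry mentions, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. It is not taken from the linked article; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are hypothetical, and a real framework such as Hadoop would distribute these steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between map and reduce.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, for illustration only.
counts = word_count(["big data big ideas", "data beats opinions"])
print(counts)
```

Because each map call looks at only one document and each reduce call looks at only one key, both steps could in principle run in parallel, which is the property that makes the model useful at scale.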
Jan 15th
1 tag
7 Classic Visualization Papers →
Enrico Bertini provides a list of 7 papers that influenced the data visualization field. Enrico admits there are some newer papers that are just as influential but he chose to only include older papers that set the foundation for the discipline. Article: 7 Classic Visualization Papers
Jan 15th
Secrets of BackType's Data Engineers →
This is a great article that answers the question “how exactly can you derive meaningful information from big data using existing technologies?” BackType has 3 engineers and is currently using Hadoop and Cassandra, among other homegrown tools, to analyze Twitter, Facebook, blogs, and other user-generated content sources and provide useful information to companies that use their...
Jan 13th
2 tags
Dojo Toolkit - How to Get the Value Attribute From...
I spent a couple of hours today with Dojo Toolkit trying to figure out how to get the value attribute (instead of the text value) from the dijit.form.ComboBox widget’s option elements. A bunch of Google searches ended in finding other people who asked the same question, but no real answers. So how do you get the value attribute from a Dojo ComboBox widget’s selected option element? ...
Jan 13th
Jan 12th
The NoSQL Tapes →
The NoSQL Tapes site is a compilation of videos and case studies with influential people in the NoSQL field. The site just launched but already has several videos with many more in the “coming soon” list. Website: The NoSQL Tapes
Jan 11th
Google's BigQuery →
Google is granting limited access to its BigQuery functionality in Google Apps. BigQuery allows people to use the Spreadsheet application to run SQL-like queries against data sets using Google’s infrastructure. If BigQuery is opened to the general public and compelling use cases are created, could this become the “killer app” that Google Spreadsheet has over Microsoft Excel? ...
Jan 9th
How Twitter Uses NoSQL →
Twitter currently handles about twelve terabytes of new data daily. A couple of years ago, when Twitter was mostly a Ruby on Rails and MySQL application, infrastructure stability was a major issue for them. The difficulties prompted Twitter to move to a NoSQL solution. Considering the tremendous growth they’ve had since then and the lack of serious downtime, the switch has been very...
Jan 8th
4 Websites for Aspiring Data Journalists →
O’Reilly gives an overview of four websites that provide raw data on Web traffic and site popularity. Aspiring data journalists can analyze and extract information from these sources to find interesting patterns and combine them with other sources to create original reports. Article: 4 Websites for Aspiring Data Journalists
Jan 7th
Using Google Refine to Clean Messy Data →
Google Refine 2.0 was released late last year as free software for cleaning up messy data sets. Refine is a powerful tool for working with unstructured data, extracting value from it, and linking it to other data sets. This tutorial is a great starting point beyond Google’s own documentation. Article: Using Google Refine to Clean Messy Data
Jan 6th
ReadWriteWeb: 2011 Predictions →
In addition to Information Management’s 2011 prediction that big data will move further into the mainstream, one of ReadWriteWeb’s columnists posted a similar prediction. Audrey Watters names “Data Scientist” as the hot new occupation and predicts growth in the data storage, processing, and analytics sectors. Article: ReadWriteWeb: 2011 Predictions
Jan 5th
6 Predictions For The Year Ahead →
Information Management magazine ranks big data as one of its six big IT trends for 2011. They expect big data to move further into the mainstream as companies throw away less of the data they produce, anticipating that they can extract value from it in the future. Article: 6 Predictions For The Year Ahead
Jan 4th