PC World reviews the debate over why Linpack is not a one-stop measure of supercomputer performance. One of the main reasons: it does not stress the memory systems that are critical to big data problems.
There is no universally agreed-upon standard for a big data stack. The NoSQL and big data movements are still in their infancy, and it will likely be several years before a single "LAMP"-style stack becomes the norm.
Data that telecommunications companies collect from our cell phones can be used as input to predictive algorithms. This technique is still in its early days and will keep growing as the number of data sources about our individual activities increases over the next decade.
It is interesting that Facebook chose HBase over Cassandra, its own NoSQL database, but, as the article describes, Cassandra's eventual consistency model was not the right fit for real-time messaging. Good for Facebook that technical considerations trumped potential intra-company politics: it chose not to back Apache Cassandra, which it open sourced in 2008.
EMC is making major inroads into cloud computing and the big data space. Its latest acquisition, Isilon, comes after it bought Greenplum, a data warehouse provider, in July.
PhysOrg.com describes a new supercomputer benchmark that differs from the current standard, Linpack. Linpack is a computationally intensive benchmark that runs as a small executable and measures a machine’s double-precision floating-point operations per second (FLOPS). The new benchmark, Graph500, instead measures how well a system analyzes graph-based structures over a large data set. The article mentions several fields, such as the natural sciences, in which this benchmark is a better gauge than Linpack of how fast a supercomputer can solve real-world problems.
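To make the contrast concrete: Graph500's core kernel is a breadth-first search (BFS), and its headline metric is traversed edges per second (TEPS) rather than FLOPS. The Python sketch below is only a toy illustration of that idea, not the Graph500 reference code. The function name `bfs_with_teps` and the tiny hand-built graph are hypothetical, and the real benchmark runs on very large generated graphs. Notice that the work is almost entirely following neighbour lists and checking membership, with no floating-point arithmetic, which is why this kind of workload stresses memory rather than raw compute.

```python
from collections import deque
import time


def bfs_with_teps(adj, root):
    """Run a breadth-first search and report a TEPS-style rate.

    Hypothetical helper for illustration only. adj maps each vertex to a
    list of neighbours in an undirected, simple graph, so every edge
    appears in both endpoints' lists.
    Returns (parent, edges_in_component, teps).
    """
    parent = {root: root}          # BFS tree: vertex -> parent
    frontier = deque([root])
    degree_sum = 0                 # sum of degrees of every reached vertex
    start = time.perf_counter()
    while frontier:
        u = frontier.popleft()
        neighbours = adj.get(u, [])
        degree_sum += len(neighbours)
        for v in neighbours:
            if v not in parent:    # first visit: record the tree edge
                parent[v] = u
                frontier.append(v)
    elapsed = time.perf_counter() - start
    # Each undirected edge inside the reached component was counted twice.
    edges_in_component = degree_sum // 2
    teps = edges_in_component / elapsed if elapsed > 0 else float("inf")
    return parent, edges_in_component, teps


if __name__ == "__main__":
    # Toy graph: a 4-cycle 0-1-2-3-0 plus a separate edge 4-5.
    graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0], 4: [5], 5: [4]}
    parent, edges, teps = bfs_with_teps(graph, 0)
    print(sorted(parent))  # [0, 1, 2, 3]
    print(edges)           # 4
```

On a graph this small the timing is meaningless; the point is what gets counted. Each undirected edge in the component reached from the root is counted once, which is a simplified reading of how Graph500 defines the numerator of TEPS.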
Yesterday, Google introduced a new tool for working with messy data sets and cleaning inconsistencies.
This article is worth checking out for the visualization alone. It discusses Gravity, a new startup building a web application that crawls social networking sites and produces a profile of your interests and your friends’ interests. The purpose of the profile is to show who in your social network shares interests with you. You may discover that casual acquaintances have things in common with you that you never knew about. Discovery and visualization are two major facets of the coming data revolution.
ReadWriteWeb does a good job of bringing together multiple sources to compare current NoSQL solutions.
For online businesses, user interface and data field design have a major impact on profits. This article presents an interesting case from Expedia: its website was losing millions of dollars in incomplete transactions because of user confusion.