I’m blogging over at http://www.mattmakai.com/ now. Tumblr’s been great over the past two years but I wanted more flexibility to revamp my web presence. My latest posts since January 1, 2012 are all over on my new site.
Link: Matt Makai’s Blog
At last, a version of Django that isn’t a maintenance release is out, albeit in alpha form (not for production use):
https://www.djangoproject.com/weblog/2011/dec/22/14-alpha-1/
I’m currently researching build tools. I need a build and configure tool for an iOS/Android/BlackBerry mobile app with a single set of core components but many configurations for different app deployments. I could hack something together with Python or Bash shell scripts, but there’s a reason why build tools exist. Options:
Building and Testing with Gradle by Tim Berglund and Matthew McCullough, an O'Reilly book, was the logical pick. I saw Matt speak on Hadoop & Git at No Fluff Just Stuff last year and really enjoyed his clear, concise explanations. This book is also clear and concise, and it only took a few hours to read and work through the examples.
My notes for the book are on my programming notes Github repository. The notes won’t make as much sense without reading the book, so I suggest picking up the eBook copy over at O'Reilly.
Over the next couple of days I’ll be working through producing a Gradle build for the mobile app I’m working on for Excella. I’ll post my full impressions of Gradle after I’ve had more time with the tool.
Android and BlackBerry expect two different folder locations for resources. Android has its assets/www folder while BlackBerry expects just a www folder under the project root directory. Since the BlackBerry .cod file is opaque after compilation, it makes it difficult to understand how resources are structured in the file.
I found that putting the BlackBerry config.xml file under assets/www and referencing my HTML file within the assets/www folder worked fine as long as there is a www folder in the root of the project which contains plugins.xml, the ext folder with the phonegap.1.2.0.jar file, and a resources directory with my icons and splash screen image.
Make sure to modify the BlackBerry build.xml file to reference the config.xml file in the assets/www folder and to include both the assets/www folder as well as the www under the project’s root directory.
With this folder set up, I was able to successfully deploy to both Android and BlackBerry without duplicating any files within the project structure.
Once I integrate iOS into the project structure I’ll post the code on Github.
I spent a couple of hours yesterday evening trying to figure out why I only got a blank white screen after the app splash screen on the BlackBerry 9930 with a JQM 1.0 and PhoneGap 1.2 app that worked fine on the 9550 simulator.
Apparently this problem has been duplicated and the “recommended” fix is to use JQM Alpha 4 (not an option for me in this case). See this post on Google Groups.
I’ve opened an outstanding Github issue ticket for the problem and hopefully can extract the code for analysis.
Dear BlackBerry,
I’m going out of my way to develop a cross-platform app that will work on your devices. Don’t inconvenience me by making me download some “Akamai Net Session” software which I have no idea what it does before I can even download your simulator or SDK files.
Thanks,
Matt
I upgraded from PhoneGap (Apache Callback) 1.1 to version 1.2 today. Unfortunately, I encountered the following cryptic stacktrace:
ERROR/AndroidRuntime(9469): FATAL EXCEPTION: Thread-9
java.lang.RuntimeException: Can’t create handler inside thread that has not called Looper.prepare()
at android.os.Handler.<init>(Handler.java:121)
at android.webkit.WebView$PrivateHandler.<init>(WebView.java:7341)
at android.webkit.WebView.<init>(WebView.java:416)
at android.webkit.WebView.<init>(WebView.java:967)
at android.webkit.WebView.<init>(WebView.java:957)
at android.webkit.WebView.<init>(WebView.java:948)
at com.phonegap.DroidGap.init(DroidGap.java:268)
at com.phonegap.DroidGap.loadUrlIntoView(DroidGap.java:381)
at com.phonegap.DroidGap.access$300(DroidGap.java:159)
at com.phonegap.DroidGap$3.run(DroidGap.java:537)
at java.lang.Thread.run(Thread.java:1102)
The same stacktrace is found under issue #23 for PhoneGap.
Unfortunately, PhoneGap 1.2 isn’t patched yet to accommodate the issue. So there are two options if you hit this snag:
Hopefully this issue will be resolved soon with a new PhoneGap release (maybe a 1.2.1 release)?
Recent college grads and young professionals, listen up: you’ve got 5-10 years. The labor market has been changing for a couple of decades and will continue in its trajectory. Today’s reality is that large companies are willing to pay you a decent wage for your first 5-10 years' worth of work. After that, the jig is up.
Depending on your industry, after your first 5-10 years, you need to prove you’re worth real money. You can do that in one of two ways, depending on your personal strengths and weaknesses (be honest with yourself about them). The two choices are:
There will be no other reliable options. There will always be other options, but they won’t be reliable. They are mirages held out by large companies to trick you into continuing to work for them until they lay you off. For example, you could try to get into mid-management and “climb the ladder.” Chances are high you will be laid off in that position long before you ever reach the executive ranks. There are simply not enough “executive” positions to go around.
So knowing you have 5-10 years, you need to prepare today for that inevitable future. Whether you work for a non-profit, a law firm, a tech firm, or even in government, you need to figure out how you can successfully start your own firm or join an existing firm and be directly responsible for its growth. That’s the reality for our generation. If we embrace it, we can make it work for us.
I came across this blog post on the Python Ecosystem while reading Hacker News yesterday. What a fantastic post. Highly recommended reading for aspiring Python developers.
Link: Python Ecosystem
To understand Django, it helps to know its origins and how the framework has evolved. Django’s developers have added many features since the framework’s 1.0 release back in September 2008. Some of the major additions include aggregation support in the ORM, multi-database support, CSRF protection, a messaging framework, and many improvements to the models framework. This post outlines the major framework changes Django has gone through from 1.0 through the current stable 1.3 release. The intention for 1.x releases is that only minor code changes are necessary to transition Django projects from one version to another.
Django 1.0
(1.0 SVN Revision 8961, 1.0.4 SVN Revision 11613) [1]
Django 1.0 was released September 3, 2008 after a three year public incubation period and a total of five years of development [2]. Release 1.0 included features that are still core to the framework today, including the Model-Template-View (MTV) architecture, Object-Relational Mapper (ORM) models, explicit URL resolution with regular expression support, and a lightweight template system.
Django 1.1
(1.1 SVN Revision 11366, 1.1.4 SVN Revision 15477)
Django 1.1 was released July 29, 2009, just shy of 11 months after the official 1.0 release [3]. Version 1.1 included the following new features:
Django 1.2
(1.2 SVN Revision 13285, 1.2.5 SVN Revision 15476)
Django 1.2 was released May 17, 2010, approximately 10 months after the 1.1 release [4]. Major features added to Django in this release:
Django 1.3
(1.3 SVN Revision 15906, 1.3.1 SVN Revision 16771)
Django 1.3 was released on March 23, 2011 [5]. New features in Django 1.3 included:
Django 1.4 and Beyond
There is currently no release date set for Django 1.4 although work continues in the public development branch. Django 1.4 will drop support for Python 2.4. The motivation behind the change is to use more context managers, a creation from Python 2.5, and make the internal Django code better. I wrote a post on Django 1.4 and how one of the Django core committers, Alex Gaynor, sees Django 1.4.
[2] Django 1.0 Release Announcement
[3] Django 1.1 Release Announcement
JQuery Mobile 1.0 has been released! A big part of the release is the maturity of the project and speed up in page rendering time (a big previous complaint especially when creating mobile native apps wrapped with PhoneGap).
A note of caution: JQuery Mobile 1.0 is only compatible with JQuery 1.6.4. JQuery 1.7 will be supported by JQuery Mobile 1.1.
I’m currently working on moving from my traditional Apache/mod_wsgi set up on Ubuntu to the new Django-serving community favorite, Gunicorn/Nginx.
I found the following links very beneficial for getting an initial set up going locally then moving over my production servers to the new configuration:
Python Weekly has provided me a lot of value over the past couple of weeks. I originally found out about the newsletter on Hacker News. I thought, “I’ll try one newsletter then unsubscribe if it’s not worth my time.”
Boy have I been pleasantly surprised. There are loads of great articles on Django deployments, best practices for settings.py configurations, interesting pip packages, and tips on Python programming best practices.
Thanks to the curator for putting together what has quickly become a must-read for Python developers.
You can check out an example newsletter and sign up at http://www.pythonweekly.com/.
My primary piece of advice to college students who are computer science majors is this: double major.
Computer science is great for understanding how computers work, programming, and learning the theory of computation. But where it really matters is how you apply those principles to real world problems that exist outside the computer science field.
If you double major, you’ll be exposed to a different discipline and begin to understand its problems. Hopefully down the road you can use your computer science knowledge to create solutions to those problems and produce real value in that field.
That’s my 2 cents for college computer science majors coming from someone who’s far enough outside of school to have some perspective on how you can produce value.
I do the majority of my development work on virtualized Ubuntu instances to closely mimic my production deployment environment. Today I needed to access a Django server running on the VirtualBox instance from my host operating system (Windows 7).
I was simply using the built-in Django server (manage.py runserver) running on a high-level port instead of deploying to Apache or gunicorn. To do a pass through with this set up, use the following 3 steps:
Now you should be able to access the Django server from your host OS through the browser at 10.38.1.119:8000 (again, replace with your specific IP and port number).
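Before opening the browser, it can help to check from the host that the port is reachable at all. Here is a minimal sketch using only Python's standard library. The helper name is_port_open is made up for this example, and the IP and port in the comment are the ones from this post, so substitute your own.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake, which is enough to
        # show that something is listening and reachable on that port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call with the values from this post (replace with your own):
#     is_port_open("10.38.1.119", 8000)
```

If this returns False, the likely suspects are the VirtualBox network settings or the address the Django server is bound to, not the browser.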
References:
[1] How to connect from Windows 7 to localhost on Ubuntu VirtualBox
I needed push notification support for the Android C2DM platform for a Django project, but Urban Airship’s Python libraries only supported iOS push notifications.
So I forked the code on Github and modified it to incorporate support for C2DM APIDs. There’s an outstanding pull request (just issued), so hopefully it gets integrated back into Urban Airship’s official master branch.
Thanks to Excella Consulting for allowing me to contribute this code back to the community!
There’s been a spate of links on Hacker News lately about the failings of MongoDB and 10gen (see links at the end). I see this as a very good thing, not because I want NoSQL in general and MongoDB in particular to fail, but because it is a sign of maturation. Developers are doing really interesting work with MongoDB and they are hitting the limits of the technology. There’s criticism of 10gen’s working process and concern over implementation choices.
If these concerns are addressed, MongoDB will be a much better, more mature product in the long run. We can only hope that CouchDB, Riak, and other document-oriented data stores receive the same amount of attention and feedback to address their unique sets of issues.
I recently realized I’m constantly looking for a fix. Not from an external drug or chemical substance, but from flow. I get flow most commonly from programming, although I’ve felt it before while writing and working out.
It’s scary though because I am constantly hungry for flow. If I haven’t had it in a while I go looking for it. I browse Hacker News and Reddit Programming looking for new languages and libraries to learn. Recently I picked up Stripe and it provided a fix for a while, just like Clojure, Hadoop, and other tools and languages before it.
But on some level I feel like an addict. Wikipedia describes addiction as
… a continued involvement with a substance or activity despite the negative consequences associated with it.
Does the lack of (obvious) negative consequences make what feels like an addiction okay? Does the fact that I am constantly learning in order to get into flow make it alright, since it makes me better at my software development job? On the surface these seem like good things to do, but maybe there’s something out of balance when you’re constantly looking for the fix.
There’s a great discussion going on over at Hacker News about people’s opinions on what happened to CouchDB’s popularity as compared to MongoDB (and other NoSQL data stores).
My guess is that MongoDB took off lately as 10gen really gained traction with their outreach to developers while CouchDB is still fragmented despite the backing of Cloudant and Couchbase. I also found MongoDB easier to get started with than CouchDB. It is possible developers who recently learned about NoSQL considered MongoDB to be a better starting point to learn than CouchDB.
Also, here’s my introduction to MongoDB. Here’s my installation guide to CouchDB on Ubuntu (introduction to functionality coming soon).
I found this blog post really insightful for understanding the differences between MongoDB and CouchDB. The key takeaways for me were in understanding the difference between the way querying works (views in CouchDB versus find queries in MongoDB) and availability versus replication (continuous availability during network partitions in CouchDB versus one master plus replication in MongoDB). This post is worth reading in its entirety.
I needed to piece together several sources (see the links at end of this post) to install CouchDB 1.1.0 on Ubuntu 10.04 LTS (Lucid Lynx).
First, get build essentials:
sudo apt-get install build-essential
Next, install SpiderMonkey 1.8.5:
wget http://ftp.mozilla.org/pub/mozilla.org/js/js185-1.0.0.tar.gz
tar -xvf js185-1.0.0.tar.gz
cd js-1.8.5/js/src
./configure
make
sudo make install
Get the libraries required for making, configuring, and installing CouchDB:
sudo apt-get install xulrunner-dev
sudo apt-get install erlang libicu-dev libcurl4-openssl-dev
Inside the CouchDB installation’s bin directory, run:
./configure --with-erlang=/wherever/your/erlang/install/is --with-js-lib=/usr/local/lib/ --with-js-include=/usr/local/include/js/
You should see:
You have configured Apache CouchDB, time to relax.
Run (enter your sudo password when prompted during the installation):
make && sudo make install
After installation, you’ll see:
You have installed Apache CouchDB, time to relax.
Next, run CouchDB by running this command from within CouchDB’s bin directory:
sudo ./couchdb
You’ll see:
Apache CouchDB has started. Time to relax.
Finally, browse to http://localhost:5984 or http://localhost:5984/_utils/ to make sure everything worked.
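The same check can be scripted instead of done in a browser. This is a minimal sketch with Python's standard library; the function name couchdb_welcome is made up for this example, and it assumes the server is on the default URL used above. A running CouchDB normally answers the root URL with a small JSON document that includes its version.

```python
import json
import urllib.request


def couchdb_welcome(url="http://localhost:5984", timeout=5.0):
    """Return the parsed JSON from CouchDB's root URL, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection problems (urllib's URLError subclasses it);
        # ValueError covers a response body that is not valid JSON.
        return None
```

A return value of None means nothing usable answered at that URL, which usually points to the server not running or listening on a different port.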
Sources:
I’m working on a cross-platform mobile app with the following technologies:
While in the Urban Airship web console, I was creating a new application when I came upon the section specific to Android. The first field, Android Package, was easy enough (com.mobile.app). But I wasn’t quite sure what to put in the C2DM Authorization Token field.
After further research, I found I have to manually register for the Android C2DM service. My access was approved about two hours later.
Next step, how to create that C2DM Authorization Token? Luckily, Urban Airship provides a Python script to create the token. Within the Urban Airship Android download, there is a script named clientauth.py under the tools directory.
Just run clientauth.py, enter the Google account that was approved for access, enter your account password, and clientauth.py will print your C2DM authorization token. Make sure to keep this token! Then paste the token into the Urban Airship field and you’re ready to create your Urban Airship application.
There is a dearth of up-to-date Django books that cover 1.3+ and the latest community projects. However, Reinout van Rees just announced he is working on a new Django book currently titled “Solid Django.” Hopefully this will lead to further interest in the Django project and continue the positive momentum for our community.
Oh Python, you make a developer’s life so easy.
I executed a “python manage.py dumpdata > db.json”, which sent the contents of my Django-created database out to a file. However, I realized that the results were all on one line and I wanted to go through the output to create some test data. A quick “more db.json | python -mjson.tool > db-pretty.json” command transformed the whole thing into a more readable format.
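The same transformation can be done from inside a script with the standard json module. This is a minimal sketch; the helper name pretty_json and the sample fixture values are made up for illustration.

```python
import json


def pretty_json(text, indent=4):
    """Re-serialize a JSON string with indentation, one item per line."""
    return json.dumps(json.loads(text), indent=indent)


# A one-line fixture shaped like database dump output (values are made up).
one_line = '[{"pk": 1, "model": "app.coffee", "fields": {"name": "Latte"}}]'
print(pretty_json(one_line))
```

The data is unchanged by the round trip; only the whitespace differs, so the pretty version can still be loaded back as a fixture.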
This is just one of many examples of why I love Python programming.
The reviews are starting to come in for JQuery Mobile: First Look, for which I served as a technical reviewer before publication. Looks like people like it so far.
Cornell University presents some findings that dispute the supposed health benefits from standing desks, including height-adjustable desks that go from sitting to standing position.
In short, the researchers suggest sitting to do computer work but getting up every 20-30 minutes and moving around. The moving around part is critical: a short walk to get a drink of water, go to the bathroom, or head to a meeting is the best way to prevent the negative side effects on your body from sitting all day.
Having a bunch of little successes over time is a lot more fun than a big stinking failure after working hard on a project for years.
Well, of course, right? Isn’t it obvious that success is better than failure?
Yet why do most organizations, especially big companies, continue to produce big failure after big failure?
Look at what the federal government spends on failed IT projects each year. This happens across many agencies: FBI, DoD, DoL, DoJ, USPTO, etc. The private sector doesn’t do much better either. Look at the cluster that is HP’s TouchPad. Or what’s happening with Yahoo!.
There’s a solution for most of these failures: building from small successes instead of some pie in the sky idea that may not correspond to reality. This approach is essentially what the Lean Startup methodology teaches and a big part of what Agile software development is about.
Build from actual strengths that are grounded in reality, not from how you envision yourself or your organization in your head. Keep the fun little successes coming and in time you’ll create a big success without the risk of the big failure.
I love my home made standing desk at my apartment. I also occasionally convert my desk at work to standing position with some boxes so I can be more productive. Apparently it’s catching on. Here’s a WSJ article on standing desks.
Link: Standing Desks Are on the Rise (WSJ)
Arin Sime, founder of AgilityFeat and one of my former classmates at UVA, put together this great email course on the Agile Methodology. It’s great for those unfamiliar with Agile (especially clients you are trying to convince to use it!) and for brushing up on your concepts.
Highly recommended!
At 26 minutes into this video, the presenter gives a great summary and example of what a closure is in general and in JavaScript. The entire video is worth watching if you have the time.
One of my coworkers didn’t like my Java method with 10 or so parameters during a code review. I admitted it looked awful and that modifying the parameters passed in was a serious annoyance. We ended up refactoring the code to use an intermediate object, which worked fine. But I wondered why I hadn’t faced the same thing in Python during my Django projects.
Then I realized - when specifying any more than a couple of parameters in Python I name them as they go in. Named parameters make it easy to mix and match the parameters without getting confused.
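Here is a small sketch of what that looks like. The create_order function and its parameters are hypothetical, chosen only to have enough arguments to be easy to mix up.

```python
def create_order(coffee, size="medium", room_for_milk=False, price=0.0, notes=""):
    """Hypothetical function with enough parameters to be easy to mix up."""
    return {
        "coffee": coffee,
        "size": size,
        "room_for_milk": room_for_milk,
        "price": price,
        "notes": notes,
    }


# Keyword arguments can be passed in any order and each value is labeled
# at the call site, so there is no need to remember positions.
order = create_order(
    "Latte",
    notes="extra hot",
    price=4.95,
    room_for_milk=True,
)
print(order["size"])  # the default is used for anything left out
```

Adding another optional parameter later does not break existing callers, which is much of what the intermediate object buys you in Java.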
Java needs to get with the program. Except for C and C++, Java is the only major language that doesn’t support named parameters.
There was an interesting video of a talk given by Alex Gaynor, a Django core committer, on the direction of the framework. Here’s the summary of what’s coming.
There is currently no release date set for Django 1.4 although work towards 1.4 happens in the public development trunk that can be checked out with SVN (http://code.djangoproject.com/svn/django/trunk/).
Django 1.4 will drop support for Python 2.4. The motivation behind the change is to use more context managers, a creation from Python 2.5, and make the internal Django code better.
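For anyone who has not used them, here is a minimal sketch of a context manager built with the with statement and contextlib, both of which arrived in Python 2.5. The tracked helper is made up for this example and is not taken from Django's code.

```python
from contextlib import contextmanager


@contextmanager
def tracked(name, log):
    """Record entry and exit in log, even if the body raises."""
    log.append("enter " + name)
    try:
        yield
    finally:
        # The finally clause runs whether the body succeeded or failed.
        log.append("exit " + name)


events = []
with tracked("query", events):
    events.append("working")

print(events)
```

The appeal for framework internals is that setup and cleanup live in one place instead of being repeated in try/finally blocks at every call site.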
Alex sees future base Django installations coming with less “stuff” by default with the option to install packages specific to your needs. For example, the EmailBackends would be an interface that if you needed SMTP support you would install a separate module to use.
Several current efforts for Django 1.4 or beyond that are going on are template compilation, composite fields, and making templates and forms better. Template compilation is a refactoring of the way the templating system works behind the scenes. The idea is to genericize the way templates work so other templating systems can use the shared infrastructure. Composite field improvements will enable better ways to query the database. Forms are difficult to work with in templates so this effort aims to improve how forms can be used within templates.
It’s interesting to see the dramatic change going on in the Rails community over 3.0/3.1 compared with the steady plodding along of Django. As both a Java and Python developer, it’s much easier to keep up with Django and know that my framework knowledge won’t be out of date within a year.
If you have a team of 16 developers and they spend 30 minutes in a meeting that does not help you get the product out the door, you’re 1 man day behind where you were before you walked into the meeting.
For every additional 15 minutes the team spends sitting around and not solving technical or project problems beyond that 30 minute meeting, you’re losing a ½ man day of work.
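The arithmetic is simple enough to write down. This sketch assumes an 8-hour work day, which is the assumption behind the numbers above; the function name is made up for the example.

```python
def man_days_lost(attendees, minutes, hours_per_day=8):
    """Person-days consumed by a meeting, assuming an 8-hour work day."""
    return attendees * (minutes / 60) / hours_per_day


print(man_days_lost(16, 30))  # the 30 minute meeting: 1.0
print(man_days_lost(16, 15))  # each additional 15 minutes: 0.5
```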
People need to meet to pool talents and solve problems, but everyone in the room that isn’t contributing is simply wasting her time. Math doesn’t lie: you’re wasting man hours sitting there not solving problems. Eliminate unnecessary meetings or you won’t make your deadline.
As I contemplate moving from MySQL (or Google BigTable for AppEngine) backends to PostgreSQL for Django projects, this OSCON 2011 presentation has been valuable to me.
The presentation goes into model design, efficient ORM usage, and database debugging. Worth a read for all Django developers.
Link: Unbreaking Your Django Application
Also, here’s a related Reddit Django subreddit thread on switching from MySQL to PostgreSQL: Should I Switch from MySQL to PostgreSQL?
For better or worse, the following rant is so true. Whether executed in the large or the small, "big data" does not matter if your organization refuses to see the results of data analysis.
In addition, I agree that simply dumping data in big piles and then performing analysis on it will not lead to any insight into topics outside the domain of the data. That idea seems obvious but it often gets lost when people get all excited about big data.
I ran into this cryptic little error when trying to initialize a new Gondor project with the gondor init [key] command:
ERROR: must run gondor init from a Django project directory.
Digging through the source code for the Gondor client revealed where the error was coming from:
files = [
    os.path.join(os.getcwd(), "__init__.py"),
    os.path.join(os.getcwd(), "manage.py")
]
if not all([os.path.exists(f) for f in files]):
    error("must run gondor init from a Django project directory.\n")
Although I had manage.py, I was missing the __init__.py Python package directory file since I copied part of an existing project into a new directory instead of using the django-admin.py commands.
So if you get this error just double check that you have both manage.py and __init__.py files in the same directory as you try to init a new Gondor project.
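To see which file is missing before running the command, a check along the same lines can be run by hand. This is a sketch modeled on the snippet above, not part of the Gondor client; the function name is made up.

```python
import os


def missing_project_files(directory, required=("__init__.py", "manage.py")):
    """Return the required file names that are absent from directory."""
    return [
        name for name in required
        if not os.path.exists(os.path.join(directory, name))
    ]


# An empty list means both files are present in the current directory.
print(missing_project_files(os.getcwd()))
```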
I just finished reading The Monk and the Riddle by Randy Komisar. The book was a quick and interesting read. Here are three things I took away from the story:
I looked at Sencha Touch over the weekend. Since I have experience with JQuery Mobile, I wanted to know how the two frameworks compared. Two things struck me so far:
I might be wincing at my lack of knowledge in this post as I gain more experience with Sencha Touch, but that’s my 2 cents so far.
Redis is getting a lot of buzz for its fast read/write performance and its innovative use cases beyond just being a key/value store. For example, at the second Big Data DC meetup last week, Nick Kleinschmidt of Lucid Media discussed how they are using Redis at his firm for online display advertising.
Today the top link on Hacker News is how to add Redis to your current stack. It’s a great piece that explains what Redis is and how you can use it to augment your existing web application infrastructure.
Despite some consternation from the “you’re doing storage wrong” traditional SQL crowd, the “Not Only”-SQL movement is great for innovation in the storage space. More of these innovative use cases will continue to come up as NoSQL solutions with different flavors of storage formats and CAP Theorem choices proliferate.
I use a simple test with every prospective employer or client as an aid to determine their culture:
Responses range between two extremes:
This is where it gets interesting. It’s a good sign if employees are interested in your outsider viewpoint and want to talk further. They care enough beyond their day to day tasks to discuss a topic related to their company or industry and actively seek outside perspectives. It’s similar to how there’s a correlation between better software developers and programming outside of work.
Proceed with caution where there is apathy or hostility towards your viewpoint. Even if you’re incorrect in what you wrote because you don’t have a clear picture of the company or industry, you should never be belittled for taking the time to write down your perspective.
That’s my litmus test for prospective employers and clients: do the employees care enough about their company and industry to actively engage me before I perform work for them? Do they value my input and my commitment to their mission? Will they view me as a respected peer or a butt in a seat they order around? An affirmative answer to these questions is critical to the success of highly motivated employees and can be found in part by performing this blog post litmus test.
Well, I can’t say it any better myself so I’ll just point you to this fantastic rant on how you should actually be interviewing software development candidates.
LexisNexis just open sourced their data analysis platform, HPCC.
Competition in the data analysis space is a good thing. I don’t know enough about HPCC to compare it to Hadoop just yet. However, developers have to learn a new language, ECL, which has a relatively sophisticated syntax, to run analysis jobs on HPCC.
I find it unlikely that most developers will be willing to spend the time to learn that new language until a community springs up to show what advantages HPCC offers over Hadoop. The supposed advantage of ECL’s conciseness in expressing analysis jobs is in relation to Java. The real test comes when comparing ECL to Clojure and Scala, much better programming languages for concise MapReduce jobs.
Further reading: LexisNexis open sources Hadoop competitor (GigaOm)
Document-Oriented Data Stores
A document-oriented data store extends the key-value pair model by providing a structure that the data store understands.[1] Document-oriented data stores are inspired by Lotus Notes[2] and the simplicity of the JavaScript Object Notation (JSON) format. The current leading document-oriented data stores are MongoDB, CouchDB, and Riak.
MongoDB
MongoDB is an open source document-oriented data store favoring the Consistency and Partition tolerance principles of the CAP Theorem. The term ‘Mongo’ comes from ‘humongous,’ as in the amount of data MongoDB allows you to store in its non-relational structure.
The company 10gen actively leads development on MongoDB and coordinates open source contributions. Core MongoDB functionality is written in C++ and official drivers are available for Java, C++, Python, Ruby, Scala and several other languages.[3] Drivers for Clojure, Groovy, R, Erlang, and many other languages are supported by community efforts. 10gen also provides commercial training and services to generate revenue which is partially reinvested in the data store’s development.[4]
MongoDB should only be run under 64-bit operating systems because of the way it addresses the data store. The limitation stems from MongoDB’s storage implementation using memory mapped files for performance reasons.[5] Running on 32-bit systems will work, but MongoDB will only be able to store about 2.5 gigabytes total - fine for some local development work but not for most production software.
Lingo
There are several terms commonly used in MongoDB literature:
Data Storage Structure
Data are stored and represented as documents in Binary JavaScript Object Notation (BSON).[6] The BSON notation is identical to standard JavaScript Object Notation (JSON) for most structures. For example, here is an order for a single coffee at a cafe:
{
    "_id" : ObjectId("4de2fefcfe376e36c3bc620b"),
    "coffee" : "Americano",
    "room_for_milk" : false,
    "price" : 3.95
}
In the preceding example there are four keys: “_id”, “coffee”, “room_for_milk”, and “price”. Each of the keys has a single corresponding value: ObjectId(“4de2fefcfe376e36c3bc620b”), “Americano”, false, and 3.95, respectively. Each value has a data type. ObjectId(“4de2fefcfe376e36c3bc620b”) is an object identifier that is automatically generated by the MongoDB data store upon insertion of the document. “Americano” is a string. false is a Boolean. 3.95 is a float (note that floats should not be used to store monetary values in a production setting because of inaccuracies in rounding). The four keys and values are wrapped in curly braces and the resulting structure is called a document.
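The warning about floats and money is easy to demonstrate. The sketch below is plain Python with no database involved; it shows the rounding problem and two common ways around it (exact decimals, or integer cents). The variable names are made up for the example.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly,
# so small errors show up in ordinary arithmetic.
print(0.1 + 0.2 == 0.3)  # False

# Decimal values built from strings keep exact decimal digits.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True

# One common alternative is to store integer cents instead of dollars.
price_in_cents = 395
total_in_cents = 3 * price_in_cents
print(total_in_cents)  # 1185
```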
There are six basic JSON data types as well as several additional data types in MongoDB. The original six JSON data types are:
MongoDB’s extended types beyond the basic JSON data types are:
MongoDB’s schema-less design allows the creation of documents with variable structure. The variable structure works well for rapid prototyping and avoids having to alter tables to add new attributes to documents. However, the schema-less design also rules out the constraints that SQL databases use to standardize data.
There is less normalization involved in a typical MongoDB set up because there are no server-side joins.[7] Instead of joining separate relational tables, embedded objects can be inserted inside documents.
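The difference is easier to see side by side. This sketch uses plain Python dictionaries as stand-ins for relational rows and for a document; the customer fields are made up and no database is involved.

```python
# Relational style: the customer lives in its own table and the order
# refers to it by id, so reading both requires a join.
customers = {1: {"name": "Alice", "email": "alice@example.com"}}
order_row = {"coffee": "Americano", "price": 3.95, "customer_id": 1}

# Document style: the customer details are embedded in the order itself.
order_document = {
    "coffee": "Americano",
    "price": 3.95,
    "customer": {"name": "Alice", "email": "alice@example.com"},
}

# The "join" has to be done by hand for the relational rows...
joined_name = customers[order_row["customer_id"]]["name"]
# ...while the document already carries everything in one lookup.
embedded_name = order_document["customer"]["name"]
print(joined_name == embedded_name)
```

The trade-off is duplication: if the same customer appears in many orders, the embedded copy has to be updated in each one.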
Inserting
Data manipulation in MongoDB can be performed through the shell and its JavaScript syntax. For example, here is the syntax for inserting a document into the “mydb” collection:
> db.mydb.insert({"coffee" : "Latte", "price" : 4.95, "notes" : "customer wants room for milk"})
Inserts are non-blocking by default and do not wait for a response from the server. You can also specify “safe inserts” that wait for a response value from the server indicating whether the operation was successful or had an error.
Batch inserts are much faster than incremental data insertion. The MongoDB team recommends preallocating space with blank documents when performing numerous inserts of a predefined size.
Querying
10gen also touts MongoDB’s dynamic query language as a core feature that is critical to accelerating the development process. The query language is not SQL; instead, it is based on key and value matching. For example, here is a query to find all the documents with a value of “Latte” for the “coffee” key in the mydb collection:
> db.mydb.find({"coffee" : "Latte"})
The result of this command after executing the insertion from the previous section is:
{ "_id" : ObjectId("4df75a03d30a7515a35f5942"), "coffee" : "Latte", "price" : 4.95, "notes" : "customer wants room for milk" }
Note that querying on keys and values is case sensitive. If you instead used the following command…
> db.mydb.find({"COFFEE" : "latte"})
… the mydb collection would return no matching documents.
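The matching behavior above can be modeled with a toy function in plain JavaScript. This is only a sketch of exact key and value matching on top-level fields, not how MongoDB implements queries; the find function and the in-memory mydb array are assumptions made for illustration.

```javascript
// Toy model of exact key/value matching; NOT MongoDB's implementation.
// A document matches when every key in the query exists on the document
// with a strictly equal value (top-level primitive values only).
function find(collection, query) {
  return collection.filter(function (doc) {
    return Object.keys(query).every(function (key) {
      return Object.prototype.hasOwnProperty.call(doc, key) &&
        doc[key] === query[key];
    });
  });
}

// In-memory stand-in for the mydb collection.
const mydb = [
  { coffee: "Americano", room_for_milk: false, price: 3.95 },
  { coffee: "Latte", price: 4.95, notes: "customer wants room for milk" }
];

const matches = find(mydb, { coffee: "Latte" });
console.log(matches.length); // 1

// Keys and values are compared exactly, so case differences never match.
const misses = find(mydb, { COFFEE: "latte" });
console.log(misses.length); // 0
```

Because the comparison is strict equality on both the key name and the value, “COFFEE” and “latte” fail to match for the same reason they do in the shell.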
That covers MongoDB’s background information, basic inserting, and querying. Next post I’ll cover updating, deleting, capped collections, and a few other things.
[1] http://stackoverflow.com/questions/3046001/what-does-document-oriented-vs-key-value-mean-when-talking-about-mongodb-vs-ca
[2] http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html
[3] http://www.mongodb.org/display/DOCS/Drivers
[5] http://blog.mongodb.org/post/137788967/32-bit-limitations
[6] http://www.mongodb.org/display/DOCS/BSON
[7] http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-Embedvs.Reference
Bob Gleichauf from In-Q-Tel wrote an interesting article entitled “Beyond Data.” As the article discusses, the intelligence sector adds complexity onto the difficult job of sorting, searching, and understanding large data sets. Some of the challenges Bob wrote about include:
There are also many additional challenges:
One of the most interesting ideas from the article was embedding great developers with intelligence analysts to create and execute very complicated queries. I’m sure some agencies are doing this already but from my experience it’s not a common practice. (Private industry may need to do this in the future as well but that’s a different topic.)
A second important idea is the concept of allowing full search capabilities but masking search output when a user’s clearance is not high enough to see results. This is a very hard problem that involves user authority and access management, metadata mark up, and clear, unambiguous rules for clearance resolution.
Finally, one last concept isn’t in the article but is crucial: the government needs to be careful about throwing money at hard problems. Building information systems (including data analysis systems) isn’t like designing a new fighter jet. It’s amazing what a small team of six to eight capable software developers with a passion for intelligence community domain challenges can accomplish when given access to large data sets and the freedom to choose their own tools. That’s why companies like LinkedIn, Facebook, and Google are successful with using data to generate business value.
Article: Beyond Data (IQT Quarterly), see also Data Science in the U.S. Intelligence Community (IQT Quarterly)
The Register has a great article on Google App Engine, Google’s scalable Platform-as-a-Service that will be removing the beta label later this year. App Engine is built upon BigTable, Google’s proprietary Column Family NoSQL data store. I’ve created several apps on Google App Engine, including http://www.mattmakai.com/ and http://scholarmaker.com/. Once you get past the standard Column Family data store quirks and understand the App Engine API (I used the Python version), it’s very easy to deploy an app and have it ready to scale to potentially millions of visitors.
Article: Google App Engine (The Register)
I needed to go to college to be successful. I required the disciplined studying, mentoring from my professors, social learning through meeting new friends, and enriching experiences from the general community at James Madison University. Even though I’ve been using computers since I was 3 years old and programming since sixth grade, I needed classes on operating systems, programming languages, computer networking, and information security to be successful in my career.
So I watch with some dismay as influential figures rail against the college model. I agree with a lot of the things Peter Thiel discusses. It’s important to provide an alternative development model for insanely smart people. Some people don’t need college because they already have all of the drive and intelligence to get started now. College simply slows those insanely smart people down!
But I’m not that smart. I simply was not ready to be a full-time member of the real world until I spent countless hours studying in the library and in front of a computer learning computer science.
I don’t think I’m unusual. Sure, in information technology I would hire a better software developer with a high school degree over a developer with a college degree any day. But I rarely see that. While it’s possible in theory to be successful without a college degree in information technology, completing a computer science degree at a good college is a strong signaling mechanism. The degree is neither necessary nor sufficient, but it provides a starting point for discussions about background in software development.
Side projects, technical blogging, past experience, open source contributions, enthusiasm, and dedication to constant learning should make or break decisions on whether or not to hire a software developer. But often those topics are so heavily influenced by what was learned in college about programming language theory, algorithms, and software engineering practices that it’s hard to pull them apart.
There are many issues with the college model besides holding back really smart people: extraordinary costs, massive student loans, majors of questionable value, and grade inflation. But in IT, while in theory you can be successful without a college degree, a degree is a strong signal that you can set your mind to finishing a major commitment to learning and education.
The quote in this post’s title is from Gnip’s CEO on their challenges of handling a sustained 35MB/sec stream of constant updates from Twitter. Gnip is the only partner that receives the entire Twitter data stream for analysis.
“Everyone is building custom stuff right now” echoes what I’m seeing at companies handling big data like Clearspring. Although NoSQL data stores and tools like Hadoop are gaining mainstream acceptance, the companies really handling big data don’t have tools they can use out of the box to perform analysis. The big data trends are just beginning to take shape and no one is yet offering the right solutions to handle them.
Link: Gnip CEO on the Challenges of Handling the Real-time, Big Data Firehose (ReadWriteWeb)
A few interesting articles came out today from mainstream sources such as The Economist, Forbes, and MIT’s Technology Review.
IBM is ratcheting up their big data PR push with a splashy $100 million investment in the field. While that $100 million will cover both basic R&D and system development, it remains to be seen whether IBM can successfully create enterprise-class products that actually add business value instead of useless features that sound good to technology executives, a la Rational Suite, WebSphere, and RAD. Surprisingly, though, practical Watson big data applications such as a medical assistant are still 5-8 years out, indicating Watson technology is not ready for prime time just yet.
Link: IBM’s Watson Now A Second Year Med Student (Forbes)
The Economist has an interesting fact-laden article on the data revolution. For example, 4 billion people have mobile phones and 12% of them (480 million) are smartphone users. Much of The Economist’s article is based on McKinsey’s recent big data report.
Link: Building with Big Data (The Economist)
MIT’s Technology Review has an article on why big data needs a code of ethics, a topic few people consider because they do not understand the vast amount of data being collected about them.
Link: What Big Data Needs: A Code of Ethical Practices (Technology Review)
Monitis has a nice summary of Apache Cassandra up on their blog. It looks like they are doing a series of overviews on NoSQL solutions that will be worth checking out.
I hate corporate performance reviews. Trying to fit the work you performed over the last six months to a year into pre-defined generic boxes such as “flexibility”, “interpersonal skills”, “creativity”, and so on strikes me as really dull. I doubt I’m alone in this sentiment.
I created some consternation during my first performance assessment at Excella by not filling anything in on my performance self-assessment. All blanks in every field. It wasn’t that I didn’t care. I just felt that spending 2-3 hours filling in those generic buckets with work I performed was a useless exercise. I’d prefer using those hours to learn more about Clojure or Hadoop.
Why go through performance reviews at all if they don’t provide value? Well, they are supposed to provide value through self-reflection. It’s just that most people (myself included) half-ass them and don’t perform the self-reflection part because it’s tedious and boring. But if the process is more enjoyable, maybe that will help people to think through their performance.
So here’s what I’m going to do in the future to make the process more enjoyable. I’m going to tell a story. Storytelling provides more value. Storytelling is more fun for both the writer and the reader. It's memorable. Give me a story over a bulleted list any day. And I need to be a better storyteller so I can get better at explaining the results of data analysis and visualizations.
Here’s an off-the-cuff attempt at a performance review story excerpt for the fall (without any sensitive client info).
I glanced at my watch. 2pm. Ready to start this client demo. I wasn’t happy with how the demo the week earlier turned out. Apparently I wasn’t clued in on some last minute changes the business expected. I got defensive when I asked whether I should have followed the conflicting requirements document, wireframes, or numerous emails I received from our business analyst.
I made some changes in the time between that first demo and this one. Right now my part of the system was working well minus a few features I could explain away since we were more than a month out from delivery.
Fast forward an hour. It went great. Lots of congratulations on a job well done. It was nice to get some positive feedback directly from the client and people who’d be actually using the system. No defensiveness as I carefully took notes when the client asked for small changes to the system. What a difference a week and some self-reflection on how I should handle client feedback could make.
I don’t always handle my client interactions perfectly, but I’ve learned from a few failings so far this fall. This successful demo was the result of that learning process.
Definitely more fun writing that little excerpt than a bulleted list. Hopefully better for consumption as well. And while not everything will fit well within the context of a story, the outcome of self-reflection from writing the story will make the exercise worthwhile.