Google and IBM say we need to train more supercrunchers

The New York Times ran an article today about the effort that companies like Google and IBM are making to give university students access to very powerful computing environments, so that engineers and scientists can learn to plow through massive data sets. Their argument is that students are being trained right now to think on a gigabyte scale (if they’re lucky enough to be trained to analyze real data at all), when all the breakthroughs are happening with datasets on the terabyte and petabyte scales.

I couldn’t agree more with this analysis. If people are serious about analyzing those “very rare events”, “long tails” or whatever else can make the difference between profit and loss, success and failure, or even life and death, then we can’t keep running around making assumptions because the model fits 80% of the time and, anyway, it’s too hard to do that level of analysis. We all saw what happened with that idea.

When I was working at Lincoln, we created a highly accurate model of U.S. near mid-air collisions. We did this by analyzing about 5 terabytes’ worth of radar data from across the country (about 8 months’ worth). Nobody had ever done this before on anything close to that scale.

As a result, we had orders of magnitude more data on near mid-air collisions (a very rare event) than the previous model, built in the early ’90s, had. Without this data, and the high-powered systems available at Lincoln that we used to analyze it, our model would have suffered from the same assumptions and modeling errors as previous attempts. That is just not good enough for developing something as important as the next generation of collision avoidance systems for manned and unmanned aircraft, which people are now doing at Lincoln, largely as a result of that effort.

The ability to analyze massive data sets has proven again and again to be a competitive advantage in biotech, finance (for those who do it correctly), the internet, and even marketing, earning the companies that developed those competencies hundreds of billions of dollars.

Is it then a stretch to say that the next lucrative opportunity in operations management will be developing the capability to harness the massive amounts of data that companies already generate every day? I’m talking about everything from inventories to machine control outputs and even intra-company emails. There are signals in that data, just as there are signals in everything from our DNA to the stock markets, if you look hard enough.

To be honest, I don’t know (I’m new to this stuff!) but that’s why I and several of my classmates are trying to start a new track for LGOs in the EECS department this year called Information and Decision Systems. The focus in this track is to develop the theoretical, practical and communication skills for students who want to take on this operations challenge in the real world, for real companies. That means not just studying and learning the algorithms, but also getting a design background in the networking, database and parallel computing systems that are critical enablers of this type of work. It also means developing specialized communication skills to explain the opportunities and the results, because like the NYT article said, most people have not been trained to think on this scale before.

I could talk for pages more about this topic, but let’s just leave it at that for now. I just had to write something because I’m obsessed with this idea, and this article got me all excited. I’m definitely going to look into Hadoop…

Plug for MIT Lincoln Laboratory (“The Lab”)

As cool as it looks

For the past two years I have worked at MIT Lincoln Laboratory, or the Lab for short. Not many people outside of research circles know about this place, and I only stumbled upon it in my job search because I went to a job fair for high-tech companies and picked up some literature. In fact, the Wikipedia page on the Lab is comically short considering the amount of research that has been conducted there over the past 50+ years. While the Lab didn’t invent radar, it probably perfected it. However, because most of this research is classified for national security, 95% of the amazing work by Lab scientists goes unpublished. When I was looking around for information on the Lab, I could hardly find any online, so hopefully somebody will stumble upon this page when they are thinking about working there.

While at the Lab, I worked for the Surveillance Systems group, Group 42. When I got to the Lab, the group was called the Air Traffic Control and Surveillance Systems Group, and about half of the group’s research is sponsored by the FAA. Most of my work centered around the TCAS collision avoidance system, which is mandated by Congress to be installed on every commercial passenger aircraft above a certain size.

TCAS vertical speed indicator (displayed in cockpit)

I also worked on new collision avoidance systems for UAVs (Unmanned Air Vehicles). My major project was developing airspace encounter models for generating random, realistic encounters so that these systems can be tested in simulation. Over the course of developing these models, I learned a great deal about Bayesian networks, Monte Carlo methods, importance sampling, radar (note: the giant radar ALTAIR in that link is operated by Lincoln for missile defense in the Marshall Islands), flight dynamics, and especially air traffic control. It was a fascinating project, and only one of many fascinating projects that I was involved in (I will probably blog about my thoughts on ATC at some point in the future).
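To give a flavor of why importance sampling matters for rare events, here is a toy sketch. It is not the Lab’s encounter model, and everything in it is an assumption made for illustration: the “rare event” is simply a standard normal variable exceeding a threshold of 4, and the function names are hypothetical. Naive Monte Carlo almost never sees such an event, while importance sampling draws from a distribution shifted toward the rare region and reweights each hit by the likelihood ratio.

```python
import math
import random


def exact_tail(threshold):
    """Exact P(Z > threshold) for a standard normal Z, for comparison."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def naive_estimate(threshold, n, rng):
    """Plain Monte Carlo: the fraction of N(0, 1) draws above the threshold."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def importance_estimate(threshold, n, rng):
    """Importance sampling with an assumed proposal N(threshold, 1).

    Each draw that lands above the threshold is weighted by the ratio of
    the target density N(0, 1) to the proposal density N(threshold, 1),
    which simplifies to exp(-threshold * x + threshold**2 / 2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    t, n = 4.0, 20000
    print("exact     :", exact_tail(t))
    print("naive     :", naive_estimate(t, n, rng))
    print("importance:", importance_estimate(t, n, rng))
```

With the same number of samples, the naive estimate is usually zero or wildly off, while the reweighted estimate lands close to the exact tail probability. The same idea, in a much richer form, is what lets a simulation spend its effort on the dangerous encounters instead of the millions of uneventful ones.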

 

Global Hawk, one of the platforms we worked with.

I’ve worked with some amazingly smart and capable people whom I now consider my friends. In particular, Mykel Kochenderfer and Jim Kuchar have been my mentors, among many others who don’t have personal websites. Over half of the Lab employees have PhDs, most from MIT and other top schools, and the sheer brain power there is kind of awesome. People also work at the Lab for the love of their research, and that definitely comes through as well.

I will be leaving the Lab in a couple of months because I accepted an offer to be an LGO fellow at MIT. However, my work there has been very intellectually satisfying, and it has served to focus my future academic interests. For instance, I will receive an MS in Computer Science, largely because my work at the Lab has piqued my interest in AI and machine learning. In particular, I hope to research how these principles can be applied to improve the efficiency and operation of manufacturing companies. I would say that the Lab is an excellent place to work if you are interested in doing cutting-edge research, are intellectually curious and smart, and enjoy tackling difficult problems in national defense, homeland security or work for the FAA.