Robert Munro
/ Rob Munro

Bio

I'm a computational linguist working in communication technologies, especially in less-resourced languages. This covers a broad area of technological and social development, from crowdsourcing and machine learning for extracting rich information from natural language to the installation of supporting infrastructure. I do this as a graduate fellow at Stanford University, as the Chief Information Officer for Energy for Opportunity, and as a consultant to various organizations globally. I also write an occasional article at Jungle Light Speed.

Some current and recent work:

Global Viral Forecasting. In 2011 I worked at Global Viral Forecasting as the Chief Technology Officer for EpidemicIQ, a system that tracks disease outbreaks worldwide. The goal is to predict and prevent future epidemics. Google Flu Trends found that you can predict flu outbreaks by simply modeling the symptoms that people choose to search for. Imagine what would be possible if you modeled all the world's available medical information and reports.

Crowdsourcing for language and cognition research. Language and cognition research that used to take thousands of dollars over several months can now be completed in a matter of hours for a few dollars. While I originally worked in commercial crowdsourcing applications, and more recently in social development, many of the most exciting applications are in scientific research.

In July 2011, I will be helping to run the Workshop on Crowdsourcing Technologies for Language and Cognition Studies, for the first time bringing together the researchers who are embracing these new technologies and strategies. We are already seeing the beginning of a paradigm shift in language research back to empirically savvy approaches. Crowdsourcing technologies are set to become one of the leading tools in this new wave of research methodologies.

Mission 4636. I coordinated the translation, geolocation and categorization of emergency text messages sent in Haiti in the wake of the January 12, 2010 earthquake. This was the only emergency response service available to people within Haiti during this critical period. The primary emergency responders were the US Military, who for the most part did not speak Haitian Kreyol or know the locations of addresses in Haiti. Working with more than 1000 Kreyol- and French-speaking volunteers from 49 countries, we created a system that turned raw text messages in Haitian Kreyol into categorized English messages with precise coordinates, with an average turnaround of just 10 minutes. According to the responders, this saved hundreds of lives and directed the first aid to tens of thousands.

In total, we processed more than 80,000 messages. It was the first time that crowdsourcing had been used for real-time humanitarian relief and it is still the largest deployment of humanitarian crowdsourcing to date.

Classifying text messages with machine learning. This ongoing project looks at methods for automatically classifying text messages (SMS) in low-resource languages. Early work with Medic Mobile looked at classifying text messages in the Chichewa language, where we found that natural language processing methods that work well in systems optimized for English do not work well for low-resource languages like Chichewa.

The new architecture currently proposed adapts to the variation in the language by combining subword models with incremental learning over streaming data. By looking at messages in Chichewa, Kreyol, Pashto, Urdu and Sindhi, we were able to combine linguistic models with spatial and temporal information to identify the topics of messages with high accuracy and confidence.
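To make the idea of combining subword models with incremental learning concrete, here is a minimal sketch in Python. It is not the architecture from this project: it assumes character n-grams as the subword features and a simple multiclass perceptron as the online learner, and it leaves out the spatial and temporal information entirely. The names char_ngrams and StreamingPerceptron, and the labels in the usage note, are hypothetical.

```python
from collections import defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Return character n-grams for each word, padded with < and > as word boundaries."""
    features = []
    for word in text.lower().split():
        padded = f"<{word}>"
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                features.append(padded[i:i + n])
    return features


class StreamingPerceptron:
    """Multiclass perceptron that is updated one message at a time."""

    def __init__(self):
        # weights[label][feature] -> float
        self.weights = defaultdict(lambda: defaultdict(float))

    def score(self, features, label):
        w = self.weights[label]
        return sum(w.get(f, 0.0) for f in features)

    def predict(self, text):
        """Return the best-scoring known label, or None before any training."""
        if not self.weights:
            return None
        features = char_ngrams(text)
        return max(sorted(self.weights), key=lambda label: self.score(features, label))

    def update(self, text, gold_label):
        """Predict first, then learn from the labeled message; return the prediction."""
        predicted = self.predict(text)
        self.weights[gold_label]  # make sure the label is known to the model
        if predicted != gold_label:
            for f in char_ngrams(text):
                self.weights[gold_label][f] += 1.0
                if predicted is not None:
                    self.weights[predicted][f] -= 1.0
        return predicted
```

Because the features are pieces of words rather than whole words, a spelling the model has never seen still shares most of its n-grams with spellings it has seen. For example, after clf.update("water water", "supplies") and clf.update("doctor doctor", "medical"), the unseen spelling "watr" is still scored closest to "supplies".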

Pakreport. I developed modules that allow Pakreport's crisis mapping component to outsource the value-adding tasks of translation, geolocation and categorization to volunteers working with CrowdFlower.

This means that work is cross-checked among multiple volunteers, so the information is not susceptible to the errors of any one volunteer. This ensures data quality for the aid agencies using the service and means that volunteers can help without fear of accidentally introducing bad information.
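The cross-checking idea can be sketched as a simple agreement vote. This is an illustration only, not CrowdFlower's actual aggregation method or the Pakreport modules: the function name aggregate_judgments and the 0.6 agreement threshold are assumptions made for the example.

```python
from collections import Counter


def aggregate_judgments(judgments, min_agreement=0.6):
    """Combine several volunteers' answers for one message.

    Returns (answer, agreement) when the most common answer reaches the
    agreement threshold, otherwise (None, agreement) so that the message
    can be sent to further volunteers instead of being published.
    """
    if not judgments:
        return None, 0.0
    counts = Counter(judgments)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(judgments)
    if agreement >= min_agreement:
        return answer, agreement
    return None, agreement
```

With three volunteers, two matching answers are enough to accept a category under this threshold, while three different answers leave the message unresolved. A single mistaken volunteer therefore cannot change the result on their own.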

Reported Speech in Matses. I recently undertook fieldwork to study the Matses language. The Matses people live in a remote corner of the Peruvian Amazon and only gave up their nomadic lifestyle in 1969, making theirs an under-studied and endangered language.

Reported speech in Matses is unlike that of any other language. If someone says, "I will go to there tomorrow", you can quote that person directly (they said "I will go there tomorrow"), but you cannot rephrase it from your own spatio-temporal or interpersonal point-of-view (they said "they will come here today"). However, you are otherwise free to paraphrase (they said "I will canoe to there in the morning") or extract (where did they say "I will go"?). This challenges some of the fundamental assumptions about cross-linguistic semantic constraints and raises interesting questions about the possibilities for how we encode the world with respect to the speech of others.

For fun...

I love seeing the world - I moved to San Francisco from Sierra Leone, where I was working for the Environmental Foundation for Africa and the UN High Commission for Refugees, and have previously called Melbourne, London, Sydney and the Blue Mountains home. I travel as often as possible, usually by bicycle. For my last major trip I cycled through East and Southern Africa:

Africa by bicycle

and last year I spent a couple of weeks cycling in California - the first tour on my 6th continent!:

The Californian coast