Bio
I'm a computational linguist working in communication technologies,
especially in less-resourced languages.
This covers a broad area of technological and social development, from crowdsourcing and machine learning for
extracting rich information from natural language to the installation of supporting infrastructure.
I do this as a graduate fellow at Stanford University,
as the Chief Information Officer for Energy for Opportunity,
and as a consultant to various organizations globally.
I also write the occasional article at Jungle Light Speed.
Some current and recent work:
Global Viral Forecasting.
In 2011 I worked at Global Viral Forecasting
as the Chief Technology Officer for EpidemicIQ, a system that tracks
disease outbreaks worldwide. The goal is to predict and prevent future epidemics.
Google Flu Trends
found that you can predict flu outbreaks simply by modeling the symptoms that people choose to search for.
Imagine what you could predict if you modeled all of the world's available medical information and reports.
Crowdsourcing for language and cognition research.
Language and cognition research that used to cost thousands of dollars and take several months can now be completed in a matter of hours for a few dollars. While I originally worked on commercial crowdsourcing applications, and more recently in social development, many of the most exciting applications are in scientific research.
In July 2011, I will be helping run the
Workshop on Crowdsourcing Technologies for Language and Cognition Studies, for the first time bringing together the researchers who are embracing these new technologies and strategies.
We are already seeing the beginning of a paradigm shift in language research back
to empirically savvy approaches. Crowdsourcing technologies are set to become one of the leading tools in this
new wave of research methodologies.
Mission 4636. I coordinated the translation, geolocation and categorization
of emergency text messages sent in Haiti in the wake of the January 12, 2010 earthquake. This was the only
emergency response service available to people within Haiti during this critical period. The primary emergency responders were the US Military, who for the most part did not
speak Haitian Kreyol or know the locations of addresses in Haiti. Working with more than 1000 Kreyol- and French-speaking
volunteers from 49 countries, we created a system that turned raw text messages in Haitian Kreyol into categorized English messages with precise coordinates,
with an average turnaround of just 10 minutes. According to the responders, this saved hundreds of lives and directed the first
aid to tens of thousands.
In total, we processed more than 80,000 messages. It was the first time that crowdsourcing had been used for real-time humanitarian relief and it is still the largest deployment of humanitarian crowdsourcing to date.
Classifying text messages with machine learning.
This ongoing project looks at methods for automatically classifying text messages (SMS) in low-resource languages.
Early work with Medic Mobile looked at classifying
text messages in the Chichewa language, where
we found that natural language processing methods that work well in systems optimized for English
do not work well for low-resource languages like Chichewa.
A new architecture is currently proposed that adapts to the variation in the language by
combining subword models with incremental learning
over streaming data. By looking at messages in Chichewa, Kreyol, Pashto, Urdu and Sindhi we
were able to combine linguistic models with spatial and temporal information to identify the
topics of messages with high accuracy and confidence.
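As a rough illustration of the "subword models with incremental learning" idea only, here is a minimal sketch. It is not the project's architecture: the character n-gram features, the perceptron update, the class name `StreamingSubwordClassifier` and the toy English messages are all assumptions made for the example, and the spatial and temporal information mentioned above is left out.

```python
from collections import defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Return the character n-grams of a message, with ^ and $ marking its edges."""
    padded = f"^{text.lower()}$"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class StreamingSubwordClassifier:
    """Hypothetical multiclass perceptron over character n-grams.

    It learns one message at a time, so it never needs the full data set.
    """

    def __init__(self):
        # weights[label][ngram] -> float
        self.weights = defaultdict(lambda: defaultdict(float))

    def predict(self, text):
        """Return the best-scoring label, or None before any label has been seen."""
        feats = char_ngrams(text)
        scores = {label: sum(w.get(f, 0.0) for f in feats)
                  for label, w in self.weights.items()}
        return max(scores, key=scores.get) if scores else None

    def update(self, text, label):
        """Predict first, then adjust the weights only if the prediction was wrong."""
        guess = self.predict(text)
        if guess != label:
            for f in char_ngrams(text):
                self.weights[label][f] += 1.0
                if guess is not None:
                    self.weights[guess][f] -= 1.0
        return guess


# Toy stream (invented data): each labeled message is seen once, in order.
clf = StreamingSubwordClassifier()
for message, topic in [("fever", "medical"), ("food", "food")]:
    clf.update(message, topic)
print(clf.predict("fevr"))  # a misspelling still shares most n-grams with "fever"
```

Because the features are pieces of words rather than whole words, a spelling variant that was never seen in training still overlaps with the variants that were.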
Pakreport. I developed modules that allow Pakreport's crisis mapping component to
outsource the value-adding tasks of translation, geolocation and categorization to
volunteers working with CrowdFlower.
This means that work is cross-checked among multiple volunteers, so the information is not
susceptible to the potential errors of any one volunteer. This ensures data quality for the
aid agencies using the service and means that volunteers can help without
fear of accidentally introducing bad information.
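The cross-checking idea can be sketched as a simple agreement rule. This is an illustration only, not the Pakreport modules or the CrowdFlower API: the function name `aggregate_judgments` and the 60% threshold are assumptions made for the example.

```python
from collections import Counter


def aggregate_judgments(judgments, min_agreement=0.6):
    """Combine several volunteers' answers for one message.

    Returns (answer, agreement). The answer is None when too few
    volunteers agree, so the message can be sent out for more judgments.
    """
    if not judgments:
        return None, 0.0
    answer, votes = Counter(judgments).most_common(1)[0]
    agreement = votes / len(judgments)
    return (answer if agreement >= min_agreement else None), agreement


# Two of three hypothetical volunteers chose the same category.
category, agreement = aggregate_judgments(["medical", "medical", "food"])
print(category, round(agreement, 2))
```

A single mistaken volunteer is outvoted, and a message with no clear majority is held back instead of being passed on.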
Reported Speech in Matses. I recently undertook fieldwork to study the Matses language.
The Matses people live in a remote corner of the Peruvian Amazon and only
gave up their nomadic lifestyle in 1969, making theirs an under-studied and endangered language.
Reported speech in Matses is unlike that of any other language. If someone says, "I will go there tomorrow",
you can quote that person directly (they said "I will go there tomorrow"), but you cannot rephrase
it from your own spatio-temporal or interpersonal point of view (they said "they will come here today").
However, you are otherwise free to paraphrase (they said "I will canoe there in the morning") or extract
(where did they say "I will go"?). This challenges some of the fundamental assumptions
about cross-linguistic semantic constraints and raises interesting
questions about the possibilities for how we encode the world with respect to the speech of others.
For fun...
I love seeing the world - I moved to San Francisco from Sierra Leone, where I was working for the Environmental Foundation for Africa and
the UN High Commissioner for Refugees,
and have previously called Melbourne, London, Sydney and the Blue Mountains home. I travel as often as possible, usually by bicycle.
For my last major trip I cycled through East and Southern Africa:
- Africa by bicycle
and last year I spent a couple of weeks cycling in California, my first tour on a sixth continent:
- The Californian coast