Bio
I'm a computational linguist working in communication technologies.
This covers a broad area of technology and development, from crowdsourcing and machine learning for
extracting rich information from natural language to the installation of supporting infrastructure.
I work as the CEO of Idibon, based in San Francisco,
and (when I get the chance)
as the Chief Information Officer for Energy for Opportunity in Sierra Leone.
When not working, I write an occasional article on language and technology at Jungle Light Speed and travel the world by bicycle, most recently
cycling across Alaska.
Some past work:
Global Viral Forecasting.
In 2011 I worked at Global Viral Forecasting (now Metabiota)
as the Chief Technology Officer for EpidemicIQ, a system that tracks
disease outbreaks worldwide. The goal is to predict and prevent future epidemics.
Google Flu Trends
found that you can predict flu outbreaks by simply modeling the symptoms that people choose to search for.
Imagine if you modeled all the world's available medical information and reports.
Crowdsourcing language and cognition.
Language and cognition tasks that used to cost thousands of dollars and take several months can now be completed in a matter of hours for a few dollars. While I originally worked on commercial crowdsourcing applications, and more recently in social development, many of the most exciting applications are in scientific research.
In July 2011, I helped run the
Workshop on Crowdsourcing Technologies for Language and Cognition Studies, for the first time bringing together the researchers who are embracing these new technologies and strategies.
We are already seeing the beginning of a paradigm shift in language research back
to empirically savvy approaches. Crowdsourcing technologies are set to become one of the leading tools in this
new wave of research methodologies.
Mission 4636. I coordinated the translation, geolocation and categorization
of emergency text messages sent in Haiti in the wake of the January 12, 2010 earthquake. This was the only
emergency response service available to people within Haiti during this critical period. The primary emergency responders were the US Military, who for the most part did not
speak Haitian Kreyol or know the locations of addresses in Haiti. Working with more than 1000 Kreyol- and French-speaking
volunteers from 49 countries, we created a system that allowed us to turn raw text messages in Haitian Kreyol into categorized English messages with precise coordinates,
with an average turnaround of just 10 minutes. According to the responders, this saved hundreds of lives and directed the first
aid to tens of thousands.
In total, we processed more than 80,000 messages. It was the first time that crowdsourcing had been used for real-time humanitarian relief, and it is still the largest deployment of humanitarian crowdsourcing to date.
Classifying and extracting
meaning from short message communications with machine learning and natural language processing.
This project was the focus of my Ph.D. It looked at methods for automatically classifying text messages (SMS) in low-resource languages,
and for extracting information such as locations and the names of people.
A new architecture was developed that adapts to the variation in the language by
combining subword models with incremental learning
over streaming data. By looking at messages in Chichewa, Kreyol, Pashto, Urdu and Sindhi we
were able to combine linguistic models with spatial and temporal information to identify the
topics of messages with high accuracy and confidence.
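As a rough illustration of what "subword models with incremental learning over streaming data" can look like, here is a minimal sketch. It is not the architecture from the Ph.D. work: it assumes character n-grams as a simple stand-in for subword features, uses a basic multiclass perceptron as the incremental learner, and leaves out the spatial and temporal information entirely. The names `char_ngrams` and `StreamingPerceptron`, and the example messages and labels, are invented for illustration.

```python
def char_ngrams(text, n_min=2, n_max=4):
    """Return character n-grams, a simple stand-in for subword features.

    Spelling variants of a word share most of their n-grams, so the
    model can still recognize them when the exact word is unseen.
    """
    padded = f" {text.lower()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class StreamingPerceptron:
    """Multiclass perceptron that learns from one message at a time."""

    def __init__(self):
        # weights[label][feature] -> float
        self.weights = {}

    def _score(self, features, label):
        w = self.weights[label]
        return sum(w.get(f, 0.0) for f in features)

    def predict(self, text):
        """Return the best-scoring known label, or None before any training."""
        if not self.weights:
            return None
        features = char_ngrams(text)
        return max(self.weights, key=lambda label: self._score(features, label))

    def update(self, text, gold):
        """Predict first, then correct the weights if the guess was wrong."""
        self.weights.setdefault(gold, {})
        guess = self.predict(text)
        if guess != gold:
            for f in char_ngrams(text):
                self.weights[gold][f] = self.weights[gold].get(f, 0.0) + 1.0
                self.weights[guess][f] = self.weights[guess].get(f, 0.0) - 1.0
        return guess


if __name__ == "__main__":
    # Hypothetical stream of (message, topic) pairs, invented for this sketch.
    stream = [
        ("need water and food", "supplies"),
        ("person trapped under building", "rescue"),
        ("we need food here", "supplies"),
        ("people trapped in house", "rescue"),
    ]
    model = StreamingPerceptron()
    for message, topic in stream:
        guess = model.update(message, topic)
        print(f"{message!r}: guessed {guess}, actual {topic}")
```

Because the model predicts before it updates, each message in the stream doubles as a test item and a training item, which is one common way to evaluate a learner that never sees the full data set at once.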
Pakreport. I developed modules that allow Pakreport's information management component to
outsource the value-adding tasks of translation, geolocation and categorization to
volunteers working with CrowdFlower.
This means that work is cross-checked among multiple workers so that the information is not
susceptible to the potential errors of any one volunteer, ensuring data quality for the
aid agencies using the service and meaning that the volunteers can help without
fear of accidentally introducing bad information.
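The cross-checking idea can be sketched as a simple majority vote over the judgments that several volunteers give for the same message. This is only an illustration under stated assumptions: it is not CrowdFlower's API or its actual aggregation method, and the function name `aggregate`, the `min_agreement` threshold, and the category labels are hypothetical.

```python
from collections import Counter


def aggregate(judgments, min_agreement=0.5):
    """Combine several volunteers' labels for one message.

    Returns (label, agreement). The label is None when no single answer
    is given by more than `min_agreement` of the volunteers, so contested
    items can be sent out for more judgments instead of being trusted.
    """
    if not judgments:
        return None, 0.0
    label, votes = Counter(judgments).most_common(1)[0]
    agreement = votes / len(judgments)
    if agreement <= min_agreement:
        return None, agreement
    return label, agreement


if __name__ == "__main__":
    # Three hypothetical volunteers categorize the same report.
    print(aggregate(["medical", "medical", "food"]))
    # Two volunteers disagree, so no label is accepted yet.
    print(aggregate(["medical", "food"]))
```

With this kind of rule, one volunteer's mistake is outvoted by the others, and a message with no clear majority is flagged rather than passed on to the aid agencies.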
Reported Speech in Matses. In late 2009 I had the privilege to live with the Matses and study their language.
The Matses people live in a remote corner of the Peruvian Amazon and only
gave up their nomadic lifestyle in 1969, and their language is both under-studied and endangered.
Reported speech in Matses is unlike that of any other language. If someone says, "I will go to there tomorrow",
you can quote that person directly (they said "I will go there tomorrow"), but you cannot rephrase
it from your own spatio-temporal or interpersonal point of view (they said "they will come here today").
However, you are otherwise free to paraphrase (they said "I will canoe to there in the morning") or extract
(where did they say "I will go"?). This challenges some of the fundamental assumptions
about cross-linguistic semantic constraints and raises interesting
questions about the possibilities for how we encode the world we perceive.