FAQ GDELT: The Global Database of Events, Language, and Tone
- What sources does GDELT rely upon?
- Can you send me the source news articles behind your event database?
- How does GDELT compare with the US DOD ICEWS?
- How does GDELT compare with other event databases?
- I've previously used TABARI for a coding project and have an event database, how can I contribute?
- I have some ideas for additional event categories you should add.
- I'm a computer science researcher with some NLP tools that might be useful.
What sources does GDELT rely upon?
GDELT is based purely on unclassified mainstream news media reports from throughout the world. See the About page for more detail.
We are also exploring the incorporation of a range of social media sources, but that work is still ongoing.
Can you send me the source news articles behind your event database?
Due to copyright restrictions and publisher agreements, we cannot redistribute any of the news content that was used in the creation of GDELT, only the codified numeric event records extracted from that content. However, for content after April 1, 2013, we do include the URL or citation to the source article for each event, so that you can locate the material on your own and read more about the event and its surrounding context. In the next release of GDELT, tentatively slated for late Fall 2013, we will include source citations for all events back to 1979.
In addition, the Fall 2013 release of GDELT 2.0 will include a vast array of new codified emotional and thematic indicators, such as "View Towards Government", "Focus on Healthcare", and "Emphasis on Women's Rights". Each indicator will offer a numeric score codifying, by city and by day, the attention paid to that dimension and the emotional response towards it, allowing for rich contextualization of the GDELT physical event database.
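The April 1, 2013 cutoff for source citations described above can be sketched in a few lines of Python. This is only an illustration: the `EventRecord` class and its field names are hypothetical and are not GDELT's actual record layout, and treating April 1 itself as inside the cited range is an assumption, since the text says only "after April 1, 2013".

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional

# Assumption: April 1, 2013 itself is treated as inside the cited range.
CITATION_CUTOFF = date(2013, 4, 1)


@dataclass
class EventRecord:
    """Hypothetical stand-in for one codified event record."""
    event_id: int
    event_date: date
    source_url: Optional[str] = None  # URL or citation, when one is present


def locatable_sources(records: Iterable[EventRecord]) -> List[str]:
    """Return the source URLs of events that fall on or after the cutoff."""
    return [
        r.source_url
        for r in records
        if r.event_date >= CITATION_CUTOFF and r.source_url
    ]


# Made-up sample records, for illustration only.
SAMPLE_RECORDS = [
    EventRecord(1, date(1985, 6, 2)),
    EventRecord(2, date(2013, 5, 10), "http://example.com/article-a"),
    EventRecord(3, date(2013, 7, 4), "http://example.com/article-b"),
]

if __name__ == "__main__":
    for url in locatable_sources(SAMPLE_RECORDS):
        print(url)
```

Only the two records dated after the cutoff yield a URL that a reader could follow to the original article; the 1985 record has no citation to return.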
How does GDELT compare with the US DOD ICEWS?
GDELT is extremely similar to ICEWS in concept and indeed uses many of the same core underpinnings, but it includes a vast array of enhancements over ICEWS. Uniquely, it is fully unclassified, available for open academic and commercial research, and updated daily.
- GDELT includes a multi-stage filtering and linguistic rewriting pipeline that significantly increases TABARI's accuracy
- GDELT uses a sophisticated full-text disambiguating geocoding system, one of just a handful of production systems available, which offers massively enhanced geographic recovery and resolution
- GDELT assigns events to the city or landmark level, while ICEWS is limited to a maximum resolution of an administrative division (roughly equivalent to a US state). With ICEWS, this is equivalent to stating that a riot took place "somewhere in the State of California", while GDELT gives you the specific city
- GDELT also captures the city-level geographic affiliation of both of the actors involved in the event, allowing for city-level tracing of leader movements and diplomatic activity
- GDELT provides more sophisticated matching under the ambiguous reporting common to high-conflict zones such as Syria and Afghanistan
- GDELT captures both religion and ethnic group membership information where available
- GDELT covers all countries in the world over more than three decades: 1979 to present
- GDELT is 100% open, unclassified, and available for unlimited use and redistribution
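The difference between city-level and administrative-division resolution noted in the list above can be made concrete with a small Python sketch. The `GeoEvent` class, its field names, and the sample events are all hypothetical and chosen only for illustration; they do not reflect the actual schema of either GDELT or ICEWS.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class GeoEvent(NamedTuple):
    """Hypothetical event location; not the actual schema of either dataset."""
    country: str
    admin1: str  # administrative division, roughly a US state
    city: str


def count_by_city(events: Iterable[GeoEvent]) -> Counter:
    """Count events at city-level resolution."""
    return Counter((e.country, e.admin1, e.city) for e in events)


def count_by_admin1(events: Iterable[GeoEvent]) -> Counter:
    """Count events at administrative-division resolution."""
    return Counter((e.country, e.admin1) for e in events)


# Made-up sample events, for illustration only.
SAMPLE_EVENTS = [
    GeoEvent("US", "California", "Los Angeles"),
    GeoEvent("US", "California", "Oakland"),
    GeoEvent("US", "California", "Oakland"),
]

if __name__ == "__main__":
    print(count_by_city(SAMPLE_EVENTS))
    print(count_by_admin1(SAMPLE_EVENTS))
```

City-level records can always be rolled up to the administrative division, as `count_by_admin1` does, but records stored only at division level cannot be split back into cities.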
How does GDELT compare with other event databases?
Leetaru & Schrodt's (2013) International Studies Association paper announcing the data includes several comparisons to existing datasets, and many more comparisons are actively underway. GDELT is currently the only database to cover nearly all countries globally back to 1979 with all events georeferenced and daily updates available. See the About page for a list of key features that make GDELT unique.
I've previously used TABARI for a coding project and have an event database, how can I contribute?
Please contact us, as we're currently exploring setting up a community contribution program where users like you can contribute your event databases (with full credit and citation to you) to offer expanded coverage of specific geographic regions, social or insurgent groups, and topics.
I have some ideas for additional event categories you should add.
We'd love to hear from you! In particular, we're looking for help in developing rigorous new taxonomies, such as for human rights violations, expanding the underlying grammars to recognize new event types, and finding sufficient examples of each event to be able to robustly test the accuracy of the system.
I'm a computer science researcher with some NLP tools that might be useful.
We're extremely interested in hearing from computer scientists and other researchers in Natural Language Processing and related fields who are developing tools that could help us in our quest to constantly enhance and upgrade the event extraction core of GDELT. Several efforts are already underway to develop completely new event extraction algorithms and pipelines to replace TABARI, which we hope to roll out in GDELT 2.0's release this fall.
GDELT relies heavily on many areas of active research in the NLP community, including entity recognition, relationship extraction, fact and claim extraction, separating potential from actual statements, pronoun coreference resolution, full-text geographic disambiguation, and a vast array of other areas. We need tools that are robust against OCR'd text, machine-translated text, social media content, text written by non-native speakers, and other material that is filled with typographical errors, non-words, massive grammatical violations, and the like. So, if you've got tools you think could help with this, we'd love to hear from you!