About GDELT: The Global Database of Events, Language, and Tone
- What is GDELT?
- How to cite GDELT
- What are the data sources used in GDELT?
- Who made GDELT?
What is GDELT?
The Global Database of Events, Language, and Tone (GDELT) is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world over the last two centuries down to the city level globally, to make all of this data freely available for open research, and to provide daily updates to create the first "realtime social sciences earth observatory." Nearly a quarter-billion georeferenced events capture global behavior in more than 300 categories covering 1979 to present with daily updates.
GDELT is designed to support new theories and descriptive understandings of the behaviors and driving forces of global-scale social systems, from the micro-level of the individual through the macro-level of the entire planet. It does so by synthesizing global societal-scale behavior in realtime into a rich quantitative database that allows realtime monitoring and analytical exploration of those trends.
GDELT's evolving ability to capture ethnic, religious, and other social and cultural group relationships will offer profoundly new insights into the interplay of those groups over time, providing a rich new platform for understanding patterns of social evolution. The data's realtime nature will expand current understanding of social systems beyond static snapshots towards theories that incorporate the nonlinear behavior and feedback effects that define human interaction, and will greatly enrich fragility indexes, early warning systems, and forecasting efforts.
GDELT's goal is to help uncover previously-obscured spatial, temporal, and perceptual evolutionary trends through new forms of analysis of the vast textual repositories that capture global societal activity, from news and social media archives to knowledge repositories.
- Covers all countries globally
- Covers more than three decades: 1979 to present
- Updated daily, 365 days a year
- Based on a cross-section of all major international, national, regional, local, and hyper-local news sources, both print and broadcast, from nearly every corner of the globe, in both English and vernacular
- 58 fields capture all available detail about each event and its actors
- Ten fields capture significant detail about each actor, including role and type
- All records georeferenced to the city or landmark as recorded in the article
- Sophisticated geographic pipeline disambiguates and affiliates geography with actors
- Separate geographic information for location of event and for both actors, including GNS and GNIS identifiers
- All records include ethnic and religious affiliation of both actors as provided in the text
- Even captures ambiguous events in conflict zones ("unidentified gunmen stormed the mosque and killed 20 civilians")
- Specialized filtering and linguistic rewriting considerably enhance the accuracy of TABARI, the automated event coding software
- Wide array of media and emotion-based "importance" indicators for each event
- Nearly a quarter-billion event records
- 100% open, unclassified, and available for unlimited use and redistribution
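The record layout described in the list above (ten fields per actor, plus separate geographic blocks for each actor and for the event location) can be pictured with a minimal sketch. All class and field names below are assumptions chosen for illustration, not the dataset's actual column names, and the sample values are made up.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Actor:
    """Ten per-actor attributes; every field name here is an assumption."""
    code: str
    name: str
    country_code: str
    known_group_code: str
    ethnic_code: str
    religion1_code: str
    religion2_code: str
    type1_code: str
    type2_code: str
    type3_code: str


@dataclass
class GeoLocation:
    """One geographic block; an event carries three (actor 1, actor 2, action)."""
    full_name: str
    latitude: Optional[float]
    longitude: Optional[float]
    feature_id: str  # GNS or GNIS identifier


ACTOR_WIDTH = len(fields(Actor))  # ten fields per actor


def parse_actor(cells: List[str]) -> Actor:
    """Build an Actor from exactly ten consecutive string cells."""
    if len(cells) != ACTOR_WIDTH:
        raise ValueError(f"expected {ACTOR_WIDTH} cells, got {len(cells)}")
    return Actor(*cells)


def parse_geo(name: str, lat: str, lon: str, feature_id: str) -> GeoLocation:
    """Blank coordinates become None rather than 0.0."""
    return GeoLocation(
        full_name=name,
        latitude=float(lat) if lat else None,
        longitude=float(lon) if lon else None,
        feature_id=feature_id,
    )


# Made-up values: an ambiguous actor with only a name, other fields left blank.
gunmen = parse_actor(["", "GUNMEN"] + [""] * 8)
print(gunmen.name, repr(gunmen.country_code))
```

Keeping blank attributes as empty strings, and blank coordinates as `None`, is one way to represent ambiguous actors such as "unidentified gunmen" without discarding the record.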
How to cite GDELT
To cite GDELT, please cite the 2013 International Studies Association (ISA) paper announcing the dataset: Leetaru, Kalev and Schrodt, Philip. (2013). GDELT: Global Data on Events, Language, and Tone, 1979-2012. International Studies Association Annual Conference, April 2013. San Diego, CA.
What are the data sources used in GDELT?
Sources that were examined to identify events include all international news coverage from AfricaNews, Agence France Presse, Associated Press Online, Associated Press Worldstream, BBC Monitoring, Christian Science Monitor, Facts on File, Foreign Broadcast Information Service, United Press International, and the Washington Post.
Additional sources examined include all national and international news coverage from the New York Times, all international and major US national stories from the Associated Press, and all national and international news from Google News with the exception of sports, entertainment, and strictly economic news.
Events are actively drawn from local, regional, national, and international mainstream news media outlets from throughout the world, including local domestic sources in almost every country on Earth. We are also actively experimenting with incorporating realtime and hyperlocal social media sources along several dimensions, including understanding the geography of social media and engaging in discussions with citizen crisis mapping organizations. We would love to hear from you.
Who made GDELT?
The GDELT team currently consists of Kalev Leetaru of Georgetown University, Philip Schrodt of Penn State University (PSU), Patrick Brandt of the University of Texas at Dallas (UTD), and John Beieler of Penn State University (PSU).
GDELT is available without fee for unlimited and unrestricted academic, commercial, or governmental use of any kind. You may also redistribute and republish the data in any form. However, any use or redistribution of the data must include a citation to GDELT and a link to this website.
The GDELT dataset is generated completely by fully automatic software algorithms operating with no human oversight or intervention and is based on global news media reporting. No warranties or guarantees of any kind, express or implied, are offered regarding the accuracy or completeness of the data.
We would like to specifically acknowledge the following organizations for making this research possible: BBC Monitoring, Reed Elsevier's LexisNexis Group, Google and Google News, and the School of Economic, Political and Policy Sciences at the University of Texas at Dallas.
Philip Schrodt's contributions to the project were funded in part by National Science Foundation grant SES-1004414 and by a Fulbright-Hays Research Fellowship for work at the Peace Research Institute, Oslo (http://www.prio.no). Patrick Brandt's contributions were funded in part by National Science Foundation grant SES-092105.