DBpedia Snapshot 2022-12 Release
https://www.dbpedia.org/blog/dbpedia-snapshot-2022-12-release/ (Mon, 27 Mar 2023)

We are pleased to announce the immediate availability of a new edition of the free and publicly accessible SPARQL Query Service Endpoint and Linked Data Pages for interacting with the new Snapshot Dataset.

News since DBpedia Snapshot 2022-09

  • New Abstract Extractor from GSoC 2022 (credits to Celian Ringwald)

Work in progress: smoothing community issue reporting and fixing on GitHub

What is the “DBpedia Snapshot” Release?

Historically, this release has been associated with many names: “DBpedia Core”, “EN DBpedia”, and — most confusingly — just “DBpedia”. In fact, it is a combination of:

  • EN Wikipedia data — A small but very useful subset (~1 billion triples, or 14%) of the whole DBpedia extraction using the DBpedia Information Extraction Framework (DIEF), comprising structured information extracted from the English Wikipedia plus some enrichments from other Wikipedia language editions, notably multilingual abstracts in ar, ca, cs, de, el, eo, es, eu, fr, ga, id, it, ja, ko, nl, pl, pt, sv, uk, ru, zh.
  • Links — 62 million community-contributed cross-references and owl:sameAs links to other linked data sets on the Linked Open Data (LOD) Cloud that make it possible to effectively find and retrieve further information from the largest, decentralized, change-sensitive knowledge graph on earth, which has formed around DBpedia since 2007.
  • Community extensions — Community-contributed extensions such as additional ontologies and taxonomies. 

Release Frequency & Schedule

Going forward, releases will be scheduled for the 1st of February, May, August, and November (with +/- 5 days tolerance), and are named using the same date convention as the Wikipedia Dumps that served as the basis for the release. An example of the release timeline is shown below:

December 6–8: Wikipedia dumps for December 1 become available on https://dumps.wikimedia.org
December 8–20: Download and extraction with DIEF
December 20–January 1: Post-processing and quality-control period
January 1–February 15: Linked Data and SPARQL endpoint deployment

Data Freshness

Given the timeline above, the EN Wikipedia data of DBpedia Snapshot has a lag of 1-4 months. We recommend the following strategies to mitigate this:

  1. DBpedia Snapshot as a kernel for Linked Data: Following the Linked Data paradigm, we recommend using the Linked Data links to other knowledge graphs to retrieve high-quality and recent information. DBpedia’s network consists of the best knowledge engineers in the world, working together, using linked data principles to build a high-quality, open, decentralized knowledge graph network around DBpedia. Freshness and change-sensitivity are two of the greatest data-related challenges of our time, and can only be overcome by linking data across data sources. The “Big Data” approach of copying data into a central warehouse is inevitably challenged by issues such as co-evolution and scalability. 
  2. DBpedia Live: Wikipedia is unmistakably the richest, most recent body of human knowledge and source of news in the world. DBpedia Live is just minutes behind edits on Wikipedia, which means that as soon as any of the 120k Wikipedia editors presses the “save” button, DBpedia Live will extract fresh data and update. DBpedia Live consists of the DBpedia Live Sync API (for syncing into any kind of on-site database), Linked Data pages and a SPARQL endpoint.
  3. Latest-Core is a dynamically updating Databus Collection. Our automated extraction robot “MARVIN” publishes monthly dev versions of the full extraction, which are then refined and enriched to become the Snapshot release.

Data Quality & Richness

We would like to acknowledge the excellent work of Wikipedia editors (~46k active editors for EN Wikipedia), who are ultimately responsible for collecting information in Wikipedia’s infoboxes, which DBpedia’s extraction refines into our knowledge graphs. Wikipedia’s infoboxes are growing steadily each month; by our measurements they grow to roughly 150% of their size every three years, and EN Wikipedia’s infoboxes even doubled in this timeframe. This richness of knowledge drives the DBpedia Snapshot knowledge graph and is further potentiated by synergies with linked data cross-references. Statistics are given below.

Data Access & Interaction Options

Linked Data

Linked Data is a principled approach to publishing RDF data on the Web that enables interlinking data between different data sources, courtesy of the built-in power of Hyperlinks as unique Entity Identifiers.


HTML pages comprising hyperlinks that conform to Linked Data Principles are one of the methods of interacting with data provided by the DBpedia Snapshot, be it manually via the web browser or programmatically using REST interaction patterns via the https://dbpedia.org/resource/{entity-label} pattern. Naturally, we encourage Linked Data interactions, while also expecting user-agents to honor the cache-control HTTP response header for massive crawl operations. See the instructions for accessing Linked Data, which is available in 10 formats.
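
As a minimal sketch (using curl; the entity and media types are merely examples), an entity description can be fetched in a chosen RDF serialization via standard HTTP content negotiation:

# Fetch the Linked Data description of dbr:Berlin as Turtle;
# -L follows the 303 redirect to the underlying data document
curl -L -H "Accept: text/turtle" https://dbpedia.org/resource/Berlin

# The same description as JSON-LD
curl -L -H "Accept: application/ld+json" https://dbpedia.org/resource/Berlin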

SPARQL Endpoint

This service enables some astonishing queries against Knowledge Graphs derived from Wikipedia content. The Query Services Endpoint that makes this possible is identified by http://dbpedia.org/sparql, and it currently handles 7.2 million queries daily on average. See powerful queries and instructions (incl. rates and limitations).
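
As a small taste, a query can be issued with any HTTP client; a minimal curl sketch (the query itself is merely illustrative):

# Ten people born in Berlin, with their names
curl -G "https://dbpedia.org/sparql" \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=PREFIX dbo: <http://dbpedia.org/ontology/> PREFIX dbr: <http://dbpedia.org/resource/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?person ?name WHERE { ?person dbo:birthPlace dbr:Berlin ; foaf:name ?name } LIMIT 10"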

An effective usage pattern is to filter a relevant subset of entity descriptions for your use case via SPARQL and then combine it with the power of Linked Data by looking up (or dereferencing) data via owl:sameAs property links, en route to retrieving specific and recent data from other Knowledge Graphs across the massive Linked Open Data Cloud.
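
A minimal sketch of this two-step pattern, with an illustrative entity and target graph:

# 1. Ask the endpoint for the owl:sameAs links of an entity
curl -G "https://dbpedia.org/sparql" \
  -H "Accept: text/csv" \
  --data-urlencode "query=PREFIX owl: <http://www.w3.org/2002/07/owl#> SELECT ?same WHERE { <http://dbpedia.org/resource/Berlin> owl:sameAs ?same }"

# 2. Dereference one of the returned IRIs to pull fresh data from the
#    remote graph, e.g. the Wikidata entity for Berlin
curl -L -H "Accept: text/turtle" "http://www.wikidata.org/entity/Q64"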

Additionally, DBpedia Snapshot dumps and additional data from the complete collection of datasets derived from Wikipedia are provided by the DBpedia Databus for use in your own SPARQL-accessible Knowledge Graphs.

DBpedia Ontology

This Snapshot Release was built with DBpedia Ontology (DBO) version https://databus.dbpedia.org/ontologies/dbpedia.org/ontology--DEV/2021.11.08-124002. We thank all DBpedians for their contributions to the ontology and the mappings. See documentation and visualizations, class tree and properties, wiki.

DBpedia Snapshot Statistics

Overview. Overall the current Snapshot Release contains more than 850 million facts (triples).

The DBpedia ontology is the heart of DBpedia. Our community continuously contributes to the DBpedia ontology schema and the DBpedia infobox-to-ontology mappings by actively using the DBpedia Mappings Wiki.

The current Snapshot Release utilizes a total of 55 thousand properties, of which 1,377 are defined by the DBpedia ontology.

Classes. Knowledge in Wikipedia is constantly growing at a rapid pace. We use the DBpedia Ontology classes to measure this growth. For each class we give the total number of instances in this release, followed in brackets by a) the growth ratio relative to the previous release (which can temporarily drop below 1) and b) the growth ratio compared to Snapshot 2016-10:

  • Persons: 1,792,308 (1.01, 1.13)
  • Places: 748,372 (1.00, 1,820.86), including but not limited to 590,481 (1.00, 5,518.51) populated places
  • Works: 610,589 (1.00, 619.89), including but not limited to
    • 157,566 (1.00, 1.38) music albums
    • 144,415 (1.01, 15.94) films
    • 24,829 (1.01, 12.53) video games
  • Organizations: 345,523 (1.01, 109.31), including but not limited to
    • 87,621 (1.01, 2.25) companies
    • 64,507 (1.00, 64,507.00) educational institutions
  • Species: 1,933,436 (1.01, 322,239.33)
  • Plants: 7,718 (0.82, 1.71)
  • Diseases: 10,591 (1.00, 8.54)

Detailed Growth of Classes: The image below shows the detailed growth for one class. Click on the links for other classes: Place, PopulatedPlace, Work, Album, Film, VideoGame, Organisation, Company, EducationalInstitution, Species, Plant, Disease. For further classes, adapt the query by replacing the <http://dbpedia.org/ontology/CLASS> URI. Note that 2018 was a development phase with some failed extractions. The stats were generated with the Databus VOID Mod.
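
The per-class counts can be approximated against the live endpoint with a query of the following shape (a hedged sketch, not necessarily the exact query used by the VOID Mod; replace the class URI as described above):

# Count the current instances of one class; swap in any other class URI
curl -G "https://dbpedia.org/sparql" \
  -H "Accept: text/csv" \
  --data-urlencode "query=SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s a <http://dbpedia.org/ontology/Person> }"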

Links. Linked Data cross-references between decentralized datasets are the foundation of and access point to the Linked Data Web. The latest Snapshot Release provides over 130.6 million links from 7.62 million entities to 179 external sources.

Top 11

33,975,305 http://www.wikidata.org
 7,206,254 https://global.dbpedia.org
 4,308,772 http://yago-knowledge.org
 3,855,108 http://de.dbpedia.org
 3,731,002 http://fr.dbpedia.org
 2,991,921 http://viaf.org
 2,929,808 http://it.dbpedia.org
 2,925,530 http://es.dbpedia.org
 2,788,703 http://fa.dbpedia.org
 2,587,004 http://ru.dbpedia.org
 2,580,398 http://sr.dbpedia.org

Top 10 without DBpedia namespaces

33,975,305 http://www.wikidata.org
 4,308,772 http://yago-knowledge.org
 2,991,921 http://viaf.org
 1,708,533 http://d-nb.info
   612,227 http://sws.geonames.org
   596,134 http://umbel.org
   537,602 http://data.bibliotheken.nl
   430,839 http://www.w3.org
   422,989 http://musicbrainz.org
   104,433 http://linkedgeodata.org

DBpedia Extraction Dumps on the Databus

All extracted files are reachable via the DBpedia account on the Databus. The Databus has two main structures:

Snapshot Download. For downloading DBpedia Snapshot, we prepared this collection, which also includes detailed release notes:

https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12

The collection is roughly equivalent to http://downloads.dbpedia.org/2016-10/core/

Collections can be downloaded in many different ways; some download modalities, such as a bash script, SPARQL, and a plain URL list, can be found in the tabs of the collection page. Files are provided as bzip2-compressed N-Triples files. In case you need a different format or compression, you can also use the “Download-As” function of the Databus Client (GitHub), e.g. -s $collection -c gzip would download the collection and convert it to GZIP during download.
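
A concrete invocation could look as follows (a sketch; the jar file name depends on the Databus Client release you obtained from GitHub):

# Download the snapshot collection and convert bzip2 files to GZIP on the fly
COLLECTION=https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12
java -jar databus-client.jar -s "$COLLECTION" -c gzip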

Replicating DBpedia Snapshot on your server can be done via Docker, see https://hub.docker.com/r/dbpedia/virtuoso-sparql-endpoint-quickstart 

git clone https://github.com/dbpedia/virtuoso-sparql-endpoint-quickstart.git

cd virtuoso-sparql-endpoint-quickstart

COLLECTION_URI=https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 VIRTUOSO_ADMIN_PASSWD=password docker-compose up
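
Once the containers are up, the local deployment can be smoke-tested (a sketch assuming the quickstart maps Virtuoso to its default HTTP port 8890; adjust if you mapped it differently):

# Count all triples loaded into the local endpoint
curl -G "http://localhost:8890/sparql" \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"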

Download files from the whole DBpedia extraction. The whole extraction consists of approx. 20 billion triples and 5000 files created from 140 languages of Wikipedia, Commons and Wikidata. They can be found under https://databus.dbpedia.org/dbpedia/(generic|mappings|text|wikidata).

You can copy-edit a collection and create your own customized collections via “Actions” -> “Copy Edit”; e.g., you can copy-edit the snapshot collection above, remove files that you do not need and add files from other languages. Please see the Rhizomer use case: Best way to download specific parts of DBpedia. Of course, this only refers to the archived dumps on the Databus for users who want to bulk-download and deploy into their own infrastructure. Linked Data and SPARQL allow for filtering the content using a small data pattern.

Acknowledgments

First and foremost, we would like to thank our open community of knowledge engineers for finding & fixing bugs and for supporting us by writing data tests. We would also like to acknowledge the DBpedia Association members for constantly innovating in the areas of knowledge graphs and linked data and pushing the DBpedia initiative forward with their know-how and advice. OpenLink Software supports DBpedia by hosting SPARQL and Linked Data; the University of Mannheim, the German National Library of Science and Technology (TIB) and the Computer Center of the University of Leipzig provide persistent backups and servers for extracting data. We thank Marvin Hofer and Mykola Medynskyi for the technical preparation. This work was partially supported by grants from the Federal Ministry for Economics and Climate Action (BMWK) for the LOD-GEOSS Project (03EI1005E) and PenFLaaS (100594042), as well as for the PLASS Project (01MD19003D).

New DBpedia Release – 2016-10
https://www.dbpedia.org/blog/new-dbpedia-release-2016-10/ (Tue, 04 Jul 2017)

We are happy to announce the new DBpedia Release.

This release is based on updated Wikipedia dumps dating from October 2016.

You can download the new DBpedia datasets in N3 / TURTLE serialisation from http://wiki.dbpedia.org/downloads-2016-10 or directly here http://downloads.dbpedia.org/2016-10/.
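
For example, a single dump file can be fetched and inspected from the command line (a sketch; the file path is illustrative, browse the download folders for exact names):

# Fetch one English dataset file and peek at the first triples
wget http://downloads.dbpedia.org/2016-10/core-i18n/en/instance_types_en.ttl.bz2
bzcat instance_types_en.ttl.bz2 | head -n 5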

This release took us longer than expected. We had to deal with multiple issues and included new data. Most notable is the addition of the NIF annotation datasets for each language, recording the whole wiki text, its basic structure (sections, titles, paragraphs, etc.) and the included text links. We hope that researchers and developers working on NLP-related tasks will find this addition most rewarding. The DBpedia Open Text Extraction Challenge (next deadline Mon 17 July for SEMANTiCS 2017) was introduced to instigate new fact extraction based on these datasets.

We want to thank everyone who contributed to this release by adding mappings, new datasets, extractors or issue reports, helping us to increase the coverage and correctness of the released data. We also thank the European Commission and the ALIGNED H2020 project for funding and general support.

Statistics

Altogether the DBpedia 2016-10 release consists of 13 billion (2016-04: 11.5 billion) pieces of information (RDF triples) out of which 1.7 billion (2016-04: 1.6 billion) were extracted from the English edition of Wikipedia, 6.6 billion (2016-04: 6 billion) were extracted from other language editions and 4.8 billion (2016-04: 4 billion) from Wikipedia Commons and Wikidata.

Adding the large NIF datasets for each language edition (see details below) increased the number of triples by over 9 billion, bringing the overall count up to 23 billion triples.

Changes

  • The NLP Interchange Format (NIF) aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. To extend the versatility of DBpedia, furthering many NLP-related tasks, we decided to extract the complete human-readable text of any Wikipedia page (‘nif_context’), annotated with NIF tags. For this first iteration, we restricted the extent of the annotations to the structural text elements directly inferable from the HTML (‘nif_page_structure’). In addition, all contained text links are recorded in a dedicated dataset (‘nif_text_links’).
    The DBpedia Association started the Open Extraction Challenge on the basis of these datasets. We aim to spur knowledge extraction from Wikipedia article texts in order to dramatically broaden and deepen the amount of structured DBpedia/Wikipedia data and provide a platform for benchmarking various extraction tools with this effort.
    If you want to participate with your own NLP extraction engine, the next deadline, for SEMANTiCS 2017, is July 17th.
    We included an example of these structures in section five of the download page of this release.
  • A considerable amount of work has been done to streamline the extraction process of DBpedia, converting many of the extraction tasks into an ETL setting (using SPARK). We are working in concert with the Semantic Web Company to further enhance these results by introducing a workflow management environment to increase the frequency of our releases.

In case you missed it, here is what we changed in the previous release (2016-04):

  • We added a new extractor for citation data that provides two files:
    • citation links: linking resources to citations
    • citation data: trying to get additional data from citations. This is quite an interesting dataset, but we need help to clean it up
  • In addition to datasets normalised to English DBpedia URIs (en-uris), we provide normalised datasets based on the DBpedia Wikidata (DBw) datasets (wkd-uris). These sorted datasets will be the foundation for the upcoming fusion process with Wikidata. From the following releases on, the DBw-based URIs will be the only ones provided.
  • We now filter out triples from the Raw Infobox Extractor that are already mapped. E.g. no more “<x> dbo:birthPlace <z>” and “<x> dbp:birthPlace|dbp:placeOfBirth|… <z>” in the same resource. These triples are now moved to the “infobox-properties-mapped” datasets and not loaded on the main endpoint. See issue 22 for more details.
  • Major improvements in our citation extraction. See here for more details.
  • We incorporated the statistical distribution approach of Heiko Paulheim in creating type statements automatically and providing them as additional datasets (instance_types_sdtyped_dbo).

 

Upcoming Changes

  • DBpedia Fusion: We finally started working again on fusing DBpedia language editions. Johannes Frey is taking the lead in this project. The next release will feature intermediate results.
  • Id Management: Closely pertaining to the DBpedia Fusion project is our effort to introduce our own Id/IRI management, to become independent of Wikimedia-created IRIs. This will not entail changing our domain or entity naming regime, but will provide the possibility of adding entities of any source or scope.
  • RML Integration: Wouter Maroy has already provided the necessary groundwork for switching the mappings wiki to an RML-based approach on GitHub. Wouter started working exclusively on implementing the Git-based wiki and the conversion of existing mappings last week. We are looking forward to the results of this process.
  • Further development of SPARK Integration and workflow-based DBpedia extraction, to increase the release frequency.

 

New Datasets

  • New languages extracted from Wikipedia:

South Azerbaijani (azb), Upper Sorbian (hsb), Limburgan (li), Minangkabau (min), Western Mari (mrj), Oriya (or), Ossetian (os)

  • SDTypes: We extended the coverage of the automatically created type statements (instance_types_sdtyped_dbo) to English, German and Dutch.
  • Extensions: In the extension folder (2016-10/ext) we provide two new datasets (both are to be considered in an experimental state):
    • DBpedia World Facts: This dataset is authored by the DBpedia Association itself. It lists all countries, all currencies in use and (most) languages spoken in the world, as well as how these concepts relate to each other (spoken in, primary language etc.) and useful properties like ISO codes (ontology diagram). This dataset extends the very useful LEXVO dataset with facts from DBpedia and the CIA Factbook. Please report any errors or suggestions regarding this dataset to Markus.
    • JRC-Alternative-Names: This resource is a link-based complementary repository of spelling variants for person and organisation names. The data is multilingual and contains up to hundreds of variations per entity. It was extracted from the analysis of news reports by the Europe Media Monitor (EMM), as available on JRC-Names.

Community

The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2016-10 ontology encompasses:

  • 760 classes
  • 1,105 object properties
  • 1,622 datatype properties
  • 132 specialised datatype properties
  • 414 owl:equivalentClass and 220 owl:equivalentProperty mappings to external vocabularies

The editor community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2016-10 extraction, we used a total of 5887 template mappings (DBpedia 2016-04: 5800 mappings). The top language, gauged by the number of mappings, is Dutch (648 mappings), followed by the English community (606 mappings).

Credits to

  • Markus Freudenberg (University of Leipzig / DBpedia Association) for taking over the whole release process and creating the revamped download & statistics pages.
  • Dimitris Kontokostas (University of Leipzig / DBpedia Association) for conveying his considerable knowledge of the extraction and release process.
  • All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
  • The whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.
  • Václav Zeman and the whole LHD team (University of Prague) for their contribution of additional DBpedia types
  • Alan Meehan (TCD) for performing a big external link cleanup
  • Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing the links from DOLCE to DBpedia ontology.
  • SpringerNature for offering a co-internship to a bright student, for developing a closer relation to DBpedia on multiple issues, and for links to their SciGraph subjects.
  • Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that provides 5-Star Linked Open Data publication and SPARQL Query Services.
  • OpenLink Software (http://www.openlinksw.com/) collectively for providing the SPARQL Query Services and Linked Open Data publishing infrastructure for DBpedia in addition to their continuous infrastructure support.
  • Ruben Verborgh from Ghent University – imec for publishing the dataset as Triple Pattern Fragments, and imec for sponsoring DBpedia’s Triple Pattern Fragments server.
  • Ali Ismayilov (University of Bonn) for extending and cleaning of the DBpedia Wikidata dataset.
  • All the GSoC students and mentors who have directly or indirectly worked on the DBpedia release
  • Special thanks to members of the DBpedia Association, the AKSW and the Department for Business Information Systems of the University of Leipzig.

The work on the DBpedia 2016-10 release was financially supported by the European Commission through the project ALIGNED – quality-centric software and data engineering.

More information about DBpedia can be found at http://dbpedia.org as well as in the new overview article about the project, available at http://wiki.dbpedia.org/Publications.

Have fun with the new DBpedia 2016-10 release!

YEAH! We did it again ;) – New 2016-04 DBpedia release
https://www.dbpedia.org/blog/yeah-we-did-it-again-new-2016-04-dbpedia-release/ (Wed, 19 Oct 2016)

Hereby we announce the release of DBpedia 2016-04. The new release is based on updated Wikipedia dumps dating from March/April 2016 featuring a significantly expanded base of information as well as richer and (hopefully) cleaner data based on the DBpedia ontology.

You can download the new DBpedia datasets in a variety of RDF-document formats from: http://wiki.dbpedia.org/downloads-2016-04 or directly here: http://downloads.dbpedia.org/2016-04/

Support DBpedia

During the latest DBpedia meeting in Leipzig we discussed ways to support DBpedia and the benefits this support would bring. For the next two months, we are aiming to raise money to support the hosting of the main services and the next DBpedia release (especially to shorten release intervals). On top of that, we need to buy a new server to host DBpedia Spotlight, which has so far been generously hosted by third parties. If you use DBpedia and want us to keep going forward, we kindly invite you to donate here or become a member of the DBpedia Association.

Statistics

The English version of the DBpedia knowledge base currently describes 6.0M entities of which 4.6M have abstracts, 1.53M have geo coordinates and 1.6M depictions. In total, 5.2M resources are classified in a consistent ontology, consisting of 1.5M persons, 810K places (including 505K populated places), 490K works (including 135K music albums, 106K films and 20K video games), 275K organizations (including 67K companies and 53K educational institutions), 301K species and 5K diseases. The total number of resources in English DBpedia is 16.9M that, besides the 6.0M resources, includes 1.7M skos concepts (categories), 7.3M redirect pages, 260K disambiguation pages and 1.7M intermediate nodes.

Altogether the DBpedia 2016-04 release consists of 9.5 billion (2015-10: 8.8 billion) pieces of information (RDF triples), out of which 1.3 billion (2015-10: 1.1 billion) were extracted from the English edition of Wikipedia, 5.0 billion (2015-10: 4.4 billion) were extracted from other language editions and 3.2 billion (2015-10: 3.2 billion) from DBpedia Commons and Wikidata. In general, we observed a growth in mapping-based statements of about 2%.

Thorough statistics can be found on the DBpedia website and general information on the DBpedia datasets here.

Community

The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2016-04 ontology encompasses:

  • 754 classes (DBpedia 2015-10: 739)
  • 1,103 object properties (DBpedia 2015-10: 1,099)
  • 1,608 datatype properties (DBpedia 2015-10: 1,596)
  • 132 specialized datatype properties (DBpedia 2015-10: 132)
  • 410 owl:equivalentClass and 221 owl:equivalentProperty mappings to external vocabularies (DBpedia 2015-10: 407 and 222, respectively)

The editor community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2016-04 extraction, we used a total of 5800 template mappings (DBpedia 2015-10: 5553 mappings). For the second time the top language, gauged by the number of mappings, is Dutch (646 mappings), followed by the English community (604 mappings).

(Breaking) Changes

  • In addition to datasets normalized to English DBpedia URIs (en-uris), we provide normalized datasets based on the DBpedia Wikidata (DBw) datasets (wkd-uris). These sorted datasets will be the foundation for the upcoming fusion process with Wikidata. From the following releases on, the DBw-based URIs will be the only ones provided.
  • We now filter out triples from the Raw Infobox Extractor that are already mapped. E.g. no more “<x> dbo:birthPlace <z>” and “<x> dbp:birthPlace|dbp:placeOfBirth|… <z>” in the same resource. These triples are now moved to the “infobox-properties-mapped” datasets and not loaded on the main endpoint. See issue 22 for more details.
  • Major improvements in our citation extraction. See here for more details.
  • We incorporated the statistical distribution approach of Heiko Paulheim in creating type statements automatically and providing them as additional datasets (instance_types_sdtyped_dbo).

In case you missed it, what we changed in the previous release (2015-10):

  • English DBpedia switched to IRIs. This can be a breaking change for some applications, which need to change their stored DBpedia resource URIs / links. We provide the “uri-same-as-iri” dataset for English to ease the transition.
  • The instance-types dataset is now split into two files: instance-types (containing only direct types) and instance-types-transitive (containing the transitive types of a resource based on the DBpedia ontology)
  • The mappingbased-properties file is now split into three (3) files:
    • “geo-coordinates-mappingbased” contains the coordinates originating from the mappings wiki; the “geo-coordinates” dataset continues to provide the coordinates originating from the GeoExtractor
    • “mappingbased-literals” contains mapping-based facts with literal values
    • “mappingbased-objects” contains mapping-based facts with object values
    • the “mappingbased-objects-disjoint-[domain|range]” datasets contain facts that were filtered out of “mappingbased-objects” as errors but are still provided
  • We added a new extractor for citation data that provides two files:
    • citation links: linking resources to citations
    • citation data: trying to get additional data from citations. This is quite an interesting dataset, but we need help to clean it up
  • All datasets are available in .ttl and .tql serialization (nt and nq datasets were dropped for reasons of redundancy and server capacity).

Upcoming Changes

  • Dataset normalization: We are going to normalize datasets based on Wikidata URIs and no longer on the English language edition, as a prerequisite to finally starting the fusion process with Wikidata.
  • RML Integration: Wouter Maroy has already provided the necessary groundwork for switching the mappings wiki to an RML-based approach on GitHub. We are not there yet, but this is at the top of our list of changes.
  • Starting with the next release we are adding datasets with NIF annotations of the abstracts (as we already provided those for the 2015-04 release). We will eventually extend the NIF annotation dataset to cover the whole Wikipedia article of a resource.

New Datasets

  • SDTypes: We extended the coverage of the automatically created type statements (instance_types_sdtyped_dbo) to English, German and Dutch (see above).
  • Extensions: In the extension folder (2016-04/ext) we provide two new datasets, both to be considered in an experimental state:
    • DBpedia World Facts: This dataset is authored by the DBpedia Association itself. It lists all countries, all currencies in use and (most) languages spoken in the world, as well as how these concepts relate to each other (spoken in, primary language etc.) and useful properties like ISO codes (ontology diagram). This dataset extends the very useful LEXVO dataset with facts from DBpedia and the CIA Factbook. Please report any errors or suggestions regarding this dataset to Markus.
    • Lector Facts: This experimental dataset was provided by Matteo Cannaviccio and demonstrates his approach to generating facts by using common sequences of words (i.e. phrases) that are frequently used to describe instances of binary relations in a text. We are looking into using this approach as a regular extraction step. It would be helpful to get some feedback from you.

Credits

Lots of thanks to

  • Markus Freudenberg (University of Leipzig / DBpedia Association) for taking over the whole release process and creating the revamped download & statistics pages.
  • Dimitris Kontokostas (University of Leipzig / DBpedia Association) for conveying his considerable knowledge of the extraction and release process.
  • All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
  • The whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.
  • Heiko Paulheim (University of Mannheim) for providing the necessary code for his algorithm to generate additional type statements for formerly untyped resources and to identify and remove wrong statements, which is now part of the DIEF.
  • Václav Zeman, Thomas Klieger and the whole LHD team (University of Prague) for their contribution of additional DBpedia types
  • Marco Fossati (FBK) for contributing the DBTax types
  • Alan Meehan (TCD) for performing a big external link cleanup
  • Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing the links from DOLCE to DBpedia ontology.
  • Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that provides 5-Star Linked Open Data publication and SPARQL Query Services.
  • OpenLink Software (http://www.openlinksw.com/) collectively for providing the SPARQL Query Services and Linked Open Data publishing  infrastructure for DBpedia in addition to their continuous infrastructure support.
  • Ruben Verborgh from Ghent University – iMinds for publishing the dataset as Triple Pattern Fragments, and iMinds for sponsoring DBpedia’s Triple Pattern Fragments server.
  • Ali Ismayilov (University of Bonn) for extending the DBpedia Wikidata dataset.
  • Vladimir Alexiev (Ontotext) for leading a successful mapping and ontology clean up effort.
  • All the GSoC students and mentors who directly or indirectly influenced the DBpedia release
  • Special thanks to members of the DBpedia Association, the AKSW and the department for Business Information Systems of the University of Leipzig.

The work on the DBpedia 2016-04 release was financially supported by the European Commission through the project ALIGNED – quality-centric software and data engineering (http://aligned-project.eu/). More information about DBpedia can be found at http://dbpedia.org as well as in the new overview article about the project, available at http://wiki.dbpedia.org/Publications.

Have fun with the new DBpedia 2016-04 release!

For more information about DBpedia, please visit our website or follow us on Facebook!
Your DBpedia Association

YEAH! We did it !! (and it is not an April fool’s joke) – New 2015-10 DBpedia release
https://www.dbpedia.org/blog/yeah-we-did-it-and-it-is-not-an-april-fools-joke-new-2015-10-dbpedia-release/ (Fri, 01 Apr 2016)

We proudly present our new 2015-10 DBpedia release, which is available now via http://dbpedia.org/sparql. Go and check it out!

This DBpedia release is based on updated Wikipedia dumps dating from October 2015 featuring a significantly expanded base of information as well as richer and cleaner data based on the DBpedia ontology.

So, what did we do?

The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2015-10 ontology encompasses

  • 739 classes (DBpedia 2015-04: 735)
  • 1,099 properties with reference values (a/k/a object properties) (DBpedia 2015-04: 1,098)
  • 1,596 properties with typed literal values (a/k/a datatype properties) (DBpedia 2015-04: 1,583)
  • 132 specialized datatype properties (DBpedia 2015-04: 132)
  • 407 owl:equivalentClass and 222 owl:equivalentProperty mappings to external vocabularies (DBpedia 2015-04: 408 and 200, respectively)

The editor community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2015-10 extraction, we used a total of 5553 template mappings (DBpedia 2015-04: 4317 mappings). For the first time, the top language, gauged by number of mappings, is Dutch (606 mappings), surpassing the English community (600 mappings).

And what are the (breaking) changes?

  • English DBpedia switched from URIs to IRIs. IRIs retain Unicode characters that URIs percent-encode (e.g. http://dbpedia.org/resource/Café rather than http://dbpedia.org/resource/Caf%C3%A9), so stored links may need updating.
  • The instance-types dataset is now split into two files:
    • “instance-types” contains only direct types.
    • “instance-types-transitive” contains transitive types.
  • The “mappingbased-properties” file is now split into three (3) files:
    • “geo-coordinates-mappingbased”
    • “mappingbased-literals” contains mapping-based statements with literal values.
    • “mappingbased-objects”
  • We added a new extractor for citation data.
  • All datasets are available in .ttl and .tql serialization 
  • We are providing DBpedia as a Docker image.
  • From now on, we provide extensive dataset metadata by adding DataIDs for all extracted languages to the respective language directories.
  • In addition, we revamped the dataset table on the download page. It’s created dynamically based on the DataID of all languages. Likewise, the tables on the statistics page are now based on files providing information about all mapping languages.
  • From now on, we also include the original Wikipedia dump files (‘pages_articles.xml.bz2’) alongside the extracted datasets.
  • A complete changelog can always be found in the git log.

And what about the numbers?

Altogether the new DBpedia 2015-10 release consists of 8.8 billion (2015-04: 6.9 billion) pieces of information (RDF triples), out of which 1.1 billion (2015-04: 737 million) were extracted from the English edition of Wikipedia, 4.4 billion (2015-04: 3.8 billion) were extracted from other language editions, and 3.2 billion (2015-04: 2.4 billion) came from DBpedia Commons and Wikidata. In general, we observed a significant growth in raw infobox and mapping-based statements of close to 10%. Thorough statistics are available via the Statistics page.

And what’s up next?

We will be working to move away from the mappings wiki, but we will have at least one more mapping sprint. Moreover, we have some cool ideas for GSoC this year. Additional mentors are more than welcome. 🙂

And who is to blame for the new release?

We want to thank all editors who contributed to the DBpedia ontology mappings via the Mappings Wiki, all the GSoC students and mentors working directly or indirectly on the DBpedia release, and the whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.

Special thanks go to Markus Freudenberg and Dimitris Kontokostas (University of Leipzig), Volha Bryl (University of Mannheim / Springer), Heiko Paulheim (University of Mannheim), Václav Zeman and the whole LHD team (University of Prague), Marco Fossati (FBK), Alan Meehan (TCD), Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy), Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software), OpenLink Software (http://www.openlinksw.com/), Ruben Verborgh from Ghent University – iMinds, Ali Ismayilov (University of Bonn), Vladimir Alexiev (Ontotext) and members of the DBpedia Association, the AKSW and the Department for Business Information Systems of the University of Leipzig for their commitment, putting tremendous time and effort into getting this done.

The work on the DBpedia 2015-10 release was financially supported by the European Commission through the project ALIGNED – quality-centric software and data engineering (http://aligned-project.eu/).

Detailed information about the new release is available here. For more information about DBpedia, please visit our website or follow us on Facebook!

Have fun and all the best!

Yours

DBpedia Association
