Marvin Release Bot
MARVIN is the release bot that performs the automated monthly DBpedia releases on three different servers for the generic, mappings, wikidata and abstract extractions. This repository can be used to fork the architecture for creating extensions, developing new extractors or debugging existing ones. Fixes and patches are deployed on the DBpedia servers each month via a fresh git clone of the master branch of the DIEF (DBpedia Information Extraction Framework).
Quick Start: Run a MARVIN Extraction
Implementation note: the scripts create a folder marvin-extraction that contains the code, results and logs.
# check out this repo with all config files
git clone https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config
cd marvin-config
# (optional) delete previous versions of the DIEF
rm -rf marvin-extraction/extraction-framework
# (~10 minutes) install the DIEF in marvin-extraction/extraction-framework
# if you installed it already, you can instead run `git pull && mvn clean install` inside marvin-extraction/extraction-framework to update it
./setup-or-reset-dief.sh
# test run: Romanian extraction (very small)
./marvin_extraction_run.sh test
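The DIEF install step relies on git and Maven (mvn), so checking that these tools are on the PATH first can save a failed ~10 minute setup. The following is a minimal sketch under stated assumptions: `require_tools` is a hypothetical helper, not part of this repo, and `java` is included only because Maven itself needs a JDK to run.

```shell
# Sketch: fail early if a tool needed for the DIEF install is missing.
# Assumptions: git and mvn come from the commands in this README; java is
# assumed because Maven runs on a JDK. require_tools is a hypothetical helper.
require_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to something runnable
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "missing required tools:$missing" >&2
    return 1
  fi
}
```

For example, `require_tools git mvn java && ./setup-or-reset-dief.sh` runs the setup only when all three tools are found.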
To run the other extractions, use one of:
# around 4-7 days
./marvin_extraction_run.sh generic
# around 4-7 days
./marvin_extraction_run.sh mappings
# around 7-14 days
./marvin_extraction_run.sh wikidata
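Because the generic, mappings and wikidata runs take days, it can be convenient to start them detached from the terminal. The following is a minimal sketch, assuming it is run from the marvin-config directory; `is_valid_group`, `start_extraction` and the `marvin-run-<group>.log` file name are hypothetical, and that log only captures console output, in addition to the logs the scripts already keep under marvin-extraction.

```shell
# Sketch: validate the extraction group, then start a long run in the background.
# Assumptions: run from the marvin-config checkout; function names and the
# log file name are hypothetical, not part of this repo.

# Only accept the group names listed in this README.
is_valid_group() {
  case "$1" in
    test|generic|mappings|wikidata) return 0 ;;
    *) return 1 ;;
  esac
}

# nohup keeps the run alive after logout; stdout and stderr go to a per-group log.
start_extraction() {
  group="$1"
  if ! is_valid_group "$group"; then
    echo "unknown extraction group: $group" >&2
    return 1
  fi
  nohup ./marvin_extraction_run.sh "$group" > "marvin-run-$group.log" 2>&1 &
  echo "started $group extraction as PID $!"
}
```

Running `start_extraction mappings` prints the PID of the background job, and `tail -f marvin-run-mappings.log` then follows its console output. Because the group name is checked first, a typo is rejected before anything is launched.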
To specify a different dump-date:
# Set it in extractionConfiguration/{download|extraction}.*.properties
dump-date=20200301
If the specified dump-date is newer than the current local dumps, setting it in extractionConfiguration/download.*.properties is enough.
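Editing several properties files by hand makes it easy to miss one. Below is a minimal sketch that sets the same dump-date in every extractionConfiguration/download.*.properties and extraction.*.properties file, assuming it is run from the marvin-config root and that dates use the YYYYMMDD form of the example above; `set_dump_date` is a hypothetical helper, not part of this repo.

```shell
# Sketch: set dump-date in all download/extraction properties files.
# Assumptions: run from the marvin-config root; set_dump_date is hypothetical;
# dates follow the YYYYMMDD form used in this README (e.g. 20200301).
set_dump_date() {
  new_date="$1"
  case "$new_date" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "dump-date must look like YYYYMMDD, got: $new_date" >&2; return 1 ;;
  esac
  for props in extractionConfiguration/download.*.properties \
               extractionConfiguration/extraction.*.properties; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$props" ] || continue
    if grep -q '^dump-date=' "$props"; then
      # Rewrite via a temp file instead of sed -i, whose syntax differs
      # between GNU and BSD sed.
      sed "s/^dump-date=.*/dump-date=$new_date/" "$props" > "$props.tmp" &&
        mv "$props.tmp" "$props"
    else
      # Start on a fresh line if the file lacks a trailing newline, then append.
      [ -n "$(tail -c 1 "$props")" ] && echo >> "$props"
      echo "dump-date=$new_date" >> "$props"
    fi
  done
}
```

For example, `set_dump_date 20200301` rewrites an existing `dump-date=` line, or appends one if a file does not have it yet. As noted above, when the new date is newer than your local dumps only the download files strictly need it; this sketch updates both sets so they stay consistent.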
Contributions & License
All scripts and config files in this repo are CC-0 (Public Domain). We accept pull requests to improve the config files; all contributions will be merged as CC-0. Marvin-config is intended to bootstrap the development of fixes for the DIEF.
Acknowledgements
We thank Sören Auer and the Technische Informationsbibliothek (TIB) for providing three servers to run:
- the main DBpedia extraction on a monthly basis
- community-provided extractors on Wikipedia, Wikidata or other sources
- enrichment, cleaning and parsing services, so-called Databus mods for open data on the Databus
This contribution by TIB to DBpedia & its community is a great push towards incentivizing Open Data and establishing a global and national research and innovation data infrastructure.