Python and JSON-LD
08-03-14
I've published some code for mapping CSV data to RDF using Python and JSON-LD on GitHub. The motivation for this work was:
- to provide sample data to help people get started with VIVO, the research profile system built on Semantic Web standards
- to learn more about JSON-LD and explore it as a tool for the Extract, Transform, Load (ETL) work that's required for projects like VIVO.
Most people wanting to convert data to RDF will have access to relational data sources or tabular data exported as CSV or TSV files. Using Python to read in a CSV source, convert it to a list of dictionaries using the standard library's csv module, and then map it to RDF using a JSON-LD context can be really straightforward. See the context and code for creating academic appointments in VIVO as an example.
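As a rough illustration of that workflow, here is a minimal sketch using only the standard library. The CSV columns, the `example.org` namespace, and the `csv_to_jsonld` helper are all made up for this example; they are not the context or code from the repository.

```python
import csv
import io
import json

# Hypothetical CSV export; the column names are assumptions for illustration.
CSV_TEXT = """id,name,title
p1,Jane Smith,Professor
p2,John Doe,Lecturer
"""

# Hypothetical JSON-LD context: maps each CSV column name to a property IRI.
# The example.org namespace is a placeholder, not the VIVO ontology.
CONTEXT = {
    "@base": "http://example.org/person/",
    "id": "@id",  # alias the "id" column to the JSON-LD node identifier
    "name": "http://www.w3.org/2000/01/rdf-schema#label",
    "title": "http://example.org/ontology/title",
}


def csv_to_jsonld(csv_text, context):
    """Read CSV rows as dictionaries and wrap them in a JSON-LD document."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {"@context": context, "@graph": rows}


doc = csv_to_jsonld(CSV_TEXT, CONTEXT)
print(json.dumps(doc, indent=2))
```

The rows themselves are untouched; the context alone carries the mapping to RDF. With RDFLib's JSON-LD support installed, passing `json.dumps(doc)` to `Graph().parse(data=..., format="json-ld")` should load the result as triples.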
In a more real-world example, I've used a different JSON-LD context to convert JSON data from the PubMed and CrossRef APIs to a local publication ontology. Here, too, JSON-LD provides a nice way to map from multiple, slightly different sources to a common RDF model.
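The idea of one context per source can be sketched like this. It is a deliberately simplified stand-in for JSON-LD expansion, and the field names, the `example.org` namespace, and the `expand_keys` helper are assumptions for illustration, not the actual PubMed or CrossRef response formats.

```python
# Shared target properties (a hypothetical local publication ontology).
PUB = "http://example.org/pub/"

# One context per source. The source field names are illustrative
# assumptions, not the real API response keys.
SOURCE_A_CONTEXT = {"articleTitle": PUB + "title", "journalName": PUB + "venue"}
SOURCE_B_CONTEXT = {"title": PUB + "title", "container": PUB + "venue"}


def expand_keys(record, context):
    """Rename each key to the IRI the context assigns; drop unmapped keys.

    This mimics only the term-to-IRI step of JSON-LD expansion.
    """
    return {context[key]: value for key, value in record.items() if key in context}


record_a = {"articleTitle": "On Linked Data", "journalName": "J. Web", "pmid": "1"}
record_b = {"title": "On Linked Data", "container": "J. Web"}

a = expand_keys(record_a, SOURCE_A_CONTEXT)
b = expand_keys(record_b, SOURCE_B_CONTEXT)

# Both records now use the same property IRIs as keys.
print(a == b)
```

Each source keeps its own field names, and only the context changes between them. A real JSON-LD processor would handle the full expansion algorithm, including nested objects and typed values.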
There seems to be potential for a community working with RDF in a common ontology, like VIVO, to collaborate on developing common contexts for various data types and sharing and reusing them. Members could use these contexts in a variety of tools and not be tied to a particular implementation (like Python) since the JSON can be read by nearly all programming languages.
The RDFLib plugin for JSON-LD parsing and serializing is still undergoing development and the spec only became final in January of this year, so these are early days. But I look forward to learning more about the spec and implementing it in other tools.