Semantic Crawler Library

Over the last few months I have written code that parses FOAF files at least three times, each time slightly differently, with different frameworks (Jena, Jaxen) and on different platforms. I have now decided to consolidate at least the parsing in a central library.

The library, named Semantic Crawler or scrawler for short, should parse not only FOAF files but also the other main ontologies. It is more a parser than a crawler.

Advantages

  • You can skip the parsing step and work directly with the class representation. Only the URL/IRI of the RDF file is needed.
  • Easier use of Semantic Web technology for software engineers and programmers.
  • The parsing step is easy to extend.
  • Open source (like everything on this website).
  • More time for the essential parts of Semantic Web programs, for example reasoning.

Interface Example

For the interface I have introduced Java annotations so that an RDF file can be generated from the object. For this purpose it is not enough to store the raw data; the metadata describing what the subtree looked like before parsing is needed as well.

@RDFEntity(ontoURI = Ontologies.foafURI, concept="Person")
public interface IFOAFPerson {

    @RDFProperty(ontoURI=Ontologies.foafURI, value="name")
    public abstract String getName();

    /*
     * Here are more method definitions...
     */

    @RDFProperty(
        ontoURI=Ontologies.geoURI,
        value="Point",
        type = Type.ROOTNODE,
        subNode= { "lat", "long" },
        subNodeOntoURI = { Ontologies.geoURI, Ontologies.geoURI },
        subNodeDeep = { 1, 1 },
        subNodeType = { Type.LITERAL, Type.LITERAL } )
    public abstract Vector<Double> getLocation();

}

An RDF file that represents the Java interface should look like the following example (only an excerpt, containing the data that is also shown in the interface):

<foaf:Agent>
        <foaf:name>Alex Oberhauser</foaf:name>
        <geo:Point>
                <geo:lat>10.1174855232239</geo:lat>
                <geo:long>99.8058342933655</geo:long>
        </geo:Point>
</foaf:Agent>

The automatic transformation from a Java object to RDF with the help of the annotations is not written yet.

At the moment the Java annotations describe only the classes and the properties. The implemented interface is only a first approach to handling the transformation from Java to RDF and back efficiently. The methods return the name value and the lat/long values.

  • subNodeDeep:

Indicates how far the node is from the root node (described by value). 1 (one) means that the subNode is a child; 2 (two) means that it is the child of a child.

  • subNodeType/type:

The type that can be stored in this node. Possible values: RESOURCE, ROOTNODE, LITERAL, RESOURCE_LITERAL, UNDEFINED

  • ontoURI:

The URI of the ontology. Stored in the Ontologies class.
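To make the role of the annotations more concrete, here is a minimal sketch of how annotations like these could be declared and read back at runtime via reflection. It is not the library's actual code: the annotation definitions are simplified, hypothetical stand-ins (only ontoURI and value/concept, without the subNode attributes), and the class name AnnotationSketch and the helper methods conceptIRI and describe are invented for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class AnnotationSketch {

    // Hypothetical, simplified stand-in for the library's class-level annotation.
    @Retention(RetentionPolicy.RUNTIME) // must be kept at runtime to be readable via reflection
    @Target(ElementType.TYPE)
    public @interface RDFEntity {
        String ontoURI();
        String concept();
    }

    // Hypothetical, simplified stand-in for the library's property-level annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface RDFProperty {
        String ontoURI();
        String value();
    }

    // FOAF namespace; in the post this constant lives in the Ontologies class.
    public static final String FOAF_URI = "http://xmlns.com/foaf/0.1/";

    @RDFEntity(ontoURI = FOAF_URI, concept = "Person")
    public interface IFOAFPerson {
        @RDFProperty(ontoURI = FOAF_URI, value = "name")
        String getName();
    }

    // Builds the full IRI of the concept that an annotated interface maps to.
    public static String conceptIRI(Class<?> type) {
        RDFEntity entity = type.getAnnotation(RDFEntity.class);
        return entity.ontoURI() + entity.concept();
    }

    // Collects "property IRI -> value" for every annotated getter of the interface.
    public static Map<String, String> describe(Class<?> type, Object instance) throws Exception {
        Map<String, String> result = new TreeMap<>();
        for (Method method : type.getDeclaredMethods()) {
            RDFProperty property = method.getAnnotation(RDFProperty.class);
            if (property == null) {
                continue; // not an RDF-mapped method
            }
            Object value = method.invoke(instance);
            result.put(property.ontoURI() + property.value(), String.valueOf(value));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        IFOAFPerson person = () -> "Alex Oberhauser";
        System.out.println(conceptIRI(IFOAFPerson.class));
        System.out.println(describe(IFOAFPerson.class, person));
    }
}
```

A map of property IRIs to values like this would only be the first step of a Java-to-RDF serializer; turning it into an RDF file, and handling nested nodes such as geo:Point, is left out here.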

Example for Programmers/Software Engineers

GIT WebAccess

The library is a Maven project. The two important Maven commands for the project are listed below:

  • mvn package

Creates the library, the sources, and the JavaDoc, each as a jar.

  • mvn site

Creates a website with useful information.

If you want to learn which ontologies are supported and which values you can get from such an RDF file, read the JavaDoc and/or look into the interface package to.networld.scrawler.interfaces. It is possible that not all interfaces have an implementation yet, but I am working on it.

Now to the interesting part. The following code reads my name from my FOAF file and prints it to STDOUT. It is only an excerpt of the important part.

IFOAFPerson myFOAF = new Person("http://devnull.networld.to/foaf.rdf");
String myName = myFOAF.getName();
System.out.println("My name is " + myName);
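For comparison, this is roughly the kind of work the library hides. The sketch below is not how scrawler is implemented (the post mentions Jena and Jaxen); it is an assumption-laden illustration that uses only the XML parser shipped with the JDK and an inline RDF/XML string instead of fetching a URL. The class name FoafNameSketch and the method readName are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class FoafNameSketch {

    public static final String FOAF_NS = "http://xmlns.com/foaf/0.1/";

    // Returns the text of the first foaf:name element, or null if there is none.
    public static String readName(String rdfXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for namespace-based lookups
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
        NodeList names = doc.getElementsByTagNameNS(FOAF_NS, "name");
        return names.getLength() == 0 ? null : names.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Inline sample data modelled on the excerpt shown earlier in the post.
        String rdf =
              "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
            + " xmlns:foaf=\"http://xmlns.com/foaf/0.1/\">"
            + "<foaf:Agent><foaf:name>Alex Oberhauser</foaf:name></foaf:Agent>"
            + "</rdf:RDF>";
        System.out.println("My name is " + readName(rdf));
    }
}
```

Treating RDF/XML as plain XML like this only works for simple documents, because the same RDF graph can be serialized in several different ways. That is one reason to keep the parsing in a single library.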

Looks easy? That was the intention: to write a simple library that can be used in more complex and more interesting applications.

Please feel free to contact me if you have questions or improvements. I will try to answer or to fix the problem. And of course let me know in which application you use my library. Keep in mind that at the moment the best-working part is the parsing of FOAF files, but I will certainly extend the library with other useful ontologies.

14 comments

  1. It was very interesting to read.
    I want to quote your post in my blog. May I?
    And do you have an account on Twitter?

    1. Thank you, nice to hear.

      Of course you can quote the post, but please add the source.

      My name on twitter is obale.

  2. It was very interesting to read networld.to.
    I want to quote your post in my blog. May I?
    And do you have an account on Twitter?

    1. I could list you as a partner if you participate in one of our projects or support the project in general in one way or another.
