About the Demo

This demo uses the KBpedia knowledge structure to analyze your submitted Web page. KBpedia combines six public knowledge bases — Wikipedia, Wikidata, GeoNames, OpenCyc, DBpedia and UMBEL — to supply the standard concepts and entities used in the analysis.

 
 

As you submit your page or click on various aspects of the demo, Web service calls are issued, and the matching results from KBpedia (or from other services, such as metadata extraction) are returned.

The combined KBpedia knowledge structure contains more than 54,000 reference concepts (RCs), organized into a knowledge graph as defined by the KBpedia Knowledge Ontology (KKO). KKO is a logically organized and computable structure that supports inference and reasoning. About 85% of the RCs are themselves entity types (that is, about 46,000 natural classes of similar entities, such as astronauts or marigolds), which are organized into about 30 "core" typologies that are mostly disjoint (non-overlapping) with one another. The typologies provide a powerful means of slicing and dicing the knowledge structure, and the individual entity types provide the tie-in points to about 20 million individual entities. The remaining RCs are devoted to other logical divisions of the knowledge graph, specifically attributes, relations and topics.

Specific conventions and other things to keep in mind while working with the demo are described below.

About the interface

After you submit your sample Web page (full URL with http:// prefix required), the demo returns analysis and results in a series of panels and tabs. A progress bar displays while the analysis occurs. Although the full submitted page is analyzed, the displayed results cover only a portion of the input page.
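The URL requirement above can be checked before a page is submitted. The following is only a minimal sketch and not the demo's own validation: the function name has_http_prefix is hypothetical, and accepting https:// in addition to http:// is an assumption.

```python
from urllib.parse import urlparse


def has_http_prefix(url: str) -> bool:
    """Return True if url is a full URL with an http:// prefix.

    Assumption: https:// is accepted as well; the demo text only names http://.
    """
    parts = urlparse(url.strip())
    # A full URL needs both a scheme and a host (netloc).
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

For example, has_http_prefix("http://example.com/page.html") is true, while a bare "example.com/page.html" is rejected because it has no scheme.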

Where multiple panels are displayed on a single results page, panels can often be expanded or collapsed using the standard icons.

Some of the panels have multiple sub-sections that may be selected via the horizontal tabs across the panel.

If you ever need more information about the demo, click the 'more info' icon at any time, which will return you to this page.

Main Analysis Panel

The main analysis panel is organized at the top into five (5) main tabs.

Concepts Tab

The Concepts tab shows the main body of the text you submitted. The main body is "defluffed", meaning that navigation, side panels, headers and footers are removed, along with embedded HTML markup, so that only the main text of your submitted page is presented.
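To make the idea of defluffing concrete, here is a rough sketch using only the Python standard library. It is not the demo's actual method, which is not described here. The names MainTextExtractor and defluff are hypothetical, and the set of HTML elements treated as "fluff" is an assumption.

```python
from html.parser import HTMLParser

# Assumption: these elements hold navigation, side panels, headers,
# footers, and non-text content that should be dropped.
SKIPPED_TAGS = {"nav", "aside", "header", "footer", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside any skipped element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def defluff(html):
    """Return the remaining text with all markup removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    # Text fragments are joined with single spaces (a crude normalization).
    return " ".join(parser.chunks)
```

A production extractor would also need heuristics for pages that mark up their navigation with generic elements, which this sketch ignores.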

All reference concepts in this text are highlighted by background color. Standard concepts are backgrounded in blue. The top ten (10) concepts on your page, as measured by frequency and the unusualness of the terms, are shown in red.

When you mouse over a highlight bubble, you see a popup that lists one or more possible matching concepts. (If more than one concept is shown, there is some ambiguity as to the precise match for the concept. Subsequent disambiguation steps are often applied to production content to resolve these differences, but this step requires more context than a simple URL submission provides.)

You may click on any item in this popup to go to a contextual page to see structured data about the given RC.

Entities Tab

The Entities tab shows the main body of the text you submitted. The main body is "defluffed", meaning that navigation, side panels, headers and footers are removed, along with embedded HTML markup, so that only the main text of your submitted page is presented.

All reference concepts in this text that correspond to entities are highlighted by background color, with a special emphasis on organizational entities. (Multiple entity types may be emphasized in any given tagging effort.) Standard entities are backgrounded in blue. The likely publishing organization is shown in orange. (Special analysis using machine learning and supplementary organizational lists from the US PTO is applied in this particular case.)

When you mouse over a highlight bubble, you see a popup that lists one or more possible matching entities. (If more than one entity is shown, there is some ambiguity as to the precise match. Subsequent disambiguation steps are often applied to production content to resolve these differences, but this step requires more context than a simple URL submission provides.)

You may click on any item in this popup to go to a contextual page to see structured data about the given entity.

Analysis Tab

The Analysis tab contains multiple evaluations, each provided under its own separate sub-panel.

Language Analysis Sub-panel

The Language Analysis sub-panel provides the analysis of the source language of the submitted Web page. Fifty-three (53) languages are currently supported. The ISO code for the language is also returned.

Topic Analysis Sub-panel

The Topic Analysis sub-panel presents results for the top ten (10) reference concepts (RCs) found in the submitted page. The top ten are determined by a score that combines each term or phrase's uniqueness with its frequency within the document.
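The exact scoring formula is not given here. As an illustration of how frequency and uniqueness can be combined, the sketch below uses a TF-IDF style weighting, which is a swapped-in, commonly used technique and not necessarily what the demo does. The function name top_concepts and the background frequency table are hypothetical.

```python
import math
from collections import Counter


def top_concepts(doc_terms, background_doc_freq, total_background_docs, k=10):
    """Rank terms by in-document frequency weighted by background rarity.

    doc_terms: list of concept labels found in the page (repeats allowed).
    background_doc_freq: hypothetical table mapping a term to the number of
        background documents that contain it.
    """
    counts = Counter(doc_terms)
    scores = {}
    for term, count in counts.items():
        # Smoothed inverse document frequency: rarer terms score higher.
        df = background_doc_freq.get(term, 0)
        idf = math.log((1 + total_background_docs) / (1 + df)) + 1.0
        scores[term] = count * idf
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

With this weighting, a common word that appears three times can still rank below a rare term that appears once, which matches the intent of favoring unusual terms.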

When links for more than one topic are returned, the system was not able to resolve the precise match, so all candidates are shown. Post-processing may be needed to disambiguate some topics; this can often be automated given a better understanding of your context and domain vocabulary. Clicking on a topic link takes you to the structured data for that RC in the KBpedia knowledge system.

Publisher Analysis Sub-panel

The system applies special analysis to the Web page to determine its possible publisher. The results are returned in this sub-panel, including the likelihood score and other supporting information about the (probable) publisher.

Organizational Analysis Sub-panel

Special emphasis is given to organizational identification in this demo. The emphasis results from machine learning, special extractors for footers and copyrights, and use of an enhanced organizations gazetteer with millions of supplementary entries from the US PTO.

Results in this section indicate where the organizational entities were found and provide links to those organizations within KBpedia and other external knowledge bases.

Extracted Metadata Sub-panel

A rich variety of metadata may be extracted from your submitted Web page, the scope and exact nature of which depend on what is embedded in the source site. Standard HTML metadata is inspected, as well as structured content from RDFa and from third parties such as Facebook, Google and Twitter.

As a result, the list of possible metadata items that may be shown numbers into the hundreds, though typically only a much smaller set is found for any given page. Submit a few different source pages to the demo to see the diversity of items that may be extracted.

Graphs Tab

The Graphs tab provides two graphing layout options, the ability to trace graph connections, and the ability to use the view in full screen or not.

Network Layout Sub-tab

The Network Layout relates the subject concept to the KBpedia reference concepts (RCs) via their graph placement into SuperTypes (also called Generals). By default, only the SuperTypes structure is displayed, with all the links between the identified concept and any of these SuperTypes. The subject concept pathways are highlighted through the appropriate (linked) SuperTypes. This layout provides a reference for placing and understanding the overall structure of the KBpedia knowledge graph (as expressed in KKO).
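One way to picture a highlighted pathway is as a path from the subject concept up through its parents to a SuperType. The sketch below finds such a path with a breadth-first search over a toy graph. It is an illustration only: the function name pathway_to_supertype is hypothetical, the node names in the usage note are invented and not taken from KBpedia, and the demo's own graph traversal is not described here.

```python
from collections import deque


def pathway_to_supertype(parents, concept, supertypes):
    """Breadth-first search upward from concept to the nearest SuperType.

    parents maps each node to the list of its direct parent nodes.
    Returns the path as a list of nodes, or None if no SuperType is reachable.
    """
    queue = deque([[concept]])
    seen = {concept}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in supertypes:
            return path
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None
```

Given an invented parent map in which "marigold" points to "flowering-plant", which points to "plant", which points to a SuperType named "Plants", the function returns the four nodes in that order.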

The overall size of the knowledge graph makes complete views of all structure unnavigable. The Network Layout has proven to be an effective scaffolding for understanding and navigating large graphs. It can also regenerate and refresh within acceptable times for dynamic display.

The graph can be zoomed with your mouse's wheel and you can navigate it by clicking and dragging it. View a portion of the knowledge graph for your page, using different layouts or options.

Hierarchical Layout Sub-tab

The Hierarchical Layout relates the subject concept to the KBpedia reference concepts (RCs) via their hierarchical placement into SuperTypes. The upper part of the conceptual structure (called the SuperType structure) is always displayed in the graph. The subject concept pathways are highlighted through the appropriate (linked) SuperTypes. This layout provides a reference for placing and understanding the overall structure of the KBpedia knowledge graph (as expressed in KKO).

The overall size of the knowledge graph makes complete views unnavigable. The Hierarchical Layout has proven to be an effective scaffolding for understanding and navigating large graphs. It can also regenerate and refresh within acceptable times for dynamic display.

The graph can be zoomed with your mouse's wheel and you can navigate it by clicking and dragging it.

Expand Sub-tab

Expand Graph uses the Hierarchical Layout and extends its links to the actual concepts identified in the page. Like the Hierarchical Layout, the graph relates the page's subject concepts to the KBpedia reference concepts (RCs) via their hierarchical placement into SuperTypes. The upper part of the conceptual structure (called the SuperType structure) is always displayed in the graph. The subject concept pathways are highlighted through the appropriate (linked) SuperTypes. This layout provides a reference for placing and understanding the overall structure of the KBpedia knowledge graph (as expressed in KKO).

The graph can be zoomed with your mouse's wheel and you can move it by clicking and dragging it.

Fullscreen Sub-tab

You can toggle the fullscreen mode by clicking the Fullscreen tab.

Export Tab

The Export tab enables you to export your demo results in a variety of formats.

The structured data for KBpedia's knowledge base entities may be exported in the formats of JavaScript object notation (JSON), Resource Description Framework in XML format (RDF+XML), RDF in Notation 3 format (RDF+N3), or Extensible Data Notation (EDN, a Clojure format). A variety of other formats are commercially available, and others may be implemented upon request.

Metadata Panel

A rich variety of metadata may be extracted from your submitted Web page, the scope and exact nature of which depend on what is embedded in the source site. Standard HTML metadata is inspected, as well as structured content from RDFa and from third parties such as Facebook, Google and Twitter.

As a result, the list of possible metadata items that may be shown numbers into the hundreds, though typically only a much smaller set is found for any given page. Submit a few different source pages to the demo to see the diversity of items that may be extracted.

The Tip of the Iceberg

Of course, there are many aspects of the KBpedia knowledge structure not shown by the demo. For example, mapping and data interoperability do not lend themselves to quick demos. Also, disambiguation and dedicated extractors and taggers often require additional data or inputs, or specific enhancements to the code base. The demo also does not show the hundreds of document formats supported by KBpedia, nor automated harvesting, ingest or transformation of Web or structured content.