tue 27 april
spent the whole day studying document architecture literature (argumentative zoning, rhetorical structure theory). topic maps in xtm (link) or rdf (link) syntax could be used to express document structure semantics semi-automatically, i.e. without having to manually annotate every single document. the idea is this:
a. create a pool of topic maps (a taxonomy of cultural heritage document structures) from a basic corpus of texts, and
b. the learning process: compare the structure of every new document to this pool to determine what kind of document it is.
that way we would get a good idea of what to expect from a document, not only in terms of structure (where to find the abstract, the references, the quotes) but also in terms of writing style and other semantic features.
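a rough sketch of step b, just to see the shape of it. big assumption: instead of real xtm/rdf topic maps, each structure is flattened into an ordered list of section labels, and "what kind of document is it" becomes nearest-prototype matching with the stdlib `difflib.SequenceMatcher` similarity ratio. the document types, section labels and function names are all hypothetical placeholders, not part of any existing taxonomy.

```python
from difflib import SequenceMatcher

# Hypothetical pool (step a): each document type maps to prototype structures,
# flattened to ordered section labels. Real topic maps (XTM/RDF) would carry far
# more than this; flat label lists are a simplifying assumption of the sketch.
STRUCTURE_POOL: dict[str, list[list[str]]] = {
    "research_paper": [
        ["title", "abstract", "introduction", "method", "results", "references"],
    ],
    "exhibition_catalogue": [
        ["title", "foreword", "essay", "catalogue_entries", "bibliography"],
    ],
    "collection_record": [
        ["title", "object_description", "provenance", "rights"],
    ],
}


def structure_similarity(a: list[str], b: list[str]) -> float:
    """Order-aware similarity in [0, 1] between two section-label sequences."""
    return SequenceMatcher(None, a, b).ratio()


def classify_structure(
    doc: list[str], pool: dict[str, list[list[str]]] = STRUCTURE_POOL
) -> tuple[str, float]:
    """Step b: return the type whose closest prototype best matches doc."""
    best_type, best_score = "", -1.0
    for doc_type, prototypes in pool.items():
        # A type scores as well as its best-matching prototype.
        score = max(structure_similarity(doc, p) for p in prototypes)
        if score > best_score:
            best_type, best_score = doc_type, score
    return best_type, best_score


if __name__ == "__main__":
    doc = ["title", "abstract", "introduction", "results", "references"]
    print(classify_structure(doc))  # research_paper, ratio 10/11
```

once a document is matched to a type, its prototype tells you roughly where to expect the abstract or the references. matching against prototypes rather than training a classifier keeps the pool editable by hand, which seems closer to the "taxonomy" idea.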
maybe this will work. the thing is, it has to work with documents distributed over a HyperCuP-topology-structured Internet space (link), in conjunction with a formal cultural heritage ontology (link) used to express content semantics in those same documents :| .. it's been nice knowing you!
10:43:41 PM