After looking at the profession field of the biography record and at our current mapping, the mapping seems rather poor:
In Merlin, the profession is a flat authority so this could be made into a thesaurus…
- Vlado: Excellent! But please first take a look at the COMPLETE list of values, it may need some fixing. E.g.:
- dealer/auction house: seems this is an OR
- author/poet; military/naval: seems this is a parent/child
- Printmaker vs printmaker
- sculptor/medallist: seems this is an AND so you should break it into 2 terms
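The kinds of fix-up listed above (case differences, AND-compounds that should become two terms) could be sketched roughly as follows. This is only an illustration: the function name and the hand-curated `COMPOUND_AND` set are assumptions, not part of the existing mapping, and OR or parent/child values are deliberately left untouched because they need a human decision.

```python
def normalise_profession(raw: str, compound_and: set[str]) -> list[str]:
    """Lower-case a raw authority value; split it only if it is a known AND-compound."""
    value = raw.strip().lower()
    if value in compound_and:
        # e.g. "sculptor/medallist" becomes two separate thesaurus terms
        return [part.strip() for part in value.split("/")]
    # OR values and parent/child values are kept whole for manual review
    return [value]


# Assumption: this set is curated by hand after reviewing the complete value list.
COMPOUND_AND = {"sculptor/medallist"}

print(normalise_profession("Printmaker", COMPOUND_AND))
print(normalise_profession("sculptor/medallist", COMPOUND_AND))
print(normalise_profession("dealer/auction house", COMPOUND_AND))
```

Keeping the split list explicit avoids guessing whether a slash means AND, OR, or parent/child.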
Therefore I was thinking of modelling in different ways. Recommendations?
- Modelling the profession as an E7_Activity (and/or E30_Right) and using the “profession” as a skos:Concept / E55_Type to type the activity as that profession.
- Vlado: no, E7 is for a specific activity, while you're describing kinds of activities.
- Modelling the profession as an E74_Group, making the Actor a member of that group via P107i_is_current_or_former_member_of
- That's how we model Nationality, so currently you can ask: "give me paintings created by Italian"
- I'd do this only if Dominic wants to be able to search for e.g. "give me paintings created by archeologist".
Seems like a useful query, though maybe a bit exotic
- Let's define two subprops of P107i since we'd use it for two different purposes (Nationality vs Profession)
- Reifying the property with an EX_Association and typing the “profession” as a skos:Concept/E55_Type.
- Vlado: unnecessary. Since the only characteristic of that group is the profession, put the profession in the URI of the group.
- Vlado: another way is to model the profession as P2_has_type directly on the person
JM: As with Nationality, I didn't like extending P2_has_type: a person can change professions, and a type felt too permanent. I think I'll go with modelling as a group and have sub-props for nationality & profession.
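A minimal sketch of what the group approach could emit, with the profession (or nationality) placed in the group URI as suggested above. The sub-property names `ex:PX_profession` and `ex:PX_nationality`, the `ex:` prefix, and the `http://example.org` base are all hypothetical placeholders, not names from the actual mapping.

```python
# Hypothetical sub-properties of P107i_is_current_or_former_member_of.
SUB_PROPS = {
    "nationality": "ex:PX_nationality",
    "profession": "ex:PX_profession",
}


def group_membership_triples(person_uri: str, kind: str, value: str,
                             base: str = "http://example.org") -> list[str]:
    """Return Turtle-style triples making the person a member of a group.

    The group URI carries the profession/nationality itself, so no
    reified association node is needed.
    """
    prop = SUB_PROPS[kind]
    slug = value.strip().lower().replace(" ", "-")
    group = f"<{base}/group/{kind}/{slug}>"
    return [
        f"<{person_uri}> {prop} {group} .",
        f"{group} a crm:E74_Group .",
        f"{prop} rdfs:subPropertyOf crm:P107i_is_current_or_former_member_of .",
    ]


for triple in group_membership_triples(
        "http://example.org/person/123", "profession", "Printmaker"):
    print(triple)
```

Because both sub-properties specialise P107i, a query over P107i still finds every membership, while a query over one sub-property separates nationality from profession.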
Josh, these P82a,P82b are bogus. You can't have 2 dates per field:
JM: Ah yes, there was an error in the logic of the mapping.
I will amend this to use P81a/b & P82a/b if both dates are present:
JM: If there is just one date then I will just use P82.
JM2: Actually, there is an issue here: if there is only one date (i.e. only a first or only a last date), the above will look like this (only first date shown):
As a result, which predicate should be used?
- Still use P82a & P81a to model the fdate_earliest/latest
- Just use P82 and assign the value as just the fdate_text (i.e. it's just a text display and not a xsd:date)
VA2: If you model periods with 4 separate events (birth/death for life, joining/leaving for profession) as shown below,
then in each event you'll use a single P82, no matter whether earliest=latest.
VA3: Please write out a decision table: for each of the feasible combinations of (f|l)date_(earliest|latest) (presence or absence), what CRM you'll emit. Are there <= constraints between these dates? What heuristics will you use if you need to?
And I think we should clean up the page at the end: document the final mapping and some examples, not all the trials & tribulations.
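One possible shape for the requested decision table, written as a function applied to a single event (birth, death, joining or leaving a profession). This is a sketch under stated assumptions, not the agreed mapping: it assumes each event has its own earliest/latest pair, that dates are zero-padded ISO strings of the same precision (so string comparison matches date order), and that a lone date is emitted as plain P82. The sample values are illustrative only.

```python
from typing import Optional


def event_timespan(earliest: Optional[str], latest: Optional[str]) -> dict:
    """Choose time-span properties for one event from its earliest/latest dates."""
    if earliest is None and latest is None:
        return {}  # nothing to emit
    if earliest is None or latest is None or earliest == latest:
        # Only one distinct date is known: a single P82 is enough.
        return {"P82_at_some_time_within": earliest or latest}
    if earliest > latest:
        # The <= constraint between the two dates is violated.
        raise ValueError(f"earliest {earliest} is after latest {latest}")
    return {
        "P82a_begin_of_the_begin": earliest,
        "P82b_end_of_the_end": latest,
    }


# Print every presence/absence combination for one event.
for earliest, latest in [(None, None), ("1629", None), (None, "1630"),
                         ("1629", "1629"), ("1629", "1630")]:
    print(earliest, latest, event_timespan(earliest, latest))
```

Raising on `earliest > latest` surfaces bad source records instead of silently emitting an impossible interval; logging and skipping would be an equally reasonable choice.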
JM: Here is the basic model I'm using
Does this make sense?
Here's an example using approach 2
It also shows period of activity:
- you must either correlate "ruler" (profession) with "ruled" (period of activity)
- or else assume the same period applies to all professions. Are there several periods for one person in Merlin?
I've looked at biography_0.trig and the dates are still not perfect:
- says "Bullard, Alfred" was named on 1923-01-01 and ceased to be named so on 1923-12-31
- better rename "[de]assigned/timespan" to "[de]assigned/date" for consistency
- says "E H Heusler" participated in activity P82_at_some_time_within "1859" but the note "1859 -" implies that’s when he started.
So better use P82a_begin_of_the_begin. Also, there's no type (xsd:gYear)
- says "Yeomanic Press" participated in activity P82_at_some_time_within "1914".
But this being an org, it’s fair to say this was the E66_Formation date of the org!
- says "Lancaster, Seth" was born in 1924. But the note says "1924 active", so this really was not his birth.
Don't know if you want to look for the word "active" in the note, this starts to smell of AI...
- says "Marie Louise of Tassis" was born 1630.
But the note "portrayed by Van Dyck (q.v.) c.1629/30." means she was painted 1629-1630, so probably was born earlier.
The note "1630 c. fl.", according to http://en.wikipedia.org/wiki/List_of_Latin_abbreviations, means
- c., ca., ca or cca. (circa): Used in dates to indicate approximately.
- fl., flor. (floruit) means the period of time during which a person, school, movement or even species was active or flourishing (literally, "he/she/it flourished")
- "'Amr ibn al-Layth" has note "879 - 900 :: ruled; AH 265-AH 287"
- (Note: the AH time span expresses the same years, but in the Islamic/Muslim/Hijri calendar (AH), see http://en.wikipedia.org/wiki/Hijri_year)
- you model this correctly as <activity/1/date>
- but maybe it's better to model it as "Joined profession/ruler in 0879, and Left profession/ruler in 0900", as explained above
- says "'Abd al-Rahman" was active only within "798", but that's when he started
JM4: The problem is that most of these suggestions rely on things we cannot do:
- Distinguishing between orgs and people.
- Distinguishing between birth/death dates and dates of other activities - the text along with each date is the only identifying factor
- Recognising what the text for each date means (these are free text fields and there is a loose standard)
The logic specified in the pseudo code above produces these triples. As you say, it's not perfect, but moving toward better RDF based on more intelligent logic is difficult because of the points above.
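If the loose standard in the note text turns out to be consistent enough, a handful of patterns might recover some of the intent. The following is a rough sketch only: the category labels and the patterns are assumptions based on the few notes quoted on this page ("1859 -", "1924 active", "1630 c. fl."), and it would need checking against the full data before being trusted.

```python
import re


def classify_date_note(note: str) -> str:
    """Guess what a free-text date note means; return 'unknown' when unsure."""
    text = note.lower()
    # "fl.", "flor.", "floruit" or "active": a period of activity, not a birth
    if re.search(r"\b(fl|flor)\.|\bfloruit\b|\bactive\b", text):
        return "active"
    # "1859 -": a start year with an open end
    if re.search(r"^\s*\d{3,4}\s*-\s*$", text):
        return "open-ended start"
    # "879 - 900 ...": an explicit range of years
    if re.search(r"^\s*\d{3,4}\s*-\s*\d{3,4}\b", text):
        return "range"
    return "unknown"


for note in ["1859 -", "1924 active", "1630 c. fl.",
             "879 - 900 :: ruled; AH 265-AH 287"]:
    print(repr(note), "->", classify_date_note(note))
```

Anything classified as "unknown" would fall back to the current logic, so the patterns can only refine the output, never block it.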