{toc}

h1. Representing Profession

Looking at the profession field of the biography record, our current mapping seems rather poor:

{code}
<id/person-institution/123> bmo:PX_field_of_activity_of_the_agent "profession" .
bmo:PX_field_of_activity_of_the_agent rdfs:subPropertyOf crm:P3_has_note.
{code}

In Merlin, the profession is a flat authority so this could be made into a thesaurus…
- Vlado: Excellent! But please first take a look at the COMPLETE list of values, it may need some fixing. E.g.:
-- dealer/auction house: seems this is an OR
-- author/poet; military/naval: seems this is a parent/child
-- Printmaker vs printmaker
-- sculptor/medallist: seems this is an AND so you should break it into 2 terms
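
A minimal sketch of how the complete value list could be screened before building the thesaurus. It only flags candidates: spellings that differ by case alone, and values containing "/", which may mean OR, AND or parent/child and so need a manual decision. The function name and the sample list are hypothetical; the real input would be the full export of the Merlin profession authority.

```python
from collections import defaultdict

def review_professions(values):
    """Return (case_variants, compound_terms) that need a manual decision."""
    by_folded = defaultdict(set)
    for value in values:
        by_folded[value.strip().lower()].add(value.strip())
    # Spellings that differ only by case, e.g. "Printmaker" vs "printmaker".
    case_variants = sorted(sorted(v) for v in by_folded.values() if len(v) > 1)
    # A "/" may mean OR, AND or parent/child; the script cannot tell which.
    compound_terms = sorted({v.strip() for v in values if "/" in v})
    return case_variants, compound_terms

# Hypothetical sample taken from the values mentioned above.
sample = ["Printmaker", "printmaker", "dealer/auction house",
          "sculptor/medallist", "author/poet", "ruler"]
variants, compounds = review_professions(sample)
print(variants)
print(compounds)
```

The output is a work list for a person, not an automatic fix: each compound term still has to be classified by hand.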

Therefore I was thinking of modelling in different ways. Recommendations?
# Modelling the profession as an E7_Activity (and/or E30_Right) and using the “profession” as a skos:Concept / E55_Type to type the activity as that profession.
## Vlado: no, E7 is for a specific activity, while you're describing kinds of activities.
# Modelling the profession as an E74_Group, making the Actor a member of that group via P107i_is_current_or_former_member_of
## That's how we model Nationality, so currently you can ask: "give me paintings created by Italian"
## I'd do this only if Dominic wants to be able to search for e.g. "give me paintings created by archeologist".
Seems like a useful query, though maybe a bit exotic.
## Let's define two subprops of P107i since we'd use it for two different purposes (Nationality vs Profession)
# Reifying the property with an EX_Association and typing the “profession” as a skos:Concept/E55_Type.
## Vlado: unnecessary. Since the only characteristic of that group is the profession, put the profession in the URI of the group.
# Vlado: another way is to model the profession as P2_has_type directly on the person

JM: As with Nationality, I didn't like extending P2_has_type: a person can change professions, and a type felt too permanent. I think I'll go with modelling as a group and have sub-props for nationality & profession.

h2. Period of Activity

Josh, these P82a,P82b are bogus. You can't have 2 dates per field:
{noformat}
<id/person-institution/119413/activity/1/date>
crm:P82a_begin_of_the_begin "-2169-01-01"^^xsd:date,"-2189-01-01"^^xsd:date;
crm:P82b_end_of_the_end "-2169-01-01"^^xsd:date,"-2189-01-01"^^xsd:date;
crm:P3_has_note "2189BC - 2169BC :: ruled"^^xsd:string;
a crm:E52_Time-Span; rdfs:label "2189BC - 2169BC"^^xsd:string.
{noformat}
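
A small check along these lines could catch the problem before publishing. It is only a sketch, assuming the triples are available as plain (subject, predicate, value) tuples; the function name and the set of checked properties are assumptions, chosen to follow the remark above that each of these fields may hold one date.

```python
from collections import Counter

# Assumption: each time-span should carry at most one value per bound.
SINGLE_VALUED = {
    "crm:P82a_begin_of_the_begin", "crm:P82b_end_of_the_end",
    "crm:P81a_end_of_the_begin", "crm:P81b_begin_of_the_end",
}

def find_duplicate_bounds(triples):
    """Return sorted (subject, predicate) pairs that have more than one value."""
    counts = Counter((s, p) for s, p, _ in triples if p in SINGLE_VALUED)
    return sorted(key for key, n in counts.items() if n > 1)

span = "<id/person-institution/119413/activity/1/date>"
triples = [
    (span, "crm:P82a_begin_of_the_begin", "-2169-01-01"),
    (span, "crm:P82a_begin_of_the_begin", "-2189-01-01"),
    (span, "crm:P82b_end_of_the_end", "-2169-01-01"),
    (span, "crm:P82b_end_of_the_end", "-2189-01-01"),
    (span, "crm:P3_has_note", "2189BC - 2169BC :: ruled"),
]
duplicates = find_duplicate_bounds(triples)
print(duplicates)
```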

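JM: Ah yes, there was an error in the logic of the mapping: both the first-date and the last-date blocks emit P82a and P82b, so each property ends up with two values.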
{code}
<if match="{bm_bi_fdate_text[.!='']}">
  <triple predicate="crm:P82a_begin_of_the_begin" type="http://www.w3.org/2001/XMLSchema#date" value="{bm_bi_fdate_earliest}" modifier="formatmerlinearliestdateasxsddate"></triple>
  <triple predicate="crm:P82b_end_of_the_end" type="http://www.w3.org/2001/XMLSchema#date" value="{bm_bi_fdate_latest}" modifier="formatmerlinearliestdateasxsddate"></triple>
</if>
<if match="{bm_bi_ldate_text[.!='']}">
  <triple predicate="crm:P82a_begin_of_the_begin" type="http://www.w3.org/2001/XMLSchema#date" value="{bm_bi_ldate_earliest}" modifier="formatmerlinearliestdateasxsddate"></triple>
  <triple predicate="crm:P82b_end_of_the_end" type="http://www.w3.org/2001/XMLSchema#date" value="{bm_bi_ldate_latest}" modifier="formatmerlinearliestdateasxsddate"></triple>
</if>
<if match="{bm_bi_ldate_text[.!='']|bm_bi_fdate_text[.!='']}">
  <triple predicate="crm:P3_has_note" type="http://www.w3.org/2001/XMLSchema#string" value="{bm_bi_fdate_text} - {bm_bi_ldate_text} :: {bm_bi_datedet}"></triple>
  <triple predicate="rdfs:label" type="http://www.w3.org/2001/XMLSchema#string" value="{bm_bi_fdate_text} - {bm_bi_ldate_text}"></triple>
</if>
{code}

I will amend this to use P81a/b & P82a/b when both dates are present, as in this input record:
{code}
<bm_alias_bi_dates>
<_>
<bm_bi_fdate_text>2189BC</bm_bi_fdate_text>
<bm_bi_fdate_earliest>2189 BC</bm_bi_fdate_earliest>
<bm_bi_fdate_latest>2189 BC</bm_bi_fdate_latest>
<bm_bi_ldate_text>2169BC</bm_bi_ldate_text>
<bm_bi_ldate_earliest>2169 BC</bm_bi_ldate_earliest>
<bm_bi_ldate_latest>2169 BC</bm_bi_ldate_latest>
<bm_bi_datedet>ruled</bm_bi_datedet>
</_>
</bm_alias_bi_dates>
{code}

JM: If there is just one date then I will just use P82.
JM2: Actually, there is an issue here: if there is only one date (i.e. only a first or only a last date), the input looks like this (only first date shown):
{code}
<bm_alias_bi_dates>
<_>
<bm_bi_fdate_text>2189BC</bm_bi_fdate_text>
<bm_bi_fdate_earliest>2189 BC</bm_bi_fdate_earliest>
<bm_bi_fdate_latest>2189 BC</bm_bi_fdate_latest>
<bm_bi_ldate_text></bm_bi_ldate_text>
<bm_bi_ldate_earliest></bm_bi_ldate_earliest>
<bm_bi_ldate_latest></bm_bi_ldate_latest>
<bm_bi_datedet>ruled</bm_bi_datedet>
</_>
</bm_alias_bi_dates>
{code}
As a result, which predicate should be used?
# Still use P82a & P81a to model the fdate_earliest/latest
# Just use P82 and assign the value as just the fdate_text (i.e. it's just a text display and not a xsd:date)

VA2: If you model periods with 4 separate events (birth/death for life, joining/leaving for profession) as shown below,
then in each event you'll use a single P82, no matter whether earliest=latest.

VA3: Please write out a decision table: for each of the feasible combinations of (f|l)date_(earliest|latest) (presence or absence), what CRM you'll emit. Are there <= constraints between these dates? What heuristics will you use if you need to?

And I think we should clean up the page at the end: document the final mapping and some examples, not all the trials & tribulations.
JM3: Agreed!

JM: Here is the basic model I'm using
{code}
IF bm_bi_datedet == '' THEN
    IF bm_bi_fdate_text != '' THEN
        CREATE BIRTH DATE RDF (using P82a/b for fdate_earliest & fdate_latest)
    ENDIF
    IF bm_bi_ldate_text != '' THEN
        CREATE DEATH DATE RDF (using P82a/b for ldate_earliest & ldate_latest)
    ENDIF
ELSE
    IF bm_bi_fdate_text != '' and bm_bi_ldate_text != '' THEN
        CREATE ACTIVITY WITH TIMESPAN
            P82a/P81a for fdate_earliest & fdate_latest
            P81b/P82b for ldate_earliest & ldate_latest
    ELSE
        IF bm_bi_fdate_text != '' THEN
            CREATE ACTIVITY WITH TIMESPAN
                P82 for fdate_text
        ENDIF
        IF bm_bi_ldate_text != '' THEN
            CREATE ACTIVITY WITH TIMESPAN
                P82 for ldate_text
        ENDIF
    ENDIF
ENDIF
{code}
Does this make sense?
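
To make the pseudocode easier to review as a decision table, here is a sketch of the same branching in Python. It only reports which CRM properties would be filled from which Merlin fields; it does not produce RDF. The function name plan_dates and the "birth" / "death" / "activity" labels are hypothetical, and the properties are abbreviated as in the pseudocode.

```python
from itertools import product

def plan_dates(fdate_text, ldate_text, datedet):
    """Return a list of (resource, {CRM property: Merlin field}) to emit."""
    plan = []
    if datedet == "":
        # No date detail: the dates are treated as birth and death.
        if fdate_text != "":
            plan.append(("birth", {"P82a": "fdate_earliest", "P82b": "fdate_latest"}))
        if ldate_text != "":
            plan.append(("death", {"P82a": "ldate_earliest", "P82b": "ldate_latest"}))
    elif fdate_text != "" and ldate_text != "":
        # Both dates: one activity whose time-span has all four bounds.
        plan.append(("activity", {"P82a": "fdate_earliest", "P81a": "fdate_latest",
                                  "P81b": "ldate_earliest", "P82b": "ldate_latest"}))
    elif fdate_text != "":
        plan.append(("activity", {"P82": "fdate_text"}))
    elif ldate_text != "":
        plan.append(("activity", {"P82": "ldate_text"}))
    return plan

# Print one row per presence/absence combination of the three inputs.
for has_f, has_l, has_det in product([True, False], repeat=3):
    row = plan_dates("x" if has_f else "", "x" if has_l else "", "ruled" if has_det else "")
    print(f"fdate={has_f!s:5} ldate={has_l!s:5} datedet={has_det!s:5} -> {row}")
```

Note that this covers only the presence or absence of the text fields. It does not check any ordering constraints between the earliest and latest values.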
h1. Example

Here's an example using approach 2 (profession as an E74_Group).
It also shows period of activity:
- you must either correlate "ruler" (profession) with "ruled" (period of activity)
- or else assume the same period applies to all professions. Are there several periods for one person in Merlin?

{noformat}
bmo:PX_nationality rdfs:subPropertyOf crm:P107i_is_current_or_former_member_of.
bmo:PX_profession rdfs:subPropertyOf crm:P107i_is_current_or_former_member_of.

<id/thesauri/profession> a skos:ConceptScheme.

<id/thesauri/profession/ruler> a crm:E74_Group, skos:Concept;
skos:prefLabel "ruler";
skos:inScheme <id/thesauri/profession>.

<id/thesauri/nationality> a skos:ConceptScheme.
<id/thesauri/nationality/Mesopotamian> a crm:E74_Group, skos:Concept;
skos:prefLabel "Mesopotamian";
skos:inScheme <id/thesauri/nationality>.

<id/person-institution/119413> a crm:E21_Person, skos:Concept;
skos:inScheme id:person-institution;
skos:prefLabel "Dudu";
crm:P3_has_note "Dynasty of Akkad";
bmo:PX_profession <id/thesauri/profession/ruler>;
bmo:PX_gender <id/thesauri/gender/male>;
bmo:PX_nationality <id/thesauri/nationality/Mesopotamian>;
crm:P143i_was_joined_by <id/person-institution/119413/profession/1/start>;
crm:P145i_left_by <id/person-institution/119413/profession/1/end>.

<id/person-institution/119413/profession/1/start> a crm:E85_Joining;
crm:P4_has_time-span <id/person-institution/119413/profession/1/start/date>;
crm:P144_joined_with <id/thesauri/profession/ruler>.
<id/person-institution/119413/profession/1/start/date> a crm:E52_Time-Span;
crm:P82_at_some_time_within "-2189"^^xsd:gYear.

<id/person-institution/119413/profession/1/end> a crm:E86_Leaving;
crm:P4_has_time-span <id/person-institution/119413/profession/1/end/date>;
crm:P146_separated_from <id/thesauri/profession/ruler>.
<id/person-institution/119413/profession/1/end/date> a crm:E52_Time-Span;
crm:P82_at_some_time_within "-2169"^^xsd:gYear.
{noformat}

h1. Biographical Dates
I've looked at biography_0.trig and the dates are still not perfect:
- says "Bullard, Alfred" was named on 1923-01-01 and ceased to be named so on 1923-12-31
- better rename "[de]assigned/timespan" to "[de]assigned/date" for consistency
JM4: Agreed!

- says "E H Heusler" participated in activity P82_at_some_time_within "1859" but the note "1859 -" implies that’s when he started.
So better use P82a_begin_of_the_begin. Also, there's no type (xsd:gYear)
- says "Yeomanic Press" participated in activity P82_at_some_time_within "1914".
But this being an org, it’s fair to say this was the E66_Formation date of the org!
- says "Lancaster, Seth" was born in 1924. But the note says "1924 active", so this really was not his birth.
Don't know if you want to look for the word "active" in the note, this starts to smell of AI...
- says "Marie Louise of Tassis" was born 1630.
But the note "portrayed by Van Dyck (q.v.) c.1629/30." means she was painted 1629-1630, so probably was born earlier.
The note "1630 c. fl.", according to http://en.wikipedia.org/wiki/List_of_Latin_abbreviations, means
-- c., ca., ca or cca. (circa): Used in dates to indicate approximately.
-- fl., flor. (floruit) means the period of time during which a person, school, movement or even species was active or flourishing (literally, "he/she/it flourished")
- "'Amr ibn al-Layth" has note "879 - 900 :: ruled; AH 265-AH 287"
-- (Note: the AH time span expresses the same years, but in the Islamic/Muslim/Hijri calendar (AH), see http://en.wikipedia.org/wiki/Hijri_year)
-- you model this correctly as <activity/1/date>
crm:P81a_end_of_the_begin "0879-12-31"^^xsd:date;
crm:P81b_begin_of_the_end "0900-01-01"^^xsd:date;
crm:P82a_begin_of_the_begin "0879-01-01"^^xsd:date;
crm:P82b_end_of_the_end "0900-12-31"^^xsd:date;
-- but maybe it's better to model it as "Joined profession/ruler in 0879, and Left profession/ruler in 0900", as explained above
- says "'Abd al-Rahman" was active only within "798", but that's when he started
JM4: The problem is that most of these suggestions rely on things we cannot do:
# Distinguishing between orgs and people.
# Distinguishing between birth/death dates and dates of other activities - the text along with each date is the only identifying factor
# Recognising what the text for each date means (these are free text fields and there is a loose standard)
The logic specified in the pseudocode above produces these triples. As you say, it is not perfect, but moving toward better RDF based on more intelligent logic is difficult because of the points above.
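
For the case the mapping already handles, a first and a last year as in the "'Amr ibn al-Layth" entry above, the four bounds follow mechanically from the two years. A minimal sketch, with hypothetical function names; it writes BC years as negative numbers in the same way as the examples on this page, and does not check which year-numbering convention the consuming XSD processor expects.

```python
def year_bounds(year):
    """Return (first day, last day) of a year as xsd:date lexical forms."""
    sign = "-" if year < 0 else ""
    yyyy = f"{abs(year):04d}"  # xsd:date needs at least four year digits
    return f"{sign}{yyyy}-01-01", f"{sign}{yyyy}-12-31"

def activity_span(first_year, last_year):
    """Map a first and a last year to the four E52_Time-Span bounds."""
    begin_lo, begin_hi = year_bounds(first_year)
    end_lo, end_hi = year_bounds(last_year)
    return {
        "crm:P82a_begin_of_the_begin": begin_lo,
        "crm:P81a_end_of_the_begin": begin_hi,
        "crm:P81b_begin_of_the_end": end_lo,
        "crm:P82b_end_of_the_end": end_hi,
    }

for prop, value in activity_span(879, 900).items():
    print(f'{prop} "{value}"^^xsd:date;')
```

This says nothing about what a date means (birth, activity, formation); that still depends on the free-text note, as discussed above.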