People data map
Data Sources
The main sources of PatentPia's people data (inventors, paper authors, SW developers, etc.) are i) patent data covering over 100 million patents worldwide, ii) paper data covering over 250 million papers, and iii) developer data from over 10 million GitHub SW repositories.
Document sets related to people
The document set related to a person is defined as follows: i) for a patent inventor, the patents that the person invented; ii) for a paper author, the papers that the person co-authored (first-author participation is coming soon); and iii) for a SW developer, the SW repositories that the person participated in.
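The mapping from a person to role-specific document sets can be sketched as a small grouping step. This is a minimal illustration only: the record layout `(person_id, role, document_id)`, the role labels, and the sample IDs are assumptions made for the example, not PatentPia's actual schema.

```python
from collections import defaultdict

# Hypothetical records: (person_id, role, document_id).
# Field names and sample values are illustrative assumptions.
RECORDS = [
    ("p1", "inventor", "US-0000001"),
    ("p1", "author", "paper-17"),
    ("p2", "developer", "repo-alpha"),
    ("p1", "inventor", "US-0000002"),
]


def build_document_sets(records):
    """Group document IDs by person and by role (inventor/author/developer)."""
    doc_sets = defaultdict(lambda: defaultdict(set))
    for person_id, role, document_id in records:
        doc_sets[person_id][role].add(document_id)
    return doc_sets


doc_sets = build_document_sets(RECORDS)
```

Keeping one set per role lets the same person carry separate patent, paper, and repository document sets, which matches the three cases listed above.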
People data utilization scheme
The utilization scheme for people data can be summarized as follows. The scheme below is centered on patent inventors (researchers), but it can also be applied to paper authors or SW developers.
People-related analysis contents scheme
To support the people data utilization scheme, the following analysis contents can be provided for individual people.
For each individual inventor (researcher), once the data scheme shown in the figure above is in place, inventors (researchers) can be compared by specified categories. Representative categories include i) field, ii) company/organization, and iii) field within a company/organization. Fields can be defined by i) keywords, ii) patent classifications such as CPC, iii) technology categories, etc. These fields are organized by attribute, such as technologies, products-parts, products-materials, diseases, etc.
For paper authors, the authorship, forward citation, share and concentration, and keyword and related-researcher parts are equivalent to those for patent inventors. However, papers have no events such as transactions or litigation, and no patent-specific data such as patent families.
For SW developers, there are unique measures such as stars and forks that can play the role of forward citations. Keywords and related developers, on the other hand, apply to SW developers in the same way.
Below is an example of people-related analysis contents.
The example covers key researchers (inventors) of patents in the field of augmented reality displays.
The kinds of comparative analysis available across inventors within a specific technology field are: i) index comparison, ii) trend comparison, iii) citation comparison, iv) patent families comparison, v) quality comparison, vi) comparison by type of events, vii) share comparison, and viii) concentration comparison.
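As a rough illustration of the share and concentration comparisons, the sketch below computes both values from a list of (inventor, field) pairs. The text does not define these two measures, so the definitions used here are assumptions: share is taken as the inventor's patents in the field divided by all patents in the field, and concentration as the inventor's patents in the field divided by all of that inventor's patents. The sample data and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical (inventor, field) pairs, one per patent. Illustrative data only.
PATENTS = [
    ("Kim", "AR display"),
    ("Kim", "AR display"),
    ("Kim", "battery"),
    ("Lee", "AR display"),
    ("Lee", "AR display"),
    ("Lee", "AR display"),
]


def share_and_concentration(patents, inventor, field):
    """Return (share, concentration) under the ASSUMED definitions:

    share         = inventor's patents in field / all patents in field
    concentration = inventor's patents in field / all of the inventor's patents
    """
    by_pair = Counter(patents)
    in_field = sum(n for (_, f), n in by_pair.items() if f == field)
    by_inventor = sum(n for (i, _), n in by_pair.items() if i == inventor)
    own = by_pair[(inventor, field)]
    share = own / in_field if in_field else 0.0
    concentration = own / by_inventor if by_inventor else 0.0
    return share, concentration


kim_share, kim_concentration = share_and_concentration(PATENTS, "Kim", "AR display")
```

Under these assumed definitions, two inventors with the same share of a field can still differ in concentration if one of them works in many other fields.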
Reference links related to people contents
The following links are related to inventor contents.
People data processing
The process for handling people data is described below.
Difficulty identifying people
Governments assign identifiers to people (e.g., social security numbers), but these identifiers are not publicly available. As a result, it is nearly impossible to determine that a person is unique based only on the name as it appears in data.
There are other challenges in identifying a person, such as i) homonyms (two different people who share the same name) and ii) differences in how a name is written in two languages (especially middle names).
Identifying people in PatentPia
Identifying inventors in patent data
In PatentPia, the unit of inventor identification is 'Applicant & Inventor Notation'. Within the same applicant, we assume that there are no homonyms, so an identical name notation is treated as the same inventor. If the name notation is the same but the applicant is different, we assume that they are different inventors. Corrections based on address data could be considered, but identifying or differentiating people by address does more harm than good because of side effects such as relocation (change of address). Career history (moves between organizations) from LinkedIn and similar sources could also be used to compensate, but this approach has a number of problems as well.
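The identification rule above can be sketched as grouping patents under a composite key of applicant and inventor name notation. This is only a minimal sketch: the function names, the record layout `(patent_id, applicant, inventor_name)`, the sample values, and the whitespace and letter-case normalization are assumptions made for illustration, not PatentPia's actual processing.

```python
from collections import defaultdict


def inventor_key(applicant, inventor_name):
    """Build the identification key from applicant and inventor notation.

    Normalizing whitespace and letter case is an assumption of this sketch.
    """
    def norm(text):
        return " ".join(text.split()).casefold()

    return (norm(applicant), norm(inventor_name))


def group_patents_by_inventor(records):
    """Map each (applicant, inventor notation) key to its patent IDs."""
    groups = defaultdict(list)
    for patent_id, applicant, inventor_name in records:
        groups[inventor_key(applicant, inventor_name)].append(patent_id)
    return dict(groups)


# Hypothetical sample records: (patent_id, applicant, inventor notation).
records = [
    ("P1", "Acme Corp", "Jane Doe"),
    ("P2", "Acme Corp", "jane  doe"),
    ("P3", "Globex Inc", "Jane Doe"),
]
groups = group_patents_by_inventor(records)
```

Here the two Acme Corp records fall under one inventor, while the Globex Inc record with the same name notation is kept as a separate inventor. An inventor who moves between organizations would therefore be split into two identities, which is the trade-off the text accepts.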
Mapping person sameness across patents vs. papers
PatentPia is researching how to match inventors, as represented in patent data, with paper authors.