Cleaner APIs - Clone
Description
The Cleaner APIs are a set of tools for standardizing raw data fields according to our internal PDL standards. They complement our Search APIs and are typically used to preprocess and clean data fields before using them in a search query. We provide Cleaner APIs for cleaning company, school, and location names.
Examples of Location Cleaning (using Location Cleaner):
Raw Location Name | Cleaned Location Name |
---|---|
"San Francisco" | san francisco, california, united states |
"dc" | washington, district of columbia, united states |
"USA" | united states |
"london" | london, greater london, united kingdom |
The Cleaner APIs therefore let you:
- Clean company, school, and location names before using them with our Search APIs
- Clean and standardize your own raw data for your own use
Endpoints
The endpoint for the Company Cleaner API is https://api.peopledatalabs.com/v5/company/clean.
The endpoint for the School Cleaner API is https://api.peopledatalabs.com/v5/school/clean.
The endpoint for the Location Cleaner API is https://api.peopledatalabs.com/v5/location/clean.
Cleaner API Access and Billing
The Cleaner APIs are currently in production, with access restricted to enterprise customers. If you are looking for access to company data, check out our Company Enrichment Endpoint. If you'd like access to our school or location data, reach out to us.
If you are a contracted customer, you have access to the cleaner APIs with your standard key(s). If you are a Data License customer, reach out to your Data Consultant or Customer Success Manager to get an API key.
Usage
PDL's cleaner APIs are designed to clean your company/school/location data so you can better query our person data. We use these cleaners for standardizing our data as part of our Data Build Process.
Requests
See Authentication and Requests for the possible ways to send requests. We recommend using the body of your request; note that the examples below pass their parameters in the query string.
Rate Limiting
The standard rate limit is 10 requests per minute, and the standard total limit is 10,000 requests per month, free of charge. To increase your monthly limit above 10,000, reach out to your Data Consultant or Customer Success Manager.
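To stay under the per-minute limit, calls can be throttled on the client side. The following is a minimal sketch of a sliding-window limiter; `MinuteRateLimiter` is a hypothetical helper, not part of any PDL library, and the window size of 60 seconds is an assumption about how the limit is measured.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Hypothetical client-side limiter: at most max_calls per period seconds."""

    def __init__(self, max_calls=10, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls inside the window

    def _evict(self, now):
        # Forget calls that are at least one full period old.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        self._evict(now)
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._evict(now)
        self._calls.append(now)
```

Calling `limiter.wait()` immediately before each cleaner request keeps a single process within the limit; it does not coordinate across several processes that share one API key.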
Endpoints
/company/clean
The endpoint for the company cleaner is https://api.peopledatalabs.com/v5/company/clean. GET requests only.
Parameters
You can query for companies with at least one of the parameters name, website, and/or profile.
Valid profiles are linkedin, facebook, and twitter.
For Company Cleaner API requests we require an unambiguous match. Because name is not a unique identifier, there are cases where we return a 404 (no match) because we cannot confidently distinguish one record from others with similar names.
import requests

url = "https://api.peopledatalabs.com/v5/company/clean"
# Identify the company by website; name and profile are also accepted
querystring = {"website": "peopledatalabs.com"}
headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'x-api-key': "YOUR_API_KEY"
}
# Parameters are sent in the query string of a GET request
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.text)
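The parameter rules above can be checked before a request is sent. The sketch below is illustrative only: `build_company_clean_params` is a hypothetical helper, and the check that a profile contains `<network>.com/` is an assumption based on the profile URLs shown on this page, not a documented validation rule.

```python
VALID_PROFILE_NETWORKS = ("linkedin", "facebook", "twitter")


def build_company_clean_params(name=None, website=None, profile=None):
    """Return a query-string dict holding only the parameters that were supplied."""
    params = {}
    if name:
        params["name"] = name
    if website:
        params["website"] = website
    if profile:
        # Assumption: a valid profile looks like "linkedin.com/company/...".
        lowered = profile.lower()
        if not any(f"{network}.com/" in lowered for network in VALID_PROFILE_NETWORKS):
            raise ValueError(f"Unsupported profile network: {profile}")
        params["profile"] = profile
    if not params:
        raise ValueError("Provide at least one of name, website, or profile")
    return params
```

The returned dict can be passed as `params=` to the GET request in place of a hand-written query string.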
Response
If a matching company is found, a 200 response is returned along with the company record in JSON. The schema of the response matches the company schema in the person record.
{
  "name": "people data labs",
  "size": "51-200",
  "id": "peopledatalabs",
  "founded": "2015",
  "industry": "computer software",
  "type": "private",
  "ticker": null,
  "location": {
    "name": "san francisco, california, united states",
    "locality": "san francisco",
    "region": "california",
    "country": "united states",
    "continent": "north america",
    "street_address": null,
    "address_line_2": null,
    "postal_code": null,
    "geo": "37.77,-122.41"
  },
  "linkedin_url": "linkedin.com/company/peopledatalabs",
  "linkedin_id": "18170482",
  "facebook_url": "facebook.com/peopledatalabs",
  "twitter_url": "twitter.com/peopledatalabs",
  "website": "peopledatalabs.com",
  "fuzzy_match": false
}
If a matching company is not found, a 404 is returned.
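The two documented outcomes, 200 with a record and 404 with no match, can be separated in a small helper. This is a minimal sketch: `parse_cleaner_response` is a hypothetical name, and raising on every other status code is an assumption, since this page describes only 200 and 404.

```python
import json


def parse_cleaner_response(status_code, body_text):
    """Return the cleaned record as a dict, or None when there is no match."""
    if status_code == 200:
        return json.loads(body_text)
    if status_code == 404:
        # No unambiguous match was found for the input.
        return None
    # Assumption: anything else (auth, rate limit, server error) is an error.
    raise RuntimeError(f"Unexpected status {status_code}: {body_text}")
```

With the `requests` library this would be called as `parse_cleaner_response(response.status_code, response.text)`.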
To query the person schema, you can use the following field mapping:
company field | person field |
---|---|
id | job_company_id, experience.company.id |
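As a sketch of how this mapping might be applied, the hypothetical helper below builds a Person Search query body for one of the two mapped person fields. The query shape mirrors the bool/must/term form used in the Person Search example on this page; the function name and the field validation are illustrative assumptions.

```python
import json

# Person fields that the company "id" maps to, per the table above.
COMPANY_ID_PERSON_FIELDS = ("job_company_id", "experience.company.id")


def company_id_term_query(company_id, field="job_company_id"):
    """Build a query dict that matches a cleaned company id on one person field."""
    if field not in COMPANY_ID_PERSON_FIELDS:
        raise ValueError(f"{field!r} is not a mapped company id field")
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {field: company_id}}
                ]
            }
        }
    }


# Serialize the query the same way the Person Search example does.
query_json = json.dumps(company_id_term_query("peopledatalabs"))
```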
/school/clean
The endpoint for the school cleaner is https://api.peopledatalabs.com/v5/school/clean. GET requests only. The current rate limit is 10 requests per minute.
Parameters
You can query for schools with at least one of the parameters name, website, and/or profile.
Valid profiles are linkedin, facebook, and twitter.
For School Cleaner API requests we require an unambiguous match. Because name is not a unique identifier, there are cases where we return a 404 (no match) because we cannot confidently distinguish one record from others with similar names.
import requests

url = "https://api.peopledatalabs.com/v5/school/clean"
# Identify the school by social profile; name and website are also accepted
querystring = {"profile": "linkedin.com/school/ucla"}
headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'x-api-key': "YOUR_API_KEY"
}
# Parameters are sent in the query string of a GET request
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.text)
Response
If a matching school is found, a 200 response is returned along with the cleaned school record in JSON.
The schema of the response matches the school schema in the person record.
{
  "name": "university of california, los angeles",
  "type": "post-secondary institution",
  "id": "72978d72-275a-49c8-b9b9-f227ccfb1361",
  "location": {
    "name": "los angeles, california, united states",
    "locality": "los angeles",
    "region": "california",
    "country": "united states",
    "continent": "north america"
  },
  "linkedin_url": "linkedin.com/school/ucla",
  "facebook_url": null,
  "twitter_url": null,
  "linkedin_id": "17950",
  "website": "ucla.edu",
  "domain": "ucla.edu"
}
If a matching school is not found, a 404 is returned.
To query the person schema, you can use the following field mapping:
school field | person field |
---|---|
id | education.school.id |
/location/clean
The endpoint for the location cleaner is https://api.peopledatalabs.com/v5/location/clean. GET requests only. The current rate limit is 10 requests per minute.
Parameters
To query the location cleaner, use the parameter location with a string containing the location information to be parsed.
For Location Cleaner API requests we require an unambiguous match. Because location is not a unique identifier, there are cases where we return a 404 (no match) because we cannot confidently distinguish one record from others with similar names.
import requests

url = "https://api.peopledatalabs.com/v5/location/clean"
# Free-form location string to be parsed and standardized
querystring = {"location": "239 NW 13th Ave, Portland, Oregon 97209, US"}
headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'x-api-key': "YOUR_API_KEY"
}
# Parameters are sent in the query string of a GET request
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.text)
Response
If the location is successfully parsed, a 200 response is returned along with the parsed location in JSON. The schema of the response matches the location schema in the person record.
{
  "name": "portland, oregon, united states",
  "locality": "portland",
  "region": "oregon",
  "subregion": "multnomah county",
  "country": "united states",
  "continent": "north america",
  "type": "locality",
  "geo": "45.52,-122.67"
}
If the location parsing fails, a 404 is returned.
To query the person schema, you can use the following field mapping:
location field | person field |
---|---|
name | job_company_location_name, location_name, location_names, street_addresses.name, experience.location_names, experience.company.location.name, education.school.location.name |
Note: the top-level "regions" field in the person schema contains strings of the form "{region}, {country}", built from the region and country values in the cleaner output above.
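The "{region}, {country}" form described in the note can be reproduced from a cleaned location record. The helper below is a minimal sketch; `region_string` is a hypothetical name, and returning None when either part is missing is an assumption rather than documented behavior.

```python
def region_string(cleaned_location):
    """Build a "{region}, {country}" string from a location cleaner response."""
    region = cleaned_location.get("region")
    country = cleaned_location.get("country")
    if not region or not country:
        # Assumption: country-only results such as "united states" have no region string.
        return None
    return f"{region}, {country}"


# Values taken from the Portland response shown above.
portland = {
    "name": "portland, oregon, united states",
    "region": "oregon",
    "country": "united states",
}
print(region_string(portland))  # oregon, united states
```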
Example use
Company Cleaner API and Person Search
I want to find employees at a particular company, but I don't have the PDL identifier for the company.
# Variant 1: Person Search using an Elasticsearch query
import json
import requests

PDL_COMPANY_CLEANER_URL = "https://api.peopledatalabs.com/v5/company/clean"
PDL_PERSON_SEARCH_URL = "https://api.peopledatalabs.com/v5/person/search"
API_KEY = "####"  # Enter your API key here
company_website = "peopledatalabs.com"

# Step 1: clean the company to obtain its PDL identifier
querystring = {"website": company_website}
headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'x-api-key': API_KEY
}
response = requests.request("GET", PDL_COMPANY_CLEANER_URL, headers=headers, params=querystring)
if response.status_code == 200:
    cleaned_company = response.json()
else:
    cleaned_company = {}
    print(f"Company Cleaner API Error for [{company_website}]: {response.text}")

# Step 2: search for people whose job_company_id matches the cleaned id
company_employee_matches = []
if cleaned_company:
    headers = {
        'Content-Type': "application/json",
        'X-api-key': API_KEY
    }
    ES_QUERY = {
        "query": {
            "bool": {
                "must": [
                    {"term": {"job_company_id": cleaned_company['id']}}
                ]
            }
        }
    }
    params = {
        'query': json.dumps(ES_QUERY),
        'size': 100
    }
    response = requests.get(PDL_PERSON_SEARCH_URL, headers=headers, params=params)
    if response.status_code == 200:
        company_employee_matches = response.json()['data']
    else:
        company_employee_matches = []
        print(f"Person Search Error for [{company_website}]: {response.text}")
    print(f"Found {len(company_employee_matches)} employee profiles at {company_website}")
# Variant 2: Person Search using a SQL query
import requests

PDL_COMPANY_CLEANER_URL = "https://api.peopledatalabs.com/v5/company/clean"
PDL_PERSON_SEARCH_URL = "https://api.peopledatalabs.com/v5/person/search"
API_KEY = "####"  # Enter your API key here
company_website = "peopledatalabs.com"

# Step 1: clean the company to obtain its PDL identifier
querystring = {"website": company_website}
headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'x-api-key': API_KEY
}
response = requests.request("GET", PDL_COMPANY_CLEANER_URL, headers=headers, params=querystring)
if response.status_code == 200:
    cleaned_company = response.json()
else:
    cleaned_company = {}
    print(f"Company Cleaner API Error for [{company_website}]: {response.text}")

# Step 2: search for people whose job_company_id matches the cleaned id
company_employee_matches = []
if cleaned_company:
    headers = {
        'Content-Type': "application/json",
        'X-api-key': API_KEY
    }
    SQL_QUERY = f"""
        SELECT * FROM person
        WHERE job_company_id = '{cleaned_company['id']}'
    """
    params = {
        'sql': SQL_QUERY,
        'size': 100
    }
    response = requests.get(PDL_PERSON_SEARCH_URL, headers=headers, params=params)
    if response.status_code == 200:
        company_employee_matches = response.json()['data']
    else:
        company_employee_matches = []
        print(f"Person Search Error for [{company_website}]: {response.text}")
    print(f"Found {len(company_employee_matches)} employee profiles at {company_website}")