Cleaner APIs

Description

The Cleaner APIs are a set of tools for standardizing raw data fields according to our internal PDL standards. They complement our Search APIs and are typically used to preprocess and clean data fields before using them in a search query. We provide Cleaner APIs for cleaning company, school, and location names.

Examples of Location Cleaning (using Location Cleaner):

Raw Location Name    Cleaned Location Name
"San Francisco"      san francisco, california, united states
"dc"                 washington, district of columbia, united states
"USA"                united states
"london"             london, greater london, united kingdom

The Cleaner APIs therefore let you:

  • Clean company, school, and location names before using them with our Search APIs
  • Easily clean and standardize your own raw data for your own use
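As a rough illustration of the second use case, the sketch below standardizes a list of raw values while sending each distinct value to a cleaner only once. The `clean_fn` callable is a hypothetical stand-in for whatever function performs the actual Cleaner API request; it is not part of the PDL API.

```python
from typing import Callable, Dict, Iterable, List, Optional


def clean_batch(
    raw_values: Iterable[str],
    clean_fn: Callable[[str], Optional[str]],
) -> List[Optional[str]]:
    """Clean each raw value, calling clean_fn once per distinct input.

    clean_fn is a hypothetical callable wrapping a Cleaner API request.
    It should return the cleaned name, or None when there is no match.
    """
    cache: Dict[str, Optional[str]] = {}
    cleaned: List[Optional[str]] = []
    for raw in raw_values:
        if raw not in cache:
            # First time this exact string is seen: ask the cleaner.
            cache[raw] = clean_fn(raw)
        cleaned.append(cache[raw])
    return cleaned
```

Deduplicating before calling the API keeps repeated values in a dataset from counting against the request limits more than once.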

Endpoints

The endpoint for the Company Cleaner API is https://api.peopledatalabs.com/v5/company/clean.
The endpoint for the School Cleaner API is https://api.peopledatalabs.com/v5/school/clean.
The endpoint for the Location Cleaner API is https://api.peopledatalabs.com/v5/location/clean.
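The three endpoints share one URL pattern, which can be captured in a small helper. This is only a convenience sketch; the `cleaner_url` function and the constant names are hypothetical and not part of any PDL client library.

```python
BASE_URL = "https://api.peopledatalabs.com/v5"
CLEANER_KINDS = ("company", "school", "location")


def cleaner_url(kind: str) -> str:
    """Return the Cleaner API endpoint for 'company', 'school' or 'location'."""
    if kind not in CLEANER_KINDS:
        raise ValueError(f"unknown cleaner kind: {kind!r}")
    return f"{BASE_URL}/{kind}/clean"
```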

Cleaner API Access and Billing

The Cleaner APIs are currently in production with restricted access for enterprise customers only. If you are looking for access to company data, check out our Company Enrichment Endpoint. If you'd like access to our school or location data, reach out to us.

If you are a contracted customer, you have access to the cleaner APIs with your standard key(s). If you are a Data License customer, reach out to your Data Consultant or Customer Success Manager to get an API key.

Usage

PDL's cleaner APIs are designed to clean your company/school/location data so you can better query our person data. We use these cleaners for standardizing our data as part of our Data Build Process.

Requests

See Authentication and Requests for the possible ways to send requests. We recommend using the body of your request; note that the examples below pass their parameters in the query string of a GET request.

Rate Limiting

The standard rate limit is 10 requests per minute, with a standard total limit of 10,000 requests per month, free of charge. To increase your monthly limit above 10,000, reach out to your Data Consultant or Customer Success Manager to get an API key.
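One way to stay under the per-minute limit on the client side is to space requests evenly. The sketch below assumes the figure of 10 requests per minute stated above and is not an official PDL client; the `MinIntervalThrottle` class and its injectable `clock` and `sleep` parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(
        self,
        per_minute: int = 10,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous call was released

    def wait(self) -> float:
        """Block until the next request may be sent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` on a shared instance immediately before each request keeps consecutive requests at least six seconds apart. The monthly total is not tracked by this sketch.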

Endpoints

/company/clean

The endpoint for the company cleaner is https://api.peopledatalabs.com/v5/company/clean. GET requests only.

Parameters

You can query for companies with at least one of the parameters name, website, or profile.
Valid profiles are linkedin, facebook, and twitter.

Company Cleaner API requests require an unambiguous match. Because name is not a unique identifier, there are cases where we return a 404 (no match) because we cannot confidently distinguish one record from others with similar names.
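The parameter rules above can be checked before a request is sent. The following is a minimal sketch under stated assumptions: the `build_clean_params` helper is hypothetical, and its profile check (a simple substring test for the three supported networks) is a guess at reasonable client-side validation, not the API's own logic.

```python
from typing import Dict, Optional

VALID_PROFILE_NETWORKS = ("linkedin", "facebook", "twitter")


def build_clean_params(
    name: Optional[str] = None,
    website: Optional[str] = None,
    profile: Optional[str] = None,
) -> Dict[str, str]:
    """Build query parameters, requiring at least one identifying field."""
    # Assumed check: a profile should mention one of the supported networks.
    if profile is not None and not any(
        network in profile.lower() for network in VALID_PROFILE_NETWORKS
    ):
        raise ValueError("profile must be a linkedin, facebook or twitter URL")
    candidates = {"name": name, "website": website, "profile": profile}
    params = {key: value for key, value in candidates.items() if value}
    if not params:
        raise ValueError("provide at least one of name, website or profile")
    return params
```

The returned dictionary can be passed as the query parameters of the GET request.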

import requests

url = "https://api.peopledatalabs.com/v5/company/clean"

querystring = {"website":"peopledatalabs.com"}

headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'x-api-key': "YOUR_API_KEY"
    }

response = requests.request("GET", url, headers=headers, params=querystring)

print(response.text)

Response

If a matching company is found, a 200 response is returned along with the company record in JSON. The schema of the response matches the company schema in the person record.

{
  "name": "people data labs",
  "size": "51-200",
  "id": "peopledatalabs",
  "founded": "2015",
  "industry": "computer software",
  "type": "private",
  "ticker": null,
  "location": {
    "name": "san francisco, california, united states",
    "locality": "san francisco",
    "region": "california",
    "country": "united states",
    "continent": "north america",
    "street_address": null,
    "address_line_2": null,
    "postal_code": null,
    "geo": "37.77,-122.41"
  },
  "linkedin_url": "linkedin.com/company/peopledatalabs",
  "linkedin_id": "18170482",
  "facebook_url": "facebook.com/peopledatalabs",
  "twitter_url": "twitter.com/peopledatalabs",
  "website": "peopledatalabs.com",
  "fuzzy_match": false
}

If a matching company is not found, a 404 is returned.
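Because a 404 here means "no confident match" rather than a failure, it can be useful to handle it separately from other errors. The sketch below shows one way to do that; the `parse_clean_response` helper is hypothetical, and treating every other status code as an exception is an assumption about what a caller would want.

```python
import json
from typing import Any, Dict, Optional


def parse_clean_response(status_code: int, body: str) -> Optional[Dict[str, Any]]:
    """Return the cleaned record for a 200, None for a 404, else raise."""
    if status_code == 200:
        return json.loads(body)
    if status_code == 404:
        return None  # no confident match was found
    raise RuntimeError(f"unexpected status {status_code}: {body}")
```

With the `requests` library this would be called as `parse_clean_response(response.status_code, response.text)`.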

To query the person schema, you can use the following field mapping:

company field    person field
id               job_company_id, experience.company.id
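To make the mapping concrete, the sketch below builds a Person Search query body from a cleaned company id. It reuses the bool/must/term query shape from the Person Search example later in this document; the helper name is hypothetical, and it is an assumption that `experience.company.id` accepts the same term query as `job_company_id`.

```python
from typing import Any, Dict

# Person fields that hold a company id, per the mapping above.
COMPANY_ID_PERSON_FIELDS = ("job_company_id", "experience.company.id")


def person_query_for_company(
    company_id: str, field: str = "job_company_id"
) -> Dict[str, Any]:
    """Build a term query matching a cleaned company id on a person field."""
    if field not in COMPANY_ID_PERSON_FIELDS:
        raise ValueError(f"{field!r} does not hold a company id")
    return {"query": {"bool": {"must": [{"term": {field: company_id}}]}}}
```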

/school/clean

The endpoint for the school cleaner is https://api.peopledatalabs.com/v5/school/clean. GET requests only. The current rate limit is 10 requests per minute.

Parameters

You can query for schools with at least one of the parameters name, website, or profile.
Valid profiles are linkedin, facebook, and twitter.

School Cleaner API requests require an unambiguous match. Because name is not a unique identifier, there are cases where we return a 404 (no match) because we cannot confidently distinguish one record from others with similar names.

import requests

url = "https://api.peopledatalabs.com/v5/school/clean"

querystring = {"profile":"linkedin.com/school/ucla"}

headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'x-api-key': "YOUR_API_KEY"
    }

response = requests.request("GET", url, headers=headers, params=querystring)

print(response.text)

Response

If a matching school is found, a 200 response is returned along with the cleaned school JSON.
The schema of the response matches the school schema in the person record.

{
  "name": "university of california, los angeles",
  "type": "post-secondary institution",
  "id": "72978d72-275a-49c8-b9b9-f227ccfb1361",
  "location": {
    "name": "los angeles, california, united states",
    "locality": "los angeles",
    "region": "california",
    "country": "united states",
    "continent": "north america"
  },
  "linkedin_url": "linkedin.com/school/ucla",
  "facebook_url": null,
  "twitter_url": null,
  "linkedin_id": "17950",
  "website": "ucla.edu",
  "domain": "ucla.edu"
}

If a matching school is not found, a 404 is returned.

To query the person schema, you can use the following field mapping:

school field    person field
id              education.school.id

/location/clean

The endpoint for the location cleaner is https://api.peopledatalabs.com/v5/location/clean. GET requests only. The current rate limit is 10 requests per minute.

Parameters

To query the location cleaner, use the parameter location with a string containing the location information to be parsed.

Location Cleaner API requests require an unambiguous match. Because a location name is not a unique identifier, there are cases where we return a 404 (no match) because we cannot confidently distinguish one record from others with similar names.
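Free-form location strings usually contain spaces and commas, so they need to be percent-encoded when the URL is built by hand (HTTP libraries such as `requests` do this automatically when given a parameter dictionary). The sketch below uses only the Python standard library; the `location_clean_url` helper is a hypothetical name.

```python
from urllib.parse import urlencode

LOCATION_CLEAN_URL = "https://api.peopledatalabs.com/v5/location/clean"


def location_clean_url(raw_location: str) -> str:
    """Return the full GET URL with the raw location percent-encoded."""
    query = urlencode({"location": raw_location})
    return f"{LOCATION_CLEAN_URL}?{query}"
```

With `urlencode`, spaces become `+` and commas become `%2C` in the resulting query string.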

import requests

url = "https://api.peopledatalabs.com/v5/location/clean"

querystring = {"location":"239 NW 13th Ave, Portland, Oregon 97209, US"}

headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'x-api-key': "YOUR_API_KEY"
    }

response = requests.request("GET", url, headers=headers, params=querystring)

print(response.text)

Response

If the location is successfully parsed, a 200 response is returned along with the parsed location JSON. The schema of the response matches the location schema in the person record.

{
  "name": "portland, oregon, united states",
  "locality": "portland",
  "region": "oregon",
  "subregion": "multnomah county",
  "country": "united states",
  "continent": "north america",
  "type": "locality",
  "geo": "45.52,-122.67"
}

If the location parsing fails, a 404 is returned.

To query the person schema, you can use the following field mapping:

location field    person field
name              job_company_location_name, location_name, location_names, street_addresses.name, experience.location_names, experience.company.location.name, education.school.location.name

Note: the top-level "regions" field in the person schema contains strings of the form "{region}, {country}", built from the output of the location cleaner above.
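The "{region}, {country}" form described in the note can be derived from a cleaned location record as sketched below. The `region_string` helper is hypothetical, and returning None when either part is missing is an assumption about how a caller might treat results that lack a region.

```python
from typing import Mapping, Optional


def region_string(location: Mapping[str, Optional[str]]) -> Optional[str]:
    """Build "{region}, {country}" from a cleaned location record."""
    region = location.get("region")
    country = location.get("country")
    if not region or not country:
        return None  # assumed handling when a part is absent or null
    return f"{region}, {country}"
```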

Example use

Company Cleaner API and Person Search

I want to find employees at a particular company, but I don't have the PDL identifier for the company.

import json
import requests

PDL_COMPANY_CLEANER_URL = "https://api.peopledatalabs.com/v5/company/clean"
PDL_PERSON_SEARCH_URL = "https://api.peopledatalabs.com/v5/person/search"

API_KEY = "####" # Enter your api key here

company_website = "peopledatalabs.com"

# Clean company then find people at that company:
pdl_employees = []

# Company Cleaning
querystring = { "website": company_website }

headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'x-api-key': API_KEY
}

response = requests.request("GET", PDL_COMPANY_CLEANER_URL, headers=headers, params=querystring)

if response.status_code == 200:
    cleaned_company = response.json()
else:
    cleaned_company = {}
    print(f"Company Cleaner API Error for [{company_website}]: {response.text}")

# Person Search
company_employee_matches = []

if cleaned_company:
    headers = {
        'Content-Type': "application/json",
        'X-api-key': API_KEY
    }

    ES_QUERY = {
        "query": {
            "bool": {
                "must": [
                    {"term": {"job_company_id": cleaned_company['id']}}
                ]
            }
        }
    }

    params = {
        'query': json.dumps(ES_QUERY),
        'size': 100
    }

    response = requests.get( PDL_PERSON_SEARCH_URL, headers=headers, params=params)

    if response.status_code == 200:
        company_employee_matches = response.json()['data']
    else:
        company_employee_matches = []
        print(f"Person Search Error for [{company_website}]: {response.text}")


print(f"Found {len(company_employee_matches)} employee profiles at {company_website}")

The same example using a SQL query in place of the Elasticsearch query:

import json
import requests

PDL_COMPANY_CLEANER_URL = "https://api.peopledatalabs.com/v5/company/clean"
PDL_PERSON_SEARCH_URL = "https://api.peopledatalabs.com/v5/person/search"

API_KEY = "####" # Enter your api key here

company_website = "peopledatalabs.com"

# Clean company then find people at that company:
pdl_employees = []

# Company Cleaning
querystring = { "website": company_website }

headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'x-api-key': API_KEY
}

response = requests.request("GET", PDL_COMPANY_CLEANER_URL, headers=headers, params=querystring)

if response.status_code == 200:
    cleaned_company = response.json()
else:
    cleaned_company = {}
    print(f"Company Cleaner API Error for [{company_website}]: {response.text}")

# Person Search
company_employee_matches = []

if cleaned_company:
    headers = {
        'Content-Type': "application/json",
        'X-api-key': API_KEY
    }

    SQL_QUERY = f"""
    SELECT * FROM person
    WHERE job_company_id = '{cleaned_company['id']}'
    """

    params = {
        'sql': SQL_QUERY,
        'size': 100
    }

    response = requests.get( PDL_PERSON_SEARCH_URL, headers=headers, params=params)

    if response.status_code == 200:
        company_employee_matches = response.json()['data']
    else:
        company_employee_matches = []
        print(f"Person Search Error for [{company_website}]: {response.text}")


print(f"Found {len(company_employee_matches)} employee profiles at {company_website}")