NHL API - Collect Career Stats

Posted by Nick Paul on Friday, December 3, 2021

TOC

  1. Import Libraries
  2. Find Player ID
  3. Collect Data

The NHL has an unpublished API that makes grabbing stats pretty painless. In this post we will walk through how to grab year-by-year stats for a particular player using python.

Import Libraries

To get started, we will import the following libraries:

  • requests - Conduct the API call
  • json - Convert the returned data to a python list
  • pandas - Conduct data manipulation
import requests
import json
import pandas as pd

Find Player ID

To collect player data from the NHL API, we first need their player ID, which can be found using the NHL’s suggest end point. Below is a simple example to get Connor McDavid's player ID.

base_url = 'https://suggest.svc.nhl.com/svc/suggest/v1/minplayers/'

first_name = 'connor'
last_name = 'mcdavid'
num_to_return = '1'

full_url = base_url + first_name + '%20' + last_name + '/' + num_to_return
response = requests.get(full_url)
suggestions = json.loads(response.content)['suggestions'][0]
print(suggestions)
## 8478402|McDavid|Connor|1|0|6' 1"|193|Richmond Hill|ON|CAN|1997-01-13|EDM|C|97|connor-mcdavid-8478402
player_info = str.split(suggestions, "|")
print(player_info)
## ['8478402', 'McDavid', 'Connor', '1', '0', '6\' 1"', '193', 'Richmond Hill', 'ON', 'CAN', '1997-01-13', 'EDM', 'C', '97', 'connor-mcdavid-8478402']
player_id = player_info[0]
player_id
## '8478402'

This endpoint returns a string that we can split and pull out information such as name and birthday.

Career Data

The next step is to hit the stats api using the player ID we just acquired.

url = 'https://statsapi.web.nhl.com/api/v1/people/' + player_id + '/stats/?stats=yearByYear'
response = requests.get(url)
content = json.loads(response.content)['stats']
splits = content[0]['splits']

df_splits = (pd.json_normalize(splits, sep = "_" )
             .query('league_name == "National Hockey League"')
            )
df_splits
##       season  sequenceNumber  ...  stat_blocked  stat_shifts
## 8   20152016               1  ...          10.0       1030.0
## 11  20162017               1  ...          29.0       1998.0
## 13  20172018               1  ...          46.0       1940.0
## 15  20182019               1  ...          30.0       1998.0
## 16  20192020               1  ...          18.0       1353.0
## 17  20202021               1  ...          24.0       1182.0
## 18  20212022               1  ...          13.0        860.0
## 
## [7 rows x 31 columns]

We now have Connor McDavid’s career stats! While McDavid’s stats returned one row per year, be aware that a player that was traded during a season will have multiple rows for a particular year.

Create a Function

Since we will likely reuse this code, we should write a function that handles the two previous steps: get a player ID and get career stats.

from datetime import datetime
import numpy as np

def get_career_stats(first_name, last_name):

    base_url = 'https://suggest.svc.nhl.com/svc/suggest/v1/minplayers/'

    num_to_return = '1'

    full_url = base_url + first_name + '%20' + last_name + '/' + num_to_return

    response = requests.get(full_url)
    suggestion = json.loads(response.content)['suggestions'][0]
    player_info = str.split(suggestion, "|")
    player_id = player_info[0]
    
    url = 'https://statsapi.web.nhl.com/api/v1/people/' + player_id + '/stats/?stats=yearByYear'
    response = requests.get(url)
    content = json.loads(response.content)['stats']
    splits = content[0]['splits']

    df_splits = (pd.json_normalize(splits, sep = "_" )
             .query('league_name == "National Hockey League"')
            )
    
    df_splits['goals_per_game']=  df_splits['stat_goals']/df_splits['stat_games']
    df_splits['player_id'] = player_id
    df_splits['first_name'] = player_info[2]
    df_splits['last_name'] = player_info[1]
    df_splits['bday'] = pd.to_datetime(player_info[10])
    df_splits['season_end'] = [x[4:8] for x in df_splits['season']]
    df_splits['season_start_yr'] = [x[0:4] for x in df_splits['season']]
    df_splits['season_start_dt'] =  [datetime.strptime(x + '0930', "%Y%m%d") for x in df_splits['season_start_yr']] 
    df_splits['age'] = (np.floor((df_splits['season_start_dt'] - df_splits['bday'])/ np.timedelta64(1,'Y') ))
    df_splits['age'] = df_splits['age'].astype(int)
    
    return df_splits
get_career_stats('connor', 'mcdavid')
##       season  sequenceNumber  stat_assists  ...  season_start_yr  season_start_dt  age
## 8   20152016               1            32  ...             2015       2015-09-30   18
## 11  20162017               1            70  ...             2016       2016-09-30   19
## 13  20172018               1            67  ...             2017       2017-09-30   20
## 15  20182019               1            75  ...             2018       2018-09-30   21
## 16  20192020               1            63  ...             2019       2019-09-30   22
## 17  20202021               1            72  ...             2020       2020-09-30   23
## 18  20212022               1            37  ...             2021       2021-09-30   24
## 
## [7 rows x 40 columns]

Now we can just enter a players name and pull their stats, which is very useful if we want to pull data for multiple players and compare their numbers.

df_ovi = get_career_stats('connor', 'mcdavid')
df_mcdavid = get_career_stats('alex', 'ovech')

df_compare = pd.concat([df_ovi, df_mcdavid])

df_compare.groupby(['first_name', 'last_name'])['stat_goals'].sum()
## first_name  last_name
## Alex        Ovechkin     759
## Connor      McDavid      217
## Name: stat_goals, dtype: int64