The NHL has an unpublished API that makes grabbing stats fairly painless. In this post we will walk through how to pull year-by-year stats for a particular player using Python.
Import Libraries
To get started, we will import the following libraries:
- requests: Conduct the API call
- json: Convert the returned data to a Python dictionary
- pandas: Conduct data manipulation
import requests
import json
import pandas as pd
Find Player ID
To collect player data from the NHL API, we first need the player's ID, which can be found using the NHL’s suggest endpoint. Below is a simple example that gets Connor McDavid's player ID.
base_url = ''
first_name = 'connor'
last_name = 'mcdavid'
num_to_return = '1'
full_url = base_url + first_name + '%20' + last_name + '/' + num_to_return
response = requests.get(full_url)
# Take the first (best) match from the list of suggestions
suggestion = json.loads(response.content)['suggestions'][0]
## 8478402|McDavid|Connor|1|0|6' 1"|193|Richmond Hill|ON|CAN|1997-01-13|EDM|C|97|connor-mcdavid-8478402
player_info = suggestion.split("|")
## ['8478402', 'McDavid', 'Connor', '1', '0', '6\' 1"', '193', 'Richmond Hill', 'ON', 'CAN', '1997-01-13', 'EDM', 'C', '97', 'connor-mcdavid-8478402']
player_id = player_info[0]
## '8478402'
This endpoint returns a pipe-delimited string that we can split to pull out information such as the player's name and birthday.
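Working with positions like `player_info[10]` is easy to get wrong, so here is a minimal sketch that labels each piece of the string. The field names are assumptions inferred from the McDavid sample output above (the endpoint is not documented), and the two fields whose meaning is not obvious from the sample are left as generic placeholders.

```python
# Field names are assumptions inferred from the sample string, not an official schema.
SUGGEST_FIELDS = [
    "player_id", "last_name", "first_name",
    "flag_a", "flag_b",  # meaning not clear from the sample
    "height", "weight", "birth_city", "birth_state_province",
    "birth_country", "birth_date", "team", "position",
    "jersey_number", "slug",
]


def parse_suggestion(suggestion):
    """Split a pipe-delimited suggestion string into a dict of named fields."""
    return dict(zip(SUGGEST_FIELDS, suggestion.split("|")))


# The sample string shown in the output above
sample = ('8478402|McDavid|Connor|1|0|6\' 1"|193|Richmond Hill|ON|CAN|'
          '1997-01-13|EDM|C|97|connor-mcdavid-8478402')
player = parse_suggestion(sample)
print(player["player_id"], player["first_name"], player["birth_date"])
```

With a dictionary, later code can refer to `player["birth_date"]` instead of remembering that the birthday sits at index 10.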
Career Data
The next step is to hit the stats API using the player ID we just acquired.
url = '' + player_id + '/stats/?stats=yearByYear'
response = requests.get(url)
content = json.loads(response.content)['stats']
splits = content[0]['splits']
# Flatten the nested JSON and keep only NHL seasons
df_splits = (pd.json_normalize(splits, sep="_")
             .query('league_name == "National Hockey League"'))
## season sequenceNumber ... stat_blocked stat_shifts
## 8 20152016 1 ... 10.0 1030.0
## 11 20162017 1 ... 29.0 1998.0
## 13 20172018 1 ... 46.0 1940.0
## 15 20182019 1 ... 30.0 1998.0
## 16 20192020 1 ... 18.0 1353.0
## 17 20202021 1 ... 24.0 1182.0
## 18 20212022 1 ... 13.0 860.0
## [7 rows x 31 columns]
We now have Connor McDavid’s career stats! While McDavid’s stats returned one row per year, be aware that a player who was traded during a season will have multiple rows for that season, one per team.
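If one row per season is needed for a traded player, the rows can be collapsed with a group-by. The sketch below uses made-up data for a hypothetical player who changed teams during 20172018; the `team_name` column is an assumption about what the flattened response contains, while `stat_games` and `stat_goals` follow the column names used in this post.

```python
import pandas as pd

# Hypothetical data: two rows for 20172018 because of a mid-season trade.
df = pd.DataFrame({
    "season": ["20162017", "20172018", "20172018"],
    "team_name": ["Team A", "Team A", "Team B"],
    "stat_games": [82, 40, 35],
    "stat_goals": [30, 12, 15],
})

# Sum the counting stats so each season ends up on a single row.
season_totals = (
    df.groupby("season", as_index=False)[["stat_games", "stat_goals"]].sum()
)
print(season_totals)
```

Only counting stats such as games and goals should be summed this way. Rate stats (percentages, per-game averages) need to be recomputed from the summed totals.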
Create a Function
Since we will likely reuse this code, we should write a function that handles the two previous steps: get a player ID and get career stats.
from datetime import datetime
import numpy as np
def get_career_stats(first_name, last_name):
    # Step 1: look up the player ID with the suggest endpoint
    base_url = ''
    num_to_return = '1'
    full_url = base_url + first_name + '%20' + last_name + '/' + num_to_return
    response = requests.get(full_url)
    suggestion = json.loads(response.content)['suggestions'][0]
    player_info = suggestion.split("|")
    player_id = player_info[0]

    # Step 2: pull year-by-year stats and keep only NHL seasons
    url = '' + player_id + '/stats/?stats=yearByYear'
    response = requests.get(url)
    content = json.loads(response.content)['stats']
    splits = content[0]['splits']
    df_splits = (pd.json_normalize(splits, sep="_")
                 .query('league_name == "National Hockey League"')
                 .copy())

    # Step 3: add derived columns and player details
    df_splits['goals_per_game'] = df_splits['stat_goals'] / df_splits['stat_games']
    df_splits['player_id'] = player_id
    df_splits['first_name'] = player_info[2]
    df_splits['last_name'] = player_info[1]
    df_splits['bday'] = pd.to_datetime(player_info[10])
    df_splits['season_end'] = [x[4:8] for x in df_splits['season']]
    df_splits['season_start_yr'] = [x[0:4] for x in df_splits['season']]
    # Treat September 30 as the start date of every season
    df_splits['season_start_dt'] = [datetime.strptime(x + '0930', "%Y%m%d") for x in df_splits['season_start_yr']]
    # Age in whole years at season start; 365.2425 days is the length of np.timedelta64(1, 'Y'),
    # a unit that newer pandas versions may reject in timedelta arithmetic
    df_splits['age'] = np.floor((df_splits['season_start_dt'] - df_splits['bday']) / pd.Timedelta(days=365.2425))
    df_splits['age'] = df_splits['age'].astype(int)
    return df_splits
get_career_stats('connor', 'mcdavid')
## season sequenceNumber stat_assists ... season_start_yr season_start_dt age
## 8 20152016 1 32 ... 2015 2015-09-30 18
## 11 20162017 1 70 ... 2016 2016-09-30 19
## 13 20172018 1 67 ... 2017 2017-09-30 20
## 15 20182019 1 75 ... 2018 2018-09-30 21
## 16 20192020 1 63 ... 2019 2019-09-30 22
## 17 20202021 1 72 ... 2020 2020-09-30 23
## 18 20212022 1 37 ... 2021 2021-09-30 24
## [7 rows x 40 columns]
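The function above estimates age by dividing the elapsed time by an average year length, which can be off by one for a player whose birthday falls right around the season start date. As an alternative, here is a small sketch that compares calendar dates directly. The helper name `age_on` is hypothetical and not part of the original code.

```python
from datetime import date


def age_on(birthday, on_date):
    """Return the age in whole years on a given date."""
    years = on_date.year - birthday.year
    # Subtract one if the birthday has not yet occurred in that year
    if (on_date.month, on_date.day) < (birthday.month, birthday.day):
        years -= 1
    return years


# McDavid's birthday (from the suggest output) and the 2015 season start used above
print(age_on(date(1997, 1, 13), date(2015, 9, 30)))  # 18
```

For McDavid this agrees with the `age` column in the output above.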
Now we can just enter a player's name and pull their stats, which is very useful if we want to pull data for multiple players and compare their numbers.
df_mcdavid = get_career_stats('connor', 'mcdavid')
df_ovi = get_career_stats('alex', 'ovech')
df_compare = pd.concat([df_mcdavid, df_ovi])
df_compare.groupby(['first_name', 'last_name'])['stat_goals'].sum()
## first_name last_name
## Alex Ovechkin 759
## Connor McDavid 217
## Name: stat_goals, dtype: int64
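Career totals favor the player with more seasons, so a comparison at the same age can be more informative. The sketch below shows one way to line players up by age, using made-up numbers for two hypothetical players and the column names that `get_career_stats` produces (`last_name`, `age`, `stat_goals`, `stat_games`).

```python
import pandas as pd

# Hypothetical numbers for two made-up players, not real NHL data.
df_compare = pd.DataFrame({
    "last_name": ["Alpha", "Alpha", "Beta", "Beta"],
    "age": [20, 21, 20, 21],
    "stat_goals": [30, 41, 25, 50],
    "stat_games": [82, 82, 50, 80],
})

# Sum per player and age first, so multiple rows in one season are handled
totals = df_compare.groupby(["age", "last_name"])[["stat_goals", "stat_games"]].sum()
totals["goals_per_game"] = totals["stat_goals"] / totals["stat_games"]

# One row per age, one column per player
by_age = totals["goals_per_game"].unstack("last_name")
print(by_age.round(3))
```

Each row of `by_age` is an age and each column a player, which makes it straightforward to read across or to plot.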