TOC
- Import Libraries
- Collect Team Info
- Collect Roster
- Collect Career Stats
- Collect All Players in a Season
In this post we will explore how to get roster data for each NHL team. We will then look at how to get career stats for each player on a particular roster. Finally, we will get career stats for all players from a particular season.
Import Libraries
import requests
import json
import pandas as pd
Collect Team Info
The first step is to collect data about NHL teams. This will allow us to focus on active NHL teams.
teams_url = "https://statsapi.web.nhl.com/api/v1/teams"
team_response = requests.get(teams_url)
team_content = json.loads(team_response.content)
df_teams = pd.json_normalize(team_content['teams'], sep="_")
df_teams.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 32 entries, 0 to 31
## Data columns (total 29 columns):
## # Column Non-Null Count Dtype
## --- ------ -------------- -----
## 0 id 32 non-null int64
## 1 name 32 non-null object
## 2 link 32 non-null object
## 3 abbreviation 32 non-null object
## 4 teamName 32 non-null object
## 5 locationName 32 non-null object
## 6 firstYearOfPlay 32 non-null object
## 7 shortName 32 non-null object
## 8 officialSiteUrl 32 non-null object
## 9 franchiseId 32 non-null int64
## 10 active 32 non-null bool
## 11 venue_name 32 non-null object
## 12 venue_link 32 non-null object
## 13 venue_city 32 non-null object
## 14 venue_timeZone_id 32 non-null object
## 15 venue_timeZone_offset 32 non-null int64
## 16 venue_timeZone_tz 32 non-null object
## 17 division_id 32 non-null int64
## 18 division_name 32 non-null object
## 19 division_nameShort 32 non-null object
## 20 division_link 32 non-null object
## 21 division_abbreviation 32 non-null object
## 22 conference_id 32 non-null int64
## 23 conference_name 32 non-null object
## 24 conference_link 32 non-null object
## 25 franchise_franchiseId 32 non-null int64
## 26 franchise_teamName 32 non-null object
## 27 franchise_link 32 non-null object
## 28 venue_id 25 non-null float64
## dtypes: bool(1), float64(1), int64(6), object(21)
## memory usage: 7.2+ KB
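To see where column names such as `venue_name` and `venue_city` come from, here is a minimal offline sketch of how `pd.json_normalize` flattens nested records when `sep="_"`. The `sample_teams` payload is hand-written for illustration and is not a real API response; only a few of its field names mirror the columns listed above.

```python
import pandas as pd

# Hypothetical, hand-written payload that mimics the shape of the teams response.
sample_teams = {
    "teams": [
        {"id": 3, "name": "New York Rangers", "active": True,
         "venue": {"name": "Madison Square Garden", "city": "New York"}},
        {"id": 99, "name": "Example Defunct Team", "active": False,
         "venue": {"name": "Example Arena", "city": "Nowhere"}},
    ]
}

# Nested keys are joined with the separator: venue -> name becomes venue_name.
df_sample = pd.json_normalize(sample_teams["teams"], sep="_")
print(sorted(df_sample.columns))

# A boolean column can be used directly as a row filter.
active_names = df_sample.loc[df_sample["active"], "name"].tolist()
print(active_names)
```

The same boolean filter is what lets us restrict the analysis to active teams in the next step.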
Collect Roster
Next, we create a function to get a team roster for a particular season.
df_teams['link'] = 'https://statsapi.web.nhl.com' + df_teams['link']
df_active = df_teams.loc[df_teams['active']==True]
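def get_team_roster(team, season):
    # Look up the API link for the requested team among the active teams
    base_url = df_active.loc[df_active['name']==team]['link'].iloc[0]
    print(base_url)
    # season is a string such as '20192020'
    url = base_url + "/roster/" + "?season=" + season
    response = requests.get(url)
    roster = response.json()["roster"]
    # Cast every column to str so person_id can be concatenated into URLs later
    df_roster = pd.json_normalize(roster, sep = "_").astype(str)
    return df_roster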
df_roster = get_team_roster("New York Rangers", '20192020')
## https://statsapi.web.nhl.com/api/v1/teams/3
df_roster
## jerseyNumber person_id ... position_type position_abbreviation
## 0 18 8471686 ... Defenseman D
## 1 7 8474090 ... Defenseman D
## 2 38 8474230 ... Forward LW
## 3 20 8475184 ... Forward LW
## 4 14 8475735 ... Forward C
## 5 71 8475855 ... Forward RW
## 6 29 8476396 ... Forward C
## 7 16 8476458 ... Forward C
## 8 93 8476459 ... Forward C
## 9 92 8476480 ... Forward C
## 10 34 8476858 ... Forward LW
## 11 76 8476869 ... Defenseman D
## 12 8 8476885 ... Defenseman D
## 13 59 8476922 ... Forward C
## 14 89 8477402 ... Forward LW
## 15 77 8477950 ... Defenseman D
## 16 48 8477962 ... Forward LW
## 17 10 8478550 ... Forward LW
## 18 23 8479323 ... Defenseman D
## 19 55 8479324 ... Defenseman D
## 20 15 8479328 ... Forward RW
## 21 25 8479333 ... Defenseman D
## 22 21 8479353 ... Forward C
## 23 26 8479364 ... Forward LW
## 24 17 8480072 ... Forward C
## 25 72 8480078 ... Forward C
## 26 24 8481554 ... Forward RW
## 27 30 8468685 ... Goalie G
## 28 31 8478048 ... Goalie G
## 29 40 8480382 ... Goalie G
##
## [30 rows x 8 columns]
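Because the season is passed as a plain string, a malformed value such as `'2019'` would only surface as a confusing API response. The sketch below shows one way to validate the season and build the roster URL before making a request. The helper name `build_roster_url` and the validation rules are assumptions for illustration, not part of the original function; the URL shape matches the one used above.

```python
API_ROOT = "https://statsapi.web.nhl.com"

def build_roster_url(team_link: str, season: str) -> str:
    """Build the roster URL for a team link like '/api/v1/teams/3' (hypothetical helper)."""
    # A season is assumed to be eight digits: start year followed by end year.
    if len(season) != 8 or not season.isdigit():
        raise ValueError(f"season must look like '20192020', got {season!r}")
    start_year, end_year = int(season[:4]), int(season[4:])
    if end_year != start_year + 1:
        raise ValueError(f"season years must be consecutive, got {season!r}")
    return f"{API_ROOT}{team_link}/roster/?season={season}"

roster_url = build_roster_url("/api/v1/teams/3", "20192020")
print(roster_url)
```

Failing early with a `ValueError` keeps bad input from turning into a `KeyError` on `"roster"` further down.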
Collect Career Stats
Now that we have player IDs for each player on a team, let's create a function that gets year-by-year stats for a particular player.
from datetime import datetime
import numpy as np
def get_career_stats(player_id):
    url = 'https://statsapi.web.nhl.com/api/v1/people/' + player_id + '/stats/?stats=yearByYear'
    response = requests.get(url)
    content = json.loads(response.content)['stats']
    splits = content[0]['splits']
    # Keep only the seasons played in the NHL
    df_splits = (pd.json_normalize(splits, sep = "_" )
                 .query('league_name == "National Hockey League"')
    )
    if df_splits.shape[0] > 0 :
        # A second request fetches name, birth date, and position
        url_info = 'https://statsapi.web.nhl.com/api/v1/people/' + player_id
        response = requests.get(url_info)
        player_info = json.loads(response.content)['people'][0]
        # Goals per game only applies to skaters, so goalies are skipped
        if player_info['primaryPosition']['code'] != "G":
            df_splits['goals_per_game']= df_splits['stat_goals']/df_splits['stat_games']
        df_splits['player_id'] = player_id
        df_splits['first_name'] = player_info['firstName']
        df_splits['last_name'] = player_info['lastName']
        df_splits['bday'] = pd.to_datetime(player_info['birthDate'])
        df_splits['season_end_yr'] = [x[4:8] for x in df_splits['season']]
        df_splits['season_start_yr'] = [x[0:4] for x in df_splits['season']]
        # Treat September 30 of the start year as the season start date
        df_splits['season_start_dt'] = [datetime.strptime(x + '0930', "%Y%m%d") for x in df_splits['season_start_yr']]
        # Age in whole years at season start. 365.2425 days is the average year that
        # np.timedelta64(1, 'Y') stood for; newer pandas releases may reject that unit.
        df_splits['age'] = np.floor((df_splits['season_start_dt'] - df_splits['bday']).dt.days / 365.2425)
        df_splits['age'] = df_splits['age'].astype(int)
    return df_splits
get_career_stats(player_id = '8478402')
## season sequenceNumber stat_assists ... season_start_yr season_start_dt age
## 8 20152016 1 32 ... 2015 2015-09-30 18
## 11 20162017 1 70 ... 2016 2016-09-30 19
## 13 20172018 1 67 ... 2017 2017-09-30 20
## 15 20182019 1 75 ... 2018 2018-09-30 21
## 16 20192020 1 63 ... 2019 2019-09-30 22
## 17 20202021 1 72 ... 2020 2020-09-30 23
## 18 20212022 1 37 ... 2021 2021-09-30 24
##
## [7 rows x 40 columns]
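The `age` column divides the elapsed time by an average year length, which can be off by one for players whose birthday falls very close to September 30. As a hedged alternative, the sketch below computes the age with exact calendar arithmetic. The function name `age_at_season_start` and the sample birth dates are made up for illustration, and it assumes birth dates arrive as ISO `YYYY-MM-DD` strings.

```python
from datetime import date

def age_at_season_start(birth_date: str, season: str) -> int:
    """Age in whole years on September 30 of the season's first year."""
    born = date.fromisoformat(birth_date)
    cutoff = date(int(season[:4]), 9, 30)
    # Subtract one year if the birthday has not yet occurred by the cutoff date.
    had_birthday = (cutoff.month, cutoff.day) >= (born.month, born.day)
    return cutoff.year - born.year - (0 if had_birthday else 1)

# Hypothetical birth dates, chosen to exercise both sides of the cutoff.
print(age_at_season_start("1997-01-13", "20152016"))
print(age_at_season_start("1997-10-05", "20152016"))
```

This version works on single values; applying it row by row with a list comprehension would fit the style of the function above.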
Now we can loop over the player ids from a roster and collect year-by-year stats for each player.
stats = []
for player_id in df_roster['person_id']:
    df = get_career_stats(player_id)
    stats.append(df)
df_all = pd.concat(stats)
df_all
## season ... stat_evenStrengthSavePercentage
## 7 20072008 ... NaN
## 8 20082009 ... NaN
## 9 20092010 ... NaN
## 11 20102011 ... NaN
## 12 20112012 ... NaN
## .. ... ... ...
## 9 20172018 ... 92.280702
## 11 20182019 ... 91.399083
## 14 20192020 ... 91.397849
## 15 20202021 ... 91.044776
## 16 20212022 ... 91.079812
##
## [219 rows x 60 columns]
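The `NaN` values in `stat_evenStrengthSavePercentage` appear because skaters and goalies return different stat columns, and `pd.concat` fills whatever a frame lacks with missing values. A minimal sketch with two made-up one-row frames (the numbers are placeholders, not real stats) shows the effect:

```python
import pandas as pd

# Placeholder frames: a skater has goals, a goalie has a save percentage.
skater = pd.DataFrame({"season": ["20192020"], "stat_goals": [41]})
goalie = pd.DataFrame({"season": ["20192020"], "stat_savePercentage": [0.912]})

# Columns are aligned by name; values missing from one frame become NaN.
combined = pd.concat([skater, goalie], ignore_index=True)
print(combined)
```

Keeping this in mind helps when filtering later: selecting rows where a goalie-only column is not null is one simple way to separate goalies from skaters.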
Collect All Players in a Season
Next, we can loop over every team and collect rosters for all active teams from a particular season.
rosters = []
season = '20212022'
for team in df_active['name']:
    df_roster = get_team_roster(team, season)
    rosters.append(df_roster)
## https://statsapi.web.nhl.com/api/v1/teams/1
## https://statsapi.web.nhl.com/api/v1/teams/2
## https://statsapi.web.nhl.com/api/v1/teams/3
## https://statsapi.web.nhl.com/api/v1/teams/4
## https://statsapi.web.nhl.com/api/v1/teams/5
## https://statsapi.web.nhl.com/api/v1/teams/6
## https://statsapi.web.nhl.com/api/v1/teams/7
## https://statsapi.web.nhl.com/api/v1/teams/8
## https://statsapi.web.nhl.com/api/v1/teams/9
## https://statsapi.web.nhl.com/api/v1/teams/10
## https://statsapi.web.nhl.com/api/v1/teams/12
## https://statsapi.web.nhl.com/api/v1/teams/13
## https://statsapi.web.nhl.com/api/v1/teams/14
## https://statsapi.web.nhl.com/api/v1/teams/15
## https://statsapi.web.nhl.com/api/v1/teams/16
## https://statsapi.web.nhl.com/api/v1/teams/17
## https://statsapi.web.nhl.com/api/v1/teams/18
## https://statsapi.web.nhl.com/api/v1/teams/19
## https://statsapi.web.nhl.com/api/v1/teams/20
## https://statsapi.web.nhl.com/api/v1/teams/21
## https://statsapi.web.nhl.com/api/v1/teams/22
## https://statsapi.web.nhl.com/api/v1/teams/23
## https://statsapi.web.nhl.com/api/v1/teams/24
## https://statsapi.web.nhl.com/api/v1/teams/25
## https://statsapi.web.nhl.com/api/v1/teams/26
## https://statsapi.web.nhl.com/api/v1/teams/28
## https://statsapi.web.nhl.com/api/v1/teams/29
## https://statsapi.web.nhl.com/api/v1/teams/30
## https://statsapi.web.nhl.com/api/v1/teams/52
## https://statsapi.web.nhl.com/api/v1/teams/53
## https://statsapi.web.nhl.com/api/v1/teams/54
## https://statsapi.web.nhl.com/api/v1/teams/55
df_all_rosters = pd.concat(rosters)
df_all_rosters
## jerseyNumber person_id ... position_type position_abbreviation
## 0 45 8473541 ... Goalie G
## 1 7 8476462 ... Defenseman D
## 2 44 8477425 ... Forward LW
## 3 29 8478406 ... Goalie G
## 4 20 8479415 ... Forward C
## .. ... ... ... ... ...
## 22 29 8478407 ... Defenseman D
## 23 55 8478468 ... Defenseman D
## 24 22 8478891 ... Forward C
## 25 8 8479985 ... Defenseman D
## 26 67 8479987 ... Forward C
##
## [851 rows x 8 columns]
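Note that the combined roster no longer says which team each row came from, and the index restarts at 0 for every team. One possible way to keep that information, sketched here with placeholder data rather than real rosters, is to tag each frame with its team name before concatenating. The `team` column and the `rosters_by_team` dictionary are assumptions for illustration.

```python
import pandas as pd

# Placeholder rosters keyed by team name.
rosters_by_team = {
    "Team A": pd.DataFrame({"person_id": ["1", "2"], "position_abbreviation": ["C", "G"]}),
    "Team B": pd.DataFrame({"person_id": ["3"], "position_abbreviation": ["D"]}),
}

# Add a team column to each roster, then stack them with a fresh 0..n-1 index.
df_labeled = pd.concat(
    [df.assign(team=team) for team, df in rosters_by_team.items()],
    ignore_index=True,
)
print(df_labeled)
```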
Finally, we can loop over every rostered player to collect year-by-year stats for all players from a particular season.
stats = []
for player_id in df_all_rosters['person_id']:
    print(player_id)
    df = get_career_stats(player_id)
    stats.append(df)
df_all_stats = pd.concat(stats)
df_all_stats
df_all_stats.to_csv("nhl_20212022_career_stats.csv")
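This last loop makes a large number of requests, so it may be worth skipping repeated IDs (in case the same player ID shows up on more than one roster) and pausing briefly between calls. The sketch below is one way to structure that, not the method used above: `collect_stats`, its `fetch` parameter, and `pause_seconds` are hypothetical names, and `fake_fetch` stands in for `get_career_stats` so the example runs offline.

```python
import time
import pandas as pd

def collect_stats(player_ids, fetch, pause_seconds=0.0):
    """Call fetch(player_id) once per unique ID and stack the non-empty results."""
    frames = []
    # dict.fromkeys removes duplicates while preserving the original order.
    for player_id in dict.fromkeys(player_ids):
        df = fetch(player_id)
        if df is not None and not df.empty:
            frames.append(df)
        time.sleep(pause_seconds)  # be polite to the server between requests
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

# Stand-in fetcher with placeholder values, used only to demonstrate the loop.
calls = []
def fake_fetch(player_id):
    calls.append(player_id)
    return pd.DataFrame({"player_id": [player_id], "stat_games": [82]})

df_demo = collect_stats(["8478402", "8471686", "8478402"], fake_fetch)
print(calls)
print(df_demo)
```

In real use, `get_career_stats` would be passed as `fetch` with a small positive `pause_seconds`.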