NHL API - Collect Rosters

Posted by Nick Paul on Tuesday, December 7, 2021

TOC

  1. Import Libraries
  2. Collect Team Info
  3. Collect Roster
  4. Collect Career Stats
  5. Collect All Players in a Season

In this post we will explore how to get roster data for each NHL team. We will then look at how to get career stats for each player on a particular roster. Finally, we will get career stats for all players from a particular season.

Import Libraries

import requests
import json
import pandas as pd

Collect Team Info

The first step is to collect data about NHL teams. This will allow us to focus on active NHL teams.


teams_url = "https://statsapi.web.nhl.com/api/v1/teams"
team_response = requests.get(teams_url)

team_content = json.loads(team_response.content)
df_teams = pd.json_normalize(team_content['teams'], sep="_")

df_teams.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 32 entries, 0 to 31
## Data columns (total 29 columns):
##  #   Column                 Non-Null Count  Dtype  
## ---  ------                 --------------  -----  
##  0   id                     32 non-null     int64  
##  1   name                   32 non-null     object 
##  2   link                   32 non-null     object 
##  3   abbreviation           32 non-null     object 
##  4   teamName               32 non-null     object 
##  5   locationName           32 non-null     object 
##  6   firstYearOfPlay        32 non-null     object 
##  7   shortName              32 non-null     object 
##  8   officialSiteUrl        32 non-null     object 
##  9   franchiseId            32 non-null     int64  
##  10  active                 32 non-null     bool   
##  11  venue_name             32 non-null     object 
##  12  venue_link             32 non-null     object 
##  13  venue_city             32 non-null     object 
##  14  venue_timeZone_id      32 non-null     object 
##  15  venue_timeZone_offset  32 non-null     int64  
##  16  venue_timeZone_tz      32 non-null     object 
##  17  division_id            32 non-null     int64  
##  18  division_name          32 non-null     object 
##  19  division_nameShort     32 non-null     object 
##  20  division_link          32 non-null     object 
##  21  division_abbreviation  32 non-null     object 
##  22  conference_id          32 non-null     int64  
##  23  conference_name        32 non-null     object 
##  24  conference_link        32 non-null     object 
##  25  franchise_franchiseId  32 non-null     int64  
##  26  franchise_teamName     32 non-null     object 
##  27  franchise_link         32 non-null     object 
##  28  venue_id               25 non-null     float64
## dtypes: bool(1), float64(1), int64(6), object(21)
## memory usage: 7.2+ KB
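
To see where column names such as venue_timeZone_id come from, here is a minimal sketch of how pd.json_normalize flattens nested records. The two team records are made up for illustration (they are not real API output); the second one deliberately has no venue id.

```python
import pandas as pd

# Hypothetical records shaped like entries in the API's "teams" list.
sample_teams = [
    {"id": 1, "name": "Team A",
     "venue": {"id": 100, "name": "Arena A",
               "timeZone": {"id": "America/New_York"}}},
    {"id": 2, "name": "Team B",
     "venue": {"name": "Arena B",  # no venue id for this team
               "timeZone": {"id": "America/Chicago"}}},
]

# Nested keys are joined with the separator, one column per leaf value.
df_sample = pd.json_normalize(sample_teams, sep="_")

print(sorted(df_sample.columns))
# Keys missing from a record become NaN in that row.
print(df_sample["venue_id"].notna().sum())
```

A key that is missing from some records is filled with NaN, which is likely why venue_id shows only 25 non-null values in the output above.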

Collect Roster

Next, we create a function to get a team roster for a particular season.

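# Turn the relative team links into full URLs and keep only active teams.
df_teams['link'] = 'https://statsapi.web.nhl.com' + df_teams['link']
df_active = df_teams.loc[df_teams['active']]

def get_team_roster(team, season):
    """Return the roster for `team` in `season` (e.g. '20192020') as a DataFrame of strings."""
    # Look up the team's API URL by its full name.
    base_url = df_active.loc[df_active['name'] == team, 'link'].iloc[0]
    print(base_url)
    url = base_url + "/roster/" + "?season=" + season

    response = requests.get(url)
    roster = response.json()["roster"]

    # Flatten the nested person/position fields; cast everything to str so
    # person_id can be concatenated into URLs later.
    df_roster = pd.json_normalize(roster, sep="_").astype(str)

    return df_roster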
    
df_roster = get_team_roster("New York Rangers", '20192020')
## https://statsapi.web.nhl.com/api/v1/teams/3
df_roster
##    jerseyNumber person_id  ... position_type position_abbreviation
## 0            18   8471686  ...    Defenseman                     D
## 1             7   8474090  ...    Defenseman                     D
## 2            38   8474230  ...       Forward                    LW
## 3            20   8475184  ...       Forward                    LW
## 4            14   8475735  ...       Forward                     C
## 5            71   8475855  ...       Forward                    RW
## 6            29   8476396  ...       Forward                     C
## 7            16   8476458  ...       Forward                     C
## 8            93   8476459  ...       Forward                     C
## 9            92   8476480  ...       Forward                     C
## 10           34   8476858  ...       Forward                    LW
## 11           76   8476869  ...    Defenseman                     D
## 12            8   8476885  ...    Defenseman                     D
## 13           59   8476922  ...       Forward                     C
## 14           89   8477402  ...       Forward                    LW
## 15           77   8477950  ...    Defenseman                     D
## 16           48   8477962  ...       Forward                    LW
## 17           10   8478550  ...       Forward                    LW
## 18           23   8479323  ...    Defenseman                     D
## 19           55   8479324  ...    Defenseman                     D
## 20           15   8479328  ...       Forward                    RW
## 21           25   8479333  ...    Defenseman                     D
## 22           21   8479353  ...       Forward                     C
## 23           26   8479364  ...       Forward                    LW
## 24           17   8480072  ...       Forward                     C
## 25           72   8480078  ...       Forward                     C
## 26           24   8481554  ...       Forward                    RW
## 27           30   8468685  ...        Goalie                     G
## 28           31   8478048  ...        Goalie                     G
## 29           40   8480382  ...        Goalie                     G
## 
## [30 rows x 8 columns]
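
The season argument is an eight-digit string: the starting year followed by the ending year. As a small sketch, the helpers below build that string and the roster URL from a starting year. The names season_id and roster_url are hypothetical and not part of the post's code, and the sketch assumes a season always ends in the calendar year after it starts.

```python
from urllib.parse import urlencode

def season_id(start_year):
    """Return the eight-digit season string, e.g. 2019 -> '20192020'."""
    return f"{start_year}{start_year + 1}"

def roster_url(team_link, start_year):
    """Build a roster URL from a full team link such as those in df_active['link']."""
    query = urlencode({"season": season_id(start_year)})
    # Same URL shape as get_team_roster: <team link>/roster/?season=<season>
    return f"{team_link.rstrip('/')}/roster/?{query}"

print(season_id(2019))
print(roster_url("https://statsapi.web.nhl.com/api/v1/teams/3", 2019))
```

Building the season string from an integer makes it harder to pass a malformed value such as '2019' or '20192021' by accident.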

Collect Career Stats

Now that we have player IDs for each player on a team, let's create a function that gets year-by-year stats for a particular player.

import numpy as np

def get_career_stats(player_id):
    """Return year-by-year NHL stats for one player ID (passed as a string)."""
    url = 'https://statsapi.web.nhl.com/api/v1/people/' + player_id + '/stats/?stats=yearByYear'
    response = requests.get(url)
    splits = response.json()['stats'][0]['splits']

    # Keep only NHL seasons; .copy() avoids assigning into a filtered view.
    df_splits = (pd.json_normalize(splits, sep="_")
                 .query('league_name == "National Hockey League"')
                 .copy())
    if df_splits.shape[0] > 0:
        url_info = 'https://statsapi.web.nhl.com/api/v1/people/' + player_id
        response = requests.get(url_info)
        player_info = response.json()['people'][0]

        # Goals per game only makes sense for skaters, so skip goalies.
        if player_info['primaryPosition']['code'] != "G":
            df_splits['goals_per_game'] = df_splits['stat_goals'] / df_splits['stat_games']
        df_splits['player_id'] = player_id
        df_splits['first_name'] = player_info['firstName']
        df_splits['last_name'] = player_info['lastName']
        df_splits['bday'] = pd.to_datetime(player_info['birthDate'])
        df_splits['season_end_yr'] = df_splits['season'].str[4:8]
        df_splits['season_start_yr'] = df_splits['season'].str[0:4]
        # Use September 30 of the starting year as the season's reference date.
        df_splits['season_start_dt'] = pd.to_datetime(df_splits['season_start_yr'] + '0930', format='%Y%m%d')
        # Approximate age in whole years; this avoids np.timedelta64(1, 'Y'),
        # whose year length is ambiguous.
        elapsed_days = (df_splits['season_start_dt'] - df_splits['bday']).dt.days
        df_splits['age'] = np.floor(elapsed_days / 365.25).astype(int)

    return df_splits
    
get_career_stats(player_id = '8478402')
##       season  sequenceNumber  stat_assists  ...  season_start_yr  season_start_dt  age
## 8   20152016               1            32  ...             2015       2015-09-30   18
## 11  20162017               1            70  ...             2016       2016-09-30   19
## 13  20172018               1            67  ...             2017       2017-09-30   20
## 15  20182019               1            75  ...             2018       2018-09-30   21
## 16  20192020               1            63  ...             2019       2019-09-30   22
## 17  20202021               1            72  ...             2020       2020-09-30   23
## 18  20212022               1            37  ...             2021       2021-09-30   24
## 
## [7 rows x 40 columns]
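
The age column above divides the elapsed time by an average year length, which can be off by one when a birthday falls very close to the reference date. If exact ages matter, one alternative is to compare calendar dates directly. The sketch below uses only the standard library; the function names and the birth dates are hypothetical, and September 30 is the same reference date used in get_career_stats.

```python
from datetime import date

def age_on(birth_date, on_date):
    """Return completed years between birth_date and on_date."""
    years = on_date.year - birth_date.year
    # Subtract one if the birthday has not happened yet in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def age_at_season_start(birth_date, season):
    """Age on September 30 of the season's starting year (season like '20152016')."""
    start_year = int(season[:4])
    return age_on(birth_date, date(start_year, 9, 30))

# Hypothetical birth dates, chosen to fall on either side of September 30.
print(age_at_season_start(date(1997, 1, 13), "20152016"))
print(age_at_season_start(date(1997, 10, 5), "20152016"))
```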

Now we can loop over the player IDs from a roster and collect year-by-year stats for each player.

stats = []
for player_id in df_roster['person_id']:
    df = get_career_stats(player_id)
    stats.append(df)
df_all = pd.concat(stats)
df_all
##       season  ...  stat_evenStrengthSavePercentage
## 7   20072008  ...                              NaN
## 8   20082009  ...                              NaN
## 9   20092010  ...                              NaN
## 11  20102011  ...                              NaN
## 12  20112012  ...                              NaN
## ..       ...  ...                              ...
## 9   20172018  ...                        92.280702
## 11  20182019  ...                        91.399083
## 14  20192020  ...                        91.397849
## 15  20202021  ...                        91.044776
## 16  20212022  ...                        91.079812
## 
## [219 rows x 60 columns]
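
A loop like this makes two requests per player, so a single failed request stops the whole run. As a hedged sketch, the helper below records failures and keeps going. The names collect and fake_fetch are hypothetical; fake_fetch stands in for get_career_stats so the example runs without network access.

```python
import time

def collect(ids, fetch, pause_seconds=0.0):
    """Call fetch(id) for each id; return (results, failed) instead of stopping on errors."""
    results, failed = [], []
    for item_id in ids:
        try:
            results.append(fetch(item_id))
        except Exception as exc:  # broad on purpose: log the failure and move on
            failed.append((item_id, repr(exc)))
        time.sleep(pause_seconds)  # optional pause between requests
    return results, failed

def fake_fetch(player_id):
    """Stand-in for get_career_stats; fails for one made-up ID."""
    if player_id == "bad":
        raise ValueError("no data")
    return {"player_id": player_id}

results, failed = collect(["1", "bad", "2"], fake_fetch)
print(len(results), failed)
```

With the real function, the call would look like collect(df_roster['person_id'], get_career_stats, pause_seconds=0.5), followed by pd.concat(results). The half-second pause is an arbitrary choice, not a documented rate limit.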

Collect All Players in a Season

Next, we can loop over every active team and collect its roster for a particular season.

rosters = []
season = '20212022'
for team in df_active['name']:
    df_roster = get_team_roster(team, season)
    rosters.append(df_roster)
## https://statsapi.web.nhl.com/api/v1/teams/1
## https://statsapi.web.nhl.com/api/v1/teams/2
## https://statsapi.web.nhl.com/api/v1/teams/3
## https://statsapi.web.nhl.com/api/v1/teams/4
## https://statsapi.web.nhl.com/api/v1/teams/5
## https://statsapi.web.nhl.com/api/v1/teams/6
## https://statsapi.web.nhl.com/api/v1/teams/7
## https://statsapi.web.nhl.com/api/v1/teams/8
## https://statsapi.web.nhl.com/api/v1/teams/9
## https://statsapi.web.nhl.com/api/v1/teams/10
## https://statsapi.web.nhl.com/api/v1/teams/12
## https://statsapi.web.nhl.com/api/v1/teams/13
## https://statsapi.web.nhl.com/api/v1/teams/14
## https://statsapi.web.nhl.com/api/v1/teams/15
## https://statsapi.web.nhl.com/api/v1/teams/16
## https://statsapi.web.nhl.com/api/v1/teams/17
## https://statsapi.web.nhl.com/api/v1/teams/18
## https://statsapi.web.nhl.com/api/v1/teams/19
## https://statsapi.web.nhl.com/api/v1/teams/20
## https://statsapi.web.nhl.com/api/v1/teams/21
## https://statsapi.web.nhl.com/api/v1/teams/22
## https://statsapi.web.nhl.com/api/v1/teams/23
## https://statsapi.web.nhl.com/api/v1/teams/24
## https://statsapi.web.nhl.com/api/v1/teams/25
## https://statsapi.web.nhl.com/api/v1/teams/26
## https://statsapi.web.nhl.com/api/v1/teams/28
## https://statsapi.web.nhl.com/api/v1/teams/29
## https://statsapi.web.nhl.com/api/v1/teams/30
## https://statsapi.web.nhl.com/api/v1/teams/52
## https://statsapi.web.nhl.com/api/v1/teams/53
## https://statsapi.web.nhl.com/api/v1/teams/54
## https://statsapi.web.nhl.com/api/v1/teams/55
df_all_rosters = pd.concat(rosters)
df_all_rosters
##    jerseyNumber person_id  ... position_type position_abbreviation
## 0            45   8473541  ...        Goalie                     G
## 1             7   8476462  ...    Defenseman                     D
## 2            44   8477425  ...       Forward                    LW
## 3            29   8478406  ...        Goalie                     G
## 4            20   8479415  ...       Forward                     C
## ..          ...       ...  ...           ...                   ...
## 22           29   8478407  ...    Defenseman                     D
## 23           55   8478468  ...    Defenseman                     D
## 24           22   8478891  ...       Forward                     C
## 25            8   8479985  ...    Defenseman                     D
## 26           67   8479987  ...       Forward                     C
## 
## [851 rows x 8 columns]
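
The combined roster repeats index values (each team's rows start again at 0) and no longer records which team a row came from. The sketch below shows one way to tag each roster with its team before concatenating, and to reduce the result to unique player IDs. The two small rosters are made up, and whether a player can actually appear on more than one roster in the API response is an assumption here.

```python
import pandas as pd

# Hypothetical rosters with the same string-typed columns as get_team_roster returns.
roster_a = pd.DataFrame({"person_id": ["1", "2"], "jerseyNumber": ["9", "30"]})
roster_b = pd.DataFrame({"person_id": ["2", "3"], "jerseyNumber": ["31", "4"]})

# Add a team column to each roster, then stack them with a fresh 0..n-1 index.
tagged = [df.assign(team=name)
          for name, df in [("Team A", roster_a), ("Team B", roster_b)]]
combined = pd.concat(tagged, ignore_index=True)

# Unique IDs avoid requesting the same player's stats twice.
unique_ids = combined["person_id"].drop_duplicates().tolist()

print(combined)
print(unique_ids)
```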

Finally, we can loop over all of these players to get year-by-year career stats for everyone on a roster in that season.

stats = []
for player_id in df_all_rosters['person_id']:
    print(player_id)
    df = get_career_stats(player_id)
    stats.append(df)

df_all_stats = pd.concat(stats)
df_all_stats
df_all_stats.to_csv("nhl_20212022_career_stats.csv")