{ "cells": [ { "cell_type": "markdown", "id": "807ba781", "metadata": { "papermill": { "duration": 0.034485, "end_time": "2023-06-06T00:09:17.035990", "exception": false, "start_time": "2023-06-06T00:09:17.001505", "status": "completed" }, "tags": [] }, "source": [ "*The data, concept, and initial implementation of this notebook were done in Colab by Ross Wightman, the creator of timm. I (Jeremy Howard) did some refactoring, curating, and expanding of the analysis, and added prose.*" ] }, { "cell_type": "markdown", "id": "cb51ad49", "metadata": { "papermill": { "duration": 0.03021, "end_time": "2023-06-06T00:09:17.097808", "exception": false, "start_time": "2023-06-06T00:09:17.067598", "status": "completed" }, "tags": [] }, "source": [ "## timm\n", "\n", "[PyTorch Image Models](https://timm.fast.ai/) (timm) is a wonderful library by Ross Wightman which provides state-of-the-art pre-trained computer vision models. It's like Hugging Face Transformers, but for computer vision instead of NLP (and it's not restricted to transformer-based models)!\n", "\n", "Ross has been kind enough to help me understand how to best take advantage of this library by identifying the top models. I'm going to share here some of what I've learned from him, plus some additional ideas." ] }, { "cell_type": "markdown", "id": "248460f0", "metadata": { "papermill": { "duration": 0.030863, "end_time": "2023-06-06T00:09:17.159071", "exception": false, "start_time": "2023-06-06T00:09:17.128208", "status": "completed" }, "tags": [] }, "source": [ "## The data\n", "\n", "Ross regularly benchmarks new models as they are added to timm, and puts the results in a CSV in the project's GitHub repo. 
To analyse the data, we'll first clone the repo:" ] }, { "cell_type": "code", "execution_count": 1, "id": "cf208740", "metadata": { "papermill": { "duration": 2.565717, "end_time": "2023-06-06T00:09:19.755674", "exception": false, "start_time": "2023-06-06T00:09:17.189957", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/usr/bin/sh: 1: git: not found\n", "[Errno 2] No such file or directory: 'pytorch-image-models/results'\n", "/workspace/Education\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.10/site-packages/IPython/core/magics/osm.py:393: UserWarning: using bookmarks requires you to install the `pickleshare` library.\n", " bkms = self.shell.db.get('bookmarks', {})\n" ] } ], "source": [ "! git clone --depth 1 https://github.com/rwightman/pytorch-image-models.git\n", "%cd pytorch-image-models/results" ] }, { "cell_type": "markdown", "id": "68fcf101", "metadata": { "papermill": { "duration": 0.037981, "end_time": "2023-06-06T00:09:19.832345", "exception": false, "start_time": "2023-06-06T00:09:19.794364", "status": "completed" }, "tags": [] }, "source": [ "Using Pandas, we can read the two CSV files we need, and merge them together." 
] }, { "cell_type": "code", "execution_count": 2, "id": "df355a97", "metadata": { "execution": { "iopub.execute_input": "2023-06-06T00:09:19.907795Z", "iopub.status.busy": "2023-06-06T00:09:19.907457Z", "iopub.status.idle": "2023-06-06T00:09:19.926381Z", "shell.execute_reply": "2023-06-06T00:09:19.925291Z" }, "papermill": { "duration": 0.058692, "end_time": "2023-06-06T00:09:19.929054", "exception": false, "start_time": "2023-06-06T00:09:19.870362", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "df_results = pd.read_csv('results-imagenet.csv')" ] }, { "cell_type": "code", "execution_count": 3, "id": "4e25867e", "metadata": { "execution": { "iopub.execute_input": "2023-06-06T00:09:20.006128Z", "iopub.status.busy": "2023-06-06T00:09:20.005747Z", "iopub.status.idle": "2023-06-06T00:09:20.022803Z", "shell.execute_reply": "2023-06-06T00:09:20.021657Z" }, "papermill": { "duration": 0.059181, "end_time": "2023-06-06T00:09:20.025310", "exception": false, "start_time": "2023-06-06T00:09:19.966129", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "# Keep the full model name for the chart hover labels\n", "df_results['model_org'] = df_results['model']\n", "# Drop everything after the first '.' so 'model' can be used as the merge key\n", "df_results['model'] = df_results['model'].str.split('.').str[0]" ] }, { "cell_type": "markdown", "id": "b45c1928", "metadata": { "papermill": { "duration": 0.035119, "end_time": "2023-06-06T00:09:20.097374", "exception": false, "start_time": "2023-06-06T00:09:20.062255", "status": "completed" }, "tags": [] }, "source": [ "We'll also add a \"family\" column that will allow us to group architectures into categories with similar characteristics.\n", "\n", "Ross has told me which models he's found the most usable in practice, so I'll limit the charts to just look at these. 
(I also include VGG, not because it's good, but as a comparison to show how far things have come in the last few years.)" ] }, { "cell_type": "code", "execution_count": 4, "id": "baa099eb", "metadata": { "execution": { "iopub.execute_input": "2023-06-06T00:09:20.172024Z", "iopub.status.busy": "2023-06-06T00:09:20.171675Z", "iopub.status.idle": "2023-06-06T00:09:20.181011Z", "shell.execute_reply": "2023-06-06T00:09:20.180119Z" }, "papermill": { "duration": 0.049639, "end_time": "2023-06-06T00:09:20.183317", "exception": false, "start_time": "2023-06-06T00:09:20.133678", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "def get_data(part, col):\n", "    # Join the benchmark timings to the ImageNet accuracy results\n", "    df = pd.read_csv(f'benchmark-{part}-amp-nhwc-pt111-cu113-rtx3090.csv').merge(df_results, on='model')\n", "    # Seconds per sample is the inverse of samples per second\n", "    df['secs'] = 1. / df[col]\n", "    # The family is the leading run of letters in the model name (raw string avoids an invalid-escape warning)\n", "    df['family'] = df.model.str.extract(r'^([a-z]+?(?:v2)?)(?:\\d|_|$)')\n", "    # Drop models whose names end in 'gn'\n", "    df = df[~df.model.str.endswith('gn')]\n", "    # Give 'in22' models and ResNet 'd' variants their own families\n", "    df.loc[df.model.str.contains('in22'),'family'] = df.loc[df.model.str.contains('in22'),'family'] + '_in22'\n", "    df.loc[df.model.str.contains('resnet.*d'),'family'] = df.loc[df.model.str.contains('resnet.*d'),'family'] + 'd'\n", "    # Keep only the families we want to chart\n", "    return df[df.family.str.contains('^re[sg]netd?|beit|convnext|levit|efficient|vit|vgg|swin')]" ] }, { "cell_type": "code", "execution_count": 5, "id": "71215c1e", "metadata": { "execution": { "iopub.execute_input": "2023-06-06T00:09:20.258526Z", "iopub.status.busy": "2023-06-06T00:09:20.257452Z", "iopub.status.idle": "2023-06-06T00:09:20.299124Z", "shell.execute_reply": "2023-06-06T00:09:20.298113Z" }, "papermill": { "duration": 0.082408, "end_time": "2023-06-06T00:09:20.301660", "exception": false, "start_time": "2023-06-06T00:09:20.219252", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "df = get_data('infer', 'infer_samples_per_sec')" ] }, { "cell_type": "markdown", "id": "16c47388", "metadata": { "papermill": { "duration": 0.03543, "end_time": "2023-06-06T00:09:20.372280", "exception": false, 
"start_time": "2023-06-06T00:09:20.336850", "status": "completed" }, "tags": [] }, "source": [ "## Inference results" ] }, { "cell_type": "markdown", "id": "8d097dee", "metadata": { "papermill": { "duration": 0.035859, "end_time": "2023-06-06T00:09:20.444242", "exception": false, "start_time": "2023-06-06T00:09:20.408383", "status": "completed" }, "tags": [] }, "source": [ "Here are the results for inference performance (see the last section for training performance). In this chart:\n", "\n", "- the x axis shows how many seconds it takes to process one image (**note**: it's a log scale)\n", "- the y axis is the accuracy on ImageNet\n", "- the size of each bubble is proportional to the size of images used in testing\n", "- the color shows what \"family\" the architecture is from.\n", "\n", "Hover your mouse over a marker to see details about the model. Double-click in the legend to display just one family. Single-click in the legend to show or hide a family.\n", "\n", "**Note**: on my screen, Kaggle cuts off the family selector and some plotly functionality -- to see the whole thing, collapse the table of contents on the right by clicking the little arrow to the right of \"*Contents*\"." 
] }, { "cell_type": "code", "execution_count": 6, "id": "4c2a97d1", "metadata": { "execution": { "iopub.execute_input": "2023-06-06T00:09:20.518520Z", "iopub.status.busy": "2023-06-06T00:09:20.518198Z", "iopub.status.idle": "2023-06-06T00:09:22.319635Z", "shell.execute_reply": "2023-06-06T00:09:22.318516Z" }, "papermill": { "duration": 1.841847, "end_time": "2023-06-06T00:09:22.322215", "exception": false, "start_time": "2023-06-06T00:09:20.480368", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "import plotly.express as px\n", "w,h = 1000,800\n", "\n", "def show_all(df, title, size):\n", " return px.scatter(df, width=w, height=h, size=df[size]**2, title=title,\n", " x='secs', y='top1', log_x=True, color='family', hover_name='model_org', hover_data=[size])" ] }, { "cell_type": "code", "execution_count": 7, "id": "26c23a28", "metadata": { "execution": { "iopub.execute_input": "2023-06-06T00:09:22.397315Z", "iopub.status.busy": "2023-06-06T00:09:22.397009Z", "iopub.status.idle": "2023-06-06T00:09:23.870864Z", "shell.execute_reply": "2023-06-06T00:09:23.869937Z" }, "papermill": { "duration": 1.515062, "end_time": "2023-06-06T00:09:23.873393", "exception": false, "start_time": "2023-06-06T00:09:22.358331", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "