Transfer Learning with Hugging Face

Last Updated : 14 Apr, 2026

Transfer learning is a technique where pre-trained models are adapted for specific tasks using smaller, task-specific datasets. It helps leverage knowledge learned from large datasets to improve efficiency and performance.

  • Reduces training time and computational cost
  • Improves performance, especially with limited data
  • Enables reuse of learned patterns across tasks
  • Widely used in NLP with tools and models from Hugging Face

Hugging Face Features

Hugging Face provides tools and resources that make transfer learning easier and more efficient.

  • Pre-trained Models: Hugging Face offers a wide range of pre-trained models trained on large datasets, ready for fine-tuning across tasks like text classification and question answering, saving time and effort.
  • Datasets Library: Provides a collection of curated datasets for various NLP tasks, making it easier to find suitable data and improve model performance.
  • Simple API: Includes a user-friendly API that simplifies working with transformer models and reduces implementation complexity.
  • Model Hub: A community-driven platform to share and access pre-trained models, making it easier to find models for specific tasks (a short example combining the API and the Hub follows this list).
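
For example, a model shared on the Model Hub can be loaded and used for inference in a couple of lines through the pipeline API. This is only a minimal sketch: the checkpoint named below is one publicly available sentiment model and can be swapped for any other text-classification model on the Hub.

Python
from transformers import pipeline

# Load a pre-trained text-classification model from the Model Hub and run inference
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english"
)
print(classifier("Transfer learning saves a lot of training time."))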

Implementation

Here we will fine-tune a DistilBERT model on the AG News dataset to classify news articles into one of four categories: World, Sports, Business or Sci/Tech. Let's see the various steps involved in this implementation.

Step 1: Setup and Installing Necessary Libraries

Before we start, we need to ensure that the required libraries are installed.

!pip install transformers datasets evaluate accelerate -U

Step 2: Log In to Hugging Face Hub

Before uploading the model to Hugging Face, we must authenticate our session with a Hugging Face account. This step lets us store models on the Hugging Face Model Hub, making them easy to share and deploy.

Python
from huggingface_hub import notebook_login
notebook_login()

Step 3: Importing Required Libraries

Here we will be importing all the necessary modules for loading the dataset, training the model and evaluating performance.

Python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
    pipeline,
)
import numpy as np
import evaluate

Step 4: Loading the Dataset

Now we'll load the AG News dataset which is available in Hugging Face’s datasets library. This dataset contains news articles that will be used for multi-class classification (World, Sports, Business, Sci/Tech).

Python
dataset = load_dataset("ag_news")
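
Before tokenizing, it can help to take a quick look at the splits and a sample record. This is an optional check and is not required for the rest of the tutorial.

Python
# Optional: inspect the dataset splits and one example record
print(dataset)                                    # train/test splits and their sizes
print(dataset["train"][0])                        # a single article with its integer label
print(dataset["train"].features["label"].names)   # ['World', 'Sports', 'Business', 'Sci/Tech']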

Step 5: Loading Tokenizer and Tokenizing the Dataset

We load the pre-trained DistilBERT tokenizer which converts text into numerical tokens that the model can process. Next we apply the tokenizer to the dataset, ensuring that each article is tokenized into input IDs.

  • token_data = dataset.map(tokenize_function, batched=True): Applies the tokenize_function to the entire dataset in batches to tokenize the text.
  • batch_collator = DataCollatorWithPadding(tokenizer=tokenizer): Initializes a data collator that dynamically pads the sequences in each batch to a consistent length.
  • The remaining lines remove the raw text column, rename label to labels (the column name the Trainer expects) and set the dataset format to PyTorch tensors. The two small_* subsets are optional slices that can be used for quicker experimentation.

Python
pretrained_model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)

def tokenize_function(row):
    return tokenizer(row["text"], truncation=True)

token_data = dataset.map(tokenize_function, batched=True)
token_data = token_data.remove_columns(["text"])
token_data = token_data.rename_column("label", "labels")
token_data.set_format("torch")

small_train_dataset = token_data["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = token_data["test"].shuffle(seed=42).select(range(1000))
batch_collator = DataCollatorWithPadding(tokenizer=tokenizer)

Output:

[Output screenshot: Mapping function]

Step 6: Preparing the Model for Fine-Tuning

Next we load the pre-trained model for sequence classification. This model will be fine-tuned to classify AG News articles into one of the four categories. We specify the number of classes based on the dataset.

  • model = AutoModelForSequenceClassification.from_pretrained(pretrained_model_name, num_labels=labels): Loads the pre-trained DistilBERT model and specifies the number of output labels for multi-class classification.

Python
labels = dataset["train"].features["label"].num_classes
model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name,
    num_labels=labels
)
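
Optionally, for a lighter form of transfer learning, the pre-trained encoder can be frozen so that only the new classification head is updated during training. This is just a sketch, assuming the checkpoint is a DistilBERT model whose encoder is exposed as model.distilbert; the rest of this tutorial fine-tunes all layers, which usually gives better accuracy.

Python
# Optional: freeze the pre-trained DistilBERT encoder so only the
# classification head is trained (faster, but typically less accurate)
for param in model.distilbert.parameters():
    param.requires_grad = False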

Step 7: Defining the Evaluation Metric

Here we set up the evaluation metric (accuracy) to assess the performance of the model during training and evaluation. The function compute_metrics will compare the model’s predictions to the true labels and calculate accuracy.

Python
metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
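
As a quick sanity check, compute_metrics can be called directly with dummy values shaped like the Trainer's predictions (purely illustrative logits and labels).

Python
# Illustrative only: two examples with four class logits each
dummy_logits = np.array([[0.1, 0.2, 3.0, 0.0],
                         [2.5, 0.1, 0.0, 0.3]])
dummy_labels = np.array([2, 0])
print(compute_metrics((dummy_logits, dummy_labels)))  # {'accuracy': 1.0}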

Step 8: Defining Training Arguments

Now we define the training configuration. These parameters control how the model is trained, such as the learning rate, batch size, number of epochs and the strategies for evaluating and saving the model. We also enable mixed precision (fp16=True) to speed up training on GPUs; on a CPU-only machine this flag should be set to False.

Python
model_name = "distilbert-ag-news-finetuned"
training_args = TrainingArguments(
    output_dir=model_name,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    fp16=True,

    eval_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",

    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=True,
)

Step 9: Training the Model

The Trainer class is initialized with the model, training arguments, datasets, tokenizer and data collator. It handles the entire training loop, including logging, evaluation and optimization. Passing the tokenizer ensures it is saved and pushed to the Hub alongside the model, which the prediction pipeline in Step 11 relies on.

Python
model_trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=token_data["train"],
    eval_dataset=token_data["test"],
    tokenizer=tokenizer,
    data_collator=batch_collator,
    compute_metrics=compute_metrics,
)
model_trainer.train()

Output:

[Output screenshot: Training the Model]

Step 10: Pushing the Model to Hugging Face Hub

After training, we can push the fine-tuned model to the Hugging Face Model Hub. This makes it publicly available for others to use or continue fine-tuning.

Python
model_trainer.push_to_hub()

Output:

[Output screenshot: Pushing the Model]

Step 11: Making Predictions Using the Fine-Tuned Model

Once the model is fine-tuned and uploaded, we can use it to make predictions on new, unseen text. Let's see how we can classify a new news article.

Python
text = "The home team won the championship after a thrilling match."
# Replace "your-username" with your Hugging Face username (the repo created by push_to_hub in Step 10)
classifier = pipeline("text-classification", model=f"your-username/{model_name}")

prediction = classifier(text)
print(prediction)
print(dataset["train"].features["label"].names)

Output:

[Output screenshot: Making Predictions]
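
Because no id2label mapping was set when the model was created, the pipeline returns generic labels such as LABEL_0 to LABEL_3. A small snippet like the one below (an illustrative sketch assuming that default naming) maps them back to the AG News category names.

Python
# Map the generic LABEL_<i> output back to the AG News category names
label_names = dataset["train"].features["label"].names        # ['World', 'Sports', 'Business', 'Sci/Tech']
predicted_index = int(prediction[0]["label"].split("_")[-1])  # e.g. 'LABEL_1' -> 1
print(label_names[predicted_index])                           # e.g. 'Sports' for the example text above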


Applications

  • Used for sentiment analysis to classify text as positive, negative, or neutral
  • Applied in Named Entity Recognition (NER) to identify entities like names, dates and locations (see the short sketch after this list)
  • Enables text generation for writing, coding, and content creation
  • Supports question answering by extracting relevant information from text
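
As an illustration of the NER use case above, a pre-trained Hub model can be reused through the same pipeline API. This is a minimal sketch; no model is specified, so the pipeline falls back to its default NER checkpoint, and any token-classification model from the Hub could be passed explicitly instead.

Python
from transformers import pipeline

# Named Entity Recognition with a pre-trained model from the Hub
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Hugging Face was founded in New York by Clement Delangue."))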

Advantages

  • Reduces the need for large labeled datasets
  • Saves time and computational resources compared to training from scratch
  • Improves performance, especially on smaller datasets

Limitations

  • Performance may drop if the source and target domains are very different
  • Fine-tuning still requires computational resources
  • Risk of overfitting when working with small datasets