Deepfake Detection Model

This repository contains a deepfake detection model built using a combination of a pre-trained Xception network and an LSTM layer. The model is designed to classify videos as either "Real" or "Fake" by analyzing sequences of facial frames extracted from the video.

Model Architecture

The model architecture consists of the following components (a minimal code sketch follows the list):

  1. Input Layer: Takes a sequence of TIME_STEPS frames, each resized to 299x299 pixels with 3 color channels. The input shape is (batch_size, TIME_STEPS, HEIGHT, WIDTH, 3).

  2. TimeDistributed Xception: A pre-trained Xception network (trained on ImageNet) is applied to each frame independently via a TimeDistributed wrapper. With include_top set to False and pooling set to 'avg', the Xception network acts as a per-frame feature extractor, producing a sequence of feature vectors, one per frame.

  3. LSTM Layer: The sequence of feature vectors from the TimeDistributed Xception layer is fed into an LSTM (Long Short-Term Memory) layer with 256 hidden units. The LSTM layer is capable of learning temporal dependencies between frames, which is crucial for deepfake detection.

  4. Dropout Layer: A Dropout layer with a rate of 0.5 is applied after the LSTM layer to prevent overfitting.

  5. Output Layer: A Dense layer with 2 units and a softmax activation function outputs the probabilities for the two classes: "Real" and "Fake".
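
A minimal Keras sketch of this architecture, assuming standard tf.keras layers; the repository's actual build_model (defined in the notebook) may differ in details such as the optimizer or layer names:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import Xception

TIME_STEPS, HEIGHT, WIDTH = 30, 299, 299

def build_model(lstm_hidden_size=256, dropout_rate=0.5, num_classes=2):
    # Frame-level feature extractor: Xception without its classification head;
    # global average pooling turns each frame into a single feature vector.
    base = Xception(weights='imagenet', include_top=False, pooling='avg',
                    input_shape=(HEIGHT, WIDTH, 3))
    inputs = layers.Input(shape=(TIME_STEPS, HEIGHT, WIDTH, 3))
    x = layers.TimeDistributed(base)(inputs)      # (batch, TIME_STEPS, features)
    x = layers.LSTM(lstm_hidden_size)(x)          # temporal modelling across frames
    x = layers.Dropout(dropout_rate)(x)           # regularization
    outputs = layers.Dense(num_classes, activation='softmax')(x)  # "Real" vs "Fake"
    model = models.Model(inputs, outputs)
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model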

How to Use

1. Setup

Clone the repository and install the required libraries:

pip install tensorflow opencv-python numpy mtcnn Pillow

2. Model Loading

The model weights are loaded from COMBINED_best_Phase1.keras. Ensure this file is accessible at the specified model_path.

model_path = '/content/drive/MyDrive/Dataset DDM/FINAL models/COMBINED_best_Phase1.keras'
model = build_model() # Architecture defined in the `build_model` function
model.load_weights(model_path)
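
If COMBINED_best_Phase1.keras was saved with model.save() (i.e., it contains the full architecture rather than weights only), the model can alternatively be restored in one step without calling build_model; this is an assumption about how the file was produced, not the repository's documented flow:

import tensorflow as tf

# Restores architecture and weights together from the native .keras file.
model = tf.keras.models.load_model(model_path)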

3. Face Extraction and Preprocessing

The extract_faces_from_video function processes a given video file:

  • It uses MTCNN (Multi-task Cascaded Convolutional Networks) for robust face detection in each sampled frame.
  • It samples TIME_STEPS frames from the video.
  • For each sampled frame, it detects the primary face, extracts it, and resizes it to 299x299 pixels.
  • The extracted face images are then preprocessed using preprocess_input from tensorflow.keras.applications.xception, which scales pixel values to the range expected by the Xception model.
  • If no face is detected in a frame, a black image of the same dimensions is used as a placeholder.
  • The function ensures that exactly TIME_STEPS frames are returned, padding with the last available frame or black images if necessary.

from mtcnn import MTCNN
import cv2
import numpy as np
from PIL import Image
from tensorflow.keras.applications.xception import preprocess_input

def extract_faces_from_video(video_path, num_frames=30):
    # ... (function implementation as provided in prediction.ipynb)
    pass

video_path = '/content/drive/MyDrive/Dataset DDM/FF++/manipulated_sequences/FaceShifter/raw/videos/724_725.mp4'
video_array = extract_faces_from_video(video_path, num_frames=TIME_STEPS)
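
For reference, the sketch below shows one way to implement extract_faces_from_video consistent with the steps above; the actual implementation lives in prediction.ipynb, and details such as the even frame-sampling strategy and the returned batch dimension are assumptions:

from mtcnn import MTCNN
import cv2
import numpy as np
from tensorflow.keras.applications.xception import preprocess_input

def extract_faces_from_video(video_path, num_frames=30, size=(299, 299)):
    detector = MTCNN()
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Sample num_frames frame indices spread evenly across the video.
    indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
    faces = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        detections = detector.detect_faces(rgb)
        if detections:
            # Keep the highest-confidence detection as the primary face.
            x, y, w, h = max(detections, key=lambda d: d['confidence'])['box']
            x, y = max(x, 0), max(y, 0)
            face = cv2.resize(rgb[y:y + h, x:x + w], size)
        else:
            # No face detected: use a black placeholder of the same size.
            face = np.zeros((size[1], size[0], 3), dtype=np.uint8)
        faces.append(face)
    cap.release()
    # Pad so that exactly num_frames frames are returned.
    while len(faces) < num_frames:
        faces.append(faces[-1] if faces else np.zeros((size[1], size[0], 3), dtype=np.uint8))
    # Scale pixels to the range expected by Xception and add a batch dimension.
    batch = preprocess_input(np.array(faces, dtype=np.float32))
    return np.expand_dims(batch, axis=0)  # shape: (1, num_frames, H, W, 3)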

4. Prediction

Once the video_array (preprocessed frames) is ready, you can make a prediction using the loaded model:

predictions = model.predict(video_array)
predicted_class = np.argmax(predictions, axis=1)[0]
probabilities = predictions[0]

class_names = ['Real', 'Fake']
print(f"Predicted Class: {class_names[predicted_class]}")
print(f"Class Probabilities: Real: {probabilities[0]:.4f}, Fake: {probabilities[1]:.4f}")

Parameters

  • TIME_STEPS: Number of frames to extract from each video (default: 30).
  • HEIGHT, WIDTH: Dimensions to which each extracted face image is resized (default: 299, 299).
  • lstm_hidden_size: Number of hidden units in the LSTM layer (default: 256).
  • dropout_rate: Dropout rate applied after the LSTM layer (default: 0.5).
  • num_classes: Number of output classes (default: 2 for "Real" and "Fake").
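
These defaults can be collected as module-level constants; the names below follow this README, and the notebook's actual variable names may differ:

TIME_STEPS = 30           # frames sampled from each video
HEIGHT, WIDTH = 299, 299  # size of each extracted face image
lstm_hidden_size = 256    # hidden units in the LSTM layer
dropout_rate = 0.5        # dropout applied after the LSTM
num_classes = 2           # "Real" and "Fake"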

Development Environment

The code is written in Python and uses tensorflow (Keras API), opencv-python, numpy, mtcnn, and Pillow, so it must be run in an environment with these libraries installed. The file paths suggest it was developed with Google Drive mounted, likely in a Google Colab environment.