ChauHPham committed
Commit 84d4bbc · verified · 1 Parent(s): 072b539

Upload folder using huggingface_hub
AITextDetector.ipynb ADDED
The diff for this file is too large to render.
 
AITextDetector/.gradio/certificate.pem ADDED
@@ -0,0 +1,31 @@
+ -----BEGIN CERTIFICATE-----
+ MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
+ TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
+ cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
+ WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
+ ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
+ MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
+ h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
+ 0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
+ A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
+ T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
+ B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
+ B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
+ KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
+ OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
+ jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
+ qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
+ rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
+ HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
+ hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
+ ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
+ 3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
+ NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
+ ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
+ TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
+ jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
+ oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
+ 4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
+ mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
+ emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
+ -----END CERTIFICATE-----
AITextDetector/AITextDetector/COLAB_DEPLOY.md ADDED
@@ -0,0 +1,131 @@
+ # 🚀 Deploy to Hugging Face Spaces from Google Colab
+
+ Step-by-step guide to deploy your AI Text Detector app permanently to Hugging Face Spaces, all from Google Colab!
+
+ ## Prerequisites
+
+ 1. **Hugging Face Account**: Create one at [huggingface.co/join](https://huggingface.co/join)
+ 2. **Access Token**: Get your token from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
+    - Click "New token"
+    - Name it (e.g., "colab-deploy")
+    - Select "Write" permissions
+    - Copy the token (you'll need it!)
+
+ ## Step-by-Step Deployment
+
+ ### Step 1: Open Google Colab
+
+ Go to [colab.research.google.com](https://colab.research.google.com/) and create a new notebook.
+
+ ### Step 2: Install Dependencies
+
+ ```python
+ !pip install -q gradio huggingface_hub transformers torch pandas
+ ```
+
+ ### Step 3: Clone Your Repository
+
+ ```python
+ !git clone https://github.com/ChauHPham/AITextDetector.git
+ %cd AITextDetector
+ ```
+
+ ### Step 4: Login to Hugging Face
+
+ ```python
+ from huggingface_hub import login
+
+ # Paste your token when prompted
+ login()
+ ```
+
+ **When prompted**, paste your Hugging Face token and press Enter.
+
+ ### Step 5: Deploy!
+
+ ```python
+ !gradio deploy
+ ```
+
+ **Follow the interactive prompts:**
+
+ 1. **Enter your Hugging Face username** (e.g., `yourusername`)
+ 2. **Enter a Space name** (e.g., `ai-text-detector`)
+    - This will create: `https://huggingface.co/spaces/yourusername/ai-text-detector`
+ 3. **Wait for deployment** (~5-10 minutes)
+    - Gradio will upload your files
+    - Hugging Face will build and deploy your app
+
+ ### Step 6: Access Your App!
+
+ Once deployment completes, you'll see:
+ ```
+ ✅ Your app is live at: https://huggingface.co/spaces/yourusername/ai-text-detector
+ ```
+
+ **Your app is now permanently hosted for free!** 🎉
+
+ ---
+
+ ## Complete Colab Notebook Code
+
+ Copy-paste this entire block into a Colab cell:
+
+ ```python
+ # Install dependencies
+ !pip install -q gradio huggingface_hub transformers torch pandas
+
+ # Clone repository
+ !git clone https://github.com/ChauHPham/AITextDetector.git
+ %cd AITextDetector
+
+ # Login to Hugging Face
+ from huggingface_hub import login
+ login()  # Paste your token here
+
+ # Deploy!
+ !gradio deploy
+ ```
+
+ ---
+
+ ## Troubleshooting
+
+ ### "Token not found" error
+ - Make sure you copied the full token from Hugging Face
+ - Tokens start with `hf_...`
+
+ ### "Space already exists" error
+ - Choose a different Space name
+ - Or delete the existing Space from [huggingface.co/spaces](https://huggingface.co/spaces)
+
+ ### Deployment takes too long
+ - Normal deployment takes 5-10 minutes
+ - Check the build logs in Hugging Face Spaces dashboard
+
+ ### Want to update your app?
+ - Just run `!gradio deploy` again from Colab
+ - It will update the existing Space
+
+ ---
+
+ ## Benefits of Hugging Face Spaces
+
+ ✅ **Free permanent hosting**
+ ✅ **No expiration** (unlike Colab public links)
+ ✅ **Shareable URL** that works forever
+ ✅ **Automatic updates** when you push code
+ ✅ **GPU support** (free tier available)
+
+ ---
+
+ ## Next Steps
+
+ After deployment:
+ 1. Share your Space URL with others
+ 2. Customize your Space's README.md
+ 3. Add a Space card to your GitHub README
+ 4. Update your app anytime by running `gradio deploy` again
+
+ Enjoy your permanently hosted AI Text Detector! 🚀
+
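One note on Step 4: `login()` blocks waiting for an interactive paste, which is awkward in unattended Colab runs. `huggingface_hub.login` also accepts the token as an argument; a minimal sketch, assuming the token has been stored beforehand in an environment variable named `HF_TOKEN` (the variable name is an assumption, e.g. set via Colab's Secrets panel):

```python
import os

from huggingface_hub import login

# Non-interactive login: assumes HF_TOKEN was exported beforehand
# (e.g., via Colab's Secrets panel or an earlier cell).
login(token=os.environ["HF_TOKEN"])
```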
AITextDetector/AITextDetector/DEPLOY.md ADDED
@@ -0,0 +1,153 @@
+ # 🚀 Deployment Guide
+
+ ## Google Colab (Recommended for Mac M2)
+
+ **Perfect for Mac M2 users** - avoids PyTorch MPS mutex lock issues!
+
+ ### Quick Start
+
+ 1. Open [Google Colab](https://colab.research.google.com/)
+ 2. Create a new notebook
+ 3. Run:
+
+ ```python
+ !pip install -q transformers torch pandas gradio kagglehub
+ !git clone https://github.com/ChauHPham/AITextDetector.git
+ %cd AITextDetector
+ !git checkout main
+ !python gradio_app.py
+ ```
+
+ 4. **Get your public link**: After running, you'll see:
+    ```
+    * Running on public URL: https://xxxxx.gradio.live
+    ```
+    This link is shareable and works as long as the Colab notebook is running!
+
+ ### Keep It Running
+
+ - Enable "Keep runtime alive" in Colab's runtime settings
+ - The public link expires after 1 week of inactivity
+ - For permanent hosting, use Hugging Face Spaces (see below)
+
+ ---
+
+ ## Hugging Face Spaces (Permanent Hosting)
+
+ Deploy your app permanently to Hugging Face Spaces for free!
+
+ ### Option 1: Deploy from Google Colab
+
+ **Perfect for Mac M2 users** - deploy directly from Colab!
+
+ ```python
+ # 1. Install dependencies
+ !pip install -q gradio huggingface_hub
+
+ # 2. Clone your repo (if not already done)
+ !git clone https://github.com/ChauHPham/AITextDetector.git
+ %cd AITextDetector
+
+ # 3. Login to Hugging Face (you'll need a token)
+ # Get your token from: https://huggingface.co/settings/tokens
+ from huggingface_hub import login
+ login()  # Paste your token when prompted
+
+ # 4. Deploy!
+ !gradio deploy
+ ```
+
+ **Follow the prompts:**
+ 1. Enter your Hugging Face username
+ 2. Choose/create a Space name (e.g., `ai-text-detector`)
+ 3. Wait for deployment (~5-10 minutes)
+
+ Your app will be live at: `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
+
+ ### Option 2: Using Gradio CLI (Local)
+
+ ```bash
+ # Install gradio if not already installed
+ pip install gradio
+
+ # Deploy from your project directory
+ gradio deploy
+ ```
+
+ Follow the prompts to:
+ 1. Login to Hugging Face (or create account)
+ 2. Choose/create a Space
+ 3. Deploy!
+
+ ### Option 3: Manual Deployment
+
+ 1. Create a new Space on [Hugging Face Spaces](https://huggingface.co/spaces)
+ 2. Choose "Gradio" as the SDK
+ 3. Upload your files:
+    - `gradio_app.py`
+    - `ai_text_detector/` (entire package)
+    - `requirements.txt`
+    - `README.md`
+ 4. Add a `README.md` in the Space with:
+    ```yaml
+    ---
+    title: AI Text Detector
+    emoji: 🔍
+    colorFrom: blue
+    colorTo: purple
+    sdk: gradio
+    app_file: gradio_app.py
+    pinned: false
+    ---
+    ```
+ 5. The Space will automatically build and deploy!
+
+ ---
+
+ ## Local Deployment
+
+ ### Requirements
+
+ - Python 3.8+
+ - See `requirements.txt`
+
+ ### Run Locally
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+ pip install -e .
+
+ # Run Gradio app
+ python gradio_app.py
+ ```
+
+ **Note for Mac M2 users**: Local training may fail due to PyTorch MPS bugs. Use Google Colab for training instead.
+
+ ---
+
+ ## Docker Deployment
+
+ ```bash
+ # Build
+ docker build -t ai-text-detector .
+
+ # Run
+ docker run -p 7860:7860 ai-text-detector
+ ```
+
+ ---
+
+ ## Troubleshooting
+
+ ### Mac M2 Issues
+
+ If you encounter `mutex.cc lock blocking` errors on Mac M2:
+ - ✅ **Use Google Colab** (recommended)
+ - ✅ Use Docker with Linux base image
+ - ❌ Local training may not work due to PyTorch MPS bugs
+
+ ### Model Loading Issues
+
+ The app automatically uses the Desklib pre-trained model if no trained model is found. The model downloads automatically on first use (~1.7GB).
+
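Besides `gradio deploy`, an existing Space can also be updated programmatically with `huggingface_hub`'s `HfApi.upload_folder`. A minimal sketch, assuming the Space `YOUR_USERNAME/YOUR_SPACE_NAME` from above already exists and you are logged in:

```python
from huggingface_hub import HfApi

# Push the current project directory to an existing Gradio Space.
api = HfApi()
api.upload_folder(
    folder_path=".",                          # project root containing gradio_app.py
    repo_id="YOUR_USERNAME/YOUR_SPACE_NAME",  # placeholder from the guide above
    repo_type="space",
)
```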
AITextDetector/AITextDetector/README.md CHANGED
@@ -1,8 +1,31 @@
- # AI Text Detector (CLI)
+ ---
+ title: AITextDetector
+ app_file: gradio_app.py
+ sdk: gradio
+ sdk_version: 5.49.1
+ ---
+ # AI Text Detector

- A learning project for detecting AI-generated vs. human-written text with a modular Python package, YAML configs, GPU auto-detection, and a CLI.
+ A learning project for detecting AI-generated vs. human-written text with a modular Python package, YAML configs, GPU auto-detection, CLI, and a **Gradio web interface**.

- ## Quickstart
+ ## 🌐 Web Interface (Gradio)
+
+ **Try it now on Google Colab** (works perfectly on Mac M2!):
+
+ ```python
+ !pip install -q transformers torch pandas gradio kagglehub
+ !git clone https://github.com/ChauHPham/AITextDetector.git
+ %cd AITextDetector
+ !python gradio_app.py
+ ```
+
+ Get a **public shareable link** instantly! See [DEPLOY.md](DEPLOY.md) for deployment options.
+
+ ### 🍎 Mac M2 Users
+
+ **Google Colab is recommended** - local training may fail due to PyTorch MPS mutex lock issues. The Gradio app works great in Colab with free GPU!
+
+ ## Quickstart (CLI)

  ```bash
  # 1) Create & activate a virtualenv (recommended)
@@ -46,3 +69,12 @@ See `configs/default.yaml`. Key fields:
  * Labels standardized to `0=human`, `1=ai`.
  * Mixed precision (fp16) auto-enables on CUDA.
  * Evaluate with accuracy, macro-F1, and confusion matrix.
+ * **Mac M2 users**: Use Google Colab for training (see above) to avoid PyTorch MPS bugs.
+
+ ## Deployment
+
+ See [DEPLOY.md](DEPLOY.md) for:
+ - Google Colab setup (recommended for Mac M2)
+ - Hugging Face Spaces deployment (`gradio deploy`)
+ - Docker deployment
+ - Troubleshooting guide
AITextDetector/AITextDetector/deploy.sh ADDED
@@ -0,0 +1,19 @@
+ #!/bin/bash
+ # Quick deployment script for Hugging Face Spaces
+
+ echo "🚀 Deploying AI Text Detector to Hugging Face Spaces..."
+ echo ""
+ echo "Make sure you have:"
+ echo " 1. Hugging Face account (https://huggingface.co/join)"
+ echo " 2. Gradio installed (pip install gradio)"
+ echo " 3. Hugging Face CLI installed (pip install huggingface_hub)"
+ echo ""
+ read -p "Press Enter to continue or Ctrl+C to cancel..."
+
+ # Deploy using Gradio CLI
+ gradio deploy
+
+ echo ""
+ echo "✅ Deployment complete!"
+ echo "Your app will be available at: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME"
+
AITextDetector/README.md CHANGED
@@ -1,31 +1,8 @@
- ---
- title: AITextDetector
- app_file: gradio_app.py
- sdk: gradio
- sdk_version: 5.49.1
- ---
- # AI Text Detector
+ # AI Text Detector (CLI)

- A learning project for detecting AI-generated vs. human-written text with a modular Python package, YAML configs, GPU auto-detection, CLI, and a **Gradio web interface**.
+ A learning project for detecting AI-generated vs. human-written text with a modular Python package, YAML configs, GPU auto-detection, and a CLI.

- ## 🌐 Web Interface (Gradio)
-
- **Try it now on Google Colab** (works perfectly on Mac M2!):
-
- ```python
- !pip install -q transformers torch pandas gradio kagglehub
- !git clone https://github.com/ChauHPham/AITextDetector.git
- %cd AITextDetector
- !python gradio_app.py
- ```
-
- Get a **public shareable link** instantly! See [DEPLOY.md](DEPLOY.md) for deployment options.
-
- ### 🍎 Mac M2 Users
-
- **Google Colab is recommended** - local training may fail due to PyTorch MPS mutex lock issues. The Gradio app works great in Colab with free GPU!
-
- ## Quickstart (CLI)
+ ## Quickstart

  ```bash
  # 1) Create & activate a virtualenv (recommended)
@@ -69,12 +46,3 @@ See `configs/default.yaml`. Key fields:
  * Labels standardized to `0=human`, `1=ai`.
  * Mixed precision (fp16) auto-enables on CUDA.
  * Evaluate with accuracy, macro-F1, and confusion matrix.
- * **Mac M2 users**: Use Google Colab for training (see above) to avoid PyTorch MPS bugs.
-
- ## Deployment
-
- See [DEPLOY.md](DEPLOY.md) for:
- - Google Colab setup (recommended for Mac M2)
- - Hugging Face Spaces deployment (`gradio deploy`)
- - Docker deployment
- - Troubleshooting guide
README.md CHANGED
@@ -4,11 +4,28 @@ app_file: gradio_app.py
  sdk: gradio
  sdk_version: 5.49.1
  ---
- # AI Text Detector (CLI)
+ # AI Text Detector

- A learning project for detecting AI-generated vs. human-written text with a modular Python package, YAML configs, GPU auto-detection, and a CLI.
+ A learning project for detecting AI-generated vs. human-written text with a modular Python package, YAML configs, GPU auto-detection, CLI, and a **Gradio web interface**.

- ## Quickstart
+ ## 🌐 Web Interface (Gradio)
+
+ **Try it now on Google Colab** (works perfectly on Mac M2!):
+
+ ```python
+ !pip install -q transformers torch pandas gradio kagglehub
+ !git clone https://github.com/ChauHPham/AITextDetector.git
+ %cd AITextDetector
+ !python gradio_app.py
+ ```
+
+ Get a **public shareable link** instantly! See [DEPLOY.md](DEPLOY.md) for deployment options.
+
+ ### 🍎 Mac M2 Users
+
+ **Google Colab is recommended** - local training may fail due to PyTorch MPS mutex lock issues. The Gradio app works great in Colab with free GPU!
+
+ ## Quickstart (CLI)

  ```bash
  # 1) Create & activate a virtualenv (recommended)
@@ -52,3 +69,12 @@ See `configs/default.yaml`. Key fields:
  * Labels standardized to `0=human`, `1=ai`.
  * Mixed precision (fp16) auto-enables on CUDA.
  * Evaluate with accuracy, macro-F1, and confusion matrix.
+ * **Mac M2 users**: Use Google Colab for training (see above) to avoid PyTorch MPS bugs.
+
+ ## Deployment
+
+ See [DEPLOY.md](DEPLOY.md) for:
+ - Google Colab setup (recommended for Mac M2)
+ - Hugging Face Spaces deployment (`gradio deploy`)
+ - Docker deployment
+ - Troubleshooting guide
ai_text_detector/models.py CHANGED
@@ -1,16 +1,170 @@
  import os
+ import sys

  # Disable tokenizer parallelism and MPS on macOS
  if os.getenv("TOKENIZERS_PARALLELISM") is None:
      os.environ["TOKENIZERS_PARALLELISM"] = "false"

- from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ import torch
+ import torch.nn as nn
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig, AutoModel, PreTrainedModel
+
+ class DesklibAIDetectionModel(PreTrainedModel):
+     """Desklib AI Detection Model - Pre-trained model for AI text detection"""
+     config_class = AutoConfig
+
+     def __init__(self, config):
+         super().__init__(config)
+         # Initialize the base transformer model
+         self.model = AutoModel.from_config(config)
+         # Define a classifier head
+         self.classifier = nn.Linear(config.hidden_size, 1)
+         # Initialize weights
+         self.init_weights()
+
+     def forward(self, input_ids, attention_mask=None, labels=None):
+         # Forward pass through the transformer
+         outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
+         last_hidden_state = outputs[0]
+
+         # Mean pooling
+         input_mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
+         sum_embeddings = torch.sum(last_hidden_state * input_mask_expanded, dim=1)
+         sum_mask = torch.clamp(input_mask_expanded.sum(dim=1), min=1e-9)
+         pooled_output = sum_embeddings / sum_mask
+
+         # Classifier
+         logits = self.classifier(pooled_output)
+
+         loss = None
+         if labels is not None:
+             loss_fct = nn.BCEWithLogitsLoss()
+             loss = loss_fct(logits.view(-1), labels.float())
+
+         output = {"logits": logits}
+         if loss is not None:
+             output["loss"] = loss
+         return output

  class DetectorModel:
-     def __init__(self, model_name="roberta-base", num_labels=2):
+     def __init__(self, model_name="desklib/ai-text-detector-v1.01", use_desklib=True):
+         """
+         Initialize detector model.
+
+         Args:
+             model_name: Model name or path. Defaults to Desklib pre-trained model.
+             use_desklib: If True, use Desklib model architecture. If False, use standard classification.
+         """
          self.model_name = model_name
-         self.model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
-         self.tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
+         self.use_desklib = use_desklib
+
+         if use_desklib and "desklib" in model_name:
+             # Try to load Desklib model, but fallback if MPS issues occur
+             if sys.platform == "darwin":
+                 # On macOS: try multiple loading strategies
+                 try:
+                     # Strategy 1: Load with low_cpu_mem_usage and explicit CPU
+                     print("Attempting to load Desklib model...")
+                     self.tokenizer = AutoTokenizer.from_pretrained(model_name)
+                     config = AutoConfig.from_pretrained(model_name)
+
+                     # Try loading with safetensors if available
+                     try:
+                         from transformers import AutoModel
+                         # Load base model first
+                         base_model = AutoModel.from_pretrained(
+                             model_name,
+                             torch_dtype=torch.float32,
+                             low_cpu_mem_usage=True,
+                             device_map="cpu"
+                         )
+                         # Create Desklib model wrapper
+                         self.model = DesklibAIDetectionModel(config)
+                         self.model.model = base_model
+                         self.model = self.model.to("cpu")
+                         # Load classifier weights
+                         from transformers.utils import cached_file
+                         try:
+                             classifier_path = cached_file(model_name, "pytorch_model.bin")
+                             state_dict = torch.load(classifier_path, map_location="cpu")
+                             # Only load classifier weights
+                             classifier_dict = {k: v for k, v in state_dict.items() if "classifier" in k}
+                             if classifier_dict:
+                                 self.model.load_state_dict(classifier_dict, strict=False)
+                         except:
+                             pass  # Use initialized classifier
+                         self.model.eval()
+                         print("✅ Desklib model loaded successfully!")
+                     except Exception as e:
+                         print(f"⚠️ Desklib model loading failed: {e}")
+                         print("Falling back to DistilBERT model...")
+                         raise
+                 except:
+                     # Fallback to a smaller, simpler model
+                     print("Using DistilBERT as fallback (smaller, more compatible)")
+                     self.use_desklib = False
+                     self.model = AutoModelForSequenceClassification.from_pretrained(
+                         "distilbert-base-uncased",
+                         num_labels=2
+                     )
+                     self.tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+                     self.model = self.model.to("cpu")
+             else:
+                 # Non-macOS: standard loading
+                 self.tokenizer = AutoTokenizer.from_pretrained(model_name)
+                 config = AutoConfig.from_pretrained(model_name)
+                 self.model = DesklibAIDetectionModel.from_pretrained(model_name)
+         else:
+             # Fallback to standard classification model
+             self.model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
+             self.tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
+             self.use_desklib = False
+
+     def predict(self, text, max_length=768, threshold=0.5):
+         """
+         Predict if text is AI-generated.
+
+         Args:
+             text: Input text to classify
+             max_length: Maximum sequence length
+             threshold: Probability threshold for classification
+
+         Returns:
+             tuple: (probability, label) where label is 1 for AI-generated, 0 for human
+         """
+         # Tokenize
+         encoded = self.tokenizer(
+             text,
+             padding='max_length',
+             truncation=True,
+             max_length=max_length,
+             return_tensors='pt'
+         )
+
+         input_ids = encoded['input_ids']
+         attention_mask = encoded['attention_mask']
+
+         # Get device
+         device = next(self.model.parameters()).device
+         input_ids = input_ids.to(device)
+         attention_mask = attention_mask.to(device)
+
+         # Predict
+         self.model.eval()
+         with torch.no_grad():
+             if self.use_desklib:
+                 outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
+                 logits = outputs["logits"]
+                 probability = torch.sigmoid(logits).item()
+             else:
+                 outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
+                 probs = torch.softmax(outputs.logits, dim=1)
+                 # For standard models: prob[0] = human, prob[1] = AI
+                 probability = probs[0][1].item()
+
+         label = 1 if probability >= threshold else 0
+
+         return probability, label

      def save(self, path: str):
          self.model.save_pretrained(path)
@@ -18,10 +172,28 @@ class DetectorModel:

      @classmethod
      def load(cls, path: str):
+         # Try to detect if it's a Desklib model
+         try:
+             config = AutoConfig.from_pretrained(path)
+             # Check if it has the Desklib architecture
+             if hasattr(config, 'model_type') and 'deberta' in config.model_type.lower():
+                 model = DesklibAIDetectionModel.from_pretrained(path)
+                 tokenizer = AutoTokenizer.from_pretrained(path)
+                 obj = cls.__new__(cls)
+                 obj.model_name = path
+                 obj.model = model
+                 obj.tokenizer = tokenizer
+                 obj.use_desklib = True
+                 return obj
+         except:
+             pass
+
+         # Fallback to standard model
          model = AutoModelForSequenceClassification.from_pretrained(path)
          tokenizer = AutoTokenizer.from_pretrained(path, use_fast=True)
          obj = cls.__new__(cls)
          obj.model_name = path
          obj.model = model
          obj.tokenizer = tokenizer
+         obj.use_desklib = False
          return obj
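For reference, a minimal usage sketch of the new `DetectorModel.predict` API added above, assuming the package is importable as `ai_text_detector` (per the file path in this diff) and that the Desklib weights download on first use:

```python
from ai_text_detector.models import DetectorModel

# Loads desklib/ai-text-detector-v1.01 (downloads ~1.7GB on first use).
detector = DetectorModel("desklib/ai-text-detector-v1.01", use_desklib=True)

# Returns (P(ai), label), where label 1 = AI-generated and 0 = human.
prob, label = detector.predict("Paste a paragraph to classify here.", threshold=0.5)
print(f"P(ai) = {prob:.2%} -> {'AI-generated' if label == 1 else 'human-written'}")
```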
gradio_app.py CHANGED
@@ -46,13 +46,13 @@ def load_model():
          tokenizer = model.tokenizer
      except Exception as e:
          print(f"Failed to load model: {e}")
-         print("Using base RoBERTa model instead.")
-         model = DetectorModel("roberta-base")
+         print("Using Desklib pre-trained model instead.")
+         model = DetectorModel("desklib/ai-text-detector-v1.01", use_desklib=True)
          tokenizer = model.tokenizer
      else:
-         print("No trained model found. Using base RoBERTa model for demo.")
-         # Use a base model for demonstration
-         model = DetectorModel("roberta-base")
+         print("No trained model found. Using Desklib pre-trained AI detector model.")
+         # Use Desklib pre-trained model (no training needed!)
+         model = DetectorModel("desklib/ai-text-detector-v1.01", use_desklib=True)
          tokenizer = model.tokenizer

  # Load model lazily (on first use) to avoid startup issues
@@ -76,29 +76,16 @@ def detect_text(text):
          return "Please enter some text to analyze."

      try:
-         # Tokenize the input text
-         inputs = tokenizer(
-             text,
-             truncation=True,
-             padding="max_length",
-             max_length=256,
-             return_tensors="pt"
-         )
-
-         # Get prediction
-         with torch.no_grad():
-             outputs = model.model(**inputs)
-             probabilities = torch.softmax(outputs.logits, dim=1)
-             human_prob = probabilities[0][0].item()
-             ai_prob = probabilities[0][1].item()
+         # Use the model's predict method
+         ai_prob, predicted_label = model.predict(text, max_length=768, threshold=0.5)

          # Determine prediction
-         if ai_prob > human_prob:
+         if predicted_label == 1:
              label = "🤖 AI-generated"
              confidence = ai_prob
          else:
              label = "🧑 Human-written"
-             confidence = human_prob
+             confidence = 1 - ai_prob  # Human probability is 1 - AI probability

          return f"{label} (confidence: {confidence:.1%})"

@@ -161,4 +148,4 @@ with gr.Blocks(title="AI Text Detector", theme=gr.themes.Soft()) as app:
      )

  if __name__ == "__main__":
-     app.launch(share=False, server_name="0.0.0.0", server_port=7860)
+     app.launch(share=True, server_name="0.0.0.0", server_port=7860)
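Both branches of `predict` yield the same quantity, a probability that the text is AI-generated: the Desklib head emits a single logit, mapped through a sigmoid, while a standard two-label head emits two logits, mapped through a softmax where index 1 is the AI class. A small self-contained sketch with made-up logit values:

```python
import torch

# Single-logit head (Desklib): sigmoid maps the logit to P(ai).
p_ai_single = torch.sigmoid(torch.tensor([1.2])).item()  # ≈ 0.769

# Two-logit head (standard classifier): softmax over logits; column 1 is P(ai).
p_ai_two = torch.softmax(torch.tensor([[0.3, 1.2]]), dim=1)[0, 1].item()  # ≈ 0.711

print(p_ai_single, p_ai_two)
```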