Custom Training

Train the model with your specific knowledge like (pdf,txt) and get responses from model.

Hardware Support

Optimized for both GPU and CPU environments, providing flexible deployment options.

Advanced Architecture

Built with custom transformers, attention layers, and positional encoders for optimal performance.

My Training

Trained on 1,930,119 characters
Trained over 10 epochs
Trained for 17 hrs 9 min 30 sec

My System configuration

CPU: Intel Core i7(12th gen)
GPU: NVIDIA RTX3050 Laptop(4GB GDDR6)
RAM: 16GB DDR4

What were the criteria for choosing this algorithm❓

Algorithm	Description	Benefit
Transformer	Replaces RNNs with self-attention	Fast parallel processing
Multi-Head Self-Attention (8 heads)	Learns multiple relationships in parallel	Improves context understanding
512-Dimension Embeddings	Represents tokens with high-dimensional vectors	Captures complex meanings
2048-Dimension Feed-Forward Layer	Expands and compresses attention output	Enhances feature extraction
Word-Level Tokenization	Uses full words instead of subwords	Simplifies processing, useful for short texts
Frequency-Based Vocabulary Truncation	Keeps only high-frequency words	Reduces memory usage and speeds up inference

Model Architecture

Training Status & Result

" My model is trained with fewer parameters compared to ChatGPT or LLaMA. However, I plan to continue training it whenever I have access to a GPU. "

Stay tuned__you should definitely try my model! I'm also working on refining its grammar and accuracy for even better performance.

" I'm working on the final touches for the model and are looking forward to releasing it soon! Stay tuned for updates in the coming days, and we appreciate your patience. "

Try ShortGPT without RAG

Knowledge Integration

Seamlessly incorporate PDFs, Wikipedia articles, and other text sources into a unified knowledge base.

Efficient Storage

Dual-format storage with JSON for metadata and PyTorch tensors for vector embeddings enables fast retrieval.

Batch Processing

Memory-efficient handling of large datasets through incremental batch processing and smart caching.

System Requirements

CPU: Intel Core i7 or AMD Ryzen 7
RAM: 16GB minimum, 32GB recommended
GPU: NVIDIA with 4GB+ VRAM for acceleration

Performance Metrics

3.5M tokens processed per minute on GPU
Retrieval latency under 50ms for 100K chunks
98% accuracy on benchmark QA datasets

Custom Training

Train the model with your specific knowledge like (pdf,txt) and get responses from model.

Hardware Support

Optimized for both GPU and CPU environments, providing flexible deployment options.

Advanced Architecture

Built with custom transformers, attention layers, and positional encoders for optimal performance.

RAG Architecture Components

Component/Algorithm	Description	Benefit
Transformer	Replaces RNNs with self-attention	Fast parallel processing
Multi-Head Self-Attention (8 heads)	Learns multiple relationships in parallel	Improves context understanding
512-Dimension Embeddings	Represents tokens with high-dimensional vectors	Captures complex meanings
2048-Dimension Feed-Forward Layer	Expands and compresses attention output	Enhances feature extraction
Word-Level Tokenization	Uses full words instead of subwords	Simplifies processing, useful for short texts
Frequency-Based Vocabulary Truncation	Keeps only high-frequency words	Reduces memory usage and speeds up inference
knowledge.json	Storage for text chunks, source metadata, and chunk IDs	Organizes textual data with context information
embeddings.pt	PyTorch tensor with vector representations aligned with chunks	Enables fast similarity search for retrieval
process_pdf()	Extracts text from PDF and creates chunks with metadata	Prepares documents for embedding and retrieval
compute_embeddings()	Converts text to vectors using embedding model	Creates semantic representations for chunks
save()	Saves chunks, sources, and tensor embeddings to disk	Persists knowledge base for future use
load()	Loads knowledge.json and embeddings.pt into memory	Prepares knowledge base for querying
extend_knowledge_base()	Copies original chunks and clones original embeddings	Preserves existing data while preparing for extension
Merge knowledge	Processes new PDFs, computes embeddings, concatenates with original	Integrates new information with existing knowledge
Process in batches	Gets Wikipedia articles, chunks text, tracks batch size	Manages memory usage for large corpus processing
update_embeddings_with_batch()	Computes embeddings for current batch and merges with existing data	Allows incremental updates to knowledge base
load_knowledge_base()	Loads knowledge.json to memory and embeddings.pt to GPU memory	Optimizes retrieval performance with GPU acceleration
cosine_similarity()	Compares query embedding with chunk embeddings	Identifies most relevant chunks for response generation
torch.cat()	Concatenates original and new embeddings tensors	Efficiently merges vector representations
CPU/GPU Memory Management	Strategically moves tensors between CPU and GPU memory	Balances performance and memory constraints

RAG Workflow

1. Data Storage Structure

The system uses knowledge.json for storing text chunks with metadata and embeddings.pt for vector representations in PyTorch format.

2. PDF Processing

Documents are extracted, chunked, and associated with metadata through the process_pdf() function for structured knowledge representation.

3. Embedding Computation

Text chunks are transformed into 512-dimensional vector embeddings capturing semantic meaning for effective retrieval operations.

4. Knowledge Extension

The system can incorporate new documents while preserving original data through careful copying and tensor concatenation operations.

5. Web Knowledge Integration

Wikipedia articles are processed in manageable batches to enrich the knowledge base while maintaining memory efficiency.

6. GPU-Accelerated Retrieval

Cosine similarity between query and chunk embeddings leverages GPU processing for fast and accurate information retrieval.

7. Multi-Head Attention

The system employs 8-head self-attention mechanisms to learn multiple relationship patterns in parallel across text data.

8. Feed-Forward Processing

A 2048-dimension feed-forward layer expands and compresses attention output for enhanced feature extraction capabilities.

9. Tokenization Strategy

Word-level tokenization simplifies processing for short texts while frequency-based vocabulary truncation optimizes memory usage.

Model Architecture

Results

Verifying whether the text was generated by AI or a human!

Different AI models

We conducted an in-depth comparison of ShortGPT with leading AI models, including ChatGPT, Deep Seek, ClaudeAI, Cohere, Gemma, Grok, Meta, and Mistral, to evaluate their AI-generated content. Our analysis revealed that ChatGPT, Deep Seek, ClaudeAI, Cohere, Gemma, Meta, and Mistral consist of 100% AI-generated text, while Grok contains 59% AI-generated text. Interestingly, ShortGPT was found to contain 0% AI-generated text, demonstrating its unique approach to processing and generating information. This comparison highlights the distinctiveness of ShortGPT and its potential to offer a new paradigm in AI development. Explore the detailed architecture documents below.My ShortGPT model achieves 100% humanization, outperforming other models in generating natural, human-like text.

➥ ChatGPT

➥ Deep Seek

➥ ClaudeAI

➥ Cohere

➥ Gemma

➥ Grok

➥ Meta

➥ Mistral

➥ ShortGPT

"The RAG-enhanced ShortGPT model creates a perfect balance between fixed, trained knowledge and dynamic, document-based retrieval. This hybrid approach gives you the best of both worlds."

Coming soon: Advanced filtering capabilities and multi-modal document support!

"Experience the power of on-demand knowledge without sacrificing the performance of a finely-tuned model."

Try ShortGPT with RAG