Hindustries

Pioneering AI Solutions Since 2020

Model Architecture

8-Layer Transformer

768-Dimensional Hidden States

SwiGLU Activation
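For readers unfamiliar with SwiGLU, the sketch below shows a SwiGLU feed-forward block on a single token vector: one projection is passed through SiLU and used to gate a second projection element-wise, and the result is projected back down. This is a minimal, dependency-free illustration under stated assumptions. The toy dimensions, the absence of bias terms and the helper names (`silu`, `matvec`, `swiglu_ffn`) are hypothetical, and the spec does not give the model's FFN width.

```python
import math


def silu(x: float) -> float:
    """SiLU (Swish) activation, x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: W_down @ (SiLU(W_gate @ x) * (W_up @ x)), no biases."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return matvec(w_down, hidden)


# Toy example: d_model = 2, FFN width = 1 (the real model uses 768-dim states).
x = [1.0, 0.0]
out = swiglu_ffn(x, w_gate=[[1.0, 0.0]], w_up=[[2.0, 0.0]], w_down=[[1.0], [0.5]])
print(out)  # roughly [1.4621, 0.7311]
```

In a real layer the same computation runs on every position with weight matrices of shape (FFN width × 768) and (768 × FFN width).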

Training Specs

Effective Batch Size of 128

30 Training Epochs

AdamW Optimizer
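AdamW differs from plain Adam by applying weight decay directly to the parameters instead of folding it into the gradient. The sketch below performs one AdamW update on flat lists of floats to make that concrete. Only the learning rate (2e-5) comes from this spec; beta1 = 0.9, beta2 = 0.999, eps = 1e-8 and weight_decay = 0.01 are assumed common defaults, and `adamw_step` is a hypothetical helper, not the training code actually used.

```python
import math


def adamw_step(params, grads, m, v, t, lr=2e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update on flat lists of floats.

    m and v (first/second moment estimates) are updated in place; t is the
    1-based step count used for bias correction. Hyperparameters other than
    lr are assumed defaults, not values taken from the spec.
    """
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = v[i] / (1 - beta2 ** t)   # bias-corrected second moment
        p = p * (1 - lr * weight_decay)   # decoupled weight decay
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive step
        new_params.append(p)
    return new_params


params = [0.0, 1.0]
m, v = [0.0, 0.0], [0.0, 0.0]
params = adamw_step(params, grads=[1.0, 0.0], m=m, v=v, t=1)
```

On the first step the bias correction makes the update size close to `lr` regardless of the gradient's scale, while the second parameter, which has zero gradient, is still shrunk slightly by the decoupled weight decay.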

Knowledge Base

52k Dialog Samples

32k-Token Vocabulary

512-Token Max Context
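With a 512-token context window, longer dialog histories have to be cut and shorter ones padded before batching. The sketch below shows one common way to do that, keeping the most recent tokens (left-truncation) and right-padding with an attention mask. The padding id, the reading of "32k" as exactly 32,000 tokens and the name `fit_to_context` are assumptions for illustration; the spec does not describe the actual preprocessing.

```python
MAX_CONTEXT = 512    # max context length from the spec
VOCAB_SIZE = 32_000  # assumption: "32k" read as exactly 32,000 tokens
PAD_ID = 0           # hypothetical padding-token id


def fit_to_context(token_ids, max_len=MAX_CONTEXT, pad_id=PAD_ID):
    """Left-truncate to the last max_len tokens, then right-pad to max_len.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens and
    0 for padding.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if any(not 0 <= t < VOCAB_SIZE for t in token_ids):
        raise ValueError("token id outside the vocabulary")
    ids = list(token_ids[-max_len:])  # keep the most recent turns
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, [1] * len(ids) + [0] * n_pad


long_ids, long_mask = fit_to_context(list(range(1, 601)))  # 600 tokens in
short_ids, short_mask = fit_to_context([5, 6, 7])          # 3 tokens in
```

Left-truncation is a design choice that suits dialog, where the latest turns usually matter most for the next reply; right-truncation would instead keep the start of the conversation.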

Core Innovations

Rotary Position Embeddings

Dynamic Sequence Handling

Hybrid Attention Masking
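Rotary position embeddings encode position by rotating pairs of query/key coordinates through position-dependent angles, so the attention score between two tokens depends only on their relative offset. The sketch below applies RoPE to a single vector and checks that property. It assumes the interleaved pairing of the original RoPE formulation and the commonly used base of 10000; some implementations instead pair the first and second halves of the vector, and the spec gives neither the head dimension nor the pairing convention. `apply_rope` and `dot` are hypothetical helper names.

```python
import math


def apply_rope(x, position, base=10000.0):
    """Rotary position embedding for one query/key vector.

    Each consecutive pair (x[2k], x[2k+1]) is rotated by the angle
    position * base**(-2k/d), where d = len(x) must be even.
    """
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(0, d, 2):  # i = 2k
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[i], x[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


q, k = [0.3, -1.2, 0.5, 2.0], [1.1, 0.4, -0.7, 0.2]
# Both scores use a position offset of 2, so they should match.
s1 = dot(apply_rope(q, 5), apply_rope(k, 3))
s2 = dot(apply_rope(q, 12), apply_rope(k, 10))
```

Because each pair is only rotated, vector norms are preserved, and nothing is added to the token embeddings themselves.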

Training Data

DailyDialog Conversations

Empathetic Exchanges

Persona-Chat Dialogues

BlendedSkillTalk

Optimization

2e-5 Learning Rate

8-Step Gradient Accumulation

Gradient Clipping
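Taken together with the effective batch size of 128 in the training specs, 8-step gradient accumulation implies micro-batches of 128 / 8 = 16 examples per forward/backward pass, assuming training on a single device. The sketch below shows that arithmetic plus the two operations that run before each optimizer step: averaging the 8 micro-batch gradients and clipping by global L2 norm. The clipping threshold of 1.0 is an assumption (the spec does not state it), and `average_grads` and `clip_by_global_norm` are hypothetical helpers that mirror the usual behavior rather than any specific framework's API.

```python
import math

EFFECTIVE_BATCH = 128                         # effective batch from the spec
ACCUM_STEPS = 8                               # accumulation steps from the spec
MICRO_BATCH = EFFECTIVE_BATCH // ACCUM_STEPS  # 16 examples per backward pass
MAX_GRAD_NORM = 1.0                           # assumption: threshold not stated


def average_grads(micro_grads):
    """Average per-micro-batch gradients (each a flat list of floats)."""
    n = len(micro_grads)
    return [sum(parts) / n for parts in zip(*micro_grads)]


def clip_by_global_norm(grads, max_norm=MAX_GRAD_NORM):
    """Scale grads so their global L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total + 1e-6))  # never scales gradients up
    return [g * scale for g in grads], total


# Eight hypothetical micro-batch gradients for a 2-parameter model.
micro_grads = [[3.0, 4.0]] * ACCUM_STEPS
grads = average_grads(micro_grads)              # [3.0, 4.0]
clipped, norm_before = clip_by_global_norm(grads)
```

Averaging matches the full 128-example mean gradient only when every micro-batch gradient is itself a mean over the same number of examples; in practice this is often achieved by dividing each micro-batch loss by the number of accumulation steps before calling backward.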