Open-MalSec v0.1 – Open-Source Cybersecurity Dataset
Evening! 🫡
📂 Just uploaded an early-stage open-source cybersecurity dataset focused on phishing, scams, and malware-related text samples.
This is the base version (v0.1): a few structured sample files. Full dataset builds will follow over the next few weeks.
🔗 Dataset link:
tegridydev/open-malsec
🔍 What’s in v0.1?
A few structured scam examples (text-based)
Covers DeFi, crypto, phishing, and social engineering
Initial labelling format for scam classification
⚠️ This is not a full dataset yet; only samples are currently available. For now I'm just establishing the structure and gathering feedback.
📂 Current Schema & Labelling Approach
"instruction" → Task prompt (e.g., "Evaluate this message for scams")
"input" → Source & message details (e.g., Telegram post, Tweet)
"output" → Scam classification & risk indicators
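For anyone wanting to build on the format, here is a minimal sketch of a record using the instruction/input/output triplet above (the same layout Alpaca-style instruction datasets use), plus a basic field check and a prompt renderer. Assumptions: the nested shape of "input" (source + message) and "output" (classification + risk indicators) is my guess at a reasonable structure, not the confirmed v0.1 layout; the message text is made up; `validate_record`, `to_prompt`, and the prompt template are hypothetical helpers, not part of the repo.

```python
import json

# Field names come from the v0.1 schema; everything else here is an assumption.
REQUIRED_FIELDS = ("instruction", "input", "output")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not record[field]:
            problems.append(f"empty field: {field}")
    return problems


def as_text(value) -> str:
    """Strings pass through; nested dicts/lists are serialized as JSON."""
    return value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)


def to_prompt(record: dict) -> str:
    """Render a record with a simplified, hypothetical instruction-tuning template."""
    return (
        f"### Instruction:\n{as_text(record['instruction'])}\n\n"
        f"### Input:\n{as_text(record['input'])}\n\n"
        "### Response:\n"
    )


# Hypothetical sample record (not taken from the dataset).
example = {
    "instruction": "Evaluate this message for scams",
    "input": {
        "source": "Telegram post",
        "message": "Guaranteed 100x on $MOON! Send ETH now to claim your presale slot.",
    },
    "output": {
        "classification": "scam",
        "risk_indicators": ["guaranteed returns", "urgency", "upfront crypto payment"],
    },
}

# Round-trip through one JSONL line, as records are often stored that way.
line = json.dumps(example, ensure_ascii=False)
record = json.loads(line)
print(validate_record(record))
print(to_prompt(record))
```

Keeping "output" as structured data rather than free text would make it easier to score a model's classification separately from its listed risk indicators, though a plain-string output would work with the same helpers.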
🗂️ Current v0.1 Sample Categories
Crypto Scams → Meme token pump & dumps, fake DeFi projects
Phishing → Suspicious finance/social media messages
Social Engineering → Manipulative messages exploiting trust
🔜 Next Steps
- Expanding the dataset with more phishing & malware examples
- Refining schema & annotation quality
- Open to feedback, contributions, and suggestions
If this is something you might find useful, bookmark/follow/like the dataset repo <3
💬 Thoughts, feedback, and ideas are always welcome! Drop a comment or DMs are open 🤙