Metadata-Version: 2.4
Name: socialpretext
Version: 0.1.1
Summary: A lightweight utility for preprocessing and cleaning social media text.
Author-email: Akash Goyal <akashpgoyal@gmail.com>
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: emoji>=2.0.0

# SocialPreText

SocialPreText is a lightweight, high-performance Python utility designed specifically for preprocessing and cleaning social media text. 

Whether you are working with data from Twitter (X), Reddit, Instagram, or Discord, this package helps you transform raw, slang-heavy text into a clean, normalized format ready for Natural Language Processing (NLP) models, sentiment analysis, or machine learning.

---

## Features

* **Slang Expansion:** Automatically converts internet shorthand (e.g., idk, brb, lit) into formal English.
* **Contraction Expansion:** Expands English contractions (e.g., don't -> do not).
* **Emoji Management:** Choose between stripping emojis entirely or "demojizing" them into text descriptions (e.g., 🔥 -> :fire:).
* **Pattern Removal:** Removes URLs, @mentions, and #hashtags using regular expressions.
* **Text Normalization:** Handles lowercase conversion, punctuation removal, and whitespace cleanup.
* **Custom Pipelines:** Run a full cleaning suite in one line or build a custom sequence of steps.
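
Most of these features come down to dictionary lookups and regular expressions. The following is a minimal sketch of how such a step pipeline could be assembled with the standard library alone. It is an illustration under stated assumptions, not SocialPreText's actual implementation. The tiny `SLANG` and `CONTRACTIONS` tables, the regex patterns, `SKETCH_STEPS` and `run_pipeline` are all hypothetical, and only the step names are borrowed from this README. Emoji steps are left out. The `order` list also strips hashtags, so it is not the exact default `clean_all` pipeline.

```python
import re
import string
from typing import Callable, Dict, List

# Hypothetical, deliberately tiny lookup tables (a real tool would ship far larger ones).
SLANG: Dict[str, str] = {
    "omg": "oh my god",
    "idk": "i do not know",
    "brb": "be right back",
    "lit": "amazing",
}
CONTRACTIONS: Dict[str, str] = {"don't": "do not", "can't": "cannot", "it's": "it is"}


def word_pattern(table: Dict[str, str]):
    """Compile a case-insensitive regex that matches any key of `table` as a whole word."""
    # Longest keys first, so a longer entry wins over a shorter one sharing its prefix.
    keys = sorted(table, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b", re.IGNORECASE)


def expander(table: Dict[str, str]) -> Callable[[str], str]:
    """Build a step that swaps each matched key for its table value."""
    pattern = word_pattern(table)
    return lambda text: pattern.sub(lambda m: table[m.group(0).lower()], text)


# Hypothetical patterns for links, @mentions and #hashtags.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

# Step names borrowed from this README; the bodies are illustrative only.
SKETCH_STEPS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
    "expand_contractions": expander(CONTRACTIONS),
    "expand_slang": expander(SLANG),
    "remove_urls": lambda t: URL_RE.sub(" ", t),
    "remove_mentions": lambda t: MENTION_RE.sub(" ", t),
    "remove_hashtags": lambda t: HASHTAG_RE.sub(" ", t),
    "remove_punctuation": lambda t: t.translate(PUNCT_TABLE),
    "normalize_whitespace": lambda t: " ".join(t.split()),
}


def run_pipeline(text: str, steps: List[str]) -> str:
    """Apply the named steps from left to right; the order matters."""
    for name in steps:
        text = SKETCH_STEPS[name](text)
    return text


raw = "OMG idk, @user! This is lit, don't miss it: https://example.com #tech"
order = [
    "lowercase", "expand_contractions", "expand_slang", "remove_urls",
    "remove_mentions", "remove_hashtags", "remove_punctuation", "normalize_whitespace",
]
print(run_pipeline(raw, order))
# oh my god i do not know this is amazing do not miss it

# Stripping punctuation first turns "don't" into "dont", which no longer expands.
print(run_pipeline("don't go", ["remove_punctuation", "expand_contractions"]))
# dont go
```

Because punctuation removal also strips apostrophes, contraction expansion has to run before it. The last call shows what happens when that order is reversed.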

---

## Installation

Install the package directly from PyPI using pip:

```bash
pip install socialpretext
```

---

## Quick Start: The One-Step Clean

The easiest way to use SocialPreText is the `clean_all` function. It runs a default pipeline:

Lowercase -> Expand Contractions -> Expand Slang -> Remove URLs -> Remove Mentions -> Remove Emojis -> Remove Punctuation -> Normalize Whitespace

```python
from socialpretext import clean_all

raw_data = "OMG idk, @user! This update is lit 🔥 Check it out: https://example.com #tech"

# One-step cleaning
cleaned = clean_all(raw_data)

print(cleaned)
# Output: "oh my god i do not know this update is amazing check it out"
```

---

## Advanced Usage

### 1. Building a Custom Pipeline

If you do not want to use the default settings, you can define your own steps.

Available steps: `lowercase`, `expand_contractions`, `expand_slang`, `remove_urls`, `remove_mentions`, `remove_punctuation`, `normalize_whitespace`, `remove_emojis`, `demojize`.

```python
from socialpretext import clean_all

text = "U r gonna love this! 🔥"

# Define a custom sequence
custom_steps = ['expand_slang', 'demojize']

# Run the custom pipeline
result = clean_all(text, pipeline_steps=custom_steps)

print(result)
# Output: "you are going to love this! :fire:"
```

### 2. Using Individual Functions

You can also import specific functions for granular control.

```python
from socialpretext import expand_slang, remove_urls, demojize

text = "idk check this link https://google.com 😊"

text = expand_slang(text)   # "I do not know check this link https://google.com 😊"
text = remove_urls(text)    # "I do not know check this link  😊"
text = demojize(text)       # "I do not know check this link  :smiling_face_with_smiling_eyes:"
```

---

## Available Functions

| Function | Description |
| --- | --- |
| `clean_all(text, pipeline_steps=None)` | Runs a full pipeline (default or custom). |
| `expand_slang(text)` | Replaces internet slang with full words. |
| `expand_contractions(text)` | Expands English contractions (don't -> do not). |
| `remove_urls(text)` | Strips `http`, `https`, and `www` links. |
| `remove_mentions(text)` | Strips @usernames. |
| `remove_hashtags(text)` | Strips #hashtags. |
| `remove_emojis(text)` | Deletes all emoji characters. |
| `demojize(text)` | Converts emojis to text descriptions (e.g., 🔥 -> :fire:). |
| `remove_punctuation(text)` | Removes all standard punctuation marks. |
| `normalize_whitespace(text)` | Fixes extra spaces, tabs, and newlines. |

---

## Author

**Akash Goyal**

* Email: akashpgoyal@gmail.com
* PyPI: akashgoyal

---

## License

This project is licensed under the MIT License.
