An n-gram is a sequence of n adjacent items, such as phonemes, syllables, letters, or words, arranged in a specific order. N-grams are classified by the value of n: if n is 1, it's called a unigram; if n is 2, a bigram; if n is 3, a trigram; and the pattern continues with different prefixes indicating the number of items in the sequence.
Take the sentence: "The quick brown fox jumps over the lazy dog." Its unigrams are the individual words ("the", "quick", "brown", …), its bigrams are adjacent pairs ("the quick", "quick brown", …), and its trigrams are adjacent triples ("the quick brown", …).
Bigrams give more context than unigrams, and trigrams give even more.
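A quick sketch of how these n-grams can be extracted in Python (the `ngrams` helper here is my own illustration, not part of any particular library):

```python
sentence = "The quick brown fox jumps over the lazy dog"
words = sentence.lower().split()

def ngrams(tokens, n):
    """Return all n-grams of the token list, in order, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

unigrams = ngrams(words, 1)  # [('the',), ('quick',), ('brown',), ...]
bigrams = ngrams(words, 2)   # [('the', 'quick'), ('quick', 'brown'), ...]
trigrams = ngrams(words, 3)  # [('the', 'quick', 'brown'), ...]
```

Notice that a sentence of 9 words yields 9 unigrams, 8 bigrams, and 7 trigrams, since each larger n-gram needs one more neighbor.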
N-gram language models use word sequences (N-grams) to predict the next word in a sentence. They work by looking at the last few words and choosing the most likely word based on patterns from training data.
This makes them useful for capturing word relationships. However, if we want more variety instead of predictable results, we may need to add randomness to the word choice.
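The two prediction strategies above can be sketched like this. The probability table is a toy example with made-up values, not real training data:

```python
import random

# Hypothetical bigram probabilities: P(next word | previous word).
probs = {
    "the": {"quick": 0.5, "lazy": 0.5},
    "quick": {"brown": 1.0},
}

def most_likely(prev):
    """Deterministic: always pick the highest-probability next word."""
    return max(probs[prev], key=probs[prev].get)

def sample(prev):
    """Randomized: pick a next word weighted by its probability."""
    candidates = list(probs[prev])
    weights = [probs[prev][w] for w in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]
```

`most_likely("the")` will return the same word every time, while `sample("the")` will vary between "quick" and "lazy", which is the kind of variety the randomized approach buys you.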
My N-gram model is simple and doesn’t use smoothing techniques or separate functions for each N-gram. Instead, it relies on nested dictionaries: the previous word is stored as a key, and its possible next words are stored as sub-keys with their probabilities. Example (bigram):
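A minimal sketch of that nested-dictionary structure, with illustrative probabilities (the actual values in the repository will differ, since they come from training data):

```python
# Outer key: previous word. Inner keys: possible next words,
# mapped to the probability of following the outer word.
# Probabilities for each previous word sum to 1.
model = {
    "the": {"quick": 0.5, "lazy": 0.5},
    "quick": {"brown": 1.0},
    "brown": {"fox": 1.0},
}
```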
Here, the numbers show the probability of one word following another. The model then uses these probabilities to predict the next words in a sequence. You can go to the GitHub repository to see how it's done.