Want to run AI models on your own computer, for free, in a super simple and fast way? Ollama is the answer.
Ollama is a command-line tool for running and managing LLMs locally with commands like run, stop, ps, pull, and push (yes, it’s like managing Docker containers).
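A quick tour of the everyday commands, to show the Docker parallel (the model tag below is just an example, and the push target is a placeholder for your own namespace):

ollama pull deepseek-r1:7b    # download a model without starting a chat
ollama list                   # show the models you have downloaded
ollama ps                     # show models currently loaded in memory
ollama stop deepseek-r1:7b    # unload a running model
ollama push <namespace>/<model>    # publish a model to a registry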
How to get started:
- Install Ollama from the official website.
- Run ollama run <model-name>, for example:
ollama run deepseek-r1:7b
- Choose a model with more or fewer parameters depending on your machine's capacity (the 7b tag means 7 billion parameters; smaller tags need less RAM and run on more modest hardware).
- If the model is not installed yet, Ollama will automatically download and install it.
That’s it! A chat session will start in your terminal with the installed model.
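Inside the session, Ollama's interactive prompt also accepts a few slash commands, for example:

>>> /?      # list the available commands
>>> /bye    # exit the chat session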
Want a Local HTTP API to Integrate with a Frontend?
Ollama also runs a local HTTP server on port 11434. By default, the /api/generate endpoint streams the response back as a series of JSON objects, one per line:
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Who was Alan Turing?"
}'
Need to disable streaming? Just add "stream": false to the request body.
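For example, this variant of the request above returns one JSON object instead of a stream:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Who was Alan Turing?",
  "stream": false
}'

The generated text arrives in the response field of the returned object, which is convenient when the consumer is a simple frontend fetch call rather than a stream reader.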
Why is this tool so useful?
- Great for developing and testing AI projects.
- Useful for building local productivity tools where privacy matters, since nothing leaves your machine.
- You can even set up a local Copilot-style assistant in VS Code, backed by whichever models you prefer.
- You can also run it inside a container and deploy it to the cloud, making it available outside your local environment (see the sketch below).
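As a minimal sketch of the container route, following the pattern from the official ollama/ollama Docker image (the volume and container names are up to you):

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run deepseek-r1:7b

From there, the same HTTP API shown above is exposed on port 11434 of the host or cloud instance, so a deployed frontend can talk to it directly. GPU support needs extra flags (e.g. --gpus=all with the NVIDIA container toolkit), so check the image docs for your setup.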