AWS is Amazon’s cloud service.
It lets you:
- Rent servers
- Manage domains
- Upload objects (mp4 files, jpgs, mp3s …)
- Autoscale servers
- Create k8s clusters
EC2
Stands for Elastic Compute Cloud (the "2" refers to the two Cs in "Compute Cloud").
- Elastic: Can scale up and down according to need
- Compute: A machine (a VM to be precise)
EC2 instances are machines (VMs) that we can rent from AWS to host our own code on. EC2 allows you to:
- Specify the OS
- Specify the hardware capacity (CPUs, Storage, RAM, etc)
- Manage security, networking, etc
- Scale up and down as needed
Amazon uses hypervisors (like Nitro) to run multiple VMs on a single physical server. Each VM is in its own isolated environment and acts as a standalone machine.
You can interact with your provisioned instance using SSH.
Cloud vs Serverless
Cloud is renting servers that you do not own but still manage. Serverless is handing just your code to the provider, which runs it on servers it manages. All of the infrastructure and its management are handled by the provider.
Security Groups in EC2
A Security Group is a set of firewall rules that control the traffic to your instance. Simply put, it controls which ports on your instance the outside world can connect to.
There are 3 options:
- Allow traffic on certain selected ports
- Allow HTTP traffic from the internet
  - HTTP is by default port 80. Hence, http://google.com is equivalent to http://google.com:80
- Allow HTTPS traffic from the internet
  - HTTPS is by default port 443. Hence, https://google.com is equivalent to https://google.com:443
Best practice is to only allow HTTPS traffic.
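The default-port rule can be checked with the URL parser built into Node.js. This is a minimal sketch; the helper names `normalize` and `effectivePort` are made up for illustration and only handle the two schemes discussed above.

```javascript
// The WHATWG URL parser drops a port that is already the default for the
// scheme, so both spellings of the URL normalize to the same string.
function normalize(url) {
  return new URL(url).href;
}

// Returns the port a client would actually connect to for a URL.
// Assumption: only http and https are handled in this sketch.
function effectivePort(url) {
  const parsed = new URL(url);
  if (parsed.port !== '') return Number(parsed.port);
  return parsed.protocol === 'https:' ? 443 : 80;
}

console.log(normalize('http://google.com:80'));       // http://google.com/
console.log(normalize('https://google.com:443'));     // https://google.com/
console.log(effectivePort('https://google.com'));     // 443
console.log(effectivePort('http://google.com:8080')); // 8080
```

A non-default port such as 8080 survives normalization, which is why it has to be typed out in the URL.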
Reverse Proxy
Basically, it's syntactic sugar for domain names. It helps keep the domain names free of the http://someurl:<port> part.
Let's understand proxies first. To do that, let's do a computer networking refresher. What happens when you type www.example.com in a browser?
- The computer sends a request over the internet to the server hosting www.example.com.
- The server sends the data back to the client.
- The browser displays that data.
Now, a proxy is a middleman between the client and the internet. It receives the request from the client and sends it to the server over the internet. It also receives the response from the server and sends it back to the client. So, in this case, you never really talk to the server directly.
Why use a proxy?
- Privacy: Hides your IP from the server
- Filtering: Blocks certain sites
- Caches frequently accessed sites to improve performance
- Admins can control/manage internet usage
Think of it as sending a letter through an assistant who reads and forwards it for you.
A reverse proxy is the direct opposite. It sits in front of the servers and handles requests on their behalf. The request from the client hits the reverse proxy and not the actual server. The reverse proxy analyzes the request and determines which server should handle it. It forwards the request, gets the response, and sends it back to the client.
Why use a reverse proxy?
- Load Balancing
- Hides server details - security
- Directs traffic based on URL, device, or content type
Think of it as a receptionist at a company who routes your call to the right department.
Now that we've understood what proxies are, let's look at the problem and why a reverse proxy is an appropriate solution.
Let's say we rented an EC2 instance and we want to expose a Node.js process on it. Assume that we exposed it on port 8080. For any user to access it, they'd have to hit the URL http://instance_url:8080. This looks ugly! (Also, remember to open port 8080 on your instance for the client to connect to it. You can do this by adding an inbound rule in your security group with the Custom TCP rule type and the port set to 8080.)
One immediate workaround for this is to start the Node.js process on the HTTP default port, which is 80. If we do that, the URL becomes http://instance_url. This is good. This is what we wanted.
But a problem occurs when you get greedy. Say you want to save costs and host multiple Node.js processes on the same EC2 machine. Only one process can listen on a given port, so you can't expose two Node.js processes through the same default port. And so, we're back to the same problem.
A workaround for this is to use a reverse proxy. A reverse proxy is a process (not a Node.js process) that listens on the default HTTP port (port 80) and routes each incoming request to the relevant local port based on the domain name the request was sent to. Let's understand this through a diagram:
![[Pasted image 20250503042754.png]]
Think of it as a "Load Balancer" - it works out which domain the request is for and forwards it accordingly.
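To make the routing idea concrete, here is a minimal sketch of a host-based reverse proxy written with Node's built-in http module. It is only an illustration of the mechanism, not how a production proxy is built. The second domain, port 8081, and the function names are assumptions that do not come from these notes.

```javascript
const http = require('http');

// Hypothetical routing table: requested domain -> local port of an app.
const routes = new Map([
  ['domain1.arvind.com', 8080],
  ['domain2.arvind.com', 8081], // assumed second app, for illustration only
]);

// Pick the upstream port from the Host header, which may carry a ":port" suffix.
// IPv6 literals in the Host header are not handled in this sketch.
function pickUpstreamPort(hostHeader, table = routes) {
  if (!hostHeader) return undefined;
  const hostname = hostHeader.split(':')[0].toLowerCase();
  return table.get(hostname);
}

// Build (but do not start) a server that forwards each request to the
// local port chosen by pickUpstreamPort and streams the response back.
function createReverseProxy(table = routes) {
  return http.createServer((clientReq, clientRes) => {
    const port = pickUpstreamPort(clientReq.headers.host, table);
    if (port === undefined) {
      clientRes.writeHead(404, { 'Content-Type': 'text/plain' });
      clientRes.end('Unknown host\n');
      return;
    }
    const upstreamReq = http.request(
      {
        host: '127.0.0.1',
        port,
        method: clientReq.method,
        path: clientReq.url,
        headers: clientReq.headers,
      },
      (upstreamRes) => {
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
      }
    );
    // If the app behind the proxy is down, answer with 502 instead of hanging.
    upstreamReq.on('error', () => {
      if (!clientRes.headersSent) {
        clientRes.writeHead(502, { 'Content-Type': 'text/plain' });
      }
      clientRes.end('Bad gateway\n');
    });
    clientReq.pipe(upstreamReq);
  });
}
```

Calling `createReverseProxy().listen(80)` would start it; binding to port 80 usually needs elevated privileges. The client only ever sees port 80, while each app keeps its own local port.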
Awesome. Now, how do you even create a reverse proxy? That's where Nginx comes in. Nginx is software with multiple offerings, one of them being reverse proxying.
Installing Nginx
sudo apt update &&
sudo apt install nginx
This should start an Nginx server on port 80. Try visiting the website to check.
Creating a Reverse Proxy
sudo rm /etc/nginx/nginx.conf
sudo vi /etc/nginx/nginx.conf
events {
    # Event directives...
}
http {
    server {
        listen 80;
        server_name domain1.arvind.com;
        location / {
            proxy_pass http://localhost:8080;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection 'upgrade';
            proxy_set_header Host $host;
            proxy_cache_bypass $http_upgrade;
        }
    }
}
sudo nginx -s reload
Now, start the backend server:
node index.js
And visit the website.
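For reference, index.js could look something like the sketch below. The notes do not show the real file, so the todo items and the function names `route` and `createApp` are made up for illustration. Only the /todos route and port 8080 come from these notes.

```javascript
const http = require('http');

// Hypothetical in-memory data; the real todos are not shown in these notes.
const todos = [
  { id: 1, title: 'Rent an EC2 instance', done: true },
  { id: 2, title: 'Set up Nginx', done: false },
];

// Pure routing function: maps a method and URL path to a status and JSON body.
// Query strings are not handled in this sketch.
function route(method, url) {
  if (method === 'GET' && url === '/todos') {
    return { status: 200, body: todos };
  }
  return { status: 404, body: { error: 'Not found' } };
}

// Create the HTTP server; call .listen(8080) on the result to start it.
function createApp() {
  return http.createServer((req, res) => {
    const { status, body } = route(req.method, req.url);
    res.writeHead(status, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}
```

Adding `createApp().listen(8080);` as the last line makes the file startable with node index.js, on the same port that proxy_pass points at.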
And, if you want your Node.js process to keep running (that is, even after you SSH out of the machine), you can use pm2 (npm install -g pm2). Then run pm2 start index.js, and it keeps running in the background. (Use pm2 logs to check the logs.)
Musings: You can actually change what any domain points to on your local machine (for example, you can point google.com to any IP of your choice) by adding a new entry to the /etc/hosts file. It works as a prank on friends, and it is also how some credential-stealing attacks work, so be careful about who can edit that file.
To do that:
sudo vi /etc/hosts
- Add the entry 13.233.120.108 google.com. This will point google.com (for your local machine) to the specified IP, the IP of your EC2 instance.
- Now, when you try to hit google.com (ping google.com), you'll see the new IP.
- My EC2 instance has an Nginx reverse proxy running currently (routing all traffic at its default port to port 8080), and it serves a JSON of todos at the /todos route.
- Hence, if you try to hit google.com at the /todos route (using curl google.com/todos), you'll see a JSON of todos being returned!
- So awesome!
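The hosts file is just a lookup table from names to IPs that is consulted before DNS. The sketch below parses hosts-style text to show the mapping. `parseHosts` is a made-up helper, and keeping the first entry for a repeated name is an assumption of this sketch.

```javascript
// Parse hosts-file text into a Map of hostname -> IP address.
function parseHosts(text) {
  const table = new Map();
  for (const rawLine of text.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // drop comments and padding
    if (line === '') continue;
    const [ip, ...names] = line.split(/\s+/);
    for (const name of names) {
      // Assumption: the first entry for a name wins.
      if (!table.has(name)) table.set(name, ip);
    }
  }
  return table;
}

const sample = [
  '127.0.0.1 localhost',
  '13.233.120.108 google.com  # points google.com at the EC2 instance',
].join('\n');

console.log(parseHosts(sample).get('google.com')); // 13.233.120.108
```

Since the override lives only in the local file, it affects only the machine whose /etc/hosts was edited.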
Questions:
- ~~How to keep the Node.js process running on the instance, after I've exited using SSH?~~
  - Answered! Use pm2
- If we have nginx and reverse proxies set up, can we remove the extra inbound rules we created for the specific port (8080, in this case)?
  - Yes. Clients now only hit the default HTTP port (port 80). Nginx reaches the Node.js process over localhost, which the security group does not filter.
- Readings & References:
- https://www.digitalocean.com/community/tutorials/ssh-essentials-working-with-ssh-servers-clients-and-keys
- Read about Nginx RTMP module - media streaming module of Nginx