Learning AWS

AWS is Amazon’s cloud service. It lets you:

  1. Rent servers
  2. Manage domains
  3. Upload objects (mp4 files, jpgs, mp3s …)
  4. Autoscale servers
  5. Create k8s clusters

EC2

Stands for Elastic Compute Cloud (the two C's give the "C2").

  • Elastic: Can scale up and down according to need
  • Compute: A machine (a VM to be precise)

EC2 instances are machines (VMs) that we can rent from AWS to host our own code on. EC2 allows you to:

  • Specify the OS
  • Specify the hardware capacity (CPUs, Storage, RAM, etc)
  • Manage security, networking, etc
  • Scale up and down as needed

Amazon uses hypervisors (like Nitro) to run multiple VMs on a single physical server. Each VM is in its own isolated environment and acts as a standalone machine.

You can interact with your provisioned instance using SSH.

Cloud vs Serverless

Cloud is renting servers that you do not own, but still manage. Serverless is just deploying your code: the provider runs it on servers that it provisions, scales, and manages for you. The entire infrastructure and its management are handled by the provider.

Security Groups in EC2

A Security Group is a set of firewall rules that controls the traffic to your instance. Simply put, it controls which ports on your instance the outside world can connect to.

There are 3 options:

  • Allow traffic from certain selected ports
  • Allow HTTP traffic from the internet
    • HTTP is by default port 80. Hence, http://google.com is equivalent to http://google.com:80
  • Allow HTTPS traffic from the internet
    • HTTPS is by default port 443. Hence, https://google.com is equivalent to https://google.com:443

Best practice is to open only the ports you actually need, and to serve public traffic over HTTPS only.
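
The default-port rule can be checked with the URL class built into Node.js. This is a small illustrative sketch; effectivePort is a hypothetical helper name, not part of any library.

```javascript
// Default ports implied by the URL scheme when none is written explicitly.
const DEFAULT_PORTS = { "http:": 80, "https:": 443 };

// Return the port a client would actually connect to for a given URL.
function effectivePort(urlString) {
  const url = new URL(urlString);
  // url.port is an empty string when the URL uses the scheme's default port.
  return url.port === "" ? DEFAULT_PORTS[url.protocol] : Number(url.port);
}

console.log(effectivePort("http://google.com"));      // 80
console.log(effectivePort("https://google.com"));     // 443
console.log(effectivePort("http://google.com:8080")); // 8080
```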

Reverse Proxy

Basically, it's syntactic sugar for domain names. It helps keep URLs free of the :<port> part (as in http://someurl:<port>).

Let's understand proxies first. To do that, let's do a computer networking refresher. What happens when you type www.example.com in a browser?

  • The computer sends a request over the internet to the server hosting www.example.com.
  • The server sends the data back to the client.
  • The browser displays that data

Now, a proxy is a middleman between the client and the internet. It receives the request from the client and sends it to the server over the internet. It also receives the response from the server and sends it back to the client. So, in this case, you never really talk to the server directly.

Why use a proxy?

  • Privacy: Hides your IP from the server
  • Filtering: Blocks certain sites
  • Caching: Stores frequently accessed sites to improve performance
  • Control: Admins can control/manage internet usage

Think of it as sending a letter through an assistant who reads and forwards it for you.

A reverse proxy is the direct opposite. It sits in front of the servers and handles requests on their behalf. The request from the client hits the reverse proxy and not the actual server. The reverse proxy analyzes the request and determines which server should handle it. It forwards the request, gets the response, and sends it back to the client.

Why use a reverse proxy?

  • Load Balancing
  • Hides server details - security
  • Directs traffic based on URL, device, or content type

Think of it as a receptionist at a company who routes your call to the right department.

Now that we've understood what proxies are, let's understand what the problem is and why a reverse proxy is an appropriate solution.

Let's say we rented an EC2 instance and we want to expose a Node.js process on it. Assume that we exposed it on port 8080. For any user to access it, they'd have to hit the URL http://instance_url:8080. This looks ugly! (Also, remember to open port 8080 on your instance so clients can connect to it. You can do this by adding an inbound rule of type Custom TCP to your security group and specifying 8080 as the port.)
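
For reference, a minimal sketch of such a Node.js process might look like this, using only the built-in http module. The /todos route and its data are placeholders, and reading the port from the command line is an assumption of this sketch.

```javascript
const http = require("http");

// Placeholder data for the illustration.
const todos = [{ id: 1, title: "Rent an EC2 instance", done: true }];

// Decide the response for a request. Kept separate from the server so the
// routing logic is easy to read.
function route(method, url) {
  if (method === "GET" && url === "/todos") {
    return { status: 200, body: JSON.stringify(todos) };
  }
  return { status: 404, body: JSON.stringify({ error: "Not found" }) };
}

// Create (but do not start) an HTTP server that answers with JSON.
function createApp() {
  return http.createServer((req, res) => {
    const { status, body } = route(req.method, req.url);
    res.writeHead(status, { "Content-Type": "application/json" });
    res.end(body);
  });
}

// Start listening only when a port is given, e.g. `node index.js 8080`.
const port = Number(process.argv[2]);
if (port) {
  createApp().listen(port, () => console.log(`Listening on port ${port}`));
}
```

Running it with 8080 as the argument gives the http://instance_url:8080 situation described above.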

One immediate workaround for this is to start the Node.js process on the HTTP default port, which is 80. If we do that, the url becomes http://instance_url. This is good. This is what we wanted.

But a problem occurs when you try to get greedy. Now, you want to save costs and host multiple Node.js processes on the same EC2 machine. You can't expose two Node.js processes on the same port. And so, we're back to the same problem.

A workaround for this is to use something called a reverse proxy. A reverse proxy is a process (not a Node.js process) that runs on the default HTTP port (port 80) and routes incoming requests to the relevant local ports based on which domain name each request was addressed to. Let's understand this through a diagram:

![[Pasted image 20250503042754.png]]

Think of it as something like a "Load Balancer" - it looks at which domain the request is addressed to and forwards it accordingly.
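
The routing idea can be sketched in a few lines of Node.js, purely as an illustration of what a reverse proxy does internally. The routing table is an assumption: domain2.arvind.com and port 8081 are hypothetical, and pickPort and createReverseProxy are made-up names.

```javascript
const http = require("http");

// Hypothetical routing table: requested hostname -> local port.
const ROUTES = new Map([
  ["domain1.arvind.com", 8080],
  ["domain2.arvind.com", 8081], // hypothetical second app
]);

// Strip an optional ":port" suffix from the Host header and look it up.
function pickPort(hostHeader) {
  if (!hostHeader) return undefined;
  const hostname = hostHeader.split(":")[0].toLowerCase();
  return ROUTES.get(hostname);
}

// Create (but do not start) a server that forwards each request to the
// local port chosen by pickPort and streams the response back.
function createReverseProxy() {
  return http.createServer((clientReq, clientRes) => {
    const port = pickPort(clientReq.headers.host);
    if (port === undefined) {
      clientRes.writeHead(404, { "Content-Type": "text/plain" });
      clientRes.end("No backend configured for this host\n");
      return;
    }
    const upstreamReq = http.request(
      {
        host: "127.0.0.1",
        port,
        method: clientReq.method,
        path: clientReq.url,
        headers: clientReq.headers,
      },
      (upstreamRes) => {
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
      }
    );
    upstreamReq.on("error", () => {
      if (!clientRes.headersSent) {
        clientRes.writeHead(502, { "Content-Type": "text/plain" });
      }
      clientRes.end("Backend unreachable\n");
    });
    clientReq.pipe(upstreamReq);
  });
}
```

Calling createReverseProxy().listen(80) would start it (binding to port 80 usually needs elevated privileges). This only shows the idea; a real setup would rely on dedicated reverse proxy software rather than hand-written code like this.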

Awesome. Now, how do you even create a reverse proxy? That's where Nginx comes in. Nginx is software with multiple offerings, one of them being reverse proxying.

Installing Nginx
sudo apt update &&
sudo apt install nginx

This should start an nginx server on port 80. Try visiting your instance's URL in a browser to check.

Creating a Reverse Proxy
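Replace the default config with your own:

sudo rm /etc/nginx/nginx.conf
sudo vi /etc/nginx/nginx.conf

Paste the following into the file:

events {
    # Event directives...
}

http {
    server {
        # Listen on the default HTTP port
        listen 80;
        # Only handle requests addressed to this domain
        server_name domain1.arvind.com;

        location / {
            # Forward everything to the Node.js process on port 8080
            proxy_pass http://localhost:8080;
            proxy_http_version 1.1;
            # Pass WebSocket upgrade headers through to the backend
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection 'upgrade';
            # Preserve the Host header sent by the client
            proxy_set_header Host $host;
            proxy_cache_bypass $http_upgrade;
        }
    }
}

Then reload nginx so it picks up the new config:

sudo nginx -s reload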

Now, start the backend server:

node index.js

And visit the website.

And, if you want your Node.js process to keep running (that is, even after you exit the SSH session), you can use pm2 (npm install -g pm2). Then run pm2 start index.js, and pm2 keeps it running in the background. (Use pm2 logs to check the logs.)


Musings: You can actually change which IP address any hostname resolves to on your local machine (for example, you can make google.com point to a server of your choice). That way you can prank your friends (the same trick is used in phishing to steal credentials). To do that, edit the /etc/hosts file and add a new entry:

  • sudo vi /etc/hosts
  • Add an entry for 13.233.120.108 google.com - this will point google.com (for your local machine) to the specified IP - the IP of your EC2 instance.
  • Now, when you try to hit google.com (ping google.com), you'll see the new IP
  • My EC2 instance has an NGINX Reverse Proxy running currently (routing all traffic at its default port to port 8080), and it serves a JSON of todos at the /todos route.
  • Hence, if you try to hit google.com at the /todos route (using curl google.com/todos), you'll see a JSON of todos being returned!
  • So awesome!
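
Why this works can be sketched as follows: on a typical setup the system resolver consults /etc/hosts before asking DNS, so a matching entry there wins. The code below is a simplified illustration of that lookup table; parseHosts is a hypothetical helper, and real resolvers handle more cases.

```javascript
// Parse the text of a hosts file into a Map of hostname -> IP address.
function parseHosts(text) {
  const table = new Map();
  for (const rawLine of text.split("\n")) {
    // Drop comments and surrounding whitespace.
    const line = rawLine.split("#")[0].trim();
    if (line === "") continue;
    // The first field is the IP, the remaining fields are hostnames.
    const [ip, ...names] = line.split(/\s+/);
    for (const name of names) {
      // Keep the first entry seen for a name (a simplification).
      if (!table.has(name)) table.set(name, ip);
    }
  }
  return table;
}

const sample = `
127.0.0.1       localhost
# Local override from the notes above
13.233.120.108  google.com
`;

console.log(parseHosts(sample).get("google.com")); // 13.233.120.108
```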

Questions:

  • ~~How to keep the Node.js process running on the instance, after I've exited using SSH?~~
    • Answered! - Use pm2
  • If we have nginx and reverse proxies set up, can we remove the extra inbound rules we created for the specific port (8080, in this case)?
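    • Yes. nginx reaches the Node.js process via localhost:8080 from inside the instance, so clients only ever hit the default HTTP port (port 80) and the inbound rule for 8080 is no longer needed.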

Readings & References:

  • https://www.digitalocean.com/community/tutorials/ssh-essentials-working-with-ssh-servers-clients-and-keys
  • Read about Nginx RTMP module - media streaming module of Nginx