Amazing URL Shortener is a highly available online tool that shortens long links and provides detailed statistics.
Short link demo:
- Backend: https://github.com/leihehehe/java-url-shortener
- Frontend: currently a private repo
Technical Stacks Used
Framework: Spring Boot, Spring Cloud
Gateway: Spring Cloud Gateway
Message Queues: RabbitMQ, Kafka
Database-related Techs: Redis, Redisson (operating on Redis), ClickHouse, Spring Data JPA, MySQL, Apache ShardingSphere
Data Streaming-processing Framework: Flink
Others: OpenFeign, etc.
ShardingSphere provides a distributed database solution built on top of the underlying databases, which lets computing and storage scale horizontally.
Why use it?
Since the system has to handle massive amounts of data, a single database or a handful of tables is not enough. Storing everything on one node or in one database puts heavy pressure on that database and significantly slows down queries.
Therefore, sharding and partitioning are necessary. For example, I divided the product_order table into two tables stored in the url_shop database, and divided the short_link table into 6 tables, with every two tables stored in one url_link database. (For more details, please see the Database Structure part)
Partition keys are chosen case by case. For example, so that users can find their orders quickly, accountNo is used as the partition key, which keeps all of a user's orders in the same table.
Another case is handling a large number of shortened links across multiple databases and tables. Here, accountNo is used as the partition key to determine which database a record is in, and groupId is used as the partition key to determine which table it is in.
How to query the data in this case?
I used two methods to query the data:
- Modulo. For example, k = accountNo mod n, where n is the number of tables, and the query goes to the k-th table.
- Store database and table info in the data itself. For example, once we know a short link code is stored in database 2, we can rewrite the code as axxxx2, so that the short link can be located quickly at query time.
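The two lookup methods above can be sketched in Java as follows. This is only an illustration, not the project's actual ShardingSphere configuration: the class name ShardRouter, its method names, and the layout "one-character database prefix + raw code + one-digit table suffix" are assumptions made for the example.

```java
/** Hypothetical helper illustrating both routing methods; not taken from the project. */
final class ShardRouter {

    private ShardRouter() {}

    /** Method 1: modulo routing, k = key mod n, where n is the number of shards. */
    static int shardIndex(long key, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod keeps the result in [0, n) even when the key is negative.
        return (int) Math.floorMod(key, (long) shardCount);
    }

    /** Method 2: embed the location in the code itself (assumed layout: prefix + code + suffix). */
    static String encode(char dbPrefix, String rawCode, int tableSuffix) {
        if (tableSuffix < 0 || tableSuffix > 9) {
            throw new IllegalArgumentException("tableSuffix must be a single digit");
        }
        return dbPrefix + rawCode + tableSuffix;
    }

    /** Reads the database prefix back from an encoded code. */
    static char databaseOf(String encodedCode) {
        requireEncoded(encodedCode);
        return encodedCode.charAt(0);
    }

    /** Reads the table suffix back from an encoded code. */
    static int tableOf(String encodedCode) {
        requireEncoded(encodedCode);
        char last = encodedCode.charAt(encodedCode.length() - 1);
        if (last < '0' || last > '9') {
            throw new IllegalArgumentException("missing table suffix: " + encodedCode);
        }
        return last - '0';
    }

    private static void requireEncoded(String encodedCode) {
        if (encodedCode == null || encodedCode.length() < 2) {
            throw new IllegalArgumentException("code is too short to carry location info");
        }
    }
}
```

With this layout, ShardRouter.encode('a', "xxxx", 2) produces axxxx2, and a visitor request only needs databaseOf and tableOf to find the right shard without scanning the others.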
Redis, Redisson & Lua Script
- Redis is used as a cache in the project. It stores the users’ plan data information to prevent frequent requests to the MySQL database.
- Redis and Redisson are used together to handle distributed locks. Besides Redisson, a Lua script is a straightforward alternative and is also used for creating locks.
- RabbitMQ is used to reduce the response waiting time, since operations can be executed asynchronously by sending messages.
- RabbitMQ is also used to deliver delayed messages for different purposes like order cancellation and distributed transactions.
- Compared to Kafka, RabbitMQ is more suitable for an e-commerce business service (e.g. scheduled tasks and distributed transaction management).
Flink, Kafka & ClickHouse
Flink is used to process data streams at large scale and to produce real-time analytical insights from the processed data.
In this project, I used Flink to get visitor information when a visitor is trying to access a shortened link. Flink helps to process the visitor information in each layer and pass the processed information to the next layer using
At the last layer, I used Flink to pass the final datasets to ClickHouse, which is a fast column-oriented database management system.
ClickHouse allows us to generate analytical reports using SQL queries.
Jenkins & Kubernetes (High Availability)
The Jenkinsfile is included in the project, and both the front-end and back-end projects are deployed using Kubernetes and Jenkins.
Jenkins runs a pipeline to build images and upload them to AWS ECR.
The Kubernetes cluster is built using 3 servers (1 master node and 1 worker node).
Technical Difficulties & Solutions
User and Visitors Querying Links
Users prefer to query links by their accountNo. If a user's links were distributed across different databases and tables, such queries would be very expensive, since the system would have to access every database and table to collect the records.
However, visitors just need the short link code to locate the table, and they have no idea about the
So we need two different partition keys to handle these two cases.
Therefore, I duplicated the tables: the group_link_mapping tables store data for users, and the short_link tables store data for visitors.
Distributed Transaction Management
New issues arose after duplicating the short link tables. Since the tables need to stay synchronized, data becomes inconsistent if the operation fails on the user side but succeeds on the visitor side.
RabbitMQ is used to send a delayed message that checks whether the operations on both sides succeeded.
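The check that such a delayed message could trigger is sketched below in Java. It is an in-memory stand-in, not the project's consumer: the two sets play the role of the group_link_mapping and short_link tables, the class and method names are hypothetical, and rolling back a half-written link is an assumed compensation strategy that the source does not spell out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical consistency check run when the delayed message is consumed. */
final class LinkConsistencyChecker {

    enum Result { CONSISTENT, ROLLED_BACK, NOT_FOUND }

    // Stand-ins for the user-side and visitor-side tables.
    private final Set<String> userSide = ConcurrentHashMap.newKeySet();
    private final Set<String> visitorSide = ConcurrentHashMap.newKeySet();

    void insertUserSide(String code) { userSide.add(code); }

    void insertVisitorSide(String code) { visitorSide.add(code); }

    boolean existsAnywhere(String code) {
        return userSide.contains(code) || visitorSide.contains(code);
    }

    /** Called after the delay: verifies that both sides hold the record. */
    Result check(String code) {
        boolean inUser = userSide.contains(code);
        boolean inVisitor = visitorSide.contains(code);
        if (inUser && inVisitor) {
            return Result.CONSISTENT;
        }
        if (!inUser && !inVisitor) {
            return Result.NOT_FOUND;
        }
        // Only one side succeeded: remove the orphan so both sides agree again
        // (assumed compensation; a retry of the failed side would also work).
        userSide.remove(code);
        visitorSide.remove(code);
        return Result.ROLLED_BACK;
    }
}
```

In a real deployment the producer would publish the short link code with a delay right after attempting both inserts, and the consumer would call something like check(code) against the actual tables.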
Two Users Creating the Same Short Link
If two users generate the same short link at the same time, user A may have inserted data into group_link_mapping while user B has inserted data into short_link. At this point, both users will fail, because neither can insert into the other table (the data already exists).
A lock is added in Redis. The value stored in Redis is the accountNo. If the accountNo in Redis matches the currently logged-in accountNo, the operations continue. Otherwise, they stop (another user is holding the lock).
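The lock rule described above (the first accountNo to claim a code holds it, the same accountNo may proceed, anyone else stops) can be sketched in Java as follows. The ConcurrentHashMap is an in-memory stand-in for Redis and has no expiry; the class name, the key and argument layout, and the Lua string are assumptions for illustration, not the project's actual script.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical owner-aware lock on a short link code. */
final class ShortLinkLock {

    /**
     * A Lua sketch of the same rule for Redis (assumed layout: KEYS[1] = lock key,
     * ARGV[1] = accountNo, ARGV[2] = TTL in seconds). Returns 1 when the lock is
     * newly taken, 2 when the caller already holds it, 0 when someone else does.
     */
    static final String LUA_SCRIPT =
            "if redis.call('EXISTS', KEYS[1]) == 0 then "
          + "redis.call('SET', KEYS[1], ARGV[1], 'EX', ARGV[2]) return 1 "
          + "elseif redis.call('GET', KEYS[1]) == ARGV[1] then return 2 "
          + "else return 0 end";

    // In-memory stand-in for Redis: short link code -> accountNo holding the lock.
    private final ConcurrentMap<String, Long> holders = new ConcurrentHashMap<>();

    /** Returns true if the caller now holds (or already held) the lock for this code. */
    boolean tryAcquire(String code, long accountNo) {
        // putIfAbsent is atomic, mirroring the atomicity a Lua script gets inside Redis.
        Long existing = holders.putIfAbsent(code, accountNo);
        return existing == null || existing == accountNo;
    }

    /** Releases the lock only if the caller is the current holder. */
    void release(String code, long accountNo) {
        holders.remove(code, accountNo);
    }
}
```

Running the check-and-set as one atomic step is the point of using a script: two users racing on the same code cannot both see the key as missing.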
Nginx - Short Link Access and API server
I have switched the load balancer to Nginx. Nginx load balances and proxies traffic to the backend servers, and it significantly reduces the overhead of using the AWS load balancer.
I used the AWS Load Balancer to forward requests with different URLs on a specific port (80). Basically, there are two different domains: one for the website and one for shortened URLs.
I did not deploy Eureka servers to Kubernetes. Instead, I deployed them to another machine and used Docker network functions to enable communication between them.
Database Structure (ER Diagram)