Media API¶
Overview¶
The Media API is designed to fetch user-uploaded content in a predictable, understandable and secure way. It is also an abstraction layer between the user and the internal storage scheme used in the application.
Front-end¶
All the URLs handled by the user have this general shape:
https://ourhosting.com/media/:longidentifier/original_filename.ext
This method has several upsides:
It stays the same for all file types and uses. From images to submissions and tasks, everything can be retrieved in the same way.
The identifier can be used in many places throughout the application, since, together with the filename, it uniquely maps to a particular file.
It allows for pseudo-folders. If all filenames are unique and the circumstances call for that (for example, a task folder) those files can be kept together for ease of access under the same identifier.
It preserves the original filenames, which helps the user who uploaded it to identify the file later.
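The URL shape above can be sketched as a pair of helpers that build and parse media URLs. This is only an illustration: the function names are hypothetical, and percent-encoding the filename is an assumption about how special characters would be handled.

```python
from urllib.parse import quote, unquote, urlsplit

# Host taken from the example URL above.
MEDIA_BASE = "https://ourhosting.com/media"


def build_media_url(long_identifier: str, filename: str) -> str:
    """Join the identifier and the original filename into a media URL."""
    # Percent-encode the filename so characters like '?' or ' ' stay in the path.
    return f"{MEDIA_BASE}/{long_identifier}/{quote(filename, safe='')}"


def parse_media_url(url: str) -> tuple[str, str]:
    """Split a media URL back into (identifier, filename)."""
    parts = urlsplit(url).path.split("/")
    # Expected path: /media/<identifier>/<filename>
    if len(parts) != 4 or parts[0] != "" or parts[1] != "media":
        raise ValueError(f"not a media URL: {url}")
    return parts[2], unquote(parts[3])
```

Because the identifier and the filename are the only variable parts, the same two helpers would work for images, submissions and task files alike.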
Back-end¶
The inner workings of this API consist of four parts: the check, the database, the file server and the delivery service.
The check¶
Before even trying to find the file, the API has to check whether the user has the rights to download it. These rights are stored in a separate column in the lookup database. See permissions-label.
Note
Caching policy
At this stage the API also decides the caching policy for the file. Private files should not be cached outside of the in-app delivery service. Public files, however, should be marked as cacheable.
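A minimal sketch of the check stage, under stated assumptions: the `FileRecord` shape, the `readers` set and the exact `Cache-Control` values are hypothetical, since the real permission format is described elsewhere (permissions-label). The status codes follow the return codes listed for the endpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileRecord:
    is_public: bool
    readers: frozenset  # usernames allowed to read (hypothetical format)


def check_access(user: Optional[str], record: FileRecord) -> int:
    """Return the HTTP status the check stage would produce."""
    if record.is_public:
        return 200
    if user is None:
        return 401  # user must log in first
    if user not in record.readers:
        return 403  # logged in, but not allowed to read
    return 200


def cache_control(record: FileRecord) -> str:
    """Pick a Cache-Control value; the max-age is an arbitrary example."""
    if record.is_public:
        return "public, max-age=86400"
    # Keep private files out of browser and intermediary caches.
    return "private, no-store"
```

Deciding both the status and the caching header in the same step means the delivery stage never has to look at permissions again.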
The database¶
Each row uses the identifier plus the original filename as its primary key. For that key, it stores the fs path to the file (including the sharding key) and the read and modify permissions.
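One way to picture this table is the following SQLite sketch. The table name, column names and the plain-text permission columns are assumptions made for illustration; only the composite primary key and the stored fields come from the description above.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory lookup table with a composite primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE media_lookup (
            identifier         TEXT NOT NULL,
            filename           TEXT NOT NULL,
            fs_path            TEXT NOT NULL,  -- includes the sharding key
            read_permissions   TEXT NOT NULL,
            modify_permissions TEXT NOT NULL,
            PRIMARY KEY (identifier, filename)
        )
        """
    )
    return conn


def register_file(conn, identifier, filename, fs_path, read_perm, modify_perm):
    """Insert one row; a duplicate (identifier, filename) pair is rejected."""
    conn.execute(
        "INSERT INTO media_lookup VALUES (?, ?, ?, ?, ?)",
        (identifier, filename, fs_path, read_perm, modify_perm),
    )


def lookup(conn, identifier, filename):
    """Return (fs_path, read_permissions, modify_permissions) or None."""
    return conn.execute(
        "SELECT fs_path, read_permissions, modify_permissions "
        "FROM media_lookup WHERE identifier = ? AND filename = ?",
        (identifier, filename),
    ).fetchone()
```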
The file server¶
To maximize efficiency, files are sharded across multiple SMB servers. The servers are all mounted in the content delivery container, and all have the same top-level structure. The file system structure is described in File System.
The delivery¶
It is logical to use the X-Sendfile feature (which nginx provides through the X-Accel-Redirect header). Since the file servers are mounted at some path in the nginx container, the sharding can be resolved in the database step, just by specifying a different top-level folder corresponding to an SMB server instance.
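A rough sketch of how the application could hand delivery over to nginx. `X-Accel-Redirect` is the header nginx uses for this; the `/protected` prefix (which would have to match an `internal` location in the nginx config), the shard names and the function name are all assumptions.

```python
import posixpath

# Hypothetical nginx `internal` location under which the SMB shards are mounted.
INTERNAL_PREFIX = "/protected"


def delivery_headers(shard: str, relative_path: str) -> dict:
    """Build the response headers that delegate file delivery to nginx."""
    internal_uri = posixpath.normpath(
        posixpath.join(INTERNAL_PREFIX, shard, relative_path)
    )
    # Reject paths that would escape the chosen shard's top-level folder.
    if not internal_uri.startswith(f"{INTERNAL_PREFIX}/{shard}/"):
        raise ValueError("path escapes the shard root")
    return {"X-Accel-Redirect": internal_uri}
```

With this layout, switching shards is only a matter of which top-level folder the stored fs path starts with.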
Uploading¶
Files should not be uploaded directly, but rather through other, more
specific endpoints (like a new task, or a new submission). Those endpoints
should return a 201 Created response with a corresponding
/media/:longidentifier/filename.ext URI. For a folder-like upload, the
endpoint should return a URI corresponding to a manifest file, or any other
file deemed to be the most important. If there is none, it can return a link to
any file in the folder.
In order to ensure a unique identifier, it should be computed in the following way for an individual file:
sha-256( TIME_OF_UPLOAD + FILE_CONTENTS + AUTHOR_USERNAME + SECRET_SALT )
FILE_CONTENTS = PLANNED_FS_PATH + REAL_FILE_CONTENTS
For a folder-like upload, the contents of all files should be concatenated to compute the hash.
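The formula above could be implemented along these lines. This is a sketch, not the actual implementation: the function name, the string encoding of the timestamp and the order in which a folder-like's files are passed in are assumptions, and the caller would need to keep that order stable.

```python
import hashlib


def compute_identifier(
    time_of_upload: str,
    files: list,  # list of (planned_fs_path, real_file_contents) pairs
    author_username: str,
    secret_salt: bytes,
) -> str:
    """Return the 64-character hex identifier for a file or a folder-like."""
    digest = hashlib.sha256()
    digest.update(time_of_upload.encode("utf-8"))
    # FILE_CONTENTS = PLANNED_FS_PATH + REAL_FILE_CONTENTS, concatenated
    # over every file when the upload is a folder-like.
    for planned_fs_path, contents in files:
        digest.update(planned_fs_path.encode("utf-8"))
        digest.update(contents)
    digest.update(author_username.encode("utf-8"))
    digest.update(secret_salt)
    return digest.hexdigest()
```

Feeding the pieces to the hash incrementally gives the same result as hashing their concatenation, without holding every file in memory at once.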
Folder-likes¶
A folder-like upload is just a grouping feature for the end user. Some actions require multiple files uploaded at once (case study: tasks). In this case it is convenient to provide access to said files in a consistent manner. One way to achieve this might be to use the same long identifier for the whole group, only leaving the filenames to differentiate them.
Warning
This method does not have any connection to a physical folder on a hard drive. It only lets multiple files share the same long identifier. Beyond that, they may have completely unrelated fs paths mapped to them.
Migration¶
Folder-likes are compatible with regular file uploads. They only add a check on the provided filename, which should be done anyway. For example, if a regular file /id1/abc.txt is registered, the URL /id1/def.txt should not resolve, even though it has a correct id. So, a folder upload of abc.txt, def.txt and ghi.txt would look like this:
As a regular upload:
/id1/abc.txt
/id2/def.txt
/id3/ghi.txt
As a folder-like upload:
/id4/abc.txt
/id4/def.txt
/id4/ghi.txt
Mixing both styles is acceptable within the same database, which makes migrating from one to the other easy.
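The filename check can be illustrated with a lookup keyed on the (identifier, filename) pair. The registry contents and fs paths below are made up for the example; a real lookup would go through the database.

```python
# Hypothetical registry mixing a regular upload (id1) and a folder-like (id4).
REGISTRY = {
    ("id1", "abc.txt"): "/smb1/aa/abc.txt",
    ("id4", "abc.txt"): "/smb2/bb/abc.txt",
    ("id4", "def.txt"): "/smb1/cc/def.txt",
    ("id4", "ghi.txt"): "/smb3/dd/ghi.txt",
}


def resolve(identifier: str, filename: str):
    """Return the fs path only when both the id and the filename match."""
    return REGISTRY.get((identifier, filename))
```

Here `resolve("id1", "def.txt")` returns `None` even though `id1` is a known identifier, while all three names resolve under `id4`.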
Task uploads¶
Warning
Consult with the main problemsetting doc problemsetting-label
Task file inspection is the main use for this feature, so it is necessary to discuss it in more depth. Not all files are exposed: there is a list of predetermined files, and the uploader can also specify a set of public files. This also flattens all the exported files into one folder-like. Example:
task-folder
├── lib
│   ├── olymp.sty
│   ├── problem.tex
│   └── statements.ftl
├── private
├── problem.yaml
├── public
│   ├── change_log.txt
│   └── favicon.ico
├── scripts
│   ├── build.sh
│   ├── check.sh
│   └── validator.sh
├── solutions
│   ├── ermolin.cpp
│   ├── starkov.cpp
│   └── useless.cpp
├── src
│   ├── check.py
│   ├── test_generator.py
│   ├── tron.tex
│   └── validator.cpp
├── statements
│   ├── eng
│   └── rus
├── tests
└── visualizer
    ├── eng
    │   ├── visualizer-eng.css
    │   ├── visualizer-eng.html
    │   └── visualizer-eng.js
    └── rus
        ├── visualizer-rus.css
        ├── visualizer-rus.html
        └── visualizer-rus.js
File paths:
/taskid/statement-eng.pdf
/taskid/main.css
/taskid/main.js
/taskid/main.html
/taskid/change_log.txt
/taskid/favicon.ico
Endpoints¶
There is only one endpoint available:
GET /media/:longid/:filename¶
Parameters¶
Parameter    Description
:longid      64 hex chars
:filename    complies with ^[a-zA-Z0-9_\-()\[\]{}!@#$%^&*:,.?]{3,}\.[a-z]{1,16}$
Return codes. See error-response-label¶
Code    Description
200     Request successful, file in the body
301     This file is considered to be replaced by another
401     User must log in before accessing this file
403     User does not have sufficient permissions to access this file
404     File does not exist or the user does not have sufficient permissions
Examples:
GET /media/0bcee7d9d61e050c32682a527bcf721d0a05149dabf9bbca7e09fae6f1dd089b/main.c
200 OK
#include<stdio.h>
...

GET /media/63c90b247a3c836d678d2a4a5a2678c0e9df2bd44675fd0c035c028984847b11/statement.pdf
301 MOVED PERMANENTLY
Location: https://ourhosting.com/media/805cb05bbe3102db8359564091bed769e0875aecc529787768663570cea75a14/statement.pdf

GET /media/1b27cf60245964a7e5a7ce09030cc63d9909120e9d2391219e464469c6f2176a/answers.txt
404 NOT FOUND
{ "error": "No file with this identifier" }
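The parameter constraints of this endpoint can be checked before any database lookup. The sketch below assumes that the filename pattern is meant to have its hyphen, brackets and the extension dot escaped, and that identifiers are lowercase hex as in the examples; both are assumptions, as is the function name.

```python
import re

# Assumed: lowercase hex, matching the example identifiers.
LONGID_RE = re.compile(r"[0-9a-f]{64}")

# Assumed reading of the filename pattern with metacharacters escaped.
FILENAME_RE = re.compile(
    r"[a-zA-Z0-9_\-()\[\]{}!@#$%^&*:,.?]{3,}\.[a-z]{1,16}"
)


def is_valid_request(longid: str, filename: str) -> bool:
    """Check both path parameters against their documented shapes."""
    return (
        LONGID_RE.fullmatch(longid) is not None
        and FILENAME_RE.fullmatch(filename) is not None
    )
```

Rejecting malformed parameters early keeps obviously invalid requests away from the permission check and the lookup table.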