If you’re having a ‘play’, for want of a better word, with uploading data into object storage for use with something like ADW, then you’re in luck – I thought I would put together a short series of blogs to help make that easier.
This will hopefully solve a few issues, and ultimately stop you having to use your home WiFi and laptop as a middle man when moving data about. You can’t upload a 10GB CSV file while also watching Love Island.
First you need a VM in OCI; creating this is pretty straightforward so I’m going to skip that bit. The only thing I would say is that if you create a VCN with the default settings, remove the default inbound security rules that allow SSH and PING from 0.0.0.0/0, i.e. the known universe. Replace them with rules that only allow the IP from ‘whatismyip.com’ or your company’s corporate IP (assuming this is just for testing and you’re not using a VPN).
Once you have the VM in a public subnet with a public IP and you can access it, the quickest way to download data from something like Kaggle is one of the Chrome download extensions. These translate the download URL on your laptop into a curl or wget command which can be copied and pasted onto the VM and executed.
Copy and paste it as is to the VM and it’s done in 7 seconds… Oracle has much better download speed than my WiFi.
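A copied command can occasionally save an error or login page instead of the CSV, so it may be worth a quick look at what actually landed on the VM. The snippet below is only a sketch: inspect_download is a hypothetical helper name, and /home/opc/orders.csv is assumed to be where the file was saved.

```shell
# inspect_download FILE: print the size, line count and first rows of a download.
inspect_download() {
  file="$1"
  # -s is true only if the file exists and is not empty
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  bytes=$(wc -c < "$file" | tr -d ' ')
  lines=$(wc -l < "$file" | tr -d ' ')
  echo "$file: $bytes bytes, $lines lines"
  # the first rows should look like a CSV header plus data, not HTML
  head -n 3 "$file"
}

# Example (assumed path):
# inspect_download /home/opc/orders.csv
```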
So now you have the data on the VM, you want to upload it to object storage. When you create a bucket in OCI it lives in a specific compartment and region and, more importantly, it’s private.
Now there are a few ways to upload that data. If you’re trying to get it done quickly you might be tempted to make the bucket public, use admin credentials for the user, and just throw the data up there. Unfortunately, that’s probably not a great idea. It’s quite easy to be slightly more secure than that, without too much extra hassle.
So instead let’s create a specific user in a specific group with a specific policy, which can be used from the VM to upload data. Create a group called ObjectAdmins, then create a policy called ObjectPolicy and add the two statements below. These are the policy statements which will be evaluated whenever a user in that group makes a request:
Allow group ObjectAdmins to manage buckets in compartment dsp-oac-adw
Allow group ObjectAdmins to manage objects in compartment dsp-oac-adw
This is still quite generic and you could get a lot more granular (maybe that’s the next iteration of this blog). Next we create a user and add that user to the group ObjectAdmins. That user doesn’t have any privileges granted directly, only the ones based on its group membership. So that user can manage buckets and objects in that one compartment, but it can’t do anything else. We now have a policy and a user. As we are soon going to be uploading this data from the VM we downloaded it to, we need to install and configure the OCI CLI for the specific user we just created.
Simply run oci setup config and put in a few details (you will need your tenancy OCID and the new user’s OCID from the console); then take the public key it writes out (oci_api_key_public.pem in the output below) and upload that under API keys for the user you just created.
[opc@vm1oac ~]$ oci setup config
Enter a location for your config [/home/opc/.oci/config]:
Enter a user OCID: ocid1.user.oc1..XXXXX (Your object storage user)
Enter a tenancy OCID: ocid1.tenancy.oc1..XXXXX
Enter a region (e.g. ca-toronto-1, eu-frankfurt-1, uk-london-1, us-ashburn-1, us-gov-ashburn-1, us-gov-chicago-1, us-gov-phoenix-1, us-langley-1, us-luke-1, us-phoenix-1): uk-london-1
Do you want to generate a new RSA key pair? (If you decline you will be asked to supply the path to an existing key.) [Y/n]: Y
Enter a directory for your keys to be created [/home/opc/.oci]:
Enter a name for your key [oci_api_key]:
File /home/opc/.oci/oci_api_key_public.pem already exists, do you want to overwrite? [y/N]: y
Public key written to: /home/opc/.oci/oci_api_key_public.pem
…
…
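Once the wizard finishes, the config file should hold a [DEFAULT] profile with five entries: user, fingerprint, key_file, tenancy and region. As a rough sanity check before going any further, something like the following can flag a missing entry. check_oci_config is a hypothetical helper, not part of the OCI CLI; it only checks that each key is present and has a value, and doesn’t validate the values themselves.

```shell
# check_oci_config [FILE]: report any expected key missing from an OCI CLI config file.
# Defaults to ~/.oci/config, the location accepted in the prompts above.
check_oci_config() {
  file="${1:-$HOME/.oci/config}"
  missing=0
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
  for key in user fingerprint key_file tenancy region; do
    # accept "key=value" or "key = value" at the start of a line
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$file"; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_oci_config /home/opc/.oci/config && echo "config looks complete"
```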
Finally you can run the OCI command to upload the data to object storage:
[opc@vm1oac ~]$ oci os object put -bn oacadw --file /home/opc/orders.csv --name orders.csv --no-multipart
This doesn’t work… the reason being that it can’t access the bucket. To enable access to the bucket I have two options: make the bucket public and upload to it, or keep it private and use a service gateway. I will opt for the second one. Creating the service gateway is simple, and we can then add a route for it to the route table.
Here we can see we have two routes: one for the IGW and now one for the SGW for OCI London Object Storage.
Now we can upload the data to object storage using a specific user and a service gateway, and we don’t have to download tons of data to a laptop and push it back into the Cloud.
[opc@vm1oac ~]$ oci os object put -bn oacadw --file /home/opc/orders.csv --name orders.csv --no-multipart
Uploading object [####################################] 100%
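If there is more than one file to move, the same command can be looped over a directory. This is only a sketch: upload_all is a hypothetical wrapper, it reuses exactly the flags shown above, and it assumes the bucket already exists and that each object should be named after its file.

```shell
# upload_all DIR BUCKET: upload every .csv in DIR with the flags used above.
upload_all() {
  dir="$1"
  bucket="$2"
  for f in "$dir"/*.csv; do
    # when nothing matches, the glob stays literal, so skip non-files
    [ -f "$f" ] || continue
    oci os object put -bn "$bucket" --file "$f" --name "$(basename "$f")" --no-multipart || return 1
  done
}

# Example (paths and bucket as used in this post):
# upload_all /home/opc oacadw
```

Stopping at the first failed upload is a deliberate choice here, so a permissions or routing problem shows up once rather than once per file.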
Read on to Part 2: Exploring ADW Part 2: Loading Data into ADW