Notebooks have been a staple in bringing developers and data scientists together and facilitating collaboration. By integrating code from different languages with human annotations (including text and graphs) on a single page, a notebook allows for high levels of interactivity and transparency. And now, thanks to our brand new image, setting up an open-source Jupyter Notebook on Leafcloud has never been easier.
The prominence of data has grown over the years, and developers increasingly need to work closely with data scientists and analysts. While many developers know a thing or two about data, and many data scientists dabble in code, forming an efficient collaboration within a mixed team can still take a lot of work.
Notebooks solve this by integrating the code with computational output and annotations in the form of text or visual aids. They speed up workflows and make it much, much easier to share results in real time. One of the best and most popular is Jupyter Notebook, which is available on GitHub. Because it is open-source, it is free and easily integrated into any workflow. At Leafcloud, we warmly recommend that developers working on data projects have a look at Jupyter Notebook.
Setting up our Jupyter image
For convenience, Leafcloud offers a pre-built Jupyter image with Nvidia CUDA, TensorFlow, and PyTorch pre-installed. Once you create a (free) account on Leafcloud, you can start by following these instructions. Also, don’t forget to check out our previous article on when to use Jupyter, and when not to.
One of the great things about Jupyter Notebook is that it's easy to set up and use for your specific data project, and the image available on Leafcloud makes it even easier. Let's demonstrate by using a pre-trained model to turn a picture of myself into a cow... sort of.
We'll be using Hugging Face, a data platform for Machine Learning projects. The main advantage of Hugging Face is that it offers pre-trained models which only need some fine-tuning. Because pre-trained models require fewer computing resources, they reduce the carbon footprint and save time. Its tools are also easy to use, allowing your data team to focus on what they are good at with minimal instruction.
First, we install the Hugging Face Transformers library, which provides the APIs and tools needed to download and use the pre-trained models.
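A minimal install cell could look like this (on the Leafcloud Jupyter image, PyTorch is already pre-installed, so some of these packages may already be there):

```python
# Install the Transformers library plus the packages used later in this walkthrough.
# The leading "!" runs the command as a shell command inside the notebook.
!pip install transformers torch pillow numpy matplotlib
```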
Next, we initialize the feature extractor and model, and put the model on the GPU.
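Here is a sketch of that step, assuming the openai/imagegpt-small checkpoint from the Hugging Face Hub (medium and large variants exist as well):

```python
import torch
from transformers import ImageGPTFeatureExtractor, ImageGPTForCausalImageModeling

# The "small" checkpoint is an assumption; swap in imagegpt-medium or imagegpt-large if you like.
model_id = "openai/imagegpt-small"

feature_extractor = ImageGPTFeatureExtractor.from_pretrained(model_id)
model = ImageGPTForCausalImageModeling.from_pretrained(model_id)

# Put the model on the GPU if one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```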
We only feed the start of sequence (SOS) special token to the model and let it generate 32x32 = 1024 pixel values using the generate() method. Each pixel value is one of 512 possible color clusters.
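A sketch of the unconditional generation step; the sampling settings (do_sample, temperature, top_k) are assumptions you can tune to taste:

```python
import numpy as np

batch_size = 8

# The SOS token is the last id in the vocabulary (512 color clusters + 1 SOS token).
sos_token_id = model.config.vocab_size - 1
context = torch.full((batch_size, 1), sos_token_id, device=device)

# Generate all 32x32 = 1024 pixel tokens on top of the SOS token we started with.
output = model.generate(
    input_ids=context,
    max_length=1 + 32 * 32,
    do_sample=True,
    temperature=1.0,
    top_k=40,
)

# Map each generated cluster id back to an RGB value and reshape to a 32x32x3 image.
clusters = np.array(feature_extractor.clusters)  # (512, 3) palette, values in [-1, 1]
samples = output[:, 1:].cpu().numpy()  # drop the SOS token
samples_img = [
    np.rint(127.5 * (clusters[s] + 1.0)).reshape(32, 32, 3).astype(np.uint8)
    for s in samples
]
```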
Tokenize Cropped Images for Image Completion
Let’s see how ImageGPT will complete an image if we only give it the top half of a picture.
Now, let's check 8 completions for a given image, in this case the picture of myself from above.
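Something like the following; the filename is a placeholder for your own picture:

```python
from PIL import Image

# Placeholder filename: point this at whichever picture you want to complete.
image = Image.open("me.jpg").convert("RGB")

batch_size = 8
# Work with 8 copies of the same picture so we can sample 8 different completions.
images = [image] * batch_size
```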
We prepare the images using ImageGPTFeatureExtractor, which resizes each image to 32x32x3, normalizes it, and then applies color clustering. Finally, it flattens the pixel values out into a sequence of 32x32 = 1024 token values.
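With the feature extractor from before, this boils down to a single call (in recent Transformers versions the color-quantized pixel values come back under input_ids):

```python
# Resize to 32x32, normalize, and color-quantize each image into 1024 cluster ids.
encoding = feature_extractor(images, return_tensors="pt")
pixel_ids = encoding.input_ids  # shape: (batch_size, 1024)
```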
Next, we keep only the first 512 tokens (pixel values), i.e. the top half of each image.
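Which is a single slicing operation on the token tensor from the previous step:

```python
# 512 of 1024 tokens = the top half of each 32x32 image; move the primers to the GPU.
primers = pixel_ids[:, : 32 * 32 // 2].to(device)
```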
We can visualize both the original (lower-resolution and color-clustered) images and the cropped ones below:
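One way to do this with matplotlib, reusing the color clusters stored in the feature extractor to turn token ids back into RGB pixels (a sketch; the missing bottom half is simply padded with black):

```python
import numpy as np
import matplotlib.pyplot as plt

clusters = np.array(feature_extractor.clusters)  # (512, 3) palette, values in [-1, 1]

def ids_to_image(ids, n_px=32):
    """Map a sequence of cluster ids back to a uint8 RGB image, padding missing pixels with black."""
    pixels = np.rint(127.5 * (clusters[ids] + 1.0)).astype(np.uint8)
    padded = np.zeros((n_px * n_px, 3), dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(n_px, n_px, 3)

# Top row: the full color-clustered 32x32 images; bottom row: the cropped top halves.
fig, axes = plt.subplots(2, batch_size, figsize=(2 * batch_size, 4))
for i in range(batch_size):
    axes[0, i].imshow(ids_to_image(pixel_ids[i].numpy()))
    axes[1, i].imshow(ids_to_image(primers[i].cpu().numpy()))
    axes[0, i].axis("off")
    axes[1, i].axis("off")
plt.show()
```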
Conditional Image Completion
Now let ImageGPT fill in the rest. For this, we prepend the SOS token to each cropped sequence. Note that we can leverage all the possibilities of Hugging Face's generate() method, which are explained in detail in this blog post.
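A sketch of the conditional generation step, again with assumed sampling settings:

```python
# Prepend the SOS token to each primer, then let the model generate the remaining 512 pixel tokens.
sos = torch.full((batch_size, 1), model.config.vocab_size - 1, device=device)
input_ids = torch.cat((sos, primers), dim=-1)

output = model.generate(
    input_ids=input_ids,
    max_length=1 + 32 * 32,  # SOS token plus all 1024 pixel tokens
    do_sample=True,
    temperature=1.0,
    top_k=40,
)

completions = output[:, 1:].cpu().numpy()  # drop the SOS token again
```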
And now we can combine them into a single image:
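For example, by reusing the ids_to_image helper from the visualization sketch above to decode the completed sequences and plot all 8 results in one figure:

```python
# Decode the completed token sequences and show the 8 completions side by side.
fig, axes = plt.subplots(1, batch_size, figsize=(2 * batch_size, 2))
for i in range(batch_size):
    axes[i].imshow(ids_to_image(completions[i]))
    axes[i].axis("off")
plt.show()
```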
A-MOO-sing, isn’t it?
Not all black and white
This illustrates what you can do with Jupyter Notebook in a pinch. Of course, most data scientists prefer to get actual work done instead of toying around. Whether it's data cleaning, transformation, simulations, analysis, visualization, modeling, machine learning, or deep learning, Jupyter Notebook offers a free and easy way to do it. We at Leafcloud will happily help you set up your data workflow.
So, have a go and check out our tutorials to get started!