How to clear notebook outputs without editing them


We generally recommend saving notebooks without large amounts of data in their outputs. Large notebooks cannot be opened in preview mode, and they can occasionally trigger egress alerts that require follow-up from our security team. In rare cases, even opening a notebook for editing triggers an egress alert because of the size of its outputs, which makes it difficult to remove those outputs from within the notebook itself.

 

If this happens with one of your notebooks, you can use either of the following methods to create a duplicate of the notebook, without outputs, in the same workspace. Method 1 runs Python code in a new notebook, while Method 2 uses the workspace terminal.

Method 1: Notebooks

1. Start by creating a new Python notebook in the same workspace as the large notebook. It can be named anything.

 

2. Copy and paste the following code in the notebook. 

 

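import nbformat


def remove_outputs(nb):
    # Drop the stored outputs of every code cell; the cell source is untouched.
    for cell in nb.cells:
        if cell.cell_type == 'code':
            cell.outputs = []
            cell.execution_count = None


def clear_notebook(old_ipynb, new_ipynb):
    # Read the large notebook as-is, without converting its format version.
    with open(old_ipynb, 'r', encoding='utf8') as f:
        nb = nbformat.read(f, nbformat.NO_CONVERT)

    remove_outputs(nb)

    # Write the cleared copy under the new name; the original file is not modified.
    with open(new_ipynb, 'w', encoding='utf8') as f:
        nbformat.write(nb, f, nbformat.NO_CONVERT)


# Change the notebook names here.
# old_ipynb is the name of the large notebook.
# new_ipynb is the name of the cleared copy that will be created.
old_ipynb = "test.ipynb"
new_ipynb = "new.ipynb"

clear_notebook(old_ipynb, new_ipynb)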

 

Note: Python is sensitive to indentation, so the code's indentation must match the image below exactly for it to run correctly.

Picture1.png

 

3. Change the name of the notebook in the line old_ipynb = "test.ipynb" to the name of the large notebook you'd like to duplicate without outputs. You can find the notebook's name by clicking the notebook in your workspace and checking the URL; the name is the last part of the URL, as shown below.

 

Picture2.png

Picture3.png

 

4. You can also change the name of the new duplicated notebook ("new.ipynb"), but this isn't required unless you already have a notebook called "new.ipynb". If you do, change the name so that the existing notebook is not overwritten.

 

5. Run the code by going to Cell > Run All. 

Picture4.png

 

6. After running the code, you should get a new notebook called "new.ipynb". If you can't find the notebook on your workspace's Analysis page, go to File > Open, double-click the notebook to open it, and save it as a copy (new_5.ipynb). This is shown below.

Picture5.png

Picture12.png

Picture7.png

Picture8.png

Picture17.png

Picture18.png

 

7. After this step, you should be able to find the newly saved notebook (new_5.ipynb) in your workspace.
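
To confirm that the copy was written without outputs, a quick check along the following lines can be run in the same notebook. This is a minimal sketch using only the Python standard library. It assumes the notebooks use the version 4 JSON layout (a top-level "cells" list), and the helper name summarize_notebook is hypothetical; it is not part of the workbench or of nbformat.

```python
import json
import os


def summarize_notebook(path):
    """Return the file size and how many code cells still carry outputs."""
    with open(path, "r", encoding="utf8") as f:
        nb = json.load(f)
    # Assumes the version 4 notebook layout, where cells sit in a top-level list.
    code_cells = [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]
    with_outputs = [c for c in code_cells if c.get("outputs")]
    return {
        "size_bytes": os.path.getsize(path),
        "code_cells": len(code_cells),
        "cells_with_outputs": len(with_outputs),
    }


# Example file names from the steps above; change them to match your notebooks.
for name in ("test.ipynb", "new.ipynb"):
    if os.path.exists(name):
        print(name, summarize_notebook(name))
```

For the cleared copy, cells_with_outputs should be 0 and size_bytes should be much smaller than for the original. Because the check reads the file from disk rather than opening the notebook in the editor, it does not render the large outputs.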

Method 2: Workspace Terminal

1. Log in to the RW and open the workspace of interest


2. Create a Cloud Analysis Environment


3. Open the Cloud Analysis Terminal by clicking on its icon, and copy the URL from the browser


a. Copy the workspace name (wsname in the example below) from the URL:
Example: https://.../workspaces/aou-rw-xxxxxxx/wsname/terminals


4. In the terminal window, run the following commands


a. Change directory to workspace
$ cd workspaces/wsname/


b. List contents of workspace in long format. The .ipynb files here are copied from your WORKSPACE_BUCKET
$ ls -l


c. Find your workspace bucket and copy the location.
$ echo $WORKSPACE_BUCKET
Example: gs://fc-secure-…


d. List all notebooks in the bucket in long format. Note the very large .ipynb file whose outputs you wish to clear (all outputs will be lost in the cleared copy)
$ gsutil ls -l gs://fc-secure-…/notebooks
Example: 98765432 2022-10-25T12:15:41Z gs://fc-secure-…/notebooks/my_very_large.ipynb


e. Copy the my_very_large.ipynb file from the workspace bucket to the current local directory
$ gsutil cp gs://fc-secure-…/notebooks/my_very_large.ipynb .


f. Verify the large file is copied
$ ls -l my_very_large.ipynb


g. Remove all cell outputs from my_very_large.ipynb and write the result to a new file, my_no_output.ipynb
$ jupyter nbconvert my_very_large.ipynb --to notebook \
--ClearOutputPreprocessor.enabled=True --output my_no_output

 

h. Verify that my_no_output.ipynb was created and that it is much smaller than the original my_very_large.ipynb file
$ ls -l


i. Copy the new my_no_output.ipynb file to your bucket
$ gsutil cp my_no_output.ipynb gs://fc-secure-…/notebooks


j. Verify the new my_no_output.ipynb is in your bucket
$ gsutil ls -l gs://fc-secure-…/notebooks


k. Navigate to workspaces -> wsname -> ANALYSIS and check that the new file my_no_output.ipynb exists


l. Open the new notebook file and fix the lines that generated the copious output


m. Delete the old my_very_large.ipynb notebook
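
Step l requires knowing which cells produced the large outputs, and the cleared copy no longer shows them. One way to find out is to measure the outputs in the local copy of the original file before deleting it. The following is a minimal sketch using only the Python standard library; it could be saved as a script and run with python3 in the terminal. It assumes the version 4 notebook layout, and the function name largest_outputs is hypothetical.

```python
import json
import os


def largest_outputs(path, top=5):
    """Return (output_bytes, cell_index, first_source_line) for the heaviest code cells."""
    with open(path, "r", encoding="utf8") as f:
        nb = json.load(f)
    ranked = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # Approximate the stored size of the outputs by re-serializing them.
        size = len(json.dumps(cell.get("outputs", [])).encode("utf8"))
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines or one string
            source = "".join(source)
        stripped = source.strip()
        first_line = stripped.splitlines()[0] if stripped else ""
        ranked.append((size, index, first_line))
    # Largest outputs first.
    return sorted(ranked, reverse=True)[:top]


if __name__ == "__main__":
    # File name taken from the example steps above; change it to match your notebook.
    path = "my_very_large.ipynb"
    if os.path.exists(path):
        for size, index, first_line in largest_outputs(path):
            print(f"cell {index}: {size} bytes of output, starts with: {first_line}")
```

The cell indexes count all cells, including markdown cells, starting from 0, so they can be matched against the cell order in the cleared notebook.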

 

 

 

 

 
