Skip to content

Conversation

@srikanth-scale
Copy link
Contributor

Working on new python API please suggest docs/improvements/cleanups

ty

@srikanth-scale
Copy link
Contributor Author

Working well, tested locally but extremely ugly -> @Nastia39 need your suggestions to cleanup everything

Copy link
Contributor

@sasha-scale sasha-scale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistically it looks pretty good to me, I haven't had a chance to check out and test yet, but I will absolutely do so later today

Copy link
Contributor

@sasha-scale sasha-scale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, was pretty intuitive to figure out.

An additional improvement is that for items upload the user must convert to json to see the result:

response.json()

Whereas for annotations/predictions you simply do print(response). It would be great to standardize this.

Will also have to think about how to communicate to existing users that they need to install this update


class DatasetItem:

def __init__(self, image_location: str, reference_id: str, metadata: dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should make metadata field optional. Currently not optional because it doesn't have a default value

"image_url": self.image_url,
"reference_id": self.reference_id,
"metadata": self.metadata
} No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit need additional newline at bottom of file

path_components = path.split('/')
return not ('https:' in path_components or 'http:' in path_components or 's3:' in path_components)

def _local_file_exists(self, path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets add a str or repr method so that when a user prints(DatasetItem) they get a nice string representation instead of <nucleus.dataset_item.DatasetItem object at 0x11b016748>

I think just doing:
def repr(self):
return str(self.to_payload)

would be sufficient

nucleus/model.py Outdated
self.metadata = metadata
self._client = client

def create_run(self, name: str, metadata: dict, dataset: Dataset, predictions: List[BoxPrediction]) -> ModelRun:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob can delete this method?

return self._client.commit_model_run(self.model_run_id, payload)

def predict(self, payload: dict) -> dict:
def predict(self, annotations: List[Any]) -> dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srikanth-scale FYI for docs, updated this method to take list of 2DBoxPredictions instead of raw JSON payload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotations: List[Union[Box2DPrediction, PolygonPrediction]]

Comment on lines 38 to 47
def _is_local_path(self, path: str) -> bool:
path_components = [comp.lower() for comp in path.split("/")]
return not (
"https:" in path_components
or "http:" in path_components
or "s3:" in path_components
)

def _local_file_exists(self, path: str):
return os.path.isfile(path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, there methods should be moved to utils file as there aren't a part of DatasetItem abstraction

nucleus/model.py Outdated

class Model:

def __init__(self, model_id: str, name: str, reference_id: str, metadata: dict, client):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metadata: Optional[dict]

return self._client.commit_model_run(self.model_run_id, payload)

def predict(self, payload: dict) -> dict:
def predict(self, annotations: List[Any]) -> dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotations: List[Union[Box2DPrediction, PolygonPrediction]]


def annotate(self, payload: dict) -> dict:
def annotate(
self, annotations: List[BoxAnnotation], batch_size=20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: batch_size: int = 20

return self._client.dataset_info(self.dataset_id)
return self._client.dataset_info(self.id)

def create_model_run(self, payload: dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this method to a new style as well

y: int,
width: int,
height: int,
metadata: dict = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metadata: Optional[dict] = None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add class for PolygonAnnotation

"reference_id": str,
"metadata": Dict[str, Any],
}
:return: { "model_id": str }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upd docs

@sasha-scale sasha-scale requested a review from jihan-yin February 4, 2021 21:37
Copy link
Contributor

@jihan-yin jihan-yin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits

```python
datasetItem1 = {"image_url": "http://<my_image_url>", "reference_id": "my_image_name.jpg",
"metadata": {"label": "0"}}
dataset_item_1 = DatasetItem(image_location="./1.jpeg", reference_id="1", metadata={"key": "value"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still explain or at least point out the three necessary fields necessary for a DatasetItem

Creates a new dataset based on payload params:
name -- A human-readable name of the dataset.
Returns a response with internal id and name for a new dataset.
:param payload: { "name": str }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should update comment

def populate_dataset(
self,
dataset_id: str,
payload: dict,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing about updating the function doc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused though on why we were using payload in the first place, instead of just passing a bunch of kwargs, especially in this case where we have both kwargs and payload. The usage seems kinda inconsistent too across this file

item_id: str = None,
metadata: Optional[Dict] = None,
):
if bool(reference_id) == bool(item_id):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a case of both reference and item id being populated? In which case the exception error may not be as clear. Also why don't we just check if either is equal to None?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As things stand now we actually require that exactly one of these fields be defined at time of upload. I agree this is pretty suboptimal, I think in the future we should remove this and implement precendence between the two ids instead :)

i.e. reference_id trumps dataset_item_id


model_run.predict(predictions)

return model_run
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we set the new model_run as a prop of the model class? Or at least have a way for the model instance to track the different runs that have been committed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that adding a model_run as a property will be a bit misleading, because we don't automatically pull in all of the model_runs associated with this model (other runs could have been created previously). I think the problem you're pointing out here is super valid though: how does a user fetch all model_runs associated with a model? In the future we should add API endpoints for:

  • get_model
  • delete model
  • get_model_runs for model

def __init__(
self,
label: str,
vertices: List[Any],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not have a type for generic 2d points?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet lol, polygon annotation was a bit of an afterthought on this PR, unfortunately. Left it as a TODO for the future :) good suggestion!

Copy link
Contributor

@sasha-scale sasha-scale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's ship it.

def iloc(self, i: int) -> dict:
def iloc(
self, i: int
): # TODO -> List[Union[BoxPrediction, PolygonPrediction]]:
Copy link
Contributor

@Nastia39 Nastia39 Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this to fix typing issue, has to be removed

def refloc(self, reference_id: str) -> dict:
def refloc(
self, reference_id: str
): # TODO -> List[Union[BoxPrediction, PolygonPrediction]]:
Copy link
Contributor

@Nastia39 Nastia39 Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this to fix typing issue, has to be removed [2]

@Nastia39 Nastia39 merged commit ecda3c9 into master Feb 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants