-
Notifications
You must be signed in to change notification settings - Fork 11
V0.5 #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Working well, tested locally but extremely ugly -> @Nastia39 need your suggestions to cleanup everything |
sasha-scale
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistically it looks pretty good to me, I haven't had a chance to check out and test yet, but I will absolutely do so later today
sasha-scale
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally, was pretty intuitive to figure out.
An additional improvement is that for items upload the user must convert to json to see the result:
response.json()
Whereas for annotations/predictions you simply do print(response). It would be great to standardize this.
Will also have to think about how to communicate to existing users that they need to install this update
nucleus/dataset_item.py
Outdated
|
|
||
| class DatasetItem: | ||
|
|
||
| def __init__(self, image_location: str, reference_id: str, metadata: dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should make metadata field optional. Currently not optional because it doesn't have a default value
nucleus/dataset_item.py
Outdated
| "image_url": self.image_url, | ||
| "reference_id": self.reference_id, | ||
| "metadata": self.metadata | ||
| } No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit need additional newline at bottom of file
nucleus/dataset_item.py
Outdated
| path_components = path.split('/') | ||
| return not ('https:' in path_components or 'http:' in path_components or 's3:' in path_components) | ||
|
|
||
| def _local_file_exists(self, path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets add a str or repr method so that when a user prints(DatasetItem) they get a nice string representation instead of <nucleus.dataset_item.DatasetItem object at 0x11b016748>
I think just doing:
def repr(self):
return str(self.to_payload)
would be sufficient
nucleus/model.py
Outdated
| self.metadata = metadata | ||
| self._client = client | ||
|
|
||
| def create_run(self, name: str, metadata: dict, dataset: Dataset, predictions: List[BoxPrediction]) -> ModelRun: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob can delete this method?
nucleus/model_run.py
Outdated
| return self._client.commit_model_run(self.model_run_id, payload) | ||
|
|
||
| def predict(self, payload: dict) -> dict: | ||
| def predict(self, annotations: List[Any]) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srikanth-scale FYI for docs, updated this method to take list of 2DBoxPredictions instead of raw JSON payload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annotations: List[Union[Box2DPrediction, PolygonPrediction]]
nucleus/dataset_item.py
Outdated
| def _is_local_path(self, path: str) -> bool: | ||
| path_components = [comp.lower() for comp in path.split("/")] | ||
| return not ( | ||
| "https:" in path_components | ||
| or "http:" in path_components | ||
| or "s3:" in path_components | ||
| ) | ||
|
|
||
| def _local_file_exists(self, path: str): | ||
| return os.path.isfile(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, there methods should be moved to utils file as there aren't a part of DatasetItem abstraction
nucleus/model.py
Outdated
|
|
||
| class Model: | ||
|
|
||
| def __init__(self, model_id: str, name: str, reference_id: str, metadata: dict, client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metadata: Optional[dict]
nucleus/model_run.py
Outdated
| return self._client.commit_model_run(self.model_run_id, payload) | ||
|
|
||
| def predict(self, payload: dict) -> dict: | ||
| def predict(self, annotations: List[Any]) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annotations: List[Union[Box2DPrediction, PolygonPrediction]]
nucleus/dataset.py
Outdated
|
|
||
| def annotate(self, payload: dict) -> dict: | ||
| def annotate( | ||
| self, annotations: List[BoxAnnotation], batch_size=20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: batch_size: int = 20
nucleus/dataset.py
Outdated
| return self._client.dataset_info(self.dataset_id) | ||
| return self._client.dataset_info(self.id) | ||
|
|
||
| def create_model_run(self, payload: dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this method to a new style as well
nucleus/annotation.py
Outdated
| y: int, | ||
| width: int, | ||
| height: int, | ||
| metadata: dict = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metadata: Optional[dict] = None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add class for PolygonAnnotation
| "reference_id": str, | ||
| "metadata": Dict[str, Any], | ||
| } | ||
| :return: { "model_id": str } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upd docs
jihan-yin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some nits
| ```python | ||
| datasetItem1 = {"image_url": "http://<my_image_url>", "reference_id": "my_image_name.jpg", | ||
| "metadata": {"label": "0"}} | ||
| dataset_item_1 = DatasetItem(image_location="./1.jpeg", reference_id="1", metadata={"key": "value"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still explain or at least point out the three necessary fields necessary for a DatasetItem
| Creates a new dataset based on payload params: | ||
| name -- A human-readable name of the dataset. | ||
| Returns a response with internal id and name for a new dataset. | ||
| :param payload: { "name": str } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should update comment
| def populate_dataset( | ||
| self, | ||
| dataset_id: str, | ||
| payload: dict, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing about updating the function doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused though on why we were using payload in the first place, instead of just passing a bunch of kwargs, especially in this case where we have both kwargs and payload. The usage seems kinda inconsistent too across this file
| item_id: str = None, | ||
| metadata: Optional[Dict] = None, | ||
| ): | ||
| if bool(reference_id) == bool(item_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be a case of both reference and item id being populated? In which case the exception error may not be as clear. Also why don't we just check if either is equal to None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As things stand now we actually require that exactly one of these fields be defined at time of upload. I agree this is pretty suboptimal, I think in the future we should remove this and implement precendence between the two ids instead :)
i.e. reference_id trumps dataset_item_id
|
|
||
| model_run.predict(predictions) | ||
|
|
||
| return model_run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we set the new model_run as a prop of the model class? Or at least have a way for the model instance to track the different runs that have been committed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried that adding a model_run as a property will be a bit misleading, because we don't automatically pull in all of the model_runs associated with this model (other runs could have been created previously). I think the problem you're pointing out here is super valid though: how does a user fetch all model_runs associated with a model? In the future we should add API endpoints for:
- get_model
- delete model
- get_model_runs for model
| def __init__( | ||
| self, | ||
| label: str, | ||
| vertices: List[Any], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not have a type for generic 2d points?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not yet lol, polygon annotation was a bit of an afterthought on this PR, unfortunately. Left it as a TODO for the future :) good suggestion!
sasha-scale
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's ship it.
nucleus/model_run.py
Outdated
| def iloc(self, i: int) -> dict: | ||
| def iloc( | ||
| self, i: int | ||
| ): # TODO -> List[Union[BoxPrediction, PolygonPrediction]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this to fix typing issue, has to be removed
nucleus/model_run.py
Outdated
| def refloc(self, reference_id: str) -> dict: | ||
| def refloc( | ||
| self, reference_id: str | ||
| ): # TODO -> List[Union[BoxPrediction, PolygonPrediction]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this to fix typing issue, has to be removed [2]
Working on new python API please suggest docs/improvements/cleanups
ty