Fix Protein initialisation #317 #318

Merged
merged 50 commits into from
May 11, 2023
Changes from 1 commit
Commits
176d884
add PSW to nonstandard residues
a-r-j Apr 17, 2023
fa89a37
improve insertion and non-standard residue handling
a-r-j Apr 17, 2023
9855b9b
refactor chain selection
a-r-j Apr 17, 2023
f143719
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 17, 2023
3f3b3d9
remove unused verbosity arg
a-r-j Apr 17, 2023
09f05e5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 17, 2023
b7475df
fix chain selection in tests
a-r-j Apr 17, 2023
2e0a371
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j Apr 17, 2023
d2c1808
fix chain selection in tutorial notebook
a-r-j Apr 17, 2023
fc332c6
fix notebook chain selection
a-r-j Apr 17, 2023
4a67851
fix chain selection typehint
a-r-j Apr 17, 2023
5f648d2
Update changelog
a-r-j Apr 17, 2023
ab26d78
Add NLW to non-standard residues
a-r-j Apr 17, 2023
a449bba
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j Apr 17, 2023
afc0f8b
add .ent support
a-r-j Apr 20, 2023
258c94d
add entry for construction from dataframe
a-r-j Apr 20, 2023
c9856ae
add missing stage arg
a-r-j Apr 20, 2023
9e1191a
improve obsolete mapping retrieving to include entries with no replac…
a-r-j Apr 20, 2023
17c38ab
Merge branch 'master' into tensor_fixes
a-r-j Apr 20, 2023
7bf4ff3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 20, 2023
5af9e06
update changelog
a-r-j Apr 21, 2023
e00bdfb
add transforms to foldcomp datasets
a-r-j Apr 22, 2023
31018bc
fix jaxtyping syntax
a-r-j Apr 25, 2023
6e26455
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j Apr 25, 2023
3681714
Merge branch 'master' into tensor_fixes
a-r-j Apr 27, 2023
adbdbe1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 27, 2023
50ac31b
Update changelog
a-r-j Apr 27, 2023
088ae02
fix double application of transforms
a-r-j Apr 27, 2023
fb684af
improve foldcomp data loading performance
a-r-j May 1, 2023
a543a75
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j May 1, 2023
a00e2be
Merge branch 'master' into tensor_fixes
a-r-j May 1, 2023
ccf0437
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 1, 2023
7939a82
remove unused imports
a-r-j May 1, 2023
d72abf9
remove unused imports
a-r-j May 1, 2023
8b551c7
linting
a-r-j May 1, 2023
86bedcf
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 1, 2023
685d3db
Update changelog
a-r-j May 1, 2023
bebc3c4
add B factors to FC parsing output
a-r-j May 2, 2023
c973422
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j May 2, 2023
828af29
bugfix to alpha & kappa angle embedding
a-r-j May 7, 2023
c986df0
Merge branch 'master' into tensor_fixes
a-r-j May 7, 2023
6c48878
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 7, 2023
fc7657e
update changelog
a-r-j May 7, 2023
7192613
handle selenocysteine in sidechain torsion angle computation
a-r-j May 10, 2023
6a31729
Merge branch 'tensor_fixes' of https://www.github.com/a-r-j/graphein …
a-r-j May 10, 2023
84fc3e4
fix protein data object initialisation #317
a-r-j May 10, 2023
9dcc1c7
Merge branch 'master' into protein_obj
a-r-j May 10, 2023
f5d1f26
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 10, 2023
6269d25
restore eq dunder
a-r-j May 10, 2023
d96d60f
update changelog
a-r-j May 10, 2023
refactor chain selection
a-r-j committed Apr 17, 2023
commit 9855b9bd8bf8ad4d5344c0b4f85e8e213f761ca6
48 changes: 22 additions & 26 deletions graphein/protein/graphs.py
@@ -17,19 +17,14 @@
 import networkx as nx
 import numpy as np
 import pandas as pd
-from Bio.PDB.Polypeptide import three_to_one
 from biopandas.mmtf import PandasMmtf
 from biopandas.pdb import PandasPdb
 from loguru import logger as log
 from rich.progress import Progress
 from tqdm.contrib.concurrent import process_map
 from typing_extensions import Literal

-from graphein.protein.config import (
-    DSSPConfig,
-    GetContactsConfig,
-    ProteinGraphConfig,
-)
+from graphein.protein.config import GetContactsConfig, ProteinGraphConfig
 from graphein.protein.edges.distance import (
     add_distance_to_edges,
     compute_distmat,
@@ -103,12 +98,12 @@ def read_pdb_to_dataframe(
     if path is not None:
         if isinstance(path, Path):
             path = os.fsdecode(path)
-        if path.endswith(".pdb"):
+        if path.endswith(".pdb") or path.endswith(".pdb.gz"):
             atomic_df = PandasPdb().read_pdb(path)
-        elif path.endswith(".mmtf"):
+        elif path.endswith(".mmtf") or path.endswith(".mmtf.gz"):
             atomic_df = PandasMmtf().read_mmtf(path)
         else:
-            raise ValueError("File must be either .pdb or .mmtf")
+            raise ValueError(f"File {path} must be either .pdb or .mmtf not")
     elif uniprot_id is not None:
         atomic_df = PandasPdb().fetch_pdb(
             uniprot_id=uniprot_id, source="alphafold2-v3"
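The new branches accept gzip-compressed inputs by testing two suffixes per format. As a minimal sketch of the same dispatch (the helper `detect_structure_format` and the suffix tuples are hypothetical, not graphein API), `str.endswith` can take a tuple of suffixes. The sketch also spells out every accepted extension in its error, since the committed message ends with a dangling "not".

```python
import os
from pathlib import Path
from typing import Union

# Hypothetical constants/helper mirroring the suffix checks in this hunk;
# not part of graphein's API.
PDB_SUFFIXES = (".pdb", ".pdb.gz")
MMTF_SUFFIXES = (".mmtf", ".mmtf.gz")


def detect_structure_format(path: Union[str, Path]) -> str:
    """Return "pdb" or "mmtf" based only on the file suffix."""
    if isinstance(path, Path):
        path = os.fsdecode(path)
    # str.endswith accepts a tuple and is True if any suffix matches.
    if path.endswith(PDB_SUFFIXES):
        return "pdb"
    if path.endswith(MMTF_SUFFIXES):
        return "mmtf"
    raise ValueError(
        f"File {path} must be one of: .pdb, .pdb.gz, .mmtf or .mmtf.gz"
    )


print(detect_structure_format("1abc.pdb.gz"))  # pdb
```

The sketch only classifies the path; it does not open or decompress the file.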
@@ -293,13 +288,10 @@ def remove_insertions(
     )
     df = df[~duplicates]

-    # Catches explicit insertions
-    df = filter_dataframe(
+    return filter_dataframe(
         df, by_column="insertion", list_of_values=[""], boolean=True
     )

-    return df
-

 def filter_hetatms(
     df: pd.DataFrame, keep_hets: List[str]
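After this change, `remove_insertions` returns the filtered dataframe directly. Assuming graphein's `filter_dataframe(..., boolean=True)` keeps the rows whose `by_column` value appears in `list_of_values`, that final step behaves like the pandas sketch below (`drop_explicit_insertions` is a hypothetical name, and the toy atom table is made up for illustration).

```python
import pandas as pd


def drop_explicit_insertions(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose insertion code is the empty string.

    Hypothetical stand-in for
    filter_dataframe(df, by_column="insertion", list_of_values=[""], boolean=True),
    assuming boolean=True means "keep rows whose value is in the list".
    """
    return df[df["insertion"].isin([""])]


# Toy atom table: residue 52 has an inserted residue 52A.
atoms = pd.DataFrame(
    {
        "residue_number": [52, 52, 53],
        "insertion": ["", "A", ""],
        "atom_name": ["CA", "CA", "CA"],
    }
)
print(drop_explicit_insertions(atoms)["residue_number"].tolist())  # [52, 53]
```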
@@ -459,7 +451,8 @@ def sort_dataframe(df: pd.DataFrame) -> pd.DataFrame:


 def select_chains(
-    protein_df: pd.DataFrame, chain_selection: str, verbose: bool = False
+    protein_df: pd.DataFrame,
+    chain_selection: Union[str, List[str]],
 ) -> pd.DataFrame:
     """
     Extracts relevant chains from ``protein_df``.
@@ -468,19 +461,22 @@ def select_chains(
         (``CA``, ``CB``).
     :type protein_df: pd.DataFrame
     :param chain_selection: Specifies chains that should be extracted from
-        the larger complexed structure.
-    :type chain_selection: str
-    :param verbose: Print dataframe?
-    :type verbose: bool
+        the larger complexed structure. If chain_selection is ``"all"``, all
+        chains will be selected. Otherwise, provide a list of strings.
+    :type chain_selection: Union[str, List[str]]
     :return: Protein structure dataframe containing only entries in the
         chain selection.
     :rtype: pd.DataFrame
     """
     if chain_selection != "all":
+        if isinstance(chain_selection, str):
+            raise ValueError(
+                "Only 'all' is a valid string for chain selection. Otherwise use a list of strings: e.g. ['A', 'B', 'C']"
+            )
         protein_df = filter_dataframe(
             protein_df,
             by_column="chain_id",
-            list_of_values=list(chain_selection),
+            list_of_values=chain_selection,
             boolean=True,
         )

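The refactored `select_chains` accepts either the literal string `"all"` or a list of chain IDs, and rejects any other string. A self-contained sketch of that contract in plain pandas (`select_chains_sketch` is hypothetical; the committed function delegates the row filtering to graphein's `filter_dataframe`):

```python
from typing import List, Union

import pandas as pd


def select_chains_sketch(
    protein_df: pd.DataFrame, chain_selection: Union[str, List[str]]
) -> pd.DataFrame:
    """Mimics the validation added to select_chains in this commit."""
    if chain_selection == "all":
        return protein_df
    if isinstance(chain_selection, str):
        # A string such as "AC" is no longer split into characters.
        raise ValueError(
            "Only 'all' is a valid string for chain selection. "
            "Otherwise use a list of strings: e.g. ['A', 'B', 'C']"
        )
    # Keep rows whose chain_id appears in the requested list.
    return protein_df[protein_df["chain_id"].isin(chain_selection)]


df = pd.DataFrame(
    {"chain_id": ["A", "A", "B", "C"], "residue_number": [1, 2, 1, 1]}
)
print(select_chains_sketch(df, ["A", "C"])["chain_id"].tolist())  # ['A', 'A', 'C']
```

With this contract, an old-style string such as `"AC"` raises `ValueError` instead of being expanded by `list("AC")` as the previous implementation did.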
@@ -590,7 +586,7 @@ def add_nodes_to_graph(
     # If no protein dataframe is supplied, use the one stored in the Graph
     # object
     if protein_df is None:
-        protein_df = G.graph["pdb_df"]
+        protein_df: pd.DataFrame = G.graph["pdb_df"]
     # Assign intrinsic node attributes
     chain_id = protein_df["chain_id"].apply(str)
     residue_name = protein_df["residue_name"]
@@ -690,7 +686,7 @@ def construct_graph(
     uniprot_id: Optional[str] = None,
     pdb_code: Optional[str] = None,
     df: Optional[pd.DataFrame] = None,
-    chain_selection: str = "all",
+    chain_selection: Union[str, List[str]] = "all",
     model_index: int = 1,
     df_processing_funcs: Optional[List[Callable]] = None,
     edge_construction_funcs: Optional[List[Callable]] = None,
@@ -727,8 +723,8 @@ def construct_graph(
     :param df: Pandas dataframe containing ATOM data to build graph from.
         Default is ``None``.
     :type df: pd.DataFrame, optional
-    :param chain_selection: String of polypeptide chains to include in graph.
-        E.g ``"ABDF"`` or ``"all"``. Default is ``"all"``.
+    :param chain_selection: List of strings denoting polypeptide chains to
+        include in graph. E.g ``["A", "B", "D", "F"]`` or ``"all"``. Default is ``"all"``.
     :type chain_selection: str
     :param model_index: Index of model to use in the case of structural
         ensembles. Default is ``1``.
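The `construct_graph` docstring now gives a list (``["A", "B", "D", "F"]``) where it previously gave the string ``"ABDF"``; its `:type chain_selection:` line still reads `str`, although the signature is now `Union[str, List[str]]`. Callers holding legacy strings could convert them with a small helper such as the hypothetical `to_chain_list` below. It assumes single-character chain IDs, which is exactly what `list(chain_selection)` produced in the old `select_chains`.

```python
from typing import List, Union


def to_chain_list(selection: Union[str, List[str]]) -> Union[str, List[str]]:
    """Hypothetical migration helper, not part of graphein.

    Converts a legacy chain string such as "ABDF" into the list form now
    expected by construct_graph; "all" and lists are passed through.
    """
    if selection == "all" or not isinstance(selection, str):
        return selection
    # list("ABDF") splits the string into one-character chain IDs,
    # mirroring the old select_chains behaviour.
    return list(selection)


print(to_chain_list("ABDF"))  # ['A', 'B', 'D', 'F']
print(to_chain_list("all"))  # all
```

Multi-character chain IDs cannot be recovered from a concatenated string, so they should be written as lists directly.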
@@ -923,7 +919,7 @@ def construct_graphs_mp(
     pdb_code_it: Optional[List[str]] = None,
     path_it: Optional[List[str]] = None,
     uniprot_id_it: Optional[List[str]] = None,
-    chain_selections: Optional[List[str]] = None,
+    chain_selections: Optional[Union[List[List[str]], List[str]]] = None,
     model_indices: Optional[List[str]] = None,
     config: ProteinGraphConfig = ProteinGraphConfig(),
     num_cores: int = 16,
@@ -940,7 +936,7 @@ def construct_graphs_mp(
         construction.
     :type path_it: Optional[List[str]], defaults to ``None``
     :param chain_selections: List of chains to select from the protein
-        structures (e.g. ``["ABC", "A", "L", "CD"...]``).
+        structures (e.g. ``[["A", "B" "C"], ["A"], ["L"], ["C", "D"]...]``).
     :type chain_selections: Optional[List[str]], defaults to ``None``
     :param model_indices: List of model indices to use for protein graph
         construction. Only relevant for structures containing ensembles of
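In the new `construct_graphs_mp` docstring example, ``["A", "B" "C"]`` is missing a comma. Python joins adjacent string literals, so that element would evaluate to `["A", "BC"]`, a single two-character chain ID. The sketch below shows the pitfall and then builds one chain list per structure, assuming (as the parameter names suggest) that the i-th selection applies to the i-th entry of `pdb_code_it`; the PDB codes are placeholders.

```python
from typing import List

# Adjacent string literals are concatenated at compile time: "B" "C" == "BC".
typo = ["A", "B" "C"]
print(typo)  # ['A', 'BC']

# One list of chain IDs per structure, matched to the codes by position.
pdb_codes: List[str] = ["1abc", "2def", "3ghi"]  # placeholder codes
chain_selections: List[List[str]] = [["A", "B", "C"], ["A"], ["L"]]
assert len(pdb_codes) == len(chain_selections)

for code, chains in zip(pdb_codes, chain_selections):
    print(code, chains)
```

Both lists would then be passed as `construct_graphs_mp(pdb_code_it=pdb_codes, chain_selections=chain_selections)`.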
@@ -1167,7 +1163,7 @@ def compute_secondary_structure_graph(
         ss_list.append(d["ss"])

     # Number SS elements
-    ss_list = pd.Series(number_groups_of_runs(ss_list))
+    ss_list: pd.Series = pd.Series(number_groups_of_runs(ss_list))
     ss_list.index = list(g.nodes())

     # Remove unstructured elements if necessary
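The annotated line wraps the output of `number_groups_of_runs` in a `pd.Series` indexed by graph nodes. Assuming that function labels each contiguous run of a secondary-structure code with the code plus a per-code counter (graphein's exact output format may differ), a standalone sketch with `itertools.groupby` looks like this; `number_runs_sketch` and the node IDs are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, List

import pandas as pd


def number_runs_sketch(values: List[str]) -> List[str]:
    """Label each contiguous run as <value><n>, counting runs per value."""
    seen: Dict[str, int] = defaultdict(int)
    labels: List[str] = []
    for value, run in groupby(values):
        seen[value] += 1  # this is the n-th run of `value`
        labels.extend(f"{value}{seen[value]}" for _ in run)
    return labels


ss = ["H", "H", "E", "E", "H", "-"]
nodes = ["A:MET:1", "A:ALA:2", "A:GLY:3", "A:SER:4", "A:LYS:5", "A:ASP:6"]
ss_series: pd.Series = pd.Series(number_runs_sketch(ss))
ss_series.index = nodes  # one label per node, as in the diff
print(ss_series.tolist())  # ['H1', 'H1', 'E1', 'E1', 'H2', '-1']
```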