Help for package glyparse

Title:

Parsing Glycan Structure Text Representations

Version:

0.5.6

Description:

Provides functions to parse glycan structure text representations into 'glyrepr' glycan structures. Currently, it supports StrucGP-style, pGlyco-style, IUPAC-condensed, IUPAC-extended, IUPAC-short, WURCS, Linear Code, and GlycoCT formats. It also provides an automatic parser that detects the format of each structure string and parses it accordingly.

License:

MIT + file LICENSE

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0)

Config/testthat/edition:

Encoding:

UTF-8

RoxygenNote:

7.3.3

URL:

https://glycoverse.github.io/glyparse/, https://github.com/glycoverse/glyparse

Imports:

checkmate, cli, dplyr, glyrepr (≥ 0.9.0), igraph, purrr, rlang, rstackdeque, stringr, vctrs

Depends:

R (≥ 4.1)

VignetteBuilder:

knitr

BugReports:

https://github.com/glycoverse/glyparse/issues

NeedsCompilation:

Packaged:

2026-02-24 07:13:19 UTC; fubin

Author:

Bin Fu

[aut, cre, cph]

Maintainer:

Bin Fu <23110220018@m.fudan.edu.cn>

Repository:

CRAN

Date/Publication:

2026-02-24 07:30:02 UTC

Automatic Structure Parsing

Description

Detect the format of each structure string and parse it with the appropriate parser. A single vector may mix formats.

Supported types:

GlycoCT
IUPAC-condensed
IUPAC-extended
IUPAC-short
WURCS
Linear Code
pGlyco
StrucGP

Usage

auto_parse(x)

Arguments

x

A character vector of structure strings. NA values are allowed and will be returned as NA structures.

Value

A glyrepr::glycan_structure() object.

Examples

# Single structure
x <- "Gal(b1-3)GlcNAc(b1-4)Glc(a1-"  # IUPAC-condensed
auto_parse(x)

# Mixed types
x <- c(
  "Gal(b1-3)GlcNAc(b1-4)Glc(a1-",  # IUPAC-condensed
  "Neu5Aca3Gala3(Fuca6)GlcNAcb-"  # IUPAC-short
)
auto_parse(x)
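Format detection can be pictured with a small base-R sketch. The function guess_format() below is hypothetical and is not part of glyparse. Its string heuristics are assumptions, chosen only so that the example strings in this manual are classified correctly, and auto_parse() may use different rules.

```r
# Hypothetical sketch: guess_format() is NOT part of glyparse.
# The heuristics are illustrative assumptions, and their order matters.
guess_format <- function(x) {
  guess_one <- function(s) {
    if (is.na(s)) return(NA_character_)
    if (startsWith(s, "WURCS=")) return("WURCS")
    if (startsWith(s, "RES")) return("GlycoCT")
    # IUPAC-extended writes linkages with the arrow character U+2192
    if (grepl("\u2192", s, fixed = TRUE)) return("IUPAC-extended")
    # pGlyco strings contain only parentheses and single-letter residue codes
    if (grepl("^[()NHFAG]+$", s)) return("pGlyco")
    # StrucGP strings start with "A" followed by a digit code
    if (grepl("^A[0-9]", s)) return("StrucGP")
    # IUPAC-condensed wraps linkages in parentheses, e.g. "(b1-3)"
    if (grepl("\\([ab?][0-9?]-", s)) return("IUPAC-condensed")
    # Linear Code residue codes are upper case once anomers and positions are removed
    if (grepl("^[[:upper:]()]+$", gsub("[ab?0-9]", "", s))) return("Linear Code")
    "IUPAC-short"
  }
  vapply(x, guess_one, character(1), USE.NAMES = FALSE)
}

guess_format(c(
  "Gal(b1-3)GlcNAc(b1-4)Glc(a1-",
  "Neu5Aca3Gala3(Fuca6)GlcNAcb-",
  "Ma3(Ma6)Mb4GNb4GNb",
  "A2B2C1D1E2F1fedD1E2edcbB5ba",
  NA
))
```

The cheap, unambiguous prefix checks run first. IUPAC-short is the fallback because it has the fewest distinctive characters.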

Parse GlycoCT Structures

Description

This function parses GlycoCT strings into a glyrepr::glycan_structure(). GlycoCT is a format used by databases like GlyTouCan and GlyGen.

Usage

parse_glycoct(x)

Arguments

x

A character vector of GlycoCT strings. NA values are allowed and will be returned as NA structures.

Details

GlycoCT format consists of two parts:

RES: Contains monosaccharides (lines whose residue number is followed by 'b:', e.g. '1b:a-dgal-HEX-1:5') and substituents (lines whose residue number is followed by 's:', e.g. '2s:n-acetyl')
LIN: Contains linkage information between residues

For more information about GlycoCT format, see the glycoct.md documentation.
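The two-part layout can be illustrated with a minimal base-R sketch. The helper split_glycoct() below is hypothetical and is not part of glyparse. It assumes that "RES" is the first line and that only 'b' and 's' residues occur, so other GlycoCT sections are not handled.

```r
# Hypothetical helper, not part of glyparse: split a GlycoCT string into
# its RES and LIN sections, assuming "RES" is the first line.
split_glycoct <- function(x) {
  lines <- strsplit(x, "\n", fixed = TRUE)[[1]]
  lin_start <- match("LIN", lines)
  if (is.na(lin_start)) lin_start <- length(lines) + 1L  # no LIN section
  res_lines <- lines[seq_len(lin_start - 1L)][-1]        # drop the "RES" header
  lin_lines <- lines[-seq_len(lin_start)]                 # everything after "LIN"
  # The residue type is the letter after the residue number, e.g. "1b:" or "2s:"
  type <- sub("^[0-9]+([a-z]):.*$", "\\1", res_lines)
  list(
    monosaccharides = res_lines[type == "b"],
    substituents = res_lines[type == "s"],
    linkages = lin_lines
  )
}

glycoct <- paste0(
  "RES\n", "1b:a-dgal-HEX-1:5\n", "2s:n-acetyl\n", "3b:b-dgal-HEX-1:5\n",
  "LIN\n", "1:1d(2+1)2n\n", "2:1o(3+1)3d"
)
parts <- split_glycoct(glycoct)
lengths(parts)
```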

Value

A glyrepr::glycan_structure() object.

Examples

glycoct <- paste0(
  "RES\n",
  "1b:a-dgal-HEX-1:5\n",
  "2s:n-acetyl\n",
  "3b:b-dgal-HEX-1:5\n",
  "LIN\n",
  "1:1d(2+1)2n\n",
  "2:1o(3+1)3d"
)
parse_glycoct(glycoct)

Parse IUPAC-condensed Structures

Description

This function parses IUPAC-condensed strings into a glyrepr::glycan_structure(). For more information about IUPAC-condensed notation, see doi:10.1351/pac199668101919.

Usage

parse_iupac_condensed(x)

Arguments

x

A character vector of IUPAC-condensed strings. NA values are allowed and will be returned as NA structures.

Details

The IUPAC-condensed notation is a compact form of IUPAC-extended notation. It is used by the GlyConnect database. It contains the following information:

Monosaccharide name, e.g. "Gal", "GlcNAc", "Neu5Ac".
Substituent, e.g. "9Ac", "4Ac", "3Me", "?S".
Linkage, e.g. "b1-3", "a1-2", "a1-?".

An example of IUPAC-condensed string is "Gal(b1-3)GlcNAc(b1-4)Glc(a1-".

The reducing-end monosaccharide may be written with or without anomer information. For example, both strings below are valid:

"Neu5Ac(a2-"
"Neu5Ac"

In the first case, the anomer is "a2". In the second case, the anomer is "?2".
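The residue and linkage components described above can be pulled apart with a minimal base-R sketch. The helper split_iupac_condensed() below is hypothetical and is not part of glyparse. It only handles unbranched chains, and it records a missing reducing-end linkage as NA rather than inferring an anomer.

```r
# Hypothetical helper, not part of glyparse: split an unbranched
# IUPAC-condensed string into residues and linkages (branches are not handled).
split_iupac_condensed <- function(x) {
  # Residue names are the pieces between linkage groups such as "(b1-3)"
  residues <- strsplit(x, "\\([^)]*\\)?")[[1]]
  # Linkages are the text inside each "(" ... ")" group
  linkages <- regmatches(x, gregexpr("(?<=\\()[^)]+", x, perl = TRUE))[[1]]
  # A reducing end written without anomer information has no linkage token
  linkages <- c(linkages, rep(NA_character_, length(residues) - length(linkages)))
  data.frame(residue = residues, linkage = linkages)
}

split_iupac_condensed("Gal(b1-3)GlcNAc(b1-4)Glc(a1-")
split_iupac_condensed("Neu5Ac")
```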

Value

A glyrepr::glycan_structure() object.

Examples

iupac <- "Gal(b1-3)GlcNAc(b1-4)Glc(a1-"
parse_iupac_condensed(iupac)

Parse IUPAC-extended Structures

Description

Parse IUPAC-extended-style structure strings into a glyrepr::glycan_structure(). For more information about IUPAC-extended format, see doi:10.1351/pac199668101919.

Usage

parse_iupac_extended(x)

Arguments

x

A character vector of IUPAC-extended strings. NA values are allowed and will be returned as NA structures.

Value

A glyrepr::glycan_structure() object.

Examples

iupac <- "\u03b2-D-Galp-(1\u21923)-\u03b1-D-GalpNAc-(1\u2192"
parse_iupac_extended(iupac)
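In the example, "\u03b2" and "\u03b1" are the Greek letters beta and alpha, and "\u2192" is the linkage arrow. The following minimal sketch shows how the extended form maps onto IUPAC-condensed-style linkage tokens. The helper extended_linkages() is hypothetical and is not part of glyparse, and it only handles unbranched chains whose residues begin with an anomeric symbol.

```r
# Hypothetical helper, not part of glyparse: extract residues and linkages from
# an unbranched IUPAC-extended string and rewrite each linkage in the
# IUPAC-condensed style (e.g. "(1->3)" on a beta residue becomes "b1-3").
extended_linkages <- function(x) {
  arrow <- "\u2192"
  # Residues are separated by linkage groups such as "-(1->3)-"
  residues <- strsplit(x, paste0("-\\([0-9?]", arrow, "[0-9?]*\\)?-?"))[[1]]
  # Each linkage group starts with "(", the donor carbon, and the arrow
  groups <- regmatches(x, gregexpr(paste0("\\([0-9?]", arrow, "[0-9?]*"), x))[[1]]
  donor <- substr(groups, 2, 2)
  acceptor <- sub(paste0("^\\([0-9?]", arrow), "", groups)
  # The first character of each residue is the anomeric symbol (alpha or beta)
  anomer <- c("\u03b1" = "a", "\u03b2" = "b")[substr(residues, 1, 1)]
  data.frame(
    residue = residues,
    linkage = paste0(unname(anomer), donor, "-", acceptor)
  )
}

extended_linkages("\u03b2-D-Galp-(1\u21923)-\u03b1-D-GalpNAc-(1\u2192")
```

The resulting linkages correspond to the IUPAC-condensed string "Gal(b1-3)GalNAc(a1-".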

Parse IUPAC-short Structures

Description

Parse IUPAC-short-style structure strings into a glyrepr::glycan_structure(). For more information about IUPAC-short format, see doi:10.1351/pac199668101919.

Usage

parse_iupac_short(x)

Arguments

x

A character vector of IUPAC-short strings. NA values are allowed and will be returned as NA structures.

Details

The IUPAC-short notation is a compact form of IUPAC-condensed notation. It is rarely used in databases but appears frequently in the literature because of its conciseness. Compared with IUPAC-condensed notation, IUPAC-short notation omits the anomeric carbon positions, assuming they are known for common monosaccharides. For example, "Neu5Aca3Gala-" assumes that the anomeric carbon of Neu5Ac is C2 (i.e. an a2-3 linkage). In addition, linkages are not enclosed in parentheses; instead, parentheses indicate branching, e.g. "Neu5Aca3Gala3(Fuca6)GlcNAcb-".
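The omitted anomeric positions can be restored with a minimal sketch. The helper expand_short_linkages() below is hypothetical and is not part of glyparse. Its residue list and its rule that sialic acids (Neu5Ac, Neu5Gc, Kdn) link from C2 and all other residues from C1 are assumptions for illustration. Branching parentheses are ignored.

```r
# Hypothetical helper, not part of glyparse: expand IUPAC-short linkage tokens
# (e.g. "a3") into IUPAC-condensed style (e.g. "a2-3").
# The residue list and the C2/C1 rule are assumptions for illustration.
expand_short_linkages <- function(x) {
  # Longer names come first so that "GlcNAc" is not read as "Glc" + "NAc"
  pattern <- "(Neu5Ac|Neu5Gc|Kdn|GlcNAc|GalNAc|Glc|Gal|Man|Fuc|Xyl)([ab?])([0-9?]?)"
  tokens <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
  residue <- sub(pattern, "\\1", tokens, perl = TRUE)
  anomer <- sub(pattern, "\\2", tokens, perl = TRUE)
  position <- sub(pattern, "\\3", tokens, perl = TRUE)
  # Assumed anomeric carbon: C2 for sialic acids, C1 otherwise
  carbon <- ifelse(residue %in% c("Neu5Ac", "Neu5Gc", "Kdn"), "2", "1")
  data.frame(residue = residue, linkage = paste0(anomer, carbon, "-", position))
}

expand_short_linkages("Neu5Aca3Gala3(Fuca6)GlcNAcb-")
```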

Value

A glyrepr::glycan_structure() object.

Examples

iupac <- "Neu5Aca3Gala3(Fuca6)GlcNAcb-"
parse_iupac_short(iupac)

Parse Linear Code Structures

Description

Parse Linear Code structure strings into a glyrepr::glycan_structure(). For more information about Linear Code, see this article.

Usage

parse_linear_code(x)

Arguments

x

A character vector of Linear Code strings. NA values are allowed and will be returned as NA structures.

Value

A glyrepr::glycan_structure() object.

Examples

linear_code <- "Ma3(Ma6)Mb4GNb4GNb"
parse_linear_code(linear_code)
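Each residue in a Linear Code string is a short upper-case code followed by an anomer letter and a linkage position. The following minimal sketch tokenizes the example on that basis. The helper linear_code_tokens() is hypothetical and is not part of glyparse, and its code table is a small, assumed subset of Linear Code.

```r
# Hypothetical helper, not part of glyparse. The residue-code table is a
# partial, assumed subset of Linear Code.
linear_code_tokens <- function(x) {
  codes <- c(GN = "GlcNAc", AN = "GalNAc", NN = "Neu5Ac",
             M = "Man", A = "Gal", G = "Glc", F = "Fuc")
  # Two-letter codes are listed first so that "GN" is not read as "G" + "N"
  pattern <- "(GN|AN|NN|M|A|G|F)([ab?])([0-9?]?)"
  tokens <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
  code <- sub(pattern, "\\1", tokens, perl = TRUE)
  data.frame(
    code = code,
    residue = unname(codes[code]),
    anomer = sub(pattern, "\\2", tokens, perl = TRUE),
    position = sub(pattern, "\\3", tokens, perl = TRUE)
  )
}

linear_code_tokens("Ma3(Ma6)Mb4GNb4GNb")
```

The reducing-end "GNb" has no position, so its position column is an empty string.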

Parse pGlyco Structures

Description

Parse pGlyco-style structure strings into a glyrepr::glycan_structure(). See the example below for the structure format.

Usage

parse_pglyco_struc(x)

Arguments

x

A character vector of pGlyco-style structure strings. NA values are allowed and will be returned as NA structures.

Value

A glyrepr::glycan_structure() object.

Examples

glycan <- parse_pglyco_struc("(N(F)(N(H(H(N))(H(N(H))))))")
print(glycan, verbose = TRUE)
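In the pGlyco-style string, each letter is one residue and the parentheses describe the tree. The following minimal sketch derives the monosaccharide composition from such a string. The helper pglyco_composition() is hypothetical and is not part of glyparse, and the letter meanings are assumed conventions (H = Hex, N = HexNAc, F = dHex, A = NeuAc, G = NeuGc).

```r
# Hypothetical helper, not part of glyparse. The letter meanings are assumed
# pGlyco conventions (H = Hex, N = HexNAc, F = dHex, A = NeuAc, G = NeuGc).
pglyco_composition <- function(x) {
  codes <- c(H = "Hex", N = "HexNAc", F = "dHex", A = "NeuAc", G = "NeuGc")
  # Parentheses only describe the tree; every remaining letter is one residue
  residues <- strsplit(gsub("[()]", "", x), "")[[1]]
  if (anyNA(codes[residues])) stop("unknown residue letter in: ", x)
  counts <- table(factor(codes[residues], levels = unname(codes)))
  counts[counts > 0]  # keep only residue types that occur
}

pglyco_composition("(N(F)(N(H(H(N))(H(N(H))))))")
```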

Parse StrucGP Structures

Description

Parse StrucGP-style structure strings into a glyrepr::glycan_structure(). See the example below for the structure format.

Usage

parse_strucgp_struc(x)

Arguments

x

A character vector of StrucGP-style structure strings. NA values are allowed and will be returned as NA structures.

Value

A glyrepr::glycan_structure() object.

Examples

glycan <- parse_strucgp_struc("A2B2C1D1E2F1fedD1E2edcbB5ba")
print(glycan, verbose = TRUE)
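In the example string, every upper-case letter appears to be closed later by the matching lower-case letter. The following minimal sketch checks that nesting with a stack and tallies the character that follows each opening letter. The helper strucgp_summary() is hypothetical and is not part of glyparse. It assumes a one-character residue code after each opening letter, and because the meaning of the codes is not documented here, it only counts them.

```r
# Hypothetical helper, not part of glyparse. Assumes that each upper-case
# letter opens a residue, is followed by a one-character residue code, and is
# closed by the matching lower-case letter.
strucgp_summary <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  stack <- character(0)   # currently open residues
  codes <- character(0)   # code character that follows each opening letter
  balanced <- TRUE
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (ch %in% LETTERS) {
      stack <- c(stack, ch)
      codes <- c(codes, chars[i + 1L])
    } else if (ch %in% letters) {
      # A closing letter must match the most recently opened residue
      if (length(stack) == 0L || stack[length(stack)] != toupper(ch)) {
        balanced <- FALSE
        break
      }
      stack <- stack[-length(stack)]
    }
  }
  list(balanced = balanced && length(stack) == 0L, codes = table(codes))
}

strucgp_summary("A2B2C1D1E2F1fedD1E2edcbB5ba")
```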

Parse WURCS Structures

Description

This function parses WURCS strings into a glyrepr::glycan_structure(). Currently, only WURCS 2.0 is supported. For more information about WURCS, see WURCS.

Usage

parse_wurcs(x)

Arguments

x

A character vector of WURCS strings. NA values are allowed and will be returned as NA structures.

Value

A glyrepr::glycan_structure() object.

Examples

wurcs <- paste0(
  "WURCS=2.0/3,5,4/",
  "[a2122h-1b_1-5_2*NCC/3=O][a1122h-1b_1-5][a1122h-1a_1-5]/",
  "1-1-2-3-3/a4-b1_b4-c1_c3-d1_c6-e1"
)
parse_wurcs(wurcs)
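A WURCS 2.0 string is made up of slash-separated sections: a header with three counts (unique residues, residues, linkages), the unique-residue blocks in square brackets, the residue sequence, and the linkages. The following minimal sketch splits the example into these sections. The helper split_wurcs() is hypothetical and is not part of glyparse. Because unique-residue blocks can themselves contain "/" (as in "2*NCC/3=O"), it extracts them by their brackets instead of splitting on "/".

```r
# Hypothetical helper, not part of glyparse: split a WURCS 2.0 string into
# its main sections. Unique-residue blocks may contain "/", so they are
# extracted by their square brackets rather than by splitting on "/".
split_wurcs <- function(x) {
  counts <- sub("^WURCS=2\\.0/([0-9,]+)/.*$", "\\1", x, perl = TRUE)
  residues <- regmatches(x, gregexpr("\\[[^\\]]*\\]", x, perl = TRUE))[[1]]
  rest <- sub("^.*\\]/", "", x, perl = TRUE)  # text after the last "]/"
  parts <- strsplit(rest, "/", fixed = TRUE)[[1]]
  list(
    counts = as.integer(strsplit(counts, ",", fixed = TRUE)[[1]]),
    unique_residues = residues,
    residue_order = as.integer(strsplit(parts[1], "-", fixed = TRUE)[[1]]),
    linkages = if (length(parts) > 1) strsplit(parts[2], "_", fixed = TRUE)[[1]] else character(0)
  )
}

wurcs <- paste0(
  "WURCS=2.0/3,5,4/",
  "[a2122h-1b_1-5_2*NCC/3=O][a1122h-1b_1-5][a1122h-1a_1-5]/",
  "1-1-2-3-3/a4-b1_b4-c1_c3-d1_c6-e1"
)
split_wurcs(wurcs)
```

In the example, the section lengths agree with the header counts 3, 5, and 4.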
