Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(String): RegExp S.Replace, S.Match and S.MatchAll #94

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

didavid61202
Copy link
Contributor

@didavid61202 didavid61202 commented Apr 20, 2023

Updates

  1. Add type-level-regexp as dev-dependency, and import helper types from a subpath export type-level-regexp/regexp to prevent global declaration side effects from the package.
  2. Add Strings.RegExp to create a RegExp object with pattern, flags and parsed matches to use with other RegExp related hotscript fn, and returns RegExpSyntaxError error type with detailed error message if the provided RegExp pattern contains some syntax error (currently only show few types of error, WIP)
  3. Update Strings.Replace to also accept RegExp pattern (/<pattern>/) as from arg to replace substring matched by the given pattern. Replace value also support special replacemnet patterns.
  4. Add Strings.Match to match a string against a RegExp (support i and g flags), returns a matched object with match array and index and groups properties.
  5. Add Strings.MatchAll to match a string against a RegExp, return an array of match objects, each with a match array and index and groups properties.

Note

I have kept the implementation of RegExp matching and replacing generic types in a separate package called type-level-regexp. This allows for faster iteration as it is still in the early stages, and more features and performance improvements are coming along the way.

Usage

// Strings.Match exmaple
type MatchResult = Call<
        Strings.Match<Strings.RegExp<"a(?<g1>[b-e$]{1,4})\\W\\s\\b\\k<g1>(\\d+)">>,
        "12ab$c- b$c56#"
      >;
// type of `MatchResult` is:
// ["ab$c- b$c56", "b$c", "56"] & {
//   index: 2;
//   groups: {
//     g1: "b$c";
//   };
// }

// Strings.Replace exmaple
type ReplaceResult = Call<
//     ^? type ReplaceResult = "The release day is 2.13, 2023"
        Strings.Replace<
          Strings.RegExp<"((?:\\w|\\s)+):\\s(?<year>\\d{4})/(?<month>\\d{1,2})/(?<day>\\d{1,2})">,
          "The $1 is $<month>.$<day>, $2"
        >,
        "release day: 2023/2/13"
      >;

// RegExp syntax error exmaple
type SyntaxErrTest = Strings.RegExp<"foo(?g1>bar)baz">
// type of `SyntaxErrTest` is:
// {
//    type: "RegExpSyntaxError";
//    message: "Invalid regular expression, invalid capture group name for capturing `bar`, possibly due to a missing opening
//    '<' and group name";
// } & SyntaxError

Related issues

Resolve #33

Tasks

  • add/update tests
  • update JSDocs

Copy link
Owner

@gvergnaud gvergnaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this amazing PR @didavid61202!! 🎉

Here are a few thought:

It seems like regex parsing and evaluation is a little bit slow and it would be great if people didn’t have to pay this performance cost if they do not use the feature. I was thinking we could make regex syntax opt-in by introducing a Strings.RegEx (maybe RegExp, or Regex?) constructor that you could pass to string functions:

Call<
  Strings.Replace<
    Strings.Regex<"/([A-Z]|!)/g">,
    "_$1_",
    "OaaObbOcc!"
  >
>;

What do you think?

I’d love it if Regex was also supported on Strings.Split!

Bugs

I discovered a small bug when using 2 subsequent [a-z] ranges in a | "or" expression.

// ✅ works
type res1 = Call<Strings.Replace<"/([A-Z]|!)/g", "_$1_", "OaaObbOcc!">>;
type test1 = Expect<Equal<res1, "_O_aa_O_bb_O_cc_!_">>;

// ❌ fails
type res2 = Call<Strings.Replace<"/([a-z][A-Z]|!)/g", "_$1_", "OaaObbOcc!">>;
type test2 = Expect<Equal<res2, "Oa_aO_b_bO_cc_!_">>;

// ✅ works
type res3 = Call<Strings.Replace<"/([a-z][A-Z])/g", "_$1_", "OaaObbOcc!">>;
type test3 = Expect<Equal<res3, "Oa_aO_b_bO_cc!">>;

@didavid61202
Copy link
Contributor Author

Hi @gvergnaud , thanks for the suggestion and reporting the bug!
I've fix the bug you mention in 0.1.16

// ✅ working now
type res2 = Call<Strings.Replace<"/([a-z][A-Z]|!)/g", "_$1_", "OaaObbOcc!">>;
type test2 = Expect<Equal<res2, "Oa_aO_b_bO_cc_!_">>;

for making the regex syntax opt-in, with

Call<
  Strings.Replace<
    Strings.Regex<"/([A-Z]|!)/g">,
    "_$1_",
    "OaaObbOcc!"
  >
>;

I think this might only separate and make the parsing part opt-in but not the evaluation/matching part?
as currently we still check if the user pass in just string or a RegExp pattern in ReplaceReducer interface:

export interface ReplaceReducer<To extends string> extends Fn {
  return: this["args"] extends [
    infer Str extends string,
    infer From extends string,
    ...any
  ]
    ? Str extends Str
      ? From extends `/${infer RegExp}/`
        ? ResovleRegExpReplaceOrError<Str, RegExp, To, never>
        : From extends `/${infer RegExp}/${SupportedRegExpReplaceFlags}`
        ? ResovleRegExpReplaceOrError<
            Str,
            RegExp,
            To,
            Split<
              From extends `/${RegExp}/${infer Flags extends SupportedRegExpReplaceFlags}`
                ? Flags
                : never,
              ""
            >[number]
          >
        : Replace<Str, From, To>
      : never
    : never;
}

but I'm curious if this way is already making RegExp parsing and evaluation opt-in as the type-level RegExp will not infer type if user only pass in pure string?

Or if we want to fully separate the types, maybe we can achieve by introducing Strings.RegExpReplace, Strings.RegExpMatch, Strings.RegExpMatchAll and Strings.RegExpSplit?

What do you think? which way do you prefer?

I'll be working on adding the support for Strings.Split 👍

@ecyrbe
Copy link
Collaborator

ecyrbe commented Apr 26, 2023

I think this might only separate and make the parsing part opt-in but not the evaluation/matching part? as currently we still check if the user pass in just string or a RegExp pattern in ReplaceReducer interface:

From my point of view the /${infer RegExp/ should be enough to not trigger any big perf issue. TS is really performant for inferring and parsing strings.
The issue i see is more about: what if i want to replace strings starting and ending with / without them being interpreted as regexp ?
The solution i see is the @gvergnaud one :

type Regex<Regexpr extends string, Modifiers extends SupportedRegExpReplaceFlags = ""> = { 
  type: RegexpSymbol; 
  regex: Regexpr;
  modifiers: Modifiers
>

export interface ReplaceReducer<To extends string> extends Fn {
  return: this["args"] extends [
    infer Str extends string,
    infer From extends string | Regex<string>,
    ...any
  ]
      ? From extends Regex<infer RegExp, infer Flags>
        ? ResovleRegExpReplaceOrError<Str, RegExp, To, Flags>
        : Replace<Str, From, To>
    : never;
}

@didavid61202
Copy link
Contributor Author

@ecyrbe good point!
I think this is the best way to go 😃

@didavid61202 didavid61202 marked this pull request as draft April 29, 2023 14:09
@didavid61202 didavid61202 marked this pull request as ready for review May 8, 2023 12:18
@didavid61202 didavid61202 requested a review from gvergnaud May 8, 2023 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support regex patterns in Strings.Replace
3 participants