Version 4.0 (#37)
* Support for PHP 7 is dropped
* Minimum supported version is PHP 8.1 (currently still actively supported by the PHP team)
* The lib now behaves closer to the RFCs, fixing some 20-year-old issues
* A new option is available to enforce a strict subset of characters as per RFCs (useStd3AsciiRules)
* Some newer PHP language features are used, replacing older constructs
* Much refactoring has happened to have cleaner code overall
* Many more test cases were added to the integration tests, guarding the changes made
* Please consult UPGRADING.md for details about backwards incompatible changes

Co-authored-by: Elan Ruusamäe <[email protected]>
algo26-matthias and glensc authored Feb 26, 2023
1 parent edd6148 commit a739f3f
Showing 32 changed files with 389 additions and 359 deletions.
6 changes: 3 additions & 3 deletions .travis.yml
@@ -1,8 +1,8 @@
language: php
php:
- 7.2
- 7.3
- 7.4
- 8.0
- 8.1
- 8.2
- nightly

matrix:
2 changes: 1 addition & 1 deletion README.md
@@ -35,7 +35,7 @@ composer require algo26-matthias/idna-convert

### Official ZIP Package

The official ZIP packages are discontinued. Stick to Composer or Github to acquire your copy, please.
The official ZIP packages are discontinued. Stick to Composer or GitHub to acquire your copy, please.

## Upgrading from a previous version

57 changes: 44 additions & 13 deletions UPGRADING.md
@@ -1,14 +1,40 @@
# Upgrading from previous versions

## 3.0
## 4.0.0

**The minimum PHP version is now 8.1.**

**BC break:**
We changed the behaviour of the encoding step a bit to be more in line with the actual RFCs. This means that the NAMEPREP step is now performed on ANY label
being passed to the `ToPunycode::convert()` method. This means, that case mangling (transforming UPPERCASE to lowercase) will always happen, even for ASCII only labels.

**BC break:**
Also, we added the flag `useStd3AsciiRules` (default: false) to `ToPunycode::__construct()` (and to `ToIdn::__construct()` as well) in order to allow control over this specific behaviour.
Enabling this flag will lead to a stricter rule set of characters being allowed (only the range [-a-zA-Z0-9]) and enforcing the absence of leading and trailing
hyphens in labels. In case of violating these rules a new `Std3AsciiRulesViolationException` will be thrown.

**BC break:**
When stating the IDNA version (2003 or 2008) one must always use an integer. From now on strict type checking is in place.

**BC break:**
In older version labels containing characters prohibited according to NAMEPREP were silently ignored. Now we are throwing an `InvalidCharacterException`.


## 3.1.0

We changed the behaviour of the Punycode algorithm to now include all basic ASCII characters in the output when using `ToPunycode->convert()`.
This change is expected to have no negative effect, but work more closely to the respective RFCs. The old behaviour even led to some endless loops for a few Unicode characters.


## 3.0.0

The library has been broken down into various specific classes, thus more closely following SOLID principles.

As such the single class `IdnaConvert` has been broken down into `ToIdn` and `ToUnicode` respectively. Their naming reflects
the format of the outcome, so it's more clear to distinguish, what you need. This should be easier to grasp then the old method names `encode()` and `decode()`.
the format of the outcome, so it's easier picking the right conversion direction.
Usually you will only need one conversion direction per script run, so why bother loading and parsing all the other unused code, then?

Also the handling of host names (simple labels like `my-hostname` or FQHNs like `some-host.my-domain.example`) is now separated from
Also, the handling of host names (simple labels like `my-hostname` or FQHNs like `some-host.my-domain.example`) is now separated from
that of email addresses and URLs.
Both classes offer the same set of public methods:

@@ -19,41 +45,46 @@ Both classes offer the same set of public methods:
| `convertUrl()` | To convert the host name of an URL |

There's no "strict mode" anymore, this is achieved by the separate methods above. The IDN version is selected when instantiating the class, no more setting during runtime.
Also the encoding (for the Unicode side of things) is now **always UTF-8**. Use `TransCodeUnicode` or `EncodingHelper` for converting to and from various encodings to UTF-8.
Also, the encoding (for the Unicode side of things) is now **always UTF-8**. Use `TransCodeUnicode` or `EncodingHelper` for converting to and from various encodings to UTF-8.

All actual sub classes like that for NamePrep and the actual Punycode transformation are put in their own namespaces under `Algo26\IdnaConvert`, e.g. `Algo26\IdnaConvert\NamePrep`.
Interfaces and Exceptions also have their own namespace to declutter the class structure even more.
All actual subclasses like that for NamePrep and the actual Punycode transformation are put in their own namespaces under `Algo26\IdnaConvert`, e.g. `Algo26\IdnaConvert\NamePrep`.
Interfaces and Exceptions also have their own namespace to declutter the class structure even more.

The class `EncodingHelper` is now called separated into the two classes `ToUtf8` and `FromUtf8` respectively and lies under the namespace `Algo26\idnaConvert\EncodingHelper`.
The class `UnicodeTranscoder` is now called `TransCodeUnicode` under the namespace `Algo26\idnaConvert\TransCodeUnicode`.

All examples are updated to reflect the new usage. See the ReadMe for more details.

Also the **minimum PHP version is now 7.2**.
Finally, the **minimum PHP version is now 7.2**.


## 2.0
The library has been handed over to actively maintained GitHub and Packagist accounts. This lead to a change in the namespace.
## 2.0.0
The library has been handed over to actively maintained GitHub and Packagist accounts. This led to a change in the namespace.
Replace all occurrences of
`Mso\IdnaConvert` or `PhlyLabs\IdnaConvert` to `Algo26\IdnaConvert`.
There's no further changes to the class signatures.

## 1.0

## 1.0.0
**BC break:**
As of version 1.0.0 the class closely follows the PSRs PSR-1, PSR-2 and PSR-4 of the PHP-FIG.
As such the classes' naming has been changed, a namespace has been introduced and the default IDN version has changed from 2003 to 2008 and minimum PHP engine version raised to 5.6.0.


## 0.8.0
As of version 0.8.0 the class fully supports IDNA 2008.
Thus the aforementioned parameter is deprecated and replaced by a parameter to switch between the standards. See the updated example 5 in the ReadMe.
Thus, the aforementioned parameter is deprecated and replaced by a parameter to switch between the standards. See the updated example 5 in the ReadMe.


## 0.6.4
**BC break:**
As of version 0.6.4 the class per default allows the German ligature ß to be encoded as the DeNIC, the registry for .DE allows domains containing ß.


## 0.6.0
**ATTENTION:** As of version 0.6.0 this class is written in the OOP style of PHP 5.
Since PHP 4 is no longer actively maintained, you should switch to PHP 5 as fast as possible.
We expect to see no compatibility issues with the upcoming PHP 6, too.
Since PHP 4 is no longer actively maintained, you should switch to PHP 5 as quickly as possible.
We expect no compatibility issues with the upcoming ~~PHP 6~~ PHP 7 as well.
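A minimal sketch of the two 4.0.0 label rules described in UPGRADING.md above: ASCII-only labels are now lowercased, and the optional STD3 check behind `useStd3AsciiRules` restricts labels to `[-a-zA-Z0-9]` without leading or trailing hyphens. The function `prepareAsciiLabel()` is hypothetical and not part of the library; it only mirrors the documented behaviour for a single ASCII label, and it throws a built-in `InvalidArgumentException` where the library reports violations with `Std3AsciiRulesViolationException`.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper, not the library's API: applies the documented
 * 4.0.0 rules to one ASCII-only label.
 */
function prepareAsciiLabel(string $label, bool $useStd3AsciiRules = false): string
{
    // 4.0.0: case folding now also happens for ASCII-only labels
    $label = strtolower($label);

    if ($useStd3AsciiRules) {
        // STD3: only letters, digits and hyphen ...
        $validChars = preg_match('/\A[-a-zA-Z0-9]+\z/', $label) === 1;
        // ... and no hyphen at the start or end of the label
        if (!$validChars || str_starts_with($label, '-') || str_ends_with($label, '-')) {
            // Stand-in for the library's Std3AsciiRulesViolationException
            throw new InvalidArgumentException("Label violates STD3 ASCII rules: {$label}");
        }
    }

    return $label;
}

echo prepareAsciiLabel('My-Host'), PHP_EOL; // my-host
echo prepareAsciiLabel('my_host'), PHP_EOL; // my_host (accepted while the flag is off)

try {
    prepareAsciiLabel('-my-host', true);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Because the flag defaults to `false`, underscores and similar characters still pass unless a caller opts in, which matches the "default: false" note in UPGRADING.md.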



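The "integer only" rule for the IDNA version relies on PHP's scalar type declarations. The class below is a hypothetical stand-in, not one of the library's classes; it only shows how an `int` parameter reacts to a numeric string when the calling file declares `strict_types=1`. Note that `strict_types` is decided by the file that makes the call, so a caller without that declaration would still get `'2008'` silently coerced to `2008` by a plain `int` parameter.

```php
<?php declare(strict_types=1);

// Hypothetical class for illustration only; it is not part of the library.
final class IdnaVersionOption
{
    public function __construct(
        public readonly int $idnVersion = 2008, // readonly promotion needs PHP 8.1
    ) {
    }
}

$option = new IdnaVersionOption(2003);
echo $option->idnVersion, PHP_EOL; // 2003

try {
    // Rejected: this file runs in strict mode, so '2008' is not coerced to int
    new IdnaVersionOption('2008');
} catch (TypeError $e) {
    echo 'TypeError: ', $e->getMessage(), PHP_EOL;
}
```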
4 changes: 2 additions & 2 deletions composer.json
@@ -13,11 +13,11 @@
],
"require": {
"ext-pcre": "*",
"php": ">=7.2.0",
"php": ">=8.1",
"jakeasmith/http_build_url": "^1"
},
"require-dev": {
"phpunit/phpunit": "8.0"
"phpunit/phpunit": "^9 || ^10"
},
"suggest": {
"ext-mbstring": "Install ext/mbstring for using input / output other than UTF-8 or ISO-8859-1",
7 changes: 3 additions & 4 deletions docker-compose.yml
@@ -1,9 +1,8 @@
version: '3.1'
services:
idna-convert-test:
image: epcallan/php7-testing-phpunit:7.2-phpunit7
image: jitesoft/phpunit:8.2
container_name: idna-convert-test
working_dir: /project
command: bash -c "composer install && phpunit -vvv"
command: ash -c "composer install && phpunit -vvv"
volumes:
- ./:/project
- ./:/app
14 changes: 2 additions & 12 deletions src/AbstractIdnaConvert.php
@@ -1,4 +1,4 @@
<?php
<?php declare(strict_types=1);

namespace Algo26\IdnaConvert;

@@ -8,14 +8,9 @@ abstract class AbstractIdnaConvert
{
abstract public function convert(string $host): string;

/**
* @param string $emailAddress
*
* @return string
*/
public function convertEmailAddress(string $emailAddress): string
{
if (strpos($emailAddress, '@') === false) {
if ( ! str_contains($emailAddress, '@')) {
throw new InvalidArgumentException('The given string does not look like an email address', 206);
}

@@ -28,11 +23,6 @@ public function convertEmailAddress(string $emailAddress): string
);
}

/**
* @param string $url
*
* @return string
*/
public function convertUrl(string $url): string
{
$parsed = parse_url($url);
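Why the old check in `convertEmailAddress()` needed `=== false`: `strpos()` returns `0` when the needle sits at the very start of the string, and `0` is falsy. PHP 8's `str_contains()` returns a plain `bool`, so the negated form is equivalent. A small comparison sketch; the helper names `hasAtSignOld()` and `hasAtSignNew()` are hypothetical.

```php
<?php declare(strict_types=1);

// Hypothetical helpers comparing the pre-4.0 and 4.0 checks
function hasAtSignOld(string $candidate): bool
{
    return strpos($candidate, '@') !== false; // must compare strictly with false
}

function hasAtSignNew(string $candidate): bool
{
    return str_contains($candidate, '@'); // PHP 8.0+
}

// The pitfall the strict comparison avoided: '@' at offset 0 yields 0
var_dump((bool) strpos('@example.com', '@')); // bool(false)

foreach (['user@example.com', '@example.com', 'no-at-sign', ''] as $candidate) {
    // Both checks agree for every input
    var_dump(hasAtSignOld($candidate) === hasAtSignNew($candidate)); // bool(true)
}
```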
2 changes: 1 addition & 1 deletion src/EncodingHelper/EncodingHelperInterface.php
@@ -1,4 +1,4 @@
<?php
<?php declare(strict_types=1);

namespace Algo26\IdnaConvert\EncodingHelper;

57 changes: 18 additions & 39 deletions src/EncodingHelper/FromUtf8.php
@@ -1,24 +1,23 @@
<?php
<?php declare(strict_types=1);

namespace Algo26\IdnaConvert\EncodingHelper;

class FromUtf8 implements EncodingHelperInterface
{
private const DEFAULT_ENCODING = 'ISO-8859-1';

private $encoding = self::DEFAULT_ENCODING;
private string $encoding = self::DEFAULT_ENCODING;

public function convert(
string $sourceString,
?string $encoding = self::DEFAULT_ENCODING,
?bool $safeMode = false
) {
): string {
$safe = ($safeMode) ? $sourceString : false;

$this->encoding = 'ISO-8859-1';
if ($encoding !== null) {
$this->encoding = strtoupper($encoding);
} else {
$this->encoding = 'ISO-8859-1';
}

if ($this->encoding === 'UTF-8' || $this->encoding === 'UTF8') {
@@ -38,7 +37,7 @@ public function convert(
}

$converted = $this->convertWithLibraries($sourceString);
if (false !== $converted) {
if (null !== $converted) {
return $converted;
}

@@ -47,43 +46,23 @@ public function convert(

/**
* Special treatment for our guys in Redmond
* Windows-1252 is basically ISO-8859-1 -- with some exceptions, which get dealt with here
*
* @param string $string Your input in ISO-8859-1
*
* @return string The resulting Win1252 string
* @since 0.0.1
* Windows-1252 is basically ISO-8859-1 -- with some exceptions
*/
private function mapIso8859_1ToWindows1252($string = '')
private function mapIso8859_1ToWindows1252(string $string = ''): string
{
$return = '';
for ($i = 0; $i < strlen($string); ++$i) {
$codePoint = ord($string[$i]);
switch ($codePoint) {
case 196:
$return .= chr(142);
break;
case 214:
$return .= chr(153);
break;
case 220:
$return .= chr(154);
break;
case 223:
$return .= chr(225);
break;
case 228:
$return .= chr(132);
break;
case 246:
$return .= chr(148);
break;
case 252:
$return .= chr(129);
break;
default:
$return .= chr($codePoint);
}
$return .= match ($codePoint) {
196 => chr(142),
214 => chr(153),
220 => chr(154),
223 => chr(225),
228 => chr(132),
246 => chr(148),
252 => chr(129),
default => chr($codePoint),
};
}

return $return;
@@ -112,6 +91,6 @@ private function convertWithLibraries(string $string): ?string
}
}

return false;
return null;
}
}
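`convertWithLibraries()` is declared to return `?string`, and the file now begins with `declare(strict_types=1)`. In strict mode, returning `false` from a `?string` function throws a `TypeError`, so `null` is the sentinel that fits the declared type, and the caller's check changes to `null !== $converted` accordingly. A sketch of the pattern, assuming hypothetical function names; `strrev()` is only a placeholder for a real encoding conversion.

```php
<?php declare(strict_types=1);

// Hypothetical stand-in for convertWithLibraries(): null means "no backend could convert"
function convertOrNull(string $input, bool $backendAvailable): ?string
{
    if (!$backendAvailable) {
        return null;
    }

    return strrev($input); // placeholder for a real encoding conversion
}

// Pre-4.0 style sentinel: false does not match ?string
function convertOrFalse(): ?string
{
    return false;
}

$converted = convertOrNull('abc', false);
if (null !== $converted) {
    echo $converted, PHP_EOL;
} else {
    echo 'no backend available, use the built-in fallback', PHP_EOL;
}

try {
    convertOrFalse();
} catch (TypeError $e) {
    echo get_class($e), PHP_EOL; // TypeError
}
```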
55 changes: 17 additions & 38 deletions src/EncodingHelper/ToUtf8.php
@@ -1,12 +1,12 @@
<?php
<?php declare(strict_types=1);

namespace Algo26\IdnaConvert\EncodingHelper;

class ToUtf8 implements EncodingHelperInterface
{
private const DEFAULT_ENCODING = 'ISO-8859-1';

private $encoding = self::DEFAULT_ENCODING;
private string $encoding = self::DEFAULT_ENCODING;

public function convert(
string $sourceString,
@@ -15,10 +15,9 @@ public function convert(
) {
$safe = ($safeMode) ? $sourceString : false;

$this->encoding = 'ISO-8859-1';
if ($encoding !== null) {
$this->encoding = strtoupper($encoding);
} else {
$this->encoding = 'ISO-8859-1';
}

if ($this->encoding === 'UTF-8' || $this->encoding === 'UTF8') {
@@ -38,7 +37,7 @@ public function convert(
}

$converted = $this->convertWithLibraries($sourceString);
if (false !== $converted) {
if (null !== $converted) {
return $converted;
}

@@ -47,43 +46,23 @@ public function convert(

/**
* Special treatment for our guys in Redmond
* Windows-1252 is basically ISO-8859-1 -- with some exceptions, which get dealt with here
*
* @param string $string Your input in Win1252
*
* @return string The resulting ISO-8859-1 string
* @since 0.0.1
* Windows-1252 is basically ISO-8859-1 -- with some exceptions
*/
private function mapWindows1252ToIso8859_1($string = '')
private function mapWindows1252ToIso8859_1(string $string = ''): string
{
$return = '';
for ($i = 0; $i < strlen($string); ++$i) {
$codePoint = ord($string[$i]);
switch ($codePoint) {
case 129:
$return .= chr(252);
break;
case 132:
$return .= chr(228);
break;
case 142:
$return .= chr(196);
break;
case 148:
$return .= chr(246);
break;
case 153:
$return .= chr(214);
break;
case 154:
$return .= chr(220);
break;
case 225:
$return .= chr(223);
break;
default:
$return .= chr($codePoint);
}
$return .= match ($codePoint) {
129 => chr(252),
132 => chr(228),
142 => chr(196),
148 => chr(246),
153 => chr(214),
154 => chr(220),
225 => chr(223),
default => chr($codePoint),
};
}

return $return;
@@ -112,6 +91,6 @@ private function convertWithLibraries(string $string): ?string
}
}

return false;
return null;
}
}
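The `match` arms in `FromUtf8` and `ToUtf8` are two halves of one byte table. The sketch below writes the pairs down once as data, derives the reverse direction with `array_flip()`, and checks that the two maps are exact inverses. `FORWARD_MAP` and `mapBytes()` are hypothetical names, and `strtr()` stands in for the per-byte `match` loop. Incidentally, the target values are the positions of these letters in the DOS code pages 437/850; Windows-1252 itself keeps the ISO-8859-1 values for all seven letters.

```php
<?php declare(strict_types=1);

// Byte pairs from mapIso8859_1ToWindows1252(): ISO-8859-1 byte => target byte
const FORWARD_MAP = [
    196 => 142, // Ä
    214 => 153, // Ö
    220 => 154, // Ü
    223 => 225, // ß
    228 => 132, // ä
    246 => 148, // ö
    252 => 129, // ü
];

// Hypothetical helper: applies a byte => byte table to a binary string
function mapBytes(string $bytes, array $table): string
{
    $pairs = [];
    foreach ($table as $from => $to) {
        $pairs[chr($from)] = chr($to);
    }

    // strtr() never re-scans replaced bytes, so mappings cannot chain
    return strtr($bytes, $pairs);
}

$latin1 = "\xC4\xD6\xDC\xDF\xE4\xF6\xFC"; // "ÄÖÜßäöü" in ISO-8859-1
$mapped = mapBytes($latin1, FORWARD_MAP);

echo bin2hex($mapped), PHP_EOL; // 8e999ae1849481

// The ToUtf8 table is exactly array_flip() of the FromUtf8 table
var_dump(mapBytes($mapped, array_flip(FORWARD_MAP)) === $latin1); // bool(true)
```

The round trip only holds for input that does not already contain a target byte: byte 225 (`á` in ISO-8859-1) is left alone by the forward map but would come back as `ß`.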
2 changes: 1 addition & 1 deletion src/Exception/AlreadyPunycodeException.php
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
<?php
<?php declare(strict_types=1);

namespace Algo26\IdnaConvert\Exception;
