Ticket #122 (closed defect: fixed)

Opened 3 years ago

Last modified 10 months ago

UTF-8 support

Reported by: anonymous Owned by: mike
Priority: normal Milestone:
Component: Administration Version:
Severity: normal Keywords: admin utf8 rss
Cc: mike@…

Description

I've realized that the UTF-8 character set isn't really supported in Plogger. I'm talking about version 2.1, but it seems that version 3 is affected too.

Although Plogger's _install.php script creates the MySQL tables with the UTF-8 character set properly, the data is not actually stored as UTF-8 as far as I can tell. I noticed this because the RSS feed generated with special characters (ñ, á, é, and so on) didn't work properly, and I got complaints about the use of invalid characters (ñ and so on). Looking in the database, I see that the data itself isn't being stored as real UTF-8.

In my copy of Plogger I fixed this problem with the following steps:

1) The Admin backend XHTML is not in UTF-8. To fix that I added the line

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

to the generated XHTML, and to make sure:

header("Content-Type: text/html; charset=utf-8");

With this we can be sure the generated XHTML is in UTF-8, but we can't be sure the form data is being sent to the MySQL server in UTF-8. See step 2.
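
To make step 1 concrete, here is a minimal sketch of how the two declarations could sit together. The admin_head() helper and the page title are hypothetical names made up for illustration; they are not taken from the Plogger source.

```php
<?php
// Hypothetical helper (not from Plogger): builds the <head> of an admin page
// with an explicit UTF-8 declaration in the markup.
function admin_head($title) {
    return "<head>\n"
         . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />' . "\n"
         . '<title>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</title>\n"
         . "</head>\n";
}

// The HTTP header has to go out before any page output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo admin_head('Plogger admin');
```

When both are present, browsers normally go by the HTTP header; the meta tag is a fallback for cases where the header is missing.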

2) To take care of that last point I added the following line:

$rs = run_query("SET NAMES 'utf8'");

(I'm not going to explain this command here, don't worry ;), but you can find more info at http://www.herongyang.com/php/non_ascii_mysql_2.html)

So, OK, we're now storing the UTF-8 data properly in the database, but... why the hell isn't my XML valid yet? Let's look at step 3.

3) In plog-rss.php I replaced the line:

$caption = htmlentities($row['caption']);

with the line:

$caption = xmlentities($row['caption']);

That's because htmlentities() doesn't handle UTF-8 by default (see http://www.php.net/htmlentities). xmlentities() is a solution proposed in the comments on the PHP documentation page for htmlentities(), and is defined as follows:

function xmlentities($string) {
   return str_replace ( array ( '&', '"', "'", '<', '>' ), array ( '&amp;' , '&quot;', '&apos;' , '&lt;' , '&gt;' ), $string );
}
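
As a quick check of what this helper does, here is a small sketch that runs it on a made-up caption (the caption string is an assumption for illustration only). Only the five XML special characters are escaped, and the UTF-8 letters pass through untouched. This assumes the script file itself is saved as UTF-8.

```php
<?php
// Same helper as in the ticket: escape only the five XML special characters.
function xmlentities($string) {
    return str_replace(
        array('&', '"', "'", '<', '>'),
        array('&amp;', '&quot;', '&apos;', '&lt;', '&gt;'),
        $string
    );
}

// Hypothetical caption with accented letters and markup characters.
$caption = 'Niño & "señor" <b>';
$escaped = xmlentities($caption);

echo $escaped, "\n";
// prints: Niño &amp; &quot;señor&quot; &lt;b&gt;
```

The order of the search array matters: '&' is replaced first, so the ampersands introduced by the later replacements are not escaped a second time. On PHP 5.4 or later, htmlspecialchars($caption, ENT_QUOTES | ENT_XML1, 'UTF-8') should be a built-in alternative, though that is not what the ticket used.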

Although it's not strictly needed, I also added the charset to the header() call:

header("Content-Type: application/xml; charset=utf-8");

And that's it! Real UTF-8 support in Plogger! Notice I didn't include filenames because it looks like you've changed something in the Admin backend in version 3, so I prefer to leave the filenames out of the ticket. I'm sure you know where each thing I mentioned fits ;)

Cheers,

Victor

* http://beer2beer.com

Change History

Changed 3 years ago by mike

  • owner changed from mike to anonymous
  • status changed from new to assigned

Awesome! Thanks Victor, this problem has been plaguing us for a while. Great work.

Changed 3 years ago by anonymous

  • owner changed from anonymous to mike
  • status changed from assigned to new

Changed 3 years ago by mike

Victor,

Does $rs = mysql_query("SET NAMES 'utf8'"); need to run before every query? Or is it something that is run once, like during an installation?

Changed 3 years ago by anonymous

Sorry Mike, I forgot to mention it ;)

Not exactly before each query, just once when the connection to the database is opened. I put the line $rs = run_query("SET NAMES 'utf8'"); in the connect_db() function and it works fine for me.
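
A rough sketch of the "once per connection" idea, runnable without a database. The init_connection_charset() name and the stand-in query runner are hypothetical; the real run_query() and connect_db() in Plogger may look different.

```php
<?php
// Sketch: send SET NAMES once, right after the connection is opened.
// $run_query stands in for Plogger's run_query(); its real signature is an assumption.
function init_connection_charset($run_query) {
    return call_user_func($run_query, "SET NAMES 'utf8'");
}

// Stand-in query runner that only records the SQL it receives.
$sent = array();
$fake_run_query = function ($sql) use (&$sent) {
    $sent[] = $sql;
    return true;
};

// Call once, e.g. at the end of connect_db(); later queries need no extra setup.
init_connection_charset($fake_run_query);

print_r($sent);
```

The charset setting lasts for the lifetime of the connection, which is why it belongs in the connect step and not in front of every query.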

If you need beta testing with Spanish characters, you have my email ;) It may also be a good idea to validate the RSS with an XML validator.

Changed 10 months ago by kasper

  • status changed from new to closed
  • resolution set to fixed

UTF-8 support is complete in both the Plogger web .php files and the SQL database. Please report minor UTF-8 issues (strange characters) in the forum for an immediate fix.
