News Archive

Three Devs & A Maybe Podcast:
Understanding Character Sets and Encodings
May 14, 2014 @ 13:12:06

The Three Devs & A Maybe podcast (with hosts Michael Budd, Fraser Hart, Lewis Cains and Edd Mann) has posted their latest episode (#24) talking about character sets and encodings.

Having only just recently been bit by the character encoding issue again, we thought it would be a good time to bring it up on the podcast. Starting from the beginning with ASCII, we move on to discuss how 8-bit compatible machines gave way to the ISO-8859-* standards. This leads us on to Unicode, with the goal to develop a single character-set encoding standard that could support all of the world's scripts. Finally, we discuss the de facto character encoding implementation used on the web today, 'UTF-8', and reasons why this is the case.

Lots of different topics are mentioned, including reversing a Unicode string in PHP using UTF-16BE/LE, portable UTF-8 and a YouTube video covering Pragmatic Unicode. You can listen to this new episode through the in-page player, by downloading the mp3 or by subscribing to their feed.
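The UTF-16BE/LE reversal mentioned in the episode can be sketched roughly as follows. This is a minimal illustration, not the hosts' own code: the function name is hypothetical, it assumes the iconv extension is available, and it only behaves correctly for characters in the Basic Multilingual Plane (surrogate pairs and combining marks would come out in the wrong order).

```php
<?php
// Hypothetical helper: reverse a UTF-8 string by round-tripping through UTF-16.
// Assumes valid UTF-8 input made up of BMP characters only.
function reverse_utf8_via_utf16(string $text): string
{
    // In UTF-16BE every BMP character is exactly two bytes: (high, low).
    $utf16be = iconv('UTF-8', 'UTF-16BE', $text);
    if ($utf16be === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Reversing the bytes reverses the character order and turns each pair
    // into (low, high), which is the little-endian layout of the same unit.
    $reversed = iconv('UTF-16LE', 'UTF-8', strrev($utf16be));
    if ($reversed === false) {
        throw new RuntimeException('Could not convert back to UTF-8');
    }

    return $reversed;
}

echo reverse_utf8_via_utf16("h\u{e9}llo"), "\n"; // olléh
```

A plain strrev() on the UTF-8 bytes would split the two-byte "é" and produce invalid output, which is why the detour through a fixed-width representation is used here.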

threedevsandamaybe podcast ep24 unicode character set encoding utf8

Link: http://threedevsandamaybe.com/posts/understanding-character-sets-and-encodings/

PerishablePress.com:
Encoding & Decoding PHP Code
June 08, 2012 @ 10:56:26

On the PerishablePress.com site there's a recent article showing you how to encode your PHP project's code (though some of the methods are more obfuscation than actual encryption).

There are many ways to encode and decode PHP code. From the perspective of site security, there are three PHP functions - str_rot13(), base64_encode(), and gzinflate - that are frequently used to obfuscate malicious strings of PHP code. For those involved in the securing of websites, understanding how these functions are used to encode and decode encrypted chunks of PHP data is critical to accurate monitoring and expedient attack recovery.

They show examples of several methods of encoding/obfuscation of the code including rot13, base64, gzinflate/gzdeflate and links to some other resources.
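As a rough sketch of how these layers stack, the following round-trips a harmless string through gzdeflate, base64 and rot13 and then peels the layers off in reverse. The function names are hypothetical, the layer order is an assumption chosen for illustration rather than taken from the article, and the zlib extension is assumed to be loaded.

```php
<?php
// Hypothetical helpers showing one possible layering: deflate -> base64 -> rot13.
function obfuscate(string $code): string
{
    return str_rot13(base64_encode(gzdeflate($code)));
}

// Undo the layers in reverse order. The result is only returned for inspection;
// a suspicious payload should never be passed to eval().
function deobfuscate(string $blob): string
{
    $decoded = base64_decode(str_rot13($blob), true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Not valid base64 after rot13');
    }

    $inflated = @gzinflate($decoded);
    if ($inflated === false) {
        throw new InvalidArgumentException('Not a valid deflate stream');
    }

    return $inflated;
}

$blob = obfuscate('echo "hi";');
echo deobfuscate($blob), "\n"; // echo "hi";
```

None of these steps involves a key, so this is obfuscation rather than encryption: anyone who recognizes the functions can reverse them.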

encoding source obfuscate tutorial


Reddit.com:
Let's talk Character Encoding
March 15, 2012 @ 11:07:07

On Reddit.com there's a recent post with a growing discussion about character encodings in PHP applications (with various recommendations).

I would rather not have to convert these weird characters to the HTML character entities, if possible. I'd rather be able to use these characters directly on the web page. If this is for some reason a bad idea, let me know. This might be more of a general web design question (i already posted it there), but I figured it is still appropriate to post here as well since PHP is used to pull an entry from the database, and I figured a lot of you here would know the answer to the question.

The general consensus is to use UTF-8 in this case, but there are a few reminders for the poster too:

  • Don't forget to make the database UTF-8 too
  • Be sure you're sending the right Content-Type for the UTF-8 data
  • A link to an article about what "developers must know about unicode/charactersets"
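The Content-Type reminder can be sketched like this. It is a minimal example with hypothetical helper names, not code from the thread: it declares UTF-8 in the response header and escapes only the HTML-special characters, so accented characters can be printed directly instead of being converted to entities.

```php
<?php
// Hypothetical helper: build a Content-Type header value that declares UTF-8.
function utf8_content_type(string $mime = 'text/html'): string
{
    return sprintf('Content-Type: %s; charset=UTF-8', $mime);
}

// Hypothetical helper: escape & < > " ' but leave other characters untouched.
function escape_utf8(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Send the header before any output, if that is still possible.
if (!headers_sent()) {
    header(utf8_content_type());
}

echo escape_utf8("Caf\u{e9} <b>"), "\n"; // Café &lt;b&gt;
```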
character encoding advice reddit utf8 contenttype unicode


James Cohen's Blog:
How to Avoid Character Encoding Problems in PHP
April 25, 2011 @ 14:13:14

James Cohen has a recent post to his blog looking at a way you can avoid some of the character encoding problems in PHP that can come with working with multiple character sets.

Character sets can be confusing at the best of times. This post aims to explain the potential problems and suggest solutions. Although this is applied to PHP and a typical LAMP stack you can apply the same principles to any multi-tier stack.

He includes a "boring history" session (and recommends skipping if you just want the good stuff) that talks a bit about character sets and their history in computer system handling. All that said, he recommends using UTF-8 to ease your character encoding woes. He talks about configuring your editor to support it, making sure your browsers understand it and setting up your MySQL database connection to use it.
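For the MySQL connection step, one possible sketch uses PDO with the charset declared in the DSN. The post summarized here does not say which database API it uses, so PDO, the helper name, the host and database names, and the choice of utf8mb4 (MySQL's full four-byte UTF-8 character set) are all assumptions.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that asks for a UTF-8 connection.
function utf8_mysql_dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Usage with placeholder credentials (left as a comment because it needs a live server):
// $pdo = new PDO(utf8_mysql_dsn('localhost', 'app'), $user, $password);

echo utf8_mysql_dsn('localhost', 'app'), "\n"; // mysql:host=localhost;dbname=app;charset=utf8mb4
```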

character encoding issue mysql browser editor ide


Brian Swan's Blog:
SQL Server Driver for PHP Connection Options CharacterSet
February 28, 2011 @ 12:15:33

Brian Swan has posted another in his series looking at connection options for the SQL Server driver for PHP. In his latest he looks at the "CharacterSet" setting, an easy way to define which encoding is used for data exchanged with the remote database.

One thing that helped me understand the CharacterSet option was to realize that its name is a bit misleading (although it seems to be inline with other uses of CharacterSet or charset). It is used to specify the encoding of data that is being sent to the server, not the character set. With that in mind, the possible values for the option begin to make sense: SQLSRV_ENC_CHAR, SQLSRV_ENC_BINARY, and UTF-8.

He looks at each of these three options in more detail - SQLSRV_ENC_CHAR being the default, SQLSRV_ENC_BINARY when binary data is needed and UTF-8 when, obviously, you need UTF-8 data transfer between the client and server.
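A connection using the UTF-8 value might look roughly like the sketch below. The helper name, database name and server name are hypothetical, and the actual connect call is left as a comment because it needs the sqlsrv extension and a reachable server.

```php
<?php
// Hypothetical helper: connection options requesting UTF-8 for data exchanged
// with the server. 'CharacterSet' is the option discussed in the post.
function sqlsrv_utf8_options(string $database): array
{
    return [
        'Database'     => $database,
        'CharacterSet' => 'UTF-8',
    ];
}

$options = sqlsrv_utf8_options('ExampleDb');

// With the sqlsrv extension loaded, the options array is passed straight through:
// $conn = sqlsrv_connect('(local)', $options);

echo $options['CharacterSet'], "\n"; // UTF-8
```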

sqlserver connection option characterset encoding


WebReference.com:
Create a Localized Web Page with PHP
October 21, 2010 @ 13:21:23

On WebReference.com there's a new tutorial posted about localizing your website by defining a character set to use for your content.

The process of making your applications/websites usable in many different locales is called internationalization, while customizing your code for different locales is called localization. Localization is the process of making your applications or websites local to where it is being viewed. For example, you can make a website more local to a particular place by converting its text to the predominant language of that location and by displaying the local time (e.g. German for people living in Germany or French for people living in France).

They show how to define constants that can be used in your application for the character set and language encoding. They use two major encodings - UTF-8 and ISO-8859-1 - in their examples of showing a sample "welcome" message in different languages. There's also a simple page to show you how to switch between languages if you'd like to give your visitors the option.
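The constants-plus-messages approach could be sketched along these lines. The constant names, the message table and the fallback behaviour are illustrative assumptions, not the tutorial's own code.

```php
<?php
// Hypothetical constants for the page's character set and default language.
const PAGE_CHARSET = 'UTF-8';
const DEFAULT_LANG = 'en';

// Hypothetical message table keyed by language code.
const WELCOME = [
    'en' => 'Welcome',
    'de' => 'Willkommen',
    'fr' => 'Bienvenue',
];

// Return the welcome message for a language, falling back to the default.
function welcome_message(string $lang): string
{
    return WELCOME[$lang] ?? WELCOME[DEFAULT_LANG];
}

echo welcome_message('de'), "\n"; // Willkommen
echo welcome_message('xx'), "\n"; // Welcome
```

A language switcher would only need to pass the visitor's choice (for example from a query parameter) into welcome_message() after checking it against the known keys.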

localize tutorial language encoding character


Kevin Schroeder's Blog:
You want to do WHAT with PHP? Chapter 3
August 31, 2010 @ 13:44:32

Kevin Schroeder has posted another excerpt from his "You Want to Do WHAT with PHP?" book to his blog today. This time it's from the third chapter that looks at character encodings like UTF-8 or ISO-8859-1.

I realized that while this 3.5-year PHP consultant knew Unicode, UTF-8, character encodings such as ISO-8859-1 or ISO-8859-7, I didn't understand them as well as I thought I had. With that I threw this chapter in the book. Knowing about character encoding is what many developers have. Not as many truly understand it. In this chapter I try to de-mystify character encoding as a whole.

The excerpt introduces character encoding and what it really is - a translation that lets the computer handle human language. The problem comes in when multiple tools try to define the same sort of letters/characters in different ways. He gives an example of a "hello world" string in a normal ASCII format versus one from the EBCDIC format and how it would be rendered by an ASCII-understanding browser.
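The ASCII versus EBCDIC difference can be made concrete by comparing the bytes of the same text in both encodings. This is an illustrative sketch, not the book's example: the function name is hypothetical and it only maps lowercase letters and the space character, using the standard EBCDIC letter ranges.

```php
<?php
// Hypothetical helper: map lowercase ASCII letters and spaces to EBCDIC bytes.
// EBCDIC splits the alphabet into three non-contiguous ranges.
function ascii_to_ebcdic_lower(string $text): string
{
    $out = '';
    foreach (str_split($text) as $ch) {
        $n = ord($ch);
        if ($ch === ' ') {
            $out .= "\x40";                      // space
        } elseif ($n >= ord('a') && $n <= ord('i')) {
            $out .= chr(0x81 + $n - ord('a'));   // a-i => 0x81-0x89
        } elseif ($n >= ord('j') && $n <= ord('r')) {
            $out .= chr(0x91 + $n - ord('j'));   // j-r => 0x91-0x99
        } elseif ($n >= ord('s') && $n <= ord('z')) {
            $out .= chr(0xA2 + $n - ord('s'));   // s-z => 0xA2-0xA9
        } else {
            throw new InvalidArgumentException('Only a-z and space are mapped');
        }
    }
    return $out;
}

echo bin2hex('hello world'), "\n";                        // 68656c6c6f20776f726c64
echo bin2hex(ascii_to_ebcdic_lower('hello world')), "\n"; // 888593939640a696999384
```

Apart from the space (0x40, which ASCII reads as "@"), every EBCDIC byte here is above 0x7F, so an ASCII-based reader has no letters to show for them.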

character encoding book excerpt ascii example


Evert Pot's Blog:
Filesystem encoding and PHP
April 21, 2010 @ 12:19:39

Evert Pot has a new post to his blog about working with files in your applications, more specifically in dealing with filesystem encodings other than some of the defaults.

Many PHP applications save files to a local filesystem. Most of the times for the bulk of readers here you'll likely only ever store files using US-ASCII encoding, either because your filenames are simply based on database fields (as you should try in most cases), or simply because most of your users never have a need for non-english characters. When you do though, it's important to know how operating systems cope with these characters. Unsurprising, all of them do this differently.

He talks about encoding issues in three major operating system types - Windows, OS X and Linux - with some code snippets included to illustrate how each handles the different encodings.

filesystem encoding osx linux windows


Zend Developer Zone:
PHP DOM XML extension encoding processing
September 02, 2009 @ 09:48:18

On the Zend Developer Zone today Alexander Veremyev shares some helpful hints he discovered about the DOM XML extension for PHP that could come in handy when working with different character encodings.

I recently worked with PHP's DOM XML extension while working on Zend Framework's Zend_Search_Lucene HTML highlighting capabilities, and uncovered some undocumented features and issues with the extension in regards to character encoding. The information contained in this article should also apply to other libxml-based DOM implementations, as PHP's DOM extension simply wraps that library.

He shares five tips:

  • Internal document encoding is always UTF-8
  • Input data is always treated as UTF-8
  • Text nodes and CDATA are stored as UTF-8 without transformations
  • Document encoding does not affect loading behavior
  • Save/dumping operations and encoding

He describes each of the points and includes some sample code and XML to parse to help illustrate each.
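The first point (internal encoding is always UTF-8) can be observed with a small sketch like the one below. It is an assumption-laden illustration rather than the article's sample code: the helper name is hypothetical and the dom extension is assumed to be loaded. A document declared as ISO-8859-1 is parsed, and the text read back from the node is UTF-8.

```php
<?php
// A Latin-1 document: the single byte 0xE9 is "é" in ISO-8859-1.
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><p>caf\xE9</p>";

// Hypothetical helper: parse the XML and return the root element's text.
function root_text(string $xml): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Could not parse XML');
    }
    return $doc->documentElement->textContent;
}

// The "é" comes back as the two UTF-8 bytes c3 a9, not the original e9.
echo bin2hex(root_text($xml)), "\n"; // 636166c3a9
```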

tutorial dom extension character encoding


Sean Coates' Blog:
UTF WTF?
November 24, 2008 @ 09:31:04

Sean Coates has reposted an article that was originally published in php|architect magazine covering UTF-8 and proper Unicode encoding.

If I had to guess, I would estimate that I've spent somewhere in the range of 40 hours wrangling UTF-8 in the past 3 months, which is not only expensive for my employer, but also disheartening as a developer who's got real work to do. Admittedly, this number is inflated, due to the heavy development cycle we completed with the launch of our new site.

Sean goes on to talk about Unicode issues in general (partially supported in some places, too many points of failure) and some of his other experiences with "the UTF-8 monster" that have given him trouble over time.

utf8 utf character encoding unicode




All content copyright, 2014 PHPDeveloper.org :: info@phpdeveloper.org - Powered by the Solar PHP Framework