Unicode | What and Why?

We will discuss the following questions in this article:

  • Why did we need character sets (ASCII, Unicode, etc.)?
  • How did ASCII emerge?
  • What is Unicode, and why do we need it?
  • Why does 1 Byte = 8 Bits?
  • Why does a Java character take two bytes?

In the early age of computer technology, there was only binary (1s and 0s). We humans were not comfortable with binary: writing even a name in it was time-consuming and error-prone. People needed to work in a familiar human language, and since this technology was emerging in the US, that language was English. They settled on the Latin script: 256 characters covering the 26 English letters, the extra letters of German and French, special characters, numbers, and others. A language, at this level, is just a set of characters. So they assigned an 8-bit binary sequence to every character, for example:
a => 01100001
b => 01100010
c => 01100011
A => 01000001 etc.

That's why 1 Byte = 8 Bits: 8 bits can distinguish 2^8 = 256 characters.
That's why the character data type of C, C++, and many other technologies takes one byte.
This is how ASCII (American Standard Code for Information Interchange) was created. Strictly speaking, ASCII itself defines only 128 characters using 7 bits; extended 8-bit sets such as Latin-1 fill the remaining 128 slots.
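
To see these mappings in action, here is a minimal Java sketch (Java, since the article turns to Java below); casting a char to an int exposes its numeric code, and Integer.toBinaryString shows the underlying bit pattern:

public class AsciiDemo {
    public static void main(String[] args) {
        for (char c : new char[] {'a', 'b', 'c', 'A'}) {
            // Pad the binary string to 8 digits so the full byte is visible.
            String bits = String.format("%8s", Integer.toBinaryString(c)).replace(' ', '0');
            System.out.println(c + " => " + (int) c + " => " + bits);
        }
        System.out.println("Possible values in 8 bits: " + (1 << 8)); // 256
    }
}

Running it prints a => 97 => 01100001 and so on, matching the table above.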

And following this pattern, every country made its own character set: Shift-JIS for Japanese, Big5 for Chinese, KOI-8 for Russian, the ISO-8859 family for European languages, and so on.


But with this, a problem arose. Let's look at this:
"The character set is the fundamental raw material of any language, and it is used to represent information. Like natural languages, a computer language also has a well-defined character set, which is used to build programs."
This means that choosing a programming language also means choosing a character set.
So if we create software using the C language, which supports the ASCII character set, then this software will understand English only, not Chinese or French. Software was language-specific, as the sketch below illustrates.
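
A minimal Java sketch of the problem, assuming the windows-1251 charset is available (it ships with standard JDK builds): the very same byte decodes to different letters under different single-byte character sets.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetClash {
    public static void main(String[] args) {
        byte[] data = {(byte) 0xE9}; // one raw byte from a file or the network

        // Decoded as Latin-1 (ISO-8859-1), 0xE9 is the French letter 'é'.
        System.out.println(new String(data, StandardCharsets.ISO_8859_1));

        // Decoded as the Cyrillic set windows-1251, the same byte is 'й'.
        System.out.println(new String(data, Charset.forName("windows-1251")));
    }
}

Without knowing which character set wrote the byte, software cannot tell which letter was meant.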

We humans do not have just one language; every country has its own. And every country wanted software to understand its language too. That meant a character set was needed that contains all the languages of this world, and programming languages could then use that character set to build software.

So they collected all the world's languages and set out to make a new character set. The characters numbered far more than 256, so two bytes were assigned per character, meaning this character set could hold 2^16 = 65,536 characters. But the characters of all the world's languages number even more than 65,536. To go beyond this limit, Unicode gives every character a unique number called a code point, conventionally written in hexadecimal (for example, U+0041 for 'A'), with the range extended up to U+10FFFF; encodings such as UTF-8 and UTF-16 then convert these code points into binary. This is how Unicode arose.
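
A small Java sketch of code points written in hexadecimal (assuming the source file is saved as UTF-8 so the non-Latin literals survive compilation):

public class CodePoints {
    public static void main(String[] args) {
        System.out.printf("A -> U+%04X%n", (int) 'A');   // U+0041
        System.out.printf("é -> U+%04X%n", (int) 'é');   // U+00E9
        System.out.printf("中 -> U+%04X%n", (int) '中'); // U+4E2D

        // Characters beyond 65,536 (outside the Basic Multilingual Plane)
        // still get one code point, up to U+10FFFF:
        String emoji = "😀";
        System.out.printf("😀 -> U+%X%n", emoji.codePointAt(0)); // U+1F600
    }
}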

So languages that use Unicode make their character data type two bytes wide, for example Java and C#.
That's why Java reserves 2 bytes for its char data type: a Java char is one UTF-16 code unit.
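
A minimal Java sketch confirming the char width, and showing the consequence: a code point above U+FFFF does not fit in one char and is stored as a surrogate pair of two chars.

public class CharWidth {
    public static void main(String[] args) {
        System.out.println(Character.SIZE);  // 16 (bits)
        System.out.println(Character.BYTES); // 2  (bytes, Java 8+)

        String emoji = "😀";                 // code point U+1F600
        System.out.println(emoji.length());  // 2 chars (a surrogate pair)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 character
    }
}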
